Text and Data Mining (TDM)

Information for staff and students undertaking text and data mining (TDM) as part of their research.

Introduction

Text and data mining is the process of extracting new information from existing files, usually using computational methods. There are several steps to most TDM activity, including data cleansing and indexing, but the first step is to gather the sources to be used. A temporary copy of the files is often made to enable the content to be processed, and this has implications for copyright.
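As an illustration of the cleansing and indexing steps mentioned above, the following is a minimal sketch in Python, not a recommended workflow. The file names, sample text and the helper functions `cleanse` and `build_index` are hypothetical, and the in-memory dictionary simply stands in for temporary copies of sources you have lawful access to.

```python
import re
from collections import defaultdict


def cleanse(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the sorted list of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in cleanse(text):
            index[token].add(name)
    return {token: sorted(names) for token, names in index.items()}


# Hypothetical stand-ins for temporary copies of the gathered sources.
documents = {
    "doc_a.txt": "Text mining extracts NEW information.",
    "doc_b.txt": "Data mining, too, extracts patterns!",
}

index = build_index(documents)
print(index["mining"])    # ['doc_a.txt', 'doc_b.txt']
print(index["patterns"])  # ['doc_b.txt']
```

Real projects typically use more careful cleansing (for example, handling punctuation inside words, or removing very common words), but the overall shape of gathering, cleansing and then indexing is the same.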


Copyright and TDM

An exception to copyright, introduced into UK law in 2014, allows you to make copies of works to which you have ‘lawful access’ for the purpose of text and data analysis, more commonly referred to as TDM. The analysis must be carried out as part of non-commercial research: the exception does not permit TDM for any directly or indirectly profit-making purpose.


Lawful access

Lawful access to works can take many forms. For example, it may be material we have access to via a Library subscription, or you may own a copy. When using publicly available works for TDM, such as material freely available online, you should ensure the source you use is lawfully available with the rights holder’s consent, and is not an infringing copy posted without authorisation.

When you access materials lawfully under a contract or licence, content providers can apply reasonable measures to keep their networks stable and secure. However, any terms and conditions that aim to stop you undertaking TDM under the exception are unenforceable.


Transfer of copies

If your research project involves collaboration with researchers at another UK higher education institution, you may be able to rely on our Copyright Licensing Agency licence for limited sharing of materials to be analysed. Please contact us if you wish to discuss what is permitted. You should not otherwise share any copies of materials that you have made for the purposes of TDM.

While you cannot share copies of underlying material, the outputs you create from your analysis are under your control. You are usually free to share or publish your new analysis data provided you do not transfer or communicate any copies of the underlying works.


TDM and the public domain

Works in the public domain can usually be freely reused and copied for any purpose, as copyright no longer applies to them. Be aware, however, that the TDM exception is an exception to copyright. If the materials you wish to use for TDM are made available to you under a contract or licence, and consist of works that are free of copyright and related rights, the exception does not apply and cannot override the contract terms. In such circumstances, your use must keep to the terms of your contract or licensing agreement.


Process for purchasing datasets for research projects

If you need to acquire a dataset for a project, please contact the Library and we will approach the publisher for a price quote. The cost of purchasing the dataset should then be included in the grant application where possible. Provided the researcher is able to meet the cost from the grant award, the Library will aim to buy the dataset. Our systems team can then make the dataset available on University networks.

This process ensures that purchased datasets have appropriate licence agreements, are made available to the whole University, and are not bought more than once by different research groups.


Ask a question

Email: library@sheffield.ac.uk

Phone: +44 114 222 7200