Text and data mining

Introduction

What is it?

The UK Government defines text and data mining (TDM) as “the use of automated analytical techniques to analyse text and data for patterns, trends and other useful information” (HM Government, 2014).

In a research context, TDM can be used to select, search and analyse substantial amounts of scholarly material. For example, it can extract meaning from large data sets, find patterns, discover relationships, and support semantic analysis of text to show how content relates to particular ideas and needs.
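
As a rough illustration of what finding patterns and relationships in text can look like in practice, the sketch below counts how many documents mention each term and how often two terms appear in the same document. It is a minimal example under stated assumptions, not a recommended TDM workflow: the three abstracts, the stopword list and the function names are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus standing in for a collection of scholarly abstracts.
ABSTRACTS = [
    "Graphene oxide membranes improve water filtration.",
    "Water filtration with polymer membranes reduces energy use.",
    "Graphene sensors detect trace gases.",
]

# Illustrative stopword list (an assumption; real projects use fuller lists).
STOPWORDS = {"with", "the", "and", "of", "in", "a"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def term_document_counts(docs):
    """Count, for each term, how many documents mention it."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrence_counts(docs):
    """Count how many documents mention each pair of terms together."""
    pairs = Counter()
    for doc in docs:
        # Sort the terms so each pair is always stored in the same order.
        pairs.update(combinations(sorted(tokenize(doc)), 2))
    return pairs


if __name__ == "__main__":
    print(term_document_counts(ABSTRACTS).most_common(3))
    print(cooccurrence_counts(ABSTRACTS)[("filtration", "water")])
```

Even on this toy corpus, the pair counts indicate which terms tend to appear together. Real TDM applies the same idea to far larger collections, which is why it typically involves copying substantial amounts of content.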

TDM usually requires copying large amounts of content for analysis, and that copying is subject to copyright. It therefore needs either permission from the rightsholders or must fall within the copyright exception described in the following section.

HM Government. (2014). Exceptions to copyright. Intellectual Property Office.
https://www.gov.uk/guidance/exceptions-to-copyright

Why is it important?

TDM can help realise the potential of big data to gain new insights, extending the possibilities for research and innovation. This short video from the Royal Society of Chemistry explains the benefits of materials being made available for use in this way: