Libraries, Licenses, Limitations: An Empirical Insight into the Contractual Conditions Regulating Text and Data Mining for Research


This blog post offers a preview of an as-yet unpublished study conducted by Digital Republic in 2023 as part of the Knowledge Rights 21 (KR21) programme. The final, comprehensive report of the study, authored by Ana Lazarova, is currently pending publication.

Exploring the Extent of Contractual Override of Core Library Missions

The study in question aims to shed light on how contractual terms affect the ability of libraries to fulfil their public missions. It combines qualitative and quantitative methods to evaluate licensing conditions governing the use of copyrighted content by institutional users such as public and academic libraries. Between February and August 2023, Digital Republic analysed 100 contracts between information vendors (including publishers and scientific databases) on the one hand and libraries on the other; academic libraries often act through their parent educational or research institutions, or through information consortia. The report draws on a dataset of licensing agreements sourced from libraries within the KR21 and IFLA networks and information consortia across 14 Council of Europe countries, as well as publicly available publisher terms. While not exhaustive, the sample provides indicative insights into trends in contractual terms and their implications for library operations.

The study systematically organises its findings into categories that reflect key library activities, including personal use, teaching, research, accessibility for persons with disabilities, archiving, and inter-library document supply, as well as broader contractual conditions such as governing law, time limitations, and user restrictions. The results are presented in tabular form, with the potential restrictiveness of contractual provisions evaluated against a common benchmark – mostly EU copyright exceptions – for each category, supported by quotes from the studied contracts. To ensure anonymity and confidentiality, identifying details of institutions and publishers in the contractual provisions were replaced with standardised terms.

This post focuses on a specific and highly topical aspect of the study: the data-driven findings regarding contractual conditions for text and data mining (TDM), a critical area for modern research and, thanks to the rise of AI, a hot policy topic.

Libraries as Beneficiaries of the EU Text and Data Mining Exceptions

Text and data mining (TDM) is defined by EU law as ‘any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations’ (see Article 2, para 2 of Directive 2019/790).[1]

In 2019, recognising the need for a more consistent approach regarding the use of technologies enabling ‘the automated computational analysis of information in digital form, such as text, sounds, images or data’ vis-à-vis copyrighted content, the EU legislator introduced two new copyright exceptions for TDM. Article 3 of the Directive provides for a mandatory exception allowing research organisations and cultural heritage institutions to make reproductions and extractions, in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.[2] The exception under Article 3 cannot be overridden by contract or by technical protection measures (TPMs). Article 2 of the Directive nonetheless defines research organisations narrowly in a way that could be read almost as a ‘non-commercial purposes’ clause.

Additionally, Article 4 of the Directive provides an exception covering both commercial and non-commercial uses by any user. This exception, however, can be unilaterally overridden by an express reservation of rights by the rightsholder. Under Recital 18 of the Directive, rightsholders should reserve the rights to make reproductions and extractions for text and data mining ‘in an appropriate manner’. Article 4 is therefore subject to contracting-out by rightsholders, which in practice allows publishers to prohibit data extraction or any other TDM activities in their licensing terms and conditions. Likewise, this second exception does not prevent override by TPMs.

It should be acknowledged that, due to its limited scope, the study does not include a detailed consideration of the requirement of ‘lawful access’ under Article 3 of Directive 2019/790. It should nevertheless be borne in mind that the concepts of ‘lawful use’[3] and ‘lawful source’[4] in the EU acquis are complicated. For a use under an exception to be lawful, the subject matter must have been made available with the consent of the rightsholder.

How Do Licensing Contracts Address TDM for the Purpose of Scientific Research?

For the purposes of the study, and given the particular status of libraries as beneficiaries, the standard of Article 3 has been used as the benchmark to assess the restrictiveness of the studied contracts. This choice of benchmark shapes the evaluation of the contractual provisions in the following ways:

Clauses Evaluated as Permissive:

Using the EU research TDM exception as a benchmark means, first and foremost, that contractual clauses limiting the authorised use to non-commercial use[5], or to use for the purpose of scientific research only, are evaluated as permissive.

Clauses Evaluated as Prohibitive:

As for prohibitive clauses, the study also treats as limitations of library activities any prohibitions on the use of robots, spiders, crawlers or other automated downloading programmes, on the continuous and/or automatic searching or indexing of the licensed materials or database, and similar restrictions.

Clauses Evaluated as Limiting:

Some of the analysed contracts authorise institutional users to perform TDM through a specific service provided by the Licensor. In such cases, where a particular form of automated analysis is explicitly envisaged in the contract, this has been evaluated as offering only limited authorisation for TDM purposes. It should be noted, however, that the use of a SaaS solution or the Licensor’s operator to provide the user with specific information or trends does not constitute a genuine TDM opportunity for the Licensee. This is because the Licensee does not gain access to the raw data required for independent mining.

In addition, under some of the contractual clauses studied, TDM could be performed only on specific content. For example, as per one of the contracts, the Licensee is allowed to:

‘use [Service] for certain types of text and data analysis specified in the Terms and Conditions of Use for the [Service]. [Service] includes selected content items portions of which are contained in the [database].’

Lastly, according to some of the contracts studied, TDM was subject to special access to be granted by the Licensor.

While all these instances may be seen as limiting for the user, they could also be interpreted as effectively prohibitive of TDM activities.

Collectively, these contractual clauses paint the following landscape:

The results concerning text and data mining show that 9 out of 100 contracts allow TDM (Yes). By contrast, 31 out of 100 contracts explicitly or implicitly prohibit TDM (No), 11 out of 100 provide only limited opportunities for TDM (Limited), and 6 out of 100 have unclear provisions regarding TDM (Unclear). A further 41 out of 100 contracts make no mention of TDM (Silent); 11 of those combine this silence with an express general contractual override provision prohibiting uses that are not expressly authorised by the contract (Silent – general override).

The study reveals a clear trend: academic publishers deny or block the opportunities for automated data analysis that libraries should have been able to enjoy under Article 3 since 2021 at the latest.

A table containing the respective provisions can be found here:

Interactive table of pseudonymised clauses on TDM by libraries and research institutions

[1] Directive (EU) 2019/790 of the European Parliament and of the Council on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC [2019] OJ L130/19.

[2] The notion of lawful access for the purposes of arts. 3 and 4 of Directive 2019/790 is yet to be interpreted by the judiciary, but unlike the requirements of art. 6, it covers not only works in permanent collections, but licensed works as well.

[3] According to Recital 33 of Directive 2001/29, ‘A use should be considered lawful where it is authorised by the rightholder or not restricted by law.’

[4] The ‘lawful source’ concept was introduced by the CJEU. See Judgment of the Court (Second Chamber) of 26 April 2017 in the case C-527/15, Stichting Brein (Filmspeler) [2017] EU:C:2017:300, where the Court states that the use of hyperlinks to websites that are freely accessible to the public, and on which copyright-protected works have been made available without the consent of the right holders, is unlawful. See also Judgment of the Court (Fourth Chamber) of 10 April 2014 in the case C‐435/12, ACI Adam BV et al. v. Stichting de Thuiskopie, Stichting Onderhandelingen Thuiskopie vergoeding [2014] ECLI:EU:C:2014:254. In § 38 the Court says that ‘national legislation, such as that at issue in the main proceedings, which does not draw a distinction according to whether the source from which a reproduction for private use is made is lawful or unlawful, may infringe certain conditions laid down by Article 5(5) of Directive 2001/29.’

[5] Even though art. 3 does not contain a non-commercial purpose requirement, the definition of ‘research organisations’ in art. 2 effectively implies one. According to this definition, a ‘research organisation’ means ‘a university, including its libraries, a research institute or any other entity, the primary goal of which is to conduct scientific research or to carry out educational activities involving also the conduct of scientific research […] on a not-for-profit basis […] pursuant to a public interest mission recognised by a Member State […] in such a way that the access to the results generated by such scientific research cannot be enjoyed on a preferential basis by an undertaking that exercises a decisive influence upon such organisation’.
