Generative Artificial Intelligence (genAI) and your rights as an academic author

Guidance for authors on publisher deals with genAI companies.

In brief - your rights as an author

If you are the author or creator of a literary, dramatic, musical or artistic work that represents your own original intellectual creation, then it is likely the work will be protected by copyright. In UK law only a minimal level of creativity is required for a work to meet the requirements of copyright subsistence. First ownership of copyright usually rests with the author, though in some cases - for example works made during the course of employment - ownership may rest with the creator’s employer. The owner of copyright controls the economic rights in the work and can, with some exceptions, control how others may reuse it.

More details can be found on our copyright basics web page.


Generative AI training and copyright works

Current generative AI large language models (LLMs) require huge amounts of training data in the early stages of their development. This data will usually take the form of large quantities of human-created texts, images and other works. Where these works are protected by copyright, reproducing them is a restricted act requiring either permission in the form of a licence from the rights owner, or a suitable defence such as a legal exception to copyright.

We are seeing numerous deals signed between publishers and companies wishing to create datasets of published works for use in AI training. Most of these are being agreed between news media companies and AI developers, as shown on the Press Gazette’s AI deals and lawsuits tracker; however, there are also several deals either already agreed or under negotiation between academic publishers and AI firms, as shown for example on the list maintained by Ithaka S+R.

The rights to use works in AI training may not have been negotiated between authors and publishers at the time the academic writing was published, particularly for older works. Some publishers are therefore contacting authors asking them to sign opt-in deals to allow use of their existing works for this purpose, so that publishers can include the content in future licensing deals with AI developers.


Publisher deals and author contract changes

The topic of LLMs and generative AI training is causing considerable discussion. This covers the use of copyright works in training, the quality of some AI models and their outputs, and concerns around potential future economic and environmental impacts, as identified in the Library AI Statement.

In line with the University of Sheffield’s Intellectual Property policy, copyright in most longform academic outputs will be owned and controlled by the authors. This means if our authors receive communications from publishers offering updated terms to allow their works to be used for AI training, the ultimate decision on whether to sign such updated agreements lies with the authors themselves.

We cannot tell authors what to decide when they receive offers from publishers, as this is a personal choice. What we can do is highlight some points authors may wish to check for in the proposed terms they are asked to agree to. 

Things to consider when assessing publisher AI licensing deals include:

  • What rights exactly are being licensed, for how long, and who are they being licensed to (one company or more than one)?
  • What effect does the opt-in have on your other economic rights as an author under copyright, and your ability to enforce them (e.g. rights around the creation of adaptations or derivative works, such as translations or updated editions)?
  • If the proposal includes a royalty payment or fee, is this a per-deal amount that may accrue through multiple future deals, or a one-off single payment? Can you negotiate a better share of revenue? What share does the publisher retain?
  • Is any payment shared between co-authors, and if so on what terms?
  • Does your work include third-party works? What effect does the licence have on these? Will this content be excluded, and what effect would that have on the quality and accuracy of any future reproduction of your work by an AI model?
  • Can you terminate the agreement if you change your mind, or if you object to the use the AI company makes of your work in future, or is it binding and perpetual once signed? Note that once your work has been used to train a model, it may be impossible to remove its effect and content from the trained model weights.
  • Has the publisher identified the company (or companies) it intends to sign deals with?

Summary and further information

There are likely to be more deals signed between publishers and AI companies, though it is looking increasingly likely that in some territories, for example the USA, it will be considered fair use to use lawfully-acquired copyright works for genAI training. For big tech companies that can afford the scanning infrastructure this will mean they only need to buy one print copy of a book and digitise it. In addition, companies are continuing to explore whether synthetically created datasets can act as substitutes for original human-authored content. 

These factors may make some authors feel pressured to sign publisher AI training agreements, particularly where financial remuneration is offered, as they may feel that the opportunity will pass if they do not agree to include their works. There is no right or wrong answer for authors faced with this choice; however, we encourage our authors to be aware of the above considerations and to ensure their choice is an informed one that meets their needs and respects both their rights and the integrity of their work.

If you are a University of Sheffield author and want to discuss this or any other copyright question further, please contact us.