Publishers warn tech giants not to use their books as AI sources

The biggest British publishing houses have warned dozens of tech companies that they will need to pay to use the content in books, journals, and papers for their artificial intelligence models.

The Publishers Association expressed “deep concern” on behalf of its members, including Penguin Random House, HarperCollins and Oxford University Press. It believes that “vast quantities of copyrighted works” are being fed into tech companies’ generative AI programs without authorisation.

Google DeepMind was among the 50 recipients of the letter last week, as were Meta, the owner of Facebook, and OpenAI, the company behind ChatGPT. We have contacted these three companies for comment.

The letter stated that the association’s members, outside of any licensing agreements to the contrary, do not authorise or grant permission for any use of their copyrighted works in relation to the training, development or operation of AI models, including large language models and other generative AI products, without limitation.

Tension between the technology industry and the creative industries over copyright is increasing. Training AI models requires vast amounts of high-quality data, but the owners of that data want to be compensated for its use. The quality of the data determines the quality of the output, yet it is not easy to discover what data is being fed to the technology.

Dan Conway, the chief executive of the Publishers Association, claimed that content used to train AI models was “ripped off globally”. He added that if the tide of content used to train AI models is not checked, it could cause unprecedented damage to the creative industries.

The biggest businesses in the world are investing billions in generative AI. Its proponents claim that its impact on society and industry will be similar to that of the introduction of the internet. Generative AI models can respond to almost any request, from writing poetry or film scripts to checking software code.

Content producers and the creative industries have launched several major court cases. These are watershed moments in the copyright dispute and will set the rules for years to come.

In the United States, the Authors Guild has filed a class action suit against OpenAI whose plaintiffs include the writers John Grisham and Jodi Picoult.

Microsoft and OpenAI face claims that they were “free-riding” on information used to train their AI models, while Universal Music is suing Anthropic over the use of its song lyrics.

The subject is so controversial in the UK that discussions between the creative industries and technology firms, held by the Intellectual Property Office to come up with a voluntary code, have broken down.

The Department for Science, Innovation and Technology has been tasked with taking up the baton and treading the difficult path of setting a copyright framework that is acceptable to all parties. Its goal is to create a system that “will help overcome the barriers AI firms and users face today” and ensure that rights holders are protected.

Conway said that the decision to send the letter to AI companies shows “publishers’ continued frustration over their copyright-protected work being used to develop AI models” without their consent, without transparency and without remuneration.

He claimed that authors are aware that their work has been used illegally to train the systems, and that they have no control over what is replicated in the output. “This can be fundamentally wrong, misleading for readers, or just copy-cat materials that then compete with the original works,” he said.

In its letter, the Publishers Association called on the large technology companies to find a solution that ensures “appropriate compensation and appropriate attribution” for authors and publishers. It said that licensing on a voluntary basis is the best way to develop AI models in an ethical, legal and sustainable manner.