ARTificial Intelligence | The Authors Guild vs. OpenAI
Like it or not, machine learning models—or A.I.—are quickly seeping into every technological aspect of our lives. And with that ubiquity comes a wave of hype and panic; if A.I. isn’t being positioned to achieve sentience and take over the planet, it’s in headlines everywhere for replacing human workers at every level of industry—and in creative fields in particular. But beyond the hype—the maybes and what-ifs and possible far futures—the fact remains that machine learning models are being used in a growing number of sectors, not least among them publishing.
This series peels back the layers of hype and fear to look not at how important A.I. may be in the far future, but at how it’s being used right now in ways that directly impact the publishing industry. Inevitably, some of the talk about A.I. will turn out to be nothing more than hype, as we’ve seen from a cycle of tech bubbles in the 2020s. But in the meantime, its current capabilities are impacting the publishing landscape in ways that are unlikely to go away anytime soon.
Since generative A.I. hit the public consciousness, it has been dogged by concerns about plagiarism and copyright infringement, and recently those concerns have come to a head. The current class-action lawsuit against OpenAI hinges on a few core grievances: one, that ChatGPT (OpenAI’s large language model) was allegedly trained on copyrighted data scraped from book-pirating websites; two, that as a result of this alleged theft it is capable of producing derivative, copyright-infringing text that poses a real financial threat to the authors whose works it was trained on; and three, that without these copyrighted works OpenAI would have “no commercial product” capable of posing said threat, and that the company is therefore not only profiting from the alleged theft but also denying authors the “opportunity to license” their works for A.I. training and to be paid for what they see as a significant role in ChatGPT’s current capabilities.
Some elements of the lawsuit, like the actual, material capability of A.I. to compose coherent long-form works that threaten authors’ livelihoods, are interesting but will not be the focus of this piece. Instead, we’re going to key in on a single aspect of the lawsuit: the possible ramifications of suing over the A.I. creation of derivative works.
The difficulty with the claim that ChatGPT is infringing on copyright by producing derivative works is that, technically speaking, its creation of derivative works may not be for commercial purposes. Yes, OpenAI is itself a commercial enterprise, but the derivative works described in the lawsuit (namely, outlines for the “next purported installment[s]” of copyrighted works) are created through user prompting and are neither distributed commercially nor created with commercial intent. In recent legal history, a line has been drawn between derivative works created for commercial and non-commercial purposes, with non-commercial and sufficiently transformative derivative works typically allowed under fair use so long as their creator doesn’t claim ownership of the copyrighted elements; this is the legal gray area that has allowed fanfiction hosting websites to operate without fear of legal takedown notices over the last decade or so.
Although ChatGPT’s commercial “motive” may be harder to establish given its nature as a tool rather than a legal agent, a verdict agreeing with the plaintiffs that its creation of derivative works infringes on copyright could have lasting legal repercussions in the online fandom world. Other claims made in the lawsuit, such as the claim that ChatGPT’s ability to “mimic” the plaintiffs’ work harms their market, are exceptionally broad and could be read as an attempt to copyright an author’s style rather than the content of their work. In any case, the verdict in this lawsuit will likely have repercussions beyond the curtailing of large language model datasets. The plaintiffs’ ask that OpenAI license their work is reasonable, and a good model to follow if generative text models continue to require data from published books. But we have to wonder whether the potential benefits are worth the other precedents this case might set.
At the time of writing, the case has not yet been decided; the verdict will likely depend on whether discovery produces definitive evidence that OpenAI used copyrighted materials to train ChatGPT.