EU vs. US Approaches to Copyright Infringement by AI: Can You Put the Genie Back in the Bottle?

The debate surrounding generative AI is not a recent phenomenon. While AI’s roots trace back to the 1950s, pioneered by Turing and Samuel, today’s explosion in generative AI has given rise to a much wider range of complex issues, including data harvesting, regurgitation, memorization, and copyright infringement. With companies harnessing vast datasets to train large language models (LLMs), legal battles in the US now center on the ‘fair use’ defense, and significant ambiguity surrounds the question of whether AI’s reliance on copyrighted material can be reined in without stifling innovation.

In the US, defenses to infringement claims largely pivot on the doctrine of fair use, codified in Section 107 of the Copyright Act. The doctrine allows limited use of copyrighted material without permission for certain purposes, and the Act sets out factors for determining fair use: the purpose of the use, the nature of the work, the amount used, and the effect on the market. In the landmark case Authors Guild v. Google, Inc., the plaintiffs argued that verbatim copies of authors’ works were infringing, yet their claims were denied: the court held that only a limited and nominal portion of each original work was made available in the reproduction. The defendant’s showing that no more than roughly 16% of a work’s content could be surfaced was sufficient to support a finding of transformative purpose under the fair use doctrine. Authors Guild highlights the inherent ambiguity of fair use, which is evaluated on a case-by-case basis rather than against strictly defined standards. This flexibility gives powerful AI companies room to maneuver around copyright infringement claims, even as authors contend that such practices devalue their creative output and disrupt traditional revenue models.

Subsequent litigation has continued to test the legal boundaries surrounding AI. In Kadrey v. Meta Platforms, Inc., the plaintiffs sought to expand the scope of the protection copyright offers, challenging not only the scanning of publicly available copyrighted works but also the acquisition of authors’ works from illegal sources, namely “shadow libraries.” Here, the issue was twofold: it mattered not only how much material was used, but also whether it was obtained from legally dubious sources. Specifically, the case underscored concerns that companies may be reaping financial benefits from data that was never properly licensed, essentially free-riding on the creative labor of authors.

Another significant issue arose in Tremblay v. OpenAI, Inc., which turned on the distinction between input and output and on the self-conflicting claims that distinction produced. Notably, the plaintiffs’ infringement claims relied on the accuracy of the AI’s output, even though the allegedly scraped material concerns inputs: the prompts and data sources used in training. Although the plaintiffs pointed to hallucinated content attributed to OpenAI’s training corpus, the defendant argued, logically, that because a hallucination is by nature a derivative distortion rather than a reproduction, no direct copying had occurred. Basing a copyright infringement claim not on copying itself but on hallucination widened the debate over the input/output distinction and further complicated the infringement claims that judges will be expected to rule on in the future.

Finally, The New York Times v. Microsoft Corp. brought attention to the issue of AI’s memorization of copyrighted content. Here, the NYT argued that AI’s verbatim recitation of journalistic work not only undermines the original effort of its writers but also risks diverting audiences away from the source. The paper grounded this claim in Article I, Section 8, Clause 8 of the Constitution, which supports an author’s right to remuneration for their work, and in the theory of market disruption.

In all of the above cases, American courts have largely refrained from imposing strict injunctions against AI companies, creating a legal environment that often favors innovation over rigid copyright enforcement. In contrast, Europe has taken a far more restrictive approach, marked by a strict regulatory framework. The EU’s robust data protection laws, epitomized by the General Data Protection Regulation (GDPR), empower authorities to act decisively against companies that misuse personal data. For instance, Clearview AI’s aggressive scraping of billions of images led Dutch regulators to impose fines of over 30 million euros, later compounded by additional penalties in other EU countries. Similarly, the Irish Data Protection Commission’s case against X (formerly Twitter) underscored how ambiguous user consent practices, such as hiding default “opt-out” options behind convoluted settings, can violate fundamental privacy rights.

The EU’s insistence on clear, explicit consent and on the ‘right to be forgotten’ creates a legal landscape in which data processing for AI training must rest on a lawful basis, such as legitimate interest, before training data may be obtained at all. The strict provisions of various EU regulations and directives make it near-impossible for companies to scrape data at the scale needed to build a generative AI training corpus in the first place, and thus limit AI business traction in the EU compared to the US. Meanwhile, the US currently grants companies broad freedom, with its focus on the definition of fair use and minimal intervention. Ultimately, the “genie” of AI in the US may prove too powerful to confine completely, whereas the EU never lets the “genie” out of the bottle in the first place.

While AI companies currently enjoy a largely unregulated free market in the US, the fines and legal challenges confronting comparable companies in the EU reveal the hurdles US businesses may face in the future, whether as regulators shape the American market’s regulatory future or as AI players scale abroad. Hard policies may not be adopted, given the prominence of the innovation economy and the historical freedom that start-ups have enjoyed in the US, but some form of regulation will have to be reached eventually, and its implications may look similar, even if at a smaller scale.

From a venture capitalist’s standpoint, early-stage AI startups already carry heightened exposure to data and copyright litigation, while rapidly shifting regulatory requirements make it unclear how burdensome compliance might become. This risk is magnified if business is also conducted in the EU, given that the GDPR raises the stakes for due diligence and compliance. Private equity investors, who typically focus on AI-layered technologies and established companies, face the added burden of ensuring that established revenue streams remain insulated from privacy fines, reputational damage, and the operational disruptions that regulatory intervention or intellectual property disputes can cause. Although these challenges may seem minor while AI remains largely unregulated, the legal risk these technologies expose investors to should be kept in mind as regulations change, and business plans should be drafted with an eye toward frameworks from authorities such as the National Institute of Standards and Technology’s AI Risk Management Framework or the Federal Trade Commission’s guidelines on AI and automation.