Reddit’s legal strike on AI startup Anthropic puts AI training practices on trial

Reddit app is seen on a smartphone. Photograph: (Illustration by Reuters)

Reddit has filed a lawsuit against artificial intelligence (AI) startup Anthropic, accusing it of scraping and using millions of Reddit user comments to train its Claude chatbot without consent or compensation.

The lawsuit, filed on June 4 in California Superior Court in San Francisco, opens a new front in the growing legal conflict between content platforms and AI companies over the use of data for AI training.

AI models are trained on vast amounts of data, and more data generally makes them more capable. To build more sophisticated models that can handle complex tasks, AI companies are racing to acquire ever more data to train their systems on.

According to Reddit, Anthropic continued to access Reddit’s platform through automated systems despite claiming to have blocked its bots. The lawsuit alleges the AI firm “intentionally trained on the personal data of Reddit users without ever requesting their consent”, in breach of Reddit’s publicly accessible content policy and user agreement.

Anthropic, which is valued at $61.5 billion and backed by Amazon, rejected Reddit’s claims. In a statement, the company said, “We disagree with Reddit’s claims and will defend ourselves vigorously.”

Licensing rules and alleged violations

Reddit claims that Anthropic began accessing Reddit data for model training as early as December 2021. The complaint cites a 2021 research paper co-authored by Anthropic CEO Dario Amodei that highlights the usefulness of Reddit forums, such as those on history, gardening, and relationship advice, as high-quality sources of training data for language models.

While AI companies have traditionally relied on public internet content, including Reddit, to train their models, Reddit argues that such use must follow clearly defined rules. In recent years, the platform has introduced a content policy and technical safeguards to limit unauthorised data scraping. These measures ensure that deleted posts and sensitive user data are not included in any licensing arrangements.

Unlike other lawsuits against AI companies that hinge on copyright infringement, Reddit’s suit emphasises a breach of contract and unfair business practices. It accuses Anthropic of disregarding Reddit’s user agreement and misrepresenting its practices while marketing itself as a responsible AI player.

An industry-wide change on the cards?

Reddit is seeking damages and an injunction to stop Anthropic from using its data unless it complies with the platform’s terms. The company has also requested a jury trial. “We believe in an open internet. That does not mean open for exploitation,” Ben Lee, Reddit’s Chief Legal Officer, said in an interview.

This case is being closely watched as it may set an important precedent. While many AI companies defend data use under the umbrella of “fair use”, Reddit’s case sidesteps copyright law entirely, focusing instead on breach of contract and unfair competition.

As regulators, courts, and content owners scrutinise the explosive growth of AI, the outcome of Reddit’s lawsuit could have ripple effects across the industry. It may determine how digital platforms negotiate licensing, protect user data, and interact with AI companies seeking access to large-scale public datasets.

For Reddit, the lawsuit is about more than data—it’s about control, compensation, and protecting its communities. For Anthropic, it’s a legal challenge that may shape the future of how AI firms train their most advanced models.

Stock impact and licensing deals

The lawsuit’s filing appears to have reassured investors. Reddit’s shares rose over 6 per cent on June 4 following the news, underscoring investor confidence in the platform’s assertiveness in protecting its content and user data.

Reddit, which went public in 2024 and now has a market capitalisation of around $22 billion, has been actively working to monetise its vast repository of user-generated content.

In May 2024, Reddit signed a licensing deal with OpenAI, allowing the ChatGPT maker to access Reddit content under strict terms. Reddit has a similar agreement with Google. Both deals are structured to ensure user privacy, respect deletion requests, and provide fair compensation to Reddit.

“Other giants in the AI space understand and respect Reddit’s rules,” the complaint said, adding that companies like OpenAI and Google only used Reddit content after agreeing to the platform’s licensing conditions.
