Fedor Tsvetkov

Independent analyst
Amazon is working on a content marketplace where developers can buy data to train their AI models. Microsoft has a similar project. Photo: PJ McDonnell / Shutterstock.com

Two tech giants, Amazon and Microsoft, are working on creating their own marketplaces for AI training content. They could become intermediaries in a marketplace where publishers can legally sell data for training and use by algorithms. Why would they want to do this?

Marketplaces again

According to The Information, Amazon has notified publishing industry executives that it plans to launch a specialized marketplace where media companies can sell their content to AI product developers. The company's cloud division, AWS, sent out presentation slides ahead of a private conference for publishers in New York City, where the content marketplace was listed alongside cloud platform tools including model generator Bedrock and analytics platform Quick Suite. In a comment to TechCrunch, an Amazon spokesperson did not deny the plans to create a content marketplace, but did not share any details.

One of Amazon's biggest competitors, Microsoft, is also working on its own publisher content marketplace (PCM). It began piloting the platform as early as last year and is partnering with The Associated Press, Condé Nast, Vox Media and Yahoo. Instead of disparate and often opaque one-on-one arrangements, Microsoft offers a standardized approach: publishers formulate the terms of use for archives and news feeds in advance, and AI developers choose suitable scenarios and rates.

A lawsuit instead of a click

The interest in developing such marketplaces seems to be the industry's reaction to the fact that the former "social contract" of the open Internet no longer works. It used to rest on a simple exchange: publishers made their materials freely available online, search engines and aggregators indexed them, showed links and snippets, and in return brought an audience to the sites - along with ad impressions and subscriptions. But with the proliferation of AI, the user gets a ready-made answer right in the chat and doesn't go to the original source. So "traffic compensation" has become insufficient for many media outlets.

In addition, large technology companies have long operated on the logic of "collect data now - deal with rights later." This has led to several legal disputes. In April 2024, a group of eight major U.S. newspapers, including the New York Daily News and Chicago Tribune, filed a lawsuit against OpenAI and Microsoft, alleging misuse of millions of articles for AI training. In October of the same year, Dow Jones (parent company of the Wall Street Journal) and the New York Post made similar claims against Perplexity AI: in their version, the service engaged in "massive illegal copying." In January 2026, the largest copyright holders in the music industry, Universal Music Group, Concord and ABKCO, filed a lawsuit against Anthropic, demanding damages that, by the plaintiffs' estimate, may exceed $3 billion.

In this context, major players began to look for ways to formalize access to data and reduce legal risks. But the first steps were piecemeal arrangements rather than a sustainable business model. Amazon, for example, struck a deal with The New York Times: according to the WSJ, the publication gets between $20 million and $25 million a year for access to its archives as well as content from The Athletic. OpenAI has signed a five-year agreement with News Corp. worth an estimated $250 million, as well as deals with Axel Springer and the Financial Times.

But the format of exclusive deals leaves niche publishers out of the game and is most often built on fixed payments, with no connection to exactly how the content is used and how it contributes to the models' responses.

Therefore, AI developers are now discussing a new "contract" - a content marketplace. Under it, copyright holders define the terms of use and set flexible rates - separately for training, generation, citation or access to updated feeds. Platforms record usage and provide transparent reporting, and payments are tied to actual consumption.

Large players almost always try to enter into direct relations without intermediaries. There is no point in paying a commission if you can negotiate directly, says Denis Smetnev, co-founder of Skyeng and the uForce marketing agency.

The [large] part of the market is small and medium-sized publications. For their sake, large AI providers are unlikely to build a separate infrastructure - it is more convenient to give this work to aggregator marketplaces. As a result, the model will be standard: the top publishers will be involved in direct transactions, while the "long tail" will be through an intermediary.

Denis Smetnev, co-founder of Skyeng and uForce marketing agency

The creation of such marketplaces can also help tech companies address the problem of neural network "hallucinations." In the corporate sector, on which Amazon's cloud division relies, the cost of an error is too high, so businesses are willing to pay for answers generated from verified data. Perplexity AI is already piloting a similar scheme in which publishers receive a portion of subscription or advertising revenue if their content was used to respond to a user.

The high demand for quality data for AI is confirmed by Osip Burlov, COO of Youkeeps, a developer of a personalized AI assistant integrated into Copilot and Microsoft 365. He notes that most developers build agents on top of off-the-shelf LLMs and low-code builders.

For a developer, the logic is simple: you are building an agent - for sale or for yourself - and you want to use verified content. You need normal fact-checking, markup, security and, most importantly, legal rights. If there's a marketplace with standards and guarantees, you don't have to "run and search" worrying about possible lawsuits.

Osip Burlov, Chief Operating Officer of Youkeeps

Don't let go of the client

According to venture capitalist Pavel Myasnikov, the creation of such marketplaces is a continuation of the struggle for control of the value chain around AI.

Amazon and Microsoft already control the distribution of cloud computing. Selling data is the next logical step: from microchips and data centers to cloud technologies, then to training and operating models and to AI-based end products.

Pavel Myasnikov, venture capitalist

The marketplace, Myasnikov continues, will become a customer retention tool within the ecosystem.

In the case of Microsoft, this direction "will definitely become noticeable," continues Osip Burlov. It will allow them to improve their own AI models: Copilot will get more trained agents, and the quality of data will increase.

That's significant, given that earlier this month investors sold off Microsoft's stock after its quarterly report. Among the reasons: revenue growth at its Azure cloud division fell short of market expectations. Microsoft is also having trouble promoting Copilot - three years after launch, it has just 15 million users. For a company that has been sued by copyright holders, the marketplace is also a tool for managing reputational risks.

For Amazon, this project is no less important. During the market crash in February, its shares dropped by almost 15%, falling for nine consecutive sessions. One of the reasons was that in early February the company announced plans to spend an unprecedented $200 billion in 2026 on data centers, chips and everything related to the development of AI. The market got scared that the bet on hardware and data centers might not pay off in the long run.

The marketplace becomes the very link that connects investments in "hardware" with real revenues from cloud services and advertising. One of the beneficiaries of this initiative should be the AWS cloud division, which generated over $128 billion in 2025, or 18% of Amazon's total revenue. By turning itself into a licensed data broker, Amazon is creating a one-stop shop for businesses: here you rent servers, here you legally buy content to train models, and here you deploy AI agents, a direction that AWS CEO Matt Garman has previously described as a potential multi-billion dollar business.

The initiative could also change Amazon's advertising business, which generated $68 billion in 2025 (up 21% year over year). Amazon has the Alexa AI assistant as well as the Rufus AI shopping bot. Access to quality content through the marketplace is important for training them, so that the advice they give is expert, not random. AI is paving the way for a new type of advertising - instead of fighting for a spot on the search results page, brands will compete to have AI mention their product as the best choice in the context of a dialog.

"Clean" data vs wild market

Demand remains the main obstacle to the success of such platforms: who exactly, and in what volumes, is willing to pay for what is technically available for free? Until the courts make final rulings on the claims of rights holders, the legal status of data collection for model training remains in a gray area.

In this context, the marketplace risks becoming a "compliance showcase" - a place visited only by the largest corporate customers, for whom it is critical to have license agreements on hand for audits, while the bulk of startups will continue to train models on "wild" data.

The Information's sources point to publishers' skepticism: they fear that real buyers for "clean" content will be critically few and that the revenue from such sales will not cover the drop in traffic from search engines.

Pavel Myasnikov points out another limitation: the market for data rights is largely determined by legal regulation rather than technical inaccessibility.

As with photo stocks, the rules determine what can be used legally and how. In the long term, many are likely to look for ways to circumvent marketplaces and the very need to buy data.

Pavel Myasnikov, venture capitalist

In addition, the success of content marketplaces will likely depend on whether they can show the value of "clean" data: for example, ensuring that models trained on licensed content hallucinate significantly less often than their "pirated" counterparts.

However, for small players whose content is currently being used without any remuneration, this is a chance to capitalize on the AI boom. If marketplaces can offer a transparent pay-per-use model, it will turn data from a passive asset into a tradable commodity.

At the same time, as Pavel Myasnikov notes, the work of such marketplaces can form new market niches around data turnover. "Such platforms will inevitably open up a market of intermediary agents who will collect and resell data through marketplaces," he explains.

According to Osip Burlov, content marketplaces will be able to partially compensate for the "overflow" of views from websites to AI interfaces - especially for expert and analytical media. "Industry and scientific, legal, financial, analytical publications can benefit if their content starts to be paid for based on its actual use in answers," says Burlov.

This article was AI-translated and verified by a human editor
