The 28-year-old founders of TollBit, a New York-based startup that is all of six months old, think we’re living in the “Napster days” of AI. Just as people of a certain generation pirated digital music, companies are ripping off vast swaths of the internet without paying the rights holders. The founders want TollBit to be the iTunes of the AI world.
“It’s kind of the Wild West right now,” Olivia Joslin, the company’s co-founder and chief operating officer, told Engadget in an interview. “We want to make it easier for AI companies to pay for the data they need.” The founders’ idea is simple: create a marketplace that connects AI companies that need fresh, high-quality data with the publishers who actually spend money creating it.
AI companies have, indeed, only recently started paying for (some of) the data they need from news publishers. OpenAI kicked off an arms race at the end of 2022, but it was only a year ago that the company signed the first of its many licensing deals, with the Associated Press. Later that year, OpenAI announced a partnership with German publisher Axel Springer, which operates Business Insider and Politico in the US. Multiple publishers, including Vox, the Financial Times, News Corp and TIME, have since signed deals with OpenAI and Google.
But that still leaves countless other publishers and creators out in the cold, without the option to strike this Faustian bargain even if they want to. This is the “long tail” of publishers that TollBit wants to target.
“Powerful AI models already exist and they have already been trained,” Toshit Panigrahi, TollBit’s co-founder and CEO, told Engadget. “And right now, there are thousands of applications just taking these existing models off the shelves. What they need is fresh content. But right now, there’s no infrastructure, neither for them to buy it, nor for content-makers to sell it in a way that is seamless.”
Neither Joslin nor Panigrahi was particularly knowledgeable about the media industry. But both knew how online marketplaces and platforms operated: they were colleagues at Toast, a platform that lets restaurants manage billing and reservations. Panigrahi watched both the deals and the lawsuits pile up in the AI sector, then called on Joslin.
Their early conversations were about RAG, which stands for retrieval-augmented generation. With RAG, AI models first look up information from specific sources (like the scrapable portions of the internet) and use that information to synthesize a response instead of simply relying on training data. On their own, services like ChatGPT don’t know current home prices or the latest news. Instead, they fetch that data, typically by looking at websites. Without that fresh data, AI chatbots are often stumped by queries about breaking news events: if they don’t scrape the latest data, they simply can’t keep up.
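To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It is an illustration only: the `Document` type, the keyword-overlap ranking, the example URLs and the prompt wording are all assumptions made for this example, not how ChatGPT, Perplexity or TollBit actually work. A real system would use a proper search index and would send the resulting prompt to a language model.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A fetched page; the fields are hypothetical, chosen for illustration."""
    url: str
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercase words with trailing punctuation stripped.
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, corpus: list[Document], k: int = 1) -> list[Document]:
    # Rank documents by how many query words they share (a toy stand-in
    # for a real search index), and keep the top k.
    query_terms = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_terms & tokenize(doc.text)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, sources: list[Document]) -> str:
    # The retrieved text is placed in front of the question so the model
    # answers from fresh content instead of its training data alone.
    context = "\n".join(f"[{doc.url}] {doc.text}" for doc in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


corpus = [
    Document("https://example.com/housing", "Median home prices rose again this month."),
    Document("https://example.com/sports", "The home team won the final last night."),
]
query = "What are home prices this month?"
sources = retrieve(query, corpus)
prompt = build_prompt(query, sources)
print(prompt)
```

The point of the sketch is the order of operations: the publisher's page is fetched at question time, which is exactly the moment a marketplace could meter and charge for.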
“We thought that using content for RAG was something fundamentally different than using it for training,” said Panigrahi.
By some estimations, RAG is the future of search engines. More and more, people are asking questions on the internet and expecting complete answers in return instead of a list of blue links. In just over a year, startups like Perplexity, backed by Jeff Bezos and NVIDIA among others, have burst onto the scene with ambitions of taking on Google. Even OpenAI has plans to someday let ChatGPT become your search engine. In response, Google has sprung into action: it now culls relevant information from search results and presents it as a coherent answer at the top of the results page, a feature it calls AI Overviews. (It doesn’t always work well, but it is seemingly here to stay.)
The rise of RAG-based search engines has publishers shaking in their boots. After all, who would make money if AI reads the internet for us? After Google rolled out AI Overviews earlier this year, at least one report estimated that publishers would lose more than $2 billion in ad revenue because fewer people would have a reason to visit their websites. “AI companies need continuous access to high quality content and data too,” said Joslin, “but if you don’t figure out some economic model here, there will be no incentive for anyone to create content, and that’ll be the end of AI applications too.”
Instead of cutting one-off checks, TollBit’s model aims to compensate publishers on an ongoing basis. Hypothetically, if someone’s content was used in a thousand AI-generated answers, they would get paid a thousand times at a price that they set and can change on the fly.
Each time an AI company accesses fresh data from a publisher through TollBit, it pays a small fee set by the publisher. Panigrahi and Joslin think that fee should be roughly equivalent to what a traditional page view would have earned the publisher. The platform can also block AI companies that haven’t signed up from accessing publishers’ data.
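The arithmetic of that model is simple enough to sketch in Python. Everything below is hypothetical: the `Marketplace` class, its method names, the one-cent fee and the company names are invented for illustration and do not describe TollBit's actual system or pricing.

```python
from decimal import Decimal


class Marketplace:
    """Hypothetical pay-per-access ledger; not TollBit's real implementation."""

    def __init__(self) -> None:
        self.prices: dict[str, Decimal] = {}   # publisher -> fee per access
        self.registered: set[str] = set()      # AI companies that signed up
        self.owed: dict[str, Decimal] = {}     # publisher -> running total

    def set_price(self, publisher: str, fee: str) -> None:
        # Publishers set their own fee and may change it at any time.
        self.prices[publisher] = Decimal(fee)

    def register(self, ai_company: str) -> None:
        self.registered.add(ai_company)

    def access(self, ai_company: str, publisher: str) -> bool:
        # Companies that have not signed up are blocked and pay nothing.
        if ai_company not in self.registered:
            return False
        fee = self.prices[publisher]
        self.owed[publisher] = self.owed.get(publisher, Decimal("0")) + fee
        return True


market = Marketplace()
market.set_price("example-news", "0.01")       # assumed fee: one cent per access
market.register("example-ai")

for _ in range(1000):                          # content used in 1,000 answers
    market.access("example-ai", "example-news")

market.set_price("example-news", "0.02")       # price changed on the fly
market.access("example-ai", "example-news")

blocked = market.access("unregistered-bot", "example-news")
print(market.owed["example-news"])             # 10.02
print(blocked)                                 # False
```

A thousand accesses at one cent each come to $10, and the one access after the price change adds two cents more. `Decimal` is used instead of floats so that small per-access fees add up exactly.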
Since TollBit launched in February, the founders claim to have onboarded a hundred publishers and to have started pilots with three AI companies. They refused to reveal which publishers or AI companies had signed on, citing confidentiality clauses, but did not deny speaking with OpenAI, Anthropic, Google and Meta. They say that no money has yet changed hands between AI companies and publishers on their platform.
Until that happens, their model is still a giant hypothetical, although one that investors have so far poured $7 million into. TollBit’s investors include Sunflower Capital, Lerer Hippeau, Operator Collective, AIX and Liquid 2 Ventures, and more investors are currently “pounding down their door,” Joslin claimed. In April, TollBit also brought on Campbell Brown, a former television anchor who previously acted as Meta’s head of news partnerships for the better part of a decade, as a senior adviser.
In spite of some high-profile lawsuits, AI companies are still scraping the internet for free and largely getting away with it. Why would they have any incentive to actually pay publishers for this data? There are three big reasons, the founders say. First, more websites have taken steps to prevent their content from being scraped since generative AI went mainstream, which means that scraping the web is getting harder and more expensive. Second, no one wants to deal with ongoing copyright lawsuits. And third, crucially, being able to easily pay for content on an as-needed basis lets AI companies tap into smaller and more niche publications, because it isn’t possible to strike individual licensing deals with every single website. Joslin also pointed out that multiple TollBit investors have invested in AI companies that they worry might face litigation for using content without permission.
Getting AI companies to pay for content could provide a recurring revenue stream not just for large publishers but potentially for anyone who publishes anything online. Last month, Perplexity, which was accused of illegally scraping content from Forbes, Wired and Condé Nast, launched a Publishers’ Program under which it plans to share a cut of any revenue it earns with publishers if it uses their content to generate answers with AI. The success of the program, however, hinges on how much money Perplexity makes when it introduces ads in the app later this year. Like TollBit’s model, it remains a complete hypothetical.
“Our thesis with TollBit is that if you lose a page view today, you should be compensated for it immediately rather than a few years after when a tech company figures out its ads program,” said Panigrahi about Perplexity’s initiative.
Despite all the existing licensing deals and technical advances, AI-powered chatbots still make for terrible news sources. They still make up facts and confidently conjure up entire links to stories that don’t actually exist. But technology companies are now stuffing AI chatbots in every crevice they can, which means that many people will still get their news from one of these products in the not-so-distant future.
A more cynical take on TollBit’s premise is that the startup is effectively offering hush money to publishers whose work is more likely than not to be sausaged into misinformation. Its founders, naturally, don’t agree with the characterization. “We are careful about the AI partners we onboard,” Panigrahi said. “These companies are very mindful about the quality of input material and correctness of responses. We’re seeing that paying for content – even nominal amounts – creates incentive to respect the raw inputs into their systems instead of treating it as a free, replaceable commodity.”
This article originally appeared on Engadget at https://www.engadget.com/ai/this-startup-wants-to-be-the-itunes-of-ai-content-licensing-162942714.html?src=rss