Global news platforms block GPTBot web crawler from accessing content

Web Desk, Fri, 25 Aug 2023 15:13:57 +0500

Global news outlets including The New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked OpenAI's web crawler, GPTBot, to limit artificial intelligence systems' access to their original content, The Guardian reported.

The New York Times was the first to block GPTBot on its website. Other major news websites, including CNN, Reuters, the Chicago Tribune, ABC and Australian Community Media (ACM) brands such as the Canberra Times and the Newcastle Herald, subsequently disallowed the web crawler as well.

OpenAI said in a blog post: “Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety.” It has also published instructions on how to disallow the crawler.
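For readers unfamiliar with how such a block works: sites typically disallow a crawler by naming its user agent in a robots.txt file. The sketch below is illustrative only. The robots.txt text and the example.com URL are hypothetical and not taken from any publisher named in this story; the check uses Python's standard urllib.robotparser module to show how a compliant crawler would interpret such rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written for illustration only.
# It disallows the GPTBot user agent site-wide and allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "GPTBot", url))        # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))  # True
```

Note that robots.txt is a voluntary convention: it only stops crawlers that choose to honour it.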

CNN confirmed to Guardian Australia that it has blocked GPTBot across its titles, but did not comment on whether it plans to take further action over the use of its content in AI systems.

A Reuters spokesperson said it regularly reviews its robots.txt and site terms and conditions. “Because intellectual property is the lifeblood of our business, it is imperative that we protect the copyright of our content,” she said.

The New York Times’ terms of service were recently updated to make the prohibition against “the scraping of our content for AI training and development … even more clear,” according to a spokesperson.

As of August 3, its website rules explicitly prohibit the publisher’s content from being used for “the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system” without consent.

International news platforms have been faced with decisions about whether to use AI in news gathering, and about how to deal with their content potentially being pulled into training datasets by companies developing AI systems.

Research shared this week by OriginalityAI, a company that checks for the presence of AI content, found that major websites including Amazon and Shutterstock had also blocked GPTBot.

The Guardian’s robots.txt file does not disallow GPTBot.

Aaj TV English News - Technology
