Google to explore alternatives to robots.txt in wake of generative AI and other emerging technologies

Google is considering new ways of controlling crawling and indexing on the web beyond the existing robots.txt protocol, which has served as the de facto standard for nearly 30 years. The company believes it is time for the web and AI communities to explore additional machine-readable means of giving web publishers more choice and control over emerging AI and research use cases.

To engage with the community, Google is inviting members from the web and AI communities to participate in a public discussion on this topic. The company wants to involve a diverse range of voices, including web publishers, civil society, academia, and various other fields from around the world.

These discussions are planned to take place over the coming months, indicating that any changes to the existing protocol will not be implemented immediately. Google wants to ensure that ample time is given for discussion and collaboration before any decisions are made.

One reason for Google’s exploration of alternative protocols is the recent issue surrounding paywalled content. OpenAI disabled the Browse with Bing feature in ChatGPT after it was discovered that the feature could surface paywalled content without permission from publishers. The incident highlights the need for more effective methods of controlling access to certain types of content.

The current approach of using robots.txt and other forms of structured data to control bot access to websites has become widely adopted. However, as technology and AI continue to advance, it is important to explore new methods and protocols that can provide better control and protect the interests of web publishers.
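To illustrate the mechanism being reconsidered, a crawler can evaluate a site's robots.txt rules with Python's standard `urllib.robotparser` module. The rules and bot names below are hypothetical examples, not actual directives from any publisher:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve today:
# all bots are barred from /private/, and one AI crawler
# (a made-up name for illustration) is blocked entirely.
rules = """\
User-agent: *
Disallow: /private/

User-agent: ExampleAIBot
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A generic crawler may fetch public pages but not /private/
print(parser.can_fetch("GenericBot", "https://example.com/articles/1"))   # True
print(parser.can_fetch("GenericBot", "https://example.com/private/x"))    # False

# The hypothetical AI crawler is disallowed everywhere
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1")) # False
```

Note that robots.txt is purely advisory: a well-behaved crawler checks these rules before fetching, but nothing technically enforces them, which is part of why publishers are asking for stronger controls.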

The specifics of these alternative methods and protocols are still unknown. The purpose of the public discussion initiated by Google is to gather input and ideas from experts, researchers, and industry professionals. By involving a wide range of perspectives, Google aims to develop more comprehensive and effective solutions for controlling crawling and indexing on the web.

This development has implications for both web publishers and users. Web publishers will have more options and control over who can access their content, particularly in cases where paywall restrictions need to be enforced. Users, on the other hand, may experience changes in the way they access and discover content as new protocols are introduced.

In summary, Google is proactively seeking alternative means of controlling crawling and indexing on the web. By engaging with the web and AI communities, the company aims to explore new protocols that give web publishers more choice and control. The recent incident involving paywalled content underscores why improved methods are needed, and the public discussion will play a central role in shaping the future of web crawling and indexing protocols.
