Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.

You can block GPTBot from accessing your site using robots.txt. But as far as I’m concerned, the jury’s still out as to whether that’s actually a net benefit compared to just letting them index your content.
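For reference, here's a minimal sketch of what the block looks like and how you might sanity-check it locally. The two-line `User-agent: GPTBot` / `Disallow: /` rule is the standard robots.txt form for turning away a single crawler; the `is_allowed` helper, the example.com URL, and the `SomeOtherBot` name are my own placeholders for illustration, and the check only tells you what the rules say, not whether any crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# Rules that disallow GPTBot everywhere while leaving other crawlers alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: does this robots.txt permit user_agent to fetch url?"""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder; substitute a URL from your own site.
gptbot_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/posts/1")
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/posts/1")

print("GPTBot allowed:", gptbot_ok)
print("SomeOtherBot allowed:", other_ok)
```

Swapping `Disallow: /` for a narrower path would let you keep GPTBot out of only part of a site, if an all-or-nothing block feels like too much.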

(Via @olivierlacan@ruby.social.)

➝ Source: platform.openai.com