Frequently Asked Questions

Everything you need to know about blocking AI crawlers and protecting your website content

Getting Started

Q: What is CheckAIBots?

CheckAIBots is a free online tool that helps website owners detect which AI crawlers can access their content. We analyze your robots.txt file and perform actual access tests to check 29+ AI bots including GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended, CCBot, Bytespider, and more.

Q: How do I use CheckAIBots?

Simply enter your website URL in the search box on our homepage. We'll automatically fetch your robots.txt file, analyze it, and show you which AI crawlers are allowed or blocked. You'll get a detailed report with recommendations and ready-to-use blocking configurations.

Q: Is CheckAIBots really free?

Yes! CheckAIBots is 100% free with no hidden costs, no registration required, and no credit card needed. We believe every website owner should have the tools to control AI access to their content.

Q: Do I need to create an account?

No. CheckAIBots works without any signup or account creation. Just enter your URL and get instant results.

AI Crawlers & Detection

Q: Which AI crawlers does CheckAIBots detect?

We detect 29+ AI crawlers, including: GPTBot (OpenAI/ChatGPT), ClaudeBot (Anthropic), Google-Extended (Google Gemini), CCBot (Common Crawl), Bytespider (ByteDance/TikTok), PerplexityBot, OAI-SearchBot, Baiduspider, ChatGLM-Spider, Applebot-Extended, anthropic-ai, Diffbot, FacebookBot, ImagesiftBot, cohere-ai, Meta-ExternalAgent, Omgilibot, and YouBot.

Q: What's the difference between LLM training bots and AI search bots?

LLM training bots (like GPTBot, ClaudeBot, Google-Extended) scrape your content to train AI models. AI search bots (like PerplexityBot, OAI-SearchBot) crawl your site to include it in AI-powered search results. Training bots use your content without attribution, while search bots may drive traffic to your site.

Q: How accurate is the detection?

We use two methods: (1) robots.txt analysis to check your configuration, and (2) actual HTTP requests with real crawler user agents to verify if bots are truly blocked. This dual approach catches configuration errors and identifies crawlers that ignore robots.txt.
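The second method boils down to sending a request that identifies itself as a crawler and interpreting the response. A minimal sketch of that idea (the URL, User-Agent string, and status-code heuristics here are illustrative, not CheckAIBots' actual implementation):

```python
import urllib.request
import urllib.error

def probe_with_user_agent(url: str, user_agent: str) -> int:
    """Request a URL while presenting a crawler's User-Agent; return the HTTP status."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # Blocked requests commonly surface as HTTPError (e.g. 403 Forbidden)
        return e.code

def classify_status(status: int) -> str:
    """Rough interpretation: 401/403 suggest server-level blocking is in effect."""
    if status in (401, 403):
        return "blocked"
    if 200 <= status < 300:
        return "allowed"
    return "inconclusive"
```

A request sent as `probe_with_user_agent("https://example.com/", "GPTBot/1.1")` that comes back 403 is strong evidence the block works at the server level, not just on paper in robots.txt.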

Q: What are "dangerous" crawlers?

Dangerous crawlers are aggressive bots known to ignore robots.txt rules, consume excessive bandwidth, or harvest content without permission. Examples include Bytespider (ByteDance), which is notorious for aggressive crawling despite robots.txt blocks. These require server-level blocking.

Blocking AI Crawlers

Q: How do I block AI bots from my website?

There are two main methods: (1) robots.txt file - Add "Disallow" rules for specific user agents. Simple but not foolproof. (2) Server-level blocking - Block crawlers at nginx/Apache/Cloudflare level using User-Agent rules. More reliable. CheckAIBots generates both types of configurations for you.
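As a sketch of the first method, a robots.txt that asks the major LLM training bots to stay away might look like this (the bot list is a small sample, not exhaustive):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Each `User-agent` line names a crawler by its token, and `Disallow: /` asks it not to crawl any path on the site.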

Q: Will blocking AI bots affect my SEO?

No. AI training bots (GPTBot, ClaudeBot) are completely separate from traditional search engine crawlers (Googlebot, Bingbot). Blocking AI bots will NOT affect your Google rankings, search visibility, or SEO. They serve different purposes and use different user agents.

Q: Should I block all AI crawlers or just some?

It depends on your goals. Block LLM training bots if you want to protect content from unauthorized AI training. Consider allowing AI search bots (PerplexityBot, OAI-SearchBot) if you want visibility in AI search results. Use our selective blocking feature to customize your approach based on your strategy.

Q: What's the difference between robots.txt and server-level blocking?

robots.txt is a voluntary protocol - bots can choose to ignore it. Server-level blocking (nginx/Apache/Cloudflare) actively prevents bots from accessing your site by blocking their requests at the infrastructure level. It's more reliable but requires server configuration access.
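For nginx, a minimal server-level rule rejects any request whose User-Agent matches a bot pattern. This is a sketch only; the bot list is illustrative and should be placed inside your `server {}` block:

```nginx
# Return 403 Forbidden to requests whose User-Agent matches known AI crawlers
if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot|Bytespider)") {
    return 403;
}
```

Because the match happens before any content is served, even crawlers that ignore robots.txt get an empty 403 response.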

Q: Can AI bots bypass robots.txt blocking?

Yes. robots.txt is not legally binding and some crawlers (like Bytespider) are known to ignore it. For critical protection, use server-level blocking via nginx, Apache, Cloudflare WAF, or firewall rules. CheckAIBots provides configuration generators for all these methods.

Technical Questions

Q: What is robots.txt and how does it work?

robots.txt is a text file placed at your website's root (e.g., example.com/robots.txt) that tells crawlers which pages they should or shouldn't access. It uses "User-agent" to specify bots and "Disallow" to specify blocked paths. While most legitimate bots respect it, compliance is voluntary.
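You can see how a compliant crawler interprets these rules using Python's standard-library parser. The rules below are fed in directly rather than fetched, so the example is self-contained:

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: block GPTBot everywhere, allow everyone else
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A compliant crawler calls can_fetch() before requesting a page
print(rp.can_fetch("GPTBot", "https://example.com/article"))    # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # True
```

Note that nothing forces a crawler to perform this check; that is exactly why compliance is described as voluntary.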

Q: How do I implement the generated blocking rules?

For robots.txt: Copy the generated rules and add them to your robots.txt file at your domain root. For nginx/Apache: Add the configuration to your server config file and reload. For Cloudflare: Create WAF rules in your dashboard. We provide step-by-step instructions for each method.
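For Apache, the equivalent of the generated rules typically uses mod_rewrite, either in a `.htaccess` file or the virtual host config. A hedged sketch with an illustrative bot list:

```apache
# Deny requests from matching AI crawler User-Agents (requires mod_rewrite)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot|Bytespider) [NC]
RewriteRule .* - [F,L]
```

The `[NC]` flag makes the match case-insensitive and `[F]` returns 403 Forbidden. After editing a vhost config (rather than `.htaccess`), reload Apache for the change to take effect.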

Q: Will blocking AI crawlers reduce my bandwidth costs?

Potentially yes. Some websites report 60-75% reduction in bandwidth usage after blocking AI crawlers, especially large content sites. Use our bandwidth calculator to estimate your potential savings based on your traffic patterns.

Q: How often should I check my AI crawler settings?

Check monthly. New AI crawlers appear regularly as AI companies launch new services. We update our database frequently to include emerging bots. Subscribe to updates or check back periodically to ensure your blocking list is current.

Q: Can I block specific AI companies but allow others?

Yes! Use our selective blocking feature to choose exactly which crawlers to block. For example, you might block GPTBot and ClaudeBot (training) but allow PerplexityBot (search). We generate custom robots.txt rules based on your selection.
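The selective approach described above maps to a robots.txt like this (a sketch of that exact example, blocking training bots while allowing one search bot):

```
# Block LLM training bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Allow AI search bots
User-agent: PerplexityBot
Allow: /
```

Bots not named here fall back to your `User-agent: *` rules, if any.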

Privacy & Data

Q: Do you store the URLs I check?

No. We do not store URLs, create user profiles, or track which websites you check. All analysis is done in real-time and results are temporarily stored in your browser's session storage only. See our Privacy Policy for details.

Q: What data do you collect?

We collect minimal analytics (browser type, pages visited) to improve the service. We do NOT collect: personal information, email addresses, or the URLs you check. We don't use cookies for tracking. See our Privacy Policy for full details.

Q: Is my robots.txt file private?

robots.txt files are publicly accessible by design - anyone can view example.com/robots.txt. CheckAIBots simply fetches this already-public file to analyze it. We don't access any private or non-public parts of your website.

Troubleshooting

Q: Why can't CheckAIBots find my robots.txt file?

Common reasons: (1) Your site doesn't have a robots.txt file (create one at example.com/robots.txt), (2) The file isn't at the root domain, (3) Server is blocking our requests, (4) DNS or hosting issues. Check that example.com/robots.txt loads in your browser.

Q: The tool says a bot is "allowed" but I want it blocked. What do I do?

Use our robots.txt generator to create blocking rules, then add them to your robots.txt file. For aggressive crawlers, use our server config generator to implement nginx/Apache/Cloudflare blocking. We provide copy-paste ready configurations.

Q: I blocked bots in robots.txt but they're still crawling. Why?

Some bots ignore robots.txt (particularly Bytespider and some Chinese crawlers). Use server-level blocking instead. Check our "Actual Access Testing" feature to verify if bots are truly blocked, and use the generated nginx/Apache configs for enforcement.

Q: Can I test if my blocking actually works?

Yes! Use our "Verify Crawler Access" feature which sends actual HTTP requests with real crawler user agents to test if they're blocked. This helps you verify that your robots.txt or server configs are working correctly.

Still Have Questions?

Try CheckAIBots now to see which AI crawlers can access your website.

Check Your Website Now