
Robots.txt Guide: Block AI Crawlers Without Hurting SEO (2025)



The #1 question we hear: "If I block AI bots in robots.txt, will it hurt my Google rankings?"

Short answer: No. AI crawlers like GPTBot and ClaudeBot are completely separate from search engine crawlers like Googlebot. You can block every AI crawler and your SEO will remain 100% unaffected.

This guide shows you exactly how to configure robots.txt to block AI training bots while keeping search engines happy — just like the New York Times, Reuters, and Wall Street Journal do.

Understanding robots.txt Basics

What Is robots.txt?

The robots.txt file is a text file that tells web crawlers which parts of your website they can access. It's placed in your website's root directory at:

https://yoursite.com/robots.txt

How robots.txt Works

  1. Crawler visits your site
  2. First checks https://yoursite.com/robots.txt
  3. Reads the rules for its specific user agent
  4. Follows the rules (if it's compliant)

Example robots.txt:

User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Disallow: /

Translation:

  • All crawlers: Don't access /admin/ or /private/
  • GPTBot specifically: Don't access anything (/ = everything)
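
You can confirm this translation yourself with Python's standard-library robots.txt parser. This is a quick local sketch (the rules are pasted in as a string, and "yoursite.com" and "SomeBot" are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A generic crawler may fetch the homepage but not /admin/:
print(parser.can_fetch("SomeBot", "https://yoursite.com/"))        # True
print(parser.can_fetch("SomeBot", "https://yoursite.com/admin/"))  # False

# GPTBot is disallowed everywhere:
print(parser.can_fetch("GPTBot", "https://yoursite.com/"))         # False
```

This is the same logic a compliant crawler applies: find the most specific matching User-agent group, then check the requested path against its rules.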

AI Crawlers vs Search Engine Crawlers

This is crucial to understand:

Search Engine Crawlers (Keep These)

Bot Name    | Company    | Purpose
Googlebot   | Google     | Index for Google Search
Bingbot     | Microsoft  | Index for Bing Search
Slurp       | Yahoo      | Index for Yahoo Search
DuckDuckBot | DuckDuckGo | Index for DDG Search
Baiduspider | Baidu      | Index for Baidu Search (China)
YandexBot   | Yandex     | Index for Yandex Search (Russia)

Why keep them: They drive organic search traffic to your site.

AI Training Crawlers (Block These)

Bot Name        | Company      | Purpose
GPTBot          | OpenAI       | Train ChatGPT
ClaudeBot       | Anthropic    | Train Claude
Google-Extended | Google       | Train Bard/Gemini (NOT search)
CCBot           | Common Crawl | Create AI training datasets
Bytespider      | ByteDance    | Train TikTok AI
anthropic-ai    | Anthropic    | Additional Anthropic crawler
cohere-ai       | Cohere       | Train Cohere models

Why block them: They provide zero traffic or SEO benefit.

The Key Difference

  • Search engines: Index your content → Show it in search results → Send you traffic
  • AI crawlers: Scrape your content → Train AI models → Users never visit your site

Blocking AI crawlers = No SEO impact whatsoever.


How Major Publishers Block AI (Examples)

Let's look at how professional publishers configure their robots.txt files.

Example 1: New York Times

Visit: https://www.nytimes.com/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

# But they still allow:
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Result: AI bots blocked, search engines allowed. NYT's SEO remains strong.

Example 2: Reuters

Visit: https://www.reuters.com/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

# Search engines still work:
User-agent: *
Disallow: /pf/
Disallow: /arc/

Result: Comprehensive AI blocking with perfect SEO.

Example 3: Wall Street Journal

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allows all search engines
User-agent: *
Allow: /

Pattern: All major publishers block AI crawlers while maintaining excellent search engine access.


Step-by-Step: Configure Your robots.txt

Step 1: Locate Your robots.txt

Your file should be at: https://yoursite.com/robots.txt

If it doesn't exist, create it in your website's root directory:

For most hosting:

/public_html/robots.txt
/var/www/html/robots.txt
/home/user/public_html/robots.txt

For Next.js (like this site):

/public/robots.txt

For WordPress:

/public_html/robots.txt
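
Whatever the hosting setup, the file must be served from the site's origin (scheme + host), never from a subdirectory. A small Python sketch makes the rule concrete ("yoursite.com" is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin serving page_url."""
    parts = urlsplit(page_url)
    # Crawlers ignore the page path and always request /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/some-post"))
# https://yoursite.com/robots.txt
```

A robots.txt placed at https://yoursite.com/blog/robots.txt would simply never be requested.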

Step 2: Check Current Configuration

Visit your current robots.txt file in a browser to see what's already there.

Step 3: Add AI Crawler Blocking

Option A: Block Major AI Bots (Recommended)

Add this to your robots.txt:

# Block OpenAI GPTBot
User-agent: GPTBot
Disallow: /

# Block ChatGPT user browsing
User-agent: ChatGPT-User
Disallow: /

# Block Anthropic ClaudeBot
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Block Google Bard/Gemini training (NOT search)
User-agent: Google-Extended
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block ByteDance/TikTok
User-agent: Bytespider
Disallow: /

# Block Perplexity AI
User-agent: PerplexityBot
Disallow: /

# Block other AI bots
User-agent: cohere-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /
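
Before deploying a block list like the one above, you can sanity-check it locally with Python's standard-library parser. A sketch using a shortened version of the rules (paste in your full file in practice; "yoursite.com" is a placeholder):

```python
from urllib.robotparser import RobotFileParser

blocklist = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(blocklist.splitlines())

# AI training bots are refused everywhere...
for bot in ("GPTBot", "ClaudeBot", "CCBot"):
    assert not parser.can_fetch(bot, "https://yoursite.com/any-page")

# ...while crawlers with no matching rule (like Googlebot) stay allowed.
assert parser.can_fetch("Googlebot", "https://yoursite.com/any-page")
print("block list behaves as expected")
```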

Option B: Block All 29 AI Crawlers

Use our robots.txt generator:

👉 Generate Complete robots.txt →

Step 4: Explicitly Allow Search Engines (Optional)

If you want to be extra clear:

# Explicitly allow Google
User-agent: Googlebot
Allow: /

# Explicitly allow Bing
User-agent: Bingbot
Allow: /

# Allow other search engines by default
User-agent: *
Disallow: /admin/
Disallow: /private/

Step 5: Save and Upload

Save the file and upload it to your website root.

Step 6: Verify It Works

Visit: https://yoursite.com/robots.txt

You should see your new configuration.


Important: What robots.txt CAN and CANNOT Do

✅ What robots.txt CAN Do:

  • Block compliant AI crawlers (GPTBot, ClaudeBot, Google-Extended)
  • Block compliant search engines (if you want)
  • Reduce bandwidth from respectful bots
  • Provide legal evidence of crawling restrictions

❌ What robots.txt CANNOT Do:

  • Block Bytespider (it ignores robots.txt)
  • Block 360Spider (often ignores rules)
  • Block malicious scrapers
  • Physically prevent access (it's just a suggestion)

For non-compliant bots, you need server-level blocking, such as user-agent rules in your web server configuration, a firewall, or a CDN/WAF rule.
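
The exact mechanism depends on your stack (nginx, Apache, a CDN rule). As an illustration only, here is a minimal Python WSGI middleware sketch that returns 403 when the User-Agent contains a blocked token; the bot names come from this guide, while the middleware itself and `hello_app` are hypothetical:

```python
# Bots known to ignore robots.txt, so they must be refused at the server.
BLOCKED_TOKENS = ("bytespider", "360spider")

def block_ai_bots(app):
    """WSGI middleware: answer 403 when the User-Agent matches a blocked bot."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

# Hypothetical inner application, just for demonstration:
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

app = block_ai_bots(hello_app)
```

Unlike robots.txt, this is enforcement: the request is rejected whether or not the bot chooses to be polite. An equivalent nginx or WAF rule matches on the same User-Agent substrings.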


Common Mistakes to Avoid

❌ Mistake #1: Blocking "User-agent: *"

Wrong:

User-agent: *
Disallow: /

This blocks everything, including Google and Bing. Your SEO will tank.

Correct:

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

❌ Mistake #2: Typos in User Agent Names

Wrong:

User-agent: GPT-Bot    # extra hyphen, never matches
Disallow: /

Correct:

User-agent: GPTBot
Disallow: /

A misspelled token matches no crawler, so the bot stays unblocked. Note that user agent matching is case-insensitive under RFC 9309 (path values, by contrast, are case-sensitive), but not every crawler follows the spec, so copy each bot's documented name exactly.

❌ Mistake #3: Blocking Google-Extended AND Googlebot

Wrong (if you want SEO):

User-agent: Googlebot
Disallow: /

User-agent: Google-Extended
Disallow: /

This blocks Google Search completely.

Correct:

# Block Google AI training
User-agent: Google-Extended
Disallow: /

# Keep Google Search
User-agent: Googlebot
Allow: /

❌ Mistake #4: Not Testing

Always verify your robots.txt works:

  1. Visit https://yoursite.com/robots.txt
  2. Check the robots.txt report in Google Search Console (Settings > robots.txt; the old standalone robots.txt Tester has been retired)
  3. Use CheckAIBots to verify AI bot blocking
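
These manual checks can also be automated as a pre-deploy script. A sketch using the standard library (paste in your real robots.txt content in place of the sample below; "yoursite.com" is a placeholder) that fails loudly if a search engine ever ends up blocked:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

MUST_ALLOW = ("Googlebot", "Bingbot", "DuckDuckBot")  # search engines
MUST_BLOCK = ("GPTBot",)                              # AI training bots

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in MUST_ALLOW:
    assert parser.can_fetch(bot, "https://yoursite.com/"), f"{bot} is blocked!"
for bot in MUST_BLOCK:
    assert not parser.can_fetch(bot, "https://yoursite.com/"), f"{bot} is allowed!"
print("robots.txt passes all checks")
```

Run it in CI so a bad edit to robots.txt can never reach production unnoticed.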

Advanced Configuration

Selective Blocking by Directory

Block AI from specific sections only:

# Block AI from blog only
User-agent: GPTBot
Disallow: /blog/

User-agent: ClaudeBot
Disallow: /blog/

# Allow everywhere else
User-agent: *
Allow: /
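
The directory rules above can be verified the same way as a full-site block. A quick sketch with Python's standard-library parser ("yoursite.com" and the paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /blog/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is kept out of /blog/ but nothing else:
assert not parser.can_fetch("GPTBot", "https://yoursite.com/blog/my-post")
assert parser.can_fetch("GPTBot", "https://yoursite.com/products/")

# Search engines see everything, including the blog:
assert parser.can_fetch("Googlebot", "https://yoursite.com/blog/my-post")
print("directory rules OK")
```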

Allow Some AI Bots, Block Others

# Block training bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# But allow AI search bots (they bring traffic)
User-agent: PerplexityBot
Allow: /

User-agent: YouBot
Allow: /

Time-Based Testing

Want to test the impact? Block temporarily:

  1. Add AI bot blocking to robots.txt
  2. Wait 2-4 weeks
  3. Check bandwidth savings
  4. Keep or remove blocks based on results

How to Verify You Didn't Break SEO

Check #1: Google Search Console

  1. Go to Google Search Console
  2. Open Settings > robots.txt to confirm Google fetched your file without errors (this report replaced the retired standalone robots.txt Tester)
  3. Run the URL Inspection tool on an important page
  4. Check the "Crawl allowed?" field under Page indexing

Expected result: "Yes"

If it says crawling is blocked by robots.txt, you have an error in your robots.txt.

Check #2: Monitor Search Traffic

Use Google Analytics:

  1. Go to Acquisition > All Traffic > Channels
  2. Check "Organic Search" traffic
  3. Compare before/after adding AI bot blocking

Expected result: No change or slight increase (due to better performance)

Check #3: Use CheckAIBots

Our tool tests both:

  • Whether AI bots are blocked ✅
  • Whether search engines can still access your site ✅

👉 Verify Your Configuration →


Selective Blocking Strategy

Not all AI bots are equal. Here's our recommended approach:

Always Block (No Value)

GPTBot: Trains ChatGPT, no attribution
ClaudeBot: Trains Claude, no attribution
Bytespider: Ignores robots.txt anyway (need server blocking)
CCBot: Creates training datasets, no benefit
360Spider: Often ignores rules
anthropic-ai: Additional Anthropic training
FacebookBot: Trains Meta AI, no benefit

Consider Allowing (Potential Traffic)

⚠️ PerplexityBot: Powers Perplexity AI search with attribution
⚠️ OAI-SearchBot: ChatGPT search with source links
⚠️ YouBot: You.com AI search platform

Allow (Good for SEO)

Googlebot: Critical for Google Search
Bingbot: Important for Bing Search
DuckDuckBot: DuckDuckGo search traffic
All other search engines


Real-World Impact: Does This Actually Work?

Case Study: Tech Blog

Before (allowing all bots):

  • Monthly visitors: 50,000
  • Bandwidth: 320GB
  • AI bot bandwidth: 240GB (75%)
  • Google search traffic: 35,000/month

After (blocking AI training bots):

  • Monthly visitors: 50,000 (unchanged)
  • Bandwidth: 100GB (69% reduction)
  • AI bot bandwidth: 20GB (compliant bots only)
  • Google search traffic: 36,000/month (slightly up!)

SEO impact: None negative, slight improvement due to better site performance.

Case Study: E-Commerce Site

Before:

  • Organic search ranking: Average position 8.5
  • Monthly search traffic: 100,000

After blocking all AI training bots:

  • Organic search ranking: Average position 8.3 (improved)
  • Monthly search traffic: 102,000 (improved)

Why improvement? Better site performance → better user experience → better rankings.


Complete robots.txt Template

Here's our recommended configuration:

# robots.txt for blocking AI crawlers while maintaining SEO
# Generated by CheckAIBots.com

# Allow all search engines by default
User-agent: *
Allow: /

# Block OpenAI GPTBot (ChatGPT training)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# OAI-SearchBot powers ChatGPT search with source links;
# remove these two lines if you want that referral traffic
User-agent: OAI-SearchBot
Disallow: /

# Block Anthropic Claude training
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Block Google AI training (NOT Google Search)
User-agent: Google-Extended
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block ByteDance/TikTok (NOTE: Often ignores this)
User-agent: Bytespider
Disallow: /

# Block Meta/Facebook AI
User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# Block Apple Intelligence training
User-agent: Applebot-Extended
Disallow: /

# Block other AI training bots
User-agent: cohere-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

# AI search bots (remove if you want Perplexity/You.com referral traffic)
User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

# Chinese AI crawlers
User-agent: 360Spider
Disallow: /

User-agent: ChatGLM-Spider
Disallow: /

User-agent: PetalBot
Disallow: /

# Explicitly allow search engines (redundant but clear)
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

# Sitemap (optional but recommended)
Sitemap: https://yoursite.com/sitemap.xml

To use: Copy this, replace yoursite.com with your domain, save as robots.txt in your website root.


Frequently Asked Questions

Q: Will this hurt my Google rankings?

A: No. Blocking AI training bots (GPTBot, ClaudeBot) has zero impact on search engine crawlers (Googlebot, Bingbot). These are completely separate systems.

Q: Do I need robots.txt AND server-level blocking?

A: For compliant bots (GPTBot, ClaudeBot), robots.txt works fine. For non-compliant bots (Bytespider), you need server-level blocking. We recommend both for defense in depth.

Q: Can I block AI bots but allow AI search bots?

A: Yes! Block GPTBot and ClaudeBot (training), but allow PerplexityBot and OAI-SearchBot (search with attribution).

Q: How often should I update robots.txt?

A: Review monthly. New AI bots emerge regularly. Subscribe to CheckAIBots updates to stay informed.

Q: What if I accidentally block Googlebot?

A: Your search rankings will drop. Use Google Search Console's robots.txt report and URL Inspection tool to verify Googlebot can access your site before deploying changes.


Conclusion

Blocking AI crawlers in robots.txt is safe, effective, and has zero negative SEO impact when done correctly.

Key takeaways:

  1. ✅ AI training bots ≠ Search engine bots
  2. ✅ Block GPTBot, ClaudeBot, CCBot without fear
  3. ✅ Always keep Googlebot and Bingbot allowed
  4. ✅ Test your configuration before deploying
  5. ✅ Use server-level blocking for non-compliant bots

Ready to protect your content?

👉 Generate Your Custom robots.txt →


Last updated: January 30, 2025


Ready to Check Your Website?

Use CheckAIBots to instantly discover which AI crawlers can access your website and get actionable blocking recommendations.

Free AI Crawler Check