
Robots.txt Guide: Block AI Crawlers Without Hurting SEO (2025)



The #1 question we hear: "If I block AI bots in robots.txt, will it hurt my Google rankings?"

Short answer: No. AI crawlers like GPTBot and ClaudeBot are completely separate from search engine crawlers like Googlebot. You can block every AI crawler and your SEO will remain 100% unaffected.

This guide shows you exactly how to configure robots.txt to block AI training bots while keeping search engines happy — just like the New York Times, Reuters, and Wall Street Journal do.

Understanding robots.txt Basics

What Is robots.txt?

The robots.txt file is a text file that tells web crawlers which parts of your website they can access. It's placed in your website's root directory at:

https://yoursite.com/robots.txt

How robots.txt Works

  1. Crawler visits your site
  2. First checks https://yoursite.com/robots.txt
  3. Reads the rules for its specific user agent
  4. Follows the rules (if it's compliant)

Example robots.txt:

User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Disallow: /

Translation:

  • All crawlers: Don't access /admin/ or /private/
  • GPTBot specifically: Don't access anything (/ = everything)
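
You can confirm this translation yourself with Python's standard-library robots.txt parser. This is a quick local sketch (the rules are pasted in as a string, and "yoursite.com" and "SomeBot" are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A generic crawler may fetch the homepage but not /admin/:
print(parser.can_fetch("SomeBot", "https://yoursite.com/"))        # True
print(parser.can_fetch("SomeBot", "https://yoursite.com/admin/"))  # False

# GPTBot is disallowed everywhere:
print(parser.can_fetch("GPTBot", "https://yoursite.com/"))         # False
```

This is the same logic a compliant crawler applies: find the most specific matching User-agent group, then check the requested path against its rules.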

AI Crawlers vs Search Engine Crawlers

This is crucial to understand:

Search Engine Crawlers (Keep These)

Bot Name    | Company    | Purpose
Googlebot   | Google     | Index for Google Search
Bingbot     | Microsoft  | Index for Bing Search
Slurp       | Yahoo      | Index for Yahoo Search
DuckDuckBot | DuckDuckGo | Index for DDG Search
Baiduspider | Baidu      | Index for Baidu Search (China)
YandexBot   | Yandex     | Index for Yandex Search (Russia)

Why keep them: They drive organic search traffic to your site.

AI Training Crawlers (Block These)

Bot Name        | Company      | Purpose
GPTBot          | OpenAI       | Train ChatGPT
ClaudeBot       | Anthropic    | Train Claude
Google-Extended | Google       | Train Bard/Gemini (NOT search)
CCBot           | Common Crawl | Create AI training datasets
Bytespider      | ByteDance    | Train TikTok AI
anthropic-ai    | Anthropic    | Additional Anthropic crawler
cohere-ai       | Cohere       | Train Cohere models

Why block them: They provide zero traffic or SEO benefit.

The Key Difference

  • Search engines: Index your content → Show it in search results → Send you traffic
  • AI crawlers: Scrape your content → Train AI models → Users never visit your site

Blocking AI crawlers = No SEO impact whatsoever.


How Major Publishers Block AI (Examples)

Let's look at how professional publishers configure their robots.txt files.

Example 1: New York Times

Visit: https://www.nytimes.com/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

# But they still allow:
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

Result: AI bots blocked, search engines allowed. NYT's SEO remains strong.

Example 2: Reuters

Visit: https://www.reuters.com/robots.txt

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

# Search engines still work:
User-agent: *
Disallow: /pf/
Disallow: /arc/

Result: Comprehensive AI blocking with perfect SEO.

Example 3: Wall Street Journal

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Allows all search engines
User-agent: *
Allow: /

Pattern: All major publishers block AI crawlers while maintaining excellent search engine access.


Step-by-Step: Configure Your robots.txt

Step 1: Locate Your robots.txt

Your file should be at: https://yoursite.com/robots.txt

If it doesn't exist, create it in your website's root directory:

For most hosting:

/public_html/robots.txt
/var/www/html/robots.txt
/home/user/public_html/robots.txt

For Next.js (like this site):

/public/robots.txt

For WordPress:

/public_html/robots.txt
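
Whatever the hosting setup, the file must be served from the site's origin (scheme + host), never from a subdirectory. A small Python sketch makes the rule concrete ("yoursite.com" is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin serving page_url."""
    parts = urlsplit(page_url)
    # Crawlers ignore the page path and always request /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/some-post"))
# https://yoursite.com/robots.txt
```

A robots.txt placed at https://yoursite.com/blog/robots.txt would simply never be requested.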

Step 2: Check Current Configuration

Visit your current robots.txt file in a browser to see what's already there.

Step 3: Add AI Crawler Blocking

Option A: Block Major AI Bots (Recommended)

Add this to your robots.txt:

# Block OpenAI GPTBot
User-agent: GPTBot
Disallow: /

# Block ChatGPT user browsing
User-agent: ChatGPT-User
Disallow: /

# Block Anthropic ClaudeBot
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Block Google Bard/Gemini training (NOT search)
User-agent: Google-Extended
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block ByteDance/TikTok
User-agent: Bytespider
Disallow: /

# Block Perplexity AI
User-agent: PerplexityBot
Disallow: /

# Block other AI bots
User-agent: cohere-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /
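
Before deploying a block list like the one above, you can sanity-check it locally with Python's standard-library parser. A sketch using a shortened version of the rules (paste in your full file in practice; "yoursite.com" is a placeholder):

```python
from urllib.robotparser import RobotFileParser

blocklist = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(blocklist.splitlines())

# AI training bots are refused everywhere...
for bot in ("GPTBot", "ClaudeBot", "CCBot"):
    assert not parser.can_fetch(bot, "https://yoursite.com/any-page")

# ...while crawlers with no matching rule (like Googlebot) stay allowed.
assert parser.can_fetch("Googlebot", "https://yoursite.com/any-page")
print("block list behaves as expected")
```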

Option B: Block All 29 AI Crawlers

Use our robots.txt generator:

👉 Generate Complete robots.txt →

Step 4: Explicitly Allow Search Engines (Optional)

If you want to be extra clear:

# Explicitly allow Google
User-agent: Googlebot
Allow: /

# Explicitly allow Bing
User-agent: Bingbot
Allow: /

# Allow other search engines by default
User-agent: *
Disallow: /admin/
Disallow: /private/

Step 5: Save and Upload

Save the file and upload it to your website root.

Step 6: Verify It Works

Visit: https://yoursite.com/robots.txt

You should see your new configuration.


Important: What robots.txt CAN and CANNOT Do

✅ What robots.txt CAN Do:

  • Block compliant AI crawlers (GPTBot, ClaudeBot, Google-Extended)
  • Block compliant search engines (if you want)
  • Reduce bandwidth from respectful bots
  • Provide legal evidence of crawling restrictions

❌ What robots.txt CANNOT Do:

  • Block Bytespider (it ignores robots.txt)
  • Block 360Spider (often ignores rules)
  • Block malicious scrapers
  • Physically prevent access (it's just a suggestion)

For non-compliant bots, you need server-level blocking, such as user-agent rules in your web server configuration, a firewall, or a CDN/WAF rule.
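
The exact mechanism depends on your stack (nginx, Apache, a CDN rule). As an illustration only, here is a minimal Python WSGI middleware sketch that returns 403 when the User-Agent contains a blocked token; the bot names come from this guide, while the middleware itself and `hello_app` are hypothetical:

```python
# Bots known to ignore robots.txt, so they must be refused at the server.
BLOCKED_TOKENS = ("bytespider", "360spider")

def block_ai_bots(app):
    """WSGI middleware: answer 403 when the User-Agent matches a blocked bot."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

# Hypothetical inner application, just for demonstration:
def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

app = block_ai_bots(hello_app)
```

Unlike robots.txt, this is enforcement: the request is rejected whether or not the bot chooses to be polite. An equivalent nginx or WAF rule matches on the same User-Agent substrings.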


Common Mistakes to Avoid

❌ Mistake #1: Blocking "User-agent: *"

Wrong:

User-agent: *
Disallow: /

This blocks everything, including Google and Bing. Your SEO will tank.

Correct:

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /

❌ Mistake #2: Typos in User Agent Names

Wrong:

User-agent: GPT-Bot    # extra hyphen, never matches
Disallow: /

Correct:

User-agent: GPTBot
Disallow: /

A misspelled token matches no crawler, so the bot stays unblocked. Note that user agent matching is case-insensitive under RFC 9309 (path values, by contrast, are case-sensitive), but not every crawler follows the spec, so copy each bot's documented name exactly.

❌ Mistake #3: Blocking Google-Extended AND Googlebot

Wrong (if you want SEO):

User-agent: Googlebot
Disallow: /

User-agent: Google-Extended
Disallow: /

This blocks Google Search completely.

Correct:

# Block Google AI training
User-agent: Google-Extended
Disallow: /

# Keep Google Search
User-agent: Googlebot
Allow: /

❌ Mistake #4: Not Testing

Always verify your robots.txt works:

  1. Visit https://yoursite.com/robots.txt
  2. Check the robots.txt report in Google Search Console (Settings > robots.txt; the old standalone robots.txt Tester has been retired)
  3. Use CheckAIBots to verify AI bot blocking
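
These manual checks can also be automated as a pre-deploy script. A sketch using the standard library (paste in your real robots.txt content in place of the sample below; "yoursite.com" is a placeholder) that fails loudly if a search engine ever ends up blocked:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

MUST_ALLOW = ("Googlebot", "Bingbot", "DuckDuckBot")  # search engines
MUST_BLOCK = ("GPTBot",)                              # AI training bots

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in MUST_ALLOW:
    assert parser.can_fetch(bot, "https://yoursite.com/"), f"{bot} is blocked!"
for bot in MUST_BLOCK:
    assert not parser.can_fetch(bot, "https://yoursite.com/"), f"{bot} is allowed!"
print("robots.txt passes all checks")
```

Run it in CI so a bad edit to robots.txt can never reach production unnoticed.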

Advanced Configuration

Selective Blocking by Directory

Block AI from specific sections only:

# Block AI from blog only
User-agent: GPTBot
Disallow: /blog/

User-agent: ClaudeBot
Disallow: /blog/

# Allow everywhere else
User-agent: *
Allow: /
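
The directory rules above can be verified the same way as a full-site block. A quick sketch with Python's standard-library parser ("yoursite.com" and the paths are placeholders):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /blog/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is kept out of /blog/ but nothing else:
assert not parser.can_fetch("GPTBot", "https://yoursite.com/blog/my-post")
assert parser.can_fetch("GPTBot", "https://yoursite.com/products/")

# Search engines see everything, including the blog:
assert parser.can_fetch("Googlebot", "https://yoursite.com/blog/my-post")
print("directory rules OK")
```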

Allow Some AI Bots, Block Others

# Block training bots
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# But allow AI search bots (they bring traffic)
User-agent: PerplexityBot
Allow: /

User-agent: YouBot
Allow: /

Time-Based Testing

Want to test the impact? Block temporarily:

  1. Add AI bot blocking to robots.txt
  2. Wait 2-4 weeks
  3. Check bandwidth savings
  4. Keep or remove blocks based on results

How to Verify You Didn't Break SEO

Check #1: Google Search Console

  1. Go to Google Search Console
  2. Open Settings > robots.txt to confirm Google fetched your file without errors (this report replaced the retired standalone robots.txt Tester)
  3. Run the URL Inspection tool on an important page
  4. Check the "Crawl allowed?" field under Page indexing

Expected result: "Yes"

If it says crawling is blocked by robots.txt, you have an error in your robots.txt.

Check #2: Monitor Search Traffic

Use Google Analytics:

  1. Go to Acquisition > All Traffic > Channels
  2. Check "Organic Search" traffic
  3. Compare before/after adding AI bot blocking

Expected result: No change or slight increase (due to better performance)

Check #3: Use CheckAIBots

Our tool tests both:

  • Whether AI bots are blocked ✅
  • Whether search engines can still access your site ✅

👉 Verify Your Configuration →


Selective Blocking Strategy

Not all AI bots are equal. Here's our recommended approach:

Always Block (No Value)

GPTBot: Trains ChatGPT, no attribution
ClaudeBot: Trains Claude, no attribution
Bytespider: Ignores robots.txt anyway (need server blocking)
CCBot: Creates training datasets, no benefit
360Spider: Often ignores rules
anthropic-ai: Additional Anthropic training
FacebookBot: Trains Meta AI, no benefit

Consider Allowing (Potential Traffic)

⚠️ PerplexityBot: Powers Perplexity AI search with attribution
⚠️ OAI-SearchBot: ChatGPT search with source links
⚠️ YouBot: You.com AI search platform

Allow (Good for SEO)

Googlebot: Critical for Google Search
Bingbot: Important for Bing Search
DuckDuckBot: DuckDuckGo search traffic
All other search engines


Real-World Impact: Does This Actually Work?

Case Study: Tech Blog

Before (allowing all bots):

  • Monthly visitors: 50,000
  • Bandwidth: 320GB
  • AI bot bandwidth: 240GB (75%)
  • Google search traffic: 35,000/month

After (blocking AI training bots):

  • Monthly visitors: 50,000 (unchanged)
  • Bandwidth: 100GB (69% reduction)
  • AI bot bandwidth: 20GB (compliant bots only)
  • Google search traffic: 36,000/month (slightly up!)

SEO impact: None negative, slight improvement due to better site performance.

Case Study: E-Commerce Site

Before:

  • Organic search ranking: Average position 8.5
  • Monthly search traffic: 100,000

After blocking all AI training bots:

  • Organic search ranking: Average position 8.3 (improved)
  • Monthly search traffic: 102,000 (improved)

Why improvement? Better site performance → better user experience → better rankings.


Complete robots.txt Template

Here's our recommended configuration:

# robots.txt for blocking AI crawlers while maintaining SEO
# Generated by CheckAIBots.com

# Allow all search engines by default
User-agent: *
Allow: /

# Block OpenAI GPTBot (ChatGPT training)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# OAI-SearchBot powers ChatGPT search with source links;
# remove these two lines if you want that referral traffic
User-agent: OAI-SearchBot
Disallow: /

# Block Anthropic Claude training
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Block Google AI training (NOT Google Search)
User-agent: Google-Extended
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block ByteDance/TikTok (NOTE: Often ignores this)
User-agent: Bytespider
Disallow: /

# Block Meta/Facebook AI
User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# Block Apple Intelligence training
User-agent: Applebot-Extended
Disallow: /

# Block other AI training bots
User-agent: cohere-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

# AI search bots (remove if you want Perplexity/You.com referral traffic)
User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

# Chinese AI crawlers
User-agent: 360Spider
Disallow: /

User-agent: ChatGLM-Spider
Disallow: /

User-agent: PetalBot
Disallow: /

# Explicitly allow search engines (redundant but clear)
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

# Sitemap (optional but recommended)
Sitemap: https://yoursite.com/sitemap.xml

To use: Copy this, replace yoursite.com with your domain, save as robots.txt in your website root.


Frequently Asked Questions

Q: Will this hurt my Google rankings?

A: No. Blocking AI training bots (GPTBot, ClaudeBot) has zero impact on search engine crawlers (Googlebot, Bingbot). These are completely separate systems.

Q: Do I need robots.txt AND server-level blocking?

A: For compliant bots (GPTBot, ClaudeBot), robots.txt works fine. For non-compliant bots (Bytespider), you need server-level blocking. We recommend both for defense in depth.

Q: Can I block AI bots but allow AI search bots?

A: Yes! Block GPTBot and ClaudeBot (training), but allow PerplexityBot and OAI-SearchBot (search with attribution).

Q: How often should I update robots.txt?

A: Review monthly. New AI bots emerge regularly. Subscribe to CheckAIBots updates to stay informed.

Q: What if I accidentally block Googlebot?

A: Your search rankings will drop. Use Google Search Console's robots.txt report and URL Inspection tool to verify Googlebot can access your site before deploying changes.


Conclusion

Blocking AI crawlers in robots.txt is safe, effective, and has zero negative SEO impact when done correctly.

Key takeaways:

  1. ✅ AI training bots ≠ Search engine bots
  2. ✅ Block GPTBot, ClaudeBot, CCBot without fear
  3. ✅ Always keep Googlebot and Bingbot allowed
  4. ✅ Test your configuration before deploying
  5. ✅ Use server-level blocking for non-compliant bots

Ready to protect your content?

👉 Generate Your Custom robots.txt →


Last updated: January 30, 2025


Ready to Check Your Website?

Use CheckAIBots to instantly discover which AI crawlers can access your website and get actionable blocking recommendations.

Free AI Crawler Check