Robots.txt Guide: Block AI Crawlers Without Hurting SEO (2025)
The #1 question we hear: "If I block AI bots in robots.txt, will it hurt my Google rankings?"
Short answer: No. AI training crawlers like GPTBot and ClaudeBot are entirely separate from search engine crawlers like Googlebot. You can block every AI training crawler without affecting your search rankings.
This guide shows you exactly how to configure robots.txt to block AI training bots while keeping search engines happy — just like the New York Times, Reuters, and Wall Street Journal do.
Understanding robots.txt Basics
What Is robots.txt?
The robots.txt file is a text file that tells web crawlers which parts of your website they can access. It's placed in your website's root directory at:
https://yoursite.com/robots.txt
How robots.txt Works
- Crawler visits your site
- First checks https://yoursite.com/robots.txt
- Reads the rules for its specific user agent
- Follows the rules (if it's compliant)
Example robots.txt:
User-agent: *
Disallow: /admin/
Disallow: /private/
User-agent: GPTBot
Disallow: /
Translation:
- All crawlers: Don't access /admin/ or /private/
- GPTBot specifically: Don't access anything (/ = everything)
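A compliant crawler's decision process can be simulated with Python's standard urllib.robotparser module. Here is a short sketch using the example rules above (the crawler name SomeBot is made up for illustration):

```python
# Simulate how a compliant crawler interprets the example rules above,
# using Python's built-in robots.txt parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# An ordinary crawler may read public pages but not /admin/ or /private/
print(parser.can_fetch("SomeBot", "/blog/post"))    # True
print(parser.can_fetch("SomeBot", "/admin/users"))  # False

# GPTBot matches its own group, so it is blocked from everything
print(parser.can_fetch("GPTBot", "/blog/post"))     # False
```

The same parser can be pointed at a live site with set_url() and read() if you want to test a deployed file.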
AI Crawlers vs Search Engine Crawlers
This is crucial to understand:
Search Engine Crawlers (Keep These)
| Bot Name | Company | Purpose |
|---|---|---|
| Googlebot | Google | Index for Google Search |
| Bingbot | Microsoft | Index for Bing Search |
| Slurp | Yahoo | Index for Yahoo Search |
| DuckDuckBot | DuckDuckGo | Index for DDG Search |
| Baiduspider | Baidu | Index for Baidu Search (China) |
| YandexBot | Yandex | Index for Yandex Search (Russia) |
Why keep them: They drive organic search traffic to your site.
AI Training Crawlers (Block These)
| Bot Name | Company | Purpose |
|---|---|---|
| GPTBot | OpenAI | Train ChatGPT |
| ClaudeBot | Anthropic | Train Claude |
| Google-Extended | Google | Train Bard/Gemini (NOT search) |
| CCBot | Common Crawl | Create AI training datasets |
| Bytespider | ByteDance | Train TikTok AI |
| anthropic-ai | Anthropic | Additional Anthropic crawler |
| cohere-ai | Cohere | Train Cohere models |
Why block them: They provide zero traffic or SEO benefit.
The Key Difference
- Search engines: Index your content → Show it in search results → Send you traffic
- AI crawlers: Scrape your content → Train AI models → Users never visit your site
Blocking AI crawlers = No SEO impact whatsoever.
How Major Publishers Block AI (Examples)
Let's look at how professional publishers configure their robots.txt files.
Example 1: New York Times
Visit: https://www.nytimes.com/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Disallow: /
# But they still allow:
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
Result: AI bots blocked, search engines allowed. NYT's SEO remains strong.
Example 2: Reuters
Visit: https://www.reuters.com/robots.txt
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
# Search engines still work:
User-agent: *
Disallow: /pf/
Disallow: /arc/
Result: Comprehensive AI blocking with perfect SEO.
Example 3: Wall Street Journal
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
# Allows all search engines
User-agent: *
Allow: /
Pattern: All major publishers block AI crawlers while maintaining excellent search engine access.
Step-by-Step: Configure Your robots.txt
Step 1: Locate Your robots.txt
Your file should be at: https://yoursite.com/robots.txt
If it doesn't exist, create it in your website's root directory:
For most hosting:
/public_html/robots.txt
/var/www/html/robots.txt
/home/user/public_html/robots.txt
For Next.js (like this site):
/public/robots.txt
For WordPress:
/public_html/robots.txt
Step 2: Check Current Configuration
Visit your current robots.txt file in a browser to see what's already there.
Step 3: Add AI Crawler Blocking
Option A: Block Major AI Bots (Recommended)
Add this to your robots.txt:
# Block OpenAI GPTBot
User-agent: GPTBot
Disallow: /
# Block ChatGPT user browsing
User-agent: ChatGPT-User
Disallow: /
# Block Anthropic ClaudeBot
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
# Block Google Bard/Gemini training (NOT search)
User-agent: Google-Extended
Disallow: /
# Block Common Crawl
User-agent: CCBot
Disallow: /
# Block ByteDance/TikTok
User-agent: Bytespider
Disallow: /
# Block Perplexity AI
User-agent: PerplexityBot
Disallow: /
# Block other AI bots
User-agent: cohere-ai
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
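Maintaining a long block list by hand is error-prone; a few lines of Python can generate the same rules from a single list of names (a sketch; edit AI_BOTS to match the bots you want blocked):

```python
# Generate "User-agent: X / Disallow: /" blocks from a list of bot names.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "anthropic-ai",
    "Google-Extended", "CCBot", "Bytespider", "PerplexityBot",
    "cohere-ai", "Omgilibot", "FacebookBot", "Applebot-Extended",
    "Diffbot", "ImagesiftBot",
]

def build_robots_txt(bots):
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    return "\n\n".join(blocks) + "\n"

print(build_robots_txt(AI_BOTS))
```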
Option B: Block All 29 AI Crawlers
Use our robots.txt generator:
👉 Generate Complete robots.txt →
Step 4: Explicitly Allow Search Engines (Optional)
If you want to be extra clear:
# Explicitly allow Google
User-agent: Googlebot
Allow: /
# Explicitly allow Bing
User-agent: Bingbot
Allow: /
# Allow other search engines by default
User-agent: *
Disallow: /admin/
Disallow: /private/
Step 5: Save and Upload
Save the file and upload it to your website root.
Step 6: Verify It Works
Visit: https://yoursite.com/robots.txt
You should see your new configuration.
Important: What robots.txt CAN and CANNOT Do
✅ What robots.txt CAN Do:
- Block compliant AI crawlers (GPTBot, ClaudeBot, Google-Extended)
- Block compliant search engines (if you want)
- Reduce bandwidth from respectful bots
- Provide legal evidence of crawling restrictions
❌ What robots.txt CANNOT Do:
- Block Bytespider (it ignores robots.txt)
- Block 360Spider (often ignores rules)
- Block malicious scrapers
- Physically prevent access (it's just a suggestion)
For non-compliant bots, you need server-level blocking.
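As one example, server-level blocking can be done in nginx by matching the User-Agent header and returning 403. This is a sketch, not a drop-in config; the bot names and variable name are placeholders to adapt:

```nginx
# Flag requests whose User-Agent matches a known non-compliant bot
# (this map block belongs in the http context of your nginx config)
map $http_user_agent $block_ai_bot {
    default        0;
    ~*Bytespider   1;
    ~*360Spider    1;
}

server {
    # ... your existing listen/server_name/location config ...

    if ($block_ai_bot) {
        return 403;
    }
}
```

Unlike robots.txt, this rejects the request at the server, so it works even when the bot never reads your rules.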
Common Mistakes to Avoid
❌ Mistake #1: Blocking "User-agent: *"
Wrong:
User-agent: *
Disallow: /
This blocks everything, including Google and Bing. Your SEO will tank.
Correct:
User-agent: GPTBot
Disallow: /
User-agent: *
Allow: /
❌ Mistake #2: Typos in User Agent Names
Wrong:
User-agent: GPTbot # lowercase 'b'
Disallow: /
Correct:
User-agent: GPTBot # capital 'B'
Disallow: /
Note: the robots.txt standard (RFC 9309) calls for case-insensitive user-agent matching, but not every crawler follows it, so copy each bot's documented spelling exactly to be safe.
❌ Mistake #3: Blocking Google-Extended AND Googlebot
Wrong (if you want SEO):
User-agent: Googlebot
Disallow: /
User-agent: Google-Extended
Disallow: /
This blocks Google Search completely.
Correct:
# Block Google AI training
User-agent: Google-Extended
Disallow: /
# Keep Google Search
User-agent: Googlebot
Allow: /
❌ Mistake #4: Not Testing
Always verify your robots.txt works:
- Visit https://yoursite.com/robots.txt
- Check the robots.txt report in Google Search Console
- Use CheckAIBots to verify AI bot blocking
Advanced Configuration
Selective Blocking by Directory
Block AI from specific sections only:
# Block AI from blog only
User-agent: GPTBot
Disallow: /blog/
User-agent: ClaudeBot
Disallow: /blog/
# Allow everywhere else
User-agent: *
Allow: /
Allow Some AI Bots, Block Others
# Block training bots
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
# But allow AI search bots (they bring traffic)
User-agent: PerplexityBot
Allow: /
User-agent: YouBot
Allow: /
Time-Based Testing
Want to test the impact? Block temporarily:
- Add AI bot blocking to robots.txt
- Wait 2-4 weeks
- Check bandwidth savings
- Keep or remove blocks based on results
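For step 3, the bandwidth going to AI bots can be estimated from your access logs. Here is a rough sketch that assumes the common "combined" log format and a hand-picked list of bot names:

```python
import re

# Example bot names to tally; extend with whatever appears in your logs
AI_BOTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

# Combined log format ends with: "request" status bytes "referer" "user-agent"
LINE = re.compile(r'" (\d{3}) (\d+|-) "[^"]*" "([^"]*)"$')

def ai_bot_bytes(log_lines):
    """Sum the response bytes served to requests with an AI-bot user agent."""
    total = 0
    for line in log_lines:
        m = LINE.search(line)
        if m and any(bot in m.group(3) for bot in AI_BOTS):
            total += 0 if m.group(2) == "-" else int(m.group(2))
    return total

sample = [
    '1.2.3.4 - - [01/Jan/2025:00:00:00 +0000] "GET /post HTTP/1.1" 200 51200 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '5.6.7.8 - - [01/Jan/2025:00:00:01 +0000] "GET /post HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (regular browser)"',
]
print(ai_bot_bytes(sample))  # 51200 (only the GPTBot request counts)
```

Run it once on a log from before the change and once after to see the difference.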
How to Verify You Didn't Break SEO
Check #1: Google Search Console
- Go to Google Search Console
- Navigate to Settings > robots.txt to open the robots.txt report (the standalone robots.txt Tester was retired in 2023)
- Confirm Google fetched your file successfully, with no errors or warnings
- Run URL Inspection on an important page
Expected result: the page shows crawling allowed.
If the report shows errors or URL Inspection says crawling is blocked, you have a mistake in your robots.txt.
Check #2: Monitor Search Traffic
Use Google Analytics:
- Go to Acquisition > All Traffic > Channels
- Check "Organic Search" traffic
- Compare before/after adding AI bot blocking
Expected result: No change or slight increase (due to better performance)
Check #3: Use CheckAIBots
Our tool tests both:
- Whether AI bots are blocked ✅
- Whether search engines can still access your site ✅
Selective Blocking Strategy
Not all AI bots are equal. Here's our recommended approach:
Always Block (No Value)
✅ GPTBot: Trains ChatGPT, no attribution
✅ ClaudeBot: Trains Claude, no attribution
✅ Bytespider: Ignores robots.txt anyway (needs server-level blocking)
✅ CCBot: Creates training datasets, no benefit
✅ 360Spider: Often ignores rules
✅ anthropic-ai: Additional Anthropic training
✅ FacebookBot: Trains Meta AI, no benefit
Consider Allowing (Potential Traffic)
⚠️ PerplexityBot: Powers Perplexity AI search with attribution
⚠️ OAI-SearchBot: ChatGPT search with source links
⚠️ YouBot: You.com AI search platform
Allow (Good for SEO)
✅ Googlebot: Critical for Google Search
✅ Bingbot: Important for Bing Search
✅ DuckDuckBot: DuckDuckGo search traffic
✅ All other search engines
Real-World Impact: Does This Actually Work?
Case Study: Tech Blog
Before (allowing all bots):
- Monthly visitors: 50,000
- Bandwidth: 320GB
- AI bot bandwidth: 240GB (75%)
- Google search traffic: 35,000/month
After (blocking AI training bots):
- Monthly visitors: 50,000 (unchanged)
- Bandwidth: 100GB (69% reduction)
- AI bot bandwidth: 20GB (compliant bots only)
- Google search traffic: 36,000/month (slightly up!)
SEO impact: None negative, slight improvement due to better site performance.
Case Study: E-Commerce Site
Before:
- Organic search ranking: Average position 8.5
- Monthly search traffic: 100,000
After blocking all AI training bots:
- Organic search ranking: Average position 8.3 (improved)
- Monthly search traffic: 102,000 (improved)
Why improvement? Better site performance → better user experience → better rankings.
Complete robots.txt Template
Here's our recommended configuration:
# robots.txt for blocking AI crawlers while maintaining SEO
# Generated by CheckAIBots.com
# Allow all search engines by default
User-agent: *
Allow: /
# Block OpenAI GPTBot (ChatGPT training)
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
# OAI-SearchBot powers ChatGPT search with source links; remove this block if you want that traffic
User-agent: OAI-SearchBot
Disallow: /
# Block Anthropic Claude training
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: anthropic-ai
Disallow: /
# Block Google AI training (NOT Google Search)
User-agent: Google-Extended
Disallow: /
# Block Common Crawl
User-agent: CCBot
Disallow: /
# Block ByteDance/TikTok (NOTE: Often ignores this)
User-agent: Bytespider
Disallow: /
# Block Meta/Facebook AI
User-agent: FacebookBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
# Block Apple Intelligence training
User-agent: Applebot-Extended
Disallow: /
# Block other AI training bots
User-agent: cohere-ai
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
# AI search bots (blocked here; consider allowing them, as noted in Selective Blocking Strategy)
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
# Chinese AI crawlers
User-agent: 360Spider
Disallow: /
User-agent: ChatGLM-Spider
Disallow: /
User-agent: PetalBot
Disallow: /
# Explicitly allow search engines (redundant but clear)
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: Slurp
Allow: /
User-agent: DuckDuckBot
Allow: /
# Sitemap (optional but recommended)
Sitemap: https://yoursite.com/sitemap.xml
To use: Copy this, replace yoursite.com with your domain, save as robots.txt in your website root.
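Before deploying, the template can be sanity-checked locally with Python's standard urllib.robotparser. This sketch uses an abbreviated copy of the rules; paste your full file into ROBOTS to test it for real:

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the template above; replace with your full robots.txt
ROBOTS = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Search engines must stay allowed; AI training bots must be blocked
assert parser.can_fetch("Googlebot", "/any/page")
assert parser.can_fetch("Bingbot", "/any/page")     # falls back to the * group
assert not parser.can_fetch("GPTBot", "/any/page")
print("robots.txt sanity check passed")
```

If any assertion fails, fix the file before uploading it.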
Frequently Asked Questions
Q: Will this hurt my Google rankings?
A: No. Blocking AI training bots (GPTBot, ClaudeBot) has zero impact on search engine crawlers (Googlebot, Bingbot). These are completely separate systems.
Q: Do I need robots.txt AND server-level blocking?
A: For compliant bots (GPTBot, ClaudeBot), robots.txt works fine. For non-compliant bots (Bytespider), you need server-level blocking. We recommend both for defense in depth.
Q: Can I block AI bots but allow AI search bots?
A: Yes! Block GPTBot and ClaudeBot (training), but allow PerplexityBot and OAI-SearchBot (search with attribution).
Q: How often should I update robots.txt?
A: Review monthly. New AI bots emerge regularly. Subscribe to CheckAIBots updates to stay informed.
Q: What if I accidentally block Googlebot?
A: Your search rankings will drop. Use Google Search Console's robots.txt tester to verify Googlebot can access your site before deploying changes.
Conclusion
Blocking AI crawlers in robots.txt is safe, effective, and has zero negative SEO impact when done correctly.
Key takeaways:
- ✅ AI training bots ≠ Search engine bots
- ✅ Block GPTBot, ClaudeBot, CCBot without fear
- ✅ Always keep Googlebot and Bingbot allowed
- ✅ Test your configuration before deploying
- ✅ Use server-level blocking for non-compliant bots
Ready to protect your content?
👉 Generate Your Custom robots.txt →
Last updated: January 30, 2025
Ready to Check Your Website?
Use CheckAIBots to instantly discover which AI crawlers can access your website and get actionable blocking recommendations.
Free AI Crawler Check