How to Block AI Crawlers: Complete 2025 Guide (4 Methods)
35% of the world's top 1,000 websites now block AI crawlers — and for good reason. AI bots like GPTBot and ClaudeBot are scraping web content to train models like ChatGPT and Claude, costing website owners thousands of dollars in bandwidth and siphoning away traffic. 48% of news sites are already blocking AI crawlers.
This comprehensive guide shows you exactly how to block AI crawlers using 4 different methods, from beginner-friendly robots.txt to advanced server-level blocking.
Quick Navigation
- Method 1: robots.txt (Easiest, works for compliant bots)
- Method 2: Nginx (Recommended for most)
- Method 3: Apache (For Apache servers)
- Method 4: Cloudflare (One-click solution)
Why Block AI Crawlers?
Before we dive into the "how," let's quickly cover the "why":
Cost Savings:
- Reduce bandwidth by 50-75%
- Save $1,500-$5,000/month in CDN costs
- Wikimedia reported a 50% increase in bandwidth usage from AI bots alone
Content Protection:
- Prevent ChatGPT from training on your content
- Stop content theft without attribution
- Maintain control over intellectual property
Traffic Recovery:
- Users visit your site instead of getting AI-generated answers
- Protect advertising revenue
- Maintain direct customer relationships
First, find out which AI bots are currently crawling your site with our free detection tool →
Method 1: robots.txt Configuration
Difficulty: ⭐ Easy
Effectiveness: 70% (compliant bots only)
Time: 5 minutes
How It Works
The robots.txt file tells crawlers which parts of your site they can access. Most legitimate AI crawlers (GPTBot, ClaudeBot) respect this file.
Step 1: Locate Your robots.txt
Your robots.txt file should be at: https://yoursite.com/robots.txt
If it doesn't exist, create it in your website's root directory.
Step 2: Add Blocking Rules
Add these lines to block major AI crawlers:
# Block OpenAI GPTBot
User-agent: GPTBot
Disallow: /
# Block Anthropic ClaudeBot
User-agent: ClaudeBot
Disallow: /
# Block Google Bard/Gemini training
User-agent: Google-Extended
Disallow: /
# Block Common Crawl
User-agent: CCBot
Disallow: /
# Block Bytespider (TikTok/ByteDance)
User-agent: Bytespider
Disallow: /
# Block Perplexity AI
User-agent: PerplexityBot
Disallow: /
# Block other major AI bots
User-agent: anthropic-ai
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: ChatGLM-Spider
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
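If the file gets long, the Robots Exclusion Protocol lets several User-agent lines share a single rule group, which is easier to maintain. For example, the last few entries above can be written as one group (most modern parsers handle grouping correctly, but test after switching, since some older crawlers only read the first agent line in a group):

```
# One rule group covering several AI bots
User-agent: anthropic-ai
User-agent: cohere-ai
User-agent: Omgilibot
User-agent: FacebookBot
User-agent: Applebot-Extended
Disallow: /
```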
Step 3: Block All 29 AI Crawlers
For complete protection, use our robots.txt generator:
👉 Generate Custom robots.txt →
Step 4: Verify It Works
- Save the file
- Visit https://yoursite.com/robots.txt to confirm the rules appear
- Use CheckAIBots to verify blocking
Important Limitations
❌ robots.txt CANNOT block:
- Bytespider (ignores the file completely)
- 360Spider (often doesn't respect rules)
- Malicious scrapers
- Bots with bugs in their code
For these bots, use Method 2 or 3 (server-level blocking).
Method 2: Nginx Configuration
Difficulty: ⭐⭐ Intermediate
Effectiveness: 95%
Time: 10 minutes
Server-level blocking prevents bots from accessing your site at all, regardless of whether they respect robots.txt.
Step 1: Open Nginx Config
sudo nano /etc/nginx/nginx.conf
# or
sudo nano /etc/nginx/sites-available/default
Step 2: Add User-Agent Blocking
Add this inside your server block:
# Block AI crawlers
if ($http_user_agent ~* (GPTBot|ClaudeBot|Claude-Web|anthropic-ai|cohere-ai|Omgilibot|FacebookBot|Applebot-Extended|Bytespider|YouBot|PerplexityBot|Google-Extended|CCBot|ChatGPT-User|OAI-SearchBot|Diffbot|ImagesiftBot|ChatGLM-Spider|360Spider|Baiduspider|PetalBot)) {
return 403;
}
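Before touching the live config, you can sanity-check the regex offline with grep, whose -Ei flags behave like nginx's case-insensitive ~* operator. The pattern below is an abbreviated copy of the one above, and the user-agent string is illustrative rather than OpenAI's exact header:

```shell
# Abbreviated version of the nginx pattern above; extend it to match your full list
pattern='GPTBot|ClaudeBot|Bytespider|CCBot|PerplexityBot|Google-Extended'

# AI crawlers send longer Mozilla-style user agents; substring matching still catches them
ua='Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.2; +https://openai.com/gptbot)'

echo "$ua" | grep -Eiq "$pattern" && echo "blocked" || echo "allowed"
```

If a normal browser user agent also reports "blocked", your pattern is too broad — tighten it before deploying.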
Step 3: Test Configuration
sudo nginx -t
If you see "test is successful", proceed to Step 4.
Step 4: Reload Nginx
sudo systemctl reload nginx
Advanced: Separate Config File
For cleaner organization:
# Create file
sudo nano /etc/nginx/snippets/block-ai-bots.conf
# Add this content:
if ($http_user_agent ~* (GPTBot|ClaudeBot|Claude-Web|anthropic-ai|cohere-ai|Omgilibot|FacebookBot|Applebot-Extended|Bytespider|YouBot|PerplexityBot|Google-Extended|CCBot|ChatGPT-User|OAI-SearchBot|Diffbot|ImagesiftBot|ChatGLM-Spider|360Spider|Baiduspider|PetalBot)) {
return 403;
}
# Include in your server block:
include snippets/block-ai-bots.conf;
Verify Blocking Works
Test with curl:
curl -A "GPTBot" https://yoursite.com
# Should return: 403 Forbidden
Method 3: Apache/.htaccess
Difficulty: ⭐⭐ Intermediate
Effectiveness: 95%
Time: 10 minutes
Step 1: Locate .htaccess
Find your .htaccess file in your website's root directory (the same folder as index.php or index.html).
If it doesn't exist, create it:
touch .htaccess
Step 2: Add Blocking Rules
Add this to your .htaccess:
# Block AI Crawlers
RewriteEngine On
# Block GPTBot
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
RewriteRule .* - [F,L]
# Block ClaudeBot
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} anthropic-ai [NC]
RewriteRule .* - [F,L]
# Block Google Bard/Gemini
RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC]
RewriteRule .* - [F,L]
# Block other major AI bots
RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cohere-ai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Omgilibot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FacebookBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Applebot-Extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Diffbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ImagesiftBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChatGLM-Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]
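The long chain of RewriteCond lines above works, but mod_rewrite also accepts regex alternation, so the whole list can be collapsed into a single condition with identical behavior. A shorter equivalent (trim the alternatives to taste):

```
# Block AI crawlers with one case-insensitive condition
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|CCBot|Bytespider|PerplexityBot|cohere-ai|Omgilibot|FacebookBot|Applebot-Extended|Diffbot|ImagesiftBot|ChatGLM-Spider|360Spider|Baiduspider) [NC]
RewriteRule .* - [F,L]
```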
Step 3: Test Configuration
Visit your website. It should load normally. Then test:
curl -A "GPTBot" https://yoursite.com
# Should return: 403 Forbidden
Method 4: Cloudflare WAF Rules
Difficulty: ⭐ Easy
Effectiveness: 99%
Time: 3 minutes
If you use Cloudflare, this is the easiest and most effective method.
Option A: One-Click Blocking (All Cloudflare Plans)
- Log in to Cloudflare Dashboard
- Select your domain
- Go to Security > Bots
- Find "AI Scrapers and Crawlers"
- Toggle it ON
Done! This blocks all known AI crawlers automatically.
Option B: Custom WAF Rule (More Control)
For selective blocking:
- Go to Security > WAF
- Click Create Rule
- Rule name: Block AI Crawlers
- Expression:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "Claude-Web") or
(http.user_agent contains "anthropic-ai") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "cohere-ai") or
(http.user_agent contains "Omgilibot") or
(http.user_agent contains "FacebookBot") or
(http.user_agent contains "Applebot-Extended") or
(http.user_agent contains "ChatGPT-User")
- Action: Block
- Click Deploy
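On Cloudflare plans that include the regex matches operator (Business and Enterprise at the time of writing — check your plan), the list of contains clauses can be collapsed into a single expression; the (?i) flag makes it case-insensitive:

```
(http.user_agent matches "(?i)(GPTBot|ClaudeBot|Claude-Web|anthropic-ai|Google-Extended|CCBot|Bytespider|PerplexityBot|ChatGPT-User)")
```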
Selective Blocking Strategy
Not all AI bots are bad. Here's a recommended approach:
Always Block (No Value)
- ✅ Bytespider (wastes bandwidth, ignores robots.txt)
- ✅ 360Spider (often doesn't respect rules)
- ✅ CCBot (Common Crawl - no direct benefit)
- ✅ GPTBot (if you don't want ChatGPT training on your content)
- ✅ ClaudeBot (if you don't want Claude training on your content)
Consider Allowing (Potential Traffic)
- ⚠️ PerplexityBot (drives AI search traffic)
- ⚠️ OAI-SearchBot (ChatGPT search with attribution)
- ⚠️ YouBot (AI search platform)
Use CheckAIBots to generate a customized blocking strategy.
How to Verify Your Blocking Works
Method 1: Use CheckAIBots
The easiest way:
- Go to CheckAIBots.com
- Enter your website URL
- Click "Check Now"
- See which bots are blocked/allowed
- Get recommendations
Method 2: Test with Curl
# Test GPTBot
curl -A "GPTBot" https://yoursite.com
# Test ClaudeBot
curl -A "ClaudeBot" https://yoursite.com
# Test Bytespider
curl -A "Bytespider" https://yoursite.com
You should see 403 Forbidden (the configurations in this guide all return 403; some sites return 451 Unavailable For Legal Reasons instead).
Method 3: Check Server Logs
Monitor your access logs for AI bot user agents:
grep -i "GPTBot\|ClaudeBot\|Bytespider" /var/log/nginx/access.log
If your blocking works, you should see 403 status codes.
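The same check works offline against a saved log sample, which is handy for scripting alerts. The log path and combined-log line format below are assumptions based on a typical nginx setup; adjust both to match yours:

```shell
# Build a tiny sample log (stand-in for /var/log/nginx/access.log)
cat > /tmp/sample_access.log <<'EOF'
20.15.240.1 - - [27/Jan/2025:10:01:02 +0000] "GET /post HTTP/1.1" 403 162 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"
93.184.216.34 - - [27/Jan/2025:10:01:05 +0000] "GET /post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"
EOF

# Total requests from AI bots, then how many of those were refused with a 403
grep -Eci 'GPTBot|ClaudeBot|Bytespider' /tmp/sample_access.log
grep -Ei 'GPTBot|ClaudeBot|Bytespider' /tmp/sample_access.log | grep -c ' 403 '
```

If the two counts match, every AI bot request in the sample was blocked.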
Common Mistakes to Avoid
❌ Mistake #1: Only Using robots.txt
Problem: Bytespider and other aggressive bots ignore robots.txt
Solution: Use server-level blocking (nginx/Apache/Cloudflare)
❌ Mistake #2: Blocking Googlebot
Problem: You'll lose Google search rankings
Solution: Only block AI crawlers, not search engine bots (note that the example patterns in this guide include Baiduspider and PetalBot, which also index for Baidu and Petal search — remove them if you want traffic from those engines)
❌ Mistake #3: Typos in User Agent Names
Problem: A rule matching GPTbot (lowercase 'b') won't catch GPTBot
Solution: Use case-insensitive matching ([NC] in Apache, ~* in nginx)
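A quick shell demonstration of why case matters: plain grep behaves like nginx's case-sensitive ~ operator, while grep -i behaves like ~* (or [NC] in Apache):

```shell
# Case-sensitive: the pattern 'gptbot' does NOT match 'GPTBot'
echo "GPTBot" | grep -q "gptbot" && echo "match" || echo "no match"

# Case-insensitive: -i catches it regardless of capitalization
echo "GPTBot" | grep -qi "gptbot" && echo "match" || echo "no match"
```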
❌ Mistake #4: Not Testing
Problem: You think you're protected but bots still get through
Solution: Always verify with CheckAIBots or curl tests
Real Results: Before & After Blocking
Case Study 1: Medium-Sized Blog
Before blocking:
- Bandwidth: 320GB/month
- CDN cost: $180/month
- AI bot requests: 1.2M/month
After blocking (server-level):
- Bandwidth: 80GB/month (75% reduction)
- CDN cost: $45/month (saved $135/month)
- AI bot requests: Blocked successfully
Case Study 2: E-Commerce Site
Before:
- Bytespider requests: 50,000/day
- Server load: High
- Page load times: 2.8s
After blocking Bytespider:
- Requests reduced to 0
- Server load: Normal
- Page load times: 1.2s (57% faster)
Next Steps
1. Check Your Current Status
👉 Use CheckAIBots to see which bots access your site
2. Choose Your Method
- Beginner: robots.txt + Cloudflare one-click
- Recommended: robots.txt + nginx/Apache blocking
- Best: All methods combined
3. Implement & Verify
Follow the guides above, then verify it works.
4. Monitor Regularly
AI crawlers are constantly evolving. Check monthly for new bots.
Frequently Asked Questions
Q: Will this hurt my SEO?
A: No. AI crawlers are separate from search engine crawlers like Googlebot. Blocking GPTBot has zero impact on Google rankings.
Q: Can I block some bots but allow others?
A: Yes! Use selective blocking to allow AI search bots (potential traffic) while blocking training bots.
Q: What if new AI bots appear?
A: We update our database monthly. Subscribe to get updates about new crawlers.
Q: Is this legal?
A: Yes. You have the right to control who accesses your servers. Major publishers like The New York Times already do this.
Conclusion
Blocking AI crawlers is essential for:
- ✅ Reducing bandwidth costs by 50-75%
- ✅ Protecting your original content
- ✅ Maintaining direct customer relationships
- ✅ Controlling your intellectual property
The best approach combines multiple methods:
- robots.txt for compliant bots
- Server-level blocking for aggressive bots
- Regular monitoring to catch new crawlers
Start now: Check which AI bots can access your website →
Last updated: January 27, 2025