Tutorial

How to Block AI Crawlers: Complete 2025 Guide (4 Methods)

12 min read

How to Block AI Crawlers: Complete 2025 Guide

35% of the world's top 1,000 websites now block AI crawlers, and for good reason. AI bots like GPTBot and ClaudeBot scrape web content to train models like ChatGPT and Claude, costing site owners thousands of dollars in bandwidth and siphoning away traffic. 48% of news sites already block AI crawlers.

This comprehensive guide shows you exactly how to block AI crawlers using 4 different methods, from beginner-friendly robots.txt to advanced server-level blocking.

Why Block AI Crawlers?

Before we dive into the "how," let's quickly cover the "why":

Cost Savings:

  • Reduce bandwidth by 50-75%
  • Save $1,500-$5,000/month in CDN costs
  • Wikimedia saw a 50% bandwidth increase from AI bots alone

Content Protection:

  • Stop AI models from training on your original content
  • Keep control of your intellectual property

Traffic Recovery:

  • Users visit your site instead of getting AI-generated answers
  • Protect advertising revenue
  • Maintain direct customer relationships

First, check which AI bots are currently crawling your site with our free detection tool →


Method 1: robots.txt Configuration

Difficulty: ⭐ Easy
Effectiveness: 70% (compliant bots only)
Time: 5 minutes

How It Works

The robots.txt file tells crawlers which parts of your site they can access. Most legitimate AI crawlers (GPTBot, ClaudeBot) respect this file.

Step 1: Locate Your robots.txt

Your robots.txt file should be at: https://yoursite.com/robots.txt

If it doesn't exist, create it in your website's root directory.

Step 2: Add Blocking Rules

Add these lines to block major AI crawlers:

# Block OpenAI GPTBot
User-agent: GPTBot
Disallow: /

# Block Anthropic ClaudeBot
User-agent: ClaudeBot
Disallow: /

# Block Google Bard/Gemini training
User-agent: Google-Extended
Disallow: /

# Block Common Crawl
User-agent: CCBot
Disallow: /

# Block Bytespider (TikTok/ByteDance)
User-agent: Bytespider
Disallow: /

# Block Perplexity AI
User-agent: PerplexityBot
Disallow: /

# Block other major AI bots
User-agent: anthropic-ai
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: ChatGLM-Spider
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /
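If you want to sanity-check rules like these before deploying, Python's standard-library `urllib.robotparser` can evaluate them locally. A minimal sketch (the rules and URL path are shortened examples):

```python
# Check robots.txt rules locally with Python's built-in parser.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Listed bots are disallowed everywhere; unlisted bots are unaffected.
print(parser.can_fetch("GPTBot", "/any/page"))     # False
print(parser.can_fetch("Googlebot", "/any/page"))  # True
```

This confirms the rules block exactly the agents you listed without touching search engine crawlers.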

Step 3: Block All 29 AI Crawlers

For complete protection, use our robots.txt generator:

👉 Generate Custom robots.txt →

Step 4: Verify It Works

  1. Save the file
  2. Visit https://yoursite.com/robots.txt
  3. Use CheckAIBots to verify blocking

Important Limitations

robots.txt CANNOT block:

  • Bytespider (ignores the file completely)
  • 360Spider (often doesn't respect rules)
  • Malicious scrapers
  • Bots with bugs in their code

For these bots, use Method 2 or 3 (server-level blocking).


Method 2: Nginx Configuration

Difficulty: ⭐⭐ Intermediate
Effectiveness: 95%
Time: 10 minutes

Server-level blocking prevents bots from accessing your site at all, regardless of whether they respect robots.txt.

Step 1: Open Nginx Config

sudo nano /etc/nginx/nginx.conf
# or
sudo nano /etc/nginx/sites-available/default

Step 2: Add User-Agent Blocking

Add this inside your server block:

# Block AI crawlers
if ($http_user_agent ~* (GPTBot|ClaudeBot|Claude-Web|anthropic-ai|cohere-ai|Omgilibot|FacebookBot|Applebot-Extended|Bytespider|YouBot|PerplexityBot|Google-Extended|CCBot|ChatGPT-User|OAI-SearchBot|Diffbot|ImagesiftBot|ChatGLM-Spider|360Spider|Baiduspider|PetalBot)) {
    return 403;
}

Note: this pattern also matches Baiduspider and PetalBot, which are search engine crawlers rather than AI training bots. Blocking them will remove your site from Baidu and Petal Search results, so drop them from the pattern if you want that traffic.
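The `~*` operator performs a case-insensitive regex match against the User-Agent header. If you want to test a pattern offline before deploying it, the same match can be sketched in Python (the pattern below is a shortened example of the full list above):

```python
# Mirror nginx's case-insensitive "~*" user-agent match for offline testing.
import re

# Shortened example pattern; the real nginx rule lists more bots.
AI_BOTS = re.compile(
    r"GPTBot|ClaudeBot|Claude-Web|Bytespider|PerplexityBot|CCBot",
    re.IGNORECASE,
)

def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent matches the blocklist pattern."""
    return AI_BOTS.search(user_agent) is not None

print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.2)"))  # True
print(is_blocked("Mozilla/5.0 (compatible; gptbot/1.2)"))  # True (case-insensitive)
print(is_blocked("Mozilla/5.0 (Windows NT 10.0)"))         # False
```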

Step 3: Test Configuration

sudo nginx -t

If you see "test is successful", proceed to Step 4.

Step 4: Reload Nginx

sudo systemctl reload nginx

Advanced: Separate Config File

For cleaner organization:

# Create file
sudo nano /etc/nginx/snippets/block-ai-bots.conf

# Add this content:
if ($http_user_agent ~* (GPTBot|ClaudeBot|Claude-Web|anthropic-ai|cohere-ai|Omgilibot|FacebookBot|Applebot-Extended|Bytespider|YouBot|PerplexityBot|Google-Extended|CCBot|ChatGPT-User|OAI-SearchBot|Diffbot|ImagesiftBot|ChatGLM-Spider|360Spider|Baiduspider|PetalBot)) {
    return 403;
}

# Include in your server block:
include snippets/block-ai-bots.conf;
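For reference, a server block using that include might look like this (the domain and paths are placeholders, adjust to your setup):

```
server {
    listen 80;
    server_name yoursite.com;
    root /var/www/yoursite;

    # Reject AI crawlers before any location is matched
    include snippets/block-ai-bots.conf;

    location / {
        try_files $uri $uri/ =404;
    }
}
```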

Verify Blocking Works

Test with curl:

curl -A "GPTBot" https://yoursite.com
# Should return: 403 Forbidden

Method 3: Apache/.htaccess

Difficulty: ⭐⭐ Intermediate
Effectiveness: 95%
Time: 10 minutes

Step 1: Locate .htaccess

Find your .htaccess file in your website root (where index.php/index.html is).

If it doesn't exist, create it:

touch .htaccess

Step 2: Add Blocking Rules

Add this to your .htaccess:

# Block AI Crawlers
RewriteEngine On

# Block GPTBot
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
RewriteRule .* - [F,L]

# Block Anthropic bots (ClaudeBot, Claude-Web, anthropic-ai)
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} anthropic-ai [NC]
RewriteRule .* - [F,L]

# Block Google Bard/Gemini
RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC]
RewriteRule .* - [F,L]

# Block other major AI bots
RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cohere-ai [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Omgilibot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FacebookBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Applebot-Extended [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Diffbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ImagesiftBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChatGLM-Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
RewriteRule .* - [F,L]

Note: Baiduspider is Baidu's search crawler, not an AI training bot. Blocking it will remove your site from Baidu search results.

Step 3: Test Configuration

Visit your website. It should load normally. Then test:

curl -A "GPTBot" https://yoursite.com
# Should return: 403 Forbidden

Method 4: Cloudflare WAF Rules

Difficulty: ⭐ Easy
Effectiveness: 99%
Time: 3 minutes

If you use Cloudflare, this is the easiest and most effective method.

Option A: One-Click Blocking (All Cloudflare Plans)

  1. Log in to Cloudflare Dashboard
  2. Select your domain
  3. Go to Security > Bots
  4. Find "AI Scrapers and Crawlers"
  5. Toggle it ON

Done! This blocks all known AI crawlers automatically.

Option B: Custom WAF Rule (More Control)

For selective blocking:

  1. Go to Security > WAF
  2. Click Create Rule
  3. Rule name: Block AI Crawlers
  4. Expression:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "Claude-Web") or
(http.user_agent contains "anthropic-ai") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "cohere-ai") or
(http.user_agent contains "Omgilibot") or
(http.user_agent contains "FacebookBot") or
(http.user_agent contains "Applebot-Extended") or
(http.user_agent contains "ChatGPT-User")
  5. Action: Block
  6. Click Deploy

Selective Blocking Strategy

Not all AI bots are bad. Here's a recommended approach:

Always Block (No Value)

  • ✅ Bytespider (wastes bandwidth, ignores robots.txt)
  • ✅ 360Spider (often doesn't respect rules)
  • ✅ CCBot (Common Crawl - no direct benefit)
  • ✅ GPTBot (if you don't want ChatGPT training on your content)
  • ✅ ClaudeBot (if you don't want Claude training on your content)

Consider Allowing (Potential Traffic)

  • ⚠️ PerplexityBot (drives AI search traffic)
  • ⚠️ OAI-SearchBot (ChatGPT search with attribution)
  • ⚠️ YouBot (AI search platform)
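
In robots.txt terms, a selective policy along these lines might look like this (adjust the allow/block split to your own decisions):

```
# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow AI search crawlers that can send you traffic
User-agent: PerplexityBot
Allow: /

User-agent: OAI-SearchBot
Allow: /
```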

Use CheckAIBots to generate a customized blocking strategy.


How to Verify Your Blocking Works

Method 1: Use CheckAIBots

The easiest way:

  1. Go to CheckAIBots.com
  2. Enter your website URL
  3. Click "Check Now"
  4. See which bots are blocked/allowed
  5. Get recommendations

Method 2: Test with Curl

# Test GPTBot
curl -A "GPTBot" https://yoursite.com

# Test ClaudeBot
curl -A "ClaudeBot" https://yoursite.com

# Test Bytespider
curl -A "Bytespider" https://yoursite.com

You should see 403 Forbidden or 451 Unavailable For Legal Reasons.

Method 3: Check Server Logs

Monitor your access logs for AI bot user agents:

grep -i "GPTBot\|ClaudeBot\|Bytespider" /var/log/nginx/access.log

If your blocking works, you should see 403 status codes.
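To go beyond grep, a short script can tally which bots are hitting your site and what status codes they receive. A sketch assuming nginx's default "combined" log format (the sample lines below are fabricated for illustration):

```python
# Summarize AI-bot hits and their status codes from an nginx access log.
import re
from collections import Counter

BOTS = ("GPTBot", "ClaudeBot", "Bytespider")
# combined format ends: "request" status size "referer" "user-agent"
LINE = re.compile(r'" (\d{3}) \d+ "[^"]*" "([^"]*)"$')

def summarize(lines):
    """Count (bot, status) pairs across log lines."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        status, agent = m.groups()
        for bot in BOTS:
            if bot.lower() in agent.lower():
                counts[(bot, status)] += 1
    return counts

sample = [
    '1.2.3.4 - - [27/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 403 153 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '5.6.7.8 - - [27/Jan/2025:10:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
print(summarize(sample))  # Counter({('GPTBot', '403'): 1})
```

In a real deployment you would feed it `open("/var/log/nginx/access.log")` instead of the sample list; a healthy blocklist shows 403s for every bot entry.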


Common Mistakes to Avoid

❌ Mistake #1: Only Using robots.txt

Problem: Bytespider and other aggressive bots ignore robots.txt
Solution: Use server-level blocking (nginx/Apache/Cloudflare)

❌ Mistake #2: Blocking Googlebot

Problem: You'll lose Google search rankings
Solution: Only block AI crawlers, not search engine bots

❌ Mistake #3: Typos in User Agent Names

Problem: GPTbot (lowercase 'b') won't block GPTBot
Solution: Use case-insensitive matching ([NC] in Apache, ~* in nginx)

❌ Mistake #4: Not Testing

Problem: You think you're protected but bots still get through
Solution: Always verify with CheckAIBots or curl tests


Real Results: Before & After Blocking

Case Study 1: Medium-Sized Blog

Before blocking:

  • Bandwidth: 320GB/month
  • CDN cost: $180/month
  • AI bot requests: 1.2M/month

After blocking (server-level):

  • Bandwidth: 80GB/month (75% reduction)
  • CDN cost: $45/month (saved $135/month)
  • AI bot requests: Blocked successfully

Case Study 2: E-Commerce Site

Before:

  • Bytespider requests: 50,000/day
  • Server load: High
  • Page load times: 2.8s

After blocking Bytespider:

  • Requests reduced to 0
  • Server load: Normal
  • Page load times: 1.2s (57% faster)

Next Steps

1. Check Your Current Status

👉 Use CheckAIBots to see which bots access your site

2. Choose Your Method

  • Beginner: robots.txt + Cloudflare one-click
  • Recommended: robots.txt + nginx/Apache blocking
  • Best: All methods combined

3. Implement & Verify

Follow the guides above, then verify it works.

4. Monitor Regularly

AI crawlers are constantly evolving. Check monthly for new bots.


Frequently Asked Questions

Q: Will this hurt my SEO?

A: No. AI crawlers are separate from search engine crawlers like Googlebot. Blocking GPTBot has zero impact on Google rankings.

Q: Can I block some bots but allow others?

A: Yes! Use selective blocking to allow AI search bots (potential traffic) while blocking training bots.

Q: What if new AI bots appear?

A: We update our database monthly. Subscribe to get updates about new crawlers.

Q: Is this legal?

A: Yes. You have the right to control who accesses your servers. Major publishers like NYT do this.


Conclusion

Blocking AI crawlers is essential for:

  • ✅ Reducing bandwidth costs by 50-75%
  • ✅ Protecting your original content
  • ✅ Maintaining direct customer relationships
  • ✅ Controlling your intellectual property

The best approach combines multiple methods:

  1. robots.txt for compliant bots
  2. Server-level blocking for aggressive bots
  3. Regular monitoring to catch new crawlers

Start now: Check which AI bots can access your website →


Last updated: January 27, 2025

Ready to Check Your Website?

Use CheckAIBots to instantly discover which AI crawlers can access your website and get actionable blocking recommendations

Free AI Crawler Check