Who Blocks Anthropic's ClaudeBot? 39 Sites of 107 Do

Jun 13, 2026

This dataset tracks 5 Anthropic user-agent tokens: ClaudeBot, anthropic-ai, Claude-Web, Claude-User, and Claude-SearchBot. Of the 107 prominent sites that returned a parseable robots.txt file in our June 13, 2026 snapshot, 39 (36.4%) block at least one of them.

39 of 107 top sites block at least one Anthropic crawler in June 2026.

That places Anthropic among the most heavily blocked of the 12 AI operators tracked. For teams building AI products that depend on web-sourced training data or retrieval pipelines, the Anthropic blocking rate is a concrete operational constraint, not an abstract policy concern.

Snapshot Methodology

US Tech Automations fetched robots.txt files from 122 prominent sites on June 13, 2026. Of those 122, 107 returned a parseable file; all percentages in this report are computed over that 107-site base. The snapshot is point-in-time and sealed: nothing is estimated, modeled, or extrapolated. Every count in this report is taken verbatim from public robots.txt directives as they existed on that date, and every percentage is computed directly from those counts.

The snapshot sha, 741353c4304216ee, pins the exact state of the dataset. These numbers will not change as sites later edit their files; they describe a specific moment. Keep in mind that robots.txt is an honor-system standard: it records a site operator's stated intent, not a technical firewall.
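The published identifier is 16 hex characters long, while a full SHA-256 digest has 64, so it appears to be a truncated SHA-256 prefix. The report does not say which bytes were hashed; the sketch below assumes, hypothetically, that it covers the raw bytes of a downloaded data file. `snapshot_prefix` and `matches_published` are illustrative helper names, not published tooling.

```python
import hashlib
from pathlib import Path

# Snapshot identifier as printed in this report.
PUBLISHED_PREFIX = "741353c4304216ee"


def snapshot_prefix(data: bytes, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()[:length]


def matches_published(path: Path) -> bool:
    """Hypothetical check; assumes the sealed sha was computed over the file's raw bytes."""
    return snapshot_prefix(path.read_bytes()) == PUBLISHED_PREFIX
```

A 16-hex-character prefix carries 64 bits, which is ample for detecting an edited or re-exported file, though comparing the full digest would be stricter.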

The 122-site panel spans 10 content categories and 21 tracked bot user-agents across 12 AI operators. Across the full corpus, 48 of 107 sites (44.9%) block some AI crawler, 20 of 107 (18.7%) have adopted llms.txt, and 9 sites (8.4%) earned "star" status for the most comprehensive AI-access restrictions in the dataset.
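The headline percentages can be reproduced from the raw counts. This sketch restates counts reported in this article and divides by the 107-site base; `pct` is an illustrative helper name.

```python
FETCHED = 122  # sites whose robots.txt was requested
BASE = 107     # sites that returned a parseable file: the denominator for every percentage


def pct(count: int, base: int = BASE) -> float:
    """Return count/base as a percentage rounded to one decimal place."""
    return round(100 * count / base, 1)


counts = {
    "block some AI crawler": 48,
    "block at least one Anthropic agent": 39,
    "adopted llms.txt": 20,
    "star status": 9,
}
for label, n in counts.items():
    print(f"{label}: {n}/{BASE} = {pct(n)}%")
```

Dividing by the 122 fetched sites instead would understate every rate: for Anthropic, 39/122 rounds to 32.0% rather than 36.4%.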

How Often Anthropic Is Refused

Across Anthropic's 5 user-agents, the headline crawler ClaudeBot draws the most refusals — 38 of 107 sites block it. The legacy anthropic-ai token is blocked by 32 sites. Claude-Web, which powers web-browsing features, is refused by 26 sites. Claude-User and Claude-SearchBot each hit walls at 19 sites.

Anthropic User-Agent	Sites Blocking (of 107)
ClaudeBot	38
anthropic-ai	32
Claude-Web	26
Claude-User	19
Claude-SearchBot	19

ClaudeBot is blocked by 38 of 107 sites — the primary Anthropic crawling target.

The operator-level count (39) exceeds ClaudeBot's individual count (38) by just one, so exactly one of the 39 Anthropic blockers refuses some Anthropic agent without naming ClaudeBot; every other site that blocks Anthropic at all blocks its primary crawler. The spread across agents is wide, from 38 down to 19, which suggests that some publishers have added the newer user-agent strings to their robots.txt while others still rely on the original entries. A site blocking only anthropic-ai but not ClaudeBot may be running a stale policy.
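The relationship between per-agent and operator-level counts can be made concrete in a few lines of Python. The site names and blocked-agent sets below are invented toy data, not rows from the sealed dataset, and `summarize` is a hypothetical helper. The operator-level count is the number of sites whose blocked set intersects Anthropic's five tokens, so it can never be lower than any single agent's count, and its difference from the ClaudeBot count is exactly the number of sites that block Anthropic without naming ClaudeBot.

```python
ANTHROPIC_AGENTS = frozenset(
    {"ClaudeBot", "anthropic-ai", "Claude-Web", "Claude-User", "Claude-SearchBot"}
)


def summarize(blocked_by_site: dict[str, set[str]]) -> tuple[dict[str, int], int, list[str]]:
    """Hypothetical helper: per-agent counts, operator-level count, possible stale policies."""
    per_agent = {
        agent: sum(agent in blocked for blocked in blocked_by_site.values())
        for agent in sorted(ANTHROPIC_AGENTS)
    }
    # A site counts once at operator level if it names any Anthropic agent.
    operator_level = sum(bool(blocked & ANTHROPIC_AGENTS) for blocked in blocked_by_site.values())
    # Naming only the legacy token while leaving ClaudeBot open may indicate a stale file.
    stale = sorted(
        site for site, blocked in blocked_by_site.items()
        if "anthropic-ai" in blocked and "ClaudeBot" not in blocked
    )
    return per_agent, operator_level, stale


# Toy data with hypothetical sites; not taken from the snapshot.
toy = {
    "a.example": {"ClaudeBot", "anthropic-ai", "GPTBot"},
    "b.example": {"ClaudeBot"},
    "c.example": {"anthropic-ai"},
    "d.example": {"GPTBot"},
}
per_agent, operator_level, stale = summarize(toy)
print(per_agent["ClaudeBot"], operator_level, stale)  # 2 3 ['c.example']
```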

Sealed finding: 39 of 107 top sites (36.4%) blocked at least one Anthropic crawler as of June 13, 2026 — among the highest operator-level block counts in this 12-operator corpus.

The gap between Claude-Web (26 blocks) and Claude-SearchBot (19 blocks) mirrors the pattern in this snapshot's OpenAI data and suggests that publishers are more concerned about browsing-mode retrieval than about search indexing. Allowing a search crawler may drive referral traffic; allowing a browsing agent feeds live content directly into AI product responses.

Sealed finding: 39 of the 48 sites that block any AI crawler (81.3%) name at least one Anthropic agent. Among the four largest operators compared in this report, only Common Crawl (40 sites) is blocked more broadly.

Which Industries Block Anthropic

News publishers are by far the heaviest resisters, with 14 sites in that category blocking Anthropic crawlers. Tech follows at 7, then Entertainment at 5, and both Social and Reference at 4. Retail and Travel each contribute 2 blockers; Government adds 1.

Category	Sites Blocking Anthropic
News	14
Tech	7
Entertainment	5
Social	4
Reference	4
Retail	2
Travel	2
Government	1

News (14 sites) accounts for more than one-third of all 39 Anthropic refusals.
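A quick consistency check on the category table: assuming each site carries exactly one category, as the table implies, the eight counts should sum to the 39-site operator-level total, and News should exceed one-third of it. The dictionary simply restates the table.

```python
# Category counts restated from the table; one category per site is assumed.
anthropic_blockers_by_category = {
    "News": 14, "Tech": 7, "Entertainment": 5, "Social": 4,
    "Reference": 4, "Retail": 2, "Travel": 2, "Government": 1,
}

total = sum(anthropic_blockers_by_category.values())
news_share = anthropic_blockers_by_category["News"] / total

print(total)                # 39
print(f"{news_share:.1%}")  # 35.9%
```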

News accounts for 14 of the 39 Anthropic blockers (35.9%). This concentration likely reflects the journalism sector's particular sensitivity to AI training. Outlets such as the Washington Post, The Guardian, The Atlantic, and APNews have encoded that opposition in their robots.txt files while also publishing extensive reporting on AI data practices.

Tech media's 7 blockers (Wired, Ars Technica, CNET, ZDNet, Mashable, The Verge, TechCrunch) form a coherent bloc. These sites produce the analysis that AI companies' customers and executives read, and they have chosen to withhold it from training ingestion. Entertainment's 5 blockers are consistent with what we observe for other operators in this corpus — the major trade publications treat their archives as proprietary assets.

The Social category's 4 blockers are worth noting because they include platforms where user-generated content is the product. LinkedIn, Tumblr, Medium, and Vimeo all host material from users who may never have consented to AI training. The 4 Social blockers for Anthropic, compared with 3 each for OpenAI and Common Crawl, may reflect the breadth of Anthropic's named-crawler footprint. See how those same Social sites handle Common Crawl CCBot for a comparison with the oldest and most widely used web-crawling corpus.

The Named Sites That Block Anthropic

All 39 sites that block Anthropic are named in the sealed dataset. The table below shows 12 representative blockers, sorted by their overall headline-crawlers-blocked score — a count of how many of the 9 highest-volume tracked bots each site refuses.

Site	Category	Headline Crawlers Blocked (of 9)
bbc.com	News	9
bloomberg.com	News	9
usatoday.com	News	9
nytimes.com	News	8
cnn.com	News	8
theatlantic.com	News	8
wired.com	Tech	8
arstechnica.com	Tech	8
ebay.com	Retail	8
congress.gov	Government	8
rollingstone.com	Entertainment	8
washingtonpost.com	News	7

BBC, Bloomberg, and USA Today score a perfect 9 of 9 tracked headline bots blocked — they are not singling out Anthropic; they have closed the door to every major AI crawler. At the other end of the spectrum, WebMD, Business Insider, and the LA Times each block just 3 of the 9 tracked headline bots, suggesting more selective policies that may target specific use cases rather than AI broadly.

The Atlantic (8 headline bots blocked) blocks more broadly than news peers such as The Washington Post (7), The Guardian (7), and APNews (6). Its presence alongside legacy heavyweights like the NYT and CNN (8 each) signals that the premium editorial segment, meaning long-form, subscription-supported journalism, has largely converged on AI restriction as a default posture.

Other full-list members include Forbes (8), Vox (7), Newsweek (7), The Guardian (7), TechCrunch (6), Tumblr (6), APNews (6), Yelp (6), Quora (5), Vimeo (5), Investopedia (4), and ESPN (4). For contrast with how OpenAI is treated by these same publishers, see who blocks the GPTBot crawler and why the count is lower.

Per-Industry Analysis: Reading the Patterns

The category breakdown reveals two structural findings that distinguish Anthropic from its peers. First, the News count of 14 is the highest News-category block count for any operator in the 4-operator comparison set for this edition. Common Crawl shows 13 News blockers, OpenAI 10, and Google-Extended 7.

Second, the Social count of 4 is the highest among all operators in this corpus. Through their robots.txt files, LinkedIn, Tumblr, Medium, and Vimeo have each signaled that user-generated content on their platforms should not feed AI training pipelines without consent. That signal is meaningful regardless of whether robots.txt constitutes a legally binding restriction.

Reference's 4 blockers (Healthline, Quora, Investopedia, WebMD) represent sites where AI-generated answers can substitute for a human visit. For these publishers, AI training access is a traffic displacement question with direct revenue implications.

Retail's 2 blockers, eBay and Amazon, are significant because they are marketplace platforms where AI-generated product recommendations and descriptions could compete with their own paid advertising inventory. Both block 7 or 8 headline bots, signaling comprehensive AI-access policies rather than narrow, Anthropic-specific ones. Government's single blocker, congress.gov, raises a distinct point: even largely public-domain government data can be subject to explicit crawler restrictions.

Put This Data to Work

For a content operations lead or retrieval-pipeline engineer at a company using Anthropic's Claude products, this data has direct workflow implications. If your RAG pipeline or knowledge base depends on content from News or Tech publishers, note that 14 News sites and 7 Tech sites in this panel have explicitly refused Anthropic crawlers.

US Tech Automations can build a scheduled robots.txt monitoring system tailored to your domain watchlist. The workflow: nightly fetch of robots.txt for each tracked domain, parse all User-agent/Disallow pairs, diff against the prior run, push a structured alert (Slack, email, or webhook) when any Anthropic user-agent entry changes.
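The parse-and-diff core of that workflow can be sketched in a few dozen lines of Python. This is a minimal sketch, not US Tech Automations' implementation: `parse_groups` and `diff_anthropic` are hypothetical helper names, the nightly fetch and the Slack/email/webhook delivery are left out, and grouping follows a simplified reading of RFC 9309 (consecutive User-agent lines share the rules that follow them, and repeated groups for one token are merged).

```python
# Hypothetical helpers sketching the parse-and-diff step; fetching and alerting are omitted.
ANTHROPIC_AGENTS = {"claudebot", "anthropic-ai", "claude-web", "claude-user", "claude-searchbot"}


def parse_groups(robots_txt: str) -> dict[str, frozenset[tuple[str, str]]]:
    """Map each lowercased User-agent token to its (directive, path) rules."""
    rules: dict[str, set[tuple[str, str]]] = {}
    agents: list[str] = []  # tokens in the group currently being read
    seen_rule = False       # becomes True once that group has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            rules.setdefault(value.lower(), set())
        elif key in ("allow", "disallow") and agents:
            seen_rule = True
            for agent in agents:
                rules[agent].add((key, value))
    # Repeated groups for the same token end up merged in one set.
    return {agent: frozenset(r) for agent, r in rules.items()}


def diff_anthropic(old_txt: str, new_txt: str) -> list[str]:
    """Describe every change to an Anthropic token's rules between two fetches."""
    old = {a: r for a, r in parse_groups(old_txt).items() if a in ANTHROPIC_AGENTS}
    new = {a: r for a, r in parse_groups(new_txt).items() if a in ANTHROPIC_AGENTS}
    changes = []
    for agent in sorted(old.keys() | new.keys()):
        before, after = old.get(agent), new.get(agent)
        if before == after:
            continue
        if before is None:
            changes.append(f"added {agent}: {sorted(after)}")
        elif after is None:
            changes.append(f"removed {agent}")
        else:
            changes.append(f"changed {agent}: -{sorted(before - after)} +{sorted(after - before)}")
    return changes


# Two hypothetical fetches of the same file on consecutive nights.
yesterday = "User-agent: ClaudeBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
today = "User-agent: ClaudeBot\nUser-agent: Claude-SearchBot\nDisallow: /\n"
print(diff_anthropic(yesterday, today))  # ["added claude-searchbot: [('disallow', '/')]"]
```

A non-empty list is the trigger for an alert; an empty list means no Anthropic entry changed, even if rules for other crawlers did.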

This is especially valuable for teams where legal or compliance needs to know the moment a key publisher modifies its AI access policy. The 39 blockers as of June 13, 2026 represent a point-in-time reading. Some of those publishers are in active licensing discussions with AI companies; others have sued or are in arbitration.

Frequently Asked Questions

Q: Does blocking ClaudeBot actually stop Anthropic from seeing my content?

A: Not by itself. robots.txt is an honor-system protocol with no technical enforcement mechanism. Anthropic has publicly stated that it respects Disallow directives, and non-compliance would carry significant reputational and legal risk for a company of its profile.

Q: If I block ClaudeBot, does that also block Claude-SearchBot?

A: No. A crawler obeys the robots.txt group that names its own user-agent token, so a rule blocking ClaudeBot does not block Claude-SearchBot unless the file names Claude-SearchBot too. The wildcard group (User-agent: *) applies to Claude-SearchBot only when no group names it specifically. That independence helps explain why the sealed counts differ: 38 for ClaudeBot vs. 19 for Claude-SearchBot.
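Python's standard-library `urllib.robotparser` can illustrate this per-agent matching. The robots.txt text below is a made-up example, not taken from any site in the snapshot. This parser applies the first matching rule in a group rather than RFC 9309's longest-match precedence, and it matches agent names by case-insensitive substring, so treat it as an illustration rather than a model of how Anthropic's crawlers evaluate rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group names two Anthropic tokens;
# every other agent falls through to the wildcard group.
ROBOTS_TXT = """\
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"
for agent in ("ClaudeBot", "anthropic-ai", "Claude-SearchBot", "Claude-User"):
    # Agents without a group of their own are governed by the "*" group.
    print(agent, parser.can_fetch(agent, url))
```

ClaudeBot and anthropic-ai are refused, while Claude-SearchBot and Claude-User may fetch the article because only the wildcard group's `/private/` rule applies to them. To test a live file instead, `set_url()` followed by `read()` fetches and parses it over the network.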

Q: Does blocking Anthropic crawlers affect my Google Search ranking?

A: No. Googlebot and Google-Extended are entirely separate user-agents from any Anthropic crawler. Blocking any Anthropic agent has no bearing on Google Search indexing or ranking. See who blocks Google-Extended for the comparative block count.

Q: Why does Anthropic have 5 separate crawlers?

A: Each agent serves a distinct product function: ClaudeBot for training, anthropic-ai as a legacy identifier, Claude-Web for browsing-mode retrieval, Claude-User for user-initiated fetches, and Claude-SearchBot for search indexing. Publishers who want fine-grained control can allow some while blocking others, which is why counts differ across the 5 agents.
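As a sketch of that fine-grained control, the hypothetical helper `build_robots_txt` below emits one Disallow group for the agents a site chooses to refuse and an explicit Allow group for the rest. The agent roles come from the answer above; the particular policy (refuse ClaudeBot, anthropic-ai, and Claude-Web, keep Claude-User and Claude-SearchBot) is an illustrative choice, and the check uses Python's `urllib.robotparser` with the same caveats noted for that module.

```python
from urllib.robotparser import RobotFileParser

# The five Anthropic tokens tracked in this report.
ANTHROPIC_AGENTS = ("ClaudeBot", "anthropic-ai", "Claude-Web", "Claude-User", "Claude-SearchBot")


def build_robots_txt(blocked: set[str]) -> str:
    """Hypothetical helper: Disallow-all group for blocked agents, explicit Allow group for the rest.

    Naming the allowed agents explicitly keeps them from falling through to a
    restrictive `User-agent: *` group elsewhere in the file.
    """
    lines: list[str] = []
    denied = [a for a in ANTHROPIC_AGENTS if a in blocked]
    allowed = [a for a in ANTHROPIC_AGENTS if a not in blocked]
    if denied:
        lines += [f"User-agent: {a}" for a in denied] + ["Disallow: /", ""]
    if allowed:
        lines += [f"User-agent: {a}" for a in allowed] + ["Allow: /", ""]
    return "\n".join(lines)


# Illustrative policy: refuse training and browsing agents, keep search and user fetches.
policy_txt = build_robots_txt({"ClaudeBot", "anthropic-ai", "Claude-Web"})

parser = RobotFileParser()
parser.parse(policy_txt.splitlines())
for agent in ANTHROPIC_AGENTS:
    print(agent, parser.can_fetch(agent, "https://example.com/page"))
```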

Q: Is 39 sites a high blocking count compared to other operators?

A: Yes. Among the 4 largest AI operators in this corpus, Anthropic's 39 sites rank second, one behind Common Crawl (40) and ahead of OpenAI (35) and Google-Extended (25). Put differently, 39 of the 48 sites that block any AI crawler have targeted at least one Anthropic agent.

Key Takeaways

39 of 107 top sites block at least one of Anthropic's 5 crawlers — among the highest operator-level block rates in this 12-operator corpus.
ClaudeBot is blocked by 38 sites; Claude-SearchBot by 19 — the gap suggests publishers distinguish training ingestion from search retrieval.
News (14 sites) accounts for more than one-third of all Anthropic refusals; Tech (7) and Entertainment (5) follow.
BBC, Bloomberg, and USA Today each block 9 of 9 tracked headline bots — Anthropic is part of a broad, category-wide rejection by top-tier publishers.
48 of 107 sites (44.9%) block some AI crawler; 39 of those 48 have named at least one Anthropic agent in their robots.txt.
The sealed snapshot sha 741353c4304216ee pins the exact dataset; nothing is derived or estimated from secondary sources.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 741353c4304216ee).

Cite this report

US Tech Automations Research, 2026-06 edition. “Who Blocks Anthropic's ClaudeBot? 39 Sites of 107 Do.” https://ustechautomations.com/resources/blog/who-blocks-anthropic-claudebot-2026

Sealed snapshot sha256: 741353c4304216ee

Machine-readable data: CSV · JSON · All research & methodology