Do Cybersecurity Sites Block AI Crawlers? 1 of 9 Do

Jun 14, 2026

The Cybersecurity category tells a story about the tension between vendor openness and media content protection. 1 of 9 Cybersecurity sites with a parseable robots.txt blocks at least one AI crawler — an 11.1% block rate. That one site is darkreading.com, a long-running security news and editorial publication. The remaining 8 — a mix of endpoint security vendors, threat intelligence firms, and researcher-focused publications — allow all tracked AI crawlers without restriction.

This is the June 2026 Closing Web edition, a sealed point-in-time snapshot of public robots.txt files across 493 sites and 48 categories, taken on June 14, 2026. The figures here are verbatim counts from that sealed dataset, identified by sha c5960481aa465ad3.

1 of 9 Cybersecurity sites block at least one AI crawler.

Cybersecurity sites post an 11.1% AI-crawler block rate.

Corpus-wide, 150 of 417 sites block at least one AI crawler.

Key Takeaways

1 of 9 Cybersecurity sites blocks at least one AI crawler — an 11.1% block rate.
The sole blocker is darkreading.com, a security media publication.
8 Cybersecurity sites allow all tracked AI crawlers: crowdstrike.com, paloaltonetworks.com, fortinet.com, kaspersky.com, norton.com, malwarebytes.com, sentinelone.com, krebsonsecurity.com.
mcafee.com returned no parseable robots.txt in this edition.
Corpus-wide, 150 of 417 sites block at least one AI crawler — a 36% rate.

1 of 9 Cybersecurity sites with a policy blocks AI crawlers — an 11.1% rate, well below the 36% corpus average across all 417 sites.

Who Blocks, Who Does Not — and What the Split Reveals

The distinction between darkreading.com and the other 8 sites in this category is not subtle: it maps almost perfectly onto the vendor-versus-media divide. darkreading.com is a content-first publication — it produces original journalism, threat reports, analysis, and feature writing. Its editorial output is the product. AI training on that content without permission potentially devalues the journalism pipeline that makes the publication viable.

Seven of the 8 sites that allow all crawlers share a different commercial model. crowdstrike.com, paloaltonetworks.com, fortinet.com, kaspersky.com, norton.com, malwarebytes.com, and sentinelone.com are security software vendors. Their public-facing web content consists primarily of product descriptions, documentation, threat library entries, and marketing material, all of it designed to maximize distribution. The eighth, krebsonsecurity.com, is the exception: it is a well-known independent security blog and newsletter operated by journalist Brian Krebs. Its open posture toward AI crawlers is notable given that it is content-first, like Dark Reading.

mcafee.com returned no parseable robots.txt file in this edition, which means it does not contribute to the blocking count in either direction.

darkreading.com is the only Cybersecurity site in this corpus that blocks any AI crawler, as of the June 14, 2026 sealed snapshot.

Where Cybersecurity Sits in the Category Landscape

The following table shows a focused window of the category ranking around Cybersecurity's 11.1% position. These are verbatim values from the sealed allCategoriesRanked data.

Focused Window — Cybersecurity and Its Nearest Neighbors

Category	Sites Checked	With Robots	Blocking Any AI	Block Rate
Education	9	7	1	14.3%
Government	9	8	1	12.5%
Crypto	9	8	1	12.5%
Books	9	8	1	12.5%
Religion	10	9	1	11.1%
Insurance	10	9	1	11.1%
Cybersecurity	10	9	1	11.1%
Productivity	10	10	1	10%
Marketing	10	10	1	10%
Nonprofit	10	6	0	0%
Streaming	10	10	0	0%
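The Block Rate column divides by the sites that returned a parseable robots.txt, not by the sites checked. A minimal Python sketch of that arithmetic, using three rows copied from the table above; the function name `block_rate` is illustrative and is not part of the sealed dataset:

```python
def block_rate(blocking: int, with_robots: int) -> float:
    """Percentage of sites with a parseable robots.txt that block any AI crawler."""
    if with_robots == 0:
        raise ValueError("no parseable robots.txt files in this category")
    return round(100 * blocking / with_robots, 1)

# (sites checked, with robots, blocking any AI), copied from the table above
rows = {
    "Education": (9, 7, 1),
    "Cybersecurity": (10, 9, 1),
    "Marketing": (10, 10, 1),
}

# The "sites checked" value is deliberately unused: it is not the denominator.
rates = {
    name: block_rate(blocking, with_robots)
    for name, (_checked, with_robots, blocking) in rows.items()
}
```

Dividing by sites checked instead would put Cybersecurity at 10% rather than 11.1%, which is why mcafee.com, with no parseable file, drops out of the denominator.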

Highest-Blocking Categories

Category	Block Rate
Gaming	88.9%
News	82.4%
Food	70%

Cybersecurity at 11.1% sits in a cluster of verticals (Religion, Insurance, Government, Crypto) where the category block rate is driven entirely by a single site among many. This is a common pattern in the lower half of the ranking: when the overall block rate is low, the exact percentage matters less than which specific site is blocking and why.

The contrast with high-blocking categories like Gaming (88.9%) and News (82.4%) is instructive. In those verticals, content is the business, so restricting AI access is a direct competitive response. In Cybersecurity, most of the sites produce software and services, not content for content's sake. The Marketing category sits nearby at 10% for the same structural reason: one trade publisher blocks while the software vendors do not.

Which Bots Face the Most Resistance — Corpus-Wide

The following table shows the bot-level picture across all 417 sites with parseable policies in this edition. darkreading.com's specific disallow configuration is not broken out at the bot level in this fact sheet; what we can say is that it is the sole site in the Cybersecurity category contributing to any bot's block count.

AI Crawler	Sites Blocking It (all 417)	Block Rate
CCBot	118	28.3%
ClaudeBot	104	24.9%
GPTBot	93	22.3%
Bytespider	90	21.6%
Meta-ExternalAgent	84	20.1%
Applebot-Extended	81	19.4%
Google-Extended	81	19.4%
PerplexityBot	75	18%
Amazonbot	70	16.8%

CCBot leads with 118 sites blocking it across the full corpus of 417. Common Crawl, which CCBot crawls for, feeds a significant number of AI training pipelines — which explains why it draws the most disallows. ClaudeBot (104) and GPTBot (93) follow. In the Cybersecurity category, no site besides darkreading.com contributes to any of these counts.

Amazonbot, at the bottom of the tracked-bot list with 70 blocks, represents Amazon Alexa's data-sourcing crawler. PerplexityBot sits just above it at 75. The gap between the most-blocked CCBot (118) and the least-blocked Amazonbot (70) reflects the uneven adoption of AI access policies: most blocking is concentrated at a subset of sites that disallow many bots simultaneously, while other sites, including every Cybersecurity site except darkreading.com, disallow none at all.
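The concentration claim can be sanity-checked from the published counts alone. The sketch below sums the per-bot counts from the table above and divides by the 150 blocking sites. The resulting average is a derived figure, not a value reported in the sealed dataset, and it assumes every blocking site's per-bot blocks are reflected in the table.

```python
# Sites blocking each tracked bot, copied from the table above.
bot_blocks = {
    "CCBot": 118, "ClaudeBot": 104, "GPTBot": 93, "Bytespider": 90,
    "Meta-ExternalAgent": 84, "Applebot-Extended": 81, "Google-Extended": 81,
    "PerplexityBot": 75, "Amazonbot": 70,
}
blocking_sites = 150  # sites that block at least one tracked bot

# Each (site, bot) block is counted once in the per-bot column.
total_site_bot_blocks = sum(bot_blocks.values())

# Average number of tracked bots disallowed per blocking site.
avg_bots_per_blocking_site = total_site_bot_blocks / blocking_sites

# Gap between the most-blocked and least-blocked bot.
spread = bot_blocks["CCBot"] - bot_blocks["Amazonbot"]
```

An average above five of the nine tracked bots per blocking site is consistent with blockers disallowing many bots at once rather than singling out one.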

Why This Block Rate Reads as Low — Given the Sector

One might expect a cybersecurity-focused category to show more aggressive access controls. These are organizations whose products are built around controlling access. The reality of their robots.txt posture suggests a clear-eyed separation between their security product philosophy and their content distribution strategy.

Security vendors want their threat research, product documentation, and marketing material to be found — by search engines, by AI answer systems, and by potential buyers researching solutions. Restricting AI crawlers from their public marketing pages would reduce their reach in AI-generated responses, which increasingly surface vendor recommendations and product comparisons. The incentive points toward openness.

Only when a cybersecurity site functions primarily as a publisher — deriving value from the editorial content itself, as Dark Reading does — does the calculus reverse. The marketing category shows a nearly identical pattern at 10%: one publisher-style site (adweek.com) blocks while the tool and platform vendors do not.

The banking category illustrates a different form of the same logic: when every site in a category treats its public web presence as a distribution channel rather than a content moat, the block rate falls to 0%.

For additional context on how other categories differ in character from Cybersecurity's 11.1% position, see agriculture (33.3%, a higher-blocking sector where content and proprietary data matter more) and the telecom category (0%, another infrastructure-heavy sector where the public web is purely a service-marketing channel).

Reading the Sealed Numbers — Methodology

This report draws from the June 2026 Closing Web edition, a sealed point-in-time crawl of public robots.txt files (sha c5960481aa465ad3). The corpus covers 493 sites across 48 categories; 417 returned a parseable robots.txt file. A site is counted as blocking if its robots.txt contains at least one Disallow directive targeting at least one of the 9 AI crawlers tracked in this edition.
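The blocking rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the parser used for the sealed dataset: the function names are hypothetical, only groups that name a tracked bot explicitly are counted (a wildcard `User-agent: *` group is ignored), and an empty `Disallow:` is treated as blocking nothing.

```python
TRACKED_BOTS = (
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
    "Applebot-Extended", "Google-Extended", "PerplexityBot", "Amazonbot",
)

def blocked_bots(robots_txt: str) -> set[str]:
    """Return tracked bots named in a group that has a non-empty Disallow."""
    tracked = {bot.lower(): bot for bot in TRACKED_BOTS}
    blocked: set[str] = set()
    agents: list[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # Assumption: wildcard groups never count, empty Disallow never counts.
            if field == "disallow" and value:
                blocked.update(tracked[a] for a in agents if a in tracked)
    return blocked

def is_blocker(robots_txt: str) -> bool:
    """A site is a blocker if at least one tracked bot is disallowed somewhere."""
    return bool(blocked_bots(robots_txt))

# Hypothetical robots.txt, not taken from any site in the corpus.
sample = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow:
"""
```

With this sample, `blocked_bots(sample)` contains GPTBot and CCBot only: the wildcard group is skipped and ClaudeBot's empty `Disallow:` permits everything. Whether wildcard groups should count is a design choice the report does not spell out.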

The sealed-data process:

Collect. Each site's robots.txt at its canonical root is fetched over the public internet — no authenticated access, no internal systems.
Parse and classify. Each file is parsed for User-agent matches against the 9 tracked bots. A site is marked as a blocker if at least one Disallow covers at least one path for at least one bot.
Seal. The full result set is content-hashed. The sha c5960481aa465ad3 identifies this exact dataset; any alteration after the fact would produce a different hash.
Aggregate. Category and corpus counts are computed directly from the sealed set. Nothing is estimated, modeled, or extrapolated.
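The Seal and Aggregate steps above can be sketched in Python. Everything here is an assumption for illustration: the result-set shape, the `.example` site names, the canonical JSON encoding, and the choice to hash it with SHA-256. The report does not state how the published 16-character identifier is derived from the data.

```python
import hashlib
import json

def seal(results: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the result set."""
    # Sorted keys and fixed separators make the encoding order-independent.
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical shape: site -> list of blocked bots,
# or None when no parseable robots.txt was returned.
results = {
    "publisher.example": ["GPTBot", "CCBot"],
    "vendor.example": [],
    "no-robots.example": None,
}
digest = seal(results)

# Aggregate directly from the sealed set: only sites with a policy count.
with_policy = {site: bots for site, bots in results.items() if bots is not None}
blockers = [site for site, bots in with_policy.items() if bots]
block_rate = round(100 * len(blockers) / len(with_policy), 1)
```

Changing any value in `results` changes `digest`, which is what lets a later reader confirm that the counts were computed from the same data.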

Site names in this report are drawn exclusively from the sealed blockerSites, allowerSites, and noRobotsSites arrays for the Cybersecurity category.

Frequently Asked Questions

Q: Why does darkreading.com block AI crawlers when other security publications like krebsonsecurity.com do not?

A: This snapshot can only report what the robots.txt files say — it cannot speak to the internal editorial or business decisions behind them. What the data shows is that darkreading.com disallows at least one tracked AI crawler, and krebsonsecurity.com does not. Both are content-first security publications. The different postures likely reflect different assessments of how AI access affects their respective content economics — but that reasoning is not in the sealed data.

Q: How many AI crawlers does darkreading.com block?

A: The sealed data marks darkreading.com as blocking at least one AI crawler, which is the threshold for the block classification. The exact list of specific disallowed bots is not broken out at the per-site, per-bot level in this fact sheet. The 11.1% block rate is driven entirely by this one site.

Q: Is the 11.1% Cybersecurity block rate above or below the corpus average?

A: The corpus-wide block rate across all 417 sites is 36%. Cybersecurity at 11.1% is substantially below that. Even among the lower-blocking categories, Cybersecurity sits in a dense cluster of single-blocker categories — Education (14.3%), Government (12.5%), Crypto (12.5%), Religion (11.1%), Insurance (11.1%) — all sharing the same pattern of one site among many breaking from the permissive majority.

Q: Could more Cybersecurity sites start blocking AI crawlers?

A: Yes — this snapshot reflects the state as of June 14, 2026. Cybersecurity vendors are particularly aware of AI-related policy questions, and it is plausible that more sites in this category will add disallow directives in future editions. The current baseline is 1 of 9. Any change in a future edition would be detectable against this sealed anchor.

Q: What does a robots.txt disallow actually accomplish for a site like darkreading.com?

A: robots.txt is an honor-system standard — crawlers operated by reputable AI companies (OpenAI, Anthropic, Google, Meta, and others) have publicly stated they respect robots.txt disallow directives. A disallow instruction therefore reduces the likelihood that compliant AI crawlers will index and train on the content. It does not prevent bad actors or non-compliant scrapers from accessing the content through other means.

Put AI-Access Data to Work

The 11.1% Cybersecurity block rate — concentrated in one media publisher among many vendor sites — creates distinct monitoring needs for different audiences.

Security vendor competitive intelligence teams tracking how peers manage AI access can use this snapshot as a baseline. The workflow: monthly re-fetch of robots.txt for each of the 8 open sites in this corpus — crowdstrike.com, paloaltonetworks.com, fortinet.com, kaspersky.com, norton.com, malwarebytes.com, sentinelone.com, and krebsonsecurity.com. The trigger is any new Disallow block for CCBot, ClaudeBot, or GPTBot. A first mover among major vendors would signal a meaningful shift in the category's posture.
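A minimal sketch of the trigger in that workflow, using Python's standard `urllib.robotparser`. It compares two stored copies of a robots.txt file and reports watched bots that could fetch the site root before and cannot now. The function names and sample files are hypothetical, fetching the live file is left out, and the check is a simplification: it only catches blocks that cover the root path, and a wildcard `Disallow: /` also counts here.

```python
from urllib import robotparser

WATCHED_BOTS = ("CCBot", "ClaudeBot", "GPTBot")

def root_blocked(robots_txt: str, bot: str) -> bool:
    """True if `bot` may not fetch the site root under this robots.txt."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(bot, "/")

def newly_blocked(old_txt: str, new_txt: str) -> list[str]:
    """Watched bots that were allowed at the root before and are blocked now."""
    return [
        bot for bot in WATCHED_BOTS
        if not root_blocked(old_txt, bot) and root_blocked(new_txt, bot)
    ]

# Hypothetical before/after copies from two monthly fetches.
previous = "User-agent: *\nAllow: /\n"
current = previous + "\nUser-agent: GPTBot\nDisallow: /\n"

alerts = newly_blocked(previous, current)
```

A production monitor would also need to catch partial-path Disallow rules, which this root-only check misses.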

Content syndication and media intelligence professionals serving the security publishing space need to know which outlets are currently accessible to AI indexers. darkreading.com is already off-limits. Monitoring whether krebsonsecurity.com or other security media sites join it — on a quarterly cadence — keeps the accessible content map current.

AI product developers building security-adjacent tools (threat feed aggregators, vulnerability summary systems, security vendor comparison engines) can use this snapshot to confirm which domains currently allow compliant crawling. The 8 open sites represent the accessible tier; any shift in that list affects retrieval pipeline assumptions.

US Tech Automations builds scheduled robots.txt monitoring pipelines that detect policy changes the moment a site updates its file — no manual rechecks required. See the agentic workflows platform to configure drift alerts across the Cybersecurity category or any other tracked domain set.

This snapshot of Cybersecurity sites is one slice of a wider dataset; read how many top websites block AI crawlers for the cross-industry view.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha c5960481aa465ad3).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Cybersecurity Sites Block AI Crawlers? 1 of 9 Do.” https://ustechautomations.com/resources/blog/do-cybersecurity-sites-block-ai-crawlers-2026

Sealed snapshot sha256: c5960481aa465ad3