Do Insurance Sites Block AI Crawlers? 1 of 9 Do

Jun 14, 2026

Of the 10 Insurance sites we checked for this edition, 9 returned a parseable robots.txt file — and just 1 of those 9 blocks at least one AI crawler. That is an 11.1% block rate, placing Insurance among the most permissive categories in our entire 418-site snapshot. The lone blocker is prudential.com. The remaining eight sites — geico.com, progressive.com, statefarm.com, nationwide.com, libertymutual.com, farmers.com, usaa.com, and metlife.com — allow every AI crawler we checked. This is sealed-snapshot data, point-in-time as of June 14, 2026 (sha 27ca61d890a647db).

What makes this result distinctive is not the low block rate in isolation but how far Insurance falls below the corpus average. Across all 354 sites with a parseable robots.txt in this edition, 139 block at least one AI crawler, a 39.3% corpus-wide rate. Insurance at 11.1% sits 28.2 percentage points below that line. The snapshot does not record intent, but the practical effect is that most major carriers leave their public content open to AI training and summarization crawls. That is a meaningful signal for anyone monitoring AI-access policy in the financial services sector.
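The comparison above is a pair of simple ratios, each taken against the number of sites with a parseable robots.txt. A minimal Python sketch of that arithmetic follows. The function name block_rate is ours, chosen for illustration, and is not part of the published dataset.

```python
def block_rate(blockers: int, with_robots: int) -> float:
    """Percent of sites with a parseable robots.txt that block >= 1 AI crawler."""
    if with_robots <= 0:
        raise ValueError("need at least one site with a parseable robots.txt")
    return round(100 * blockers / with_robots, 1)


insurance = block_rate(1, 9)     # Insurance: 1 blocker among 9 parseable files
corpus = block_rate(139, 354)    # Corpus: 139 blockers among 354 parseable files
gap = round(corpus - insurance, 1)

print(f"Insurance {insurance}% vs corpus {corpus}% (gap: {gap} points)")
```

Run as written, this reproduces the 11.1% and 39.3% figures quoted in this report and a gap of 28.2 percentage points.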

1 of 9 Insurance sites blocks at least one AI crawler.

Insurance sites post an 11.1% AI-crawler block rate.

Corpus-wide, 139 of 354 sites block at least one AI crawler.

Key Takeaways

1 of 9 Insurance sites with a robots.txt blocks at least one AI crawler.

prudential.com is the sole Insurance blocker in this sealed snapshot.

The Insurance block rate of 11.1% sits well below the 39.3% corpus-wide average.

8 of 9 Insurance sites with parseable robots.txt files allow every AI crawler we checked.

Of 10 Insurance sites checked, 9 returned a parseable robots.txt; only 1 of those 9 blocks any AI crawler — an 11.1% block rate as of June 14, 2026.

Across all 354 sites with a parseable robots.txt in this edition, 139 block at least one AI crawler — a corpus-wide rate of 39.3%.

Who Gates the Crawlers in Insurance — and Who Does Not

A robots.txt file is a plain-text instruction set that tells web crawlers which paths they may or may not access. It is an honor-system standard: compliant bots read it, but nothing in the protocol enforces it. When a site adds a Disallow rule for a named AI crawler, it signals an intent to withhold content from that operator's training or indexing pipelines — but it does not prevent a non-compliant crawler from proceeding.
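As a hedged illustration of the check a compliant crawler performs, the sketch below runs Python's standard-library urllib.robotparser against a made-up robots.txt. The file contents and the example.com URLs are hypothetical and are not taken from any site in this snapshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot is disallowed site-wide,
# every other crawler is only kept out of /account/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# A compliant crawler asks this question before fetching each URL.
gptbot_quotes = parser.can_fetch("GPTBot", "https://example.com/quotes")
claude_quotes = parser.can_fetch("ClaudeBot", "https://example.com/quotes")
claude_account = parser.can_fetch("ClaudeBot", "https://example.com/account/login")

print(gptbot_quotes, claude_quotes, claude_account)
```

Here GPTBot is refused everywhere, while ClaudeBot falls through to the wildcard group and is refused only under /account/. Nothing in this check runs on the server: a crawler that never asks is not stopped by it.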

In Insurance, that signal is rare. prudential.com is the only site in the category that uses it. The other eight sites with a parseable robots.txt do not broadly disallow any AI crawler we checked, which means any compliant AI crawler, from GPTBot to ClaudeBot to CCBot, can fetch their public content without an explicit AI-specific barrier.

allstate.com returned no parseable robots.txt at all in this snapshot. A missing robots.txt is not a block — by convention it means crawlers may proceed. But it also means allstate.com has published no explicit AI-access policy through this mechanism.

The breadth of the allow list is notable. geico.com, progressive.com, and statefarm.com are among the most-trafficked insurance properties in the U.S., and all three are wide open to AI crawlers under this snapshot. usaa.com, which serves a more restricted membership base, likewise imposes no crawler restrictions through its robots.txt. For AI systems training on insurance domain content, this category is largely accessible.

You can see how Insurance compares to other categories, such as Productivity (10% block rate) and Genealogy (40%), elsewhere in this batch.

The Insurance Sites: Per-Site AI-Access Status

The table below records each of the 10 Insurance sites, whether a parseable robots.txt was found, and whether any AI crawler is blocked.

Site	robots.txt Present	Blocks Any AI Crawler
prudential.com	Yes	Yes
geico.com	Yes	No
progressive.com	Yes	No
statefarm.com	Yes	No
nationwide.com	Yes	No
libertymutual.com	Yes	No
farmers.com	Yes	No
usaa.com	Yes	No
metlife.com	Yes	No
allstate.com	No	—

Eight of the nine sites with parseable robots.txt files allow every AI crawler we checked. allstate.com is the only site that returned no parseable robots.txt in this snapshot and is therefore excluded from the block-rate denominator.

Where Insurance Sits Across All 40 Categories

The table below shows all 40 categories in this edition, ranked from highest to lowest block rate. Insurance and its peer categories appear toward the permissive end of the spectrum.

Category	Sites Checked	With robots.txt	Any Blocker	Block Rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
Food	10	10	7	70%
Tech	15	13	9	69.2%
Entertainment	9	9	6	66.7%
Healthcare	10	9	6	66.7%
Music	10	9	6	66.7%
Parenting	10	8	5	62.5%
Outdoors	10	5	3	60%
Reference	14	11	6	54.5%
Science	10	10	5	50%
Wedding	10	8	4	50%
Automotive	10	9	4	44.4%
HomeGarden	10	9	4	44.4%
Fashion	9	7	3	42.9%
Social	10	10	4	40%
Sports	10	10	4	40%
Fitness	10	10	4	40%
Photography	10	10	4	40%
Genealogy	10	10	4	40%
Jobs	10	8	3	37.5%
Travel	9	9	3	33.3%
Weather	10	6	2	33.3%
Beauty	10	6	2	33.3%
Legal	10	7	2	28.6%
RealEstate	10	7	2	28.6%
Pets	10	7	2	28.6%
Crafts	10	8	2	25%
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
Education	9	7	1	14.3%
Government	9	8	1	12.5%
Crypto	9	8	1	12.5%
Books	9	8	1	12.5%
Religion	10	9	1	11.1%
Insurance	10	9	1	11.1%
Productivity	10	10	1	10%
Nonprofit	10	6	0	0%
Streaming	10	10	0	0%
Dating	10	5	0	0%

Insurance sits near the bottom of the block-rate range — tied with Religion at 11.1% and just above Productivity at 10%. Only three categories post a zero block rate across this entire corpus: Nonprofit, Streaming, and Dating.
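The ranking and the tie described above can be re-derived from the table's own counts. The sketch below does so in Python for a subset of rows copied from the table. The Category type and the variable names are ours, introduced only for illustration.

```python
from typing import NamedTuple


class Category(NamedTuple):
    name: str
    with_robots: int
    blockers: int

    @property
    def rate(self) -> float:
        # Block rate as a percentage of sites with a parseable robots.txt.
        return round(100 * self.blockers / self.with_robots, 1)


# Subset of rows copied from the category table (With robots.txt, Any Blocker).
ROWS = [
    Category("Gaming", 9, 8),
    Category("Finance", 11, 2),
    Category("Religion", 9, 1),
    Category("Insurance", 9, 1),
    Category("Productivity", 10, 1),
    Category("Nonprofit", 6, 0),
    Category("Streaming", 10, 0),
    Category("Dating", 5, 0),
]

# sorted() is stable, so tied categories keep their table order.
ranked = sorted(ROWS, key=lambda c: c.rate, reverse=True)
insurance = next(c for c in ROWS if c.name == "Insurance")
tied = [c.name for c in ROWS if c.rate == insurance.rate and c.name != "Insurance"]
zero = [c.name for c in ROWS if c.blockers == 0]

for c in ranked:
    print(f"{c.name:<13}{c.rate:>5}%")
```

With these rows, tied contains only Religion and zero contains Nonprofit, Streaming, and Dating, matching the prose above.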

The Operator-Level Picture Across All 354 Sites

Even though Insurance itself is nearly fully open, the broader corpus tells a different story. The table below shows how many of the 354 sites with parseable robots.txt files, drawn from the full 418-site corpus and not from Insurance alone, block each major AI operator. A single site can block several operators, so the column sums to more than the 139 sites that block at least one crawler. This corpus-wide view shows which operators face the most friction.

Operator	Sites Blocking (of 354)
Common Crawl	109
Anthropic	104
Meta	89
OpenAI	87
ByteDance	83
Google	76
Perplexity	74
Apple	74
Cohere	68
Amazon	64
Diffbot	64
Mistral	24

Common Crawl and Anthropic are the most frequently blocked operators across all 354 sites, blocked by 109 and 104 sites respectively. OpenAI is blocked by 87. These figures describe the full corpus, not Insurance specifically, where the pattern is nearly the inverse: one blocker, and almost universal openness.
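Operator-level counts of this kind are a roll-up from individual bot tokens to the companies that run them. The Python sketch below shows one way such a roll-up could work. The token-to-operator mapping is our assumption (the report does not publish its own list), and the per-site data is hypothetical, not drawn from the snapshot.

```python
from collections import Counter

# Assumed mapping from robots.txt tokens to operators (not from the report).
TOKEN_OPERATOR = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "ClaudeBot": "Anthropic",
    "CCBot": "Common Crawl",
    "Google-Extended": "Google",
}

# Hypothetical per-site results: the set of AI tokens each site disallows.
blocked_by_site = {
    "site-a.example": {"GPTBot", "ChatGPT-User", "CCBot"},
    "site-b.example": {"ClaudeBot", "CCBot"},
    "site-c.example": set(),
}


def sites_blocking_per_operator(blocked: dict[str, set[str]]) -> Counter:
    """Count, per operator, the sites that block at least one of its tokens."""
    counts: Counter = Counter()
    for tokens in blocked.values():
        operators = {TOKEN_OPERATOR[t] for t in tokens if t in TOKEN_OPERATOR}
        counts.update(operators)  # a set, so each operator counts once per site
    return counts


counts = sites_blocking_per_operator(blocked_by_site)
any_blocker = sum(1 for tokens in blocked_by_site.values() if tokens)

print(dict(counts), "sites with any block:", any_blocker)
```

In this toy data two sites block something, yet the operator counts add up to four. The same effect explains why the operator column in the table sums to 916 while only 139 sites block at least one crawler.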

For an operator like Anthropic, Insurance is a largely uncontested category for training-data access. prudential.com is the only site in the category with an AI-crawler block in its robots.txt, so it is the only place any operator can meet an explicit barrier.

How the Snapshot Was Sealed

This report is based on a point-in-time crawl of public robots.txt files conducted on June 14, 2026 and sealed as sha 27ca61d890a647db. The process is designed for reproducibility and honesty: nothing is estimated, modeled, or extrapolated.

Collect. Each of the 418 sites in the corpus was fetched at its canonical robots.txt path. Only public, unauthenticated responses were captured.
Parse. Each fetched file was parsed for User-agent directives matching known AI crawler names. A site is marked as "blocking" only when its robots.txt contains a Disallow: / or equivalent broad disallow for at least one recognized AI bot token.
Seal. The full result set was hashed and content-addressed, producing the sha 27ca61d890a647db that anchors every figure in this report. No figure has been updated or modified since sealing.
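The Parse and Seal steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline behind this report: the token list is ours, only a literal Disallow: / inside a group that names an AI token counts as a block (the "equivalent broad disallow" cases are not modeled), and the fingerprint scheme (SHA-256 over canonical JSON, truncated to 16 hex characters) is a guess at one reasonable approach.

```python
import hashlib
import json

# Assumed token list; the report does not enumerate the tokens it matched.
AI_TOKENS = frozenset({"gptbot", "claudebot", "ccbot", "google-extended"})


def blocked_tokens(robots_text: str, tokens=AI_TOKENS) -> set[str]:
    """Return AI tokens whose own group contains a literal 'Disallow: /'."""
    blocked: set[str] = set()
    agents: list[str] = []  # user-agents named by the current group
    in_rules = False        # True once the current group has a rule line
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(a for a in agents if a in tokens)
    return blocked


def seal(results: dict[str, list[str]]) -> str:
    """Hypothetical fingerprint: SHA-256 of canonical JSON, first 16 hex chars."""
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Made-up file: two AI tokens share one group with a site-wide disallow.
sample = (
    "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n\n"
    "User-agent: *\nDisallow: /private/\n"
)
found = blocked_tokens(sample)
fingerprint = seal({"example.com": sorted(found)})
print(sorted(found), fingerprint)
```

Sorting keys and token lists before hashing makes the fingerprint independent of dictionary ordering, so the same result set always yields the same value. Note that under these assumptions a Disallow: / in the wildcard group alone is not counted as an AI-crawler block.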

A site that returned no parseable robots.txt, like allstate.com, is counted in the sites total but excluded from the withRobots denominator. Block rates are computed against the withRobots count, not the raw site count, so they describe what sites have actually published as access policy.
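The effect of that denominator choice can be seen on the ten Insurance rows from the per-site table. In the Python sketch below, None stands for "no parseable robots.txt". The variable names mirror the sites and withRobots terms used above but are otherwise ours.

```python
# Per-site values from the Insurance table:
# True = blocks an AI crawler, False = does not, None = no parseable robots.txt.
INSURANCE = {
    "prudential.com": True,
    "geico.com": False,
    "progressive.com": False,
    "statefarm.com": False,
    "nationwide.com": False,
    "libertymutual.com": False,
    "farmers.com": False,
    "usaa.com": False,
    "metlife.com": False,
    "allstate.com": None,
}

sites = len(INSURANCE)
with_robots = sum(1 for v in INSURANCE.values() if v is not None)
any_blocker = sum(1 for v in INSURANCE.values() if v is True)

rate = round(100 * any_blocker / with_robots, 1)  # the published method
naive = round(100 * any_blocker / sites, 1)       # raw site count, for contrast

print(f"sites={sites} withRobots={with_robots} blockers={any_blocker}")
print(f"block rate {rate}% (would be {naive}% against all sites)")
```

Dividing by all ten sites would report 10.0% and would quietly treat allstate.com as a site that published a policy and chose not to block, which the snapshot cannot support.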

Frequently Asked Questions

Q: Why does prudential.com block AI crawlers while the other major carriers do not?

A: The sealed snapshot records the fact of the block, not the motivation. Prudential may have added a blanket AI-crawler disallow as a precautionary measure or in response to specific policy guidance — we cannot know from the robots.txt file alone. What the data shows is that eight of the nine carriers with parseable robots.txt files have not made the same choice, as of June 14, 2026.

Q: Does allstate.com have an AI-access policy?

A: In this snapshot, allstate.com returned no parseable robots.txt. By convention, the absence of a robots.txt means crawlers may proceed — but it also means the site has not published a formal directive through this mechanism. Whether Allstate has a separate contractual or technical policy on AI access is outside the scope of what a robots.txt snapshot can capture.

Q: Does being blocked in robots.txt actually stop AI crawlers from indexing a site?

A: robots.txt is an honor-system standard. Crawlers that respect the protocol — which includes most major AI systems that publish their bot tokens — will observe the disallow. Crawlers that do not follow the protocol can proceed regardless. A robots.txt block is an instruction, not a technical enforcement mechanism like a firewall.

Q: How does the Insurance block rate compare to the corpus average?

A: Insurance posts an 11.1% block rate. Across all 354 sites with a parseable robots.txt in this edition, 139 block at least one AI crawler — a 39.3% corpus-wide average. Insurance is substantially below that line. Finance, a closely related category, posts an 18.2% block rate with 2 of 11 sites blocking.

Q: Is a low block rate in Insurance likely to stay this way?

A: This snapshot is cross-sectional — it captures policy at one moment in time. Whether block rates rise or fall is a question this single edition cannot answer. The value of a sealed snapshot is precisely that it creates a fixed reference point: you can re-run the same crawl at a later date and compare like-for-like.

Put AI-Access Data to Work

Three types of professionals derive concrete recurring value from tracking Insurance AI-access policy over time.

AI/LLM product teams and data sourcing leads who use Insurance content for training or retrieval-augmented generation workflows need to know which sites are open and which have erected barriers. Right now, Insurance is nearly all open — but prudential.com's block is a real restriction for any team relying on that domain's content.

A recurring weekly or monthly re-crawl of these nine sites, set to alert on any new Disallow directive, gives a product team advance notice before a training-data source goes dark. US Tech Automations automates exactly this monitoring: scheduled robots.txt re-crawls, policy-change diffing, and routed alerts to the team or pipeline that owns the affected data source.
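The change-detection idea is small at its core: compare the set of blocked tokens per site between a baseline and a later crawl, and flag anything new. The Python sketch below is a generic illustration with hypothetical sites and data. It is not a description of any vendor's implementation.

```python
def new_blocks(
    baseline: dict[str, set[str]], current: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Map each site to AI tokens blocked now that were not blocked before."""
    changes: dict[str, set[str]] = {}
    for site, tokens in current.items():
        added = tokens - baseline.get(site, set())
        if added:
            changes[site] = added
    return changes


# Hypothetical snapshots: blocked AI tokens per site at two points in time.
baseline = {"carrier-a.example": set(), "carrier-b.example": {"GPTBot"}}
current = {
    "carrier-a.example": {"GPTBot", "CCBot"},
    "carrier-b.example": {"GPTBot"},
}

alerts = new_blocks(baseline, current)
for site, tokens in sorted(alerts.items()):
    print(f"ALERT {site}: newly disallowed {', '.join(sorted(tokens))}")
```

Only carrier-a.example is flagged here, because carrier-b.example's block already existed in the baseline. A site that appears for the first time in the later crawl is compared against an empty set, so any block it carries is reported as new.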

SEO and digital marketing teams at Insurance carriers increasingly track how AI crawlers treat competitor sites. If a competitor adds a Disallow for GPTBot or Google-Extended, that is a visible signal about competitive strategy, and it may leave more visibility in AI-generated answers to the carriers that stay open. A quarterly policy-audit workflow, comparing current robots.txt states against this sealed baseline, keeps competitive-intelligence teams current without manual checks.

Compliance and AI-governance counsel advising Insurance companies need a defensible record of what access policy existed at a specific date. The sealed snapshot format — with sha 27ca61d890a647db anchoring the June 14, 2026 state — is exactly the kind of verifiable artifact that supports legal or regulatory review of AI training-data provenance.

For any of these workflows, US Tech Automations builds the scheduling, alerting, and change-detection pipelines that turn a one-time snapshot into a live monitoring system.

You can also compare Insurance against adjacent categories like Beauty and Outdoors to see how different consumer verticals approach AI-access policy.

This snapshot of Insurance sites is one slice of a wider dataset; read how many top websites block AI crawlers for the cross-industry view.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 27ca61d890a647db).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Insurance Sites Block AI Crawlers? 1 of 9 Do.” https://ustechautomations.com/resources/blog/do-insurance-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 27ca61d890a647db

Machine-readable data: CSV · JSON · All research & methodology