
Do Insurance Sites Block AI Crawlers? 1 of 9 Do

Jun 14, 2026

Of the 10 Insurance sites we checked for this edition, 9 returned a parseable robots.txt file — and just 1 of those 9 blocks at least one AI crawler. That is an 11.1% block rate, placing Insurance among the most permissive categories in our entire 418-site snapshot. The lone blocker is prudential.com. The remaining eight sites — geico.com, progressive.com, statefarm.com, nationwide.com, libertymutual.com, farmers.com, usaa.com, and metlife.com — allow every AI crawler we checked. This is sealed-snapshot data, point-in-time as of June 14, 2026 (sha 27ca61d890a647db).

What makes this result distinctive is not the low block rate in isolation but how far Insurance falls below the corpus average. Across all 354 sites with a parseable robots.txt in this edition, 139 block at least one AI crawler, a 39.3% corpus-wide rate. Insurance at 11.1% sits 28.2 percentage points below that line. The snapshot does not record why, but the pattern is consistent with major carriers leaving their public content open to AI training and summarization crawls rather than restricting access. That is a meaningful signal for anyone monitoring AI-access policy in the financial services sector.

1 of 9 Insurance sites blocks at least one AI crawler.

Insurance sites post an 11.1% AI-crawler block rate.

Corpus-wide, 139 of 354 sites block at least one AI crawler.

Key Takeaways

1 of 9 Insurance sites with a robots.txt blocks at least one AI crawler.

prudential.com is the sole Insurance blocker in this sealed snapshot.

The Insurance block rate of 11.1% sits well below the 39.3% corpus-wide average.

8 of 9 Insurance sites with parseable robots.txt files allow every AI crawler we checked.

Of 10 Insurance sites checked, 9 returned a parseable robots.txt; only 1 of those 9 blocks any AI crawler — an 11.1% block rate as of June 14, 2026.

Across all 354 sites with a parseable robots.txt in this edition, 139 block at least one AI crawler — a corpus-wide rate of 39.3%.

Who Gates the Crawlers in Insurance — and Who Does Not

A robots.txt file is a plain-text instruction set that tells web crawlers which paths they may or may not access. It is an honor-system standard: compliant bots read it, but nothing in the protocol enforces it. When a site adds a Disallow rule for a named AI crawler, it signals an intent to withhold content from that operator's training or indexing pipelines — but it does not prevent a non-compliant crawler from proceeding.
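As a rough illustration of how a compliant crawler reads such a rule, the sketch below uses Python's standard-library `urllib.robotparser` to test whether a few named bots may fetch a site's root path. The robots.txt content and the `allowed` helper are hypothetical and are not taken from any site in the snapshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not copied from any site in the snapshot.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

for agent in ("GPTBot", "ClaudeBot", "CCBot"):
    print(agent, "allowed" if allowed(SAMPLE, agent) else "disallowed")
```

In this sample only GPTBot is disallowed at the root. The other two bots fall through to the wildcard group, which restricts only /private/. Nothing here stops a crawler that never consults the file.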

In Insurance, that signal is rare. prudential.com is the only site in the category that has made use of it. The eight sites that allow all crawlers have not taken a public stance against AI indexing through their robots.txt, which means any compliant AI system — from GPTBot to ClaudeBot to CCBot — can crawl their public content without an explicit barrier.

allstate.com returned no parseable robots.txt at all in this snapshot. A missing robots.txt is not a block — by convention it means crawlers may proceed. But it also means allstate.com has published no explicit AI-access policy through this mechanism.

The breadth of the allow list is notable. geico.com, progressive.com, and statefarm.com are among the most-trafficked insurance properties in the U.S., and all three are wide open to AI crawlers under this snapshot. usaa.com, which serves a more restricted membership base, likewise imposes no crawler restrictions through its robots.txt. For AI systems training on insurance domain content, this category is largely accessible.

You can see how Insurance compares to other categories, such as Productivity (10% block rate) and Genealogy (40%), elsewhere in this batch.

The Insurance Sites: Per-Site AI-Access Status

The table below records each of the 10 Insurance sites, whether a parseable robots.txt was found, and whether any AI crawler is blocked.

Site | robots.txt Present | Blocks Any AI Crawler
prudential.com | Yes | Yes
geico.com | Yes | No
progressive.com | Yes | No
statefarm.com | Yes | No
nationwide.com | Yes | No
libertymutual.com | Yes | No
farmers.com | Yes | No
usaa.com | Yes | No
metlife.com | Yes | No
allstate.com | No | N/A

Eight of the nine sites with parseable robots.txt files allow all AI crawlers. allstate.com is the only site that returned no robots.txt in this snapshot and is therefore excluded from the block-rate denominator.

Where Insurance Sits Across All 40 Categories

The table below shows all 40 categories in this edition, ranked from highest to lowest block rate. Insurance and its peer categories appear toward the permissive end of the spectrum.

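Category | Sites Checked | With robots.txt | Any Blocker | Block Rate
Gaming | 9 | 9 | 8 | 88.9%
News | 20 | 17 | 14 | 82.4%
Food | 10 | 10 | 7 | 70%
Tech | 15 | 13 | 9 | 69.2%
Entertainment | 9 | 9 | 6 | 66.7%
Healthcare | 10 | 9 | 6 | 66.7%
Music | 10 | 9 | 6 | 66.7%
Parenting | 10 | 8 | 5 | 62.5%
Outdoors | 10 | 5 | 3 | 60%
Reference | 14 | 11 | 6 | 54.5%
Science | 10 | 10 | 5 | 50%
Wedding | 10 | 8 | 4 | 50%
Automotive | 10 | 9 | 4 | 44.4%
HomeGarden | 10 | 9 | 4 | 44.4%
Fashion | 9 | 7 | 3 | 42.9%
Social | 10 | 10 | 4 | 40%
Sports | 10 | 10 | 4 | 40%
Fitness | 10 | 10 | 4 | 40%
Photography | 10 | 10 | 4 | 40%
Genealogy | 10 | 10 | 4 | 40%
Jobs | 10 | 8 | 3 | 37.5%
Travel | 9 | 9 | 3 | 33.3%
Weather | 10 | 6 | 2 | 33.3%
Beauty | 10 | 6 | 2 | 33.3%
Legal | 10 | 7 | 2 | 28.6%
RealEstate | 10 | 7 | 2 | 28.6%
Pets | 10 | 7 | 2 | 28.6%
Crafts | 10 | 8 | 2 | 25%
Finance | 12 | 11 | 2 | 18.2%
Retail | 15 | 12 | 2 | 16.7%
Education | 9 | 7 | 1 | 14.3%
Government | 9 | 8 | 1 | 12.5%
Crypto | 9 | 8 | 1 | 12.5%
Books | 9 | 8 | 1 | 12.5%
Religion | 10 | 9 | 1 | 11.1%
Insurance | 10 | 9 | 1 | 11.1%
Productivity | 10 | 10 | 1 | 10%
Nonprofit | 10 | 6 | 0 | 0%
Streaming | 10 | 10 | 0 | 0%
Dating | 10 | 5 | 0 | 0%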

Insurance sits near the bottom of the block-rate range — tied with Religion at 11.1% and just above Productivity at 10%. Only three categories post a zero block rate across this entire corpus: Nonprofit, Streaming, and Dating.

The Operator-Level Picture Across All 354 Sites

Even though Insurance itself is nearly fully open, the broader corpus tells a different story. The table below shows, for each major AI operator, how many of the 354 sites with a parseable robots.txt block it. The counts cover every category in the 418-site snapshot, not Insurance alone, and show which operators face the most friction.

Operator | Sites Blocking (of 354)
Common Crawl | 109
Anthropic | 104
Meta | 89
OpenAI | 87
ByteDance | 83
Google | 76
Perplexity | 74
Apple | 74
Cohere | 68
Amazon | 64
Diffbot | 64
Mistral | 24

Common Crawl and Anthropic are the most frequently blocked operators across all 354 sites, blocked by 109 and 104 sites respectively. OpenAI is blocked by 87. These figures describe the full corpus, not Insurance specifically, where the pattern is nearly the inverse: one blocker, and almost universal openness.
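One way such per-operator counts could be tallied is sketched below. The token-to-operator mapping is a partial, assumed list, the site names are hypothetical, and the rule that a site counts once per operator no matter how many of that operator's tokens it blocks is an assumption rather than a documented detail of this report.

```python
from collections import Counter

# Assumed token-to-operator mapping; a partial, illustrative list only.
TOKEN_TO_OPERATOR = {
    "gptbot": "OpenAI",
    "chatgpt-user": "OpenAI",
    "claudebot": "Anthropic",
    "ccbot": "Common Crawl",
    "google-extended": "Google",
}

def operator_counts(blocked_by_site):
    """Count, per operator, the sites that block at least one of its tokens."""
    counts = Counter()
    for tokens in blocked_by_site.values():
        operators = {TOKEN_TO_OPERATOR[t] for t in tokens if t in TOKEN_TO_OPERATOR}
        counts.update(operators)  # a set, so each site adds at most 1 per operator
    return counts

# Hypothetical sites, not drawn from the snapshot.
example = {
    "site-a.example": {"gptbot", "chatgpt-user", "ccbot"},
    "site-b.example": {"claudebot", "ccbot"},
    "site-c.example": set(),
}
counts = operator_counts(example)
print(counts.most_common())
```

Counting sites rather than tokens keeps an operator with several published bot names from being over-represented.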

For an operator like Anthropic, Insurance is a largely uncontested category for training-data access. The block at prudential.com is the only explicit robots.txt barrier in the category, and the per-site table records only that it blocks at least one AI crawler, not which ones.

How the Snapshot Was Sealed

This report is based on a point-in-time crawl of public robots.txt files conducted on June 14, 2026 and sealed as sha 27ca61d890a647db. The process is designed for reproducibility and honesty: nothing is estimated, modeled, or extrapolated.

  1. Collect. Each of the 418 sites in the corpus was fetched at its canonical robots.txt path. Only public, unauthenticated responses were captured.

  2. Parse. The fetched file was parsed for User-agent directives matching known AI crawler names. A site is marked as "blocking" only when its robots.txt contains a Disallow: / or equivalent broad disallow for at least one recognized AI bot token.

  3. Seal. The full result set was hashed and content-addressed, producing the sha 27ca61d890a647db that anchors every figure in this report. No figure has been updated or modified since sealing.
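The parse and seal steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for this report. The token list is an assumed subset, the function names are hypothetical, only a literal `Disallow: /` in a group that names an AI token is treated as a block (wildcard-only groups are ignored), and the choice of SHA-256 over canonical JSON is an assumption about how a result set might be hashed.

```python
import hashlib
import json

# Assumed subset of AI crawler tokens, lower-cased for matching.
AI_TOKENS = {"gptbot", "claudebot", "ccbot", "google-extended"}

def blocked_tokens(robots_txt: str) -> set[str]:
    """Return AI tokens named in a group that contains 'Disallow: /'."""
    blocked = set()
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(a for a in agents if a in AI_TOKENS)
    return blocked

def seal(results: dict) -> str:
    """Hash a result set deterministically (sorted keys, compact JSON)."""
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical inputs, not drawn from the snapshot.
sample = {
    "example.com": "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n",
    "example.org": "User-agent: *\nDisallow: /admin/\n",
}
results = {site: sorted(blocked_tokens(text)) for site, text in sample.items()}
print(results)
print(seal(results))
```

Serializing with sorted keys before hashing means the same result set always yields the same digest, which is what makes a later re-crawl comparable against a sealed baseline.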

A site that returned no parseable robots.txt, like allstate.com, is counted in the sites total but excluded from the withRobots denominator (the count of sites with a parseable robots.txt). Block rates are computed against the withRobots count, not the raw site count, so they describe what sites have actually published as access policy.
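The denominator rule can be checked with a few lines of arithmetic. The sketch below uses the counts reported in this edition. The `CategoryCount` class and its field names are illustrative, not part of any published schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CategoryCount:
    sites: int        # all sites checked
    with_robots: int  # sites that returned a parseable robots.txt
    blockers: int     # sites blocking at least one AI crawler

    def block_rate(self) -> Optional[float]:
        """Percent of sites with a robots.txt that block any AI crawler."""
        if self.with_robots == 0:
            return None  # no published policy to measure
        return round(100 * self.blockers / self.with_robots, 1)

insurance = CategoryCount(sites=10, with_robots=9, blockers=1)
corpus = CategoryCount(sites=418, with_robots=354, blockers=139)

print(insurance.block_rate())  # 11.1
print(corpus.block_rate())     # 39.3
```

Dividing by the raw site count instead would give Insurance 1 of 10, or 10.0%, which would understate the share of published policies that include a block.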

Frequently Asked Questions

Q: Why does prudential.com block AI crawlers while the other major carriers do not?

A: The sealed snapshot records the fact of the block, not the motivation. Prudential may have added a blanket AI-crawler disallow as a precautionary measure or in response to specific policy guidance — we cannot know from the robots.txt file alone. What the data shows is that eight of the nine carriers with parseable robots.txt files have not made the same choice, as of June 14, 2026.

Q: Does allstate.com have an AI-access policy?

A: In this snapshot, allstate.com returned no parseable robots.txt. By convention, the absence of a robots.txt means crawlers may proceed — but it also means the site has not published a formal directive through this mechanism. Whether Allstate has a separate contractual or technical policy on AI access is outside the scope of what a robots.txt snapshot can capture.

Q: Does being blocked in robots.txt actually stop AI crawlers from indexing a site?

A: robots.txt is an honor-system standard. Crawlers that respect the protocol — which includes most major AI systems that publish their bot tokens — will observe the disallow. Crawlers that do not follow the protocol can proceed regardless. A robots.txt block is an instruction, not a technical enforcement mechanism like a firewall.

Q: How does the Insurance block rate compare to the corpus average?

A: Insurance posts an 11.1% block rate. Across all 354 sites with a parseable robots.txt in this edition, 139 block at least one AI crawler — a 39.3% corpus-wide average. Insurance is substantially below that line. Finance, a closely related category, posts an 18.2% block rate with 2 of 11 sites blocking.

Q: Is a low block rate in Insurance likely to stay this way?

A: This snapshot is cross-sectional — it captures policy at one moment in time. Whether block rates rise or fall is a question this single edition cannot answer. The value of a sealed snapshot is precisely that it creates a fixed reference point: you can re-run the same crawl at a later date and compare like-for-like.

Put AI-Access Data to Work

Three types of professionals derive concrete recurring value from tracking Insurance AI-access policy over time.

AI/LLM product teams and data sourcing leads who use Insurance content for training or retrieval-augmented generation workflows need to know which sites are open and which have erected barriers. Right now, Insurance is nearly all open — but prudential.com's block is a real restriction for any team relying on that domain's content.

A recurring weekly or monthly re-crawl of these nine sites, set to alert on any new Disallow directive, gives a product team advance notice before a training-data source goes dark. US Tech Automations automates exactly this monitoring: scheduled robots.txt re-crawls, policy-change diffing, and routed alerts to the team or pipeline that owns the affected data source.
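The change-detection part of such a workflow can be as simple as a set difference between a sealed baseline and a fresh crawl. The sketch below assumes each crawl has already been reduced to a mapping from site to blocked AI tokens. The function name, site names, and alert format are all hypothetical.

```python
def new_blocks(baseline, current):
    """Return {site: tokens} for AI tokens blocked now but not in the baseline."""
    changes = {}
    for site, tokens in current.items():
        added = set(tokens) - set(baseline.get(site, ()))
        if added:
            changes[site] = added
    return changes

# Hypothetical crawl results, not drawn from the snapshot.
baseline = {"carrier-a.example": set(), "carrier-b.example": {"gptbot"}}
current = {"carrier-a.example": {"gptbot", "ccbot"}, "carrier-b.example": {"gptbot"}}

for site, tokens in sorted(new_blocks(baseline, current).items()):
    print(f"ALERT {site}: newly blocks {', '.join(sorted(tokens))}")
```

A site missing from the baseline, such as one that previously returned no robots.txt, is treated as having had no blocks, so its first Disallow still raises an alert.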

SEO and digital marketing teams at Insurance carriers increasingly track how competitor sites treat AI crawlers. If a competitor adds a Disallow for GPTBot or Google-Extended, that is a visible signal about competitive strategy, and it may leave more visibility in AI-generated answers to the carriers that stay open. A quarterly policy-audit workflow, comparing current robots.txt states against this sealed baseline, keeps competitive-intelligence teams current without manual checks.

Compliance and AI-governance counsel advising Insurance companies need a defensible record of what access policy existed at a specific date. The sealed snapshot format — with sha 27ca61d890a647db anchoring the June 14, 2026 state — is exactly the kind of verifiable artifact that supports legal or regulatory review of AI training-data provenance.

For any of these workflows, US Tech Automations builds the scheduling, alerting, and change-detection pipelines that turn a one-time snapshot into a live monitoring system.

You can also compare Insurance against adjacent categories like Beauty and Outdoors to see how different consumer verticals approach AI-access policy.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 27ca61d890a647db).


Cite this report

US Tech Automations Research, 2026-06 edition. “Do Insurance Sites Block AI Crawlers? 1 of 9 Do.” https://ustechautomations.com/resources/blog/do-insurance-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 27ca61d890a647db


About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.