Do Banking Sites Block AI Crawlers? None Do

Jun 14, 2026

The headline is stark: all 7 Banking sites we checked publish a parseable robots.txt policy, and every one of them allows all 9 tracked AI crawlers without restriction. 0 of 7 Banking sites block any AI crawler, a 0% block rate. This is the June 2026 Closing Web edition, a point-in-time sealed snapshot of public robots.txt files across 493 sites and 48 categories. The numbers here are not estimates or projections; they are verbatim counts from files read on June 14, 2026.

Banking sits at the very bottom of the 48-category block-rate ranking, sharing a 0% position with Telecom and Energy. Against a corpus-wide block rate of 36% across all 417 sites with parseable policies, the banking sector's unanimous openness is not just an absence of action — it is a meaningful signal about how this regulated, consumer-facing industry views AI access to its public web presence.

Key Takeaways

0 of 7 Banking sites block any AI crawler — a 0% block rate.
All 7 Banking sites returned a parseable robots.txt file.
Corpus-wide, 150 of 417 sites block at least one AI crawler — a 36% rate.
9 distinct AI crawlers were tracked across all 417 sites in this edition.
Banking is one of only 3 categories in the 48-category corpus with a 0% block rate.

Every Banking site we checked with a parseable robots.txt policy — citibank.com, capitalone.com, usbank.com, pnc.com, td.com, ally.com, and discover.com — allows all AI crawlers without restriction, as of the June 14, 2026 sealed snapshot.

Which Sites Are Open — and Why That Matters

The 7 Banking sites in this corpus are citibank.com, capitalone.com, usbank.com, pnc.com, td.com, ally.com, and discover.com. Not one of them disallows a single tracked AI crawler in its robots.txt file. A robots.txt file is a plain-text instruction set placed at the root of a website that tells crawlers (search engines, AI training bots, and indexing agents) which paths they are and are not permitted to access. It is an honor-system standard: compliant bots respect it; non-compliant bots do not.
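
As a minimal sketch of how a compliant crawler might consult such a file, the snippet below uses Python's standard `urllib.robotparser`. The robots.txt body and the `example.com` URLs are hypothetical and are not taken from any site in the corpus; the helper name `allowed` is also an illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (an assumption for illustration only).
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the policy lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# GPTBot has its own group that disallows everything.
print(allowed(SAMPLE_ROBOTS_TXT, "GPTBot", "https://example.com/rates"))
# ClaudeBot has no group of its own, so the wildcard group applies.
print(allowed(SAMPLE_ROBOTS_TXT, "ClaudeBot", "https://example.com/rates"))
print(allowed(SAMPLE_ROBOTS_TXT, "ClaudeBot", "https://example.com/private/x"))
```

A fully open policy, like the ones this report describes for Banking, would return True for every tracked crawler on every path.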

The fact that all 7 banking sites we checked publish permissive policies is notable for what it implies about their strategy. These are institutions operating under intense regulatory scrutiny, where disclosure norms and public transparency are deeply ingrained. Their public-facing marketing pages, rates, product descriptions, and support content are already indexed by traditional search engines — and there is no obvious benefit in selectively blocking AI agents from that same material.

There is also a practical dimension. Banks market aggressively through their public web presence. Blocking AI crawlers from that content would reduce the probability that AI-generated answers reference their products, rates, or services. In a competitive market where AI assistants increasingly surface financial recommendations, open access may be a deliberate distribution choice rather than an oversight.

Across all 417 sites in the corpus, 150 block at least one AI crawler — a 36% block rate. Banking sits far below that line, making it one of the most permissive categories in the entire corpus.

How Banking Sits in the Broader Corpus

The table below shows a focused window of categories centered on Banking's position at the lower end of the block-rate spectrum. The full corpus spans 48 categories; this window shows the categories that cluster nearest to Banking in the ranking, plus the extremes for context.

Focused Category Window — Low-Blocking Tier

Category	Sites Checked	With Parseable robots.txt	Blocking Any AI Crawler	Block Rate
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
Education	9	7	1	14.3%
Government	9	8	1	12.5%
Crypto	9	8	1	12.5%
Books	9	8	1	12.5%
Religion	10	9	1	11.1%
Insurance	10	9	1	11.1%
Productivity	10	10	1	10.0%
Marketing	10	10	1	10.0%
Banking	7	7	0	0%
Telecom	10	6	0	0%
Energy	10	6	0	0%
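
The block rates in this table are consistent with dividing the number of blocking sites by the number of sites with a parseable robots.txt, not by the number of sites checked (Finance: 2 of 11 gives 18.2%, whereas 2 of 12 would give 16.7%). The sketch below reproduces a few rows under that assumption; the function name `block_rate` is illustrative.

```python
# Rows copied from the table above: (category, with_robots, blocking_any_ai).
ROWS = [
    ("Finance", 11, 2),
    ("Insurance", 9, 1),
    ("Banking", 7, 0),
    ("Telecom", 6, 0),
]


def block_rate(with_robots: int, blocking: int) -> float:
    """Percent of sites with a parseable robots.txt that block any AI crawler."""
    if with_robots <= 0:
        raise ValueError("need at least one site with a parseable robots.txt")
    return round(100 * blocking / with_robots, 1)


for category, with_robots, blocking in ROWS:
    print(f"{category}: {block_rate(with_robots, blocking)}%")

# The corpus-wide figure uses the same formula: 150 blocking of 417 parseable.
print(f"Corpus: {block_rate(417, 150)}%")
```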

Extremes Mini-Table

Category	Block Rate
Gaming (highest)	88.9%
News (2nd highest)	82.4%
Banking (floor)	0%
Telecom (floor)	0%
Energy (floor)	0%

Notice that even Finance — the broader category that includes investment platforms and fintech services — sits at 18.2%. Banking specifically, with its pure retail and commercial bank representation, is lower still. The neighboring Insurance category reaches 11.1% even with its similarly regulated character. The permissiveness of banking institutions appears driven by their particular brand of public-web-forward marketing rather than simply by regulatory status.

Which AI Bots Are Blocked Most — Across All 417 Sites

This table reflects the corpus-wide bot leaderboard across all 417 sites with parseable robots.txt policies. No Banking site contributes to any of these counts.

AI Crawler	Sites Blocking It (all 417)	Block Rate
CCBot	118	28.3%
ClaudeBot	104	24.9%
GPTBot	93	22.3%
Bytespider	90	21.6%
Meta-ExternalAgent	84	20.1%
Applebot-Extended	81	19.4%
Google-Extended	81	19.4%
PerplexityBot	75	18.0%
Amazonbot	70	16.8%

CCBot — the crawler behind Common Crawl, which feeds many AI training pipelines — is blocked by 118 of 417 sites, the highest count among the 9 bots tracked. ClaudeBot and GPTBot follow closely. These numbers describe the broader corpus; in Banking, none of these bots encounter any disallow instruction at all.

Why Regulated Verticals Often Leave Robots.txt Open

The pattern across heavily regulated consumer-finance categories (Banking, Insurance, and Finance) is one of relative permissiveness compared to media, gaming, and technology verticals. Several structural factors plausibly explain this.

Regulated financial institutions invest substantially in public disclosures. Their websites serve as authoritative, compliance-reviewed channels for rate information, terms, fee schedules, and product details. Restricting AI access to this content would limit its distribution without any corresponding compliance benefit, since the content is already public. These sites have every reason to want their official information to appear accurately in AI-generated summaries.

Unlike news publishers or entertainment platforms, banking sites do not depend on excluding scrapers to protect a content moat. Their proprietary advantage is their regulated charter, their customer relationships, and their balance sheet — not the informational text on their marketing pages. Open access to that marketing content serves their distribution goals.

You can compare this pattern to what we found in other verticals: the cybersecurity category sits at 11.1%, and agriculture at 33.3%. Both sit well above Banking's floor position.

What a Future Block Would Signal

The current 0% block rate is a snapshot, not a permanent condition. If any of the 7 banking sites in this corpus were to add AI crawler disallows to its robots.txt file in a future edition, that would be a meaningful event worth monitoring. A single site shifting from open to blocked (for example, by adding a Disallow rule for GPTBot or ClaudeBot) would lift Banking from its current clean-zero position to a 14.3% block rate (1 of 7), and that site would become the first Banking site in this corpus to restrict any AI crawler.

More importantly, a policy shift at a major bank may signal industry-wide deliberation about AI access. Compliance and legal teams at these institutions watch each other's posture carefully. A first mover that adds AI disallows might trigger a cluster of similar decisions across the sector. The reverse is also true: as long as none of these 7 sites has taken that step, the institutional consensus appears to remain in favor of open access.

For anyone monitoring competitive intelligence in the financial sector, tracking robots.txt policy drift is now a meaningful signal layer. The other two clean-zero categories, Telecom and Energy, may reflect a similar B2B and regulatory dynamic in those verticals.

Methodology — How the Snapshot Was Sealed

This report draws from the June 2026 Closing Web edition, a point-in-time sealed snapshot (sha c5960481aa465ad3) of public robots.txt files. The corpus covers 493 sites across 48 categories; 417 returned a parseable robots.txt. A site is counted as "blocking" if its robots.txt contains a Disallow directive targeting at least one of the 9 AI crawlers tracked in this edition.

The sealed-data discipline works as follows:

Collect. We fetch the public robots.txt file at the canonical root of each site. No authentication, no crawl of protected paths — only the publicly readable policy file.
Parse and match. Each file is parsed for User-agent directives matching the 9 tracked AI crawlers. A site is marked as blocking if at least one directive disallows any path for at least one tracked bot.
Seal. The full response set is content-hashed and sealed. The sha c5960481aa465ad3 uniquely identifies this exact dataset. The seal prevents retroactive modification.
Aggregate. Per-category and corpus-wide counts are computed directly from the sealed set. Nothing is estimated, modeled, or extrapolated.
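
The parse, seal, and aggregate steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline behind this report: it assumes that only groups naming a tracked bot explicitly count (a wildcard `User-agent: *` group does not), that an empty `Disallow:` is not a block, and that the seal is a SHA-256 over site names and response bodies in sorted order. How the published identifier c5960481aa465ad3 was derived is not described in this report, so the digest here is not meant to reproduce it. The function names and the `.example` domains are hypothetical.

```python
import hashlib

# The 9 tracked crawlers from the leaderboard table, lowercased for matching.
TRACKED_BOTS = {
    "ccbot", "claudebot", "gptbot", "bytespider", "meta-externalagent",
    "applebot-extended", "google-extended", "perplexitybot", "amazonbot",
}


def blocked_bots(robots_txt: str) -> set[str]:
    """Tracked bots (lowercased) with at least one non-empty Disallow rule."""
    blocked: set[str] = set()
    group_agents: list[str] = []  # agents named by the current group
    in_rules = False              # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:
                blocked.update(a for a in group_agents if a in TRACKED_BOTS)
    return blocked


def seal(responses: dict[str, str]) -> str:
    """Content hash over (site, body) pairs in sorted order (assumed scheme)."""
    digest = hashlib.sha256()
    for site in sorted(responses):
        digest.update(site.encode("utf-8") + b"\0")
        digest.update(responses[site].encode("utf-8") + b"\0")
    return digest.hexdigest()


def aggregate(responses: dict[str, str]) -> tuple[int, int]:
    """Return (sites blocking at least one tracked bot, total sites)."""
    blocking = sum(1 for body in responses.values() if blocked_bots(body))
    return blocking, len(responses)


# Hypothetical responses, not taken from the corpus.
responses = {
    "open.example": "User-agent: *\nDisallow: /admin/\n",
    "closed.example": "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n",
}
blocking, total = aggregate(responses)
print(f"{blocking} of {total} sites block at least one tracked bot")
print("seal:", seal(responses))
```

Hashing the responses in sorted order makes the digest independent of fetch order, so any later change to a stored body produces a different seal.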

Site names in this report are drawn exclusively from the fact sheet's allowerSites, blockerSites, and noRobotsSites arrays. No site name appears in this report that is not present in those sealed lists.

Frequently Asked Questions

Q: Does a 0% block rate mean banking sites welcome AI training on their content?

A: Not necessarily. robots.txt is an honor-system standard — compliant bots respect it, and non-compliant ones do not. A 0% block rate means no Banking site in this corpus has published a robots.txt disallow for any tracked AI crawler. It says nothing about whether the institutions have taken other legal or technical steps to limit AI use of their content. It is a statement about their publicly observable policy posture, not their legal strategy.

Q: Why would a major bank leave its robots.txt open to AI crawlers?

A: Banks publish marketing content, rate information, and product descriptions specifically to reach as many channels as possible. AI-generated answers increasingly surface financial information to consumers. Blocking AI crawlers from this marketing-layer content could reduce a bank's presence in those answers. The public web presence of a bank is a distribution asset, not a content moat.

Q: Could the 0% rate change in the next edition?

A: Yes. This is a point-in-time snapshot from June 14, 2026. Any of the 7 banking sites could update their robots.txt file tomorrow, and the next edition of this research would reflect that. The value of this sealed snapshot is precisely that it provides an anchor: a verifiable baseline against which future drift can be measured.

Q: How does Banking compare to the broader corpus?

A: The corpus-wide block rate across all 417 sites with parseable robots.txt files is 36%. Banking at 0% is substantially below that, placing it at the permissive floor alongside Telecom and Energy. Even adjacent regulated categories like Insurance (11.1%) and Finance (18.2%) show higher block rates than Banking.

Q: Are there bots that Banking sites are more likely to block in future editions?

A: Based on corpus-wide patterns, CCBot (118 sites blocking it), ClaudeBot (104 sites), and GPTBot (93 sites) are the bots most likely to appear in a disallow directive when a site decides to add AI access controls. If Banking sites shift in a future edition, these would be the most likely candidates to appear in new directives — though this report can only confirm the current sealed state.

Put AI-Access Data to Work

The 0% Banking block rate is a baseline, and baselines are only valuable when you monitor them for drift. Three audiences have immediate operational use for this data.

Competitive intelligence analysts at financial services firms can use this snapshot to confirm the current industry posture and set a re-check cadence. The concrete workflow: flag any Banking site in this corpus, such as capitalone.com or ally.com, for a monthly robots.txt re-fetch. The trigger is any new Disallow directive targeting CCBot, ClaudeBot, GPTBot, or any other bot in the tracked set. A policy change at one institution may precede industry-wide movement.

AI product and strategy teams building finance-adjacent tools need to know which sites their retrieval pipelines can access while honoring each site's robots.txt. Because robots.txt is an honor-system standard, an open policy describes a published preference, not a legal permission. The current 0% rate means Banking sites present no robots.txt barrier to the 9 tracked crawlers today. A quarterly re-check workflow, pulling the robots.txt for each of the 7 sites and comparing against this sealed baseline, detects the moment that assumption changes.

MarTech and content syndication professionals who seed AI answers with financial content can use this data to prioritize which institutional domains are currently accessible to AI indexers. Monitoring whether discover.com or usbank.com shifts to blocking helps calibrate content strategy in real time.
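
The re-check workflows described above reduce to comparing a fresh reading against the sealed baseline. The sketch below assumes each reading has already been reduced to a mapping from site to the set of tracked bots it blocks; the function name `detect_drift` and the `.example` domains are hypothetical, and fetching and scheduling are left out.

```python
def detect_drift(
    baseline: dict[str, set[str]],
    current: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each site to the bots it blocks now but did not block at baseline."""
    drift: dict[str, set[str]] = {}
    for site, now_blocked in current.items():
        newly_blocked = now_blocked - baseline.get(site, set())
        if newly_blocked:
            drift[site] = newly_blocked
    return drift


# Hypothetical baseline in which no site blocks any tracked bot.
baseline = {"bank-a.example": set(), "bank-b.example": set()}

# Hypothetical later re-fetch in which one site adds a GPTBot disallow.
current = {"bank-a.example": set(), "bank-b.example": {"GPTBot"}}

for site, bots in detect_drift(baseline, current).items():
    print(f"ALERT {site}: newly blocks {', '.join(sorted(bots))}")
```

Reporting only newly blocked bots keeps alerts focused on the open-to-blocked transition this report treats as the signal; a site that removes a block would need a separate check in the other direction.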

US Tech Automations automates this monitoring through scheduled robots.txt crawls, change-alert pipelines, and a unified AI-access policy dashboard — so you know the moment any site in a tracked category moves from open to blocked. Explore the agentic workflows platform to set up policy-drift alerts across any category.

Zoom out: Banking is just one vertical in a much larger picture — our cross-industry study measures how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha c5960481aa465ad3).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Banking Sites Block AI Crawlers? None Do.” https://ustechautomations.com/resources/blog/do-banking-sites-block-ai-crawlers-2026

Sealed snapshot sha256: c5960481aa465ad3

Machine-readable data: CSV · JSON · All research & methodology