
Do Accounting Sites Block AI Crawlers? 4 of 8 Do

Jun 14, 2026

The Accounting category blocks AI crawlers at roughly one and a half times the corpus average. Of the 10 Accounting sites we checked, 8 returned a parseable robots.txt, and 4 of those 8 block at least one AI crawler, a 50% block rate. The corpus-wide rate across all 479 sites is 33.4%. Accounting sits 16.6 percentage points above that line, and the composition of the 4 blockers tells the story: a major professional standards body, two trade publications, and a cloud accounting platform have all chosen to restrict AI access to their content.

A robots.txt file is the plain-text document operators publish to direct automated crawlers. It signals access policy for search engines, AI training bots, retrieval-augmented generation spiders, and web archivers. The standard is voluntary and honor-based: crawler operators are expected to respect it, but no mechanism prevents non-compliance.
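As a minimal sketch of how a compliant crawler might read that signal, the snippet below parses a made-up robots.txt with Python's standard-library urllib.robotparser. The sample file, the example.com URL, and the helper name is_allowed are illustrative assumptions and are not taken from any site in the corpus.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler, allows everyone else.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_text, agent, url="https://example.com/"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS, agent))
```

The check is advisory: it tells a crawler what the operator has asked for, and nothing in the file itself enforces the answer.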

4 of 8 Accounting sites block at least one AI crawler.

Accounting sites post a 50% AI-crawler block rate.

Corpus-wide, 160 of 479 sites block at least one AI crawler.

Key Takeaways

  • Of 10 Accounting sites checked, 8 returned a parseable robots.txt; 4 of those 8 block at least one AI crawler.

  • The Accounting block rate of 50% is substantially above the corpus-wide average of 33.4%.

  • aicpa-cima.com, journalofaccountancy.com, accountingtoday.com, and freshbooks.com each block at least one AI crawler.

  • The 4 blockers include a major professional standards body, two trade publications, and a cloud software platform.

  • intuit.com, taxfoundation.org, xero.com, and cpapracticeadvisor.com allow all tracked AI crawlers.

  • hrblock.com and waveapps.com returned no parseable robots.txt in this snapshot.

  • Across all 479 corpus sites, 102 publish an llms.txt file (21.3%) — a separate AI-access signaling standard.

Who Gates the Crawlers in Accounting

The 4 blocking sites are aicpa-cima.com, journalofaccountancy.com, accountingtoday.com, and freshbooks.com. The first three form a clear cluster: AICPA-CIMA is the world's largest accounting body, the Journal of Accountancy is its flagship publication, and Accounting Today is an independent trade outlet. All three carry authoritative professional content — CPE guidance, standards interpretations, regulatory updates, and practitioner-directed editorial — that has real commercial value when licensed and real competitive risk if scraped without restriction.

FreshBooks is the outlier. It is a cloud accounting software platform, not a publisher — and yet it blocks at least one AI crawler in this snapshot. This distinguishes it from its software peers in the allowerSites list and raises a question about whether the block is strategic (protecting customer-facing content, pricing pages, or help documentation from AI summarization) or legacy (a robots.txt rule inherited from an earlier crawl-blocking posture).

"4 of 8 Accounting sites with a parseable robots.txt block at least one AI crawler — a 50% rate that sits substantially above the corpus-wide 33.4%."

The 4 sites that allow all crawlers are intuit.com, taxfoundation.org, xero.com, and cpapracticeadvisor.com. Intuit and Xero are the two largest cloud accounting software platforms globally; both are permissive in this snapshot. The Tax Foundation is a nonprofit policy research organization that publishes openly to maximize policy influence. CPA Practice Advisor is a trade outlet that has chosen discoverability over restriction.

4 of 8 Accounting sites block at least one AI crawler — aicpa-cima.com, journalofaccountancy.com, accountingtoday.com, and freshbooks.com.

The two sites without a parseable robots.txt — hrblock.com and waveapps.com — returned no file in this snapshot. H&R Block is a large consumer tax-preparation brand, and Wave is a small-business accounting platform. Neither can be attributed a blocking or allowing stance from this data.

intuit.com, taxfoundation.org, xero.com, and cpapracticeadvisor.com allow all tracked AI crawlers in this snapshot.

What This Block Rate Actually Means

The 50% Accounting block rate is the most distinctive finding in this report. It ties Science and Wedding at 50%, just below the Reference category at 54.5%, and places Accounting well above the corpus average. This is not the pattern of a content-marketing-first sector. The Accounting category, as captured in this 10-site sample, is split between a powerful protective bloc — the professional standards establishment and trade press — and an equally significant permissive bloc — the cloud software platforms competing for practitioner discoverability.

The AICPA-CIMA block is particularly notable. Professional accounting standards bodies invest heavily in producing authoritative guidance documents, continuing education materials, and technical interpretations. Those assets are the kind of structured professional knowledge that AI training pipelines find valuable. The decision to restrict AI crawlers from that content is consistent with a licensing posture: the AICPA licenses content to CPE providers and educational institutions, and unrestricted AI training access would undercut that model.

The 50% Accounting block rate ties Science and Wedding and sits above the corpus average of 33.4%.

For a comparison category where the same editorial-versus-platform tension has produced far less blocking, HR sits at 22.2%. The rate is much lower because its software platforms (Workday, BambooHR, ADP) are open and only two trade outlets restrict. Accounting has a professional standards body and the trade press lining up alongside a software platform blocker, which pushes the rate to 50%.

Accounting Among the Corpus — Focused Neighbor Window

The focused window below shows Accounting alongside the categories closest to it in the block-rate ranking. Accounting shares the 50% tier with Science and Wedding, drawn from the sealed allCategoriesRanked data.

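| Category | Sites Checked | Sites with robots.txt | Sites Blocking Any AI Crawler | Block Rate |
|---|---|---|---|---|
| Reference | 14 | 11 | 6 | 54.5% |
| Science | 10 | 10 | 5 | 50% |
| Wedding | 10 | 8 | 4 | 50% |
| Accounting | 10 | 8 | 4 | 50% |
| Automotive | 10 | 9 | 4 | 44.4% |
| HomeGarden | 10 | 9 | 4 | 44.4% |
| Fashion | 9 | 7 | 3 | 42.9% |
| Social | 10 | 10 | 4 | 40% |
| Sports | 10 | 10 | 4 | 40% |
| Fitness | 10 | 10 | 4 | 40% |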
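The block rates in the table can be rechecked from its own counts. The sketch below assumes the denominator is the number of sites with a parseable robots.txt rather than the number of sites checked, which is how the Accounting figure (4 of 8) is stated in this report. The names ROWS and block_rate are illustrative.

```python
# Rows copied from the neighbor-window table:
# (category, sites checked, sites with robots.txt, sites blocking any AI crawler)
ROWS = [
    ("Reference", 14, 11, 6),
    ("Science", 10, 10, 5),
    ("Wedding", 10, 8, 4),
    ("Accounting", 10, 8, 4),
    ("Automotive", 10, 9, 4),
    ("HomeGarden", 10, 9, 4),
    ("Fashion", 9, 7, 3),
    ("Social", 10, 10, 4),
    ("Sports", 10, 10, 4),
    ("Fitness", 10, 10, 4),
]


def block_rate(with_robots, blocking):
    """Percent of sites with a parseable robots.txt that block, to one decimal."""
    return round(100 * blocking / with_robots, 1)


# Sites without a parseable robots.txt are left out of the denominator.
rates = {name: block_rate(with_robots, blocking)
         for name, _checked, with_robots, blocking in ROWS}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate}%")
```

Using sites checked as the denominator instead would put Accounting at 40%, so the choice of denominator matters when comparing categories with different numbers of missing files.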

Accounting sits in the upper-middle band of the corpus — above the average but well below the high-blocking categories. The extremes offer context:

| Category | Block Rate |
|---|---|
| Gaming | 88.9% |
| News | 82.4% |
| Food | 70% |
| Manufacturing | 0% |
| Construction | 0% |
| Toys | 0% |

Accounting's 50% reflects the professional-content tension: not as extreme as News or Gaming, where nearly all content is proprietarily controlled, but meaningfully above the all-open industrial and toy sectors.

"Accounting's 50% block rate places it in the upper tier of the corpus — driven by a combination of professional standards bodies, trade media, and one software platform blocking AI crawlers."

Operator Landscape Across All 479 Sites

Since 4 Accounting sites block and 4 allow, the per-category bot targeting is worth watching — but because the sample is small, the corpus-wide leaderboard provides the most useful context for which bots face the most systematic resistance.

| Operator | Sites Blocking (all 479 corpus sites) | Bot Token |
|---|---|---|
| Common Crawl | 124 | CCBot |
| Anthropic | 117 | ClaudeBot |
| OpenAI | 101 | GPTBot |
| Meta | 100 | Meta-ExternalAgent |
| ByteDance | 96 | Bytespider |
| Google | 83 | Google-Extended |
| Apple | 83 | Applebot-Extended |
| Perplexity | 76 | PerplexityBot |
| Amazon | 73 | Amazonbot |
| Cohere | 73 | |
| Diffbot | 70 | |
| Mistral | 24 | |

Common Crawl leads at 124 sites across the full 479-site corpus. Anthropic's ClaudeBot follows closely at 117 — an unusually high share that reflects the wave of Anthropic-specific disallow rules added after 2023. OpenAI's GPTBot is next at 101.

Anthropic's ClaudeBot is blocked by 117 of 479 corpus sites — second only to Common Crawl in the June 2026 snapshot.

For Accounting sites specifically, a professional body like AICPA-CIMA is likely to target the highest-profile AI training crawlers first — the same bots that dominate the corpus-wide leaderboard. Any change in that behavior would be detectable through weekly robots.txt monitoring.

How the Snapshot Was Sealed

The Closing Web methodology is consistent across all 56 categories: US Tech Automations fetched robots.txt files for all 572 corpus sites in a single crawl, stored each file verbatim, and sealed the collection under sha256 hash 4e7c4a4a3c720f06 on June 14, 2026. A site was classified as blocking if any User-agent directive in its robots.txt matched one of the 9 defined AI crawler tokens and the accompanying Disallow rule covered the site root or full crawl scope.

Sites that returned no parseable file (hrblock.com and waveapps.com for Accounting) were listed in noRobotsSites and excluded from blocking calculations. Nothing is estimated, modeled, or extrapolated: all figures are verbatim reads from the sealed file set.
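The classification rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the parser used to produce the snapshot: it uses the 9 tokens named in the methodology, treats only "Disallow: /" and "Disallow: /*" as root-level, and does not count a wildcard "User-agent: *" group as matching a tracked token. The function names are hypothetical.

```python
# The 9 tracked tokens from the methodology, lowercased for comparison.
AI_TOKENS = {
    "ccbot", "claudebot", "gptbot", "bytespider", "meta-externalagent",
    "applebot-extended", "google-extended", "perplexitybot", "amazonbot",
}


def blocked_tokens(robots_text):
    """Return tracked tokens whose User-agent group has a root-level Disallow."""
    blocked = set()
    agents = []        # user-agents named by the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # Assumption: only these two patterns count as covering the root.
            if field == "disallow" and value in ("/", "/*"):
                blocked.update(a for a in agents if a in AI_TOKENS)
    return blocked


def is_blocking(robots_text):
    """A site counts as blocking if at least one tracked token is blocked."""
    return bool(blocked_tokens(robots_text))
```

Under these assumptions, a file that only restricts "/private/" for every agent is classified as allowing, while a file with a GPTBot group and "Disallow: /" is classified as blocking.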

  1. Collect. Fetch the robots.txt file from the domain root for each of the 572 corpus sites; store raw bytes verbatim without alteration.

  2. Parse. Apply the 9-token AI crawler recognition list (CCBot, ClaudeBot, GPTBot, Bytespider, Meta-ExternalAgent, Applebot-Extended, Google-Extended, PerplexityBot, Amazonbot); flag root-level Disallow directives as blocks.

  3. Seal and publish. Compute sha256 across the collected file set; record the hash and site-level results in the snapshot manifest to enable independent verification.
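The sealing step can be illustrated with Python's standard-library hashlib. The report does not say how the files are ordered or concatenated before hashing, so the sorted-by-domain order, the null separator, and the length prefix below are assumptions, and the function name seal is hypothetical. The published identifier is also shorter than a full 64-character sha256 digest, so this sketch is not expected to reproduce it.

```python
import hashlib


def seal(files):
    """Hash a file set given as a dict of domain -> raw robots.txt bytes.

    Assumed scheme: visit domains in sorted order and feed the domain name,
    a null byte, the body length, and the body bytes into one sha256 digest.
    """
    digest = hashlib.sha256()
    for domain in sorted(files):
        body = files[domain]
        digest.update(domain.encode("utf-8") + b"\0")
        digest.update(len(body).to_bytes(8, "big"))  # keeps file boundaries unambiguous
        digest.update(body)
    return digest.hexdigest()
```

Sorting by domain makes the digest independent of crawl order, so anyone holding the same bytes for the same domains would get the same value, and a single changed byte in any file would change it.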

The llms.txt figure (102 of 479 sites, 21.3% corpus-wide) reflects a separate standard for AI-access signaling distinct from robots.txt and is tracked at the corpus level, not conflated with the category block rate.

Frequently Asked Questions

Q: Why does FreshBooks block AI crawlers when Intuit and Xero do not?

A: This snapshot records only the robots.txt signal, not the reasoning behind it. What we can observe is that FreshBooks, unlike Intuit and Xero, has published a directive blocking at least one of the 9 tracked AI crawlers. The most common explanations for a software platform taking this stance are protecting customer-facing documentation from AI summarization that could reduce support ticket deflection value, or a legacy disallow rule that was never revisited. Intuit and Xero appear to have chosen a discoverability-first posture for their public content.

Q: What does a 50% block rate mean for accountants who use AI tools?

A: It means that, at the time of this snapshot, half the Accounting sites in our corpus with published policies are restricting at least one AI training or retrieval crawler. Content from aicpa-cima.com, journalofaccountancy.com, accountingtoday.com, and freshbooks.com may be less likely to appear in AI-generated summaries or training sets that rely on robots.txt-compliant crawlers. Practitioners using AI tools for accounting guidance should be aware that key authoritative sources may be under-represented in AI knowledge bases.

Q: Does blocking a crawler in robots.txt actually stop it?

A: Not technically. robots.txt is an honor-system standard; a sophisticated or non-compliant crawler can ignore it. However, the major AI training operators — Anthropic, OpenAI, Google, Meta, and others — have stated they respect robots.txt disallow directives for their training crawlers. Retrieval-augmented-generation crawlers may or may not respect the standard depending on their operator policies. A robots.txt block is a clear signal, not a hard technical barrier.

Q: How does the Accounting category's 50% rate compare with a neighboring professional-services category?

A: HR sites block at 22.2% in this snapshot — less than half the Accounting rate. The difference lies in composition: HR's blockers are two trade editorial outlets, while Accounting's include a major professional standards body (AICPA-CIMA), two trade publications, and a software platform. The professional authority of AICPA-CIMA and the institutional standing of the Journal of Accountancy give the Accounting blocking bloc more weight than a comparable set of HR publications.

Q: Are hrblock.com and waveapps.com blocking AI crawlers?

A: We do not know from this data. Both returned no parseable robots.txt file in the snapshot and are listed in noRobotsSites. Their access posture cannot be determined from this signal. H&R Block and Wave may have restrictions in their terms of service or other mechanisms that are not captured in a robots.txt scan.

Q: Why do the 4 permissive Accounting sites not block, even though they publish professional content?

A: Intuit and Xero are software companies whose public-facing content — help articles, feature documentation, marketing pages — benefits from AI discoverability. The Tax Foundation publishes policy research explicitly to maximize reach and influence; open access aligns with its mission. CPA Practice Advisor appears to have made an editorial choice to prioritize discoverability over restriction. None of these sites treat their public web presence as a licensed-content asset in the same way that AICPA-CIMA or the Journal of Accountancy does.

Put AI-Access Data to Work

A tax-data product owner at an accounting software company — someone responsible for maintaining integrations with authoritative tax guidance sources for an AI-powered accounting assistant — faces a direct operational risk from the 4-blocker picture. The AICPA-CIMA and Journal of Accountancy blocks mean that content from the most authoritative U.S. accounting standards sources is restricted from training-data and retrieval crawlers that respect robots.txt.

The actionable workflow: set a weekly automated re-crawl of all 10 Accounting sites in this corpus, with an immediate alert if any currently permissive site (intuit.com, taxfoundation.org, xero.com, cpapracticeadvisor.com) adds a Disallow directive for a tracked AI crawler. The trigger is a file change; the cadence is weekly. A shift by intuit.com would be a significant signal affecting AI tools that rely on Intuit content for accounting guidance. Monitoring the 4 blockers for any increase in scope — additional bots being targeted — is equally valuable.
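A minimal sketch of the change check in that workflow is shown below. It assumes each weekly crawl has already been reduced to a mapping from domain to the set of tracked crawler tokens that site blocks. The function name access_alerts, the two alert labels, and the token sets in the example are hypothetical, since this report does not list which bots each site blocks.

```python
def access_alerts(previous, current):
    """Compare two snapshots mapping domain -> set of blocked crawler tokens.

    Returns (domain, kind, newly_blocked_tokens) tuples, sorted by domain.
    """
    alerts = []
    for domain in sorted(current):
        before = previous.get(domain, set())
        added = current[domain] - before
        if not added:
            continue
        # A previously permissive site that starts blocking is a new blocker;
        # an existing blocker that targets more bots has widened its scope.
        kind = "new blocker" if not before else "wider scope"
        alerts.append((domain, kind, sorted(added)))
    return alerts


if __name__ == "__main__":
    # Hypothetical token sets, for illustration only.
    last_week = {"intuit.com": set(), "freshbooks.com": {"GPTBot"}}
    this_week = {"intuit.com": {"GPTBot"}, "freshbooks.com": {"GPTBot", "ClaudeBot"}}
    for alert in access_alerts(last_week, this_week):
        print(alert)
```

The same comparison covers both cases in the workflow: a permissive site adding its first Disallow, and a current blocker extending its rules to additional bots.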

An AI product lead at a legal or professional-services intelligence platform building tools for accounting professionals needs to know which sources their retrieval system cannot reach via robots.txt-compliant crawling. The 4 blocking sites represent the most authoritative accounting content in the corpus. Without licensed access, those sources are off-limits for training pipelines that respect the standard. The workflow: treat the blockerSites list as a licensed-content procurement checklist; schedule quarterly reviews of those 4 sites for any change in their access posture.

A data-pipeline engineer building retrieval-augmented generation for accounting queries can use the allowerSites list — intuit.com, taxfoundation.org, xero.com, cpapracticeadvisor.com — as a confirmed-permissive base layer and plan licensed agreements for the 4 blockers. For a related category where permissive sites dominate, see do manufacturing sites block AI crawlers — a 0% rate that reflects a very different content-economics environment.
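One way to apply the permissive list as a base layer is a simple host filter in front of the retrieval crawler, sketched here. The names ALLOWER_SITES and in_base_layer are illustrative, and treating subdomains as covered is an assumption: the snapshot reflects the robots.txt at each listed domain, and a subdomain may publish its own file.

```python
from urllib.parse import urlparse

# The 4 Accounting sites that allow all tracked AI crawlers in this snapshot.
ALLOWER_SITES = {
    "intuit.com",
    "taxfoundation.org",
    "xero.com",
    "cpapracticeadvisor.com",
}


def in_base_layer(url):
    """Return True if the URL's host is a listed site or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Assumption: subdomains inherit the parent domain's permissive status.
    return any(host == site or host.endswith("." + site) for site in ALLOWER_SITES)
```

A filter like this only narrows the candidate set. Because access postures can change between snapshots, a per-request robots.txt check is still advisable before fetching.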

US Tech Automations automates robots.txt monitoring with scheduled crawls, change-diff alerting, and a per-category AI-access dashboard that fires an alert the moment any site in a watched category shifts its access posture — no manual checking required.


Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 4e7c4a4a3c720f06).


Cite this report

US Tech Automations Research, 2026-06 edition. “Do Accounting Sites Block AI Crawlers? 4 of 8 Do.” https://ustechautomations.com/resources/blog/do-accounting-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 4e7c4a4a3c720f06

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.