
Do Religion Sites Block AI Crawlers? Sealed robots.txt Data

Jun 14, 2026

Religion is near the bottom of the 32-category ranking in the June 2026 Closing Web snapshot, and the reason is revealing. Of 10 Religion sites checked, 9 returned a parseable robots.txt; of those 9, only 1 blocks any AI crawler: christianitytoday.com. The other eight sites with a parseable file (biblegateway.com, crosswalk.com, catholic.org, beliefnet.com, desiringgod.org, churchofjesuschrist.org, chabad.org, and patheos.com) allow all 9 tracked AI crawler tokens without restriction.

That yields an 11.1% block rate, the third-lowest in the dataset: only Nonprofit and Streaming register lower, both at 0%. The corpus average across all 293 sites with a parseable robots.txt is 42% (123 of 293); Religion lands far below that line.
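The block-rate arithmetic is simple enough to check by hand, but a short sketch makes the denominator rule explicit. In this minimal Python example, the helper name `block_rate` and the `LOWEST` mapping are hypothetical; the counts are copied from the category table in this report, and sites without a parseable robots.txt are left out of the denominator, as the methodology specifies.

```python
def block_rate(blocking: int, with_robots: int) -> float:
    """Percent of sites with a parseable robots.txt that block >= 1 tracked AI crawler.

    Sites that returned no parseable file are NOT counted in `with_robots`.
    """
    if with_robots == 0:
        raise ValueError("no parseable robots.txt files in this category")
    return round(100 * blocking / with_robots, 1)


# (blocking, with parseable robots.txt) for the lowest-ranked rows of the category table.
LOWEST = {
    "Education": (1, 7),
    "Government": (1, 8),
    "Crypto": (1, 8),
    "Religion": (1, 9),
    "Nonprofit": (0, 6),
    "Streaming": (0, 10),
}

religion = block_rate(1, 9)    # 11.1
corpus = block_rate(123, 293)  # 42.0

# Categories whose rate is strictly below Religion's.
lower = [name for name, (b, n) in LOWEST.items() if block_rate(b, n) < religion]
print(religion, corpus, sorted(lower))
```

Every other category in the table sits at 12.5% or higher, so the two names this returns, Nonprofit and Streaming, are the only ones below Religion, which places Religion third-lowest of 32.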

The distinctive signal here is not that one site blocks; it is that eight mission-driven and community-oriented organizations do not. Many of these platforms host vast free libraries of scripture, commentary, devotional content, and interfaith dialogue, and the apparent philosophy is that maximum reach is the point. A crawl that brings scriptural or theological content into an AI response serves the stated mission of many of these organizations.

1 of 9 Religion sites blocks at least one AI crawler.

Religion sites carry an 11.1% AI-crawler block rate.

Across 293 sites, 123 block at least one AI crawler — a 42% rate.

Key Takeaways

1 of 9 Religion sites with a parseable robots.txt blocks any AI crawler — an 11.1% block rate.

Christianitytoday.com is the sole blocker; every other Religion site with a parseable robots.txt allows all 9 tracked AI crawlers.

The 11.1% Religion block rate sits well below the 42% corpus average across all 293 sites.

Gotquestions.org returned no parseable robots.txt file and is excluded from the block-rate denominator.

"1 of 9 Religion sites with a parseable robots.txt blocks any AI crawler — an 11.1% block rate as of June 14, 2026."

"Eight Religion sites — including biblegateway.com, catholic.org, and chabad.org — allow every tracked AI crawler without restriction."

What This Block Rate Actually Means for Religion

The 11.1% figure needs context to interpret correctly. Christianity Today is a media publication — it runs on journalism revenue, advertising, and subscriptions. Its robots.txt posture aligns with the News and media cluster (News at 82.4%) rather than the open-mission posture of its category peers. That it is the lone blocker in a category otherwise composed of scripture libraries, denominational organizations, and interfaith aggregators is not a religious contradiction; it is a business-model distinction.

Biblegateway.com hosts dozens of Bible translations in multiple languages and derives much of its value from wide dissemination. Blocking AI crawlers would work against that mission. Churchofjesuschrist.org and chabad.org operate as official organizational sites where broad access to doctrine, scripture, and community resources is central to the mission. Beliefnet.com and patheos.com are multifaith aggregator platforms whose content value increases with reach. Desiringgod.org distributes John Piper's teaching explicitly as a free resource. For every one of these, the robots.txt posture is strategically coherent with their stated purpose.

The one gap in the sample is gotquestions.org, which returned no parseable robots.txt and therefore publishes no declared policy. Well-behaved crawlers treat an absent robots.txt as implicitly permissive.
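Whether a missing file really means "no restrictions" depends on why it is missing. RFC 9309, the Robots Exclusion Protocol standard, lets a crawler access any resource when robots.txt returns a 4xx ("unavailable") status, but requires it to assume a complete disallow when the server returns a 5xx ("unreachable") status or the fetch fails with a network error. The snapshot records only that no parseable file came back, so this Python sketch is illustrative: the function name and its three return labels are hypothetical, and redirect handling, which the RFC also specifies, is left to the caller.

```python
from typing import Optional


def robots_fetch_policy(status: Optional[int]) -> str:
    """Map a robots.txt fetch outcome to a default crawl policy (after RFC 9309, section 2.3.1).

    `status` is the final HTTP status after redirects, or None for a network error.
    Returns one of the hypothetical labels "parse", "allow_all", or "disallow_all".
    """
    if status is None:
        # Network error: the file is "unreachable", so assume a complete disallow.
        return "disallow_all"
    if 200 <= status < 300:
        # Success: parse the file and apply its rules.
        return "parse"
    if 400 <= status < 500:
        # "Unavailable" (for example 404): the crawler may access any resource.
        return "allow_all"
    if 500 <= status < 600:
        # "Unreachable" (server error): assume a complete disallow.
        return "disallow_all"
    raise ValueError(f"unhandled status {status}; follow 3xx redirects before calling")
```

A 404 therefore maps to `allow_all`, while a 503 maps to `disallow_all`.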

This category pattern parallels what we see in streaming sites (0%) and crafts sites (25%): when the dominant sites in a category view AI-visible content as distribution rather than a threat to licensable assets, block rates fall.

How Religion Compares to the Full 32-Category Corpus

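Category | Sites checked | With robots.txt | Blocking | Block rate
Gaming | 9 | 9 | 8 | 88.9%
News | 20 | 17 | 14 | 82.4%
Food | 10 | 10 | 7 | 70%
Tech | 15 | 13 | 9 | 69.2%
Entertainment | 9 | 9 | 6 | 66.7%
Healthcare | 10 | 9 | 6 | 66.7%
Music | 10 | 9 | 6 | 66.7%
Parenting | 10 | 8 | 5 | 62.5%
Reference | 14 | 11 | 6 | 54.5%
Science | 10 | 10 | 5 | 50%
Automotive | 10 | 9 | 4 | 44.4%
HomeGarden | 10 | 9 | 4 | 44.4%
Fashion | 9 | 7 | 3 | 42.9%
Social | 10 | 10 | 4 | 40%
Sports | 10 | 10 | 4 | 40%
Fitness | 10 | 10 | 4 | 40%
Photography | 10 | 10 | 4 | 40%
Jobs | 10 | 8 | 3 | 37.5%
Travel | 9 | 9 | 3 | 33.3%
Weather | 10 | 6 | 2 | 33.3%
Legal | 10 | 7 | 2 | 28.6%
RealEstate | 10 | 7 | 2 | 28.6%
Pets | 10 | 7 | 2 | 28.6%
Crafts | 10 | 8 | 2 | 25%
Finance | 12 | 11 | 2 | 18.2%
Retail | 15 | 12 | 2 | 16.7%
Education | 9 | 7 | 1 | 14.3%
Government | 9 | 8 | 1 | 12.5%
Crypto | 9 | 8 | 1 | 12.5%
Religion | 10 | 9 | 1 | 11.1%
Nonprofit | 10 | 6 | 0 | 0%
Streaming | 10 | 10 | 0 | 0%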

Religion at 11.1% sits just below Government (12.5%), Crypto (12.5%), and Education (14.3%), a cluster of categories that, like Religion, host large free-access public-good or community resources. At the other end, Gaming (88.9%) and News (82.4%) reflect sectors where proprietary content is the core product and scraping exposure is commercially immediate.

The contrast with photography sites (40%) is instructive: photography platforms protect commercially licensable images; religion platforms distribute content whose value is measured in reach, not restricted access.

The Corpus-Wide Bot and Operator Leaderboards

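These figures span all 293 sites with parseable robots.txt files in this edition, not just the Religion category. Operator totals can exceed the matching single-token counts (Anthropic 93 vs. ClaudeBot 87; OpenAI 77 vs. GPTBot 74), and three operators (Cohere, Diffbot, Mistral) have no token in the second table, so the operator leaderboard appears to count user agents beyond the 9 tracked tokens.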

Operator | Sites blocking (of 293)
Common Crawl | 97
Anthropic | 93
Meta | 80
OpenAI | 77
ByteDance | 75
Perplexity | 69
Apple | 67
Google | 66
Cohere | 63
Diffbot | 60
Amazon | 56
Mistral | 23

Bot token | Sites blocking (of 293) | Block rate
CCBot | 97 | 33.1%
ClaudeBot | 87 | 29.7%
Bytespider | 75 | 25.6%
GPTBot | 74 | 25.3%
Meta-ExternalAgent | 70 | 23.9%
PerplexityBot | 68 | 23.2%
Applebot-Extended | 67 | 22.9%
Google-Extended | 66 | 22.5%
Amazonbot | 56 | 19.1%
Amazonbot5619.1%

Common Crawl leads the operator count at 97 sites blocked corpus-wide, likely reflecting its long history as a source of AI training datasets and the resulting tendency of publishers to name it first when adding restrictions. Mistral appears last at 23 sites, consistent with newer market entrants being less present in manually maintained robots.txt files.

Methodology — How the Data Was Sealed

The Closing Web snapshot was sealed June 14, 2026 (sha a5ca246fbdc79954). For each of the 339 domains in the 32-category corpus, the team fetched the robots.txt at the domain root and parsed every User-agent block. A site is counted as blocking if at least one of the 9 tracked AI crawler tokens is named in a User-agent block that carries a Disallow directive. A site that returns no parseable robots.txt is excluded from the block-rate denominator, which is why the Religion denominator is 9, not 10, and the corpus denominator is 293, not 339.
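The counting rule can be sketched as a small parser. This is an illustrative Python example under stated assumptions, not the team's actual tooling: `TRACKED_TOKENS` holds the nine user-agent strings from the bot-token leaderboard, which appear to be the 9 tracked tokens, and `blocked_tokens` and `is_blocking` are hypothetical names. The report does not say how wildcard (`*`) groups, empty `Disallow:` lines, or partial-path disallows are scored; here only named tokens count, an empty `Disallow:` blocks nothing, and any non-empty Disallow path counts as a block.

```python
# The nine tokens from the bot-token leaderboard, lowercased for case-insensitive matching.
TRACKED_TOKENS = {
    "ccbot", "claudebot", "bytespider", "gptbot", "meta-externalagent",
    "perplexitybot", "applebot-extended", "google-extended", "amazonbot",
}


def blocked_tokens(robots_txt: str) -> set[str]:
    """Return tracked tokens named in a User-agent group that has a non-empty Disallow."""
    blocked = set()
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has seen an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # Assumption: any non-empty Disallow path counts; "Disallow:" alone blocks nothing.
            # "*" is not a tracked token, so wildcard groups are ignored (also an assumption).
            if field == "disallow" and value:
                blocked.update(a for a in agents if a in TRACKED_TOKENS)
    return blocked


def is_blocking(robots_txt: str) -> bool:
    """A site counts as blocking if at least one tracked token is blocked."""
    return bool(blocked_tokens(robots_txt))


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""
print(sorted(blocked_tokens(sample)))  # ['ccbot', 'gptbot']
```

Grouping consecutive User-agent lines matters: in the sample, one `Disallow: /` applies to both GPTBot and CCBot, so both are counted.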

Nothing is estimated, modeled, or extrapolated. Every count is a verbatim read from the sealed file set. The snapshot is cross-sectional: one point in time, June 14, 2026. No trend, growth, or change claims are valid from a single observation.

Robots.txt is an honor-system standard. A disallow directive is a public declaration of intent, not a technical barrier. Crawlers that respect the standard will avoid disallowed paths; crawlers that do not are not technically blocked. This data records the public policy signal each domain has chosen to publish.

Frequently Asked Questions

Q: Why does Religion have such a low block rate compared to other content categories?

A: The dominant sites in the Religion category are scripture libraries, denominational platforms, and interfaith aggregators whose mission centers on broad access and dissemination. Blocking AI crawlers would reduce the reach of content these organizations distribute intentionally as a free public resource. The one blocker — Christianity Today — is a journalism operation with a different business model, which is why it aligns with media-sector behavior rather than the open pattern of its category peers.

Q: Does gotquestions.org block AI crawlers?

A: Gotquestions.org returned no parseable robots.txt file in the June 14, 2026 snapshot, so it publishes no machine-readable AI-access policy. Well-behaved crawlers generally treat an absent file as permissive, but the site has no declared position in the sealed data.

Q: How reliable is robots.txt as an indicator of AI-access policy?

A: Robots.txt is the most widely recognized public declaration of crawler policy and is the only one this research measures. However, it is an honor system — technical enforcement requires additional mechanisms. Some sites supplement robots.txt with terms of service, API gating, or legal action. This data captures the declared robots.txt posture only; it does not represent the full enforcement picture.

Q: Could a site like Biblegateway add a disallow directive after this snapshot?

A: Yes. Any domain owner can update their robots.txt at any time. This snapshot is a sealed point-in-time read from June 14, 2026. The 11.1% Religion block rate reflects the state on that specific date. Monitoring for changes requires re-running the crawl at a later date and comparing the two reads.
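Comparing two reads reduces to a per-domain set difference. This Python sketch assumes each read has already been reduced to a mapping from domain to the set of blocked tracked tokens; `diff_reads` is a hypothetical helper, and the placeholder domains and token sets are invented for illustration, not taken from the snapshot.

```python
def diff_reads(
    before: dict[str, set[str]], after: dict[str, set[str]]
) -> dict[str, dict[str, set[str]]]:
    """Compare two reads, each mapping domain -> set of blocked tracked tokens.

    An empty set means the domain allows every tracked token. Only domains with a
    parseable file in both reads are compared, mirroring the denominator rule.
    """
    changes = {}
    for domain in sorted(before.keys() & after.keys()):
        old, new = before[domain], after[domain]
        if old != new:
            changes[domain] = {"added": new - old, "removed": old - new}
    return changes


# Hypothetical placeholder reads; these are not figures from the snapshot.
first_read = {"peer-a.example": set(), "peer-b.example": {"gptbot"}, "peer-c.example": set()}
later_read = {"peer-a.example": {"ccbot", "gptbot"}, "peer-b.example": {"gptbot"}}

for domain, delta in diff_reads(first_read, later_read).items():
    print(domain, "newly blocked:", sorted(delta["added"]), "unblocked:", sorted(delta["removed"]))
# peer-a.example newly blocked: ['ccbot', 'gptbot'] unblocked: []
```

Domains that lack a parseable file in either read are skipped, so a site that stops serving robots.txt would need a separate check.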

Q: How does the Religion block rate compare to similar community-content categories?

A: Religion (11.1%) is among the lowest in the corpus, comparable to Government (12.5%) and Crypto (12.5%), and above only Nonprofit (0%) and Streaming (0%). All of these are categories where reach and public access are central to the dominant players' operating model. By contrast, categories built on commercial IP — News (82.4%), Gaming (88.9%), Food (70%) — have much higher block rates.

Put AI-Access Data to Work

The Religion category data is actionable for three distinct practitioner types, each with a concrete recurring workflow.

An SEO or content-strategy lead at a faith-based media brand or publisher needs to understand the AI-access landscape of their editorial peers. Christianity Today publishes AI-crawler restrictions; the eight other sites with a parseable robots.txt do not. If your content team writes scriptural commentary, devotional content, or theological analysis, knowing which peer domains permit AI crawlers (and so may surface in AI-generated answers) shapes your content targeting. A useful workflow: re-run the Religion category check weekly and flag any open peer site that has added a disallow since the previous run, because that changes the competitive landscape for AI-surface content distribution.

A publisher RevOps lead at a denominational or interfaith platform should audit their own robots.txt monthly against the operator leaderboard. Common Crawl leads at 97 corpus-wide blocks; Anthropic follows at 93. If the organization has a policy stance on any specific operator, the robots.txt should reflect that stance explicitly. A silent gap between declared policy and the published file is a recurring operational risk that a monthly audit catches.
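A monthly audit of this kind can be automated with Python's standard-library `urllib.robotparser`. In this minimal sketch, `DECLARED_POLICY` is a hypothetical internal policy (True means the organization intends to block that crawler) and `policy_drift` is a hypothetical helper; the audit checks the root path only and reports every crawler whose effective access disagrees with the policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical declared policy, keyed by tokens from the leaderboard: True = intend to block.
DECLARED_POLICY = {
    "CCBot": True,
    "ClaudeBot": False,
    "GPTBot": True,
    "Google-Extended": False,
}


def policy_drift(robots_txt: str, policy: dict[str, bool], path: str = "/") -> dict[str, str]:
    """Return {token: description} for crawlers whose access to `path` contradicts the policy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    drift = {}
    for token, should_block in policy.items():
        blocked = not parser.can_fetch(token, path)
        if blocked != should_block:
            drift[token] = "blocked but policy allows" if blocked else "allowed but policy blocks"
    return drift


# Hypothetical published file: it blocks ClaudeBot by mistake and forgets CCBot.
published = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""
print(policy_drift(published, DECLARED_POLICY))
# {'CCBot': 'allowed but policy blocks', 'ClaudeBot': 'blocked but policy allows'}
```

`RobotFileParser` falls back to any `User-agent: *` group when no named group matches, and it matches group names by substring, so a group written for `Applebot` would also apply to `Applebot-Extended`. RFC 9309 expects case-insensitive product-token matching instead, so borderline results deserve a manual look.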

A retrieval or data-pipeline engineer building a theology or scriptural knowledge base has the most straightforward workflow: as of the June 14, 2026 snapshot, the Religion category offers 8 sites whose robots.txt permits every tracked crawler. Maintain a domain allowlist from this snapshot, verify each domain weekly against its current robots.txt, and remove any domain that adds a disallow entry before the next indexing run. The trigger is a per-domain robots.txt change; the cadence is weekly.
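The weekly pruning step can also lean on `urllib.robotparser`. This sketch assumes a hypothetical crawler token `ExampleKBBot`, a hypothetical `prune_allowlist` helper, and placeholder domains; it keeps a domain only if this week's robots.txt still lets the crawler fetch the root path, and it holds back any domain whose file could not be fetched, a conservative choice the report does not prescribe.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

CRAWLER_TOKEN = "ExampleKBBot"  # hypothetical user agent for the indexing crawler


def prune_allowlist(
    allowlist: list[str],
    current_robots: dict[str, Optional[str]],
    token: str = CRAWLER_TOKEN,
) -> list[str]:
    """Keep only domains whose current robots.txt still lets `token` fetch "/".

    `current_robots` maps domain -> this week's robots.txt text, or None when no
    parseable file came back; such domains are held back (an assumption).
    """
    kept = []
    for domain in allowlist:
        text = current_robots.get(domain)
        if text is None:
            continue  # could not verify this run, so skip until the next check
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        if parser.can_fetch(token, "/"):
            kept.append(domain)
    return kept


# Hypothetical weekly fetch results for placeholder domains.
weekly = {
    "open-a.example": "User-agent: *\nDisallow:\n",
    "open-b.example": "User-agent: *\nDisallow: /\n",
    "open-c.example": None,
}
print(prune_allowlist(["open-a.example", "open-b.example", "open-c.example"], weekly))
# ['open-a.example']
```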

US Tech Automations automates all three workflows — scheduled domain-level fetches, per-operator change detection, and policy-drift alerting — so teams are notified the moment any Religion site shifts its declared stance rather than discovering the change during a periodic manual review. Visit the agentic workflows platform to see how that monitoring pipeline is configured.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha a5ca246fbdc79954).


Cite this report

US Tech Automations Research, 2026-06 edition. “Do Religion Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-religion-sites-block-ai-crawlers-2026

Sealed snapshot sha256: a5ca246fbdc79954

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.