Do Scuba Diving Sites Block AI Crawlers? 3 of 10 Do

Jun 14, 2026

Scuba diving is one of the few hobby verticals that lands almost exactly on the corpus average. Of the 10 Scuba Diving sites in this edition, all 10 returned a parseable robots.txt, and 3 of them disallow at least one AI crawler — a 30% block rate. That is close enough to the 31.1% figure across the whole snapshot that scuba reads as an ordinary slice rather than an outlier, which is itself a useful finding for anyone deciding whether the category is gating its content.

This report is built from a sealed snapshot, not a live query. On 14 June 2026 we fetched each site's public robots.txt, hashed it, and froze the result under snapshot sha 92ed5cd2858657d9. A robots.txt file is the plain-text rule sheet a site uses to tell crawlers which paths they may fetch. Every number here is read verbatim from the frozen files.
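
The sealing step can be pictured roughly as follows. This is a minimal sketch, not the pipeline behind this report: the function name `seal_snapshot`, the manifest layout, the use of SHA-256 at both levels, and the 16-character truncation are all assumptions made for illustration, and the file bodies are hypothetical.

```python
import hashlib
import json


def seal_snapshot(files: dict[str, bytes]) -> dict:
    """Hash each robots.txt body, then hash the manifest of those hashes.

    `files` maps a domain to the raw bytes fetched from its /robots.txt.
    The manifest layout and the 16-character truncation are assumptions.
    """
    per_file = {
        domain: hashlib.sha256(body).hexdigest()
        for domain, body in sorted(files.items())
    }
    # Serialize deterministically so the same inputs always give the same seal.
    manifest = json.dumps(per_file, sort_keys=True).encode("utf-8")
    seal = hashlib.sha256(manifest).hexdigest()[:16]
    return {"files": per_file, "seal": seal}


# Hypothetical file bodies, not the contents of any real site's robots.txt.
example = {
    "example-dive-a.test": b"User-agent: GPTBot\nDisallow: /\n",
    "example-dive-b.test": b"User-agent: *\nAllow: /\n",
}
sealed = seal_snapshot(example)
print(sealed["seal"])
```

Because the seal depends on every byte of every file, re-running the function on the frozen files should reproduce the same value, and any later change to a single file would produce a different one.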

3 of 10 Scuba Diving sites block at least one AI crawler.

Reading the Sealed Numbers for Scuba

Scuba is unusual in one quiet way: every site we checked published a parseable robots.txt. There is no missing-file ambiguity in this category — all 10 sites stated a rule, and 3 of them chose to disallow an AI crawler.

The three blockers are divein.com, deeperblue.com, and divessi.com. The named allowers are scubadiving.com, padi.com, dan.org, scuba.com, leisurepro.com, girlsthatscuba.com, and scubadivermag.com. Notably, the best-known institutional names in the set (the training and certification authorities padi.com and dan.org, along with the magazine scubadiving.com) all allow every crawler, while the blockers skew toward independent editorial and community-review sites.

Of the 10 Scuba Diving sites checked, all 10 published a parseable robots.txt and 3 disallow at least one AI crawler.

That every site has a published policy makes scuba a clean category to monitor — there is no gap to interpret. For a contrast where a site returned nothing, see the homebrewing report, where two domains had no parseable file.

Scuba Diving Site	AI Crawler Stance
divein.com	Blocks at least one AI crawler
deeperblue.com	Blocks at least one AI crawler
divessi.com	Blocks at least one AI crawler
padi.com	Allows all AI crawlers
dan.org	Allows all AI crawlers
scubadiving.com	Allows all AI crawlers
scuba.com	Allows all AI crawlers
girlsthatscuba.com	Allows all AI crawlers
leisurepro.com	Allows all AI crawlers
scubadivermag.com	Allows all AI crawlers

What This Block Rate Actually Means

At 30%, scuba diving sits just under the corpus line. Across all 743 sites with a parseable robots.txt, 231 block at least one AI crawler — a 31.1% rate. Scuba is one of the rare categories that essentially matches the average, neither retreating from crawlers like the news and gaming brands nor leaving the gate fully open like the most permissive hobby verticals.

Scuba Diving sites post a 30% AI-crawler block rate.

The split has a plausible logic. Certification bodies and dive-shop retailers likely want their authoritative content surfaced: being the cited answer to "is nitrox worth it" is a marketing win for padi.com or scuba.com. The blockers skew toward independent review and forum properties, where original editorial is the asset most worth fencing off from AI training. Guard against over-reading this: the snapshot records stances, not motives, and 30% is a genuinely middle-of-the-pack rate. The honest takeaway is that scuba is a stable, predictable category, not one in organized retreat.

Where Scuba Diving Sits Among Its Neighbors

The focused window below centers scuba among the categories blocking at nearly the same rate — its nearest neighbors in the ranking.

Category	Sites	With robots.txt	Block	Block Rate
Travel	9	9	3	33.3%
Agriculture	10	9	3	33.3%
Wine	10	9	3	33.3%
Yoga	10	10	3	30%
Scuba	10	10	3	30%
Legal	10	7	2	28.6%
RealEstate	10	7	2	28.6%
Pets	10	7	2	28.6%
Chess	10	7	2	28.6%
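
The Block Rate column follows from the two columns before it: blockers divided by sites with a parseable robots.txt, not by all sites listed. A small sketch of that arithmetic, using rows copied from the table above and the corpus totals from this report (the helper name `block_rate` is an illustrative assumption):

```python
def block_rate(blockers: int, with_robots: int) -> float:
    """Percent of sites with a parseable robots.txt that block an AI crawler."""
    if with_robots == 0:
        raise ValueError("no parseable robots.txt files to divide by")
    return round(100 * blockers / with_robots, 1)


# (category, sites with robots.txt, blockers), copied from the table above.
rows = [
    ("Travel", 9, 3),
    ("Agriculture", 9, 3),
    ("Yoga", 10, 3),
    ("Scuba", 10, 3),
    ("Legal", 7, 2),
]
for name, with_robots, blockers in rows:
    print(f"{name}: {block_rate(blockers, with_robots)}%")

# Corpus-wide figure: 231 blockers among 743 sites with a parseable file.
print(f"Corpus: {block_rate(231, 743)}%")
```

This is why Agriculture, with 10 sites but only 9 readable files, shows 33.3% and not 30%.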

The extremes frame how moderate scuba's posture is. Gaming blocks 8 of 9 and Food 7 of 10 at the high end; at the low end, Drones blocks 0 of its 9 sites with a parseable file.

Extreme	Sites	With robots.txt	Block	Block Rate
Gaming (highest)	9	9	8	88.9%
Food	10	10	7	70%
Drones (lowest with robots)	10	9	0	0%

The neighboring chess report sits one notch lower at 28.6%.

Which Bots Are Blocked Most Across the Corpus

When a scuba site like divein.com or divessi.com names an AI user-agent to disallow, which bots appear most often corpus-wide? The leaderboard below shows a focused top-five cut across all 743 sites.

Bot	Sites Blocking (all 743 sites)
CCBot	169
ClaudeBot	147
GPTBot	145
Bytespider	142
Meta-ExternalAgent	125

Across all 743 sites, CCBot is the most-blocked individual bot, named in 169 disallow lists.

CCBot tops the list because it feeds Common Crawl, the dataset many downstream models train on, so one disallow line there has outsized reach. Of the 743 sites corpus-wide, 171 also publish an llms.txt file (23%) — a newer AI-specific manifest distinct from robots.txt.
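
A leaderboard like the one above is a per-bot tally of sites. The sketch below shows one way to build it, assuming each site has already been reduced to the set of AI user-agents it disallows; the function name `bot_leaderboard` and the four sample sites are hypothetical and are not the corpus data.

```python
from collections import Counter


def bot_leaderboard(blocked_by_site: dict[str, set[str]], top: int = 5) -> list[tuple[str, int]]:
    """Count, for each bot, how many sites disallow it, most-blocked first."""
    counts: Counter[str] = Counter()
    for bots in blocked_by_site.values():
        # Each site contributes at most once per bot because `bots` is a set.
        counts.update(bots)
    return counts.most_common(top)


# Hypothetical sites for illustration only.
sites = {
    "a.example": {"CCBot", "GPTBot"},
    "b.example": {"CCBot"},
    "c.example": set(),
    "d.example": {"CCBot", "GPTBot", "ClaudeBot"},
}
print(bot_leaderboard(sites))
# [('CCBot', 3), ('GPTBot', 2), ('ClaudeBot', 1)]
```

Counting sites per bot, not rules per bot, keeps a file with many Disallow lines for one crawler from being counted more than once.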

The bot leaderboard sharpens what a scuba blocker is actually doing. When divein.com, deeperblue.com, or divessi.com disallows a crawler, it is most likely naming one of these five — CCBot, ClaudeBot, GPTBot, Bytespider, or Meta-ExternalAgent — because those are the user-agents site owners hear about most. A category at the corpus average like scuba shows a real but limited version of this behavior: a minority of independent properties opt out while the institutions stay open.

That mix is useful to anyone assembling a scuba content index. The certification and retail authorities — padi.com, dan.org, scuba.com — remain fully fetchable, so the most trustworthy answers to safety, gear, and certification questions stay available to answer engines. The three blockers narrow the pool of independent review content but do not touch the category's backbone sources.

How Scuba Compares to Its Hobby Peers

Scuba's middle-of-the-pack posture stands out precisely because so many adjacent hobby verticals are more permissive. Where skateboarding blocks just 11.1% and hunting 10%, scuba's 30% reflects a category with a slightly more defensive editorial wing. The difference is the independent dive-review and forum-style properties, which behave more like the news end of the spectrum than the gear end.

Yet the contrast with the truly guarded categories is still stark. Gaming sits at 88.9% and News at 82.4% — those are verticals where blocking is the default, not the exception. Scuba's three blockers are a minority within a category that otherwise leans open, which is why it lands almost exactly on the 31.1% corpus line rather than near either pole. The honest framing is that scuba is an average category, and average is the finding.

Corpus-wide, 231 of 743 sites block at least one AI crawler.

How the Snapshot Was Sealed

We assembled a list of Scuba Diving domains, fetched each site's /robots.txt over HTTP on 14 June 2026, and parsed every User-agent and Disallow directive for known AI crawler tokens. A site counts as a blocker if it disallows at least one AI user-agent on any path. The result was hashed and frozen under sha 92ed5cd2858657d9; nothing is estimated, modeled, or extrapolated. This sealed-snapshot method is the core of the US Tech Automations Closing Web edition.
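
The blocker rule described above can be sketched as a small parser. This is an illustration under stated assumptions, not the code used for this report: the token list is limited to the five bots named in the leaderboard (the real list of known AI crawler tokens may be longer), a wildcard `User-agent: *` group is assumed not to count as an AI-specific block, and the function name `blocked_ai_agents` is hypothetical.

```python
# Assumed token list: the five bots named in this report's leaderboard.
AI_TOKENS = {"ccbot", "claudebot", "gptbot", "bytespider", "meta-externalagent"}


def blocked_ai_agents(robots_txt: str) -> set[str]:
    """Return AI user-agent tokens that have at least one non-empty Disallow."""
    blocked: set[str] = set()
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a user-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:   # an empty Disallow blocks nothing
                blocked.update(a for a in agents if a in AI_TOKENS)
    return blocked


# Hypothetical file body for illustration.
sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""
print(sorted(blocked_ai_agents(sample)))   # ['ccbot', 'gptbot']
is_blocker = bool(blocked_ai_agents(sample))
```

Under this sketch a site is a blocker when the returned set is non-empty, which matches the "at least one AI user-agent on any path" rule stated above.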

A point-in-time read cannot capture intent that changes the next day, which is why the discipline matters: re-reading the same files over time is what turns a single count into a trend.

It is worth being precise about scope. The snapshot records what each site published — the named user-agents, the disallowed paths, and the presence of a parseable file — and nothing beyond it. It does not judge why padi.com stays open or why divein.com blocks, does not forecast either choice, and does not verify crawler compliance. The 30% figure is a verbatim count of stated intent across ten readable files. That every scuba site published a policy makes this one of the cleaner categories to audit, since there is no missing-file ambiguity to reason around.

Frequently Asked Questions

Q: Which Scuba Diving sites block AI crawlers?

A: Three: divein.com, deeperblue.com, and divessi.com. They skew toward independent editorial and community-review properties. The certification and retail authorities — padi.com, dan.org, scuba.com — allow every crawler in this snapshot.

Q: Why do PADI and DAN leave their content open?

A: The snapshot records stances, not motives, but a likely reason is that authority sites benefit from being cited. Being the surfaced answer to a safety or certification question is a marketing and trust win, which fits padi.com and dan.org allowing every AI crawler instead of fencing their content off.

Q: Is 30% high or low for a hobby category?

A: It is right at the corpus average of 31.1% (231 of 743 sites). Scuba is one of the few categories that essentially matches the line — more guarded than skateboarding's 11.1%, far more open than gaming's 88.9%.

Q: Does every Scuba Diving site have a published policy?

A: Yes. All 10 Scuba Diving sites returned a parseable robots.txt, so there is no missing-file ambiguity — a rarer trait than it sounds, since several other categories had domains that published nothing at all.

Q: Does a robots.txt block guarantee a bot stays out?

A: No — the standard is advisory. robots.txt records a site's stated preference, and compliant AI crawlers honor it, but the file cannot itself enforce a fetch boundary. That published intent is what a sealed snapshot captures.
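
To make "advisory" concrete: the check is one that a well-behaved crawler runs on itself before fetching. The sketch below uses Python's standard-library `urllib.robotparser`; the rules, bot names other than GPTBot, and the example.test URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: GPTBot is disallowed, everyone else is allowed.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler asks this question before each fetch and obeys the answer.
print(parser.can_fetch("GPTBot", "https://example.test/reviews/"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.test/reviews/"))  # True
```

Nothing in the file stops a crawler that skips this check, which is why the snapshot measures stated intent and not enforcement.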

Put AI-Access Data to Work

A product manager at a dive-travel booking platform, or at a site like scuba.com or girlsthatscuba.com, should treat this as a recurring signal: re-crawl the top scuba domains weekly and alert the moment a review property such as divein.com or deeperblue.com changes its AI-crawler stance. If the independent reviewers keep blocking while the booking and certification sites stay open, the open sites become the citable sources when a traveler asks an answer engine "best liveaboard for beginners", a discoverability edge worth tracking.

A second ideal customer profile (ICP), a publisher RevOps analyst at a dive-media brand, can watch whether peers follow divessi.com into blocking before setting their own policy. A third, an AI retrieval engineer indexing outdoor and travel content, can use the named allowers as a vetted ingestion set and re-validate it on a fixed cadence so that a source that flips to blocking is caught.
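
The re-crawl-and-alert loop described above reduces to comparing two snapshots of the same domains. A minimal sketch, assuming each snapshot has already been reduced to a blocker flag per domain; the function name `stance_changes` and the placeholder domains are hypothetical, and no real site's future stance is implied.

```python
def stance_changes(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Describe every domain whose blocker status differs between two snapshots."""
    changes = []
    # Only compare domains present in both snapshots.
    for domain in sorted(previous.keys() & current.keys()):
        if previous[domain] != current[domain]:
            verb = "now blocks" if current[domain] else "no longer blocks"
            changes.append(f"{domain} {verb} at least one AI crawler")
    return changes


# True means "blocks at least one AI crawler". Placeholder data only.
earlier = {"reviews.example": True, "forum.example": True, "agency.example": False}
later = {"reviews.example": True, "forum.example": False, "agency.example": True}

for message in stance_changes(earlier, later):
    print(message)
# agency.example now blocks at least one AI crawler
# forum.example no longer blocks at least one AI crawler
```

An empty list means nothing flipped between the two reads, so an alert would fire only when the list is non-empty.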

US Tech Automations automates this with scheduled robots.txt and llms.txt crawls, change alerts, and an AI-access policy dashboard. Configure the cadence on our agentic workflow platform. For an adjacent slice, the skateboarding report shows a far more permissive hobby vertical.

Key Takeaways

Of 10 Scuba Diving sites with a parseable robots.txt, 3 block at least one AI crawler — a 30% rate.
The blockers are divein.com, deeperblue.com, and divessi.com; padi.com, dan.org, and scuba.com allow every crawler.
Scuba is one of the few categories that lands essentially on the corpus average of 31.1% (231 of 743 sites).
All 10 Scuba Diving sites published a parseable robots.txt — no missing-file ambiguity.
Corpus-wide, CCBot is the most-blocked bot at 169 sites; ClaudeBot follows at 147.

Zoom out: Scuba Diving is just one vertical in a much larger picture — our cross-industry study measures how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 92ed5cd2858657d9).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Scuba Diving Sites Block AI Crawlers? 3 of 10 Do.” https://ustechautomations.com/resources/blog/do-scuba-diving-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 92ed5cd2858657d9
