Do Surfing Sites Block AI Crawlers? 3 of 7 Do

Jun 14, 2026

Surf forecasting lives or dies on data (buoy readings, swell models, spot reports), and that same data is exactly what AI systems want to ingest. So we checked: of the surfing sites we track, how many tell AI crawlers they are not welcome? The short answer is that most leave the door open, but the gates that exist guard the most valuable real estate in the sport.

Of the 10 Surfing sites we checked, 7 returned a parseable robots.txt, and 3 of those block at least one AI crawler: a 42.9% block rate. A robots.txt is the root-level text file where a site declares which automated agents may fetch its pages. Surfing's rate lands about 10 percentage points above the 32.8% corpus baseline, yet well short of the most restrictive categories.

Everything here comes from a sha256-sealed snapshot of public robots.txt files captured on 14 June 2026 (snapshot sha c60e706824d5d127). The figures are direct reads of those files; nothing is estimated, modeled, or extrapolated.

The Sites Drawing the Line

Three named surfing sites account for the entire block rate: surfline.com, worldsurfleague.com, and beachgrit.com each disallow at least one AI crawler in their published robots.txt. That trio is the commercial core of the sport — a forecasting platform, the pro tour's official property, and a high-traffic editorial outlet. Where surfing gates, it gates around proprietary forecasts, live event coverage, and original reporting.

Four sites returned a robots.txt and allow every crawler through: theinertia.com, surfer.com, stabmag.com, and surfsimply.com. These are publishers and a coaching-content site that gain visibility when AI tools quote them, so an open policy works in their favor today.

Of the 7 Surfing sites with a published policy, 3 block at least one AI crawler while the others allow all of them.

Three more sites (surftoday.com, swellnet.com, and coastalwatch.com) returned no parseable robots.txt. An absent file is not a block: the honor-system default leaves a site readable to any crawler that asks. Running shows the same 3-of-7 split; compare it in our running sites report.

Surfing Site	AI Crawler Posture
surfline.com	Blocks at least one AI crawler
worldsurfleague.com	Blocks at least one AI crawler
beachgrit.com	Blocks at least one AI crawler
theinertia.com	Allows all AI crawlers
surfer.com	Allows all AI crawlers
stabmag.com	Allows all AI crawlers
surfsimply.com	Allows all AI crawlers

What This Block Rate Actually Means

A 42.9% rate tells you that a surf-focused AI assistant which honors robots.txt can read most of the sport's web but, depending on the crawler it relies on, may be shut out of forecasting, pro-tour coverage, and one major editorial outlet. That is the opposite of random: the blockers protect the data and events that cost money to produce, while the open sites trade reach for citations. For a forecasting business, proprietary data is the moat, so surfline.com gating crawlers is a rational defense, not an accident.

Set against the whole snapshot, surfing runs hot. Across all sites, 220 of 670 block at least one AI crawler, a 32.8% corpus rate, and surfing's 42.9% sits about 10 percentage points above that line.
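The comparison is simple arithmetic on the counts quoted in this report. The short Python sketch below recomputes both rates and the gap between them; the function and variable names are illustrative only and are not part of the published method.

```python
def block_rate(blockers: int, sites_counted: int) -> float:
    """Share of counted sites that block at least one AI crawler, in percent."""
    return round(100 * blockers / sites_counted, 1)


# Counts taken from this report.
surfing = block_rate(3, 7)      # 3 of 7 surfing sites with a parseable robots.txt
corpus = block_rate(220, 670)   # 220 of 670 sites across the snapshot

# Gap in percentage points between the category and the corpus.
gap = round(surfing - corpus, 1)

print(f"surfing {surfing}% vs corpus {corpus}%: {gap} points above")
```

Expressing the gap in percentage points rather than as a ratio keeps it directly comparable with the category table.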

Look closer and the three blockers are not interchangeable. surfline.com is a data company first; its swell models and forecasts are the paid product, and feeding them into a free AI answer would undercut the subscription. worldsurfleague.com owns live competition coverage and athlete content that has commercial rights attached. beachgrit.com is editorial with a distinct voice that loses value when paraphrased without attribution.

The open sites — theinertia.com, surfer.com, stabmag.com, surfsimply.com — are reach-driven publishers and instruction content that wins when an assistant points a curious surfer their way. The split inside surfing is a clean map of who sells data versus who sells attention. Climbing pushes the data-protective instinct much further; see our climbing sites report for a vertical that gates the majority.

Where Surfing Sits in the Corpus

Surfing does not stand alone at 42.9%; it ties Running and Fashion exactly. Just above sits a 44.4% band (Birding, Watches, HomeGarden, and Automotive), and just below, at 40%, are Social, Sports, Fitness, and Photography. Surfing reads like the consumer-interest verticals it keeps company with: protective of its best assets, open with the rest.

The focused window centers surfing among the categories nearest it in the ranking.

Category	Sites With robots.txt	Block at Least One Crawler	Block Rate
Watches	9	4	44.4%
HomeGarden	9	4	44.4%
Automotive	9	4	44.4%
Fashion	7	3	42.9%
Running	7	3	42.9%
Surfing	7	3	42.9%
Social	10	4	40.0%
Sports	10	4	40.0%
Fitness	10	4	40.0%
Photography	10	4	40.0%

At the corpus extremes, Gaming gates the hardest at 88.9% and News follows at 82.4%, while Vinyl Record sits at 0%. The quilting vertical lands near surfing in the broad consumer-craft middle; see our quilting sites report for that comparison.

Which Bots Are Blocked Most

When a surfing site does gate, which agents does it name? The corpus-wide bot leaderboard answers that. CCBot, Common Crawl's harvester, is disallowed by 162 sites, the most of any bot, with ClaudeBot (141) and GPTBot (139) next. A surf site adding its first disallow line is most likely to start with these.

The table below counts how many of the 670 sites disallow each of the leading bots.

Bot	Sites That Disallow It (all 670 sites)
CCBot	162
ClaudeBot	141
GPTBot	139
Bytespider	133
Meta-ExternalAgent	119
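To read those counts as shares of the corpus, each can be divided by the 670 sites in the snapshot. The sketch below does that with the figures from the table above; the names `DISALLOW_COUNTS` and `share` are illustrative, and the percentages are derived here rather than quoted from the sealed data.

```python
CORPUS_SITES = 670

# Disallow counts copied from the bot table in this report.
DISALLOW_COUNTS = {
    "CCBot": 162,
    "ClaudeBot": 141,
    "GPTBot": 139,
    "Bytespider": 133,
    "Meta-ExternalAgent": 119,
}


def share(count: int, total: int = CORPUS_SITES) -> float:
    """Percentage of corpus sites that disallow a given bot."""
    return round(100 * count / total, 1)


shares = {bot: share(n) for bot, n in DISALLOW_COUNTS.items()}

# Print the leaderboard from most-blocked to least-blocked.
for bot, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bot:<20}{pct:>5.1f}%")
```

On these figures CCBot's 162 sites work out to roughly 24% of the corpus.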

The surfing blockers — surfline.com, worldsurfleague.com, beachgrit.com — track this hierarchy: when a forecasting or tour site draws a boundary, it draws it against the same top crawlers leading the corpus.

The newer llms.txt standard is the quieter half of the story. Across all 670 sites, 152 publish one — a 22.7% adoption rate for the convention that lets a site describe its content and terms directly to large language models, rather than simply allowing or disallowing a fetch. For a forecasting business, llms.txt could eventually carry the nuance robots.txt cannot: read the editorial, leave the proprietary models alone. For now it is the minority approach, and surfing's stance still lives almost entirely in the allow-or-disallow lines of robots.txt.

How the Snapshot Was Sealed

We requested robots.txt from each site's root, parsed every User-agent and Disallow rule, and matched the agents against a fixed roster of known AI crawlers. One disallowed AI agent is enough to count a site as a blocker. The complete file set was hashed into the sha256 fingerprint c60e706824d5d127 on 14 June 2026, so every count is independently re-checkable against the frozen snapshot — nothing is estimated, modeled, or extrapolated.
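The matching rule can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the snapshot. The `AI_ROSTER` set is a hypothetical five-name subset taken from the bot table in this report, and the function names are invented here. The sketch also assumes that any non-empty Disallow in a group that explicitly names a roster agent counts as a block, and that wildcard `User-agent: *` groups are ignored.

```python
# Hypothetical roster: a subset of AI crawler names, lowercased for matching.
AI_ROSTER = {"ccbot", "claudebot", "gptbot", "bytespider", "meta-externalagent"}


def blocked_ai_agents(robots_txt: str) -> set[str]:
    """Return the roster agents that this robots.txt explicitly disallows."""
    blocked: set[str] = set()
    group: list[str] = []   # agents named by the current group
    in_rules = False        # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:    # a User-agent line after rules starts a new group
                group, in_rules = [], False
            group.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # Assumption: an empty Disallow permits everything, so skip it.
            if field == "disallow" and value:
                blocked.update(a for a in group if a in AI_ROSTER)
    return blocked


def is_blocker(robots_txt: str) -> bool:
    """One disallowed roster agent is enough to count the site as a blocker."""
    return bool(blocked_ai_agents(robots_txt))


sample = "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
print(sorted(blocked_ai_agents(sample)), is_blocker(sample))
```

Returning the set of named agents rather than a bare boolean keeps the finer detail available while the headline count still uses the binary test.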

Read the coverage honestly. Of 10 Surfing sites, 7 returned a parseable robots.txt; surftoday.com, swellnet.com, and coastalwatch.com returned none and are reported as no-policy rather than counted in either column. US Tech Automations applies the identical method to every category in this edition.
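The three reporting states, and the way no-policy sites stay out of the denominator, can be illustrated as follows. The site postures are the ones listed in this report; the `POSTURE` mapping, the `label` helper, and the label strings are illustrative names chosen for this sketch.

```python
from collections import Counter
from typing import Optional

# True = blocks at least one AI crawler, False = allows all,
# None = no parseable robots.txt was returned.
POSTURE: dict[str, Optional[bool]] = {
    "surfline.com": True,
    "worldsurfleague.com": True,
    "beachgrit.com": True,
    "theinertia.com": False,
    "surfer.com": False,
    "stabmag.com": False,
    "surfsimply.com": False,
    "surftoday.com": None,
    "swellnet.com": None,
    "coastalwatch.com": None,
}


def label(blocks: Optional[bool]) -> str:
    """Map a site's posture to one of the three reporting states."""
    if blocks is None:
        return "no-policy"
    return "blocks" if blocks else "allows-all"


tally = Counter(label(v) for v in POSTURE.values())

# Only sites with a parseable file enter the block-rate denominator.
with_policy = tally["blocks"] + tally["allows-all"]
rate = round(100 * tally["blocks"] / with_policy, 1)

print(dict(tally), f"{rate}% of {with_policy}")
```

Keeping `None` distinct from `False` is what prevents a missing file from being read as either a block or an explicit allow.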

One definitional point keeps the counts honest: a surfing site registers as a blocker the instant its robots.txt disallows a single recognized AI agent, regardless of how many others it permits. surfline.com gating one operator and worldsurfleague.com gating several both land in the same column, because the measure captures the binary decision to draw any line at all.

That is deliberate — the question most readers bring to this report is whether a site has decided to gate, not the exact perimeter of its rules. The finer detail, which named agents each site disallows, is preserved verbatim in the sealed file and can be reconstructed from the same frozen snapshot without re-querying the live web.
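Re-checking a frozen snapshot depends on the fingerprint being reproducible from the same files. This report does not describe how the file set was serialized before hashing, so the following is only a sketch of one way such a fingerprint could be built with Python's hashlib. The function name, the sort-by-site ordering, and the length-prefixed framing are all assumptions, and the sketch is not expected to reproduce the published value. It returns the full 64-character hex digest, whereas the identifier printed in this report is 16 characters long.

```python
import hashlib


def snapshot_fingerprint(files: dict[str, bytes]) -> str:
    """Hash a set of robots.txt bodies into one order-independent digest.

    Assumed scheme: sites are visited in sorted order, and each entry is
    framed as site name, a NUL byte, an 8-byte body length, then the body.
    """
    digest = hashlib.sha256()
    for site in sorted(files):
        body = files[site]
        digest.update(site.encode("utf-8") + b"\0")
        digest.update(len(body).to_bytes(8, "big"))
        digest.update(body)
    return digest.hexdigest()


# Hypothetical two-site file set, for illustration only.
files = {
    "b.example": b"User-agent: GPTBot\nDisallow: /\n",
    "a.example": b"User-agent: *\nAllow: /\n",
}
print(snapshot_fingerprint(files))
```

Sorting the site names makes the digest independent of crawl order, and the length prefix keeps two different file sets from serializing to the same byte stream.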

Key Takeaways

Of 7 Surfing sites with a parseable robots.txt, 3 block at least one AI crawler — a 42.9% block rate.
The named blockers are surfline.com, worldsurfleague.com, and beachgrit.com; open sites include theinertia.com and surfer.com.
Surfing runs above the 32.8% corpus rate, tied with Running and Fashion at 42.9%.
Corpus-wide, 220 of 670 sites block at least one AI crawler, and CCBot leads the bot list at 162 sites.
Three Surfing sites returned no robots.txt and are reported as no-policy, not as blockers.

Frequently Asked Questions

Q: If surfline.com blocks a crawler, can it really keep AI out?

A: Not technically. robots.txt is an honor-system file: cooperating AI crawlers read it and stand down, but it cannot physically block a request. When surfline.com disallows an agent, well-behaved operators honor it — the file is a posted request, not a locked gate.
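The honor system is easiest to see from the crawler's side: the check runs inside the crawler, and nothing on the server enforces it. The sketch below uses Python's standard `urllib.robotparser` on a made-up policy. The `ROBOTS` text, the `example.com` URL, the `SomeOtherBot` name, and the `may_fetch` helper are hypothetical and do not reproduce any surfing site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: one named AI crawler is disallowed, everyone else is allowed.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the parsed policy whether `agent` is permitted to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "https://example.com/forecast"
print(may_fetch(ROBOTS, "GPTBot", url))        # False: a cooperating crawler stands down
print(may_fetch(ROBOTS, "SomeOtherBot", url))  # True: falls through to the wildcard group
```

A crawler that never calls a check like this can still send the request, which is why the file works as a posted request rather than a lock.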

Q: Why do the surfing blockers skew toward forecasting and the pro tour?

A: surfline.com, worldsurfleague.com, and beachgrit.com own the sport's costliest assets — swell models, live event coverage, original reporting. Gating protects that. Publishers like theinertia.com and surfer.com instead want to be quoted, so an open policy serves their reach.

Q: What does it mean that surftoday.com and swellnet.com returned no robots.txt?

A: A missing file is not a block. By default an absent robots.txt leaves a site open to any crawler that asks. We report surftoday.com, swellnet.com, and coastalwatch.com as no-policy because there is no sealed rule to read — we never read a block into silence.

Q: How does surfing's 42.9% compare with neighboring categories?

A: It is above the 32.8% corpus rate and ties Running and Fashion exactly. Surfing sits just under the 44.4% band of Birding and Automotive and just above the 40% group of Social and Sports — squarely among consumer-interest verticals.

For anyone working in surf media or surf tech, the takeaway is that AI access in this vertical is a moving boundary, not a settled fact. The forecasting and pro-tour properties are gating today; the publishers are open today; and the three no-policy sites could land in either column the moment they write a rule.

A single snapshot answers the question "where do things stand now," but the decision that actually affects a business — whether to be a cited source, a licensed feed, or a fenced-off one — is one each site can revise at any time. That is why a point-in-time count is the anchor and ongoing monitoring is the product.

Put AI-Access Data to Work

A surf-forecast SaaS product manager should run this as a standing job: re-crawl surfline.com and the surfing set weekly and alert the instant a rival forecaster adds GPTBot or CCBot to its disallow list — early evidence a competitor is pulling its swell data out of AI answers and a chance to become the cited source. A surf-media partnerships lead can watch whether worldsurfleague.com and beachgrit.com tighten policy before negotiating content syndication. A retrieval-AI engineer building an outdoor-sports assistant needs the same feed to know which surf sources are open to index versus disallowed right now.
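The core of such a standing job is a diff between two crawls. The sketch below shows that comparison in Python under simple assumptions: each crawl is reduced to a mapping from site to the set of agents it disallows, the `.example` site names are placeholders, and `WATCHED` and `new_disallows` are hypothetical names rather than part of any product API.

```python
# Agents worth alerting on; GPTBot and CCBot are the two named in the text.
WATCHED = {"GPTBot", "CCBot"}


def new_disallows(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
    watched: set[str] = WATCHED,
) -> dict[str, set[str]]:
    """Return, per site, the watched agents disallowed now but not in the prior crawl."""
    alerts: dict[str, set[str]] = {}
    for site, agents in current.items():
        added = (agents - previous.get(site, set())) & watched
        if added:
            alerts[site] = added
    return alerts


# Placeholder crawl results for two weeks.
last_week = {"forecaster-a.example": set(), "forecaster-b.example": {"CCBot"}}
this_week = {"forecaster-a.example": {"GPTBot"}, "forecaster-b.example": {"CCBot"}}

alerts = new_disallows(last_week, this_week)
for site, agents in sorted(alerts.items()):
    print(f"{site} newly disallows: {', '.join(sorted(agents))}")
```

Only additions trigger an alert here; a site that relaxes its policy would need the reverse set difference.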

US Tech Automations automates that monitoring with scheduled robots.txt and llms.txt crawls, change alerts, and an AI-access policy dashboard that surfaces drift the day it lands. See it run inside our agentic workflows platform.

Curious how Surfing sites compare across every vertical? Our flagship study tracks how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha c60e706824d5d127).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Surfing Sites Block AI Crawlers? 3 of 7 Do.” https://ustechautomations.com/resources/blog/do-surfing-sites-block-ai-crawlers-2026

Sealed snapshot sha256: c60e706824d5d127
