Do Mycology Sites Block AI Crawlers? 2 of 10 Do

Jun 14, 2026

Mycology is one of the more open verticals in this edition. Every one of the 10 mushroom and fungi sites we checked returned a parseable robots.txt file, and only 2 of them ask any AI crawler to stay out. That works out to a 20% block rate, comfortably below the 29.7% corpus-wide rate and a long way from the heavily gated News (82.4%) and Gaming (88.9%) categories.

The distinctive fact here is the coverage: there are no missing-policy sites in this slice at all. Where most categories have at least one site that published nothing, every mushroom site we checked had a robots.txt we could read. A robots.txt is the plain-text file a site publishes to tell automated crawlers which paths they may fetch, and in Mycology, almost all of those files leave the door open.

2 of 10 Mycology sites block at least one AI crawler.

Which Mushroom Sites Are Blocking — and Which Are Not

Only two sites carry the block. mushroomobserver.org, a community observation database, and shroomery.org, a long-running forum, are the two that disallow at least one AI crawler. Both are archive-and-discussion sites whose value is the accumulated, user-generated record — exactly the kind of corpus a site has reason to fence off from bulk model training.

Everything else stays open. mushroomexpert.com, namyco.org, fungi.com, mushroomappreciation.com, foragerchef.com, freshcap.com, northspore.com, and fieldforest.net all returned a robots.txt that places no restriction on AI crawlers. That group spans identification references, a society, foraging guides, and grow-supply retailers — a broad mix that has not moved to gate the model trainers.

There were no sites in this category without a policy file, so coverage here is complete: 10 sites checked, 10 readable.

A word on what "block" counts as here. We mark a site as a blocker only if its robots.txt names at least one AI crawler or AI operator in a disallow rule. Restricting an ordinary search bot does not qualify; disallowing even a single AI agent does. So the 2 Mycology blockers made a deliberate, recorded choice about machine-learning access, while the 8 allowers either welcome AI crawlers or have simply never added a rule against them — a distinction the raw count cannot separate, and one a future snapshot would expose if any of them changed course.
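The classification rule above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the pipeline behind the report: the token list is an assumption (it holds only the five crawlers named in the leaderboard later in this report), the function names are hypothetical, and treating any non-empty Disallow path as a block is also an assumption, since the report does not say how partial-path rules are scored.

```python
# Assumed token list: only the five crawlers named in this report's leaderboard.
AI_TOKENS = {"ccbot", "claudebot", "gptbot", "bytespider", "meta-externalagent"}


def blocked_ai_agents(robots_txt: str) -> set[str]:
    """Return AI user-agent tokens that sit in a group with a non-empty Disallow."""
    blocked: set[str] = set()
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # an empty Disallow permits everything
                blocked.update(a for a in agents if a in AI_TOKENS)
    return blocked


def is_blocker(robots_txt: str) -> bool:
    """A site counts as a blocker if at least one AI token is disallowed."""
    return bool(blocked_ai_agents(robots_txt))
```

Under this sketch a file that restricts only `Googlebot` or the wildcard `*` agent is not a blocker, which matches the rule that restricting an ordinary search bot does not qualify.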

Mycology Site	AI-Crawler Posture
mushroomobserver.org	Blocks at least one
shroomery.org	Blocks at least one
mushroomexpert.com	Allows all
namyco.org	Allows all
fungi.com	Allows all
mushroomappreciation.com	Allows all
foragerchef.com	Allows all
freshcap.com	Allows all
northspore.com	Allows all
fieldforest.net	Allows all

All 10 Mycology sites returned a parseable robots.txt, and only 2 of them block at least one AI crawler.

Where This Sits in the Corpus

A 20% rate places Mycology in the low-blocking tier. It shares the figure with Podcasts, Tattoo, Printing3D, and Sewing, with the 22.2% cluster just above it and the 18.2% cluster just below. For a sense of how a closely related rock-and-fossil hobby compares, the Rockhounding AI-access report sits one rung up at 22.2%.

Mycology sites post a 20% AI-crawler block rate.

Category	Sites	With robots.txt	Block ≥1	Block Rate
Rockhounding	10	9	2	22.2%
Podcasts	10	10	2	20%
Tattoo	10	5	1	20%
Printing3D	10	10	2	20%
Sewing	10	10	2	20%
Mycology	10	10	2	20%
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
ReefKeeping	9	6	1	16.7%
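The block rates in the table above are consistent with dividing the number of blockers by the number of sites that returned a robots.txt, not by the number of sites checked. That is why Tattoo, with 1 blocker among 5 readable files, lands on the same 20% as Mycology's 2 of 10. A minimal Python check of that arithmetic, using only the counts from the table (the variable and function names are illustrative):

```python
# (sites with a readable robots.txt, sites blocking >= 1 AI crawler), from the table
NEIGHBORS = {
    "Rockhounding": (9, 2),
    "Podcasts": (10, 2),
    "Tattoo": (5, 1),
    "Printing3D": (10, 2),
    "Sewing": (10, 2),
    "Mycology": (10, 2),
    "Finance": (11, 2),
    "Retail": (12, 2),
    "ReefKeeping": (6, 1),
}


def block_rate(with_policy: int, blockers: int) -> float:
    """Percentage of sites with a policy file that block at least one AI crawler."""
    return round(100 * blockers / with_policy, 1)


rates = {name: block_rate(*counts) for name, counts in NEIGHBORS.items()}
```

The same function reproduces the corpus-wide figure: `block_rate(934, 277)` gives 29.7.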

The floor and ceiling of the full 112-category set are stark: the top categories gate most of their sites, while several block none at all.

Category	Block Rate
Gaming	88.9%
News	82.4%
Streaming	0%
Pickleball	0%

Of the 10 Mycology sites checked, all 10 returned a parseable robots.txt — complete coverage with no missing-policy state.

The neighbor window is the right lens for a 20% figure. Mycology shares the rate with Podcasts, Printing3D, and Sewing, media and maker categories whose value also leans on being found, and with Tattoo, where only 5 of 10 sites returned a robots.txt and 1 of those blocks. It sits just above the 18.2% Finance band and just below the 22.2% Rockhounding band. That places mushroom sites firmly in the low-gating tier, among verticals where openness is the norm and a block is the exception rather than the posture.

Why Mycology Lands Where It Does

Across all 934 sites with a published policy, 277 — 29.7% — block at least one AI crawler. Mycology's 20% sits below that line, so the typical mushroom site gates less than the typical site overall. The category is largely informational: identification guides, foraging content, and grow-supply storefronts generally want maximum discoverability, including inside AI answers, because visibility drives both authority and sales.

The two exceptions fit the pattern that holds across the corpus. The blockers are the community-data sites, where the accumulated observation and forum record is the asset worth protecting. For a hobby that gates even less, see the Reef Keeping AI-access report, where the block rate is 16.7%.

Corpus-wide, 277 of 934 sites block at least one AI crawler.

The low rate carries a real implication for how mushroom knowledge flows into AI answers. Because eight of ten sites stay open, the bulk of the category's identification guidance, foraging advice, and grow-supply information remains available to the models that increasingly field these questions directly.

For a hobby where a misidentification can be dangerous, that openness cuts both ways: it keeps authoritative sources in the training pool, but it also means the two cautious community archives — the ones with the deepest user-contributed records — are the exceptions choosing to hold their data back. A sealed baseline like this one is what makes it possible to notice if that balance ever tips.

Which Bots Are Blocked Most

When a Mycology site does block, it picks from the same crawler menu as every other category. Measured across all 934 sites, the bot leaderboard is led by the bulk-crawl agents rather than the live retrieval bots.

Bot	Sites Blocking (all 934)
CCBot	204
ClaudeBot	181
GPTBot	181
Bytespider	175
Meta-ExternalAgent	155

CCBot — Common Crawl's agent — leads, which tracks with the broader finding that sites gate the training crawlers more often than the answer engines. The two Mycology blockers follow that same logic: they are protecting a corpus, not chasing a specific operator. A higher-blocking hobby that disallows the same bots far more aggressively is the Radio Control AI-access report, where half the sites gate.

The openness of the eight allowers is the category's defining feature, and it is worth reading their composition. mushroomexpert.com and namyco.org trade on authority — being cited, summarized, and surfaced is the whole point — while foragerchef.com and mushroomappreciation.com are content properties that grow by reach.

northspore.com, freshcap.com, fungi.com, and fieldforest.net are grow-supply retailers, and for a storefront an AI answer that names your product is closer to a free listing than a leak. None of them has a clear incentive to fence content off, which is why the category's block rate stays low and concentrated in the two community-data archives.

Frequently Asked Questions

Q: Why does Mycology have such a low block rate?

A: Most mushroom sites are informational or commercial (identification references, foraging guides, and grow-supply shops), and that kind of site usually wants maximum reach, including inside AI answers. Only 2 of 10 sites block any AI crawler, and both are community-data archives. Discoverability tends to outweigh the impulse to fence content off here.

Q: Which Mycology sites block AI crawlers?

A: Two: mushroomobserver.org, an observation database, and shroomery.org, a forum. Both are user-generated archives whose value is the accumulated record, which gives them a clearer reason to gate bulk model training than a reference page or a storefront has.

Q: Does every Mycology site really have a robots.txt?

A: Yes — all 10 sites we checked returned a parseable robots.txt file, so coverage in this category is complete. That is unusual; many categories include at least one site that published no policy. Here there is no missing-policy state to report.

Q: Does listing a crawler in robots.txt force it to stop?

A: No. The file is an honor-system standard that compliant crawlers read and obey, but it blocks nothing at the network level. A disallow rule is a request; enforcement depends on the operator choosing to comply. Every count here is a verbatim reading of those published files — nothing is estimated, modeled, or extrapolated.

Methodology

Each number is a verbatim count from a public robots.txt file, fetched once and sealed into a content-addressed snapshot (sha 760275d49a628cc3) on June 14, 2026. The edition spans 1117 sites across 112 categories, of which 934 returned a parseable robots.txt. Our research team recorded, per site, which named AI crawlers and operators appear in disallow rules. We did not infer intent or fill gaps: nothing is estimated, modeled, or extrapolated. This captures one sealed day, not a trajectory.

The sealing step is what makes the figures auditable. Rather than re-querying sites whenever someone asks, we freeze the raw robots.txt responses and hash them, so the exact bytes behind every count in this report can be reproduced later. That discipline is the reason we can say a category had complete coverage, or that a particular site published no policy, without hedging — the snapshot is the record, and the same method applies identically to every one of the 112 categories in the edition.
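To make the freeze-and-hash idea concrete, here is a minimal Python sketch of one way a content-addressed seal could work. The report does not describe how its snapshot identifier is derived, so everything here is an assumption: the `seal` function, the manifest layout, and the choice to hash a sorted JSON manifest of per-site SHA-256 digests are illustrative only.

```python
import hashlib
import json


def seal(responses: dict[str, bytes]) -> str:
    """Hash each raw robots.txt body, then hash a canonical manifest of the results.

    `responses` maps a site name to the exact bytes that were fetched.
    Hypothetical scheme: the report's real sealing format is not documented.
    """
    manifest = {
        site: hashlib.sha256(body).hexdigest() for site, body in responses.items()
    }
    # Sorted keys and fixed separators make the manifest bytes deterministic,
    # so the same set of responses always yields the same snapshot id.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the identifier depends on every byte of every response, changing a single character in any stored file produces a different digest, which is what lets a later reader confirm that the counts were taken from the frozen files and not from a fresh crawl.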

Put AI-Access Data to Work

The natural buyer for this slice is a market-research or data-licensing lead who tracks AI-access posture across many verticals to price and scope content deals. Their recurring, automatable workflow: re-crawl this Mycology set on a fixed weekly cadence and trigger a notice when any of the eight permissive sites — northspore.com or fungi.com, say — adds an AI-crawler token to its disallow list, because each new block narrows the pool of openly licensable mushroom content.
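The alerting step in that workflow reduces to a set difference between two crawls. The Python sketch below assumes each crawl has already been reduced to a mapping from site to the set of AI-crawler tokens it disallows. The function name, the data shape, and the placeholder site names are all hypothetical and do not describe any site's actual policy.

```python
def new_blocks(
    previous: dict[str, set[str]], current: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return, per site, the AI-crawler tokens disallowed now but not last crawl."""
    added: dict[str, set[str]] = {}
    for site, tokens in current.items():
        fresh = tokens - previous.get(site, set())
        if fresh:
            added[site] = fresh
    return added


# Invented example data: placeholder sites, not real crawl results.
last_week = {"site-a.example": set(), "site-b.example": {"ccbot"}}
this_week = {"site-a.example": {"gptbot"}, "site-b.example": {"ccbot"}}

for site, tokens in new_blocks(last_week, this_week).items():
    print(f"{site} added AI-crawler blocks: {sorted(tokens)}")
```

Comparing against the previous crawl, not against the original sealed baseline, means each change triggers a notice once and is not repeated every week.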

A second, category-native audience is the operations lead at a direct-to-consumer mushroom grow-kit and spawn business, who can use the same weekly crawl to watch whether the retail and reference sites their store competes against are gating the answer engines that shape buyer research. US Tech Automations runs this as scheduled robots.txt and llms.txt monitoring with change alerts. See the agentic workflows that automate it.

Key Takeaways

All 10 Mycology sites returned a parseable robots.txt, and 2 of them block at least one AI crawler — a 20% rate.
The blockers are mushroomobserver.org and shroomery.org, both community-data archives.
Coverage is complete here: there were no sites without a policy file.
Mycology's 20% sits below the 29.7% corpus-wide rate, consistent with its informational, discoverability-seeking mix.
The training crawlers — CCBot, ClaudeBot, GPTBot — are the most-blocked bots across all 934 sites.

See where Mycology sites fit in the broader trend in our study of how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 760275d49a628cc3).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Mycology Sites Block AI Crawlers? 2 of 10 Do.” https://ustechautomations.com/resources/blog/do-mycology-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 760275d49a628cc3
