Do Soapmaking Sites Block AI Crawlers? 1 of 6 Do

Jun 14, 2026

Almost nobody in the soapmaking world has decided AI crawlers are a problem. We checked 10 Soapmaking sites; only one of the six that publish a robots.txt has written a single rule that turns an AI bot away. The rest of the published policies leave the welcome mat out for every crawler we looked for.

That makes Soapmaking one of the quieter corners of this snapshot. A robots.txt file is a plain-text note at the root of a website that names which automated visitors may fetch which paths. An AI crawler is an automated reader that pulls pages to feed a model. When only one soap site bothers to gate those readers, the signal is less about strategy and more about a hobby that hasn't yet treated its tutorials as something to wall off.

This report is a single sealed reading, not a trend line. We fetched each site's public robots.txt once, on one day, and counted what the files literally say — which crawlers they name in a disallow rule and which they leave unmentioned. There is no projection here and no comparison to a past month, because the snapshot does not contain one. The value is in the verbatim count: of ten soap sites, six published a readable file, and within that group the blocking is confined to a single community forum. Everything below builds on those exact figures.
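The counting rule described above can be sketched in a few lines. This is a minimal illustration, not the pipeline behind the report: the `AI_TOKENS` set, the function name `blocked_ai_agents`, and the sample file are all assumptions. It counts only crawlers that a file names explicitly, so a blanket `User-agent: *` rule is deliberately left out.

```python
# Hypothetical list of AI crawler tokens; the report does not publish its own list.
AI_TOKENS = {"gptbot", "claudebot", "ccbot"}


def blocked_ai_agents(robots_txt: str) -> set:
    """Return AI tokens that appear in a group with a non-empty Disallow rule."""
    blocked = set()
    group_agents = []   # user-agents named by the current group
    in_rules = False    # True once the current group has listed a rule

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:   # a user-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:   # an empty Disallow allows everything
                blocked.update(a for a in group_agents if a in AI_TOKENS)
    return blocked


# Invented sample file, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""

print(sorted(blocked_ai_agents(SAMPLE)))
```

A site counts as "blocking at least one AI crawler" under this sketch whenever the returned set is non-empty.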

1 of 6 Soapmaking sites block at least one AI crawler.

What This Block Rate Actually Means

Of the 10 Soapmaking sites we checked, 6 returned a parseable robots.txt file. Within that group, exactly one — soapmakingforum.com — carries a rule that disallows an AI user-agent. The lone gatekeeper being a community forum is the most telling detail in the slice: forums host years of member-written threads, and that accumulated, searchable knowledge is precisely the kind of corpus a model owner wants and a moderator might want to protect.

The five sites with a published policy that allows everything tell the opposite story. wholesalesuppliesplus.com, soapqueen.com, soapguild.org, lovinsoap.com, and thenerdyfarmwife.com each returned a robots.txt that names no AI user-agent at all. For a supplier, a guild, or a recipe blog, open indexing is usually the goal — being quoted by an answer engine is free reach, not a leak.

At 16.7%, Soapmaking sits well below the corpus-wide rate of 28.7%, and it keeps company with other low-blocking crafts. For a sense of where the wider corpus lands, the open posture across embroidery sites marks the most permissive end of the same picture, while crafts with more commercial scale gate more.

Within Soapmaking, the only published block belongs to a community forum, not a supplier or a recipe blog.

Who Is Blocking — and Who Is Not

The named breakdown is short enough to read in full, which is part of what makes the slice easy to interpret. One site gates a crawler; five publish a policy and wave everyone through; four returned nothing parseable at all and so express no preference either way.

| Soapmaking Site | Published Policy | AI Crawler Stance |
|---|---|---|
| soapmakingforum.com | robots.txt present | Blocks at least one AI crawler |
| wholesalesuppliesplus.com | robots.txt present | Allows all crawlers |
| soapqueen.com | robots.txt present | Allows all crawlers |
| soapguild.org | robots.txt present | Allows all crawlers |
| lovinsoap.com | robots.txt present | Allows all crawlers |
| thenerdyfarmwife.com | robots.txt present | Allows all crawlers |

The four sites that returned no parseable robots.txt (brambleberry.com, soap-making-resource.com, modernsoapmaking.com, and auntieclaras.com) are worth naming because their silence is not a decision. A missing file means crawlers default to "allowed" under the Robots Exclusion Protocol (RFC 9309), so functionally these read open, but a single future edit could change that. They are the sites most likely to move first if the category ever tightens.
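What a compliant crawler assumes when no file can be read depends on why the fetch failed. The sketch below follows the defaults described in RFC 9309. The function name `default_access` and its return labels are hypothetical, and the snapshot does not record which HTTP status the four sites returned, so treat the "reads open" interpretation as applying to the missing-file (4xx) case.

```python
def default_access(status_code: int) -> str:
    """Map a robots.txt HTTP status to the default a compliant crawler assumes."""
    if 200 <= status_code < 300:
        return "parse file"      # a file exists, so its rules decide
    if 400 <= status_code < 500:
        return "allow all"       # file unavailable: the crawler may fetch anything
    if 500 <= status_code < 600:
        return "disallow all"    # server error: assume a complete disallow
    return "undetermined"        # redirects and other cases need extra handling


for code in (200, 404, 503):
    print(code, default_access(code))
```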

It is worth dwelling on what the lone blocker chose to gate. A forum disallow rule typically targets the broad-archive crawlers first — the ones whose collected text propagates into the widest set of downstream models. That is a measured move, not a blanket lockout: the goal is usually to keep years of community answers out of a training corpus while leaving ordinary search indexing intact. The other five published files simply never reach that decision point, naming no AI user-agent at all, which is why the category reads as open as it does.

The shape of the readable set also matters for how much weight to put on the number. With only six sites carrying a parseable policy, a single site's edit swings the rate meaningfully, so the 16.7% figure is best read as a point-in-time snapshot of a small, lightly-policed category rather than a durable equilibrium. The signal here is the absence of activity: a craft that has not, on the whole, decided AI access is a question worth answering in robots.txt.
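The arithmetic behind the 16.7% figure, and how far a single edit would move it, can be checked directly. The helper name `block_rate` is an assumption; the counts are the ones reported above.

```python
def block_rate(blocking: int, with_policy: int) -> float:
    """Percentage of sites with a parseable robots.txt that block an AI crawler."""
    return round(100 * blocking / with_policy, 1)


current = block_rate(1, 6)          # the snapshot: 1 blocker among 6 readable files
if_forum_opens = block_rate(0, 6)   # the lone blocker removes its rule
if_one_more = block_rate(2, 6)      # one currently open site adds a disallow rule

print(current, if_forum_opens, if_one_more)   # prints: 16.7 0.0 33.3
```

With a base of six, each site is worth roughly 16.7 percentage points, which is why the figure is best treated as a point-in-time reading.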

Soapmaking sites post a 16.7% AI-crawler block rate.

Where Soapmaking Sits Among Similar Hobbies

To place 16.7% in context, here is the category alongside its nearest neighbors in the block-rate ranking — the few crafts and verticals filing right around the same line, plus a couple sitting just above and below it.

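| Category | Sites With robots.txt | Block At Least One | Block Rate |
|---|---|---|---|
| Mycology | 10 | 2 | 20% |
| Finance | 11 | 2 | 18.2% |
| Retail | 12 | 2 | 16.7% |
| ReefKeeping | 6 | 1 | 16.7% |
| Soapmaking | 6 | 1 | 16.7% |
| Education | 7 | 1 | 14.3% |
| Government | 8 | 1 | 12.5% |
| Crypto | 8 | 1 | 12.5% |
| Homebrewing | 8 | 1 | 12.5% |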

Reading the window top to bottom tells a small story. Mycology and Finance edge just above Soapmaking, Education and the cluster of 12.5% categories fall just below, and the whole band lives in the lightly-policed lower third of the snapshot. There is no sharp boundary here: with only six readable files, moving one site between "blocks" and "allows" would carry Soapmaking to 33.3% or 0%, past every neighbor in this window. That fragility is the honest read of any small-base category: the rank is real but shallow, and the durable signal is the cluster it belongs to, not its exact position within it.
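To see how shallow the rank is, the window can be re-sorted after changing a single site's stance. This is a small sketch over the nine rows in the table; the helper names `ranked`, `position`, and `with_blockers` are assumptions introduced for illustration.

```python
# (category, sites with robots.txt, sites blocking), copied from the table.
ROWS = [
    ("Mycology", 10, 2), ("Finance", 11, 2), ("Retail", 12, 2),
    ("ReefKeeping", 6, 1), ("Soapmaking", 6, 1), ("Education", 7, 1),
    ("Government", 8, 1), ("Crypto", 8, 1), ("Homebrewing", 8, 1),
]


def ranked(rows):
    """Sort categories by block rate, highest first; ties keep their input order."""
    return sorted(rows, key=lambda r: r[2] / r[1], reverse=True)


def position(rows, name):
    """Zero-based row of a category within the ranked window."""
    return [r[0] for r in ranked(rows)].index(name)


def with_blockers(rows, name, blocking):
    """Return a copy of rows with one category's blocking count replaced."""
    return [(c, n, blocking if c == name else b) for c, n, b in rows]


print(position(ROWS, "Soapmaking"))                                   # as published
print(position(with_blockers(ROWS, "Soapmaking", 2), "Soapmaking"))   # one more blocker
print(position(with_blockers(ROWS, "Soapmaking", 0), "Soapmaking"))   # forum drops its rule
```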

Soapmaking shares its exact rate with ReefKeeping and Retail, and it clusters near the bottom of the ranking with other low-key, low-blocking verticals. A craft like the orchid-growing slice we measured lands even lower, while a more performance-minded vertical such as Magic gates at a much higher rate. The grouping confirms the read: hobby and supply categories tend to leave their published policies open.

For the extremes, the contrast is stark — the heaviest blockers and the cleanest-zero categories anchor the two ends of the same snapshot.

| Category | Block Rate |
|---|---|
| Gaming | 88.9% |
| News | 81.3% |
| Geocaching | 0% |
| Pickleball | 0% |

The Operator-Level Picture Across the Corpus

Even though Soapmaking itself gates almost nothing, the corpus-wide pattern shows which operators get disallowed most often when sites do decide to act. The operator leaderboard counts every site that named each operator's crawler in a disallow rule, across all 993 sites.

| Operator | Sites Blocking (all 993 sites) |
|---|---|
| Common Crawl | 211 |
| Anthropic | 201 |
| OpenAI | 193 |
| Meta | 184 |
| ByteDance | 183 |

Common Crawl leads because its archive feeds many downstream models, so blocking it is the broadest single move a site can make. That ordering is corpus-wide and does not reflect Soapmaking, where the one block is a single forum's choice rather than a category trend.

The gap between a category like Soapmaking and the corpus aggregate is instructive. The 285 sites that block across the snapshot are concentrated in news, gaming, and information-heavy verticals where the published text is the product. Crafts that sell physical goods — soap molds, lye, fragrance oils, finished bars — see open indexing as a marketing channel, not a leak, so they rarely appear in that blocking total.

The same dynamic shows up in the clean-zero geocaching slice, where a findability-first hobby blocks nothing at all. Soapmaking is one step up from that floor: almost open, with a single community forum as the exception that proves the pattern. The shared trait across these low-blocking verticals is that their commercial value lives in transactions and physical product, not in the words on the page, so the words stay freely crawlable.

Corpus-wide, 285 of 993 sites block at least one AI crawler — Soapmaking sits far below that line.

Key Takeaways

  • Of 10 Soapmaking sites checked, 6 returned a parseable robots.txt, and only soapmakingforum.com blocks any AI crawler.

  • The 16.7% block rate puts Soapmaking among the most permissive verticals in the snapshot, tied with ReefKeeping and Retail.

  • Five sites publish an open policy; four returned nothing parseable, so they read open by default and could shift with one edit.

  • Corpus-wide, 285 of 993 sites — 28.7% — block at least one AI crawler, well above this category.

Frequently Asked Questions

Q: Does blocking a crawler in robots.txt actually stop it?

A: No. robots.txt is an honor-system convention: it advertises a preference, and well-behaved crawlers respect it, but the file enforces nothing on its own. A bot that ignores the rule still reaches the page. So soapmakingforum.com's single block is a request that compliant operators honor, not a wall.
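The honor system is easiest to see from the crawler's side, where the check is something the bot chooses to run. The sketch below uses Python's standard `urllib.robotparser`; the rules and the URL are invented for illustration and are not the actual soapmakingforum.com file, and `may_fetch` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_fetch(agent: str, url: str) -> bool:
    """A compliant crawler asks this before requesting a page."""
    return parser.can_fetch(agent, url)


print(may_fetch("GPTBot", "https://example.com/threads/1"))      # False
print(may_fetch("ExampleBot", "https://example.com/threads/1"))  # True
```

Nothing on the server runs this check. A crawler that never calls it simply requests the page, which is why the rule is a request and not a wall.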

Q: Why does only one Soapmaking site block any AI crawler?

A: The lone blocker is a community forum, and forums host years of member-written threads worth gating. Suppliers, the guild, and recipe blogs — wholesalesuppliesplus.com, soapguild.org, and the rest — generally want the free reach that open indexing brings, so they leave every crawler allowed.

Q: What does it mean that four soap sites returned no robots.txt?

A: brambleberry.com, soap-making-resource.com, modernsoapmaking.com, and auntieclaras.com returned nothing parseable. Under the standard, a missing file means crawlers are allowed by default, so they read open — but that is an absence of a decision, and any of them could add a blocking rule later.

Q: Is a 16.7% block rate high or low for this snapshot?

A: Low. The corpus-wide rate is 28.7%, so Soapmaking sits comfortably below it, tied with ReefKeeping and Retail and clustered with other hobby and supply verticals. These figures are verbatim counts from sealed public robots.txt files; nothing is estimated, modeled, or extrapolated.

Put AI-Access Data to Work

The first buyer this data fits is a horizontal one: an AI-search and generative engine optimization (GEO) agency tracking which client-eligible sites stay open to model crawlers. For an agency, the recurring job is a weekly re-crawl of a watchlist that includes soapmakingforum.com, with an alert the moment a site that currently allows everything (say wholesalesuppliesplus.com or soapqueen.com) adds a GPTBot or ClaudeBot token to its disallow list, because that flips whether a client's content can be cited in an answer engine. A brand-intelligence analyst can run the same cadence across many categories to watch AI-access drift.
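The alert described here amounts to comparing two saved copies of the same robots.txt. The following is a rough sketch of that comparison using the standard `urllib.robotparser`, not a description of any vendor's implementation: the `WATCHED` tuple, the helper names, the placeholder URL, and both sample files are assumptions.

```python
from urllib.robotparser import RobotFileParser

WATCHED = ("GPTBot", "ClaudeBot")   # tokens named in the text; extend as needed


def allowed_agents(robots_txt: str, url: str = "https://example.com/") -> set:
    """Return the watched agents that may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent for agent in WATCHED if parser.can_fetch(agent, url)}


def newly_blocked(before: str, after: str) -> set:
    """Agents allowed under the old file but disallowed under the new one."""
    return allowed_agents(before) - allowed_agents(after)


# Invented snapshots: last week's open policy and this week's edit.
LAST_WEEK = "User-agent: *\nDisallow:\n"
THIS_WEEK = LAST_WEEK + "\nUser-agent: GPTBot\nDisallow: /\n"

print(newly_blocked(LAST_WEEK, THIS_WEEK))   # {'GPTBot'}
```

Checking only the site root keeps the sketch short; a real watchlist would likely test the specific paths a client cares about.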

The category-native second ideal customer profile (ICP) is a soapmaking-supply catalog manager who sells lye, molds, and fragrance oils online. That role can monitor whether the recipe blogs and the guild it depends on for referral traffic stay crawlable, since their visibility in AI answers shapes top-of-funnel demand. US Tech Automations automates exactly this monitoring with scheduled robots.txt crawls, change alerts, and an AI-access dashboard. See how the platform runs that watch on our agentic workflows page.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 5d5458529dab2773).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Soapmaking Sites Block AI Crawlers? 1 of 6 Do.” https://ustechautomations.com/resources/blog/do-soapmaking-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 5d5458529dab2773

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.