Do Bonsai Sites Block AI Crawlers? 4 of 9 Do
Bonsai is patient, deliberate work, and its sites appear to treat AI access the same way — nearly half of those with a published policy say no to at least one crawler. Of the 10 bonsai sites in this snapshot, 9 returned a parseable robots.txt, and 4 of those block at least one AI crawler, a 44.4% block rate.
The interesting wrinkle is who blocks: the gaters are a mix of the hobby's largest forum, an encyclopedia-style reference, a supplier, and an international body. That spread of blocking across community, reference, and commerce is what makes bonsai stand out from the corpus average. This is a sealed, point-in-time read of public robots.txt files — no estimates, no projections.
4 of 9 Bonsai sites block at least one AI crawler.
A robots.txt file is the plain-text policy a website publishes to tell automated crawlers which paths they may fetch. To gate an AI crawler, the site adds a token like User-agent: Bytespider followed by Disallow: /. In bonsai, four sites carry such a directive against at least one AI user-agent.
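As a minimal sketch of how such a directive is read, Python's standard-library `urllib.robotparser` can evaluate a robots.txt body for a given user-agent. The file body and the `example.com` URL below are illustrative assumptions, not copied from any bonsai site in the snapshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: blocks Bytespider everywhere, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Placeholder URL; any path on the site is covered by "Disallow: /".
bytespider_ok = is_allowed(SAMPLE_ROBOTS_TXT, "Bytespider", "https://example.com/care-guide")
other_ok = is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "https://example.com/care-guide")
print(bytespider_ok, other_ok)  # False True
```

A site with this file would count as a blocker in the tally, because at least one AI user-agent is disallowed.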
Which Sites Are Blocking — and Which Are Not
The four blockers are bonsainut.com, bonsaiempire.com, bonsaioutlet.com, and bonsai-bci.com — a forum, a reference hub, a supply store, and an international association, respectively. Each published a robots.txt that disallows at least one named AI crawler, so the gating is not confined to one type of site.
The sites that returned a robots.txt and allow every AI crawler are bonsaitonight.com, easternleaf.com, dallasbonsai.com, kaizenbonsai.com, and stonelantern.com. Several are nurseries and tool retailers, where catalog discoverability favors a permissive policy.
One site, bonsai4me.com, returned no parseable robots.txt. A missing file is neither a block nor an allow; it means no crawl preference is published, which a compliant crawler reads as open by default.
| Bonsai Site | robots.txt Status |
|---|---|
| bonsainut.com | Blocks an AI crawler |
| bonsaiempire.com | Blocks an AI crawler |
| bonsaioutlet.com | Blocks an AI crawler |
| bonsai-bci.com | Blocks an AI crawler |
| bonsaitonight.com | Allows all AI crawlers |
| easternleaf.com | Allows all AI crawlers |
| dallasbonsai.com | Allows all AI crawlers |
| kaizenbonsai.com | Allows all AI crawlers |
| stonelantern.com | Allows all AI crawlers |
| bonsai4me.com | No parseable robots.txt |
What This Block Rate Actually Means
The corpus baseline frames the result. Across all 867 sites with a parseable robots.txt, 260 block at least one AI crawler — a 30% rate — so bonsai's 44.4% sits comfortably above the typical site. For a small horticultural hobby, that is a higher level of gating than one might expect.
The likely reason is the value of curated knowledge. Bonsai content is dense with technique — species care, wiring, repotting schedules — built over years by experienced growers and reference editors. The forum and the encyclopedia have a community-ownership instinct about that material, while the nurseries lean open because being found sells trees and tools. For a closely related enthusiast hobby sitting a step below bonsai in the ranking, see the metal detecting AI-crawler report.
The 44.4% should be read for what it counts. It is the share of sites with a parseable file that disallow at least one AI user-agent, not the share of pages protected, nor a measure of how strict each rule is. A site blocking one bot and a site blocking every AI crawler each count once. The figure measures the decision to gate, and bonsai is distinctive because that decision is spread across several kinds of site rather than concentrated in one type.
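The counting rule can be sketched in a few lines, assuming each site's result is either a set of disallowed AI user-agents or `None` when no parseable file was returned. The site names and bot sets below are placeholders chosen only to reproduce the 4-of-9 shape; they are not the snapshot's per-site data.

```python
from typing import Optional

# A set of disallowed AI user-agents, or None for "no parseable robots.txt".
SiteResult = Optional[set[str]]

def block_rate(results: dict[str, SiteResult]) -> tuple[int, int, float]:
    """Return (blockers, sites_with_file, rate_percent)."""
    # Sites without a parseable file leave the denominator entirely.
    with_file = {site: bots for site, bots in results.items() if bots is not None}
    # A site counts once whether it blocks one bot or many.
    blockers = sum(1 for bots in with_file.values() if bots)
    rate = 100 * blockers / len(with_file) if with_file else 0.0
    return blockers, len(with_file), round(rate, 1)

# Hypothetical category of 10 sites: 4 blockers, 5 open, 1 with no file.
sample: dict[str, SiteResult] = {
    "site-a": {"Bytespider"},
    "site-b": {"CCBot", "GPTBot", "ClaudeBot", "Bytespider", "Meta-ExternalAgent"},
    "site-c": {"GPTBot"},
    "site-d": {"CCBot", "GPTBot"},
    "site-e": set(), "site-f": set(), "site-g": set(),
    "site-h": set(), "site-i": set(),
    "site-j": None,
}
result = block_rate(sample)
corpus_rate = round(100 * 260 / 867, 1)  # corpus figures quoted in the report
print(result, corpus_rate)  # (4, 9, 44.4) 30.0
```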
Corpus-wide, 260 of 867 sites with a published policy block at least one AI crawler.
Where This Sits in the Corpus
Bonsai shares its 44.4% block rate with Cosplay and with a cluster of broader categories. The window below places bonsai among its neighbors in the ranking: the categories tied with it at 44.4% above, and those gating slightly less, at 42.9%, below.
| Category | Sites | With robots.txt | Block ≥1 AI Crawler | Block Rate |
|---|---|---|---|---|
| HomeGarden | 10 | 9 | 4 | 44.4% |
| Genealogy | 10 | 9 | 4 | 44.4% |
| Watches | 10 | 9 | 4 | 44.4% |
| Birding | 10 | 9 | 4 | 44.4% |
| Cosplay | 10 | 9 | 4 | 44.4% |
| Bonsai | 10 | 9 | 4 | 44.4% |
| Fashion | 9 | 7 | 3 | 42.9% |
| Running | 9 | 7 | 3 | 42.9% |
| Surfing | 10 | 7 | 3 | 42.9% |
| Metal Detecting | 10 | 7 | 3 | 42.9% |
At the extremes for scale, Gaming leads the whole ranking at 88.9% and Food sits at 70%, while categories such as Prepping and Pottery post 0%. Bonsai lands in the upper-middle, gating more than most enthusiast categories without approaching the heaviest.
The cluster bonsai sits in is broad. HomeGarden, Genealogy, Watches, Birding, and Cosplay all post 44.4%, each with 4 of 9 sites gating, so a small horticultural hobby matches categories with far more commercial weight. Just below, the 42.9% tier of Fashion, Running, Surfing, and Metal Detecting has 3 of 7 sites gating rather than 4 of 9, a reminder that at this sample size a single site's decision moves a whole category's published rate by more than ten percentage points. Bonsai's distinction is less the rate than the breadth of who chose to gate.
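To make the sample-size point concrete, the arithmetic below shows how far bonsai's rate would move if one site changed its policy. The `rate` helper is a hypothetical name used only for this illustration; the counts are the ones in the table.

```python
def rate(blockers: int, sites_with_file: int) -> float:
    """Block rate as a percentage, rounded to one decimal place."""
    return round(100 * blockers / sites_with_file, 1)

bonsai = rate(4, 9)      # the published bonsai figure
neighbor = rate(3, 7)    # the tier just below (Fashion, Running, Surfing, Metal Detecting)
one_fewer = rate(3, 9)   # bonsai if one blocker dropped its rule
one_more = rate(5, 9)    # bonsai if one open site added a rule

print(bonsai, neighbor, one_fewer, one_more)  # 44.4 42.9 33.3 55.6
```

With nine sites in the denominator, each site is worth about 11 percentage points, so the 1.5-point gap between the two tiers is well inside what a single policy change produces.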
There is a knowledge-stewardship angle that sets bonsai apart from purely commercial categories. The technique a grower documents — when to defoliate a maple, how to wire a juniper without scarring it — represents years of trial recorded for the community. A forum or encyclopedia that gates AI crawlers is treating that record as something the community built and should govern, not raw material for a model to absorb silently.
The nurseries, by contrast, want their care guides and catalogs surfaced wherever a beginner is asking, so they stay open. Bonsai is unusual in that both instincts are clearly present, which is why the gating spreads across site types rather than clustering in one.
Which Bots Are Blocked Most Across the Corpus
A bonsai site that blocks "an AI crawler" is naming a specific bot. The focused cut below shows the most-disallowed bots across all 867 sites; CCBot, the Common Crawl agent, tops the list.
| Bot | Sites Disallowing (all 867 sites) | Share |
|---|---|---|
| CCBot | 194 | 22.4% |
| ClaudeBot | 171 | 19.7% |
| GPTBot | 170 | 19.6% |
| Bytespider | 163 | 18.8% |
| Meta-ExternalAgent | 145 | 16.7% |
A reference site like bonsaiempire.com or a forum like bonsainut.com protecting its knowledge base typically names these same bots. For a category that gates at the identical 44.4% rate but for image-ownership reasons, see the cosplay AI-access report; for the open end of the spectrum, see the prepping crawler-blocking breakdown, where no site blocks at all.
The bot order is consistent across the corpus: CCBot at 194, ClaudeBot at 171, and GPTBot at 170 lead because the publishers who gate one tend to gate the cluster together. Bytespider at 163 and Meta-ExternalAgent at 145 round out the heavily-disallowed group. A bonsai forum disallowing AI crawlers is usually naming several of these in one file rather than singling out a bot.
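The shares in the bot table follow directly from the counts and the 867-site denominator, and a per-bot tally of this kind can be built with `collections.Counter`. In the sketch below, the `published` counts are copied from the table, while the `files` list is a hypothetical stand-in for parsed robots.txt results.

```python
from collections import Counter

SITES_WITH_FILE = 867  # corpus denominator from the report

# Counts copied from the table above.
published = {"CCBot": 194, "ClaudeBot": 171, "GPTBot": 170,
             "Bytespider": 163, "Meta-ExternalAgent": 145}
shares = {bot: round(100 * n / SITES_WITH_FILE, 1) for bot, n in published.items()}

# Hypothetical per-site disallow sets (placeholders, not snapshot data).
files = [
    {"CCBot", "GPTBot", "ClaudeBot"},  # one file naming several bots
    {"CCBot"},                         # one file naming a single bot
    set(),                             # a file that names none
]
# A file that names several bots adds one to each bot's count.
tally = Counter(bot for bots in files for bot in bots)

print(shares)
print(tally.most_common(1))  # [('CCBot', 2)]
```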
How the Snapshot Was Sealed
Every figure here is a verbatim count from a sealed snapshot of public robots.txt files captured 14 June 2026 and content-addressed with the sha 4247236167461a45. We fetch each site's published file, parse the AI user-agent tokens it disallows, and seal the count; nothing is estimated, modeled, or extrapolated. A site with no parseable file is recorded as exactly that — not a block, not a guess.
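The report does not describe how its snapshot identifier is derived, so the following is only a sketch of one common way to content-address a set of fetched files: hash a canonical serialization with SHA-256. The `seal` function, the JSON canonicalization, and the placeholder domains are all assumptions made for illustration.

```python
import hashlib
import json

def seal(snapshot: dict[str, str]) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of the snapshot.

    snapshot maps a domain to the robots.txt body fetched for it.
    """
    # Sorting keys makes the digest independent of fetch or insertion order.
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical snapshot contents (placeholder domains and bodies).
first = {"a.example": "User-agent: GPTBot\nDisallow: /\n", "b.example": ""}
reordered = {"b.example": "", "a.example": "User-agent: GPTBot\nDisallow: /\n"}
revised = {"a.example": "User-agent: *\nAllow: /\n", "b.example": ""}

same = seal(first) == seal(reordered)   # identical content, identical digest
changed = seal(first) != seal(revised)  # any edit to a file changes the digest
print(same, changed)  # True True
```

The useful property is that the identifier depends only on the captured content, so anyone holding the same files can recompute it and confirm the counts were taken from that exact capture.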
The edition behind this category spans 1038 sites overall across 104 content categories, 867 of them with a parseable robots.txt. The llms.txt signal, a newer AI-preferences file, appeared on 216 of those 867 sites, or 24.9%.
Sealing instead of re-querying live is what lets the 4 of 9 be checked again later against a fixed point. Robots.txt files change as sites revise them, so a live lookup drifts; a content-addressed snapshot pins the count to 14 June 2026. For bonsai, that fixed reference means a future snapshot can show, site by site, whether bonsainut.com or bonsaiempire.com held their blocks and whether any nursery flipped — change measured against a stable baseline rather than a moving target.
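A site-by-site comparison between two sealed snapshots could look like the sketch below. It assumes each snapshot has already been reduced to a mapping from domain to the set of AI user-agents it disallows; `diff_blocks` and the placeholder domains are hypothetical, and the sample data is invented to show the mechanics only.

```python
def diff_blocks(
    old: dict[str, set[str]], new: dict[str, set[str]]
) -> dict[str, dict[str, set[str]]]:
    """Return, per site, the AI user-agents added to or removed from its disallow list."""
    changes: dict[str, dict[str, set[str]]] = {}
    for site in sorted(old.keys() | new.keys()):
        before = old.get(site, set())
        after = new.get(site, set())
        if before != after:
            changes[site] = {"added": after - before, "removed": before - after}
    return changes

# Hypothetical snapshots: the forum holds its block, the nursery adds one.
baseline = {"forum.example": {"GPTBot", "CCBot"}, "nursery.example": set()}
later = {"forum.example": {"GPTBot", "CCBot"}, "nursery.example": {"GPTBot"}}

changes = diff_blocks(baseline, later)
print(changes)  # {'nursery.example': {'added': {'GPTBot'}, 'removed': set()}}
```

Sites whose disallow set is unchanged are left out of the result, so an empty dictionary means every site held its position between the two snapshots.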
Frequently Asked Questions
Q: Which bonsai sites block AI crawlers in this snapshot?
A: Four sites — bonsainut.com, bonsaiempire.com, bonsaioutlet.com, and bonsai-bci.com — disallow at least one AI user-agent. The five with a parseable file that allow all crawlers are bonsaitonight.com, easternleaf.com, dallasbonsai.com, kaizenbonsai.com, and stonelantern.com.
Q: Why does the blocking span a forum, a reference, and a store?
A: Bonsai's value is curated technique. The forum (bonsainut.com) and the encyclopedia (bonsaiempire.com) protect community-built knowledge, while the supplier (bonsaioutlet.com) and association (bonsai-bci.com) round out a varied set — gating is not limited to one site type here, which is unusual for a hobby.
Q: How does bonsai compare to the corpus baseline?
A: It gates more. The corpus-wide rate is 30% across 867 sites, while bonsai posts 44.4%, tied with Cosplay, HomeGarden, Genealogy, Watches, and Birding, and just above its 42.9% neighbors such as Fashion and Metal Detecting.
Q: Does a disallow rule actually stop a crawler from fetching the page?
A: Not on its own. robots.txt is an honor-system standard — a compliant crawler reads the file and respects the disallow, but the rule is a published request, not an enforced block. The 4 of 9 count reflects stated intent in the file, not technical exclusion.
Q: What does bonsai4me.com returning no file mean for the count?
A: bonsai4me.com published no parseable robots.txt, so it sits outside the 9 sites with a policy and is recorded as neither a block nor an allow. A missing file is read as open by default by compliant crawlers, but it is kept separate from the 4 blockers and the named allowers to keep the denominator accurate.
Put AI-Access Data to Work
The lead buyer for this data is a market-research or data-licensing lead tracking which corpora stay open to AI crawlers across many categories. For bonsai, the recurring job is to re-crawl the set weekly and alert the moment a permissive nursery such as kaizenbonsai.com or easternleaf.com adds a new AI user-agent to its disallow list, since a shift toward gating changes which sources a licensable dataset can include.
A category-native buyer fits second: a bonsai-nursery ecommerce manager confirming that catalog pages stay readable to AI shopping assistants while watching whether rivals like bonsaioutlet.com tighten access. US Tech Automations runs that watch as scheduled robots.txt and llms.txt crawls with change alerts on the exact domains you name. See the workflow on the agentic workflows platform.
Key Takeaways
Of 10 bonsai sites, 9 returned a parseable robots.txt and 4 of those block at least one AI crawler — a 44.4% rate.
The blockers span a forum, a reference, a store, and an association: bonsainut.com, bonsaiempire.com, bonsaioutlet.com, and bonsai-bci.com.
The allowers include bonsaitonight.com, easternleaf.com, dallasbonsai.com, kaizenbonsai.com, and stonelantern.com; bonsai4me.com returned no parseable file.
Bonsai sits above the 30% corpus baseline at 44.4%; CCBot leads disallows corpus-wide at 194 (22.4%).
Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 4247236167461a45).
Cite this report
US Tech Automations Research, 2026-06 edition. “Do Bonsai Sites Block AI Crawlers? 4 of 9 Do.” https://ustechautomations.com/resources/blog/do-bonsai-sites-block-ai-crawlers-2026
Sealed snapshot sha256: 4247236167461a45