Do Tea Sites Block AI Crawlers? None Do

Jun 14, 2026

The specialty-tea web is wide open to artificial intelligence. We checked the published crawl policies of 10 leading Tea sites, and every single one that posted a parseable robots.txt left the door open to AI. Not one disallowed a single AI crawler.

That is the headline, and it is unusual. In a corpus where roughly a third of sites push back on at least one AI bot, Tea posts a flat 0% block rate — the lowest band there is, shared with a handful of other verticals but reached by very few consumer-passion categories.

0 of 10 Tea sites block any AI crawler.

This report is a point-in-time reading from a sha256-sealed snapshot of public robots.txt files, captured on 14 June 2026. A robots.txt file is the plain-text policy a website publishes at its root to tell automated crawlers which paths they may fetch. We read only what each site declared in public. We did not guess, infer intent, or estimate behavior.

Which Tea Sites Allow Every Crawler

All 10 Tea sites we checked returned a robots.txt, and all 10 of those allow every AI crawler we track. The roster spans retailers, blend houses, and trade press: teatulia.com, adagio.com, harney.com, davidstea.com, artoftea.com, teasource.com, worldteanews.com, stashtea.com, bigelowtea.com, and republicoftea.com.

There is no blocker list to publish here, because there are no blockers. Every named site above sits in the allower column. That uniformity is itself the signal — a vertical this consistent is making a collective, if unspoken, choice to stay discoverable.

It is worth being precise about what "allow" means in this snapshot. A site allows a crawler when its robots.txt contains no rule that disallows that crawler's user-agent from the paths in question. None of the 10 Tea sites named a single AI user-agent in a disallow directive. Some published only a generic policy for traditional search engines; others published nothing AI-specific at all. In every case, the AI crawlers we track were free to fetch.
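The allow test described above can be sketched with Python's standard-library robots.txt parser. This is a minimal illustration rather than the pipeline behind this report: the user-agent tokens in AI_AGENTS, the two sample policies, and the choice to test only the root path are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Illustrative user-agent tokens (an assumption; the report does not list
# the exact tokens it tracks).
AI_AGENTS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "meta-externalagent"]


def blocked_agents(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI agents that this robots.txt disallows from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, path)]


# Hypothetical policy: a generic rule for all crawlers, nothing AI-specific.
OPEN_POLICY = """\
User-agent: *
Disallow: /cart
"""

# Hypothetical policy: one AI crawler is named and disallowed site-wide.
GATED_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

print(blocked_agents(OPEN_POLICY))   # no AI agent is disallowed from "/"
print(blocked_agents(GATED_POLICY))  # only the named agent is disallowed
```

Under this check a blanket "User-agent: *" group with "Disallow: /" would also register as blocking every listed agent, so a stricter reading that counts only AI user-agents named explicitly would need an extra filter.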

That distinction matters because it tells you the openness is not a loud, deliberate "AI welcome" banner — it is the quiet absence of a gate. For most of these brands, AI access simply was not a question they felt the need to answer in robots.txt, and the default answer is yes.

Every Tea site we checked posted a robots.txt, and 0 of them disallow a single AI crawler.

Tea Site	Returned robots.txt	Blocks any AI crawler
teatulia.com	Yes	No
adagio.com	Yes	No
harney.com	Yes	No
davidstea.com	Yes	No
artoftea.com	Yes	No
teasource.com	Yes	No
worldteanews.com	Yes	No
stashtea.com	Yes	No
bigelowtea.com	Yes	No
republicoftea.com	Yes	No

Why a Specialty-Beverage Vertical Stays Permissive

Tea sells on story, origin, and ritual. A single-estate Darjeeling, a house blend, a brewing guide — these are discovery purchases, and discovery increasingly runs through AI answer engines. A shop that lets crawlers read its catalog and brewing notes is a shop an assistant can recommend by name.

Most of these properties are commerce or content-marketing operations rather than subscription publishers guarding paywalled archives. There is little proprietary text to lose and a lot of consideration-stage traffic to win. Leaving robots.txt open is the path of least resistance and, for this vertical, the commercially rational one.

Tea sites post a 0% AI-crawler block rate.

The openness also reflects who builds these sites. Boutique-beverage storefronts rarely staff a policy team weighing AI access; the default platform robots.txt simply ships open. Whether by strategy or by default, the outcome is the same: an unguarded vertical.

There is also a content-supply angle. Much of what a tea site publishes — steeping temperatures, origin stories, caffeine guidance — is already common knowledge restated in the brand's voice. Unlike a news archive or a paid course, it is not scarce proprietary text that a model could substitute for the original. The marginal cost of letting an assistant read it is low, and the marginal benefit of being the source an assistant cites is real. The economics tilt toward openness.

Contrast this with the publisher logic that drives gating in other verticals. A magazine's archive is the asset it sells; a tea retailer's catalog is a means to a transaction. When the page exists to move a reader toward a purchase, being readable by the tools that route those readers is the whole point.

Where Tea Sits Among the Quietest Categories

Tea anchors the floor of the ranking. Below is a focused window of Tea and the categories nearest it at the bottom — the verticals that, like Tea, gate little or nothing. Compare that to the Whiskey report, where the block rate climbs into the higher band.

Category	Sites With robots.txt	Sites Blocking	Block Rate
Marketing	10	1	10%
Nonprofit	6	0	0%
Streaming	10	0	0%
Banking	7	0	0%
Telecom	6	0	0%
Energy	6	0	0%
Logistics	8	0	0%
Construction	6	0	0%
Manufacturing	8	0	0%
Boating	8	0	0%
Tea	10	0	0%

Tea keeps unusual company. Banking, Energy, and Manufacturing sit at the same 0% floor — verticals where sites either have nothing to gate or no reason to. Tea reaching that floor as a consumer-passion category, rather than a regulated B2B one, is what makes its position distinctive.

Tea shares the 0% floor with verticals like Banking and Manufacturing, but reaches it as a consumer-passion category.

The Corpus-Wide Picture and What a Future Block Would Signal

Step back to the full snapshot and the contrast sharpens. Of the 725 sites checked overall, 614 returned a parseable robots.txt. Of those 614, 196 block at least one AI crawler, a 31.9% rate, and 141 (23%) went further by publishing an llms.txt file. Tea sits far beneath that 31.9% line.
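The percentages in the paragraph above follow directly from the counts. A short sketch of the arithmetic, using only figures stated in this report (the helper name rate is hypothetical):

```python
def rate(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage to one decimal."""
    return round(100 * part / whole, 1)


# Denominator is the 614 sites that returned a parseable robots.txt.
corpus_block_rate = rate(196, 614)  # sites blocking at least one AI crawler
llms_txt_share = rate(141, 614)     # sites that also publish an llms.txt
tea_block_rate = rate(0, 10)        # Tea sites blocking any AI crawler

print(corpus_block_rate)  # 31.9
print(llms_txt_share)     # 23.0
print(tea_block_rate)     # 0.0
```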

Even where sites do gate, the disallows concentrate on a familiar set of operators. The table below shows the most-disallowed AI operators across all 614 sites — the names a Tea brand would have to add to its robots.txt if it ever decided to close the door.

AI Operator	Sites Disallowing (all 614 sites)
Common Crawl	145
Anthropic	136
OpenAI	126
Meta	122
ByteDance	118

If a single Tea site flipped to blocking one of these tomorrow, it would be the first defection from a perfectly open vertical — worth watching, because in passion-retail categories one visible move often nudges peers to reconsider their own defaults.

The 23% llms.txt figure is its own signal. An llms.txt file is an emerging convention for telling AI systems how a site prefers to be used — a more expressive layer than the binary allow-or-block of robots.txt. Across all 614 sites with a robots.txt, 141 went on to publish one. Tea, having not even reached the point of gating in robots.txt, is unlikely to be early to that more advanced posture; the vertical is simply not treating AI access as a problem to manage yet.

Corpus-wide, 196 of 614 sites block at least one AI crawler.

How the Snapshot Was Sealed

The method is deliberately boring, because boring is auditable. We collected the public robots.txt file from each site on its own root, parsed the directives, and recorded which AI user-agents — if any — were disallowed. We then content-hashed the whole capture into a single sha256-sealed snapshot, sha 77d0521dc8809a6c, dated 14 June 2026. The hash means the figures cannot be quietly revised after publication; anyone can verify the report against the seal.
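The sealing step can be illustrated with a small sketch. The serialization used for the real snapshot is not described here, so the canonical JSON form, the function names, and the placeholder hostnames below are assumptions chosen only to show how a content hash makes later edits detectable.

```python
import hashlib
import json


def seal(capture: dict[str, str]) -> str:
    """Return a SHA-256 hex digest over a canonical form of the capture."""
    # Sorting keys makes the digest independent of collection order.
    canonical = json.dumps(capture, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(capture: dict[str, str], published_digest: str) -> bool:
    """Check a capture against a previously published digest."""
    return seal(capture) == published_digest


# Hypothetical capture: hostname -> robots.txt body (placeholder content).
capture = {
    "example-tea-shop.test": "User-agent: *\nDisallow: /cart\n",
    "example-tea-news.test": "User-agent: *\nAllow: /\n",
}

digest = seal(capture)
print(len(digest))              # a SHA-256 hex digest is 64 characters long
print(verify(capture, digest))  # True for the unmodified capture

# Any edit to the captured text changes the digest, so verification fails.
tampered = {**capture, "example-tea-shop.test": "User-agent: GPTBot\nDisallow: /\n"}
print(verify(tampered, digest))  # False
```

A full SHA-256 hex digest is 64 characters; how the shorter seal value printed in this report relates to the full digest is not specified here.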

For this report, nothing is estimated, modeled, or extrapolated. The 0 blockers, the 10 sites with a robots.txt, and the 0% rate are verbatim counts, not projections. Where a site published no robots.txt, we record that absence plainly rather than guessing what it might mean — though for Tea, every site published one.

A few caveats keep the read honest. robots.txt reflects a stated policy, not enforced behavior; a non-compliant bot can ignore it. The category is a 10-site sample of prominent Tea properties, not a census of every tea site online. And a snapshot is a single day — the value of repeating it is precisely to catch the day a 0% vertical stops being one.

Key Takeaways

Of 10 Tea sites checked, 10 returned a parseable robots.txt and 0 block any AI crawler — a 0% rate.
Every named site, from harney.com to republicoftea.com, allows every AI crawler we track.
Tea sits at the floor of the ranking, far below the 31.9% corpus-wide block rate.
Across all 614 sites, Common Crawl is the most-disallowed operator at 145 sites.
A first Tea block would be a notable defection from a uniformly open vertical.

Frequently Asked Questions

Q: Does blocking a crawler in robots.txt actually stop it?

A: No. robots.txt is an honor-system standard; compliant crawlers respect it, but it is a request, not an enforcement mechanism. We report what each site declares, not whether every bot obeys.

Q: Why does every Tea site allow AI crawlers?

A: None of the 10 Tea sites names an AI crawler in a disallow rule, so every crawler we track is allowed by default. For a discovery-driven retail vertical, open access means an AI assistant can read and recommend a shop's catalog and brewing guides, a commercial upside that outweighs the case for gating.

Q: How is this different from a re-crawl I could run myself?

A: This is a sha256-sealed snapshot from 14 June 2026, content-addressed so the figures cannot drift after the fact. A live re-query would show today's robots.txt; ours is a fixed, citable point-in-time record.

Q: What would it mean if a Tea site started blocking AI crawlers?

A: It would be the first defection in a 0%-block vertical. Given how uniformly open Tea is, one site adding an AI-operator disallow would be an early signal worth tracking across the category.

Put AI-Access Data to Work

A specialty-tea ecommerce growth lead can treat this 0% reading as a competitive baseline: re-crawl the 10 Tea sites weekly and alert the moment any peer — say adagio.com or davidstea.com — adds an AI-operator token to its disallow list, because in a uniformly open vertical the first mover changes the discovery math. An AI retrieval product manager building a beverage recommender can monitor the same set to confirm sources stay fetchable before relying on them.
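The re-crawl-and-alert idea above amounts to comparing two snapshots of the same site's robots.txt. A minimal sketch, assuming the two file bodies have already been fetched and stored; the watched tokens, function names, and sample policies are hypothetical and stand in for whatever a real monitoring job would use.

```python
from urllib.robotparser import RobotFileParser

# Illustrative tokens to watch (an assumption, not the report's tracked list).
WATCHED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")


def disallowed(robots_txt: str, path: str = "/") -> set[str]:
    """Watched agents that this robots.txt disallows from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {a for a in WATCHED_AGENTS if not parser.can_fetch(a, path)}


def newly_blocked(previous: str, current: str) -> list[str]:
    """Agents allowed in the previous snapshot but disallowed in the current one."""
    return sorted(disallowed(current) - disallowed(previous))


# Hypothetical snapshots of one site's robots.txt, a week apart.
last_week = "User-agent: *\nDisallow: /checkout\n"
this_week = last_week + "\nUser-agent: CCBot\nDisallow: /\n"

changes = newly_blocked(last_week, this_week)
if changes:
    print("ALERT: newly disallowed AI agents:", ", ".join(changes))
```

Comparing parsed outcomes rather than raw text keeps cosmetic edits, such as reordered groups or added comments, from triggering an alert.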

US Tech Automations runs that monitoring as a scheduled job: recurring robots.txt and llms.txt crawls, change alerts, and an AI-access policy dashboard that flags drift the day it happens. See how the agentic workflow platform automates it.

This is the same sealed-snapshot discipline behind every report in this edition — nothing is estimated, modeled, or extrapolated; the numbers are verbatim counts from public robots.txt files. Read the neighboring Skiing report for a vertical that gates a little, and the Golf report for one that gates more.

See where Tea sites fit in the broader trend in our study of how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 77d0521dc8809a6c).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Tea Sites Block AI Crawlers? None Do.” https://ustechautomations.com/resources/blog/do-tea-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 77d0521dc8809a6c
