Do Supplement Sites Block AI Crawlers? 2 of 6 Do

Jun 19, 2026

The supplement business runs on discovery. A shopper hunting for a specific magnesium form or a third-party-tested protein rarely walks into a store first — they ask, and increasingly the thing they ask is an answer engine. That makes a supplement brand's robots.txt a quiet but consequential file: it decides whether the AI assistants fielding those questions are allowed to read the product pages at all.

2 of 6 supplement sites block at least one AI crawler.

Of the supplement domains we checked, six returned a parseable robots.txt — the root-level file that tells automated agents which paths they may fetch — and two of those disallow an AI crawler. That works out to a 33.3% block rate. Every figure here is read straight from the sealed snapshot; nothing is estimated, modeled, or extrapolated.

The two blockers are gnc.com and nowfoods.com. The other policied supplement domains leave the door open. Across the corpus, 317 of the 1203 sites with a parseable robots.txt gate at least one crawler, a 26.4% rate, so supplements sit a touch above the average: neither a fortress category nor a wide-open one.

The Two Supplement Sites That Gate, and How Differently

What makes this category interesting is not just that two sites block, but that the two blocks look nothing alike. gnc.com names exactly one AI agent in its disallow group: Bytespider, ByteDance's crawler. That is a narrow, targeted gate — one bot, presumably the one GNC least wants harvesting its catalog — while every other named AI agent is left free to read the site.

nowfoods.com is the opposite. NOW Foods names a long roster in its disallow directives: GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended — OpenAI, Anthropic, Google, Common Crawl, ByteDance, Meta, Amazon, and Apple. When NOW Foods closes, it closes the whole leaderboard at once.

The two supplement blockers are gnc.com (Bytespider only) and nowfoods.com (the full roster).

The open supplement sites are recognizable names in their own right: iherb.com, thorne.com, naturemade.com, and optimumnutrition.com. None of them disallows an AI agent. For a direct-to-consumer supplement brand, the product page, the ingredient breakdown, and the third-party-testing claims are the sales pitch — keeping them readable by retrieval agents extends reach rather than threatening it.

Four more supplement domains — vitaminshoppe.com, gardenoflife.com, vitacost.com, and lifeextension.com — returned no parseable robots.txt at the seal. They are therefore silent: neither an allow nor a block, and excluded from the rate entirely. That is why the denominator is six rather than the ten sites we checked. It would be wrong to read that silence as a stance; it is an artifact of how each host answered at one moment in time.
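The denominator rule above can be checked with a few lines of arithmetic. This is a minimal sketch, not the pipeline behind the snapshot: the dictionary layout and the `block_rate` helper are assumptions made for illustration, while the domain statuses are the ones reported in this article.

```python
from typing import Optional

# Status per domain as reported in this article:
#   True  = parseable robots.txt that disallows at least one AI crawler
#   False = parseable robots.txt with no AI-crawler block
#   None  = no parseable robots.txt at the seal (silent)
SUPPLEMENT_STATUS: dict[str, Optional[bool]] = {
    "gnc.com": True,
    "nowfoods.com": True,
    "iherb.com": False,
    "thorne.com": False,
    "naturemade.com": False,
    "optimumnutrition.com": False,
    "vitaminshoppe.com": None,
    "gardenoflife.com": None,
    "vitacost.com": None,
    "lifeextension.com": None,
}


def block_rate(status: dict[str, Optional[bool]]):
    """Return (blockers, policied, rate_percent), ignoring silent domains."""
    policied = [v for v in status.values() if v is not None]
    blockers = sum(1 for v in policied if v)
    rate = round(100 * blockers / len(policied), 1) if policied else None
    return blockers, len(policied), rate


blockers, policied, rate = block_rate(SUPPLEMENT_STATUS)
print(f"{blockers} of {policied} = {rate}%")  # 2 of 6 = 33.3%
```

Counting the four silent domains as allows would instead give 2 of 10, or 20.0%, which is why the exclusion rule matters for the headline figure.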

What This 33.3% Block Rate Actually Means

A robots.txt directive is a public request, and the supplement read is mostly "request granted" — but not unanimously. Two of the six policied sites gate, which puts the category slightly above the corpus average rather than at the open floor.

The honest interpretation is that supplements behave like most e-commerce: the default is to be findable, because being findable is how the product gets sold, but a meaningful minority has decided that bulk AI harvesting of their catalog and content is worth refusing. That minority is what separates supplements from a wide-open category like the aquarium sites, where not one institution gates a crawler.

The split between the two blockers is the instructive part. gnc.com's single-bot block suggests a specific objection — one crawler it would rather not feed — while nowfoods.com's comprehensive block reads as a policy decision to wall off AI training and retrieval broadly. Same category, same block rate contribution, completely different posture. In a six-file sample, those two decisions are the entire story, and together they land the category at 33.3%.

The small sample sharpens this rather than weakening it. With six policied files, the read is really about six named brands and two decisions. That concentration is itself the finding: in supplements, AI-access posture is set brand by brand, not by a sweeping category norm. Track the handful of brands that publish a policy and you have tracked most of what moves the number.

Supplement sites post a 33.3% AI-crawler block rate.

This is a middling story compared with the edition's extremes. Where the most-gated categories treat their archives as the product, supplements mostly treat their catalogs as something to be surfaced, with a couple of brands choosing otherwise. The contrast with the open floor is the point: a 26.4% corpus average hides categories that range from wide-open to heavily walled, and supplements sit just on the gated side of the middle.

Where Supplements Sit Among Similar Categories

A 33.3% block rate places Supplements in the middle band of the ranking: above the open floor, well below the fortress categories. The table below lists the nearest neighbors of Supplements, the categories that post the same rate, verbatim from the sealed snapshot, with the category name first and no rank column.

Category	Sites	With robots.txt	Block at least 1 crawler	Block rate
Beauty	10	6	2	33.3%
HamRadio	10	6	2	33.3%
Sneakers	10	6	2	33.3%
TabletopRPG	10	6	2	33.3%
Travel	9	9	3	33.3%
VinylRecords	9	3	1	33.3%

Supplements share their 33.3% reading with a broad, mixed band — Beauty, Ham Radio, Sneakers, Tabletop RPGs, Travel, and Vinyl Records all land on the same one-in-three mark, even as their denominators differ. It is a crowded part of the ranking, which is itself a sign that a third-of-sites-gating is a common middle posture across consumer categories. The extremes show what the ends look like:

Category	Sites	With robots.txt	Block at least 1 crawler	Block rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
FastFood	10	6	0	0%
Hotels	10	3	0	0%

Supplements sit far below Gaming and News, and a clear step above the zero-block floor that fast-food and hotel chains define with their open policies. The category is open by default, gated by a determined minority. For comparison, the sneaker sites share the same 33.3% middle ground, while the furniture category tells its own access story.

The Bots Supplement Brands — and the Corpus — Reach For

Both supplement blockers name Bytespider, and NOW Foods names far more, so the useful corpus context is which bots get gated most broadly — the tokens a brand names first when it decides to close. The cut below shows the most-disallowed bots across all 1203 sites with a robots.txt, bot name first, count next.

Bot	Sites disallowing (of 1203)	Rate
CCBot	234	19.5%
GPTBot	210	17.5%
ClaudeBot	207	17.2%
Bytespider	203	16.9%
Meta-ExternalAgent	178	14.8%

CCBot, Common Crawl's agent, tops the corpus blocklist at 234 sites, with GPTBot and ClaudeBot close behind. Bytespider — the one bot both supplement blockers name — sits fourth corpus-wide at 203 sites, so gnc.com's narrow gate targets a crawler the broader field also gates often. nowfoods.com, by naming all five of these and more, is gating the highest-volume training crawlers the whole corpus gates first, just all at once.

Corpus-wide, 317 of 1203 sites block at least one AI crawler.

How the Supplement Snapshot Was Sealed

These figures come from one point-in-time crawl of public robots.txt files, sealed June 19, 2026 under snapshot sha 040215878ac7b85a. For each supplement domain we fetched robots.txt at the root, parsed its user-agent and disallow directives, and recorded whether any AI crawler token was disallowed. We report verbatim counts; nothing is estimated, modeled, or extrapolated. The four domains with no parseable file — vitaminshoppe.com, gardenoflife.com, vitacost.com, and lifeextension.com — are logged as silent, neither allow nor block.

The counting rule is deliberately narrow. A block is an explicit Disallow aimed at a named AI agent — GPTBot, ClaudeBot, CCBot, Bytespider, and the other leaderboard tokens. A supplement brand can disallow cart, search, or account paths without naming an AI agent, and that does not count as an AI block here. Only a directive that names one moves a site into the blocker column, which is why the supplement count is a clean two: gnc.com and nowfoods.com name them, the rest do not.
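The counting rule can be sketched as a small parser. This is an illustration under stated assumptions, not the snapshot's actual code: the `AI_TOKENS` set holds only the agents named in this article, the `blocked_ai_agents` function is hypothetical, and treating any non-empty Disallow path as a block is an assumption the article does not spell out. The sample files are invented and are not the real gnc.com or nowfoods.com policies.

```python
# Agent tokens named in this article; the full leaderboard may be longer.
AI_TOKENS = {
    "gptbot", "claudebot", "google-extended", "ccbot",
    "bytespider", "meta-externalagent", "amazonbot", "applebot-extended",
}


def blocked_ai_agents(robots_txt: str) -> set[str]:
    """Return the named AI agents that have a non-empty Disallow rule."""
    blocked: set[str] = set()
    group_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents = []
                in_rules = False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # An empty Disallow means "nothing is disallowed", so skip it.
            if field == "disallow" and value:
                blocked.update(a for a in group_agents if a in AI_TOKENS)
    return blocked


# Hypothetical files for illustration only.
PATHS_ONLY = "User-agent: *\nDisallow: /cart\nDisallow: /search\n"
NARROW = "User-agent: *\nDisallow: /cart\n\nUser-agent: Bytespider\nDisallow: /\n"
BROAD = "User-agent: GPTBot\nUser-agent: ClaudeBot\nDisallow: /\n"

print(blocked_ai_agents(PATHS_ONLY))      # set()
print(blocked_ai_agents(NARROW))          # {'bytespider'}
print(sorted(blocked_ai_agents(BROAD)))   # ['claudebot', 'gptbot']
```

The wildcard group never moves a site into the blocker column in this sketch, which mirrors the rule that cart, search, or account paths disallowed for every agent do not count as an AI block.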

A note on what the snapshot deliberately does not do. It does not retry a slow or silent host until a file appears, does not follow a redirect into a different domain's policy, and does not infer a block from a site that merely looks unfriendly to bots.

Each supplement domain is read once, at seal time, exactly as it answered. That single-read rule is what makes the result content-addressable: anyone holding sha 040215878ac7b85a can re-derive the same six policied files and the same two blockers. The cost is that the four silent domains land in the excluded bucket rather than the allow column — the method favors reproducibility over a generous reading.
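To make the content-addressable idea concrete, here is a minimal sketch of how a set of single-read responses could be reduced to a short identifier. The article does not describe the sealing scheme, so everything here is an assumption: the `seal` function, the canonical JSON serialization, and the 16-character truncation of a SHA-256 digest are illustrative choices, and the sample records are invented.

```python
import hashlib
import json


def seal(records: dict[str, str], length: int = 16) -> str:
    """Hash a {domain: robots.txt body} mapping into a short hex identifier.

    Keys are sorted and separators fixed so that the same content always
    serializes to the same bytes, regardless of insertion order.
    """
    canonical = json.dumps(
        records, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Hypothetical records for illustration only.
first = {"a.example": "User-agent: *\nDisallow: /cart\n", "b.example": ""}
reordered = {"b.example": "", "a.example": "User-agent: *\nDisallow: /cart\n"}
changed = {"a.example": "User-agent: GPTBot\nDisallow: /\n", "b.example": ""}

print(seal(first) == seal(reordered))  # True: order does not affect the seal
print(seal(first) == seal(changed))    # False: any content change does
```

Under a scheme like this, retrying a silent host after the seal would change the recorded bytes and therefore the identifier, which is the trade-off the single-read rule accepts.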

Frequently Asked Questions

Q: Which supplement sites block AI crawlers?

A: Two of the six supplement domains with a parseable robots.txt. gnc.com disallows a single AI agent, Bytespider. nowfoods.com disallows a much longer list — GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended. Together those two gates are the entire 33.3% block rate.

Q: Why do most supplement brands leave AI crawlers in?

A: Reach. iherb.com, thorne.com, naturemade.com, and optimumnutrition.com all run on discovery — their product pages, ingredient detail, and testing claims are meant to be found and cited, including by AI assistants fielding shopper questions. For a direct-to-consumer brand, being readable extends the sales funnel rather than threatening it.

Q: Does the 33.3% rate cover all the supplement sites you found?

A: No. It covers the six sites that returned a parseable robots.txt. Four more — vitaminshoppe.com, gardenoflife.com, vitacost.com, and lifeextension.com — produced no parseable file at the seal, so they are excluded from the rate rather than counted as an allow or a block.

Q: Does a Disallow in robots.txt actually stop an AI crawler?

A: Not by force. robots.txt is an honor-system standard: a cooperative crawler reads it and complies, but the file enforces nothing technically. nowfoods.com signals that AI agents should stay out of its paths; each crawler decides whether to honor that request.
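The honor-system point is easiest to see from the crawler's side. The sketch below uses Python's standard-library `urllib.robotparser` to show the check a cooperative crawler runs before fetching. The sample policy, the `ExampleBot` agent name, and the example.com URLs are hypothetical; nothing here reproduces a real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration only.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def may_fetch(agent: str, url: str) -> bool:
    """Return True if the parsed policy permits this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_fetch("GPTBot", "https://example.com/products/magnesium"))      # False
print(may_fetch("ExampleBot", "https://example.com/products/magnesium"))  # True
print(may_fetch("ExampleBot", "https://example.com/cart"))                # False
```

The check lives entirely in the crawler's own code. A crawler that never calls it can still request the page, which is why a Disallow is a request rather than an enforcement mechanism.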

Put AI-Access Data to Work

For a supplement brand owner or e-commerce marketing lead — the person who owns how product pages appear online — this snapshot is a baseline worth watching. Most peers stay open while a couple gate, and your own posture may not be the one you think it is: a single inherited Disallow line can quietly wall off the answer engines your customers now ask before they buy. The risk is not that you blocked on purpose; it is that you blocked by accident and never measured it.

Set a recurring crawl that re-reads robots.txt for your own domain alongside gnc.com, nowfoods.com, and the open leaders, and alert the moment your file — or a competitor's — adds or drops an AI crawler token. US Tech Automations runs exactly that kind of scheduled robots.txt crawl with change alerts and agentic monitoring, so a policy shift surfaces the week it lands rather than at the next annual audit.
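The core of such a change alert is a comparison between two reads of the same watchlist. The following is a rough sketch of that comparison only, assuming each read has already been reduced to a set of disallowed AI tokens per domain. The `policy_changes` function, the data layout, and the example domains are hypothetical and do not describe any vendor's implementation.

```python
def policy_changes(
    previous: dict[str, set[str]], current: dict[str, set[str]]
) -> list[tuple[str, list[str], list[str]]]:
    """Return (domain, added_tokens, dropped_tokens) for every domain that changed."""
    changes = []
    for domain in sorted(set(previous) | set(current)):
        before = previous.get(domain, set())
        after = current.get(domain, set())
        added = sorted(after - before)
        dropped = sorted(before - after)
        if added or dropped:
            changes.append((domain, added, dropped))
    return changes


# Invented reads for illustration; not observations of any real site.
last_week = {"brand-a.example": {"Bytespider"}, "brand-b.example": set()}
this_week = {"brand-a.example": {"Bytespider", "GPTBot"}, "brand-b.example": set()}

for domain, added, dropped in policy_changes(last_week, this_week):
    print(domain, "added:", added, "dropped:", dropped)
# brand-a.example added: ['GPTBot'] dropped: []
```

Comparing token sets rather than raw file text keeps the alert quiet when a site only reorders lines or edits comments.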

A second fit is an AI-search or GEO analyst tracking which consumer brands remain eligible to surface in answer engines. Their job is to know, continuously, whether the category peers they benchmark are still readable, and whether a vitaminshoppe.com-style silence is a timeout or a hardening stance. US Tech Automations monitors that drift across a watchlist of domains and routes the alert when a brand flips. See how the agentic monitoring works, and you have a standing read on supplement AI-access posture instead of a one-time count.

Corpus-wide, 330 of 1203 sites publish an llms.txt file.

Key Takeaways

Of the six Supplement sites with a parseable robots.txt, two block at least one AI crawler — a 33.3% rate, a touch above the corpus average.
The two blockers differ sharply: gnc.com names only Bytespider, while nowfoods.com names GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended.
The open supplement sites — iherb.com, thorne.com, naturemade.com, and optimumnutrition.com — all allow every crawler, and naturemade.com and optimumnutrition.com also publish an llms.txt file.
Four domains — vitaminshoppe.com, gardenoflife.com, vitacost.com, and lifeextension.com — returned no parseable file at the seal and are excluded from the rate.
Corpus-wide, 317 of 1203 sites (26.4%) gate at least one crawler, so supplements sit just above the average.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 19, 2026 (snapshot sha 040215878ac7b85a).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Supplement Sites Block AI Crawlers? 2 of 6 Do.” https://ustechautomations.com/resources/blog/do-supplement-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 040215878ac7b85a
