Do Grocery Sites Block AI Crawlers? 1 of 7 Do

Jun 18, 2026

The grocery aisle is one of the most open categories in this edition — but it is not unanimous. Where some consumer-service verticals gate nothing at all, the supermarket chains have a single holdout, and that one decision is the whole story here.

1 of 7 Grocery sites blocks at least one AI crawler.

Of the major supermarket and grocery sites we checked, 7 returned a parseable robots.txt — the root-level file that tells automated agents which paths they may fetch — and 1 of those disallows an AI crawler. That works out to a 14.3% block rate. Every figure here is read straight from the sealed file; nothing is estimated, modeled, or extrapolated. The lone blocker is aldi.us, and it shapes the entire category number.

The six allowers are safeway.com, publix.com, wegmans.com, wholefoodsmarket.com, heb.com, and albertsons.com — a roll call of national and regional chains that leave their robots.txt open to AI agents. Three more grocery domains returned no parseable file at the seal and sit outside the rate. Against a corpus where 27.2% of policied sites gate at least one crawler, grocery lands well below the line, dragged up off zero by a single chain.

The One Holdout — Reading aldi.us

In a category built on weekly circulars and findable storefronts, one supermarket chose to draw a line. aldi.us is the single grocery blocker in the snapshot, and it does so with a specific, named directive: its own robots.txt group carries Disallow: / for PerplexityBot and Meta-ExternalAgent. Those are the crawlers operated by Perplexity and Meta, respectively — an answer-engine bot and a social-platform AI agent. aldi.us did not throw a blanket wildcard at every crawler; it singled out two named tokens and shut them out, while leaving the rest of the tracked leaderboard unaddressed.

That selectivity is worth dwelling on, because it is the difference between a category-defining posture and a targeted one. A site that disallows PerplexityBot and Meta-ExternalAgent has made a deliberate choice about two specific operators — not a sweeping rejection of AI as a whole. It is the most surgical kind of block in this report: two named agents, one chain, one line in the file.

The only grocery blocker, aldi.us, disallows PerplexityBot and Meta-ExternalAgent by name.

Everyone else in the readable set stays open. safeway.com, publix.com, wegmans.com, wholefoodsmarket.com, heb.com, and albertsons.com all return a parseable robots.txt that leaves the AI crawlers unrestricted. For a grocery chain, the incentive runs toward openness: storefronts, store locators, and weekly deals are exactly the kind of content a shopper might ask an AI assistant to surface, and being readable keeps a chain eligible to be the answer.

The contrast with a category that gates nothing at all is sharp — see how ticketing sites handle AI crawlers, where every published policy stays open and the block rate is a clean zero. Grocery is one chain away from that floor.

Why a Single Block Still Lands at 14.3%

A 14.3% rate sounds modest until you remember the denominator. With only 7 grocery sites carrying a parseable robots.txt, a single blocker is not a rounding error — it is 1 out of 7, and one decision is enough to lift the category off the zero floor. This is the arithmetic of a small policied base: in a category this size, each individual chain's choice moves the headline number visibly, and aldi.us's choice is the one that moved grocery off zero.
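
The arithmetic is small enough to check by hand. As a minimal sketch in Python, using only the counts quoted in this report (`block_rate` is a hypothetical helper name, not part of any published tooling):

```python
def block_rate(blockers: int, policied: int) -> float:
    """Percent of sites with a parseable robots.txt that block at least one AI crawler."""
    if policied <= 0:
        raise ValueError("rate is undefined without at least one readable policy")
    return round(100 * blockers / policied, 1)

grocery = block_rate(1, 7)        # 1 blocker among 7 readable grocery files
corpus = block_rate(305, 1123)    # corpus-wide counts quoted in this report
print(grocery, corpus)
```

With a base of 7, each additional blocker would move the rate by roughly 14 points, which is the small-denominator effect described above.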

That is the counting rule, applied. A site counts as a blocker here when its own robots.txt group carries Disallow: / for a named AI user-agent. aldi.us carries exactly that for PerplexityBot and Meta-ExternalAgent, so it sits in the blocker column. The six other readable chains carry no such directive against any AI token, so they are allowers. A chain can disallow /cart or /account without naming an AI agent, and that housekeeping does not count as an AI block under this definition. Only a directive aimed at a named crawler does, which is why grocery's count is a precise 1.
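
One way to read that rule as code is sketched below. It is an illustration under stated assumptions, not the parser behind this report: `AI_TOKENS` is an illustrative list (the full tracked list is not reproduced here), `blocked_ai_tokens` is a hypothetical helper, and the sample file is invented to mirror the directive described for aldi.us rather than copied from it.

```python
# Illustrative token list (assumption); the report's tracked list may differ.
AI_TOKENS = {"perplexitybot", "meta-externalagent", "gptbot", "claudebot", "ccbot"}

def blocked_ai_tokens(robots_txt: str) -> set[str]:
    """Return named AI tokens whose own group carries 'Disallow: /'."""
    blocked: set[str] = set()
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a user-agent line after rules opens a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                # A wildcard "*" group is not a named AI token, so it never counts.
                blocked.update(a for a in agents if a in AI_TOKENS)
    return blocked

# Hypothetical file: path housekeeping for everyone, a full block for two named agents.
sample = """\
User-agent: *
Disallow: /cart

User-agent: PerplexityBot
User-agent: Meta-ExternalAgent
Disallow: /
"""
print(sorted(blocked_ai_tokens(sample)))
```

The `/cart` rule under the wildcard group leaves the result untouched, which matches the distinction drawn above between housekeeping and a directive aimed at a named crawler.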

The honesty of the 14.3% reading rests on stating that base out loud. We are not saying one in seven grocery shoppers will hit a wall; we are saying that among the 7 chains that published a readable AI-access policy, exactly 1 gates a crawler. Three more grocery domains gave us no readable file at all and are kept out of the math entirely.

Grocery sites post a 14.3% AI-crawler block rate.

What the Silent Sites Tell Us

Three grocery domains returned no parseable robots.txt at seal, so they are silent: neither allow nor block, and excluded from the rate. kroger.com was logged with status 0, our marker for a connection that produced no usable HTTP response at all. traderjoes.com and meijer.com each refused our request with a "forbidden" response, which gave us no file to parse. None of these is a stance on AI crawlers; each is simply a file we could not read at seal time.

This is why the denominator is "sites with a parseable robots.txt," not "all grocery sites checked." We classify posture only where a posture was published in a form we could read. Folding kroger.com, traderjoes.com, and meijer.com into the rate as either allows or blocks would invent a policy none of them showed us. Keeping them out keeps the 14.3% honest: it is 1 blocker among 7 readable policies, full stop.

Among the 7 readable grocery policies, 6 allow every AI crawler and aldi.us blocks two.

It is worth being clear about what a silent site is not. A grocery chain that refuses our request is not refusing AI crawlers — it is refusing our reader, this once, for this fetch. The chain may run a perfectly permissive robots.txt that simply was not served at seal. We do not guess in either direction; the silent bucket exists precisely so an unreadable file never gets counted as a decision the chain never published.
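
The three-way bucketing can be sketched as follows. This is a hedged illustration: `Posture` and `classify` are hypothetical names, treating HTTP 200 as the only readable status is an assumption, and 403 is assumed to be the code behind the "forbidden" responses. Status 0 follows the report's own marker for no response.

```python
from collections import Counter
from enum import Enum
from typing import Optional

class Posture(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    SILENT = "silent"

def classify(status: int, blocked_tokens: Optional[set[str]]) -> Posture:
    """Bucket one fetch result; only a readable file can yield allow or block."""
    if status != 200 or blocked_tokens is None:
        return Posture.SILENT   # no readable policy: neither allow nor block
    return Posture.BLOCK if blocked_tokens else Posture.ALLOW

# (status, blocked tokens) per domain; 200 and 403 are assumed codes, 0 is the report's marker.
grocery = {
    "aldi.us": (200, {"perplexitybot", "meta-externalagent"}),
    "safeway.com": (200, set()),
    "publix.com": (200, set()),
    "wegmans.com": (200, set()),
    "wholefoodsmarket.com": (200, set()),
    "heb.com": (200, set()),
    "albertsons.com": (200, set()),
    "kroger.com": (0, None),
    "traderjoes.com": (403, None),
    "meijer.com": (403, None),
}

counts = Counter(classify(status, tokens) for status, tokens in grocery.values())
readable = counts[Posture.ALLOW] + counts[Posture.BLOCK]   # the rate's denominator
print(counts[Posture.BLOCK], "of", readable, "readable;", counts[Posture.SILENT], "silent")
```

Keeping `SILENT` as its own bucket means an unreadable file can never fall through into the allow or block column by default.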

Where Grocery Sits in the Corpus

A 14.3% block rate places grocery near the open end of the ranking, among the lightly-gated consumer categories rather than the heavily-gated publisher and content verticals. The focused window below shows grocery beside its nearest neighbors, verbatim from the sealed snapshot — category name first, no rank column.

Category	Sites	With robots.txt	Block at least 1 crawler	Block rate
Retail	15	12	2	16.7%
Soapmaking	10	6	1	16.7%
Grocery	10	7	1	14.3%
Education	9	7	1	14.3%
Sailing	7	7	1	14.3%
Cigars	10	7	1	14.3%

Grocery shares its 14.3% reading with Education, Sailing, and Cigars — all single-blocker categories with the same seven-file base — and sits just under Retail and Soapmaking at 16.7%. It is a crowded, low-gating part of the ranking, which is itself the signal: grocery behaves like a storefront vertical where openness is the norm and a single chain is the exception.

The same storefront instinct shows up even more cleanly one notch lower, where casino sites post a flat zero block rate across every readable file. That zero is the floor grocery's lone holdout keeps the category just above. The extremes show what the opposite posture looks like:

Category	Sites	With robots.txt	Block at least 1 crawler	Block rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
Streaming	10	10	0	0%

Grocery sits far below the gated poles. Where Gaming gates at 88.9% and News at 82.4%, grocery's lone holdout keeps the category near the open floor that Streaming's 0% defines. Supermarkets, like streaming services, mostly want to be found.

Which Operators Grocery Gates — and Which It Does Not

aldi.us blocks Perplexity and Meta; it gates no one else, and the rest of the category gates nobody at all. Reading the corpus operator leaderboard shows how grocery's single block fits the larger pattern — which operators sites push back on most when they do gate. The cut below shows operator-level block counts across all 1123 sites with a robots.txt, operator name first.

Operator	Sites disallowing (across all 1123 sites)	Rate
Common Crawl	228	20.3%
Anthropic	217	19.3%
OpenAI	209	18.6%
Meta	196	17.5%
Perplexity	105	9.3%

Across the corpus, Common Crawl, Anthropic, and OpenAI are the most-gated operators, with Meta in the middle of the pack at 196 sites and Perplexity lower at 105. aldi.us's choice to disallow Meta and Perplexity is notable precisely because Perplexity is one of the less-gated operators corpus-wide — so a grocery chain singling it out is a more deliberate signal than gating the usual high-volume training crawlers everyone else gates first.

Corpus-wide, 305 of 1123 sites block at least one AI crawler.

How the Grocery Snapshot Was Sealed

These figures come from one point-in-time crawl of public robots.txt files, sealed June 18, 2026 under snapshot sha 74d390d8f5175d21. For each grocery domain we fetched robots.txt at the root, parsed its user-agent and disallow directives, and recorded whether any AI crawler token was disallowed. We report verbatim counts; nothing is estimated, modeled, or extrapolated. Domains with no parseable file — kroger.com, traderjoes.com, and meijer.com — are logged as silent, neither allow nor block.

US Tech Automations runs this read across 1374 sites checked, 1123 with a parseable robots.txt, spanning 138 categories. Grocery contributes 7 of those policied files, and we report its slice as exactly the 7 it is — 1 blocker, 6 allowers.

A note on what the snapshot deliberately does not do. It does not retry a host that refuses our request or fails to connect until a file finally appears, does not follow a redirect into another domain's policy, and does not infer a stance from a site that merely loads slowly. Each grocery domain is read once, at seal time, exactly as it answered — which is why kroger.com, traderjoes.com, and meijer.com land in the silent bucket rather than the allow column.

That single-read rule is what makes the result content-addressable and reproducible: anyone holding the snapshot identified by sha 74d390d8f5175d21 can re-derive the same seven readable files, the same single aldi.us block, and the same two named tokens it disallows. The cost is that a chain briefly unreachable at seal is excluded rather than generously read as open. The method favors reproducibility over a flattering count, and we would rather log three chains as silent than guess their policies into the allow column.
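
For readers unfamiliar with content addressing, the idea can be sketched in a few lines. Everything here is an assumption for illustration: the report does not describe how its sha is computed, so the canonical JSON serialization, the 16-character truncation, the `seal` helper, and the placeholder file bodies are all hypothetical.

```python
import hashlib
import json

def seal(records: dict[str, str]) -> str:
    """Hash a {domain: robots.txt body} mapping into a short content address."""
    # Sorting keys makes the digest depend on content only, not insertion order.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Placeholder bodies, not real files.
first_read = {"a.example": "User-agent: *\nDisallow: /cart\n", "b.example": ""}
same_content = {"b.example": "", "a.example": "User-agent: *\nDisallow: /cart\n"}
later_read = {"a.example": "User-agent: *\nDisallow: /\n", "b.example": ""}

print(seal(first_read) == seal(same_content))   # identical content, identical address
print(seal(first_read) == seal(later_read))     # any changed byte changes the address
```

Under a scheme like this, a retry that eventually fetched a different file would produce a different address, which is one reason a single read per domain keeps a sealed snapshot stable.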

Frequently Asked Questions

Q: Which grocery site blocks AI crawlers, and which bots does it block?

A: aldi.us is the only blocker among the 7 grocery sites with a parseable robots.txt. Its robots.txt group carries Disallow: / for two named agents — PerplexityBot and Meta-ExternalAgent, operated by Perplexity and Meta. That single block produces the category's 14.3% rate. The other six readable chains gate no AI crawler.

Q: Why are three grocery sites left out of the 14.3% rate?

A: Because they published no readable policy at seal. kroger.com returned no response at all, and traderjoes.com and meijer.com each refused our request, so none produced a parseable robots.txt to classify. They are logged as silent and excluded from the rate — neither allow nor block. The 14.3% covers only the 7 sites whose AI-access policy we could actually read.

Q: Why would a grocery chain block only two operators instead of all of them?

A: Because the block is targeted, not sweeping. aldi.us disallowed PerplexityBot and Meta-ExternalAgent specifically rather than throwing a wildcard at every crawler, which signals a decision about two named operators — an answer engine and a social-platform agent — not a rejection of AI generally. The other six chains made the opposite call and left every AI token unrestricted.

Q: Does aldi.us blocking a crawler in robots.txt actually stop it?

A: Not by force. robots.txt is an honor-system standard: a cooperative crawler reads the file and complies, but the file enforces nothing technically. aldi.us signals that PerplexityBot and Meta-ExternalAgent should stay out; each of those crawlers decides whether to honor the request. The other six grocery chains send no such signal at all.
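
What "honoring the request" looks like from the crawler's side can be sketched with Python's standard-library `urllib.robotparser`. The robots.txt text and the `example.com` URL below are hypothetical, shaped like the directive described in this report rather than copied from aldi.us, and `polite_fetch_allowed` is an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: housekeeping for everyone, a full block for two named agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart

User-agent: PerplexityBot
User-agent: Meta-ExternalAgent
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def polite_fetch_allowed(agent: str, url: str) -> bool:
    """A cooperative crawler asks this before fetching; nothing forces it to."""
    return rules.can_fetch(agent, url)

for agent in ("PerplexityBot", "Meta-ExternalAgent", "GPTBot"):
    print(agent, polite_fetch_allowed(agent, "https://example.com/weekly-ad"))
```

The check runs entirely inside the crawler. A client that never calls it, or ignores a `False` result, meets no technical barrier from the file itself.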

Put AI-Access Data to Work

For a grocery e-commerce or digital-shelf lead — someone at a chain like wegmans.com or albertsons.com responsible for whether store locations, weekly deals, and product pages surface in AI shopping answers — this snapshot is the baseline, and the job is watching the category's posture move. Right now the field is open except for one chain.

Set a recurring crawl that re-reads robots.txt for safeway.com, publix.com, heb.com, and the rest of the grocery set weekly, and alert the moment another chain follows aldi.us and adds PerplexityBot, Meta-ExternalAgent, or any AI token to its disallow list — one competitor gating is a category-level signal that the open consensus is fracturing.
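
The alerting half of that watch reduces to a set difference between two reads. A minimal sketch, assuming each read has already been reduced to the set of AI tokens a domain disallows; `new_blocks` is a hypothetical helper, and `example-grocer.test` with its "this week" state is an invented scenario, not observed data.

```python
def new_blocks(
    previous: dict[str, set[str]], current: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Tokens each domain disallows now that it did not disallow at the last read."""
    alerts: dict[str, set[str]] = {}
    for domain, tokens in current.items():
        added = tokens - previous.get(domain, set())
        if added:
            alerts[domain] = added
    return alerts

last_week = {
    "aldi.us": {"perplexitybot", "meta-externalagent"},
    "example-grocer.test": set(),           # hypothetical allower
}
this_week = {
    "aldi.us": {"perplexitybot", "meta-externalagent"},
    "example-grocer.test": {"perplexitybot"},   # invented change for illustration
}

print(new_blocks(last_week, this_week))
```

Comparing against the previous read rather than a fixed baseline means a long-standing block such as aldi.us's does not re-alert every week.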

A second fit is a grocery RevOps or competitive-intelligence analyst tracking how rivals position against emerging AI buying agents: they can monitor aldi.us specifically to see whether it widens its block beyond Perplexity and Meta, and watch the six allowers for any accidental self-block that would quietly pull their catalog out of AI answers.

The watch generalizes to the neighboring quick-commerce storefronts — fast-food chains sit on the same open end of the ranking — so an analyst covering everyday-purchase verticals can read grocery and fast food off one dashboard. US Tech Automations runs these scheduled robots.txt crawls with change alerts so a policy shift surfaces the week it lands rather than at the next audit cycle. See how the agentic monitoring works.

Corpus-wide, 298 of 1123 sites publish an llms.txt file.

Key Takeaways

Of the 7 Grocery sites with a parseable robots.txt, 1 blocks at least one AI crawler — a 14.3% rate, near the open end of the ranking.
The lone blocker, aldi.us, disallows PerplexityBot and Meta-ExternalAgent by name — a targeted block of Perplexity and Meta, not a blanket one.
The six allowers are safeway.com, publix.com, wegmans.com, wholefoodsmarket.com, heb.com, and albertsons.com.
kroger.com, traderjoes.com, and meijer.com returned no parseable file and are excluded from the rate — neither allow nor block.
Corpus-wide, 305 of 1123 sites (27.2%) gate at least one crawler, so grocery sits well below the average.

This snapshot of Grocery sites is one slice of a wider dataset; read how many top websites block AI crawlers for the cross-industry view.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 18, 2026 (snapshot sha 74d390d8f5175d21).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Grocery Sites Block AI Crawlers? 1 of 7 Do.” https://ustechautomations.com/resources/blog/do-grocery-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 74d390d8f5175d21

Machine-readable data: CSV · JSON · All research & methodology