Do Food Sites Block AI Crawlers? Sealed robots.txt Data

Jun 13, 2026

7 of 10 Food sites block at least one AI crawler.

Food sites block AI crawlers at a 70% rate.

72 of 157 sites block at least one AI crawler across the corpus.

Key Takeaways

7 of 10 Food sites with a parseable robots.txt block at least one AI crawler.

The Food block rate of 70% is well above the corpus-wide rate of 45.9%.

Every Food site in the snapshot returned a parseable robots.txt — 10 out of 10.

The Food category is one of the most aggressive in the June 2026 Closing Web snapshot when it comes to blocking AI crawlers. Only News (86.7%) blocks at a higher rate among the 16 categories studied, and Tech (69.2%) sits just below Food. All 10 Food sites in the corpus returned a parseable robots.txt file, a coverage rate matched by only a handful of other categories. Seven of those 10 have issued disallow directives against at least one of the 9 tracked AI crawlers.

This report draws exclusively from a point-in-time snapshot sealed June 13, 2026 (sha 9ceca3bdf0dfeaca). Every count is a verbatim read from that snapshot; nothing is estimated, modeled, or extrapolated.


What This Snapshot Measures

The US Tech Automations Research team collected and parsed public robots.txt files from 182 prominent websites across 16 content categories. Of those 182, a total of 157 returned a parseable robots.txt file. The Closing Web project checks for disallow directives targeting 9 named AI crawlers: CCBot, ClaudeBot, GPTBot, Bytespider, PerplexityBot, Meta-ExternalAgent, Applebot-Extended, Google-Extended, and Amazonbot.

Across the full corpus of 157 sites, 72 — or 45.9% — block at least one of those crawlers. That is the benchmark against which every category in this report is compared.

For the Food category specifically, 10 sites were checked and all 10 returned a parseable robots.txt. Seven of them have issued at least one AI-crawler block, a rate of 70%.

All 10 Food sites in the corpus returned a parseable robots.txt file — the highest possible robots.txt coverage for a 10-site category.

At 70%, Food is the second-highest blocking category in the corpus, surpassed only by News at 86.7%.


Category Snapshot: Food

The table below summarizes the Food category results from the June 2026 sealed snapshot.

Category | Sites Checked | With robots.txt | Blocking Any AI | Block Rate
Food | 10 | 10 | 7 | 70%

All 10 Food sites returned parseable robots.txt files. Seven have placed disallow directives against at least one AI crawler. Three have robots.txt files that do not block any of the 9 tracked bots. No Food site in the corpus had a missing or unparseable robots.txt.

The Blockers: allrecipes.com, foodnetwork.com, epicurious.com, seriouseats.com, bonappetit.com, simplyrecipes.com, food.com

Seven Food sites have declared AI-crawler blocks in their robots.txt files: allrecipes.com, foodnetwork.com, epicurious.com, seriouseats.com, bonappetit.com, simplyrecipes.com, and food.com. This is a broad group spanning recipe databases, culinary media brands, and community-driven platforms. The breadth of blocking across the Food category suggests that recipe and culinary content publishers have been particularly attentive to the question of AI training on their structured recipe content.

It is important to emphasize that robots.txt is an honor-system protocol. A disallow directive communicates declared intent to compliant crawlers; it is not a technical enforcement mechanism. Sites that choose to block AI crawlers in robots.txt are relying on those crawlers to honor the signal.
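To illustrate what honoring the signal looks like from the crawler's side, here is a minimal sketch using Python's standard-library urllib.robotparser. The robots.txt content, the example.com URL, and the SomeOtherBot name are hypothetical and are not taken from any site in the snapshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not copied from any site in the snapshot.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# GPTBot is named in a "Disallow: /" group, so a compliant GPTBot skips the page.
gptbot_ok = is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", "https://example.com/recipes/pie")

# A bot that is not named falls back to the wildcard group, which only covers /admin/.
other_ok = is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "https://example.com/recipes/pie")
```

The check runs entirely inside the crawler. Nothing on the server side stops a client that never performs it, which is why the snapshot records declared intent only.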

The Allowers: tasteofhome.com, delish.com, kingarthurbaking.com

Three Food sites returned parseable robots.txt files with no disallow directives against any of the 9 tracked AI crawlers: tasteofhome.com, delish.com, and kingarthurbaking.com. Under the honor-system framework, these sites have declared no restriction on the tracked AI training and retrieval crawlers. Whether that reflects a deliberate strategy, a pending policy update, or a different view on AI-content relationships is outside the scope of a robots.txt snapshot.

No Missing robots.txt

Unlike many categories in this corpus, Food has no sites with a missing or unparseable robots.txt. Every one of the 10 Food sites has published a public robots.txt file, making the category fully characterized from a declared-posture standpoint.


All 16 Categories: Cross-Category Ranking

The table below shows all 16 categories from the June 2026 Closing Web snapshot, ordered by block rate. Food sits second in the ranking — a striking position given that it is a consumer-facing content category rather than a news or technology publisher.

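Category | Sites Checked | With robots.txt | Blocking Any AI | Block Rate
News | 20 | 15 | 13 | 86.7%
Food | 10 | 10 | 7 | 70%
Tech | 15 | 13 | 9 | 69.2%
Entertainment | 9 | 9 | 6 | 66.7%
Healthcare | 10 | 9 | 6 | 66.7%
Reference | 14 | 11 | 6 | 54.5%
Automotive | 10 | 9 | 4 | 44.4%
Social | 10 | 10 | 4 | 40%
Sports | 10 | 10 | 4 | 40%
Travel | 9 | 9 | 3 | 33.3%
Legal | 10 | 7 | 2 | 28.6%
Real Estate | 10 | 7 | 2 | 28.6%
Finance | 12 | 11 | 2 | 18.2%
Retail | 15 | 12 | 2 | 16.7%
Education | 9 | 7 | 1 | 14.3%
Government | 9 | 8 | 1 | 12.5%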

Food ranks second across 16 categories at 70%, separated from the top-ranked News (86.7%) by a meaningful gap, but leading Tech (69.2%) and Entertainment (66.7%) by a narrow margin. The contrast with lower-blocking categories is especially sharp: Real Estate and Legal both sit at 28.6%, Finance at 18.2%, and Government at only 12.5%.
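The ranking can be reproduced from the table's raw counts. The sketch below assumes the published block rate is the number of blocking sites divided by the number of sites with a parseable robots.txt (not by sites checked), which is consistent with the table: News is 13 of 15, or 86.7%. The names ROWS, block_rate, ranked, top_three, and corpus_rate are illustrative.

```python
# Rows copied from the cross-category table:
# (category, sites checked, with robots.txt, blocking any AI)
ROWS = [
    ("News", 20, 15, 13),
    ("Food", 10, 10, 7),
    ("Tech", 15, 13, 9),
    ("Entertainment", 9, 9, 6),
    ("Healthcare", 10, 9, 6),
    ("Reference", 14, 11, 6),
    ("Automotive", 10, 9, 4),
    ("Social", 10, 10, 4),
    ("Sports", 10, 10, 4),
    ("Travel", 9, 9, 3),
    ("Legal", 10, 7, 2),
    ("Real Estate", 10, 7, 2),
    ("Finance", 12, 11, 2),
    ("Retail", 15, 12, 2),
    ("Education", 9, 7, 1),
    ("Government", 9, 8, 1),
]


def block_rate(blocking: int, with_robots: int) -> float:
    """Percent of sites with a parseable robots.txt that block any tracked bot."""
    return round(100 * blocking / with_robots, 1)


# Sort on the unrounded ratio so rounding cannot reorder close categories.
ranked = sorted(ROWS, key=lambda row: row[3] / row[2], reverse=True)
top_three = [(name, block_rate(blocking, parsed)) for name, _, parsed, blocking in ranked[:3]]

# Corpus-wide rate: 72 blocking sites out of 157 with a parseable robots.txt.
corpus_rate = block_rate(sum(r[3] for r in ROWS), sum(r[2] for r in ROWS))
```

Under that assumption, top_three comes out as News, Food, Tech and corpus_rate matches the 45.9% benchmark used throughout the report.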

For readers who want to compare Food against a sector with notably lower blocking posture, the Real Estate category report covers a sector where only 2 of 7 sites issue any AI block.


Corpus-Wide Bot and Operator Leaderboards

The following tables capture which AI bots and which operators are most frequently blocked across all 157 sites in the full corpus — not just Food. These are corpus-wide figures.

Bot | Sites Blocking (of 157) | Block Rate
CCBot | 58 | 36.9%
ClaudeBot | 53 | 33.8%
GPTBot | 45 | 28.7%
Bytespider | 44 | 28%
PerplexityBot | 42 | 26.8%
Meta-ExternalAgent | 39 | 24.8%
Applebot-Extended | 39 | 24.8%
Google-Extended | 37 | 23.6%
Amazonbot | 31 | 19.7%

CCBot, operated by Common Crawl, is the most widely blocked bot across all 157 sites, with 58 sites issuing a disallow. ClaudeBot follows with 53 sites. GPTBot and Bytespider are both blocked by more than a quarter of the corpus. Amazonbot is the least-blocked of the 9 tracked bots at 19.7% across all 157 sites.

Operator | Sites Blocking (of 157)
Common Crawl | 58
Anthropic | 55
OpenAI | 47
Meta | 45
ByteDance | 44
Perplexity | 42
Apple | 39
Google | 37
Cohere | 36
Diffbot | 36
Amazon | 31
Mistral | 15

Common Crawl (58) and Anthropic (55) are the most frequently blocked operators across the 157-site corpus. Mistral, with only 15 sites blocking its crawlers, is the least-blocked of the 12 tracked operators. The gap between the most-blocked and least-blocked operators reflects how unevenly blocking patterns distribute across the AI ecosystem.

These leaderboard figures are corpus-wide. The 7 Food blockers each named their own specific combination of bots — the leaderboard does not reflect per-category bot counts and should not be read as Food-specific data.

Additionally, across the full 157-site corpus, 27 sites (17.2%) have deployed an llms.txt file — a newer mechanism for communicating AI-readable content preferences. That figure is a corpus-wide count from the same sealed snapshot.

For context on how another high-blocking category compares, the Healthcare category report covers a sector at 66.7% — below Food but well above the corpus mean.


Methodology

The US Tech Automations Research team collected the public robots.txt file for each of the 182 sites in the June 2026 Closing Web corpus. Each file was parsed for user-agent strings matching any of the 9 tracked AI crawlers. A site is counted as "blocking" if at least one of those 9 bots is named in a disallow directive, regardless of how many paths are disallowed or how many other bots are also named.

A site is counted as not blocking if a parseable robots.txt was found and none of the 9 bots appear in a disallow directive. It is counted as having no robots.txt if the file was absent or unparseable.
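The counting rule described in this section can be sketched roughly as follows. This is not the project's actual parser. It assumes that only groups naming a tracked bot count (a wildcard "User-agent: *" group does not), that an empty "Disallow:" line is not a block, and that an absent or unparseable file is passed in as None. The function names and the three return labels are hypothetical.

```python
from typing import Optional

TRACKED_BOTS = (
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "PerplexityBot",
    "Meta-ExternalAgent", "Applebot-Extended", "Google-Extended", "Amazonbot",
)


def blocked_bots(robots_txt: str) -> set:
    """Return the tracked bots named in a group with a non-empty Disallow."""
    tracked = {bot.lower(): bot for bot in TRACKED_BOTS}
    blocked = set()
    group = []        # user agents of the group currently being read
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group, in_rules = [], False
            group.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # assumption: empty Disallow is not a block
                blocked.update(tracked[agent] for agent in group if agent in tracked)
    return blocked


def classify(robots_txt: Optional[str]) -> str:
    """Map one site's robots.txt (None if absent or unparseable) to a label."""
    if robots_txt is None:
        return "no robots.txt"
    return "blocking" if blocked_bots(robots_txt) else "not blocking"
```

For example, a file whose only group is "User-agent: *" with "Disallow: /private/" would be classified as not blocking under these assumptions, because none of the 9 tracked bots is named.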

Nothing is estimated, modeled, or extrapolated. Every count in this report is a direct, verbatim read from the snapshot sealed June 13, 2026 under sha 9ceca3bdf0dfeaca. No inferences are drawn about enforcement, compliance, or the legal standing of these directives. robots.txt is a public, honor-system protocol — it communicates declared intent, not guaranteed outcome.


FAQ

Q: Does blocking a bot in robots.txt actually stop it from crawling recipe content?

A: No. robots.txt is an honor-system standard. A compliant crawler reads the file and respects the disallow directives it finds. A non-compliant actor will bypass the file entirely. The snapshot records what 10 Food sites have declared — not what any specific crawler has done. Legal and technical enforcement mechanisms exist separately from robots.txt.

Q: Why do 7 of 10 Food sites block AI crawlers — higher than most categories?

A: The snapshot captures declared posture, not stated motivations. Food is notable for its high block rate relative to the corpus mean of 45.9%. Recipe and culinary content is highly structured, deeply indexed, and directly valuable for AI training. That combination may have prompted more Food publishers to update their robots.txt files to address AI crawlers — but motivations are outside the scope of the sealed snapshot.

Q: What does it mean that tasteofhome.com, delish.com, and kingarthurbaking.com allow AI crawlers?

A: It means those 3 sites have parseable robots.txt files that issue no disallow directives against the 9 tracked bots. Their content is accessible to compliant AI crawlers under the honor system. Whether that represents a deliberate open-access strategy or a policy not yet updated is outside the scope of this report.

Q: Are the bot leaderboard numbers specific to Food sites?

A: No. The bot and operator leaderboards reflect counts across all 157 sites in the full corpus. They show which bots and operators face the most widespread blocking industry-wide. They are not filtered to Food. The 7 Food blockers each named specific bots in their robots.txt files — those details are per-site and not aggregated in this snapshot format.

Q: How does the Food block rate compare to the Legal category?

A: The Food block rate of 70% is far above the Legal block rate of 28.6%. Both figures appear in the cross-category table in this report. The gap between Food and Legal illustrates how much variation exists across content sectors — Food publishers have taken a much more aggressive stance toward AI crawlers than Legal sites. See the Legal category report for the full Legal breakdown.


Put AI-Access Data to Work

The Food category is one of the highest-blocking sectors in the corpus, and that fact creates recurring operational value for teams who track AI-access posture as a competitive signal — not just a one-time data point.

An SEO or content lead at a food media company has a concrete competitive map here: 7 of 10 sites have blocked at least one AI crawler, but tasteofhome.com, delish.com, and kingarthurbaking.com have not. A recurring automated job that re-fetches all 10 Food robots.txt files weekly and compares against this sealed baseline (sha 9ceca3bdf0dfeaca) will alert the moment any open site closes access or any blocking site relaxes its policy. That signal is especially valuable when AI platform answer quality in recipe and cooking topics is a traffic driver.

A publisher RevOps lead at a culinary media brand can use this data to understand which of their content is in-scope for AI-training crawlers and which is not. With 7 blockers and 3 open sites, the field is decisively split toward protection. A monthly cadence job comparing current robots.txt states against the June 2026 baseline makes policy drift visible before it changes the competitive landscape for AI-generated food content.

A retrieval or data engineer building a culinary knowledge base or RAG pipeline needs to know — on an ongoing basis — which of these 10 Food sources have declared themselves off-limits under the honor system. Automating a monthly re-fetch and logging changes against the sealed baseline means the engineering team is never operating on stale access assumptions. For context on how a more permissive sector handles the same monitoring challenge, see the Automotive category report, which sits near the corpus average.
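The comparison step of such a recurring job might look like the sketch below. The baseline values come from this report's blocker and allower lists. How the current postures are obtained (fetching and parsing each robots.txt) is left out, and the function name posture_changes is hypothetical.

```python
# Sealed June 2026 baseline for the 10 Food sites:
# True = blocks at least one tracked AI crawler, False = blocks none.
BASELINE = {
    "allrecipes.com": True,
    "foodnetwork.com": True,
    "epicurious.com": True,
    "seriouseats.com": True,
    "bonappetit.com": True,
    "simplyrecipes.com": True,
    "food.com": True,
    "tasteofhome.com": False,
    "delish.com": False,
    "kingarthurbaking.com": False,
}


def posture_changes(baseline: dict, current: dict) -> list:
    """List (site, was_blocking, is_blocking) for every site whose posture changed.

    Sites missing from `current` (for example, a failed fetch) are skipped
    rather than reported as a change.
    """
    return [
        (site, was_blocking, current[site])
        for site, was_blocking in sorted(baseline.items())
        if site in current and current[site] != was_blocking
    ]


# Illustrative run: pretend a later fetch found delish.com had added a block.
later = dict(BASELINE)
later["delish.com"] = True
changes = posture_changes(BASELINE, later)
```

Keeping the diff separate from fetching makes the job easy to test against a stored baseline, and it means a transient fetch failure does not raise a false alert.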

US Tech Automations builds agentic workflows that automate this kind of recurring robots.txt monitoring — fetching, parsing, diffing, and alerting on policy changes so your team acts on live signals rather than a static snapshot taken months ago. See how agentic workflows handle AI-access monitoring at scale.


Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 9ceca3bdf0dfeaca).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Food Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-food-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 9ceca3bdf0dfeaca

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.