Do Wine Sites Block AI Crawlers? 3 of 9 Do
The Wine category lands almost exactly on the corpus average. Of the 10 Wine sites we checked, 9 returned a parseable robots.txt, and 3 of those 9 block at least one AI crawler, a 33.3% block rate. The corpus-wide rate across all 479 sites is 33.4%, which makes Wine one of the closest matches to the corpus average in this edition.
A robots.txt file is the plain-text document website operators publish at their domain root to direct automated crawlers — including AI training, retrieval, and summarization bots — on what they may or may not access. The standard is voluntary and honor-based; compliance is not technically enforced. That the Wine category sits right at the corpus average, not pulled high by a concentration of large platforms or low by a purely promotional sector, makes it an instructive reference point for understanding where the broader web stands.
3 of 9 Wine sites block at least one AI crawler.
Wine sites post a 33.3% AI-crawler block rate.
Corpus-wide, 160 of 479 sites block at least one AI crawler.
Key Takeaways
3 of 9 Wine sites with a parseable robots.txt block at least one AI crawler.
The Wine block rate of 33.3% sits right at the corpus-wide average of 33.4% across all 479 sites.
CCBot is the single most-blocked bot corpus-wide, disallowed by 124 of 479 sites.
Of 10 Wine sites checked, 9 returned a parseable robots.txt; only those 9 count toward the block rate.
Blockers are vivino.com, decanter.com, and winespectator.com — all carrying proprietary ratings or editorial content.
The 6 allowers — wine.com, winefolly.com, totalwine.com, jancisrobinson.com, winecountry.com, drinkhacker.com — are retail, education, and discovery-oriented.
winemag.com returned no parseable robots.txt in this snapshot and cannot be attributed a stance.
Corpus-wide, 102 of 479 sites (21.3%) have published a separate llms.txt AI-access policy file.
Which Sites Are Blocking — and Which Are Not
The 3 sites that block at least one AI crawler are vivino.com, decanter.com, and winespectator.com. Each of these represents a different segment of the wine information economy: Vivino is a consumer app with a proprietary database of user-generated reviews and ratings, Decanter is a British enthusiast publication with exclusive editorial content, and Wine Spectator is the dominant subscription-supported trade magazine in the U.S. market. All three have assets — ratings, reviews, editorial archives — that they may view as proprietary data worth protecting from AI training sets.
The per-site breakdown across the 9 sites that returned a parseable robots.txt:
| Site | Blocks Any AI Crawler? |
|---|---|
| vivino.com | Yes |
| decanter.com | Yes |
| winespectator.com | Yes |
| wine.com | No |
| winefolly.com | No |
| totalwine.com | No |
| jancisrobinson.com | No |
| winecountry.com | No |
| drinkhacker.com | No |
The 6 sites that allow every crawler are wine.com, winefolly.com, totalwine.com, jancisrobinson.com, winecountry.com, and drinkhacker.com. This group spans retail (wine.com, totalwine.com), education and content marketing (winefolly.com, drinkhacker.com), an individual expert critic's site (jancisrobinson.com), and a travel-oriented destination guide (winecountry.com). The pattern is coherent: these sites benefit from being discoverable by AI tools and are not protecting a proprietary review corpus that could conflict with AI training interests.
The remaining site, winemag.com (Wine Enthusiast), returned no parseable robots.txt in this snapshot. We cannot attribute either blocking or allowing behavior to it from this data.
vivino.com, decanter.com, and winespectator.com each block at least one AI crawler per the June 2026 sealed snapshot.
The structural split between blockers and allowers in Wine maps cleanly onto content ownership. Sites whose core value is original ratings data or subscriber-only editorial have an incentive to restrict training-data access. Sites whose value is retail, discovery, or open educational content do not.
What This Block Rate Actually Means for the Wine Category
The 33.3% block rate places Wine in interesting company. Travel, Agriculture, Weather, and Beauty all share a 33.3% rate in this snapshot. These are all content-diverse categories where a subset of sites carries proprietary or subscriber-supported editorial while the majority operates on discovery-first principles.
For the Wine category specifically, the split reflects a maturing debate about proprietary review data. Wine ratings are a form of structured expert knowledge — the kind of content that AI systems would eagerly incorporate into taste-profiling or recommendation engines. The blocking sites have the most to lose from unrestricted AI ingestion of their ratings archives.
The non-blocking sites, including jancisrobinson.com, are not naively open. Jancis Robinson is one of the most respected wine critics globally, and her site is educational and widely read — but its content is openly accessible by design, oriented toward wine education rather than subscription-locked data products. The permissive posture aligns with her editorial mission.
For context on a category that blocks at a lower rate, see how HR sites are drawing lines at 22.2%. That rate is notably lower than Wine's because the HR platforms that allow crawling tend to be enterprise software vendors doing content marketing, while the two that block are trade editorial sites protecting exclusive content.
Where Wine Sits Among Its Nearest Neighbors
The focused window below shows Wine alongside the categories most closely adjacent in the block-rate ranking. Wine shares the 33.3% tier with Travel, Weather, Beauty, and Agriculture — all drawn from the sealed allCategoriesRanked data.
| Category | Sites Checked | Sites with robots.txt | Sites Blocking Any AI Crawler | Block Rate |
|---|---|---|---|---|
| Jobs | 10 | 8 | 3 | 37.5% |
| Aviation | 10 | 8 | 3 | 37.5% |
| Architecture | 8 | 8 | 3 | 37.5% |
| Travel | 9 | 9 | 3 | 33.3% |
| Weather | 10 | 6 | 2 | 33.3% |
| Beauty | 10 | 6 | 2 | 33.3% |
| Agriculture | 10 | 9 | 3 | 33.3% |
| Wine | 10 | 9 | 3 | 33.3% |
| Legal | 10 | 7 | 2 | 28.6% |
| RealEstate | 10 | 7 | 2 | 28.6% |
| Pets | 10 | 7 | 2 | 28.6% |
Wine shares the 33.3% position with Travel, Weather, Beauty, and Agriculture, all categories where a subset of sites carries valuable, original content worth protecting while the majority remains open. For context on the extremes: Gaming leads the corpus at 88.9% and News sits at 82.4%, while the zero-block tier includes Manufacturing, Construction, Logistics, and Toys (all at 0%).
"Wine's 33.3% block rate mirrors the corpus-wide rate of 33.4%, making it a useful reference category for what an average AI-access posture looks like across the web in June 2026."
Who Gets Disallowed — Corpus-Wide Bot Leaderboard
No Wine-specific bot ranking is possible when only 3 sites are blocking, but the corpus-wide leaderboard shows which bots face the most resistance across all 479 sites. That context is directly relevant to anyone monitoring Wine-sector AI access, since it tells you which bots are most likely to be selectively restricted when a site does choose to block.
| Bot | Sites Blocking This Bot (of 479 corpus sites) | Share of Corpus |
|---|---|---|
| CCBot (Common Crawl) | 124 | 25.9% |
| ClaudeBot (Anthropic) | 108 | 22.5% |
| GPTBot (OpenAI) | 97 | 20.3% |
| Bytespider (ByteDance) | 96 | 20.0% |
| Meta-ExternalAgent (Meta) | 86 | 18.0% |
| Applebot-Extended (Apple) | 83 | 17.3% |
| Google-Extended (Google) | 83 | 17.3% |
| PerplexityBot (Perplexity) | 75 | 15.7% |
| Amazonbot (Amazon) | 73 | 15.2% |
CCBot leads at 124 sites, most likely because its Common Crawl affiliation makes it the longest-standing disallow target: many existing robots.txt files were written to block non-commercial scraping before the current AI-training debate, and CCBot is the canonical example. ClaudeBot and GPTBot follow closely, which is consistent with blocks added specifically in response to generative AI training concerns since 2023.
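Each share in the leaderboard is a bot's blocking-site count divided by the 479-site corpus. The arithmetic can be sketched in a few lines of Python. The counts are copied from the table above; the names `BLOCK_COUNTS` and `share_of_corpus` are illustrative and not part of the report's tooling.

```python
# Counts copied from the corpus-wide leaderboard; all names here are illustrative.
CORPUS_SIZE = 479

BLOCK_COUNTS = {
    "CCBot": 124,
    "ClaudeBot": 108,
    "GPTBot": 97,
    "Bytespider": 96,
    "Meta-ExternalAgent": 86,
    "Applebot-Extended": 83,
    "Google-Extended": 83,
    "PerplexityBot": 75,
    "Amazonbot": 73,
}


def share_of_corpus(count: int, total: int = CORPUS_SIZE) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {bot: share_of_corpus(n) for bot, n in BLOCK_COUNTS.items()}

for bot, pct in shares.items():
    print(f"{bot}: {BLOCK_COUNTS[bot]} of {CORPUS_SIZE} sites ({pct:.1f}%)")
```

Formatting every share to one decimal place keeps rows such as Bytespider (20.0%) and Meta-ExternalAgent (18.0%) visually consistent with the rest of the column.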
CCBot is blocked by 124 of 479 corpus sites — the highest of any single bot in the June 2026 snapshot.
How the Snapshot Was Sealed
The methodology for this report is straightforward: US Tech Automations fetched the robots.txt file from each of the 572 sites in the corpus in a single crawl pass, stored each file verbatim, and sealed the collection under hash 4e7c4a4a3c720f06 on June 14, 2026. A site was counted as blocking if any User-agent directive in its robots.txt matched a recognized AI crawler token and the accompanying Disallow covered the root or entire site.
Sites without a parseable file were excluded from block counts and listed separately. Nothing is estimated, modeled, or extrapolated; the counts are verbatim reads from the sealed snapshot.
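The counting rule described above can be sketched in Python. This is a simplified illustration, not the report's actual parser: it assumes that a "block" means a group naming one of the 9 tokens explicitly and carrying `Disallow: /` (or `/*`), that a wildcard `User-agent: *` group does not count, and that grouping follows the usual convention of consecutive User-agent lines sharing the rules beneath them. The function name `blocked_ai_bots` is hypothetical.

```python
# Simplified sketch of the counting rule; not the report's actual parser.
AI_TOKENS = (
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
    "Applebot-Extended", "Google-Extended", "PerplexityBot", "Amazonbot",
)

# Assumption: these paths are treated as covering the entire site.
ROOT_PATHS = {"/", "/*"}


def blocked_ai_bots(robots_txt: str) -> set[str]:
    """Return the AI tokens that are explicitly disallowed from the site root."""
    blocked: set[str] = set()
    agents: list[str] = []   # user-agents named by the group being read
    in_rules = False         # True once the current group has an Allow/Disallow

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:                      # a rule was seen, so a new group starts
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value in ROOT_PATHS:
                blocked.update(t for t in AI_TOKENS if t.lower() in agents)

    return blocked


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin
"""
print(sorted(blocked_ai_bots(sample)))
```

Under these assumptions a site counts as a blocker when the returned set is non-empty. A file that only disallows `*` returns an empty set here, which is a deliberate reading of "matched a recognized AI crawler token" and may differ from how the report treats wildcard rules.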
Collect. Fetch robots.txt at https://<domain>/robots.txt for each of the 572 corpus sites; store raw bytes verbatim.
Parse. Apply the 9-token AI-crawler bot list; flag any Disallow: / or functionally equivalent rule as a block.
Seal. Compute sha256 of the collected file set; record the hash alongside the counts to enable independent verification.
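The Seal step can be illustrated with a short Python sketch. The report does not say how the individual files are combined before hashing, so the scheme below (sort by domain, then feed each domain name, a length prefix, and the raw bytes into one SHA-256 digest) is an assumption chosen to make the result deterministic. The function name `seal_snapshot` and the sample domains are hypothetical.

```python
import hashlib


def seal_snapshot(files: dict[str, bytes]) -> str:
    """Return a SHA-256 hex digest over a set of robots.txt files.

    Assumed scheme (not confirmed by the report): domains are processed in
    sorted order, and each entry contributes its name, a separator byte,
    an 8-byte length prefix, and the raw file bytes.
    """
    digest = hashlib.sha256()
    for domain in sorted(files):
        body = files[domain]
        digest.update(domain.encode("utf-8"))
        digest.update(b"\0")
        digest.update(len(body).to_bytes(8, "big"))
        digest.update(body)
    return digest.hexdigest()


# Hypothetical two-site snapshot for illustration only.
snapshot = {
    "example-b.test": b"User-agent: *\nDisallow:\n",
    "example-a.test": b"User-agent: GPTBot\nDisallow: /\n",
}
print(seal_snapshot(snapshot))
```

Sorting the domains means the digest does not depend on crawl order, and the length prefix keeps two different file sets from producing the same byte stream. Anyone holding the same raw files and the same scheme could recompute the digest and compare it with the published one.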
The llms.txt signal is tracked separately: 102 of 479 sites (21.3%) published an llms.txt file as of the snapshot date. That figure reflects an additional, newer AI-access signaling standard distinct from robots.txt blocking. The same methodology applies across all reports in this batch — for example, see do accounting sites block AI crawlers for how a professional standards body shapes a very different block rate.
Frequently Asked Questions
Q: Why would vivino.com block AI crawlers but wine.com allow them?
A: Vivino's core asset is its proprietary database of user-generated ratings and label data — content it has invested in building and which has commercial value in data licensing and recommendation features. wine.com and totalwine.com are retail platforms whose value is inventory discovery; they benefit from AI tools surfacing their products in shopping contexts. The blocking decision maps directly onto what each site is protecting.
Q: Does a 33.3% block rate mean the Wine category is becoming more restrictive?
A: This report covers a single sealed snapshot — June 14, 2026. It is cross-sectional data only. There is no prior Closing Web edition for direct comparison in this series, so no trend claim is possible. The 33.3% is a point-in-time count, not a directional indicator.
Q: What happens if a site that currently allows AI crawlers changes its robots.txt?
A: The sealed snapshot will not reflect the change. This report describes the state on June 14, 2026 only. Any policy change after that date is outside the scope of this data. Monitoring for drift requires re-querying robots.txt on an ongoing basis, which is what the automated workflows described under "Put AI-Access Data to Work" below address.
Q: Are there AI crawlers that Wine sites might block specifically that are not on the 9-bot list?
A: Possibly. The 9 bots tracked are the major AI training and retrieval crawlers on this report's bot list as of the snapshot date: CCBot, ClaudeBot, GPTBot, Bytespider, Meta-ExternalAgent, Applebot-Extended, Google-Extended, PerplexityBot, and Amazonbot. Sites may also restrict other bots not on this list. This report measures only the 9 defined tokens; it makes no claim about other disallow rules.
Q: How does winemag.com's lack of a robots.txt affect the 33.3% calculation?
A: Only sites that returned a parseable robots.txt are included in the blocking calculation. winemag.com is listed in noRobotsSites — it returned no parseable file, so it is not counted in the denominator of 9 or in any blocking figure. The 3 of 9 count and the 33.3% rate are calculated only over the 9 sites that published a file.
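The denominator rule in this answer can be made concrete with a small Python sketch. The per-site values are taken from the table earlier in this report; the function name `block_rate` and the use of `None` to mark a site without a parseable robots.txt are illustrative choices, not the report's data format.

```python
from typing import Optional

# True = blocks at least one AI crawler, False = allows all,
# None = no parseable robots.txt (excluded from the denominator).
WINE_SITES: dict[str, Optional[bool]] = {
    "vivino.com": True,
    "decanter.com": True,
    "winespectator.com": True,
    "wine.com": False,
    "winefolly.com": False,
    "totalwine.com": False,
    "jancisrobinson.com": False,
    "winecountry.com": False,
    "drinkhacker.com": False,
    "winemag.com": None,
}


def block_rate(sites: dict[str, Optional[bool]]) -> tuple[int, int, float]:
    """Return (blockers, parseable sites, block rate in percent)."""
    parseable = [blocks for blocks in sites.values() if blocks is not None]
    blockers = sum(parseable)
    rate = round(100 * blockers / len(parseable), 1) if parseable else 0.0
    return blockers, len(parseable), rate


blockers, denominator, rate = block_rate(WINE_SITES)
print(f"{blockers} of {denominator} sites block: {rate}%")
```

Had winemag.com been counted in the denominator as a non-blocker, the rate would have been 3 of 10, or 30.0%, which is why the exclusion matters for the headline figure.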
Put AI-Access Data to Work
A wine publication data-rights manager or editorial director at a site like Decanter or Wine Spectator — someone responsible for protecting the commercial value of proprietary review content — has a direct use for this monitoring. Knowing that 3 of 9 peers are currently blocking, and which specific operators those blocks target, provides a reference frame for internal policy review.
The actionable workflow: set a weekly automated re-crawl of the 9 Wine sites in this corpus, plus any additional competitive titles, and trigger an alert the moment a site that currently allows all crawlers adds a new Disallow directive. That signal can prompt a policy discussion before a competitive disadvantage compounds. The trigger is a file change; the cadence is weekly.
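The alert condition in this workflow amounts to comparing two fetches of the same robots.txt and reporting any Disallow rule that appears only in the newer one. A minimal Python sketch follows, assuming the two versions are already available as strings. Fetching, scheduling, and notification are left out, and the names `disallow_pairs` and `new_disallows` are hypothetical.

```python
def disallow_pairs(robots_txt: str) -> set[tuple[str, str]]:
    """Return (user-agent, path) pairs for every non-empty Disallow rule."""
    pairs: set[tuple[str, str]] = set()
    agents: list[str] = []   # user-agents named by the group being read
    in_rules = False         # True once the current group has an Allow/Disallow

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:                      # previous group ended
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # an empty Disallow allows everything
                pairs.update((agent, value) for agent in agents)

    return pairs


def new_disallows(old: str, new: str) -> list[tuple[str, str]]:
    """Return Disallow rules present in the new file but not in the old one."""
    return sorted(disallow_pairs(new) - disallow_pairs(old))


last_week = "User-agent: *\nDisallow: /cart\n"
this_week = last_week + "\nUser-agent: GPTBot\nDisallow: /\n"

for agent, path in new_disallows(last_week, this_week):
    print(f"ALERT: new rule for {agent}: Disallow: {path}")
```

Comparing parsed rules rather than raw bytes avoids alerts for cosmetic edits such as reordered comments, at the cost of missing changes this simple parser does not model (for example Allow overrides or wildcard path patterns).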
A wine-tech product lead building a recommendation engine or AI sommelier product that ingests wine review content needs to know which domains are currently open to crawling. The allowerSites list — wine.com, winefolly.com, totalwine.com, jancisrobinson.com, winecountry.com, drinkhacker.com — represents the confirmed-permissive tier as of June 14, 2026. Monitoring those 6 sites for any transition from permissive to blocking converts the snapshot into an early-warning system for training-data availability.
An AI search data-pipeline engineer aggregating food and beverage content for a retrieval-augmented generation system uses the per-bot blocking data to understand which crawlers face the most resistance. If their stack uses ClaudeBot or GPTBot, the corpus-wide block rates — 108 and 97 sites respectively — set realistic expectations for what fraction of a target corpus will be inaccessible via those tokens. For related coverage of how other industry verticals handle AI access, see do toy sites block AI crawlers for the zero-block end of the spectrum.
US Tech Automations automates this monitoring with scheduled robots.txt crawls, change-diff alerting, and a per-category AI-access dashboard that surfaces policy shifts the moment they happen — no manual re-checking required.
Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 4e7c4a4a3c720f06).
Cite this report
US Tech Automations Research, 2026-06 edition. “Do Wine Sites Block AI Crawlers? 3 of 9 Do.” https://ustechautomations.com/resources/blog/do-wine-sites-block-ai-crawlers-2026
Sealed snapshot sha256: 4e7c4a4a3c720f06