Do Cigar Sites Block AI Crawlers? 1 of 7 Do

Jun 14, 2026

The cigar web is a row of open humidors with one closed door at the end. In our June snapshot, the retailers leave their crawl rules wide open while the category's flagship magazine is the only site that turns an AI crawler away.

1 of 7 cigar sites blocks at least one AI crawler.

Of the 10 cigar sites we checked, 7 returned a parseable robots.txt, the root-level text file that names which automated agents may fetch which paths, and exactly 1 of those 7 disallows an AI crawler. That is a 14.3% block rate. These are verbatim reads of the sealed files; nothing is estimated, modeled, or extrapolated.

The one blocker is cigaraficionado.com, the category's editorial brand. The six allowers are all online tobacconists. Set against the corpus, where 28% of sites with a policy gate at least one crawler, cigar sits well below that benchmark.

Which Sites Are Blocking — and Which Are Not

The cigar split lines up neatly with what each site sells. The lone blocker, cigaraficionado.com, is a content brand: reviews, ratings, and editorial whose archive is the product. That is precisely the kind of material a publisher has reason to keep away from AI training crawlers, and its disallow accounts for the entire 14.3%.

The six that allow everything are e-tailers: famous-smoke.com, neptunecigar.com, holts.com, jrcigars.com, smallbatchcigar.com, and atlanticcigar.com. None carries a disallow aimed at an AI agent. For an online tobacconist, an AI assistant reading the catalog is free shelf space in the answer — the last thing they want is to be invisible when a shopper asks for a recommendation.

Three more cigar domains — cigarsinternational.com, thompsoncigar.com, and cigarbid.com — returned no parseable robots.txt at the seal. That is silence, not a stated allow and not a block; we record it as such and exclude it from the rate. The same media-gated, retail-open shape appears in how billiards sites handle AI crawlers, another enthusiast niche of shops orbiting a single hub, and it tips even further open in the zero-block kayaking result.
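The counting rule above can be sketched in a few lines of Python. The domains and their outcomes are the ones reported in this article; the status labels (`block`, `allow`, `silent`) and the function name `block_rate` are illustrative choices, not the study's own tooling.

```python
# Outcomes as reported in this article. The three status labels are
# illustrative names, not the study's own schema.
CIGAR_STATUSES = {
    "cigaraficionado.com": "block",
    "famous-smoke.com": "allow",
    "neptunecigar.com": "allow",
    "holts.com": "allow",
    "jrcigars.com": "allow",
    "smallbatchcigar.com": "allow",
    "atlanticcigar.com": "allow",
    "cigarsinternational.com": "silent",
    "thompsoncigar.com": "silent",
    "cigarbid.com": "silent",
}


def block_rate(statuses):
    """Return (blockers, policied, rate_percent), leaving silent sites out."""
    policied = [s for s in statuses.values() if s != "silent"]
    blockers = sum(1 for s in policied if s == "block")
    if not policied:
        return 0, 0, None  # no policied sites, so no rate to report
    return blockers, len(policied), round(100 * blockers / len(policied), 1)


blockers, policied, rate = block_rate(CIGAR_STATUSES)
print(f"{blockers} of {policied} policied sites block: {rate}%")
```

Counting the three silent domains in the denominator would instead give 1 of 10, or 10.0%, which is why the exclusion rule is worth stating.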

Precision matters in how we count this. A block, in this study, is an explicit Disallow aimed at a named AI agent — GPTBot, ClaudeBot, CCBot, and the rest of the leaderboard tokens. cigaraficionado.com carries such a directive; the six tobacconists do not. A site can disallow a checkout path or an admin route without touching any AI token, and that is not counted as an AI block here. Only a directive naming an AI agent moves a domain into the blocker column, which is why cigar's count is a clean 1.

That distinction keeps the 14.3% grounded in the file rather than in inference. We are not guessing at intent from a thin sitemap or a redirect; we are reading one specific line that cigaraficionado.com chose to publish. The verbatim read is the whole method.
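A minimal sketch of that counting rule follows, assuming a simplified reading of robots.txt groups. It is not the study's parser. The token set holds only the three agents named in this article (the full leaderboard list is not reproduced here), and treating any non-empty Disallow under an AI token as a block, including a partial-path one, is an assumption.

```python
# Only the three tokens named in the article; the full list is an assumption.
AI_TOKENS = {"gptbot", "claudebot", "ccbot"}


def blocked_ai_agents(robots_txt, ai_tokens=AI_TOKENS):
    """Return the AI tokens whose own group carries a non-empty Disallow."""
    blocked = set()
    group_agents = []   # user-agents named by the current group
    in_rules = False    # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # An empty Disallow means "nothing is disallowed", so skip it.
            if field == "disallow" and value:
                blocked.update(a for a in group_agents if a in ai_tokens)
    return blocked


# Hypothetical files for illustration, not the contents of any real site.
retailer = "User-agent: *\nDisallow: /checkout/\n"
publisher = "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"

print(sorted(blocked_ai_agents(retailer)))
print(sorted(blocked_ai_agents(publisher)))
```

The retailer-style file disallows a checkout path for every agent but names no AI token, so it returns an empty set and stays out of the blocker column.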

Why Cigar Lands Where It Does

A 14.3% rate is low, and it is low for a reason rooted in commerce. When a vertical is built on storefronts, the controlling incentive is to be found, and being found means letting crawlers in. A tobacconist that blocks an AI crawler is opting out of an emerging discovery channel for no offsetting gain.

cigaraficionado.com is the inverse case. Its value sits in written content, so feeding that content to AI training crawlers for free is a decision worth weighing — and it has weighed it toward blocking. That is the same dynamic that separates publishing-heavy categories near the top of the ranking from retail-heavy ones near the bottom.

The takeaway is that cigar reads as a commerce vertical with a single editorial holdout. The retailers set the posture, and the posture is open. A retailer flipping to a block would be the surprising event, and the catchable one.

There is a small-sample caveat worth stating plainly. With only 7 policied files, each site carries a large share of the percentage: one more tobacconist blocking would double the rate to 28.6%, and one editorial change of mind would drop it to zero.

That fragility is not a flaw in the data — it is an honest property of a thin slice, and it is exactly why drift monitoring matters more here than in a category with many policied sites. In a small vertical, a single domain's policy edit can move the headline number, so the value is in catching that edit, not in treating the snapshot as a permanent verdict.
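The arithmetic behind that fragility is easy to check. The sketch below uses the 7 policied files from the snapshot; the 70-site category is a hypothetical comparison, not a figure from the corpus.

```python
def rate(blockers, policied):
    """Block rate in percent, rounded to one decimal place."""
    return round(100 * blockers / policied, 1)


def swing_per_site(policied):
    """Percentage points the rate moves when one site changes its policy."""
    return round(100 / policied, 1)


CIGAR_POLICIED = 7  # cigar sites with a parseable robots.txt, per the snapshot

for blockers in (0, 1, 2):
    print(f"{blockers} of {CIGAR_POLICIED} blocking -> {rate(blockers, CIGAR_POLICIED)}%")

# Hypothetical larger category, for contrast only.
print(swing_per_site(CIGAR_POLICIED), swing_per_site(70))
```

One policy edit moves the cigar rate by 14.3 points in either direction, while the same edit in a 70-file category would move it by about 1.4.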

How Cigar Compares to Similar Categories

A 14.3% block rate puts Cigar among the corpus's lighter gatekeepers. The focused window below shows Cigar beside its nearest neighbors in the ranking, taken verbatim from the sealed snapshot — name first, no rank column.

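| Category | Sites | With robots.txt | Block ≥1 crawler | Block rate |
|---|---|---|---|---|
| Soapmaking | 10 | 6 | 1 | 16.7% |
| Education | 9 | 7 | 1 | 14.3% |
| Sailing | 7 | 7 | 1 | 14.3% |
| Cigars | 10 | 7 | 1 | 14.3% |
| Government | 9 | 8 | 1 | 12.5% |
| Crypto | 9 | 8 | 1 | 12.5% |
| Books | 9 | 8 | 1 | 12.5% |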

Cigar shares its exact 14.3% reading with Education and Sailing — a mix of single-blocker categories sitting just above a band of 12.5% verticals. Several here are small-sample niches where one site moves the whole percentage. How sailing sites treat AI crawlers is the closest tie, another seven-file vertical with one blocker. The extremes table sets the scale:

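| Category | Sites | With robots.txt | Block ≥1 crawler | Block rate |
|---|---|---|---|---|
| Gaming | 9 | 9 | 8 | 88.9% |
| News | 20 | 17 | 14 | 82.4% |
| Geocaching | 10 | 4 | 0 | 0% |
| Kayaking | 10 | 4 | 0 | 0% |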

The distance from Gaming's aggressive gating to the zero-block floor is the corpus in miniature, and cigar sits much nearer the open end.

The Operators Cigar Sites Would Gate First

cigaraficionado.com's single disallow is one data point; the more useful read is which operators get gated most across the whole corpus, since those are the tokens a tobacconist would add first if it ever started. The cut below shows the most-disallowed operators across all 1053 sites, operator name first, count next.

| Operator | Sites disallowing (all 1053 sites) |
|---|---|
| Common Crawl | 221 |
| Anthropic | 210 |
| OpenAI | 202 |
| ByteDance | 190 |
| Meta | 190 |

Common Crawl leads, with Anthropic and OpenAI close behind — the same handful of operators that dominate gating everywhere. None of them appears in the six open cigar storefronts, which is exactly why the category reads as permissive.

Corpus-wide, 295 of 1053 sites block at least one AI crawler.
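To put the operator counts on the same footing as the block rates, the sketch below converts each count to a share of the 1053 policied sites. All figures are the ones reported in this article; the variable and function names are illustrative. The last step is a derived observation rather than a reported figure: it divides the five counts' total by the 295 blocking sites.

```python
SITES_WITH_POLICY = 1053   # sites with a parseable robots.txt
SITES_BLOCKING_ANY = 295   # sites disallowing at least one AI crawler

# Per-operator disallow counts as reported in the article.
DISALLOW_COUNTS = {
    "Common Crawl": 221,
    "Anthropic": 210,
    "OpenAI": 202,
    "ByteDance": 190,
    "Meta": 190,
}


def share(count, total=SITES_WITH_POLICY):
    """Percentage of policied sites that disallow a given operator."""
    return round(100 * count / total, 1)


shares = {operator: share(n) for operator, n in DISALLOW_COUNTS.items()}
for operator, pct in shares.items():
    print(f"{operator}: {pct}% of policied sites")

# The five counts sum to more than the number of blocking sites, so the
# blockers must overlap: a typical blocker names several operators at once.
total_entries = sum(DISALLOW_COUNTS.values())
avg_top5_per_blocker = round(total_entries / SITES_BLOCKING_ANY, 1)
print(total_entries, avg_top5_per_blocker)
```

Because the counts overlap, the shares are not additive. The average suggests that a site which starts gating tends to name most of this list at once rather than a single operator.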

Reading the Sealed Cigar Numbers

These figures come from a single point-in-time crawl of public robots.txt files, sealed June 14, 2026 under snapshot sha d0b7ef205c390023. For each Cigar domain we fetched robots.txt at the root, parsed user-agent and disallow directives, and recorded whether any AI crawler token was disallowed. We report verbatim counts; nothing is estimated, modeled, or extrapolated. Domains with no parseable file — cigarsinternational.com, thompsoncigar.com, cigarbid.com — are logged as silent, not as allows or blocks.

US Tech Automations runs this across 1274 sites, 1053 of which returned a parseable robots.txt, spanning 128 categories. Cigar contributes 7 of those files, and we report its slice as exactly that.

A note on what the snapshot deliberately avoids. It does not retry a slow host until a file appears, does not follow a redirect into another domain's policy, and does not infer a block from a site that merely seems hostile to bots. Each cigar domain is read once, at seal time, exactly as it answered.
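A single-read fetch along those lines might look like the sketch below. It is not the study's crawler. The names `read_once` and `same_site`, the 10-second timeout, the rule that only an exact host or its `www.` variant counts as the same domain, and the reading of "parseable" as an HTTP 200 with a decodable body are all assumptions made for illustration.

```python
import http.client
import urllib.request
from urllib.parse import urlsplit


def same_site(requested_host, final_url):
    """Assumption: an exact host match or a www. prefix is the same domain."""
    final_host = (urlsplit(final_url).hostname or "").lower()
    requested_host = requested_host.lower()
    return final_host in (requested_host, "www." + requested_host)


def read_once(domain, opener=urllib.request.urlopen, timeout=10.0):
    """Fetch robots.txt one time; return its text, or None for 'silent'."""
    url = f"https://{domain}/robots.txt"
    try:
        with opener(url, timeout=timeout) as resp:
            # urlopen follows redirects, so check where the read ended up.
            if resp.status != 200 or not same_site(domain, resp.url):
                return None
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and timeouts derive from OSError. There is
        # deliberately no retry: a failed read is recorded as silent.
        return None
```

The `opener` parameter exists so the function can be exercised with a stub instead of a live request. In a real run, a `None` result would go into the silent bucket rather than the allow column.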

That single-read rule is what makes the result content-addressable: anyone holding sha d0b7ef205c390023 can re-derive the same seven files and the same one blocker. The cost is that a tobacconist briefly offline at seal lands in the no-parseable-file bucket rather than the allow column — which is exactly why cigarsinternational.com, thompsoncigar.com, and cigarbid.com are logged as silent. The method favors reproducibility over a generous reading, and we would rather undercount an open site than guess one into the allow column.
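One way to make a sealed set of files content-addressable is to hash the domains and bodies in a fixed order. The sketch below illustrates that idea only: the article does not say how d0b7ef205c390023 is derived, so `snapshot_digest`, its byte layout, and the 16-character truncation are assumptions, and the function will not reproduce that value.

```python
import hashlib


def snapshot_digest(files, length=16):
    """Hash {domain: robots_txt_or_None} into a short, order-independent id.

    The layout is an illustrative assumption, not the study's actual scheme.
    """
    h = hashlib.sha256()
    for domain in sorted(files):              # fixed order keeps it stable
        body = files[domain]
        h.update(domain.encode("utf-8") + b"\0")
        if body is None:
            h.update(b"\x00silent")           # a silent domain, not an empty file
        else:
            data = body.encode("utf-8")
            # Length-prefix the body so adjacent fields cannot run together.
            h.update(b"\x01" + len(data).to_bytes(8, "big") + data)
    return h.hexdigest()[:length]


# Hypothetical snapshot for illustration.
sealed = {
    "shop.example": "User-agent: *\nDisallow: /checkout/\n",
    "magazine.example": "User-agent: GPTBot\nDisallow: /\n",
    "offline.example": None,
}
print(snapshot_digest(sealed))
```

Under this scheme, any change to a single file, or a silent domain turning into an empty file, produces a different identifier. That is the property a sealed snapshot needs.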

Frequently Asked Questions

Q: Which cigar site is the one blocking AI crawlers?

A: cigaraficionado.com — the category's flagship editorial brand. It is the single domain among the 7 with a policy that disallows an AI crawler, and it accounts for the entire 14.3% block rate. The six online tobacconists all allow every crawler.

Q: Why do the online tobacconists allow every crawler?

A: Discoverability. Retailers like famous-smoke.com and jrcigars.com want AI shopping assistants to read and surface their products, so a block would cost them visibility with no upside. Their robots.txt files contain no AI disallow.

Q: Does the 14.3% figure include every cigar site you checked?

A: No. It covers the 7 sites that returned a parseable robots.txt. Three more — cigarsinternational.com, thompsoncigar.com, and cigarbid.com — produced no parseable file at the seal, so they are excluded from the rate rather than counted either way.

Q: Is a robots.txt block enforceable against an AI crawler?

A: No. robots.txt is an honor-system standard: a cooperative crawler reads the file and complies, but nothing at the network level forces it to. cigaraficionado.com's disallow expresses intent for AI agents to stay out; honoring it is the crawler's choice.
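The honor system is visible in how a cooperative crawler is written: it asks the file for permission before fetching, and nothing forces it to. Below is a minimal sketch using Python's standard `urllib.robotparser`. The sample file and the `ExampleBot` agent name are hypothetical and are not the contents of any site in this report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file for illustration only.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""


def may_fetch(robots_txt, agent, url):
    """Return True if a polite crawler identifying as `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A compliant crawler runs this check itself and skips the URL on False.
print(may_fetch(SAMPLE_ROBOTS, "GPTBot", "https://example.com/reviews/"))
print(may_fetch(SAMPLE_ROBOTS, "ExampleBot", "https://example.com/reviews/"))
print(may_fetch(SAMPLE_ROBOTS, "ExampleBot", "https://example.com/checkout/cart"))
```

A crawler that never calls a check like `may_fetch` can still request the page, which is why a disallow states intent rather than enforcing it.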

Put AI-Access Data to Work

For a cigar e-commerce or DTC ops lead running a storefront like famous-smoke.com, AI shopping agents are a growing route to the buyer, and this snapshot is the baseline worth protecting: the category is open today, with only the flagship magazine gating. Set a recurring crawl that re-reads robots.txt for cigaraficionado.com, jrcigars.com, and neptunecigar.com weekly, and alert the moment a competing storefront adds an AI crawler token to its disallow list — a rival closing its catalog to AI is an opening to be the named recommendation instead.

A cigar retail catalog manager is the second fit: they can watch the same set to confirm their own listings stay readable as AI buying agents proliferate, and catch any accidental self-block on their domain. US Tech Automations runs these scheduled robots.txt crawls with change alerts so drift surfaces the week it happens, not at the next quarterly review. See how the agentic monitoring works.
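The core of such a drift check is a comparison between two reads of the same domains. The sketch below is an assumption-laden illustration, not the product's implementation. It expects each read to be summarized as a mapping from domain to the set of AI tokens disallowed, and the domains and the change shown are hypothetical.

```python
def new_ai_blocks(previous, current):
    """Map each domain to AI tokens disallowed now but not in the prior read."""
    changes = {}
    for domain, tokens_now in current.items():
        added = set(tokens_now) - set(previous.get(domain, ()))
        if added:
            changes[domain] = added
    return changes


# Hypothetical weekly reads; these are not real observations.
last_week = {
    "magazine.example": {"GPTBot"},
    "rival-shop.example": set(),
    "my-shop.example": set(),
}
this_week = {
    "magazine.example": {"GPTBot"},
    "rival-shop.example": {"GPTBot", "CCBot"},
    "my-shop.example": set(),
}

for domain, tokens in new_ai_blocks(last_week, this_week).items():
    print(f"ALERT {domain} now disallows: {', '.join(sorted(tokens))}")
```

Running the same comparison on a storefront's own domain would surface an accidental self-block the week it appears.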

Key Takeaways

  • Of the 7 Cigar sites with a parseable robots.txt, 1 blocks at least one AI crawler — a 14.3% rate.

  • The lone blocker is cigaraficionado.com, an editorial brand; the six online tobacconists all allow every crawler.

  • Three sites returned no parseable file and are excluded from the block-rate math.

  • Corpus-wide, 295 of 1053 sites (28%) gate at least one crawler, so cigar sits well below the line.

  • Common Crawl is the most-disallowed operator across all 1053 sites, with Anthropic and OpenAI close behind.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha d0b7ef205c390023).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Cigar Sites Block AI Crawlers? 1 of 7 Do.” https://ustechautomations.com/resources/blog/do-cigar-sites-block-ai-crawlers-2026

Sealed snapshot sha256: d0b7ef205c390023

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.