Do Fragrance Sites Block AI Crawlers? 2 of 7 Do
Fragrance is the most-gated of the four niches in this batch, and the reason is its two giant databases. The community archives that catalog millions of scents are the sites pulling up the drawbridge, while the boutique sellers leave the path open.
2 of 7 Fragrance sites block at least one AI crawler.
Of the Fragrance sites we checked, 7 returned a parseable robots.txt — the root-level file that tells automated agents which paths they may fetch — and 2 of those disallow an AI crawler. That is a 28.6% block rate. Every number here is read straight from the sealed file; nothing is estimated, modeled, or extrapolated.
The two blockers are fragrantica.com and basenotes.net, both large community fragrance databases. The five allowers are niche perfume retailers and review sites. Against the corpus, where 28% of sites with a policy gate at least one crawler, fragrance lands right around that line.
Who Gets Disallowed Here
Fragrance is unusual among this batch because its blockers are the heavyweights. fragrantica.com and basenotes.net are vast, user-built scent encyclopedias — structured catalogs of notes, houses, and reviews that took years of community effort to assemble. That kind of proprietary, hard-won database is exactly what a site has reason to keep out of AI training crawlers, and both of them do.
The five that allow everything are smaller, commerce-leaning operations: luckyscent.com, surrendertochance.com, perfumeposse.com, nstperfume.com, and theperfumedcourt.com — a mix of niche e-tailers, sample shops, and review blogs. None disallows an AI agent. For a boutique seller or a decant shop, being readable by an AI assistant keeps the products and writeups eligible to surface in an answer.
Three more fragrance domains (fragrancex.com, scentbird.com, and perfume.com) returned no parseable robots.txt at the seal, so they are silent, neither allow nor block. That two database sites drive the entire rate is the distinctive read here: this is a content-asset story, not a retail one. The contrast with how cigar sites handle AI crawlers is instructive: there a single magazine gates, while here two data archives do. The single-blocker billiards result, where one hub gates among eight open shops, offers a second point of comparison.
The counting rule is the same throughout. A block is an explicit Disallow aimed at a named AI agent — GPTBot, ClaudeBot, CCBot, and the other leaderboard tokens. fragrantica.com and basenotes.net each carry such a directive; the five smaller sites do not. A domain can disallow administrative or search paths without naming an AI agent, and that does not count as an AI block here. Only a directive that names one moves a site into the blocker column, which is why fragrance's count is a clean 2.
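The counting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the parser used for the snapshot: the function name is hypothetical, the token set is limited to the five bots named in this report (the full leaderboard list is not given here), and an empty Disallow is treated as no block.

```python
# Tokens named in this report; the full leaderboard list is an unknown here.
AI_TOKENS = {"gptbot", "claudebot", "ccbot", "bytespider", "meta-externalagent"}


def blocks_ai_crawler(robots_txt: str) -> bool:
    """Return True if a group naming an AI token has a non-empty Disallow."""
    agents = []          # user-agents of the group currently being read
    in_rules = False     # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                         # rules ended, new group starts
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            named_ai = any(agent in AI_TOKENS for agent in agents)
            if field == "disallow" and value and named_ai:
                return True
    return False
```

Under this sketch, a file that only disallows `/admin/` for `*` returns False, which matches the rule that administrative or search-path disallows do not count as an AI block.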
That precision is what makes the 28.6% defensible. We are reading specific lines that two specific databases chose to publish, not inferring stance from missing files or server behavior. The two blockers earned their place in the column by writing the directive, and the read is verbatim.
Both fragrance blockers — fragrantica.com and basenotes.net — are community scent databases.
Why Fragrance Lands on the Corpus Average
A 28.6% rate is the highest in this batch, and it sits almost exactly on the corpus average. The driver is asset type rather than category size. When the most valuable thing a site owns is a structured database, the math on AI access changes: feeding that catalog to training crawlers for free risks giving away the very thing that makes the site worth visiting.
That is why fragrance reads higher than its retail-heavy cousins. The boutique sellers behave like every other storefront — open, because discoverability pays. But the two database giants behave like publishers protecting an archive, and in a seven-file sample, two blockers is enough to lift the whole category to the corpus line.
So the honest interpretation is a category of two minds: commerce wants in, the databases want control. The percentage is simply the share of the seven files that falls in the second camp: two of seven.
The small sample sharpens this rather than weakening it. With only 7 files carrying a policy, the two database blockers carry an outsized share of the headline, so the 28.6% is really a story about two specific decisions at fragrantica.com and basenotes.net. That concentration is itself the finding: in fragrance, AI-access posture is not set by the long tail of small sellers but by a short list of large data owners. Track those two and you have tracked most of what moves the category's number, which is the opposite of a category where gating is spread thinly across many similar sites.
Fragrance sites post a 28.6% AI-crawler block rate.
Where Fragrance Sits in the Corpus
A 28.6% block rate places Fragrance right at the corpus average, mid-pack rather than at either extreme. The focused window below shows Fragrance beside its nearest neighbors in the ranking, verbatim from the sealed snapshot — name first, no rank column.
| Category | Sites | With robots.txt | Block ≥1 crawler | Block rate |
|---|---|---|---|---|
| Legal | 10 | 7 | 2 | 28.6% |
| RealEstate | 10 | 7 | 2 | 28.6% |
| Pets | 10 | 7 | 2 | 28.6% |
| Chess | 10 | 7 | 2 | 28.6% |
| Knitting | 9 | 7 | 2 | 28.6% |
| Fragrance | 10 | 7 | 2 | 28.6% |
| Crafts | 10 | 8 | 2 | 25% |
Fragrance shares its 28.6% reading with a broad mix — Legal, Real Estate, Pets, Chess, and Knitting all land on the same two-blocker mark, just above a 25% band. It is a crowded part of the ranking, which is itself a sign that fragrance is unremarkable against the average. The extremes table shows what the ends look like:
| Category | Sites | With robots.txt | Block ≥1 crawler | Block rate |
|---|---|---|---|---|
| Gaming | 9 | 9 | 8 | 88.9% |
| Food | 10 | 10 | 7 | 70% |
| Bowling | 10 | 9 | 0 | 0% |
| Kayaking | 10 | 4 | 0 | 0% |
Fragrance sits between these poles, far below Gaming and well above the zero-block floor that kayaking's open paddlesports policies define.
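The block-rate column in both tables is blockers divided by files with a parseable robots.txt, not by total sites checked. A short Python check of that arithmetic follows; the function name and row layout are illustrative, and the counts are copied from the tables above.

```python
from typing import Optional


def block_rate(blockers: int, with_robots: int) -> Optional[float]:
    """Percent of parseable robots.txt files that disallow an AI crawler."""
    if with_robots == 0:
        return None                  # no policies read, so no rate to report
    return round(100 * blockers / with_robots, 1)


# (sites checked, with robots.txt, block >= 1 crawler), copied from the tables
rows = {
    "Fragrance": (10, 7, 2),
    "Crafts": (10, 8, 2),
    "Gaming": (9, 9, 8),
    "Food": (10, 10, 7),
    "Kayaking": (10, 4, 0),
}

rates = {name: block_rate(b, w) for name, (_sites, w, b) in rows.items()}
```

Fragrance comes out at 28.6 because the denominator is 7, not 10; the three silent domains never enter the calculation.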
The Bots Fragrance Databases Gate Most
The two fragrance blockers add to a much larger corpus pattern, and knowing which bots get gated most tells a brand which token a database competitor reached for first. The cut below shows the most-disallowed bots across all 1053 sites, bot name first, count next.
| Bot | Sites disallowing (all 1053 sites) |
|---|---|
| CCBot | 221 |
| ClaudeBot | 197 |
| GPTBot | 197 |
| Bytespider | 190 |
| Meta-ExternalAgent | 168 |
CCBot, Common Crawl's agent, tops the corpus blocklist, with ClaudeBot and GPTBot tied right behind. The two fragrance databases that block are part of this broader pattern of gating AI training crawlers.
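A tally like the one in the table can be produced by counting, for each bot token, how many sites disallow it. The Python sketch below shows the idea; the function name is hypothetical and the input is made up for illustration, not taken from the snapshot.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def most_disallowed(site_blocks: Dict[str, Set[str]], top: int = 5) -> List[Tuple[str, int]]:
    """Count, per bot token, how many sites disallow it."""
    tally: Counter = Counter()
    for tokens in site_blocks.values():
        tally.update(tokens)         # a set, so each site counts once per token
    return tally.most_common(top)


# Made-up input for illustration only; not data from the snapshot.
toy = {
    "a.example": {"CCBot", "GPTBot"},
    "b.example": {"CCBot"},
    "c.example": set(),
}
ranking = most_disallowed(toy)
```

Because a site can disallow several bots, the per-bot counts in such a tally can sum to more than the number of blocking sites.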
Corpus-wide, 295 of 1053 sites block at least one AI crawler.
How the Fragrance Snapshot Was Sealed
These figures come from one point-in-time crawl of public robots.txt files, sealed June 14, 2026 under snapshot sha d0b7ef205c390023. For each Fragrance domain we fetched robots.txt at the root, parsed its user-agent and disallow directives, and recorded whether any AI crawler token was disallowed. We report verbatim counts; nothing is estimated, modeled, or extrapolated. Domains with no parseable file — fragrancex.com, scentbird.com, perfume.com — are logged as silent, neither allow nor block.
US Tech Automations runs this read across 1274 sites checked, 1053 with a parseable robots.txt, spanning 128 categories. Fragrance contributes 7 of those files, and we report its slice as exactly the 7 it is.
A note on what the snapshot deliberately does not do. It does not retry a slow host until a file appears, does not follow a redirect into a different domain's policy, and does not infer a block from a site that merely looks unfriendly to bots. Each fragrance domain is read once, at seal time, exactly as it answered.
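Those exclusions can be expressed as a single classification step applied to one response. The Python sketch below is an assumption-heavy illustration: the report does not define "parseable" precisely, so here it is taken to mean an HTTP 200 answer from the same host (with or without a `www.` prefix) that contains at least one user-agent line. The function name and labels are hypothetical.

```python
from urllib.parse import urlsplit


def classify_read(domain: str, final_url: str, status: int, body: str) -> str:
    """Return 'policy' if the single read yielded a usable file, else 'silent'."""
    host = (urlsplit(final_url).hostname or "").lower()
    # Assumption: a redirect to www.<domain> still counts as the same site.
    same_site = host in (domain, "www." + domain)
    # Assumption: a file is parseable if it has at least one user-agent line.
    has_group = any(
        line.split("#", 1)[0].strip().lower().startswith("user-agent:")
        for line in body.splitlines()
    )
    if status == 200 and same_site and has_group:
        return "policy"
    return "silent"          # read once; no retry, no inference
```

Note that nothing in the function retries or guesses: a timeout, an error page, or a cross-domain redirect all land in the same silent bucket.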
That single-read rule is what makes the result content-addressable: anyone holding sha d0b7ef205c390023 can re-derive the same seven files and the same two blockers. The cost is that a seller briefly offline at seal lands in the no-parseable-file bucket rather than the allow column — which is why fragrancex.com, scentbird.com, and perfume.com are logged as silent. The method favors reproducibility over a generous reading, and we would rather undercount an open site than guess one into the allow column.
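To make the content-addressable idea concrete, here is one way a seal over a set of reads could be computed in Python. The report does not describe how its sha is derived, so this scheme is purely hypothetical; the 16-character truncation only mirrors the length of the identifier printed above.

```python
import hashlib
from typing import Dict, Optional


def seal_snapshot(files: Dict[str, Optional[bytes]]) -> str:
    """Hash domains and their robots.txt bytes in a fixed order."""
    digest = hashlib.sha256()
    for domain in sorted(files):             # order-independent of input
        body = files[domain]
        digest.update(domain.encode("utf-8") + b"\0")
        if body is None:
            digest.update(b"silent\0")       # no parseable file at seal time
        else:
            digest.update(b"file\0" + hashlib.sha256(body).digest())
    return digest.hexdigest()[:16]           # truncated, hypothetical format
```

With a scheme like this, the same set of reads always yields the same identifier, and any change to a single file, or a silent domain becoming readable, yields a different one.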
Frequently Asked Questions
Q: Which two fragrance sites block AI crawlers?
A: fragrantica.com and basenotes.net, both large community scent databases. Of the 7 domains with a parseable robots.txt, they are the only two that disallow an AI crawler, which produces the 28.6% block rate. None of the five smaller retail and review sites disallows an AI crawler.
Q: Why do the database sites block while the boutique sellers do not?
A: Asset type. fragrantica.com and basenotes.net own structured, community-built catalogs that are the reason to visit, so they have cause to keep that data out of AI training crawlers. Sellers like luckyscent.com gain from discoverability, so they stay open.
Q: Does the 28.6% rate cover all the fragrance sites you found?
A: No. It covers the 7 sites that returned a parseable robots.txt. Three more — fragrancex.com, scentbird.com, and perfume.com — produced no parseable file at the seal, so they are excluded from the rate rather than counted as allows or blocks.
Q: Does blocking a crawler in robots.txt actually stop it?
A: Not by force. robots.txt is an honor-system standard: a cooperative crawler reads it and complies, but the file enforces nothing technically. fragrantica.com and basenotes.net signal that AI agents should stay out; each crawler decides whether to honor that request.
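The honor-system point is easiest to see from the crawler's side. The Python sketch below uses the standard library's `urllib.robotparser` to show the check a cooperative crawler runs before fetching. The rules are an illustrative example, not the contents of any site's actual file, and the helper name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only; not copied from any real site.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def may_fetch(agent: str, url: str) -> bool:
    """The voluntary check a well-behaved crawler makes before requesting a URL."""
    return parser.can_fetch(agent, url)


gptbot_allowed = may_fetch("GPTBot", "https://shop.example/perfume/1")
other_allowed = may_fetch("SomeOtherBot", "https://shop.example/perfume/1")
```

Nothing on the server enforces the answer: a crawler that never calls a check like this is not stopped by the file.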
Put AI-Access Data to Work
For a fragrance e-commerce or DTC growth lead running a storefront like luckyscent.com, AI shopping agents are an emerging discovery channel, and this snapshot is the baseline: the sellers are open while the two database giants gate. Set a recurring crawl that re-reads robots.txt for fragrantica.com, basenotes.net, and theperfumedcourt.com weekly, and alert the moment any retail competitor adds an AI crawler token to its disallow list — and watch the databases, since a shift there reshapes what AI assistants can say about scents at all.
A fragrance retail merchandising or RevOps manager is the second fit: they can monitor the same set to keep their own catalog readable as AI buying agents grow, and catch any accidental self-block. US Tech Automations runs these scheduled robots.txt crawls with change alerts so a policy shift surfaces the week it lands rather than at the next audit. See how the agentic monitoring works.
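The alerting step of such a recurring crawl reduces to comparing two reads. The Python sketch below assumes each read has already been reduced to a set of disallowed AI tokens per domain; the function name, message format, and example domains are hypothetical and do not describe any product's actual behavior.

```python
from typing import Dict, List, Set


def new_ai_blocks(previous: Dict[str, Set[str]], current: Dict[str, Set[str]]) -> List[str]:
    """List AI tokens that are disallowed now but were not in the prior read."""
    alerts = []
    for domain in sorted(current):
        added = current[domain] - previous.get(domain, set())
        for token in sorted(added):
            alerts.append(f"{domain}: now disallows {token}")
    return alerts


# Invented scenario for illustration; not an observed change.
last_week = {"shop.example": set(), "archive.example": {"GPTBot"}}
this_week = {"shop.example": {"GPTBot"}, "archive.example": {"GPTBot", "CCBot"}}
alerts = new_ai_blocks(last_week, this_week)
```

The same comparison run against a team's own domain would surface an accidental self-block.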
Corpus-wide, 280 of 1053 sites publish an llms.txt file.
Key Takeaways
Of the 7 Fragrance sites with a parseable robots.txt, 2 block at least one AI crawler — a 28.6% rate, the highest in this batch.
Both blockers, fragrantica.com and basenotes.net, are community scent databases; the five smaller sellers and review sites all allow every crawler.
Three sites returned no parseable file and are excluded from the block-rate math.
Corpus-wide, 295 of 1053 sites (28%) gate at least one crawler, so fragrance lands right on the line.
CCBot is the most-disallowed bot across all 1053 sites, with ClaudeBot and GPTBot tied just behind.
Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha d0b7ef205c390023).
Cite this report
US Tech Automations Research, 2026-06 edition. “Do Fragrance Sites Block AI Crawlers? 2 of 7 Do.” https://ustechautomations.com/resources/blog/do-fragrance-sites-block-ai-crawlers-2026
Sealed snapshot sha256: d0b7ef205c390023