Do Beauty Sites Block AI Crawlers? 2 of 6 Do

Jun 14, 2026

Among the 10 Beauty sites we checked for the June 2026 Closing Web edition, only 6 returned a parseable robots.txt file; the other four published no AI-access policy through this mechanism at all. Of those 6, exactly 2 block at least one AI crawler: allure.com and byrdie.com. That yields a 33.3% block rate for Beauty, a figure that lands slightly below the 39.3% corpus-wide average in this 418-site snapshot (sealed sha 27ca61d890a647db).

The most distinctive feature of Beauty is not the blockers but the no-robots group: sephora.com, makeupalley.com, beautylish.com, and intothegloss.com all returned no robots.txt in this snapshot. Four of ten sites (40%) publishing no policy at all is an unusually high share; corpus-wide, 64 of 418 sites (15.3%) lacked a parseable robots.txt. The four sites that do have a robots.txt and allow all AI crawlers (ulta.com, temptalia.com, dermstore.com, and fentybeauty.com) are wide open. The result is a category split three ways: blockers, open-with-policy, and silent.

2 of 6 Beauty sites block at least one AI crawler.

Beauty sites post a 33.3% AI-crawler block rate.

Corpus-wide, 139 of 354 sites with a parseable robots.txt block at least one AI crawler.

Key Takeaways

2 of 6 Beauty sites with a parseable robots.txt block at least one AI crawler.

allure.com and byrdie.com are the only Beauty blockers in this sealed snapshot.

The Beauty block rate of 33.3% is slightly below the 39.3% corpus-wide average.

4 of 10 Beauty sites returned no robots.txt at all — an unusually high share.

What This Block Rate Actually Means for Beauty

A robots.txt file is a publicly accessible plain-text document that instructs web crawlers which paths they are and are not permitted to access. It is a signal, a declaration of intent, not a technical lock. A disallow directive for a named AI bot like GPTBot or ClaudeBot tells that crawler to stay out of the indicated paths, but only if the crawler honors the protocol. Most major AI operators say they do.
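
As a rough illustration of how a compliant crawler reads such a directive, the sketch below feeds a made-up robots.txt to Python's standard-library parser. The file content, the example.com URL, and the helper name can_fetch are assumptions for illustration only; they are not taken from any site in the snapshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, written for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# GPTBot is named in a group that disallows the root path.
gptbot_allowed = can_fetch(SAMPLE_ROBOTS_TXT, "GPTBot", "https://example.com/reviews/")
# Any other crawler falls back to the wildcard group, which only restricts /admin/.
other_allowed = can_fetch(SAMPLE_ROBOTS_TXT, "SomeOtherBot", "https://example.com/reviews/")

print(gptbot_allowed)  # False
print(other_allowed)   # True
```

Nothing in this check stops a crawler that skips it, which is why the directive is a signal rather than a lock.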

allure.com and byrdie.com, both editorial beauty publications, have added AI-crawler directives to their robots.txt files. Editorial and media properties are among the categories most likely to block AI crawlers across the broader corpus: News, for instance, posts an 82.4% block rate (14 of its 17 sites with a parseable robots.txt in this edition). Both sites sit closer to the editorial side of the beauty category than the retail side, which may explain why they are the blockers while retail-oriented sites like ulta.com and dermstore.com are not.

The four sites with no robots.txt — sephora.com, makeupalley.com, beautylish.com, and intothegloss.com — are not asserting openness. They simply have not published a directive. By convention, crawlers may proceed in the absence of a robots.txt, but these sites carry no explicit endorsement of AI access either.

Of 10 Beauty sites checked, 6 returned a parseable robots.txt; 2 of those 6 block at least one AI crawler — a 33.3% block rate as of June 14, 2026.

Compare the editorial-vs-retail dynamic in Beauty to the Books category, where a single editorial outlet, bookriot.com, is the lone blocker in an otherwise open category.

The Beauty Category: Sites and Policies

The table below summarizes the AI-access status for each of the 10 Beauty sites checked in this snapshot.

Site	robots.txt Present	Blocks Any AI Crawler
allure.com	Yes	Yes
byrdie.com	Yes	Yes
ulta.com	Yes	No
temptalia.com	Yes	No
dermstore.com	Yes	No
fentybeauty.com	Yes	No
sephora.com	No	—
makeupalley.com	No	—
beautylish.com	No	—
intothegloss.com	No	—

ulta.com, temptalia.com, dermstore.com, and fentybeauty.com are the confirmed open sites — all with parseable robots.txt files that contain no disallow directives for known AI crawlers. The four no-robots sites are excluded from the block-rate denominator; the 33.3% figure applies strictly to the 6 sites with a parseable file.
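
The denominator rule described above can be checked with a few lines of arithmetic. This is a minimal sketch that transcribes the ten rows of the table; the variable and function names are illustrative and not part of the published dataset.

```python
# Rows transcribed from the table above: (site, has_robots_txt, blocks_any_ai_crawler).
# Sites without a robots.txt carry None in the last column (shown as a dash in the table).
BEAUTY_SITES = [
    ("allure.com", True, True),
    ("byrdie.com", True, True),
    ("ulta.com", True, False),
    ("temptalia.com", True, False),
    ("dermstore.com", True, False),
    ("fentybeauty.com", True, False),
    ("sephora.com", False, None),
    ("makeupalley.com", False, None),
    ("beautylish.com", False, None),
    ("intothegloss.com", False, None),
]


def block_rate(rows):
    """Return (blocker names, denominator, rate), counting only sites with a robots.txt."""
    with_policy = [row for row in rows if row[1]]
    blockers = [row[0] for row in with_policy if row[2]]
    return blockers, len(with_policy), len(blockers) / len(with_policy)


blockers, denominator, rate = block_rate(BEAUTY_SITES)
silent = [site for site, has_robots, _ in BEAUTY_SITES if not has_robots]

print(blockers, denominator, f"{rate:.1%}")  # ['allure.com', 'byrdie.com'] 6 33.3%
print(len(silent), "of", len(BEAUTY_SITES))  # 4 of 10
```

Had the four silent sites been counted in the denominator, the rate would read 20% instead, which is why the exclusion is worth stating explicitly.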

Where Beauty Lands Among All 40 Categories

The category-level view shows how Beauty's 33.3% block rate positions it within the full 40-category corpus.

Category	Sites Checked	With robots.txt	Any Blocker	Block Rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
Food	10	10	7	70%
Tech	15	13	9	69.2%
Entertainment	9	9	6	66.7%
Healthcare	10	9	6	66.7%
Music	10	9	6	66.7%
Parenting	10	8	5	62.5%
Outdoors	10	5	3	60%
Reference	14	11	6	54.5%
Science	10	10	5	50%
Wedding	10	8	4	50%
Automotive	10	9	4	44.4%
HomeGarden	10	9	4	44.4%
Fashion	9	7	3	42.9%
Social	10	10	4	40%
Sports	10	10	4	40%
Fitness	10	10	4	40%
Photography	10	10	4	40%
Genealogy	10	10	4	40%
Jobs	10	8	3	37.5%
Travel	9	9	3	33.3%
Weather	10	6	2	33.3%
Beauty	10	6	2	33.3%
Legal	10	7	2	28.6%
RealEstate	10	7	2	28.6%
Pets	10	7	2	28.6%
Crafts	10	8	2	25%
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
Education	9	7	1	14.3%
Government	9	8	1	12.5%
Crypto	9	8	1	12.5%
Books	9	8	1	12.5%
Religion	10	9	1	11.1%
Insurance	10	9	1	11.1%
Productivity	10	10	1	10%
Nonprofit	10	6	0	0%
Streaming	10	10	0	0%
Dating	10	5	0	0%

Beauty at 33.3% ties with Travel and Weather. Corpus-wide, 139 of 354 sites with parseable robots.txt files block at least one AI crawler, for a 39.3% average. Beauty lands just below that line — slightly more open than the overall corpus, but not dramatically so.
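
The corpus totals and the three-way tie can be recomputed directly from the category table. The sketch below transcribes the 40 rows and uses exact fractions so that tied rates compare equal; the names are illustrative only.

```python
from fractions import Fraction

# (category, sites_checked, with_robots_txt, any_blocker), transcribed from the table above.
CATEGORIES = [
    ("Gaming", 9, 9, 8), ("News", 20, 17, 14), ("Food", 10, 10, 7), ("Tech", 15, 13, 9),
    ("Entertainment", 9, 9, 6), ("Healthcare", 10, 9, 6), ("Music", 10, 9, 6),
    ("Parenting", 10, 8, 5), ("Outdoors", 10, 5, 3), ("Reference", 14, 11, 6),
    ("Science", 10, 10, 5), ("Wedding", 10, 8, 4), ("Automotive", 10, 9, 4),
    ("HomeGarden", 10, 9, 4), ("Fashion", 9, 7, 3), ("Social", 10, 10, 4),
    ("Sports", 10, 10, 4), ("Fitness", 10, 10, 4), ("Photography", 10, 10, 4),
    ("Genealogy", 10, 10, 4), ("Jobs", 10, 8, 3), ("Travel", 9, 9, 3),
    ("Weather", 10, 6, 2), ("Beauty", 10, 6, 2), ("Legal", 10, 7, 2),
    ("RealEstate", 10, 7, 2), ("Pets", 10, 7, 2), ("Crafts", 10, 8, 2),
    ("Finance", 12, 11, 2), ("Retail", 15, 12, 2), ("Education", 9, 7, 1),
    ("Government", 9, 8, 1), ("Crypto", 9, 8, 1), ("Books", 9, 8, 1),
    ("Religion", 10, 9, 1), ("Insurance", 10, 9, 1), ("Productivity", 10, 10, 1),
    ("Nonprofit", 10, 6, 0), ("Streaming", 10, 10, 0), ("Dating", 10, 5, 0),
]


def rate(row):
    """Block rate for one category as an exact fraction: blockers / sites with robots.txt."""
    _, _, with_robots, blockers = row
    return Fraction(blockers, with_robots)


total_checked = sum(row[1] for row in CATEGORIES)
total_with_robots = sum(row[2] for row in CATEGORIES)
total_blockers = sum(row[3] for row in CATEGORIES)
corpus_rate = Fraction(total_blockers, total_with_robots)

beauty = next(row for row in CATEGORIES if row[0] == "Beauty")
tied = [row[0] for row in CATEGORIES if rate(row) == rate(beauty)]
above = [row[0] for row in CATEGORIES if rate(row) > rate(beauty)]

print(total_checked, total_with_robots, total_blockers)  # 418 354 139
print(f"{float(corpus_rate):.1%}")                       # 39.3%
print(tied)                                              # ['Travel', 'Weather', 'Beauty']
print(len(above))                                        # 21
```

Note that the 39.3% figure is a pooled rate (all blockers over all sites with a robots.txt), not an average of the 40 category rates.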

Which Bots Are Most Blocked Across the Full Corpus

Because only 2 Beauty sites are blocking, the per-bot breakdown for Beauty alone would not be meaningful. The more informative view is how individual AI bots fare across all 354 sites with parseable robots.txt — the full corpus-wide picture.

Bot	Sites Blocking (all 354)	Block Rate (all 354)
CCBot	109	30.8%
ClaudeBot	96	27.1%
GPTBot	83	23.4%
Bytespider	83	23.4%
Meta-ExternalAgent	78	22%
Google-Extended	76	21.5%
Applebot-Extended	74	20.9%
PerplexityBot	73	20.6%
Amazonbot	64	18.1%

CCBot, Common Crawl's crawler, is the single most blocked bot across the 418-site corpus, facing restrictions at 109 sites, or 30.8% of the 354 sites with a parseable robots.txt. Anthropic's ClaudeBot is second at 96 blocks. GPTBot and Bytespider tie at 83 each. For the Beauty category specifically, the 2 blockers have each added at least one of these bot-level disallow directives; which ones precisely is a per-site detail the snapshot captures but this summary-level report does not itemize.
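
The per-bot rates in the leaderboard follow from dividing each count by the 354 sites with a parseable robots.txt. The sketch below reproduces them and also derives one figure the report does not state: the average number of bots blocked per blocking site. That derived average assumes, as the methodology describes, that the 139 "any blocker" sites are exactly the sites blocking at least one of these nine bots.

```python
SITES_WITH_ROBOTS = 354   # sites with a parseable robots.txt
ANY_BLOCKER_SITES = 139   # sites blocking at least one AI crawler

# Per-bot blocking counts, transcribed from the leaderboard above.
BOT_BLOCKS = {
    "CCBot": 109,
    "ClaudeBot": 96,
    "GPTBot": 83,
    "Bytespider": 83,
    "Meta-ExternalAgent": 78,
    "Google-Extended": 76,
    "Applebot-Extended": 74,
    "PerplexityBot": 73,
    "Amazonbot": 64,
}

# Block rate per bot, as a percentage rounded to one decimal place.
rates = {bot: round(100 * count / SITES_WITH_ROBOTS, 1) for bot, count in BOT_BLOCKS.items()}

# Each (site, bot) block is counted once per bot, so the counts sum to more than 139.
total_site_bot_blocks = sum(BOT_BLOCKS.values())
avg_bots_per_blocking_site = total_site_bot_blocks / ANY_BLOCKER_SITES

for bot, pct in rates.items():
    print(f"{bot}: {pct}%")
print(total_site_bot_blocks)                 # 736
print(round(avg_bots_per_blocking_site, 1))  # 5.3
```

Under that assumption, a typical blocking site names roughly five of the nine bots rather than just one.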

CCBot is blocked by 109 of 354 sites with parseable robots.txt in the June 2026 corpus — 30.8%.

Methodology: How This Snapshot Was Built

We fetched the robots.txt file at each of 418 sites on June 14, 2026, parsed each file for User-agent directives matching known AI crawler tokens, and sealed the result as sha 27ca61d890a647db. This data is cross-sectional: it captures one point in time, and nothing is estimated, modeled, or extrapolated. Counts are verbatim from the sealed snapshot.

Collect. Each site was fetched at its canonical robots.txt path. Responses were captured as received; no normalization of content was applied.
Parse. Files were parsed for User-agent blocks matching the 9 AI bot tokens in the leaderboard. A site is flagged as blocking if at least one recognized AI-crawler token appears with a Disallow directive covering at least the root path.
Seal. The complete dataset was content-addressed at sha 27ca61d890a647db, anchoring every figure in this report to that exact crawl state.
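
The Parse and Seal steps above can be sketched as follows. This is a simplified illustration, not the pipeline used for the report: the function names blocked_bots and seal are hypothetical, the flagging rule is read narrowly as a literal "Disallow: /" inside a group naming the bot, and the report does not say how its sha was derived, so SHA-256 over canonical JSON is an assumption. Fetching is left out so the example runs offline.

```python
import hashlib
import json

# The nine AI bot tokens listed in the leaderboard.
AI_BOT_TOKENS = [
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
    "Google-Extended", "Applebot-Extended", "PerplexityBot", "Amazonbot",
]


def blocked_bots(robots_txt: str) -> set:
    """Return the AI bot tokens whose User-agent group contains 'Disallow: /'."""
    blocked = set()
    agents = []       # lowercased user-agents of the group being read
    in_rules = False  # True once the current group has seen an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(t for t in AI_BOT_TOKENS if t.lower() in agents)
    return blocked


def seal(results: dict) -> str:
    """Content-address a results mapping (site -> sorted bot list) with SHA-256."""
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical robots.txt text, for illustration only.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Disallow:
"""

sample_blocked = blocked_bots(SAMPLE)
digest = seal({"site.example": sorted(sample_blocked)})
print(sorted(sample_blocked))  # ['CCBot', 'GPTBot']
print(len(digest))             # 64
```

Because the digest is computed over a canonical serialization, any change to a single flag produces a different hash, which is what lets a sealed value anchor the published counts.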

A missing robots.txt is neither a block nor an explicit allow — it simply means no public policy was published through this mechanism on the date of the snapshot.

Frequently Asked Questions

Q: How can sephora.com — one of the largest beauty retailers — have no robots.txt?

A: A missing robots.txt is more common than it might seem, even on large sites. Sephora may manage crawler access through other technical means (meta robots tags, server-level headers, CDN rules) or may simply not have prioritized robots.txt maintenance. The absence does not mean Sephora has no crawler policy — it means that policy, if any, is not expressed through a public robots.txt file that our snapshot could read.

Q: Why do editorial beauty sites block AI crawlers while retail beauty sites do not?

A: That pattern is visible in the data: allure.com and byrdie.com are editorial publishers, while ulta.com, dermstore.com, and fentybeauty.com are retail-oriented. Editorial publishers are generally more concerned with reproduction rights and AI summarization of original journalism or reviews, while retail sites may see AI crawling as neutral or beneficial for product-discovery pipelines. The data shows the outcome, not the reasoning.

Q: Is 33.3% a high or low block rate for a consumer content category?

A: Relative to the 39.3% corpus-wide average, 33.3% is slightly below average — modestly more open than the typical site in this edition. Among consumer lifestyle categories, Beauty is comparable to Travel and Weather, both also at 33.3%.

Q: What is the difference between a site blocking one AI crawler versus all of them?

A: The sealed snapshot flags any site that disallows at least one recognized AI-crawler token. A site that blocks only CCBot is treated the same as a site that blocks all 9 bots in terms of the binary "any blocker" flag. The per-bot breakdown in the leaderboard shows how many sites have blocked each individual bot across the full corpus.
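
To make the distinction concrete, the sketch below derives both views from the same per-site data. The three sites and their blocked-bot sets are invented for illustration and do not come from the snapshot.

```python
from collections import Counter

ALL_NINE = {
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
    "Google-Extended", "Applebot-Extended", "PerplexityBot", "Amazonbot",
}

# Hypothetical per-site results, for illustration only.
SITE_BLOCKS = {
    "site-a.example": {"CCBot"},       # blocks a single bot
    "site-b.example": set(ALL_NINE),   # blocks all nine
    "site-c.example": set(),           # has a robots.txt but blocks none
}

# Binary view: one flag per site, regardless of how many bots are blocked.
any_blocker = {site: bool(bots) for site, bots in SITE_BLOCKS.items()}

# Per-bot view: how many sites block each individual bot.
per_bot = Counter(bot for bots in SITE_BLOCKS.values() for bot in bots)

print(any_blocker)
print(per_bot["CCBot"], per_bot["GPTBot"])  # 2 1
```

Sites A and B count identically toward the category block rate, while only the per-bot tally reveals how different their policies are.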

Put AI-Access Data to Work

Three recurring workflows benefit directly from tracking Beauty category AI-access policy over time.

Content sourcing teams at AI companies that use beauty editorial and retail content for product recommendations, summarization, or fine-tuning need to monitor whether the two current blockers — allure.com and byrdie.com — change their posture, and whether any of the four silent sites eventually publish a blocking robots.txt. A weekly re-crawl of these 10 sites, with alerts triggered by any new Disallow directive, gives content teams advance notice. US Tech Automations builds and schedules exactly this kind of automated monitoring — robots.txt change detection, policy diffing, and routed alerts to the appropriate pipeline owner.

Brand and SEO teams at beauty retailers track how AI crawlers access competitor and editorial content. If allure.com or byrdie.com expands its block list to include bots currently allowed, that shifts the content-sourcing landscape for any AI-powered product recommendation engine that relies on those reviews. A quarterly audit comparing current robots.txt files to the sealed June 14, 2026 baseline gives a defensible, time-stamped record of any drift.
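
A baseline comparison of this kind can be sketched as a set difference over parsed rules. The example below is a minimal illustration under stated assumptions: parse_rules and policy_drift are hypothetical helper names, the two robots.txt texts are invented, and a real audit would also need to fetch and archive the files, which is omitted here.

```python
def parse_rules(robots_txt: str) -> set:
    """Return the set of (user_agent, directive, path) rules found in a robots.txt text."""
    rules = set()
    agents = []       # user-agents of the group being read
    in_rules = False  # True once the current group has seen an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                rules.add((agent, field, value))
    return rules


def policy_drift(baseline: str, current: str) -> dict:
    """Compare two robots.txt texts and list the rules added and removed."""
    old, new = parse_rules(baseline), parse_rules(current)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Hypothetical baseline and re-crawl texts, for illustration only.
BASELINE = """\
User-agent: GPTBot
Disallow: /
"""
CURRENT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

drift = policy_drift(BASELINE, CURRENT)
print(drift["added"])    # [('ClaudeBot', 'disallow', '/')]
print(drift["removed"])  # []
```

Comparing parsed rules rather than raw text keeps comment edits and whitespace changes from triggering alerts.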

Digital rights and AI-governance advisors working with beauty publishers need an evidence base for conversations about AI training data. The sealed snapshot — sha 27ca61d890a647db — is the kind of verifiable, independently reproducible artifact that supports legal or licensing discussions about what content was or was not accessible to AI crawlers at a specific date.

US Tech Automations automates this monitoring — scheduled crawls, change alerts, and policy dashboards for teams that need to stay current on AI-access drift without doing it by hand.

For comparison, see how Insurance sites approach AI-access policy (a consumer-financial category that is even more permissive, at an 11.1% block rate) and the Wedding category report for another mid-range consumer vertical.

Curious how Beauty sites compare across every vertical? Our flagship study tracks how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 27ca61d890a647db).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Beauty Sites Block AI Crawlers? 2 of 6 Do.” https://ustechautomations.com/resources/blog/do-beauty-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 27ca61d890a647db