Do Beauty Sites Block AI Crawlers? 2 of 6 Do

Jun 14, 2026

Among the 10 Beauty sites we checked for the June 2026 Closing Web edition, only 6 returned a parseable robots.txt file; the other four published no AI-access policy through this mechanism at all. Of those 6, exactly 2 block at least one AI crawler: allure.com and byrdie.com. That yields a 33.3% block rate for Beauty, a figure that lands slightly below the 39.3% corpus-wide rate in this 418-site snapshot (sealed sha 27ca61d890a647db).

What makes Beauty distinctive is not the blockers but the no-robots group: sephora.com, makeupalley.com, beautylish.com, and intothegloss.com all returned no robots.txt in this snapshot. Four of ten sites (40%) publishing no policy at all is an unusually high share: corpus-wide, 64 of 418 sites (15.3%) lack a parseable robots.txt. The four sites that do have a robots.txt and block none of the tracked AI crawlers (ulta.com, temptalia.com, dermstore.com, and fentybeauty.com) are open with a published policy. The result is a category split three ways: blockers, open-with-policy, and silent.

2 of 6 Beauty sites block at least one AI crawler.

Beauty sites post a 33.3% AI-crawler block rate.

Corpus-wide, 139 of 354 sites block at least one AI crawler.

Key Takeaways

2 of 6 Beauty sites with a parseable robots.txt block at least one AI crawler.

allure.com and byrdie.com are the only Beauty blockers in this sealed snapshot.

The Beauty block rate of 33.3% is slightly below the 39.3% corpus-wide average.

4 of 10 Beauty sites returned no robots.txt at all — an unusually high share.

What This Block Rate Actually Means for Beauty

A robots.txt file is a publicly accessible plain-text document that instructs web crawlers which paths they are and are not permitted to access. It is a signal — a declaration of intent — not a technical lock. A disallow directive for a named AI bot like GPTBot or ClaudeBot tells that crawler to stay out of indicated paths, but only if the crawler honors the protocol. Most major AI operators do.
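As a rough illustration of how a protocol-honoring crawler would read such a directive, the sketch below uses Python's standard-library urllib.robotparser. The robots.txt content and the example.com URLs are invented for this example and are not taken from any site in the snapshot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler may fetch the URL under this policy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# GPTBot is named explicitly and disallowed everywhere.
gptbot_ok = is_allowed(EXAMPLE_ROBOTS_TXT, "GPTBot", "https://example.com/reviews/")
# ClaudeBot is not named, so it falls back to the wildcard group.
claudebot_ok = is_allowed(EXAMPLE_ROBOTS_TXT, "ClaudeBot", "https://example.com/reviews/")
claudebot_checkout_ok = is_allowed(
    EXAMPLE_ROBOTS_TXT, "ClaudeBot", "https://example.com/checkout/cart"
)
```

The check only models what a compliant crawler would conclude. Nothing in the file enforces the result, which is why the directive is a signal rather than a lock.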

allure.com and byrdie.com, both editorial beauty publications, have added AI-crawler directives to their robots.txt files. Editorial and media properties are among the categories most likely to block AI crawlers across the broader corpus: News, for instance, posts an 82.4% block rate (14 of its 17 sites with a parseable robots.txt), against 39.3% across all 354 such sites in this edition. allure.com and byrdie.com sit closer to the editorial side of the beauty category than the retail side, which may explain why they are the blockers while retail-oriented sites like ulta.com and dermstore.com are not.

The four sites with no robots.txt — sephora.com, makeupalley.com, beautylish.com, and intothegloss.com — are not asserting openness. They simply have not published a directive. By convention, crawlers may proceed in the absence of a robots.txt, but these sites carry no explicit endorsement of AI access either.

Of 10 Beauty sites checked, 6 returned a parseable robots.txt; 2 of those 6 block at least one AI crawler — a 33.3% block rate as of June 14, 2026.

Compare the editorial-vs-retail dynamic in Beauty to the approach taken in Book sites, where a single editorial outlet, bookriot.com, is the lone blocker in an otherwise open category.

The Beauty Category: Sites and Policies

The table below summarizes the AI-access status for each of the 10 Beauty sites checked in this snapshot.

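| Site | robots.txt Present | Blocks Any AI Crawler |
|---|---|---|
| allure.com | Yes | Yes |
| byrdie.com | Yes | Yes |
| ulta.com | Yes | No |
| temptalia.com | Yes | No |
| dermstore.com | Yes | No |
| fentybeauty.com | Yes | No |
| sephora.com | No | n/a |
| makeupalley.com | No | n/a |
| beautylish.com | No | n/a |
| intothegloss.com | No | n/a |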

ulta.com, temptalia.com, dermstore.com, and fentybeauty.com are the confirmed open sites — all with parseable robots.txt files that contain no disallow directives for known AI crawlers. The four no-robots sites are excluded from the block-rate denominator; the 33.3% figure applies strictly to the 6 sites with a parseable file.
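The denominator rule can be made concrete with a short sketch. The site list and flags below are copied from the table above; the function name and tuple layout are our own choices for illustration, not part of the snapshot format.

```python
from typing import Optional

# (site, has_parseable_robots_txt, blocks_any_ai_crawler)
# The third field is None when no robots.txt was returned.
BEAUTY_SITES: list[tuple[str, bool, Optional[bool]]] = [
    ("allure.com", True, True),
    ("byrdie.com", True, True),
    ("ulta.com", True, False),
    ("temptalia.com", True, False),
    ("dermstore.com", True, False),
    ("fentybeauty.com", True, False),
    ("sephora.com", False, None),
    ("makeupalley.com", False, None),
    ("beautylish.com", False, None),
    ("intothegloss.com", False, None),
]


def block_rate(sites):
    """Return (blockers, sites_with_file, rate_percent), ignoring no-robots sites."""
    with_file = [s for s in sites if s[1]]
    blockers = [s for s in with_file if s[2]]
    if not with_file:
        return 0, 0, None  # rate is undefined when no site published a file
    rate = round(100 * len(blockers) / len(with_file), 1)
    return len(blockers), len(with_file), rate


beauty_blockers, beauty_with_file, beauty_rate = block_rate(BEAUTY_SITES)
```

Dividing by all 10 sites instead would give 20%, which is why the report states the denominator alongside the rate.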

Where Beauty Lands Among All 40 Categories

The category-level view shows how Beauty's 33.3% block rate positions it within the full 40-category corpus.

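| Category | Sites Checked | With robots.txt | Any Blocker | Block Rate |
|---|---|---|---|---|
| Gaming | 9 | 9 | 8 | 88.9% |
| News | 20 | 17 | 14 | 82.4% |
| Food | 10 | 10 | 7 | 70.0% |
| Tech | 15 | 13 | 9 | 69.2% |
| Entertainment | 9 | 9 | 6 | 66.7% |
| Healthcare | 10 | 9 | 6 | 66.7% |
| Music | 10 | 9 | 6 | 66.7% |
| Parenting | 10 | 8 | 5 | 62.5% |
| Outdoors | 10 | 5 | 3 | 60.0% |
| Reference | 14 | 11 | 6 | 54.5% |
| Science | 10 | 10 | 5 | 50.0% |
| Wedding | 10 | 8 | 4 | 50.0% |
| Automotive | 10 | 9 | 4 | 44.4% |
| HomeGarden | 10 | 9 | 4 | 44.4% |
| Fashion | 9 | 7 | 3 | 42.9% |
| Social | 10 | 10 | 4 | 40.0% |
| Sports | 10 | 10 | 4 | 40.0% |
| Fitness | 10 | 10 | 4 | 40.0% |
| Photography | 10 | 10 | 4 | 40.0% |
| Genealogy | 10 | 10 | 4 | 40.0% |
| Jobs | 10 | 8 | 3 | 37.5% |
| Travel | 9 | 9 | 3 | 33.3% |
| Weather | 10 | 6 | 2 | 33.3% |
| Beauty | 10 | 6 | 2 | 33.3% |
| Legal | 10 | 7 | 2 | 28.6% |
| RealEstate | 10 | 7 | 2 | 28.6% |
| Pets | 10 | 7 | 2 | 28.6% |
| Crafts | 10 | 8 | 2 | 25.0% |
| Finance | 12 | 11 | 2 | 18.2% |
| Retail | 15 | 12 | 2 | 16.7% |
| Education | 9 | 7 | 1 | 14.3% |
| Government | 9 | 8 | 1 | 12.5% |
| Crypto | 9 | 8 | 1 | 12.5% |
| Books | 9 | 8 | 1 | 12.5% |
| Religion | 10 | 9 | 1 | 11.1% |
| Insurance | 10 | 9 | 1 | 11.1% |
| Productivity | 10 | 10 | 1 | 10.0% |
| Nonprofit | 10 | 6 | 0 | 0.0% |
| Streaming | 10 | 10 | 0 | 0.0% |
| Dating | 10 | 5 | 0 | 0.0% |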

Beauty at 33.3% ties with Travel and Weather. Corpus-wide, 139 of 354 sites with parseable robots.txt files block at least one AI crawler, for a 39.3% average. Beauty lands just below that line — slightly more open than the overall corpus, but not dramatically so.
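The corpus totals and Beauty's position can be re-derived from the category rows. The sketch below copies the four numeric columns from the category table; the variable names and the whitespace-separated layout are our own and are only one way to hold the data.

```python
# Rows from the category table: name, sites checked, with robots.txt, any blocker.
CATEGORY_ROWS = """
Gaming 9 9 8
News 20 17 14
Food 10 10 7
Tech 15 13 9
Entertainment 9 9 6
Healthcare 10 9 6
Music 10 9 6
Parenting 10 8 5
Outdoors 10 5 3
Reference 14 11 6
Science 10 10 5
Wedding 10 8 4
Automotive 10 9 4
HomeGarden 10 9 4
Fashion 9 7 3
Social 10 10 4
Sports 10 10 4
Fitness 10 10 4
Photography 10 10 4
Genealogy 10 10 4
Jobs 10 8 3
Travel 9 9 3
Weather 10 6 2
Beauty 10 6 2
Legal 10 7 2
RealEstate 10 7 2
Pets 10 7 2
Crafts 10 8 2
Finance 12 11 2
Retail 15 12 2
Education 9 7 1
Government 9 8 1
Crypto 9 8 1
Books 9 8 1
Religion 10 9 1
Insurance 10 9 1
Productivity 10 10 1
Nonprofit 10 6 0
Streaming 10 10 0
Dating 10 5 0
"""


def parse_rows(text):
    """Turn the whitespace-separated rows into (name, checked, with_file, blockers)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, checked, with_file, blockers = line.split()
        rows.append((name, int(checked), int(with_file), int(blockers)))
    return rows


rows = parse_rows(CATEGORY_ROWS)
total_checked = sum(r[1] for r in rows)
total_with_file = sum(r[2] for r in rows)
total_blockers = sum(r[3] for r in rows)
# Pooled rate: all blockers over all sites with a parseable file.
pooled_rate = round(100 * total_blockers / total_with_file, 1)

beauty = next(r for r in rows if r[0] == "Beauty")
# Compare rates by cross-multiplying integers to avoid float ties going wrong.
categories_above = sum(1 for r in rows if r[3] * beauty[2] > beauty[3] * r[2])
categories_tied = sum(
    1 for r in rows if r[0] != "Beauty" and r[3] * beauty[2] == beauty[3] * r[2]
)
```

Note that the 39.3% figure is a pooled rate over all 354 sites, not an average of the 40 category rates, so large categories such as News and Tech weigh more heavily in it.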

Which Bots Are Most Blocked Across the Full Corpus

Because only 2 Beauty sites are blocking, the per-bot breakdown for Beauty alone would not be meaningful. The more informative view is how individual AI bots fare across all 354 sites with parseable robots.txt — the full corpus-wide picture.

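| Bot | Sites Blocking (of 354) | Block Rate (of 354) |
|---|---|---|
| CCBot | 109 | 30.8% |
| ClaudeBot | 96 | 27.1% |
| GPTBot | 83 | 23.4% |
| Bytespider | 83 | 23.4% |
| Meta-ExternalAgent | 78 | 22.0% |
| Google-Extended | 76 | 21.5% |
| Applebot-Extended | 74 | 20.9% |
| PerplexityBot | 73 | 20.6% |
| Amazonbot | 64 | 18.1% |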

CCBot — Common Crawl's indexer — is the single most blocked bot across the 418-site corpus, facing restrictions at 109 sites, or 30.8% of all sites with a parseable robots.txt. Anthropic's ClaudeBot is second at 96 blocks. GPTBot and Bytespider tie at 83 each. For the Beauty category specifically, the 2 blockers have each added at least one of these bot-level disallow directives — which ones precisely is a per-site detail the snapshot captures but this summary-level report does not itemize.
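The per-bot rates follow directly from the counts, and the counts also hint at how many bots a typical blocker names. The sketch below is a back-of-envelope calculation: it assumes every site counted in the per-bot column is also one of the 139 "any blocker" sites, which matches the flag definition in this report but is not itemized in the summary.

```python
SITES_WITH_ROBOTS = 354   # sites with a parseable robots.txt
ANY_BLOCKER_SITES = 139   # sites blocking at least one tracked AI crawler

# Counts copied from the per-bot leaderboard.
BOT_BLOCK_COUNTS = {
    "CCBot": 109,
    "ClaudeBot": 96,
    "GPTBot": 83,
    "Bytespider": 83,
    "Meta-ExternalAgent": 78,
    "Google-Extended": 76,
    "Applebot-Extended": 74,
    "PerplexityBot": 73,
    "Amazonbot": 64,
}

# Block rate per bot, as a percentage of sites with a parseable file.
bot_rates = {
    bot: round(100 * count / SITES_WITH_ROBOTS, 1)
    for bot, count in BOT_BLOCK_COUNTS.items()
}

# Total (site, bot) block pairs, and the mean number of tracked bots
# blocked per blocking site under the assumption stated above.
total_bot_blocks = sum(BOT_BLOCK_COUNTS.values())
avg_bots_per_blocker = round(total_bot_blocks / ANY_BLOCKER_SITES, 1)
```

Under that assumption the average blocker names roughly five of the nine tracked bots, which suggests that sites that block tend to block several crawlers rather than a single one.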

CCBot is blocked by 109 of 354 sites with parseable robots.txt in the June 2026 corpus — 30.8%.

Methodology: How This Snapshot Was Built

We fetched the robots.txt file at each of 418 sites on June 14, 2026, parsed each file for User-agent directives matching known AI crawler tokens, and sealed the result as sha 27ca61d890a647db. This data is cross-sectional: it reflects one point in time, and nothing is estimated, modeled, or extrapolated. Counts are verbatim from the sealed snapshot.

  1. Collect. Each site was fetched at its canonical robots.txt path. Responses were captured as received; no normalization of content was applied.

  2. Parse. Files were parsed for User-agent blocks matching the 9 AI bot tokens in the leaderboard. A site is flagged as blocking if at least one recognized AI-crawler token appears with a Disallow directive covering at least the root path.

  3. Seal. The complete dataset was content-addressed at sha 27ca61d890a647db, anchoring every figure in this report to that exact crawl state.

A missing robots.txt is neither a block nor an explicit allow — it simply means no public policy was published through this mechanism on the date of the snapshot.
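The parse and seal steps can be sketched roughly as follows. This is a simplified illustration, not the pipeline that produced the snapshot: the parser ignores Allow overrides and path wildcards, it does not count a wildcard "User-agent: *" group as an AI-crawler block, and the JSON canonicalization used for hashing is our own assumption. The function names are hypothetical.

```python
import hashlib
import json

# The 9 AI bot tokens listed in the leaderboard.
AI_BOT_TOKENS = (
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
    "Google-Extended", "Applebot-Extended", "PerplexityBot", "Amazonbot",
)


def blocked_bots(robots_txt: str) -> set[str]:
    """Return tracked bot tokens whose User-agent group contains 'Disallow: /'."""
    known = {token.lower(): token for token in AI_BOT_TOKENS}
    blocked: set[str] = set()
    group_agents: list[str] = []
    in_rules = False  # True once the current group has seen an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(known[a] for a in group_agents if a in known)
    return blocked


def seal(records: dict) -> str:
    """Hash a canonical JSON form of the records (assumed scheme, full SHA-256)."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical file content, invented for this example.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Disallow: /
"""
sample_blocked = blocked_bots(SAMPLE)
sample_digest = seal({"site.example": sorted(sample_blocked)})
```

A site would carry the binary "any blocker" flag whenever the returned set is non-empty, and a missing file would be recorded separately rather than as an empty set.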

Frequently Asked Questions

Q: How can sephora.com — one of the largest beauty retailers — have no robots.txt?

A: A missing robots.txt is more common than it might seem, even on large sites. Sephora may manage crawler access through other technical means (meta robots tags, server-level headers, CDN rules) or may simply not have prioritized robots.txt maintenance. The absence does not mean Sephora has no crawler policy — it means that policy, if any, is not expressed through a public robots.txt file that our snapshot could read.

Q: Why do editorial beauty sites block AI crawlers while retail beauty sites do not?

A: That pattern is visible in the data: allure.com and byrdie.com are editorial publishers, while ulta.com, dermstore.com, and fentybeauty.com are retail-oriented. Editorial publishers are generally more concerned with reproduction rights and AI summarization of original journalism or reviews, while retail sites may see AI crawling as neutral or beneficial for product discovery. The data shows the outcome, not the reasoning.

Q: Is 33.3% a high or low block rate for a consumer content category?

A: Relative to the 39.3% corpus-wide average, 33.3% is slightly below average — modestly more open than the typical site in this edition. Among consumer lifestyle categories, Beauty is comparable to Travel and Weather, both also at 33.3%.

Q: What is the difference between a site blocking one AI crawler versus all of them?

A: The sealed snapshot flags any site that disallows at least one recognized AI-crawler token. A site that blocks only CCBot is treated the same as a site that blocks all 9 bots in terms of the binary "any blocker" flag. The per-bot breakdown in the leaderboard shows how many sites have blocked each individual bot across the full corpus.

Put AI-Access Data to Work

Three recurring workflows benefit directly from tracking Beauty category AI-access policy over time.

Content sourcing teams at AI companies that use beauty editorial and retail content for product recommendations, summarization, or fine-tuning need to monitor whether the two current blockers — allure.com and byrdie.com — change their posture, and whether any of the four silent sites eventually publish a blocking robots.txt. A weekly re-crawl of these 10 sites, with alerts triggered by any new Disallow directive, gives content teams advance notice. US Tech Automations builds and schedules exactly this kind of automated monitoring — robots.txt change detection, policy diffing, and routed alerts to the appropriate pipeline owner.

Brand and SEO teams at beauty retailers track how AI crawlers access competitor and editorial content. If allure.com or byrdie.com expands its block list to include bots currently allowed, that shifts the content-sourcing landscape for any AI-powered product recommendation engine that relies on those reviews. A quarterly audit comparing current robots.txt files to the sealed June 14, 2026 baseline gives a defensible, time-stamped record of any drift.
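A baseline comparison of this kind can be sketched as a diff between two policy maps. The site names and bot sets below are placeholders invented for illustration, since this report does not itemize which bots each Beauty site blocks; the function name and change labels are likewise hypothetical.

```python
from typing import Optional

# None means no robots.txt was returned; a frozenset holds the blocked bot tokens.
Policy = Optional[frozenset]


def diff_policies(baseline: dict, current: dict) -> list:
    """Return (site, change_kind, bots) tuples for every site whose policy drifted."""
    changes = []
    for site in sorted(set(baseline) | set(current)):
        old, new = baseline.get(site), current.get(site)
        if old == new:
            continue
        if new is None:
            changes.append((site, "file_removed", []))
            continue
        if old is None:
            changes.append((site, "file_published", sorted(new)))
            continue
        added, removed = sorted(new - old), sorted(old - new)
        if added:
            changes.append((site, "newly_blocked", added))
        if removed:
            changes.append((site, "newly_allowed", removed))
    return changes


# Placeholder data, not taken from the snapshot.
baseline = {
    "editorial.example": frozenset({"GPTBot"}),
    "silent.example": None,
    "retail.example": frozenset(),
}
current = {
    "editorial.example": frozenset({"GPTBot", "CCBot"}),
    "silent.example": frozenset({"ClaudeBot"}),
    "retail.example": frozenset(),
}
drift = diff_policies(baseline, current)
```

Keeping "no file" distinct from "file with no blocks" matters here, because a silent site publishing its first robots.txt is a different event from an open site adding a Disallow.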

Digital rights and AI-governance advisors working with beauty publishers need an evidence base for conversations about AI training data. The sealed snapshot — sha 27ca61d890a647db — is the kind of verifiable, independently reproducible artifact that supports legal or licensing discussions about what content was or was not accessible to AI crawlers at a specific date.

US Tech Automations automates this monitoring — scheduled crawls, change alerts, and policy dashboards for teams that need to stay current on AI-access drift without doing it by hand.

For comparison, see how Insurance sites approach AI-access policy (a consumer-financial category that is even more permissive, with an 11.1% block rate) and the Wedding category report for another consumer vertical with a higher block rate (50%).

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 27ca61d890a647db).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Beauty Sites Block AI Crawlers? 2 of 6 Do.” https://ustechautomations.com/resources/blog/do-beauty-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 27ca61d890a647db

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.