Do Fashion Sites Block AI Crawlers? Sealed robots.txt Data
Fashion's relationship with AI access is fragmented, and the split follows a clear fault line. Every Fashion site that blocks AI crawlers in our snapshot is an editorial or trade publication. None of the commerce-first sites blocks them.
3 of 7 Fashion sites with a parseable robots.txt block at least one AI crawler in our June 2026 sealed snapshot. That is a block rate of 42.9%, slightly below the corpus-wide rate of 46.6%. The narrow gap is not the main story. The main story is that two of the nine Fashion sites we checked returned no parseable robots.txt at all, and what that suggests about how fashion e-commerce platforms have approached the AI-access question.
A robots.txt file is a plain-text file of crawl instructions that a website posts at its root URL. It tells automated bots what they may crawl under an honor-system protocol; it is not a technical access barrier. This report presents verbatim counts from a sealed snapshot of public robots.txt files collected on June 14, 2026, across 260 sites in 24 categories. The snapshot is content-addressed with sha 834f1e2f07af24fd. Nothing is estimated, modeled, or extrapolated: every figure is a direct read from the sealed file.
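A content address is a cryptographic digest of the sealed data. Any later change to the data therefore produces a different address. The sketch below is minimal and rests on assumptions. It assumes the dataset is serialized as canonical JSON (sorted keys, compact separators). It also assumes the published 16-character value is a prefix of a SHA-256 hex digest, since a full SHA-256 hex digest is 64 characters. The report does not describe its serialization, so the `content_address` helper and the sample records are hypothetical.

```python
import hashlib
import json


def content_address(records: list[dict], prefix_len: int = 16) -> str:
    """Return a short content address for a list of per-site records.

    Assumptions (hypothetical, not the report's documented method):
    records are serialized as canonical JSON, and the address is the
    first `prefix_len` hex characters of the SHA-256 digest.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


# Hypothetical records, for illustration only.
snapshot = [
    {"domain": "vogue.com", "category": "Fashion", "has_robots": True, "blocks_ai": True},
    {"domain": "asos.com", "category": "Fashion", "has_robots": False, "blocks_ai": None},
]
address = content_address(snapshot)
print(address)
```

The serialization is canonical, so re-running the helper on unchanged records reproduces the same address. Flipping any single field changes it. That property is what makes a sealed snapshot verifiable.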
The Two Missing Files: net-a-porter.com and asos.com
The most distinctive feature of the Fashion category in this snapshot is that 2 of the 9 sites checked, net-a-porter.com and asos.com, returned no parseable robots.txt file. Both are among the most prominent fashion e-commerce destinations. Both are also large, technically sophisticated operations whose engineering teams are very likely aware of robots.txt, which makes the gap notable.
Under the honor-system default, a robots.txt that is simply missing leaves all paths open to all crawlers. That differs from a site that publishes a robots.txt with no AI disallows, which is an explicit open statement. net-a-porter.com and asos.com are silent instead. One caveat applies. The snapshot records both missing files and server errors as "no robots.txt", and under RFC 9309 a server error tells compliant crawlers to treat the whole site as disallowed, not open. The snapshot alone cannot tell us what the silence reflects: a deliberate policy choice, a maintenance gap, a fetch error, or a pending update. What it does show is that neither site presented the first-layer signal that every other Fashion site in our set has deployed.
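RFC 9309, the IETF standard for the Robots Exclusion Protocol, treats two failure cases differently. An *unavailable* robots.txt (a 4xx response) lets crawlers access everything. An *unreachable* one (a 5xx response or a network failure) means crawlers must assume everything is disallowed. The sketch below covers only that rule. The `interpret_status` name and its return labels are illustrative, and redirect handling is omitted.

```python
from typing import Optional


def interpret_status(status: Optional[int]) -> str:
    """Map a robots.txt fetch result to the RFC 9309 default behavior.

    `status` is the final HTTP status after redirects, or None when the
    request failed at the network level. Labels are illustrative only.
    """
    if status is None:
        return "unreachable: assume complete disallow"
    if 200 <= status < 300:
        return "parse file"
    if 400 <= status < 500:
        return "unavailable: crawler may access everything"
    if 500 <= status < 600:
        return "unreachable: assume complete disallow"
    return "other: not classified in this sketch"


for code in (200, 404, 403, 503, None):
    print(code, "->", interpret_status(code))
```

In practice, a "missing" file means open access only when the fetch returned a 4xx. A server error points the other way.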
The 42.9% block rate is computed across the 7 sites that returned a parseable robots.txt. The 2 no-robots sites are counted separately in the snapshot and are not included in that denominator.
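The Fashion counts from this snapshot make the denominator rule concrete. This is a minimal sketch, and `block_rate` is a hypothetical helper, not part of any published tooling.

```python
def block_rate(checked: int, no_robots: int, blocking: int) -> float:
    """Block rate as a percentage of sites with a parseable robots.txt.

    Sites without a parseable file are removed from the denominator
    rather than being counted as allowers.
    """
    with_robots = checked - no_robots
    if with_robots <= 0:
        raise ValueError("no sites with a parseable robots.txt")
    if not 0 <= blocking <= with_robots:
        raise ValueError("blocking count must fit within the denominator")
    return 100 * blocking / with_robots


fashion = block_rate(checked=9, no_robots=2, blocking=3)
print(f"{fashion:.1f}%")  # 3 / 7 -> 42.9%
# Counting the 2 no-robots sites as allowers would instead give 3 / 9.
print(f"{100 * 3 / 9:.1f}%")  # 33.3%
```

The second figure shows why the denominator matters. Folding the two no-robots sites in as allowers would drop Fashion from 42.9% to 33.3%.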
In short: 3 of 7 Fashion sites with a robots.txt (42.9%) block at least one AI crawler, compared with 104 of 223 sites (46.6%) across the full corpus.
Key Takeaways
3 of 7 Fashion sites with a parseable robots.txt block at least one AI crawler. That 42.9% rate places Fashion just below the corpus-wide 46.6%.
2 of the 9 Fashion sites checked — net-a-porter.com and asos.com — returned no parseable robots.txt file in the June 14, 2026 snapshot.
The 3 blockers are vogue.com, gq.com, and wwd.com — all editorial or trade-publication properties, not e-commerce sites.
The 4 allowers are elle.com, harpersbazaar.com, ssense.com, and farfetch.com — a mix of editorial and marketplace properties that allow every tracked AI crawler.
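Corpus-wide, Common Crawl is the most-blocked operator in the June 2026 snapshot: 85 of the 223 sites with a parseable robots.txt disallow it.
The editorial/commerce divide is the sharpest structural pattern in this category's data. Every blocker is an editorial or trade property and no commerce site blocks, although 2 of the 5 editorial sites (elle.com and harpersbazaar.com) allow every tracked crawler.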
Fashion Sites: The Snapshot
| Metric | Count |
|---|---|
| Fashion sites checked | 9 |
| Sites with a parseable robots.txt | 7 |
| Sites blocking at least one AI crawler | 3 |
| Block rate (of sites with robots.txt) | 42.9% |
Of the 9 Fashion sites we checked, 7 returned a parseable robots.txt. net-a-porter.com and asos.com were the exceptions. Of the 7 with a file, 3 blocked at least one AI crawler.
The 3 blockers, vogue.com, gq.com, and wwd.com, are all editorial or trade-publication properties rather than direct-to-consumer e-commerce. The 4 allowers are elle.com, harpersbazaar.com, ssense.com, and farfetch.com.
The editorial logic matches what we see in News (82.4%) and Gaming (88.9%). Sites whose commercial value is concentrated in proprietary text (trend coverage, designer interviews, trade news) are more likely to disallow the AI crawlers that feed training pipelines. vogue.com and gq.com are Condé Nast properties with deep archives of fashion coverage, cultural commentary, and descriptions of editorial photography. wwd.com is Women's Wear Daily, a trade publication whose content is the primary product.
By contrast, ssense.com and farfetch.com are marketplace platforms where the commercial value is in the transaction, not in the content. Allowing crawlers serves their discoverability interests; their product catalog data is something they would likely want surfaced in AI-powered search.
How Fashion Compares Across All 24 Categories
| Category | Sites Checked | With robots.txt | Blocking | Block Rate |
|---|---|---|---|---|
| Gaming | 9 | 9 | 8 | 88.9% |
| News | 20 | 17 | 14 | 82.4% |
| Food | 10 | 10 | 7 | 70% |
| Tech | 15 | 13 | 9 | 69.2% |
| Entertainment | 9 | 9 | 6 | 66.7% |
| Healthcare | 10 | 9 | 6 | 66.7% |
| Music | 10 | 9 | 6 | 66.7% |
| Reference | 14 | 11 | 6 | 54.5% |
| Science | 10 | 10 | 5 | 50% |
| Automotive | 10 | 9 | 4 | 44.4% |
| HomeGarden | 10 | 9 | 4 | 44.4% |
| Fashion | 9 | 7 | 3 | 42.9% |
| Social | 10 | 10 | 4 | 40% |
| Sports | 10 | 10 | 4 | 40% |
| Jobs | 10 | 8 | 3 | 37.5% |
| Travel | 9 | 9 | 3 | 33.3% |
| Weather | 10 | 6 | 2 | 33.3% |
| Legal | 10 | 7 | 2 | 28.6% |
| RealEstate | 10 | 7 | 2 | 28.6% |
| Finance | 12 | 11 | 2 | 18.2% |
| Retail | 15 | 12 | 2 | 16.7% |
| Education | 9 | 7 | 1 | 14.3% |
| Government | 9 | 8 | 1 | 12.5% |
| Nonprofit | 10 | 6 | 0 | 0% |
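The corpus-wide figure is a pooled rate: total blocking sites divided by total sites with a parseable robots.txt. It is not an average of the 24 category rates. The sketch below recomputes the totals, the pooled rate, and Fashion's rank from the rows of the table above, which were transcribed by hand into the list.

```python
# (category, sites checked, with robots.txt, blocking), from the table above.
CATEGORIES = [
    ("Gaming", 9, 9, 8), ("News", 20, 17, 14), ("Food", 10, 10, 7),
    ("Tech", 15, 13, 9), ("Entertainment", 9, 9, 6), ("Healthcare", 10, 9, 6),
    ("Music", 10, 9, 6), ("Reference", 14, 11, 6), ("Science", 10, 10, 5),
    ("Automotive", 10, 9, 4), ("HomeGarden", 10, 9, 4), ("Fashion", 9, 7, 3),
    ("Social", 10, 10, 4), ("Sports", 10, 10, 4), ("Jobs", 10, 8, 3),
    ("Travel", 9, 9, 3), ("Weather", 10, 6, 2), ("Legal", 10, 7, 2),
    ("RealEstate", 10, 7, 2), ("Finance", 12, 11, 2), ("Retail", 15, 12, 2),
    ("Education", 9, 7, 1), ("Government", 9, 8, 1), ("Nonprofit", 10, 6, 0),
]

checked = sum(row[1] for row in CATEGORIES)
with_robots = sum(row[2] for row in CATEGORIES)
blocking = sum(row[3] for row in CATEGORIES)
corpus_rate = 100 * blocking / with_robots  # pooled across all sites

# Per-category block rates, keyed by category name.
rates = {name: 100 * b / r for name, _, r, b in CATEGORIES}

# Rank = 1 + number of categories with a strictly higher block rate.
fashion_rank = 1 + sum(1 for rate in rates.values() if rate > rates["Fashion"])

print(checked, with_robots, blocking)  # 260 223 104
print(f"corpus-wide: {corpus_rate:.1f}%")  # corpus-wide: 46.6%
print(f"Fashion rank: {fashion_rank} of {len(CATEGORIES)}")  # Fashion rank: 12 of 24
```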
Fashion at 42.9% sits in the middle of the distribution. It is below the corpus-wide 46.6% and below Automotive and HomeGarden (both 44.4%), but just above Social and Sports (both 40%). The narrow gap below the corpus rate comes from the category's mix. Its editorial publishers block at a rate above the corpus average, while its e-commerce platforms, which have less incentive to block AI crawlers, do not block at all. Together the two groups pull the aggregate toward the middle.
The upper tier of the distribution, Gaming (88.9%), News (82.4%), Food (70%), and Tech (69.2%), is dominated by editorial-heavy categories. Fashion has editorial properties too, but they sit alongside marketplace platforms, which pulls the aggregate rate down. Split along that line, the editorial sub-segment blocks at 3 of 5 (60%), between Reference (54.5%) and Music (66.7%). The marketplace sub-segment blocks at 0 of 2 (0%), below even Retail (16.7%).
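The sub-segment rates come straight from labeling the 7 Fashion sites that have a robots.txt. The segment labels and blocking status below follow this report's own classification, and `segment_rates` is a hypothetical helper.

```python
# Fashion sites with a parseable robots.txt: (segment, blocks a tracked AI bot).
FASHION = {
    "vogue.com": ("editorial", True),
    "gq.com": ("editorial", True),
    "wwd.com": ("editorial", True),
    "elle.com": ("editorial", False),
    "harpersbazaar.com": ("editorial", False),
    "ssense.com": ("marketplace", False),
    "farfetch.com": ("marketplace", False),
}


def segment_rates(sites: dict) -> dict:
    """Return {segment: (blocking, total, percent)} for each segment."""
    tally: dict = {}
    for segment, blocks in sites.values():
        blocking, total = tally.get(segment, (0, 0))
        tally[segment] = (blocking + int(blocks), total + 1)
    return {seg: (b, t, 100 * b / t) for seg, (b, t) in tally.items()}


for segment, (b, t, pct) in segment_rates(FASHION).items():
    print(f"{segment}: {b} of {t} = {pct:.1f}%")
# editorial: 3 of 5 = 60.0%
# marketplace: 0 of 2 = 0.0%
```

With only 2 marketplace sites in the set, the 0% figure is thin evidence on its own.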
This pattern is worth comparing to our Science report, where the open-access vs subscription fault line produces a near-perfect 50/50 split along structural lines. Fashion's fault line is editorial vs commerce, and it produces a similar internal division.
Corpus-Wide Bot and Operator Counts
The following tables cover all 223 sites with a parseable robots.txt in the June 2026 snapshot — not just Fashion sites.
Bots blocked most often (across all 223 sites):
| Bot | Sites Blocking It | Share of Corpus |
|---|---|---|
| CCBot | 85 | 38.1% |
| ClaudeBot | 74 | 33.2% |
| Bytespider | 69 | 30.9% |
| GPTBot | 64 | 28.7% |
| Meta-ExternalAgent | 63 | 28.3% |
| PerplexityBot | 60 | 26.9% |
| Applebot-Extended | 60 | 26.9% |
| Google-Extended | 57 | 25.6% |
| Amazonbot | 50 | 22.4% |
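Each share is the bot's count divided by the 223 sites with a parseable robots.txt. One site can disallow several bots, so the shares overlap and their total is well over 100%. The sketch below recomputes them from the counts in the table.

```python
# Sites blocking each bot, from the table above.
BOT_BLOCKS = {
    "CCBot": 85,
    "ClaudeBot": 74,
    "Bytespider": 69,
    "GPTBot": 64,
    "Meta-ExternalAgent": 63,
    "PerplexityBot": 60,
    "Applebot-Extended": 60,
    "Google-Extended": 57,
    "Amazonbot": 50,
}
SITES_WITH_ROBOTS = 223  # denominator: sites with a parseable robots.txt

shares = {bot: 100 * n / SITES_WITH_ROBOTS for bot, n in BOT_BLOCKS.items()}
for bot, share in shares.items():
    print(f"{bot}: {share:.1f}%")

# Shares overlap because one site can disallow several bots, so their
# total exceeds 100% and is not meaningful as a single quantity.
print(f"sum of shares: {sum(shares.values()):.1f}%")
```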
Operators blocked most often (across all 223 sites):
| Operator | Sites Blocking Them |
|---|---|
| Common Crawl | 85 |
| Anthropic | 80 |
| Meta | 73 |
| ByteDance | 69 |
| OpenAI | 66 |
| Perplexity | 60 |
| Apple | 60 |
| | 57 |
| Cohere | 56 |
| Diffbot | 55 |
| Amazon | 50 |
| Mistral | 21 |
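Some operator counts exceed the count for that operator's best-known bot: Anthropic 80 vs ClaudeBot 74, OpenAI 66 vs GPTBot 64, Meta 73 vs Meta-ExternalAgent 63. One plausible reading is that operator totals roll up every user-agent token attributed to that operator and count each site once. The sketch below shows that roll-up. It is hypothetical throughout. The token map is illustrative: `anthropic-ai` and `OAI-SearchBot` are tokens not named in this report. The per-site data is made up.

```python
from collections import Counter

# Illustrative token -> operator map (assumption: the snapshot's real map
# is not published in this report).
TOKEN_TO_OPERATOR = {
    "CCBot": "Common Crawl",
    "ClaudeBot": "Anthropic",
    "anthropic-ai": "Anthropic",
    "GPTBot": "OpenAI",
    "OAI-SearchBot": "OpenAI",
}

# Hypothetical per-site sets of disallowed user-agent tokens.
SITE_BLOCKS = {
    "site-a.example": {"CCBot", "ClaudeBot", "anthropic-ai"},
    "site-b.example": {"anthropic-ai"},
    "site-c.example": {"GPTBot", "OAI-SearchBot", "CCBot"},
}


def bot_counts(site_blocks: dict) -> Counter:
    """Count sites per individual user-agent token."""
    counts: Counter = Counter()
    for tokens in site_blocks.values():
        counts.update(tokens)
    return counts


def operator_counts(site_blocks: dict, token_map: dict) -> Counter:
    """Count each site at most once per operator it blocks."""
    counts: Counter = Counter()
    for tokens in site_blocks.values():
        operators = {token_map[t] for t in tokens if t in token_map}
        counts.update(operators)
    return counts


print(bot_counts(SITE_BLOCKS)["ClaudeBot"])  # 1
print(operator_counts(SITE_BLOCKS, TOKEN_TO_OPERATOR)["Anthropic"])  # 2
```

Here site-b counts toward Anthropic without blocking ClaudeBot, and site-c counts once for OpenAI despite blocking two OpenAI tokens.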
Common Crawl leads at 85 sites; Anthropic follows at 80 across all 223 sites with a parseable robots.txt. For the Fashion category specifically, the 3 blocking sites — vogue.com, gq.com, and wwd.com — are most likely targeting the large training-pipeline operators at the top of the leaderboard: Common Crawl, Anthropic, OpenAI, and Meta. These represent the pipelines most likely to harvest editorial fashion content for AI systems that can discuss trends, designers, and brand histories.
Mistral at 21 sites is the lowest-count operator in our leaderboard. As Mistral and similar newer-entrant operators scale crawl activity, editorial fashion properties that have not updated their disallow lists may find themselves exposed to operators they had not yet addressed.
Methodology
US Tech Automations fetched robots.txt files from 260 prominent web domains across 24 content categories on June 14, 2026. Each file was parsed against a fixed list of 9 AI crawler user-agent strings from publicly documented bot identities. The snapshot is content-addressed with sha 834f1e2f07af24fd — immutable after the sealing date. Nothing is estimated, modeled, or extrapolated. A site is classified as blocking when it disallows at least one of the 9 tracked bots in its robots.txt. Sites returning no file are counted separately and excluded from the block-rate denominator.
The collection steps:
Fetch. Each domain root was queried for its robots.txt file. Domains with no file or a server error were recorded as no-robots sites.
Parse. Each retrieved file was decomposed into user-agent blocks and evaluated for Disallow directives against the 9 tracked AI bots.
Seal. The full collected dataset was hashed on June 14, 2026, producing content address 834f1e2f07af24fd — verifiable and immutable.
Aggregate. All per-category and corpus-wide counts were computed directly from the sealed file with no estimation or interpolation.
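Python's standard-library `urllib.robotparser` can approximate the Parse step and the per-site classification. This is not the report's own parser. The sketch assumes that "blocking" means the site root `/` is disallowed for a tracked user agent, because the report does not say how partial-path disallows are treated. `RobotFileParser` has two quirks worth knowing. It applies the first matching rule in a group rather than RFC 9309's longest-match rule, and it matches user agents by substring. Edge cases can therefore differ from other parsers.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# The nine bots named in this report's corpus-wide table.
TRACKED_BOTS = [
    "CCBot", "ClaudeBot", "Bytespider", "GPTBot", "Meta-ExternalAgent",
    "PerplexityBot", "Applebot-Extended", "Google-Extended", "Amazonbot",
]


def blocked_bots(robots_txt: str, bots: list = TRACKED_BOTS) -> list:
    """Return the tracked bots that may not fetch the site root.

    Assumption: a bot counts as blocked when "/" is disallowed for it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]


def classify(robots_txt: Optional[str]) -> str:
    """Classify one site as no-robots, blocks, or allows."""
    if robots_txt is None:
        return "no-robots"
    return "blocks" if blocked_bots(robots_txt) else "allows"


example = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""
print(blocked_bots(example))  # ['CCBot', 'GPTBot']
print(classify(example), classify("User-agent: *\nDisallow:\n"), classify(None))  # blocks allows no-robots
```

A site-wide `User-agent: *` / `Disallow: /` counts as blocking every tracked bot under this rule. Whether the snapshot counts it the same way is not stated.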
For categories near Fashion in the distribution, see the Home and Garden report (44.4%) and the Jobs report (37.5%).
Frequently Asked Questions
Q: Why do the editorial Fashion sites block while the e-commerce sites allow?
A: Editorial sites (vogue.com, gq.com, wwd.com) hold proprietary text: trend analysis, designer profiles, trade news, cultural commentary. That content has training value for AI systems that discuss fashion. E-commerce platforms such as ssense.com and farfetch.com hold product catalogs and transaction infrastructure. For them, AI discoverability is likely worth more than content protection, because allowing crawlers helps surface their inventory in AI-powered search. The incentives point in opposite directions, though not every editorial site follows the editorial logic: elle.com and harpersbazaar.com allow every tracked crawler.
Q: What does it mean that net-a-porter.com and asos.com have no robots.txt?
A: Under the honor-system default, a missing robots.txt means all paths are open to all crawlers. However, the absence of a robots.txt does not mean these sites have a deliberate open-access policy — it may reflect a maintenance gap, a pending update, or a decision not to address the question via robots.txt at all. The absence communicates nothing about legal permissions; those are governed by terms of service and copyright law independently.
Q: Is Fashion above or below the corpus-wide block rate?
A: Fashion at 42.9% sits just below the corpus-wide rate of 46.6% across 223 sites with a parseable robots.txt, and it ranks 12th of 24 categories by block rate. A narrow majority of Fashion sites with a robots.txt (4 of 7) allow every tracked crawler. The categories with the lowest block rates tend to rely less on proprietary editorial content. These are Finance (18.2%), Retail (16.7%), Education (14.3%), Government (12.5%), and Nonprofit (0%).
Q: Does allowing AI crawlers benefit fashion e-commerce sites?
A: It depends on the operator and use case. GPTBot (OpenAI) and Applebot-Extended (Apple) govern whether content may be used to train those companies' models. Whether product pages surface in AI-powered search depends more on search-oriented crawlers such as PerplexityBot. For a marketplace like farfetch.com or ssense.com, that discoverability could drive incremental traffic. For an editorial property like vogue.com, the same exposure could drain subscription value. The two business models reach opposite conclusions about the same crawl.
Q: How quickly can a site change its robots.txt posture?
A: Quickly, but not instantly. A site can update its robots.txt at any time, and the change takes effect when each crawler re-fetches the file. RFC 9309 says crawlers should not rely on a cached copy for more than 24 hours. A site that allows all crawlers today can disallow all of them tomorrow with a single file change. That is why point-in-time sealed snapshots, repeated on a cadence, are the right monitoring mechanism. The June 14, 2026 snapshot is the baseline, and recurring checks detect drift from it.
Put AI-Access Data to Work
Fashion's 42.9% block rate, its clear editorial/commerce divide, and two major sites with no parseable robots.txt make for a specific, actionable monitoring picture. Three roles have the most to gain from a recurring workflow here.
An SEO or content strategy lead at a fashion media company should treat the allowers among editorial peers as a live signal. elle.com and harpersbazaar.com allow all tracked AI crawlers while vogue.com and gq.com block them. That asymmetry may eventually show up in which properties get surfaced in AI-generated fashion recommendations. The right cadence is a weekly re-check of the 7 Fashion sites that have a robots.txt, with an alert the moment any allower adds a disallow or any blocker removes one. Track the 2 no-robots sites separately: if net-a-porter.com or asos.com publishes a robots.txt, that is news for the category.
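The weekly check comes down to comparing each domain's status against the June 14, 2026 baseline and labeling any transition. The baseline statuses in this minimal sketch come from this report. The `current` snapshot is hypothetical and exists only to show two alerts firing, and the helper names are illustrative.

```python
# June 14, 2026 baseline statuses for the 9 Fashion sites (from this report).
BASELINE = {
    "vogue.com": "blocks", "gq.com": "blocks", "wwd.com": "blocks",
    "elle.com": "allows", "harpersbazaar.com": "allows",
    "ssense.com": "allows", "farfetch.com": "allows",
    "net-a-porter.com": "no-robots", "asos.com": "no-robots",
}


def describe_change(before: str, after: str) -> str:
    """Label a status transition for alert routing."""
    if before == "no-robots":
        return "robots.txt published"
    if after == "no-robots":
        return "robots.txt removed or unreachable"
    if after == "blocks":
        return "new AI block"
    if before == "blocks":
        return "AI block removed"
    return "changed"


def diff_snapshots(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Return (domain, alert) pairs for every domain whose status changed."""
    alerts = []
    for domain in sorted(baseline.keys() & current.keys()):
        before, after = baseline[domain], current[domain]
        if before != after:
            alerts.append((domain, describe_change(before, after)))
    return alerts


# Hypothetical later snapshot: elle.com adds a disallow, asos.com publishes a file.
current = {**BASELINE, "elle.com": "blocks", "asos.com": "allows"}
for domain, alert in diff_snapshots(BASELINE, current):
    print(f"{domain}: {alert}")
```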
A publisher RevOps or brand-protection lead at vogue.com, gq.com, or wwd.com should monitor whether Mistral (currently at only 21 sites in the 12-operator leaderboard) and other lower-ranked operators are increasing crawl activity. A disallow list that addresses Common Crawl and Anthropic but not an emerging operator leaves a gap. A monthly review of the operator leaderboard against your robots.txt disallow list closes that gap proactively.
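The monthly review compares the operator leaderboard against the operators a disallow list already names, and reports the gaps with the most-blocked first. The sketch below is minimal. The leaderboard counts come from this report, with the unlabeled 57-site row omitted. The `my_list` set is a hypothetical disallow list. Mapping each operator to its actual robots.txt user-agent tokens is a separate step that depends on each operator's documentation.

```python
# Operator leaderboard from this report (the unlabeled 57-site row is omitted).
LEADERBOARD = {
    "Common Crawl": 85, "Anthropic": 80, "Meta": 73, "ByteDance": 69,
    "OpenAI": 66, "Perplexity": 60, "Apple": 60, "Cohere": 56,
    "Diffbot": 55, "Amazon": 50, "Mistral": 21,
}


def unaddressed(leaderboard: dict, addressed: set) -> list[str]:
    """Operators the disallow list does not name, most-blocked first.

    Ties are broken alphabetically so the output order is stable.
    """
    missing = [op for op in leaderboard if op not in addressed]
    return sorted(missing, key=lambda op: (-leaderboard[op], op))


# Hypothetical disallow list that names only the largest training pipelines.
my_list = {"Common Crawl", "Anthropic", "OpenAI", "Meta"}
print(unaddressed(LEADERBOARD, my_list))
```

A blanket `User-agent: *` / `Disallow: /` rule would cover every operator at once. Naming operators individually lets a site block some crawlers and allow others.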
A retrieval or data-pipeline engineer building a fashion-knowledge layer needs to know that in the current snapshot, elle.com, harpersbazaar.com, ssense.com, and farfetch.com are accessible under honor-system robots.txt readings, while vogue.com, gq.com, and wwd.com are not. Monitoring the four allowers for policy shifts is a production-relevant signal for any pipeline that depends on their content.
US Tech Automations automates scheduled robots.txt and llms.txt crawls across your fashion domain set, routes change-diff alerts when any site updates its AI-access policy, and maintains a live AI-access dashboard — so your team has a real-time view of who is blocking, who is allowing, and when that changes.
Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 834f1e2f07af24fd).
Cite this report
US Tech Automations Research, 2026-06 edition. “Do Fashion Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-fashion-sites-block-ai-crawlers-2026
Sealed snapshot sha256: 834f1e2f07af24fd
Machine-readable data: CSV · JSON · All research & methodology