Do Sneaker Sites Block AI Crawlers? 2 of 6 Do

Jun 19, 2026

Sneaker sites live or die on attention. The whole business runs on release calendars, hype cycles, and resale prices that people search for constantly — which makes how these sites treat AI crawlers an unusually pointed question. If an answer engine cannot read a sneaker site, that site disappears from the very searches its audience now starts inside a chatbot.

2 of 6 Sneaker sites block at least one AI crawler.

Of the ten sneaker domains we checked, six returned a parseable robots.txt (the root-level file that tells automated agents which paths they may fetch), and two of those disallow an AI crawler. That works out to 2 of 6, a 33.3% block rate. Every figure here is read straight from the sealed snapshot; nothing is estimated, modeled, or extrapolated.

The two blockers are sneakernews.com and kicksonfire.com — both editorial sneaker-news sites, not stores. The retailers and marketplaces leave the door open. Against the corpus, where 317 of 1203 sites with a policy gate at least one crawler for a 26.4% rate, sneaker sites sit a touch above the average — pulled up by their news arms.

The Two Sneaker Sites That Gate, and the Four That Do Not

What separates the sneaker blockers from the rest is what kind of site they are. sneakernews.com and kicksonfire.com are publishers: their value is original release coverage, leak reporting, and review archives. That archive is the product, so gating bulk AI harvesting of it is a familiar publisher move. Both files disallow the same wide group of agents — GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended. That is OpenAI, Anthropic, Google, Common Crawl, ByteDance, Meta, Amazon, and Apple, all named explicitly. When a sneaker-news site closes, it closes most of the leaderboard at once.

The open sneaker domains are the commerce side of the category: stockx.com, footlocker.com, nicekicks.com, and hypebeast.com. None of them disallows an AI agent. That is the same wide-open posture you see in the aquarium category, where not a single site gates a crawler. A resale marketplace or a retailer runs on being found: the product pages, the price data, and the drop pages are meant to be surfaced, cited, and clicked, including by an AI assistant fielding a question about a release or a resale value. For a store, AI readability is distribution, not exposure.

Four sneaker domains — goat.com, flightclub.com, solecollector.com, and finishline.com — returned no parseable robots.txt at the seal. They are therefore silent: neither an allow nor a block, and excluded from the rate entirely. Those four timeouts or empty responses are why the denominator is six rather than the ten sites we checked. It would be wrong to read any of that silence as a stance; it is an artifact of a busy host at one moment in time.
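The denominator rule can be made concrete in a few lines of Python. This is a minimal sketch, not the pipeline behind the snapshot: the status labels ("block", "open", "silent") and the block_rate helper are hypothetical names, while the domain assignments are the ones stated in this report.

```python
# Status per domain as reported in this article; the label strings are illustrative.
SNEAKER_STATUS = {
    "sneakernews.com": "block",
    "kicksonfire.com": "block",
    "stockx.com": "open",
    "footlocker.com": "open",
    "nicekicks.com": "open",
    "hypebeast.com": "open",
    "goat.com": "silent",
    "flightclub.com": "silent",
    "solecollector.com": "silent",
    "finishline.com": "silent",
}

def block_rate(status_by_domain):
    """Return (blockers, policied, rate_percent); silent domains are excluded."""
    policied = [s for s in status_by_domain.values() if s != "silent"]
    blockers = sum(1 for s in policied if s == "block")
    rate = round(100 * blockers / len(policied), 1) if policied else None
    return blockers, len(policied), rate

print(block_rate(SNEAKER_STATUS))  # (2, 6, 33.3)
```

Counting the four silent domains as open would instead give 2 of 10, which is why the exclusion matters for the headline number.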

What This 33.3% Block Rate Actually Means

A robots.txt directive is a public request, and in this category the requests divide along the two business models. The honest interpretation is that sneaker retail keeps its pages open while sneaker media guards its archive. Both blockers are content sites whose archives are their moat; every open domain is a store or marketplace whose pages are its storefront.

That split is the whole finding. In a six-file sample, two comprehensive blockers — both editorial — are enough to lift the category above the corpus average, and they land it at exactly 33.3%. The number is not telling you sneaker sites are broadly cautious; it is telling you the two news outlets in this set are, and the commerce sites are not.

The small sample sharpens this rather than weakening it. With six policied files, the read is really a story about ten named sites and two decisions, both at content publishers. Track whether the marketplaces ever flip — whether a stockx.com or a footlocker.com decides its price data is an asset to wall off rather than a draw — and you have tracked most of what could move this category's number.

This is a different shape of story than the most-gated categories in the edition, where nearly every site blocks because its archive is the product. Sneaker commerce treats its catalog as a reason to be visited; sneaker media treats its archive as a thing to protect. That contrast lives inside one category.

Where Sneaker Sites Sit Among Similar Categories

A 33.3% block rate places Sneakers in the middle of the ranking — above the open floor, well below the gated ceiling. The focused window below shows Sneakers beside its nearest neighbors, verbatim from the sealed snapshot, name first and no rank column.

Category	Sites	With robots.txt	Block at least 1 crawler	Block rate
Agriculture	10	9	3	33.3%
Beauty	10	6	2	33.3%
Sneakers	10	6	2	33.3%
HamRadio	10	6	2	33.3%
Supplements	10	6	2	33.3%
TabletopRPG	10	6	2	33.3%
Travel	9	9	3	33.3%

Sneakers share their 33.3% reading with a broad, mixed band — Beauty, Ham Radio, Supplements, and Tabletop RPG all land on the same two-blocker mark from the same six-file denominator. It is a crowded part of the ranking, which is itself a sign that one-in-three is a common middle posture: most sites in these categories want to be readable, while a guarded minority does not. The extremes show what the ends look like:

Category	Sites	With robots.txt	Block at least 1 crawler	Block rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
FastFood	10	6	0	0%
Hotels	10	3	0	0%

Sneakers sit far below Gaming and News — categories where the archive is the business — and well above the zero-block floor that fast-food chains define with their open policies. The category is open where it sells and gated where it publishes.

The Bots the Sneaker Blockers Reach For

Both sneaker blockers are comprehensive, so the useful corpus context is which bots get gated most broadly — the tokens a site names first when it decides to close. The cut below shows the most-disallowed bots across all 1203 sites with a robots.txt, bot name first, count next.

Bot	Sites disallowing (of 1203)	Rate
CCBot	234	19.5%
GPTBot	210	17.5%
ClaudeBot	207	17.2%
Bytespider	203	16.9%
Meta-ExternalAgent	178	14.8%

CCBot, Common Crawl's agent, tops the corpus blocklist at 234 sites, with GPTBot and ClaudeBot close behind. sneakernews.com and kicksonfire.com name all five of these, and more, in their disallow groups, so neither is improvising; they are gating the same crawlers the corpus gates most often, just all at once.

Corpus-wide, 317 of 1203 sites block at least one AI crawler.

How the Sneaker Snapshot Was Sealed

These figures come from one point-in-time crawl of public robots.txt files, sealed June 19, 2026 under snapshot sha 040215878ac7b85a. For each sneaker domain we fetched robots.txt at the root, parsed its user-agent and disallow directives, and recorded whether any AI crawler token was disallowed. We report verbatim counts; nothing is estimated, modeled, or extrapolated. The four domains with no parseable file — goat.com, flightclub.com, solecollector.com, and finishline.com — are logged as silent, neither allow nor block.

The counting rule is deliberately narrow. A block is an explicit Disallow aimed at a named AI agent — GPTBot, ClaudeBot, CCBot, and the other leaderboard tokens. A sneaker site can disallow checkout, search, or account paths without naming an AI agent, and that does not count as an AI block here. Only a directive that names one moves a site into the blocker column, which is why the sneaker count is a clean two: sneakernews.com and kicksonfire.com name them, the rest do not.
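A rough sketch of that counting rule follows, under stated assumptions. The AI_AGENTS set lists only the eight tokens named in this report, not the full leaderboard, and the named_ai_blocks function is a hypothetical helper rather than the snapshot's actual parser. It also assumes that any non-empty Disallow under a named AI agent counts, which the report does not spell out.

```python
# The eight tokens named in this report; the full leaderboard list is not shown here.
AI_AGENTS = {
    "gptbot", "claudebot", "google-extended", "ccbot", "bytespider",
    "meta-externalagent", "amazonbot", "applebot-extended",
}

def named_ai_blocks(robots_txt):
    """Return the AI agent tokens that sit in a group with a non-empty Disallow."""
    blocked = set()
    group_agents = []   # user-agent tokens of the group being read
    in_rules = False    # True once the current group has seen Allow/Disallow
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                         # a new group starts here
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:    # empty Disallow allows everything
                blocked.update(a for a in group_agents if a in AI_AGENTS)
    return blocked

# Illustrative file, not copied from any site in the report.
sample = """\
User-agent: *
Disallow: /checkout/

User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""
print(sorted(named_ai_blocks(sample)))  # ['ccbot', 'gptbot']
```

The wildcard group's Disallow on /checkout/ does not register, because it names no AI agent; only the GPTBot and CCBot group moves this sample file into the blocker column.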

The snapshot also does not retry a slow host until a file appears, does not follow a redirect into a different domain's policy, and does not infer a block from a site that merely looks unfriendly to bots. Each sneaker domain is read once, at seal time, exactly as it answered.
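One way to picture the single-read rule is as a classification of whatever a single fetch returned. The sketch below is illustrative only: classify_read, its parameters, and the test for what counts as "parseable" (an HTTP 200 from the same host containing at least one directive line) are assumptions, not the snapshot's documented logic.

```python
from urllib.parse import urlsplit

def classify_read(domain, final_url, status, body):
    """Classify one robots.txt fetch as 'policied' or 'silent'. No retries."""
    host = (urlsplit(final_url).hostname or "").lower()
    same_site = host in (domain, "www." + domain)
    if status != 200 or not same_site:
        # Error, timeout (status None), or a redirect into another domain's policy.
        return "silent"
    directives = ("user-agent:", "disallow:", "allow:")
    has_directive = any(
        line.split("#", 1)[0].strip().lower().startswith(directives)
        for line in body.splitlines()
    )
    return "policied" if has_directive else "silent"

# Hypothetical domains, used only to exercise the three cases.
print(classify_read("shop.example", "https://shop.example/robots.txt", 200,
                    "User-agent: *\nDisallow: /checkout/"))
print(classify_read("slow.example", "https://slow.example/robots.txt", None, ""))
print(classify_read("old.example", "https://new.example/robots.txt", 200,
                    "User-agent: *\nDisallow: /"))
```

Under these assumptions the first call is classified as policied and the other two as silent, which mirrors the allow-nothing, block-nothing treatment of unanswered hosts described above.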

That single-read rule is what makes the result content-addressable: anyone holding sha 040215878ac7b85a can re-derive the same six policied files and the same two blockers. The cost is that four resale and retail names land in the silent bucket rather than the allow column — the method favors reproducibility over a generous reading.
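The report does not say how sha 040215878ac7b85a is derived, so the following is only a sketch of the general idea behind a content-addressed snapshot. It assumes a canonical JSON encoding hashed with SHA-256 and truncated to 16 hex characters; snapshot_id and the record layout are hypothetical.

```python
import hashlib
import json

def snapshot_id(records, length=16):
    """Hash a canonical JSON encoding of the records and return a hex prefix."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]

a = {"stockx.com": "open", "sneakernews.com": "block"}
b = {"sneakernews.com": "block", "stockx.com": "open"}  # same content, different order
print(snapshot_id(a) == snapshot_id(b))  # True
```

Because the identifier depends only on the recorded content, re-reading a host later and getting a different answer would produce a different identifier, which is the property the single-read rule protects.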

Frequently Asked Questions

Q: Which sneaker sites block AI crawlers?

A: sneakernews.com and kicksonfire.com — the two editorial sneaker-news sites in the set. Each is the comprehensive kind of blocker, naming GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended. Those two gates are the entire 33.3% block rate; every store and marketplace in the set leaves crawlers in.

Q: Why do the sneaker stores leave AI crawlers in?

A: Distribution. stockx.com, footlocker.com, nicekicks.com, and hypebeast.com run on being found — their product pages, drop calendars, and price data are meant to be surfaced and cited, including by AI assistants answering release and resale questions. For a retailer or marketplace, being readable is how customers arrive, not a risk to manage.

Q: Does the 33.3% rate cover all the sneaker sites you found?

A: No. It covers the six sites that returned a parseable robots.txt. Four more — goat.com, flightclub.com, solecollector.com, and finishline.com — produced no parseable file at the seal, so they are excluded from the rate rather than counted as an allow or a block.

Q: Does a Disallow in robots.txt actually stop an AI crawler?

A: Not by force. robots.txt is an honor-system standard: a cooperative crawler reads it and complies, but the file enforces nothing technically. sneakernews.com and kicksonfire.com signal that AI agents should stay out of their paths; each crawler decides whether to honor that request.
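To see what honoring the request looks like from the crawler's side, here is a short sketch using Python's standard urllib.robotparser module. The robots.txt text and the example.com URLs are illustrative and are not copied from any site in this report.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: one named AI agent is shut out, everyone else loses only /checkout/.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A cooperative crawler asks before each fetch and skips the URL on False.
print(parser.can_fetch("GPTBot", "https://example.com/releases/"))            # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/releases/"))      # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/checkout/cart"))  # False
```

Nothing in this check runs on the site's server. The crawler performs it voluntarily, so a crawler that never calls it is not stopped by the file.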

Put AI-Access Data to Work

For a sneaker brand or e-commerce marketing owner — the person who owns how a store or release page appears online — this snapshot is a baseline worth watching. The retailers and marketplaces here stay open, but an accidental Disallow shipped during a site migration can quietly wall off the answer engines your customers now ask before they buy.

Knowing your own robots.txt posture, in real time, is the difference between surfacing in an AI shopping answer and vanishing from it. The split-by-business-model pattern sneaker sites share with the supplement category, where media gates and commerce stays open, is exactly the kind of norm a single migration mistake can flip.

A second fit is a GEO or AI-search analyst tracking which sneaker peers remain eligible to surface in answer engines. Their job is to know, continuously, whether the product and pricing pages they monitor are still readable, and whether a goat.com-style silence is a timeout or a hardening stance.

US Tech Automations runs scheduled robots.txt crawls with change alerts and agentic monitoring across a watchlist of domains, so a policy shift surfaces the week it lands rather than at the next audit. With that monitoring in place, you have a standing read on sneaker AI-access posture instead of a one-time count.

Corpus-wide, 330 of 1203 sites publish an llms.txt file.

Key Takeaways

Of the six Sneaker sites with a parseable robots.txt, two block at least one AI crawler — a 33.3% rate, a touch above the corpus average.
The two blockers are sneakernews.com and kicksonfire.com, both editorial; each gates comprehensively, naming GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended.
The open sneaker domains — stockx.com, footlocker.com, nicekicks.com, and hypebeast.com — are all stores or marketplaces, and they allow every crawler.
Four sites — goat.com, flightclub.com, solecollector.com, and finishline.com — returned no parseable file at the seal and are excluded from the rate.
Corpus-wide, 317 of 1203 sites (26.4%) gate at least one crawler; within Sneakers, the split is media-closed, commerce-open.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 19, 2026 (snapshot sha 040215878ac7b85a).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Sneaker Sites Block AI Crawlers? 2 of 6 Do.” https://ustechautomations.com/resources/blog/do-sneaker-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 040215878ac7b85a
