Do Chess Sites Block AI Crawlers? 2 of 7 Do

Jun 14, 2026

Most chess sites still let AI crawlers read the board. Of the 10 Chess sites we checked on 14 June 2026, 7 returned a parseable robots.txt, and only 2 of those 7 disallow at least one named AI crawler — a 28.6% block rate.

That is a quieter posture than the wider web shows. Across the whole snapshot, 231 of the 743 sites with a published policy (31.1%) turn away at least one AI bot. Chess sits just below that line, which makes it a useful case study in how an information-rich hobby vertical treats automated readers when its content is reference material people genuinely want surfaced.

This report is a point-in-time read of public robots.txt files, sealed under snapshot sha 92ed5cd2858657d9. Every figure below is a literal count from that file set; nothing is inferred from traffic, rankings, or guesswork.

Which Chess Sites Gate the Crawlers

The two sites posting AI-crawler disallow rules are chesstempo.com and thechessworld.com. One leans toward training tools and puzzle databases; the other is an editorial and instructional publisher. Both have a clear reason to guard their pages: structured content that an answer engine could lift wholesale.

The allowers are the names most players would recognize. chess.com, lichess.org, chessbase.com, chessable.com, and newinchess.com all returned a robots.txt that leaves the named AI crawlers a clear path. For platforms whose growth depends on being the place people land when they search a chess opening, an open policy is a discoverability choice as much as a permissive one.

Of the 7 Chess sites with a published policy, only chesstempo.com and thechessworld.com disallow a named AI crawler.

Three more sites — chess24.com, ichess.net, and chessgames.com — returned no parseable robots.txt at all. That is not the same as allowing or blocking; it simply means there was no published rule for a crawler to read. We count those sites in the checked total but not in the block-rate denominator, which is why the rate is measured against 7, not 10.
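The denominator rule is easy to get wrong, so here is a minimal sketch of the arithmetic. The site assignments are the ones stated above; the status labels ("block", "allow", "none") and the function name are illustrative choices, not part of the sealed data.

```python
# Status per site as reported above; the label strings are illustrative.
CHESS_SITES = {
    "chesstempo.com": "block",
    "thechessworld.com": "block",
    "chess.com": "allow",
    "lichess.org": "allow",
    "chessbase.com": "allow",
    "chessable.com": "allow",
    "newinchess.com": "allow",
    "chess24.com": "none",
    "ichess.net": "none",
    "chessgames.com": "none",
}


def block_rate(sites: dict[str, str]) -> tuple[int, int, float]:
    """Return (blockers, sites with a policy, block rate in percent)."""
    # Sites with no parseable robots.txt stay out of the denominator.
    with_policy = [status for status in sites.values() if status != "none"]
    blockers = sum(1 for status in with_policy if status == "block")
    rate = 100 * blockers / len(with_policy) if with_policy else 0.0
    return blockers, len(with_policy), rate


blockers, denominator, rate = block_rate(CHESS_SITES)
print(f"{blockers} of {denominator} = {rate:.1f}%")  # prints "2 of 7 = 28.6%"
```

Dividing by all 10 checked sites instead would give 20%, which would treat a missing file as if it were an allow policy.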

What a 28.6% Block Rate Actually Means

A block rate is a snapshot of intent, not of enforcement. robots.txt is a request, and a 28.6% figure tells you how many Chess sites with a policy chose to write a disallow rule for at least one AI operator — no more.

The signal worth reading is the split. Chess content divides into two economic shapes: open platforms that want to be found, and gated databases that treat their structured records as the product. The two blockers map cleanly to the second shape, while the five allowers map to the first. That clean division is more informative than the headline percentage.

For anyone building AI-access tooling, the lesson is that category averages hide the real story. A vertical can look permissive in aggregate while its most data-dense properties quietly close their doors.

It is also worth noting what the rate does not capture. A site that allows every named crawler today can add a disallow line tomorrow, and a site with no robots.txt at all may publish one next week. The 28.6% figure is a reading of a single day, which is exactly why it is sealed to that day rather than presented as a stable property of the Chess vertical.

The two blockers are also instructive about motive. A puzzle-and-training database and an instructional publisher both produce content whose value lies in its structure and completeness — precisely the qualities that make a page attractive to a training crawler and worth guarding from one. The five allowers, by contrast, win when their pages are quoted and linked, so openness is not a concession for them but a growth lever.

How Chess Compares to Its Nearest Neighbors

Chess does not sit alone at 28.6%. Several adjacent verticals cluster around the same block rate, and a focused window around Chess shows where it lands among similarly permissive categories. The table below uses verbatim sealed counts, sorted from the highest block rate to the lowest; rows carry category names, not rank numbers.

Category	Sites checked	With robots.txt	Block at least one	Block rate
Beauty	10	6	2	33.3%
Agriculture	10	9	3	33.3%
Yoga	10	10	3	30.0%
Scuba	10	10	3	30.0%
Legal	10	7	2	28.6%
RealEstate	10	7	2	28.6%
Pets	10	7	2	28.6%
Chess	10	7	2	28.6%
Crafts	10	8	2	25.0%
Space	9	8	2	25.0%
BoardGames	10	8	2	25.0%
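The block-rate column follows directly from the two count columns, which a short check can confirm. This is a sketch that copies the rows from the table above; the variable and function names are illustrative, and rates are formatted to one decimal place.

```python
# (category, sites checked, with robots.txt, block at least one), copied from the table.
ROWS = [
    ("Beauty", 10, 6, 2),
    ("Agriculture", 10, 9, 3),
    ("Yoga", 10, 10, 3),
    ("Scuba", 10, 10, 3),
    ("Legal", 10, 7, 2),
    ("RealEstate", 10, 7, 2),
    ("Pets", 10, 7, 2),
    ("Chess", 10, 7, 2),
    ("Crafts", 10, 8, 2),
    ("Space", 9, 8, 2),
    ("BoardGames", 10, 8, 2),
]


def rate_pct(blocking: int, with_policy: int) -> str:
    """Block rate as a percentage string, measured against sites with a policy."""
    return f"{100 * blocking / with_policy:.1f}%"


rates = {name: rate_pct(blocking, with_policy) for name, _, with_policy, blocking in ROWS}

# Categories that share Chess's exact rate.
same_as_chess = sorted(
    name for name, rate in rates.items() if rate == rates["Chess"] and name != "Chess"
)
print(rates["Chess"], same_as_chess)
```

Note that Beauty and Agriculture reach the same 33.3% from different counts (2 of 6 and 3 of 9), which is why the counts are worth reading alongside the percentage.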

Chess keeps company with Legal, RealEstate, and Pets — all at the same 28.6% mark. These are verticals where some operators run reference databases worth guarding while most participants chase visibility. Just below sit Crafts and BoardGames, hobby categories with even lighter gating. The nearby scuba diving report shows a closely related leisure vertical at 30%.

The extremes give the full sweep. The most-blocked categories cluster high, while several verticals post no blocks at all.

Category	With robots.txt	Block at least one	Block rate
Gaming	9	8	88.9%
News	17	14	82.4%
Drones	9	0	0.0%
Banking	7	0	0.0%

Chess sits far from Gaming's 88.9%, which is the kind of contrast worth holding onto: two adjacent leisure interests treat AI readers in opposite ways.

The neighbors around Chess help explain why it lands where it does. Legal, RealEstate, and Pets share not just the 28.6% mark but the same mixed economy: a handful of structured reference databases alongside a majority of visibility-seeking sites.

That mixed economy produces a moderate block rate almost by construction: a couple of database owners gate, the rest stay open, and the average lands in the high twenties. Chess fits that pattern cleanly, which is itself the reassuring read — its rate is ordinary for a category of its shape, not a sign of unusual pressure on the hobby.

The Operator-Level Picture Across the Corpus

Block rates per category tell you who is gating; the operator leaderboard tells you whom they are gating. The figures below are corpus-wide across all 743 sites with a published policy, named by operator rather than rank.

Operator	Sites disallowing
Common Crawl	169
Anthropic	160
OpenAI	150
Meta	143
ByteDance	142

Common Crawl leads at 169 sites, with Anthropic and OpenAI close behind. The pattern is consistent: the operators best known for large-scale training corpora draw the most disallow rules. A Chess site weighing its own policy is implicitly deciding where it stands against this same shortlist.
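An operator tally of this kind counts each site once per operator, no matter how many rules or crawler tokens that site lists. The sketch below shows the counting step only. The token-to-operator mapping is an assumption based on commonly published crawler names (the report does not list the tokens it matched), and the input is hypothetical, not snapshot data.

```python
from collections import Counter

# Assumed mapping from robots.txt user-agent tokens to operators.
OPERATOR_BY_TOKEN = {
    "ccbot": "Common Crawl",
    "claudebot": "Anthropic",
    "gptbot": "OpenAI",
    "meta-externalagent": "Meta",
    "bytespider": "ByteDance",
}


def sites_per_operator(blocked_tokens_by_site: dict[str, set[str]]) -> Counter:
    """Count how many sites disallow each operator (one count per site per operator)."""
    tally = Counter()
    for tokens in blocked_tokens_by_site.values():
        # Collapse tokens to operators first so a site is never counted twice.
        operators = {OPERATOR_BY_TOKEN[t] for t in tokens if t in OPERATOR_BY_TOKEN}
        tally.update(operators)
    return tally


# Hypothetical input for illustration only.
example = {
    "site-a.example": {"gptbot", "ccbot"},
    "site-b.example": {"ccbot"},
    "site-c.example": set(),
}
tally = sites_per_operator(example)
print(tally.most_common())
```

Because one site can disallow several operators, the operator counts are not expected to sum to the number of blocking sites.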

Across all 743 sites with a published policy, 169 disallow Common Crawl, making it the single most-blocked operator in the corpus.

For the two Chess blockers, the practical effect is that they are joining a much larger movement, not inventing a new one. The corpus-wide rate of 31.1% sets the baseline; Chess sits just under it.

Corpus-wide, 231 of 743 sites block at least one AI crawler.

How the Snapshot Was Sealed

The method here is deliberately narrow. We fetched the public robots.txt file from each Chess domain, parsed it for disallow rules naming known AI crawlers, and recorded the result as a verbatim count. A site lands in the blocker column only if its published file names at least one AI operator in a disallow rule.
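The classification rule can be sketched as a small parser. This is a minimal illustration under stated assumptions, not the pipeline behind the sealed snapshot: the `AI_AGENTS` token list and the function name `named_ai_blocks` are hypothetical, since the report does not publish the crawler list it matched against.

```python
# Assumed list of AI crawler tokens (lowercase); the report's own list is not published.
AI_AGENTS = {"gptbot", "ccbot", "claudebot", "bytespider", "meta-externalagent"}


def named_ai_blocks(robots_txt: str, agents: set[str] = AI_AGENTS) -> set[str]:
    """Return the AI crawler tokens that a robots.txt names in a group with a Disallow path."""
    blocked: set[str] = set()
    group: list[str] = []   # user-agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group, in_rules = [], False
            group.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # an empty Disallow blocks nothing
                blocked.update(a for a in group if a in agents)
    return blocked


sample = """
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""
print(sorted(named_ai_blocks(sample)))  # prints ['ccbot', 'gptbot']
```

In this sketch a site that only disallows paths for `User-agent: *` is not counted as a blocker, which matches the requirement that the file name an AI operator.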

Everything in this report is read directly from those files; nothing is estimated, modeled, or extrapolated. We do not infer intent from traffic, infer policy from rankings, or fill gaps with assumptions. When a site returned no parseable robots.txt — as chess24.com, ichess.net, and chessgames.com did — we record exactly that and exclude it from the rate, rather than guessing what it would have said.

The whole file set was content-hashed and sealed under sha 92ed5cd2858657d9 on 14 June 2026. That seal is what makes the figures reproducible: anyone re-fetching the same files can check them against the recorded counts. Because robots.txt is edited freely, a sealed point-in-time read is the only honest way to state a block rate — it describes one moment, not a trend.
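The report does not describe the exact hashing scheme, so the following is only one plausible way to content-hash a file set so that the result does not depend on fetch order. The function name `seal` and the length-prefixed layout are assumptions; this sketch is not expected to reproduce the published value.

```python
import hashlib


def seal(files: dict[str, bytes]) -> str:
    """Hash a {domain: robots.txt bytes} mapping in a stable, order-independent way."""
    digest = hashlib.sha256()
    for domain in sorted(files):  # sort so fetch order cannot change the result
        body = files[domain]
        # Length-prefix each part so adjacent fields cannot blur into one another.
        for part in (domain.encode("utf-8"), body):
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()


first = seal({"a.example": b"User-agent: *\nDisallow:\n", "b.example": b""})
second = seal({"b.example": b"", "a.example": b"User-agent: *\nDisallow:\n"})
print(first == second)  # prints True
```

Any single-byte change to any file yields a different digest, which is what lets a reader confirm that a set of counts belongs to a specific file set.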

The wider snapshot covers 883 sites across 88 categories, of which 743 returned a parseable policy. Chess is one small, named slice of that whole, and its 2-of-7 result is a literal count, not a sample estimate.

Key Takeaways

The headline is straightforward: Chess is a permissive vertical with two clear holdouts. The category's 28.6% rate sits below the 31.1% corpus average, and the split between open platforms and gated databases explains the gap better than any single number.

2 of 7 Chess sites with a policy block at least one AI crawler.
chesstempo.com and thechessworld.com are the only named blockers.
chess.com, lichess.org, chessbase.com, chessable.com, and newinchess.com leave AI crawlers a clear path.
chess24.com, ichess.net, and chessgames.com returned no parseable robots.txt.
Corpus-wide, 231 of 743 sites — 31.1% — block at least one AI crawler.

A point-in-time read is only the start. The value is watching whether these positions hold, because a robots.txt is edited the day a site changes its mind.

Frequently Asked Questions

Q: Why is Chess below the corpus block rate?

A: Of 7 Chess sites with a policy, only 2 disallow a named AI crawler — 28.6%, below the 31.1% corpus rate. Most chess platforms depend on search discoverability, so an open policy serves their growth; only the database- and instruction-heavy sites like chesstempo.com gate.

Q: Which Chess sites block AI crawlers, and which do not?

A: chesstempo.com and thechessworld.com block at least one named crawler. chess.com, lichess.org, chessbase.com, chessable.com, and newinchess.com leave them a clear path. Three sites — chess24.com, ichess.net, and chessgames.com — published no parseable robots.txt at all.

Q: Does a robots.txt disallow actually stop a crawler?

A: No. robots.txt is an honor-system convention, not an access control. Compliant operators respect it; nothing in the file enforces it. A 28.6% block rate measures stated intent across Chess sites, not what any bot is technically prevented from reading.

Q: Why count Chess against 7 sites instead of 10?

A: Three of the 10 Chess sites we checked — chess24.com, ichess.net, and chessgames.com — returned no parseable robots.txt, so there was no policy to classify. The 28.6% block rate is measured only against the 7 sites that published a readable file.

Put AI-Access Data to Work

This data turns into a recurring job the moment you stop reading it as a one-time count and start watching it for drift.

A chess-platform growth lead at a site like lichess.org or chessable.com should track whether peer platforms add or drop AI-crawler disallow rules. Set a weekly re-crawl of the seven Chess domains that publish a policy and alert the moment a current allower writes a new disallow line. That shift signals a competitor reclassifying its content as a guarded asset, a positioning cue worth acting on before a launch.

A competitive-intelligence analyst covering the games space should monitor whether chesstempo.com and thechessworld.com extend their blocks to new operators, and whether the Chess category's 28.6% rate moves. The trigger is any change to the named-operator list; the cadence is weekly, tied to this same sealed baseline.
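Both monitoring jobs reduce to comparing a baseline against a fresh read. A minimal sketch of that comparison follows, assuming each snapshot is stored as a mapping from domain to the set of operators it disallows. The function name, the data shape, and the example domains are hypothetical.

```python
def policy_drift(
    baseline: dict[str, set[str]], current: dict[str, set[str]]
) -> dict[str, dict[str, set[str]]]:
    """Return, per changed domain, the operators added to or dropped from its disallow set."""
    changes: dict[str, dict[str, set[str]]] = {}
    for domain in sorted(set(baseline) | set(current)):
        before = baseline.get(domain, set())
        after = current.get(domain, set())
        if before != after:
            changes[domain] = {"added": after - before, "dropped": before - after}
    return changes


# Hypothetical snapshots for illustration only.
baseline = {"allower.example": set(), "blocker.example": {"OpenAI"}}
current = {"allower.example": {"Common Crawl"}, "blocker.example": {"OpenAI", "Anthropic"}}

for domain, change in policy_drift(baseline, current).items():
    print(domain, "added:", sorted(change["added"]), "dropped:", sorted(change["dropped"]))
```

An empty result means no alert. A domain moving from an empty set to a non-empty one is the allower-turned-blocker case described above.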

A retrieval-product engineer building an AI answer feature should re-check these policies before each ingest cycle, so the crawl respects the current disallow set rather than a stale copy. US Tech Automations runs exactly this kind of scheduled robots.txt and llms.txt monitoring, flagging policy drift and surfacing it on an AI-access dashboard.
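For the pre-ingest check, Python's standard library already ships a robots.txt parser. The sketch below uses `urllib.robotparser` on an in-memory file so it runs without network access. The crawler name `ExampleAIBot`, the helper `may_fetch`, and the sample rules are hypothetical; a real job would fetch the live file on each cycle rather than reuse a stored copy.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask whether a robots.txt body permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical policy: one named crawler is shut out, everyone else is allowed.
SAMPLE = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

url = "https://site.example/openings/sicilian"
print(may_fetch(SAMPLE, "ExampleAIBot", url))  # prints False
print(may_fetch(SAMPLE, "OtherBot", url))      # prints True
```

This answers a different question from the block-rate count: `can_fetch` falls back to the `*` group when a crawler is not named, so it reports what a compliant crawler may read, not whether the site singled that crawler out.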

See how the drone vertical stays fully open for contrast, or how fishkeeping sites gate even more lightly than chess. To automate this monitoring for your own watchlist, explore agentic workflows.

Zoom out: Chess is just one vertical in a much larger picture — our cross-industry study measures how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 92ed5cd2858657d9).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Chess Sites Block AI Crawlers? 2 of 7 Do.” https://ustechautomations.com/resources/blog/do-chess-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 92ed5cd2858657d9