llms.txt Adoption: Who Publishes an AI Content Map?

Jun 13, 2026

While nearly half the web's prominent sites are busy blocking AI crawlers, a smaller group is doing the opposite — publishing a file that actively invites and guides them. The sealed data puts that group at 20 of 107 prominent sites, or 18.7%, serving an llms.txt file as of June 13, 2026. AI-access policy is not a one-way street toward closure; a real, if minority, countertrend is building.

An llms.txt is a proposed standard: a plain-text or Markdown file at a site's root that gives AI systems a curated map of the site's most important, AI-friendly content — the inverse of robots.txt. Where robots.txt says "don't go here," llms.txt says "here is what matters, and here is how to read it." This report measures which prominent sites have adopted it.
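
To make the format concrete, here is a minimal Python sketch that parses a small llms.txt. The sample file is hypothetical (example.com and its sections are invented), and the layout it assumes, an H1 title, a blockquote summary, then H2 sections of Markdown links, follows the public llms.txt proposal rather than anything measured in this report. `parse_llms_txt` is an illustrative helper, not a reference parser.

```python
import re

# Hypothetical sample; the layout follows the public llms.txt proposal.
SAMPLE = """\
# Example Store

> Online retailer. The pages below are the ones we most want AI systems to read.

## Docs

- [Returns policy](https://example.com/returns.md): how refunds work
- [Shipping](https://example.com/shipping.md)

## Optional

- [Press kit](https://example.com/press.md): logos and boilerplate
"""

# One list item: "- [title](url)" with an optional ": note" suffix.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")


def parse_llms_txt(text):
    """Return the title, summary, and links per section of an llms.txt."""
    doc = {"title": None, "summary": None, "sections": {}}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
            doc["sections"][section] = []
        elif section is not None:
            match = LINK.match(line)
            if match:
                doc["sections"][section].append(match.groupdict())
    return doc


parsed = parse_llms_txt(SAMPLE)
print(parsed["title"], list(parsed["sections"]))
```

The point of the structure is curation: a handful of named sections with annotated links, rather than an exhaustive URL list.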

The figure comes from probing each site for a non-empty /llms.txt returning HTTP 200. The starting point is a curated set of 122 prominent websites in 10 categories; 107 of them returned a parseable robots.txt, and those same 107 sites were probed for llms.txt, so 107 is the denominator for every percentage in this report. All counts are verbatim from fetches sealed point-in-time on June 13, 2026. As with the blocking data in this series, the value is not just the snapshot but the fact that it is dated and fixed: llms.txt files are edited and removed without notice, so a timestamped record of who published one is the only way to measure adoption as it actually moved.
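
The probe criterion described above (HTTP 200 and a non-empty body at /llms.txt) can be sketched in a few lines of Python. This is an assumption about how such a check might look, not the code behind this report: the function names, the user-agent string, and the timeout are all hypothetical. One caveat the criterion leaves open is a site that answers every path with a 200 HTML page; this sketch would count that as serving a file.

```python
import urllib.error
import urllib.request


def http_fetch(url, timeout=10):
    """Fetch a URL and return (status, body); status is 0 on network failure."""
    # Hypothetical user-agent string, chosen only for this sketch.
    request = urllib.request.Request(url, headers={"User-Agent": "llms-txt-probe/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""
    except (urllib.error.URLError, TimeoutError, OSError):
        return 0, b""


def serves_llms_txt(status, body):
    """Apply the stated criterion: HTTP 200 and a non-empty body."""
    return status == 200 and len(body.strip()) > 0


def probe(domain, fetch=http_fetch):
    """Return True if https://<domain>/llms.txt meets the criterion."""
    status, body = fetch(f"https://{domain}/llms.txt")
    return serves_llms_txt(status, body)
```

Passing `fetch` as a parameter keeps the classification testable without network access, for example `probe("example.com", fetch=lambda url: (200, b"# Example"))`.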

Adoption Today: A Real but Early Signal

Metric	Value
Prominent sites in curated set	122
Sites with a parseable robots.txt (denominator for the shares below)	107
Sites serving an llms.txt	20
Share serving an llms.txt	18.7%
Sites blocking ≥1 major AI crawler	48
Share blocking ≥1 major AI crawler	44.9%

The two rates are best read together. Blocking (44.9%) is about 2.4 times as common as guiding (18.7%), so the default reflex of a prominent site in 2026 is still defensive. But 18.7% is not noise: for a standard that did not meaningfully exist two years ago, roughly one in five prominent sites adopting it is fast uptake.
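
The arithmetic behind these shares is worth spelling out, because the percentages use the 107 measured sites as the denominator, not the full curated set of 122. A short Python check, using only counts stated in this report (the variable and function names are illustrative):

```python
SITES_IN_SET = 122         # curated set
SITES_MEASURED = 107       # returned a parseable robots.txt; the denominator
SERVING_LLMS_TXT = 20
BLOCKING_AI_CRAWLER = 48


def share(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


guiding = share(SERVING_LLMS_TXT, SITES_MEASURED)
blocking = share(BLOCKING_AI_CRAWLER, SITES_MEASURED)
ratio = round(BLOCKING_AI_CRAWLER / SERVING_LLMS_TXT, 1)

print(guiding, blocking, ratio)  # 18.7 44.9 2.4
```

Dividing by 122 instead would give 16.4% for llms.txt adoption, which is why the denominator matters when comparing this snapshot with later ones.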

"As of June 13, 2026, 20 of 107 prominent sites — 18.7% — serve an llms.txt to guide AI systems, even as 44.9% block at least one AI crawler. Guiding and blocking are rising at the same time."

Who Is Publishing One

The 20 adopters in this snapshot skew heavily toward two broad profiles: developer-and-platform companies that treat machine readability as native, and consumer-facing brands in commerce, finance, streaming, and education that want to shape how AI describes them.

Profile	Example adopters (from the sealed set)
Developer / platform	GitHub, Shopify, WordPress, Reddit, Pinterest
Commerce	Walmart, Target, Expedia
Finance / fintech	PayPal, Schwab, Coinbase
Media / streaming	Netflix, Spotify, Twitch
Education	Coursera, edX, Khan Academy, Duolingo
Tech media	Engadget, Slashdot

The pattern is the mirror image of the blocking data. The industries least likely to block — retail, finance, education, platforms — are the ones most likely to guide. Walmart, Target, Shopify, PayPal, Schwab, and Coinbase all serve an llms.txt because their interest is in being represented accurately and surfaced often by AI assistants. The education cluster is especially striking: Coursera, edX, Khan Academy, and Duolingo all publish one, consistent with a mission to be maximally usable by any system, human or machine.

Notably absent from the adopter list are the heavy blockers: the news publishers and premium-content sites behind the 82.4% news block rate. A site whose strategy is to keep AI out has little reason to publish a welcome map. The two behaviors are strategically coherent opposites.

The Complete Adopter Roster

For the record, here are all 20 sites in the curated set that served an llms.txt on the snapshot date, grouped by category:

Category	llms.txt adopters (sealed set)
Tech / platform	github.com, engadget.com, slashdot.org, shopify.com, wordpress.com
Retail	walmart.com, target.com
Social	reddit.com, pinterest.com
Finance	schwab.com, paypal.com, coinbase.com
Travel	expedia.com
Education	coursera.org, edx.org, khanacademy.org, duolingo.com
Entertainment	twitch.tv, netflix.com, spotify.com

What is most telling is the absence of an entire category: not a single News site in the curated set publishes an llms.txt. The segment that blocks AI crawlers hardest (82.4%) is also the segment that guides them least. News organizations have decided, almost unanimously, that their relationship with AI is adversarial rather than collaborative — and the empty news row in this table is the cleanest expression of that stance in the whole dataset.

The adopters, by contrast, cluster in categories where being legible to AI is a growth lever: platforms and developer tools where machine-readability is table stakes, retailers and fintechs that want to be the surfaced answer, and education providers whose mission is maximal usability. That github.com publishes one is unsurprising. That walmart.com, target.com, schwab.com, paypal.com, and coinbase.com all do too signals that mainstream consumer brands now treat AI representation as something to actively shape.

Blocking and Guiding Are Not Mutually Exclusive

It would be a mistake to read this as two separate camps. Some sites do both — block the training crawlers they distrust and publish an llms.txt to shape what the assistants they tolerate actually read. Reddit is the clearest example in the set: it appears among the llms.txt adopters while also maintaining an assertive crawler policy. That combination — "stay out of my training set, but here's how to represent me if you're answering a user" — may be the most sophisticated posture in the data, and likely the direction more large platforms move next.

The strategic frontier in 2026 is not the binary of open versus closed. It is the granular posture: which crawlers in, which out, and what guidance to publish for the ones you let in. Sites that treat robots.txt and llms.txt as a coordinated pair, rather than as boilerplate, are the ones acting deliberately.
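
Treating the two files as a coordinated pair can be sketched as a small classifier: check which AI crawlers robots.txt turns away, then combine that with whether an llms.txt is served. The crawler list below is an assumed sample, since this report does not enumerate which crawlers it counts as major, and "blocked" here means barred from the site root, which is a simplification. The function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: an illustrative sample of AI crawler user-agent tokens.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot")


def blocked_crawlers(robots_txt, crawlers=AI_CRAWLERS):
    """Return the crawlers that robots.txt bars from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, "/")]


def posture(robots_txt, has_llms_txt):
    """Classify a site by whether it blocks, guides, both, or neither."""
    blocks = bool(blocked_crawlers(robots_txt))
    if blocks and has_llms_txt:
        return "block and guide"
    if blocks:
        return "block only"
    if has_llms_txt:
        return "guide only"
    return "neither"


# Hypothetical robots.txt: one training crawler out, everyone else in.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(ROBOTS))             # ['GPTBot']
print(posture(ROBOTS, has_llms_txt=True))   # block and guide
```

A site landing in "block and guide" matches the combined posture described above; "neither" corresponds to a site that has not yet taken a position.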

How to Read an 18.7% Adoption Rate

A single adoption number is easy to misread, so it is worth being precise about what this one does and does not say. It is a point-in-time measurement of a curated set of prominent sites, not a projection and not a census of the web. Adoption among small sites, personal blogs, and documentation portals — many of which were early to the standard — is not captured here, and is almost certainly different. The figure describes the behavior of large, recognizable brands, which is the population most operators actually benchmark themselves against.

The standard itself is also still young and voluntary. There is no enforcement, no required schema, and no guarantee that any given AI system reads the file. That makes llms.txt adoption a signal of intent more than a guarantee of effect — a site that publishes one is declaring it wants to be machine-legible, even if the downstream consumption is still maturing. Read that way, 18.7% is best understood as the share of prominent brands that have made a deliberate choice to participate, versus the larger group that has simply not engaged with the question yet.

The most useful comparison is the one this series keeps returning to: guiding at 18.7% against blocking at 44.9%. Both behaviors are rising from a base of near-zero, and both are deliberate. A site doing neither is not neutral so much as undecided — and undecided is, for now, still the most common posture of all.

Put This Data to Work

For most operators, the practical question is not "should I block AI" but "is my site even legible to the AI systems I want to reach, and is my posture intentional?" Adoption at 18.7% means the llms.txt lever is still early enough to be a differentiator — and that most teams have not yet decided whether to pull it.

US Tech Automations helps operations and marketing teams act on exactly this. An automation specialist can audit whether your site serves an llms.txt and whether your robots.txt accidentally blocks the assistants you want surfacing you, then keep that posture monitored on a schedule. For a growth or content team pursuing AI-answer visibility, US Tech Automations can stand up a workflow that tracks llms.txt adoption across your competitive set and flags when a rival publishes one. The same sealed-fetch-and-diff machinery behind this research is what US Tech Automations reuses to keep a brand's machine-readable footprint deliberate rather than accidental — and to make sure the AI surfaces that increasingly decide discovery can actually read you.

Frequently Asked Questions

What is an llms.txt file?
It is a proposed standard file at a site's root that gives AI systems a curated, AI-friendly map of the site's key content — effectively the welcoming counterpart to robots.txt's restrictions.

Is 18.7% high or low?
For a standard this young, roughly one in five prominent sites is meaningful early traction. Compared with the 44.9% that block at least one AI crawler, guiding is still less than half as common as blocking.

Do AI systems actually use llms.txt?
Support is still emerging and varies by system; the standard is voluntary and not universally consumed yet. Publishing one signals intent and provides structure, but it is not a guarantee of how any given assistant behaves.

Can a site block crawlers and still publish an llms.txt?
Yes, and some do. The two files serve different purposes — one restricts access, the other guides whatever access is allowed — so a sophisticated operator can use both together.

Which industries adopt llms.txt most?
In this set, developer platforms, retail, finance, and education, which are the same industries that block AI crawlers least. News publishers, the heaviest blockers, are absent from the adopter list entirely.

How is this different from a sitemap?
A sitemap lists URLs for search-engine crawlers to index; an llms.txt is aimed at AI systems and curates which content matters most and how it should be read, often in Markdown. They serve different audiences and can coexist — a site can have both.

Should my site publish one?
That depends on whether AI-answer visibility is a goal. If you want assistants to represent you accurately and surface you in responses, an llms.txt is a low-cost way to signal and structure that. If your strategy is to keep content out of AI entirely, your effort belongs in robots.txt instead. The 18.7% adoption rate means it is still early enough to be a differentiator either way.

Why is there no News site on the adopter list?
Because the news segment has chosen the opposite posture. It blocks AI crawlers more than any other category (82.4%), and publishing a welcome map for AI runs counter to that defensive strategy. The empty news row is consistent with an industry that views AI as a substitute for its product rather than a channel to it.

Will adoption keep rising?
The data here is a single point in time and cannot forecast on its own. What it shows is a standard moving from near-zero to roughly one in five prominent sites, concentrated in categories where machine-legibility is a growth lever — a profile that historically precedes wider adoption. Tracking the rate over successive snapshots is the only way to confirm the trajectory.
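
Tracking across successive snapshots can be sketched as two steps: seal each snapshot with a hash so it cannot change silently, then diff the adopter sets. This is a minimal Python illustration under stated assumptions. The snapshot layout, the function names, and the sample domains are hypothetical, and this report does not document how its own snapshots are sealed.

```python
import hashlib
import json


def seal(snapshot):
    """Hash a canonical JSON encoding so any later edit changes the digest."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_adopters(previous, current):
    """Compare two snapshots whose "sites" map domain -> serves llms.txt."""
    before = {d for d, served in previous["sites"].items() if served}
    after = {d for d, served in current["sites"].items() if served}
    return {"added": sorted(after - before), "removed": sorted(before - after)}


def adoption_rate(snapshot):
    """Share of sites in the snapshot serving an llms.txt, in percent."""
    sites = snapshot["sites"]
    return round(100 * sum(sites.values()) / len(sites), 1)


# Hypothetical data: neither snapshot reflects real measurements.
earlier = {"date": "2026-05-13",
           "sites": {"a.example": True, "b.example": False, "c.example": False}}
later = {"date": "2026-06-13",
         "sites": {"a.example": True, "b.example": True, "c.example": False}}

print(diff_adopters(earlier, later))                 # {'added': ['b.example'], 'removed': []}
print(adoption_rate(earlier), adoption_rate(later))  # 33.3 66.7
```

Sorting keys before hashing makes the digest independent of dictionary order, so two fetch runs that record the same facts produce the same seal.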

Key Takeaways

20 of 107 prominent sites (18.7%) serve an llms.txt to guide AI systems as of June 13, 2026 — a real countertrend to the 44.9% that block AI crawlers.
Adopters skew toward developer platforms (GitHub, Shopify, WordPress), commerce (Walmart, Target), finance (PayPal, Schwab, Coinbase), streaming (Netflix, Spotify), and education (Coursera, edX, Khan Academy, Duolingo).
The industries most likely to guide are the ones least likely to block — the mirror image of the blocking data.
Blocking and guiding are not exclusive; sites like Reddit do both, pointing to a more granular future posture.
All figures are verbatim from fetches sealed point-in-time on June 13, 2026, over a curated set of 122 prominent sites, of which the 107 with a parseable robots.txt form the denominator for the percentages.

Cite this report

US Tech Automations Research, 2026-06 edition. “llms.txt Adoption: Who Publishes an AI Content Map?” https://ustechautomations.com/resources/blog/llms-txt-adoption-which-major-sites-publish-ai-content-map-2026

Sealed snapshot sha256: 741353c4304216ee
