Do Equestrian Sites Block AI Crawlers? 2 of 9 Do

Jun 14, 2026

Two equestrian websites tell AI crawlers to stay out — and both of them are magazines, not stores. In our sealed June 2026 snapshot, 9 of the 10 Equestrian sites we checked returned a parseable robots.txt, and exactly 2 of those gate at least one AI crawler. That works out to a 22.2% block rate, comfortably below the corpus average, with the restriction concentrated entirely on the editorial side of the sport.

This report reads only the public robots.txt files of equestrian publishers, governing bodies, and tack-and-feed retailers. A robots.txt file is the small text file a site posts at its root to tell automated crawlers which paths they may fetch. We sealed the snapshot, hashed it, and counted what is published — no estimates.

2 of 9 Equestrian sites block at least one AI crawler.

The split tells the story. The editorial outlets thehorse.com and horsenation.com are the two blockers; the retailers and the federation leave the door open. The corpus-wide rate is higher: Equestrian sites post a 22.2% AI-crawler block rate, while 28% of all sites (295 of 1053) gate at least one crawler. The pattern tracks the business model: the sites that gate are the ones that sell original writing.

Which Sites Are Gating — and Which Are Not

Of the 9 Equestrian sites with a published policy, 2 block an AI crawler and the rest allow all of them. The one remaining domain, practicalhorsemanusa.com, served no parseable robots.txt — an absence we report as-is, neither allow nor block.

The blockers are both content publishers: thehorse.com and horsenation.com, sites whose product is articles. The allowers are a mix of the sport's commercial and institutional core: the tack-and-feed retailers smartpakequine.com, doversaddlery.com, and statelinetack.com; the federation usef.org; and the magazines that chose openness — equusmagazine.com, equisearch.com, and horseandhound.co.uk.

| Equestrian Site | robots.txt | Blocks Any AI Crawler |
| --- | --- | --- |
| thehorse.com | Published | Yes |
| horsenation.com | Published | Yes |
| horseandhound.co.uk | Published | No |
| smartpakequine.com | Published | No |
| doversaddlery.com | Published | No |
| statelinetack.com | Published | No |
| usef.org | Published | No |
| equusmagazine.com | Published | No |
| equisearch.com | Published | No |
| practicalhorsemanusa.com | No parseable robots.txt | N/A |
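
The 2-of-9 figure can be reproduced from the table above with a short tally. This is a minimal sketch, not the code behind the report: the tuple layout and the name `block_rate` are assumptions made for illustration. It shows that the denominator is the number of sites with a parseable policy, so the one site without a robots.txt is held out rather than counted as open.

```python
# Rows copied from the table above:
# (domain, returned a parseable robots.txt, blocks any AI crawler).
SITES = [
    ("thehorse.com", True, True),
    ("horsenation.com", True, True),
    ("horseandhound.co.uk", True, False),
    ("smartpakequine.com", True, False),
    ("doversaddlery.com", True, False),
    ("statelinetack.com", True, False),
    ("usef.org", True, False),
    ("equusmagazine.com", True, False),
    ("equisearch.com", True, False),
    ("practicalhorsemanusa.com", False, None),  # no policy: neither allow nor block
]


def block_rate(sites):
    """Return (blockers, policied sites, block rate in percent, one decimal)."""
    policied = [row for row in sites if row[1]]
    blockers = [row for row in policied if row[2]]
    rate = round(100 * len(blockers) / len(policied), 1)
    return len(blockers), len(policied), rate


print(block_rate(SITES))  # (2, 9, 22.2)
```

Dividing by all 10 sampled sites instead would give 20%, which is why the held-out site matters to the headline number.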

The only two Equestrian blockers — thehorse.com and horsenation.com — are editorial sites.

What This 22.2% Block Rate Actually Means

A 22.2% rate places Equestrian below the corpus line, and the reason is structural. The sites with the strongest incentive to gate are the ones selling ad impressions and subscriptions against original writing; the retailers and the federation want to be found, not hidden.

For a tack-and-feed retailer like doversaddlery.com or statelinetack.com, an AI shopping agent that can read the catalog is a new shelf, not a threat. For an editorial site like thehorse.com, the same crawler reads valuable articles and may answer the reader's question without ever sending a click. That divergence is exactly why the two blockers are both publishers.

The federation usef.org adds a third logic to the mix. Governing bodies publish rules, results, and membership information that they generally want as widely available as possible — being summarized by an answer engine extends their reach rather than cannibalizing it. So the category's open majority is not one homogeneous group but two distinct ones: commercial sites chasing discovery and institutional sites chasing reach. Only the ad- and subscription-supported publishers have a clear reason to close, and even there, equusmagazine.com and horseandhound.co.uk chose to stay open while thehorse.com and horsenation.com did not.

Corpus-wide, 295 of 1053 sites block at least one AI crawler.

The same editorial-versus-commercial divide shows up in other hobby verticals. A vertical at the permissive floor makes a useful contrast: our companion report on whether bowling sites block AI crawlers covers a category at 0%, where even the trade press leaves crawlers unblocked.

How Equestrian Compares to Nearby Categories

Equestrian sits in a band of categories with light-to-moderate gating. The focused window below places it among its nearest neighbors in the block-rate ranking, so you can see where a 22.2% rate falls relative to similar hobbies. Every value is the verbatim sealed count.

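| Category | Sites | With robots.txt | Block Any AI Crawler | Block Rate |
| --- | --- | --- | --- | --- |
| Mycology | 10 | 10 | 2 | 20% |
| HR | 10 | 9 | 2 | 22.2% |
| Skiing | 10 | 9 | 2 | 22.2% |
| Archery | 10 | 9 | 2 | 22.2% |
| Rockhounding | 10 | 9 | 2 | 22.2% |
| Equestrian | 10 | 9 | 2 | 22.2% |
| Podcasts | 10 | 10 | 2 | 20% |
| Crafts | 10 | 8 | 2 | 25% |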

Equestrian shares its exact 22.2% rate with Skiing, Archery, Rockhounding, and HR — a tidy cluster of categories where two of nine policied sites gate. None of them approach the corpus's restrictive end. For a vertical with even fewer blockers, see our read on whether sailing sites block AI crawlers, which sits a notch lower.

The cluster is also instructive about how block rates form. Skiing, Archery, and Rockhounding are all enthusiast verticals with a similar mix of gear retailers, governing bodies, and a handful of content sites — and they land at the same 22.2% because, like Equestrian, only their editorial members tend to gate. Where a category climbs higher, it is usually because forums or trade publishers dominate the sample. Our analysis of how welding sites handle AI crawlers shows that effect plainly: at 37.5%, Welding runs above Equestrian precisely because a forum and a trade publication sit among its blockers.

Reading the distribution this way matters more than reading the single number. A 22.2% rate built from two editorial blockers is a very different market from a hypothetical 22.2% built from two retailers gating — the former leaves the commercial layer fully open, the latter would not. In Equestrian, the open layer is exactly the one a buyer cares about.

Which Bots Are Blocked Most Across the Corpus

Equestrian's two blockers are part of a broader corpus pattern in which a handful of AI crawlers absorb most of the disallow directives. The focused cut below shows the most-blocked bots across all 1053 sites — the tokens an equestrian publisher is statistically most likely to be targeting.

| Bot | Sites Disallowing (across all 1053 sites) | Share |
| --- | --- | --- |
| CCBot | 221 | 21% |
| ClaudeBot | 197 | 18.7% |
| GPTBot | 197 | 18.7% |
| Bytespider | 190 | 18% |
| Meta-ExternalAgent | 168 | 16% |
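
The Share column is each bot's disallow count divided by the 1053 sites in the corpus. The sketch below recomputes it from the counts in the table; the names `DISALLOW_COUNTS` and `share` are illustrative assumptions, and the 295-site figure is the corpus-wide blocker count quoted earlier in this report.

```python
CORPUS_SITES = 1053

# Disallow counts copied from the table above.
DISALLOW_COUNTS = {
    "CCBot": 221,
    "ClaudeBot": 197,
    "GPTBot": 197,
    "Bytespider": 190,
    "Meta-ExternalAgent": 168,
}


def share(count, total=CORPUS_SITES):
    """Percentage of corpus sites, rounded to one decimal place."""
    return round(100 * count / total, 1)


for bot, count in DISALLOW_COUNTS.items():
    print(f"{bot}: {share(count)}%")

# The corpus-wide block rate uses the same division: 295 of 1053 sites.
print(f"Any AI crawler: {share(295)}%")
```

The table shows whole-number shares without the trailing decimal (21% rather than 21.0%), but the underlying values agree.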

CCBot, the Common Crawl token, leads, with ClaudeBot and GPTBot tied just behind. When thehorse.com or horsenation.com gates a crawler, it is most likely one of these high-traffic tokens — the same ones doing the heavy lifting across the corpus.

The tight grouping near the top of this leaderboard is itself a finding. ClaudeBot and GPTBot are tied at 197 sites, and CCBot leads them by only 24, which suggests that publishers who decide to gate tend to gate the big three together rather than singling one out. The aggregate counts do not show per-site overlap, but they are consistent with a blanket stance toward the most prominent training and answer-engine crawlers, not with sites blocking GPTBot while welcoming ClaudeBot. For an equestrian publisher weighing whether to restrict, that is the practical reality: the decision is usually all-or-nothing across the leaders, not a surgical exclusion of one operator.

CCBot draws disallow directives from 221 sites across the corpus.

Put AI-Access Data to Work

The buyer with the clearest stake is the e-commerce growth or RevOps lead running an equestrian retail storefront like smartpakequine.com or doversaddlery.com. As AI agents answer "best winter turnout blanket" directly, catalog readability decides whether the brand appears in the answer. Their recurring job: re-crawl the equestrian set weekly and alert the moment a competitor such as statelinetack.com adds a crawler token to its disallow list — a rival going dark to AI is a discoverability opening to act on immediately.

The second buyer is the tack-and-feed retail operations manager who owns the site's technical configuration. Their workflow: monitor their own robots.txt so that an accidental disallow of CCBot or GPTBot never quietly cuts answer-engine visibility. For a retailer such a block would be a mistake rather than a strategy, because the category's only two blockers are editorial sites with different incentives. US Tech Automations runs that monitoring as scheduled robots.txt and llms.txt crawls with change alerts and an AI-access dashboard. See how agentic workflows automate this monitoring.

Reading the Sealed Numbers

Every figure here is a verbatim count from public robots.txt files captured in a single sealed snapshot on June 14, 2026; nothing is estimated, modeled, or extrapolated. We fetched each domain's robots.txt, parsed its user-agent and disallow directives, matched them against a fixed list of known AI crawler tokens, and counted. A site "blocks" a crawler only when its published file disallows that token from any path.
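
The parse-match-count step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline behind the report: the function name `blocked_tokens` is hypothetical, the token list is borrowed from the most-blocked table rather than the report's full fixed list, "any path" is read as "at least one path", and only groups that name a token explicitly are counted (wildcard `*` groups are ignored).

```python
# Assumption: a short stand-in for the report's fixed list of AI crawler tokens.
AI_TOKENS = ("CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent")


def blocked_tokens(robots_txt, tokens=AI_TOKENS):
    """Return the tokens that a named group disallows from at least one path."""
    wanted = {token.lower(): token for token in tokens}
    blocked = set()
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # an empty Disallow allows everything
                blocked.update(wanted[a] for a in agents if a in wanted)
    return blocked


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /cart/

User-agent: ClaudeBot
Disallow:
"""
print(sorted(blocked_tokens(sample)))  # ['CCBot', 'GPTBot']
```

In the sample, ClaudeBot is named but not blocked, because an empty Disallow value permits every path. Whether a wildcard group should count as blocking an AI crawler is a design choice; this sketch takes the narrower reading.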

robots.txt is a public, voluntary standard — a request the crawler chooses to honor. The snapshot was content-hashed (sha d0b7ef205c390023) so the exact bytes behind every count can be re-verified later.
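
One way to content-hash a snapshot so it can be re-verified is shown below. The report does not describe how its digest was computed, so everything here is an assumption: the name `snapshot_digest`, the sorted-by-domain ordering, and the length-prefixed framing. The sketch will not reproduce the published value; it only illustrates the idea that identical bytes give an identical digest.

```python
import hashlib


def snapshot_digest(files):
    """Hash a {domain: robots.txt bytes} mapping in a stable, order-independent way."""
    digest = hashlib.sha256()
    for domain in sorted(files):              # fixed order, independent of crawl order
        body = files[domain]
        digest.update(domain.encode("utf-8"))
        digest.update(b"\0")                   # separator between name and length
        digest.update(len(body).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(body)
    return digest.hexdigest()


# Hypothetical snapshot with placeholder domains.
snapshot = {
    "publisher.example": b"User-agent: GPTBot\nDisallow: /\n",
    "retailer.example": b"User-agent: *\nDisallow:\n",
}
print(snapshot_digest(snapshot))
```

Changing a single byte in any captured file changes the digest, which is what lets a later reader confirm that the counts were taken from exactly the sealed bytes.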

A word on scope keeps the figure honest. The 2-of-9 result describes the equestrian domains we sampled, not the entire equestrian web, and it counts only sites that returned a parseable policy — practicalhorsemanusa.com, which served none, is held out rather than assumed open. The strength of a sealed snapshot is that it draws a fixed, dated boundary around exactly what was read, so a later re-crawl can be compared against it line for line. That comparability is the whole point: the value is not just today's count but the ability to detect, with certainty, when it moves.
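
Comparing a later re-crawl against the sealed snapshot reduces to a per-domain set difference. The sketch below assumes each crawl has already been reduced to a mapping from domain to the set of blocked tokens; the function name `newly_blocked` and the placeholder domains are hypothetical, and the data is invented purely for illustration.

```python
def newly_blocked(sealed, recrawl):
    """Map each domain to tokens it blocks in the re-crawl but not in the sealed snapshot."""
    changes = {}
    for domain, now in recrawl.items():
        added = set(now) - set(sealed.get(domain, ()))
        if added:
            changes[domain] = sorted(added)
    return changes


# Hypothetical data: placeholder domains, not sites from this report.
sealed = {"retailer.example": set(), "magazine.example": {"GPTBot"}}
recrawl = {"retailer.example": {"CCBot"}, "magazine.example": {"GPTBot"}}

print(newly_blocked(sealed, recrawl))  # {'retailer.example': ['CCBot']}
```

A domain that is absent from the sealed snapshot is treated here as having blocked nothing before, so every token it now blocks is reported as new. A stricter comparison could flag such domains separately.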

Frequently Asked Questions

Q: Does adding a crawler to robots.txt actually keep it out?

A: Only by cooperation. robots.txt is an honor-system file: it states a preference that compliant crawlers respect, but it does not technically block anything on its own. It is a published request, not a wall — which is why we measure what sites declare, not what they enforce.
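
What cooperation looks like on the crawler's side can be sketched with Python's standard `urllib.robotparser` module. The helper name `may_fetch`, the policy text, and the placeholder URL are assumptions for illustration; a crawler that skips this check is not stopped by the file.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical policy that asks one crawler to stay out of the whole site.
policy = "User-agent: GPTBot\nDisallow: /\n"
page = "https://publisher.example/articles/1"

print(may_fetch(policy, "GPTBot", page))    # False: a compliant crawler stops here
print(may_fetch(policy, "OtherBot", page))  # True: no rule names this agent
```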

Q: Which equestrian sites are the two blockers?

A: Both are editorial outlets: thehorse.com and horsenation.com. Together they account for the category's entire 22.2% block rate. The retailers, the federation usef.org, and the other magazines we checked — including equusmagazine.com and horseandhound.co.uk — all leave AI crawlers unblocked.

Q: Why do equestrian retailers stay open while the magazines gate?

A: Incentives differ. A retailer like smartpakequine.com gains visibility when an AI shopping agent can read its catalog, so blocking would cost it customers. A magazine such as thehorse.com risks having its original articles summarized without a click, which is a clearer reason to restrict access — and exactly the split the data shows.

Q: What does practicalhorsemanusa.com count as if it has no robots.txt?

A: It counts as neither an allow nor a block. Of 10 Equestrian sites, 9 returned a parseable policy and that one did not. We report the absence honestly rather than assuming it permits or denies crawlers, which keeps the 2-of-9 figure exact.

Equestrian sites post a 22.2% AI-crawler block rate.

Key Takeaways

Equestrian posts a 22.2% block rate: 2 of 9 policied sites gate at least one AI crawler, both of them editorial publishers, while the retailers and the federation stay open. That places the vertical below the 28% corpus average and squarely among neighbors like Skiing and Archery. The actionable signal is the editorial-versus-commercial split — and watching whether the open retailers ever change posture.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha d0b7ef205c390023).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Equestrian Sites Block AI Crawlers? 2 of 9 Do.” https://ustechautomations.com/resources/blog/do-equestrian-sites-block-ai-crawlers-2026

Sealed snapshot sha256: d0b7ef205c390023

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.