Do Pet Sites Block AI Crawlers? Sealed robots.txt Data

Jun 14, 2026

The Pet category has a structural feature that shapes how you read its block rate: of the 10 Pet sites we checked, only 7 returned a parseable robots.txt file. Three sites (petfinder.com, rover.com, and petco.com) returned no parseable file. Among the 7 that did, the block rate is 28.6% (2 of 7), with chewy.com and thesprucepets.com as the blockers. The other five (petmd.com, akc.org, aspca.org, petsmart.com, and vetstreet.com) block none of the AI crawlers on our reference list.

The gap between the 10 sites checked and the 7 with a parseable file is the key caveat for this category in the June 14, 2026 sealed snapshot. A robots.txt file is a public policy signal placed at a site's root; without one, a site's AI-access posture is ambiguous. The snapshot tells us nothing about the preferences of petfinder.com, rover.com, or petco.com, because none of them issued a parseable file on the collection date. The analysis that follows is grounded only in what the sealed data contains.

How the Pet Category Breaks Down

The sealed snapshot shows a category that is structurally open, with the caveats noted above. Of the 7 Pet sites that returned a parseable robots.txt file, 5 allow all AI crawlers — a clear majority. The 2 blockers represent meaningfully different protective logics.

chewy.com is a large-scale e-commerce retailer whose competitive edge lies partly in its product catalog, customer data, and merchandising. Restricting AI crawlers is consistent with protecting proprietary commercial data from bulk scraping for training purposes. thesprucepets.com is an editorial content site — it sits within a large publishing network and restricts AI crawlers in ways common to publisher brands that monetize content through advertising and audience attention.

2 of 7 Pet sites with a parseable robots.txt block at least one AI crawler — a 28.6% block rate.

The five non-blocking sites represent a range of institution types. petmd.com and vetstreet.com are veterinary information resources where broad content distribution may serve their educational and referral missions. akc.org and aspca.org are nonprofit registries and advocacy organizations, and their open posture matches the Nonprofit category, which is one of only two categories in the corpus (with Streaming) at a 0% block rate. petsmart.com is a major retailer like chewy.com but, at least as of the snapshot date, does not share its competitor's AI-crawler restrictions.

petmd.com, akc.org, aspca.org, petsmart.com, and vetstreet.com block none of the AI crawlers on our reference list in this snapshot.

chewy.com and thesprucepets.com are the only Pet-category sites in this snapshot whose robots.txt restricts AI crawler access.

Why the Pets Block Rate Sits Below the Corpus Average

Corpus-wide, 123 of 293 sites (42%) block at least one AI crawler. The Pets block rate of 28.6% sits noticeably below that line, but the gap likely says less about the pet industry's attitude toward AI than about the composition of the sites we sampled. Nonprofit-adjacent organizations (aspca.org, akc.org) tend toward openness, retail-scale sites have mixed incentives, and editorial publishers vary by parent company and policy.

The categories that sit nearest to Pets in the ranking — Legal (28.6%) and RealEstate (28.6%) — also show this mix of institution types. All three categories contain a blend of informational resources and commercial platforms, without the dominance of content-publisher or data-platform interests that drive higher block rates in News (82.4%) or Gaming (88.9%).

The three sites with no parseable robots.txt file are worth noting separately. petfinder.com hosts adoptable pet listings — a service with significant user-contributed data that might ordinarily motivate a blocking posture. rover.com is a marketplace connecting pet owners and sitters, another data-rich context. petco.com is a direct retail competitor to chewy.com. The absence of a robots.txt for all three does not mean they allow AI crawlers; it means we cannot characterize their posture from this snapshot.

Where Pets Sits in the 32-Category Corpus

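The table below covers all 32 categories, ranked by block rate, from the sealed snapshot of 339 sites checked on June 14, 2026. Of those, 293 returned a parseable robots.txt file, and each block rate uses the parseable count as its denominator.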

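Category | Sites Checked | With robots.txt | Blocking Any AI | Block Rate
Gaming | 9 | 9 | 8 | 88.9%
News | 20 | 17 | 14 | 82.4%
Food | 10 | 10 | 7 | 70%
Tech | 15 | 13 | 9 | 69.2%
Entertainment | 9 | 9 | 6 | 66.7%
Healthcare | 10 | 9 | 6 | 66.7%
Music | 10 | 9 | 6 | 66.7%
Parenting | 10 | 8 | 5 | 62.5%
Reference | 14 | 11 | 6 | 54.5%
Science | 10 | 10 | 5 | 50%
Automotive | 10 | 9 | 4 | 44.4%
HomeGarden | 10 | 9 | 4 | 44.4%
Fashion | 9 | 7 | 3 | 42.9%
Social | 10 | 10 | 4 | 40%
Sports | 10 | 10 | 4 | 40%
Fitness | 10 | 10 | 4 | 40%
Photography | 10 | 10 | 4 | 40%
Jobs | 10 | 8 | 3 | 37.5%
Travel | 9 | 9 | 3 | 33.3%
Weather | 10 | 6 | 2 | 33.3%
Legal | 10 | 7 | 2 | 28.6%
RealEstate | 10 | 7 | 2 | 28.6%
Pets | 10 | 7 | 2 | 28.6%
Crafts | 10 | 8 | 2 | 25%
Finance | 12 | 11 | 2 | 18.2%
Retail | 15 | 12 | 2 | 16.7%
Education | 9 | 7 | 1 | 14.3%
Government | 9 | 8 | 1 | 12.5%
Crypto | 9 | 8 | 1 | 12.5%
Religion | 10 | 9 | 1 | 11.1%
Nonprofit | 10 | 6 | 0 | 0%
Streaming | 10 | 10 | 0 | 0%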
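As a consistency check, the table's columns can be summed to recover the corpus figures quoted elsewhere in this report (339 checked, 293 with a parseable file, 123 blocking). The sketch below transcribes the rows by hand; the `block_rate` helper is a hypothetical name, not part of any published dataset API.

```python
# Rows transcribed from the sealed-snapshot table:
# (category, sites_checked, with_robots_txt, blocking_any_ai)
ROWS = [
    ("Gaming", 9, 9, 8), ("News", 20, 17, 14), ("Food", 10, 10, 7),
    ("Tech", 15, 13, 9), ("Entertainment", 9, 9, 6), ("Healthcare", 10, 9, 6),
    ("Music", 10, 9, 6), ("Parenting", 10, 8, 5), ("Reference", 14, 11, 6),
    ("Science", 10, 10, 5), ("Automotive", 10, 9, 4), ("HomeGarden", 10, 9, 4),
    ("Fashion", 9, 7, 3), ("Social", 10, 10, 4), ("Sports", 10, 10, 4),
    ("Fitness", 10, 10, 4), ("Photography", 10, 10, 4), ("Jobs", 10, 8, 3),
    ("Travel", 9, 9, 3), ("Weather", 10, 6, 2), ("Legal", 10, 7, 2),
    ("RealEstate", 10, 7, 2), ("Pets", 10, 7, 2), ("Crafts", 10, 8, 2),
    ("Finance", 12, 11, 2), ("Retail", 15, 12, 2), ("Education", 9, 7, 1),
    ("Government", 9, 8, 1), ("Crypto", 9, 8, 1), ("Religion", 10, 9, 1),
    ("Nonprofit", 10, 6, 0), ("Streaming", 10, 10, 0),
]


def block_rate(blocking: int, with_file: int) -> float:
    """Block rate in percent; the denominator is sites with a parseable file."""
    return 100.0 * blocking / with_file


checked = sum(r[1] for r in ROWS)
with_file = sum(r[2] for r in ROWS)
blocking = sum(r[3] for r in ROWS)

print(checked, with_file, blocking)               # 339 293 123
print(round(block_rate(blocking, with_file), 1))  # 42.0

# Categories with exactly Pets' profile: 7 parseable files, 2 blocking.
ties = [r[0] for r in ROWS if (r[2], r[3]) == (7, 2)]
print(ties)  # ['Legal', 'RealEstate', 'Pets']
```

The last check makes the three-way tie concrete: Legal, RealEstate, and Pets share not just a rate but the same underlying counts.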

Pets shares its 28.6% rate with Legal and RealEstate. All three categories sit below the 40% mid-tier cluster and, after Crafts (25%), above the Finance / Retail / Education / Government / Crypto / Religion / Nonprofit / Streaming group at the bottom. The three also share an identical profile: 10 sites checked, 7 with a parseable robots.txt file, and 2 blocking. The 3-of-10 missing-file share is therefore not unique to Pets, but it limits how confidently the overall posture of any of these three categories can be characterized.

For a look at a category sitting further down the ranking, see Do Crypto Sites Block AI Crawlers?, where a different kind of structural openness explains an even lower block rate (12.5%).

Which AI Bots Are Blocked Most Across All 293 Sites

The tables below describe the broader 293-site corpus, not just Pets. They show which crawlers and operators face the most restrictions across all categories on the sealed snapshot date.

AI Bot | Sites Blocking (of 293) | Block Rate
CCBot | 97 | 33.1%
ClaudeBot | 87 | 29.7%
Bytespider | 75 | 25.6%
GPTBot | 74 | 25.3%
Meta-ExternalAgent | 70 | 23.9%
PerplexityBot | 68 | 23.2%
Applebot-Extended | 67 | 22.9%
Google-Extended | 66 | 22.5%
Amazonbot | 56 | 19.1%

CCBot (Common Crawl) leads with 97 blocking sites — 33.1% of all 293 with a parseable file. ClaudeBot (Anthropic) is blocked by 87 sites (29.7%). The two most-blocked bots both belong to operators frequently named in publisher blocking rules: Common Crawl as a general-purpose web archiving crawler, and Anthropic as an AI training operator.

Operator Blocked (all 293 sites) | Sites Blocking
Common Crawl | 97
Anthropic | 93
Meta | 80
OpenAI | 77
ByteDance | 75
Perplexity | 69
Apple | 67
Google | 66
Cohere | 63
Diffbot | 60
Amazon | 56
Mistral | 23
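Operator counts can exceed the count for any single bot (Anthropic 93 vs. ClaudeBot 87, OpenAI 77 vs. GPTBot 74, Meta 80 vs. Meta-ExternalAgent 70). That pattern is consistent with counting a site once per operator if it blocks any of that operator's agent strings, though the report does not state this rule. The sketch below assumes it; the agent-to-operator mapping is an illustrative subset, `count_blocks` is a hypothetical helper, and the three sites are toy data.

```python
from collections import Counter

# Illustrative subset of an agent-to-operator mapping (an assumption;
# the report's full reference list is not reproduced here).
OPERATOR_OF = {
    "ClaudeBot": "Anthropic",
    "anthropic-ai": "Anthropic",
    "Claude-Web": "Anthropic",
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "CCBot": "Common Crawl",
}


def count_blocks(blocked_by_site: dict[str, set[str]]) -> tuple[Counter, Counter]:
    """Count sites per bot and per operator; a site counts once per operator."""
    per_bot: Counter = Counter()
    per_operator: Counter = Counter()
    for agents in blocked_by_site.values():
        per_bot.update(agents)
        # A set comprehension deduplicates operators within one site.
        per_operator.update({OPERATOR_OF[a] for a in agents if a in OPERATOR_OF})
    return per_bot, per_operator


# Toy data: three hypothetical sites and the agents each one blocks.
sites = {
    "a.example": {"ClaudeBot", "GPTBot"},
    "b.example": {"anthropic-ai"},
    "c.example": {"ClaudeBot", "Claude-Web", "CCBot"},
}
per_bot, per_operator = count_blocks(sites)
print(per_bot["ClaudeBot"], per_operator["Anthropic"])  # 2 3
```

Here b.example blocks an Anthropic agent other than ClaudeBot, so the operator total (3) exceeds the single-bot total (2), the same shape as the gap in the tables above.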

Reading the Sealed Numbers

This report is a verbatim reading of public robots.txt files sealed June 14, 2026 under snapshot sha a5ca246fbdc79954. US Tech Automations Research fetched each domain's robots.txt, parsed its user-agent groups, and flagged any site whose file applied a Disallow rule to at least one known AI crawler agent string. All figures are verbatim counts from the sealed snapshot; nothing is estimated, modeled, or extrapolated.
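The flagging rule can be sketched in Python. This is a minimal illustration, not the production parser: it assumes a hypothetical five-agent subset of the reference list, treats consecutive User-agent lines as one group (as RFC 9309 does), ignores an empty Disallow (which permits everything), and counts only groups that name an AI agent explicitly, so a wildcard `User-agent: *` group is not flagged.

```python
# Hypothetical subset of the "known AI crawler" reference list (lowercase).
AI_AGENTS = {"gptbot", "claudebot", "ccbot", "google-extended", "bytespider"}


def disallowed_ai_agents(robots_txt: str) -> set[str]:
    """Return AI agents named in a User-agent group that has a non-empty Disallow."""
    flagged: set[str] = set()
    group_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # empty Disallow allows everything
                flagged.update(a for a in group_agents if a in AI_AGENTS)
    return flagged


sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /checkout
"""
print(sorted(disallowed_ai_agents(sample)))  # ['ccbot', 'gptbot']
```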

Sites without a parseable robots.txt are included in the sites count (10 for Pets) but excluded from the block-rate denominator (7 for Pets). This distinction matters for the Pet category more than for categories where all or nearly all sites returned a file.
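The denominator rule is easy to get wrong when a dataset mixes "blocks", "allows", and "no file". A minimal sketch, using the Pet outcomes reported above and a hypothetical `summarize` helper, keeps missing files out of the denominator by encoding them as `None`:

```python
from typing import Optional

# Pet-category outcomes as reported in this snapshot:
# True = blocks at least one AI crawler, False = blocks none,
# None = no parseable robots.txt (excluded from the denominator).
PETS: dict[str, Optional[bool]] = {
    "chewy.com": True, "thesprucepets.com": True,
    "petmd.com": False, "akc.org": False, "aspca.org": False,
    "petsmart.com": False, "vetstreet.com": False,
    "petfinder.com": None, "rover.com": None, "petco.com": None,
}


def summarize(outcomes: dict[str, Optional[bool]]) -> tuple[int, int, int, float]:
    """Return (sites checked, sites with a parseable file, blockers, block rate %)."""
    parsed = [v for v in outcomes.values() if v is not None]
    blockers = sum(parsed)  # each True counts as 1
    rate = 100.0 * blockers / len(parsed) if parsed else float("nan")
    return len(outcomes), len(parsed), blockers, rate


checked, with_file, blockers, rate = summarize(PETS)
print(checked, with_file, blockers, round(rate, 1))  # 10 7 2 28.6
```

Using `None` rather than `False` for a missing file is the design choice that matters: treating absence as "allows" would silently shift the rate.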

The collection and sealing process:

  1. Fetch. Each domain's /robots.txt endpoint is retrieved on the snapshot date.

  2. Parse. User-agent blocks are matched against a fixed reference list of known AI crawler agent strings.

  3. Seal. Raw collected files are content-hashed into an append-only log, producing snapshot sha a5ca246fbdc79954.

  4. Aggregate. Per-domain flags are grouped by category and block rates computed from the sealed denominators.
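The Seal step can be illustrated with Python's `hashlib`. The report does not publish its log format or how the snapshot identifier is derived, so everything here (the entry fields, sorting by domain, hashing canonical JSON, and the hypothetical `seal` helper) is an assumption chosen to show the idea: identical inputs always produce the same digest, and any change to any file changes it.

```python
import hashlib
import json


def seal(files: dict[str, bytes], log: list[dict]) -> str:
    """Append per-domain content hashes to a log and return a snapshot digest.

    Hypothetical scheme: each entry stores the domain and the SHA-256 of its
    raw robots.txt bytes; the snapshot digest is SHA-256 over canonical JSON
    of those entries, sorted by domain so fetch order does not matter.
    """
    entries = [
        {"domain": d, "sha256": hashlib.sha256(body).hexdigest()}
        for d, body in sorted(files.items())
    ]
    log.extend(entries)  # append-only: existing entries are never rewritten
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


log: list[dict] = []
digest = seal({"a.example": b"User-agent: GPTBot\nDisallow: /\n",
               "b.example": b"User-agent: *\nAllow: /\n"}, log)
# Printed as a 16-character prefix, the same length as the identifier in this report.
print(len(log), digest[:16])
```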

Frequently Asked Questions

Q: Why does the Pet block-rate denominator use 7 instead of 10?

A: Three sites — petfinder.com, rover.com, and petco.com — returned no parseable robots.txt file on the snapshot date. Because there is no policy to read, these sites cannot be classified as blockers or allowers. The block rate of 28.6% divides the 2 blocking sites by the 7 sites that returned a parseable file. Including the three missing sites in the denominator would produce a derived number not present in the sealed data.

Q: Is the absence of a robots.txt file the same as allowing all AI crawlers?

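A: No. A missing robots.txt means the site has issued no explicit robots.txt-based instruction. How a crawler treats that absence depends on the fetch outcome and on each operator's own policy: RFC 9309, the robots.txt standard, lets crawlers access any resource when the file is unavailable (a 4xx response) but requires them to assume complete disallow when it is unreachable (a 5xx or network error). The sealed data does not allow us to infer intent from absence; we report what the file says, not what we assume.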
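For readers who handle missing files in their own crawlers, the RFC 9309 cases can be written down as a small lookup. This is a simplified sketch: it ignores redirect handling and the RFC's allowance for eventually treating a long-unreachable file as unavailable, and `rfc9309_reading` is a hypothetical helper. The sealed data does not record which case applied to petfinder.com, rover.com, or petco.com, so this cannot be used to reclassify them.

```python
from typing import Optional


def rfc9309_reading(status: Optional[int]) -> str:
    """Simplified RFC 9309 reading of a /robots.txt fetch outcome.

    `status` is the final HTTP status code, or None if the request failed
    at the network level (for example, a DNS failure or a timeout).
    """
    if status is None or 500 <= status <= 599:
        # "Unreachable": the crawler MUST assume complete disallow.
        return "unreachable: assume complete disallow"
    if 400 <= status <= 499:
        # "Unavailable": the crawler MAY access any resource.
        return "unavailable: may access any resource"
    if 200 <= status <= 299:
        # Successful download: follow the parseable rules in the file.
        return "successful: follow the parseable rules"
    return "other: not covered by this sketch"


for outcome in (200, 404, 503, None):
    print(outcome, "->", rfc9309_reading(outcome))
```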

Q: Why would chewy.com and petsmart.com have different robots.txt configurations if they compete in the same space?

A: Robots.txt policy is typically set by engineering and legal teams, often influenced by corporate IP strategy, prior incidents, or parent-company standards. Two direct competitors can legitimately adopt different policies. This data captures posture at a single moment, and either company may change its configuration at any time.

Q: How does the Pets 28.6% block rate compare to closely related categories?

A: Pets shares its 28.6% rate with Legal and RealEstate in this snapshot. All three sit below the 40% mid-tier cluster (Social, Sports, Fitness, Photography) and above Finance (18.2%) and Retail (16.7%). The Parenting category, which might seem structurally similar to Pets in some ways, shows a markedly higher 62.5% block rate, which may reflect a different mix of editorial publishers versus service or nonprofit brands.

Q: What does the llms.txt deployment rate mean for this category?

A: Across all 293 sites in the corpus, 48 (16.4%) have deployed an llms.txt file, a newer proposed convention in which a site publishes a structured Markdown summary of its content for large language models to read. This report covers only robots.txt blocking; llms.txt data is tracked at the corpus level and will be the subject of a separate analysis.

Put AI-Access Data to Work

The Pet category's 28.6% block rate — anchored on June 14, 2026 — is a stable reference, but the actionable value is in monitoring when and how that baseline shifts. Three audiences have concrete recurring jobs here.

An SEO lead for a pet-content brand or veterinary information resource tracks whether the currently-open sites — petmd.com, akc.org, aspca.org, petsmart.com, vetstreet.com — add any AI-crawler restrictions in coming months. The trigger is a weekly re-crawl of all 10 Pet domains, with an automatic diff against the sealed baseline. An alert on the day a new Disallow rule appears lets content teams prepare for potential changes in AI-generated answer surfaces before they affect traffic.
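The diff step in that weekly job can be sketched with Python's standard `difflib`. Assumptions: the baseline and current files are already fetched as text, and `new_disallow_lines` is a hypothetical helper. It reports only Disallow lines that appear as additions, so a rule that merely moved may also show up, and it does not say which user agent the rule applies to; pairing it with a group-aware parser would close that gap.

```python
import difflib


def new_disallow_lines(baseline: str, current: str) -> list[str]:
    """Return Disallow lines that the unified diff marks as added in `current`."""
    diff = difflib.unified_diff(
        baseline.splitlines(), current.splitlines(), lineterm=""
    )
    added = []
    for line in diff:
        # '+++' is the file header; real additions start with a single '+'.
        if line.startswith("+") and not line.startswith("+++"):
            text = line[1:].strip()
            if text.lower().startswith("disallow:"):
                added.append(text)
    return added


baseline = "User-agent: *\nAllow: /\n"
current = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(new_disallow_lines(baseline, current))  # ['Disallow: /']
```

A non-empty result is the signal that would trigger the alert described above.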

A data-pipeline engineer building a pet health or product-recommendation retrieval system maintains a live status list of the Pet domain set, automatically updated against sealed snapshots. The recurring job: each time a new snapshot is sealed, compare the current robots.txt state of the 10 Pet domains against the prior baseline and flag any change. This keeps training-data pipelines from inadvertently processing newly blocked sites.
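A pipeline gate of this kind can lean on Python's standard `urllib.robotparser`. A minimal sketch, assuming the robots.txt text for the current snapshot is already in hand; `allowed_for_ingest` is a hypothetical helper. Note that `RobotFileParser` applies the first matching rule in file order and matches agent names by substring, which can differ from RFC 9309's longest-match rule, so treat its answers as an approximation for edge cases.

```python
from urllib.robotparser import RobotFileParser


def allowed_for_ingest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text already fetched for the current snapshot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(allowed_for_ingest(robots, "GPTBot", "https://example.com/dogs/food"))  # False
print(allowed_for_ingest(robots, "CCBot", "https://example.com/dogs/food"))   # True
```

Gating on the parsed file rather than on a cached block list means a newly sealed snapshot takes effect as soon as its text is loaded.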

A publisher RevOps lead at a pet-industry editorial brand uses the sealed category data to benchmark their robots.txt posture. The monthly job is a self-crawl and diff: confirm that their own disallow configuration reflects deliberate policy and has not drifted through engineering deploys. Unintentional AI-crawler blocks on editorial content can suppress AI-referral discovery for weeks without appearing in standard web analytics.
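The drift check can be expressed as two set differences: agents the deployed file blocks that policy does not call for, and agents policy calls for that the file no longer blocks. A minimal sketch, again using `urllib.robotparser` and probing a single URL; the intended policy, agent list, and `policy_drift` helper are hypothetical, and a real audit would probe several paths.

```python
from urllib.robotparser import RobotFileParser


def policy_drift(robots_txt: str, intended_blocks: set[str],
                 audited_agents: list[str], probe_url: str) -> dict[str, set[str]]:
    """Compare the agents robots.txt actually blocks at probe_url with the intended set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    actual = {a for a in audited_agents if not parser.can_fetch(a, probe_url)}
    return {
        "unintended_blocks": actual - intended_blocks,  # e.g. a deploy added a rule
        "missing_blocks": intended_blocks - actual,     # e.g. a deploy dropped a rule
    }


# Hypothetical intended policy for a pet-content publisher.
intended = {"CCBot", "Bytespider"}
agents = ["CCBot", "Bytespider", "GPTBot", "ClaudeBot", "PerplexityBot"]
deployed = "User-agent: CCBot\nUser-agent: ClaudeBot\nDisallow: /\n"
print(policy_drift(deployed, intended, agents, "https://example.com/articles/"))
# {'unintended_blocks': {'ClaudeBot'}, 'missing_blocks': {'Bytespider'}}
```

An empty result on both keys means the deployed file matches policy for the agents and path audited.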

US Tech Automations automates this monitoring with scheduled robots.txt crawls, change-diffing, and alerting pipelines — your team sees policy shifts the day they happen. Learn more at /platform/agentic-workflows.

For reference on how adjacent categories compare, see Do Parenting Sites Block AI Crawlers? (62.5% block rate) and Do Crafts Sites Block AI Crawlers? (25% block rate). The contrast between Parenting and Pets illustrates how category composition — not just industry proximity — drives the sealed block rate.

2 of 7 Pet sites block at least one AI crawler.

Pet sites carry a 28.6% AI-crawler block rate.

Across 293 sites, 123 block at least one AI crawler — a 42% rate.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha a5ca246fbdc79954).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Pet Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-pet-sites-block-ai-crawlers-2026

Sealed snapshot sha256: a5ca246fbdc79954

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.