Do Automotive Sites Block AI Crawlers? Sealed robots.txt Data

Jun 13, 2026

4 of 9 Automotive sites block at least one AI crawler.

Automotive sites block AI crawlers at a 44.4% rate.

72 of 157 sites block at least one AI crawler across the corpus.

Key Takeaways

4 of 9 Automotive sites with a parseable robots.txt block at least one AI crawler.

The Automotive block rate of 44.4% sits just below the corpus-wide rate of 45.9%.

The Automotive category is one of the most evenly split in the June 2026 Closing Web snapshot. Four sites have declared at least one AI-crawler block; five remain fully open. The category as a whole lands almost exactly at the corpus average of 45.9%, making it a useful baseline reference for anyone trying to understand where their sector falls relative to the broader web.

This report draws exclusively from a point-in-time snapshot sealed June 13, 2026 (sha 9ceca3bdf0dfeaca). Every figure is a verbatim count from that snapshot; nothing is estimated, modeled, or extrapolated.

What This Snapshot Measures

The US Tech Automations Research team collected and parsed public robots.txt files from 182 prominent websites across 16 content categories. Of those 182, a total of 157 returned a parseable robots.txt file. The Closing Web project checks for disallow directives targeting 9 named AI crawlers: CCBot, ClaudeBot, GPTBot, Bytespider, PerplexityBot, Meta-ExternalAgent, Applebot-Extended, Google-Extended, and Amazonbot.

Across the full corpus of 157 sites, 72 — or 45.9% — block at least one of those crawlers. That is the benchmark against which every category in this report is compared.

For the Automotive category specifically, 10 sites were checked. Of those 10, 9 returned a parseable robots.txt. Of those 9, exactly 4 have issued at least one AI-crawler block — a rate of 44.4%.

At 44.4%, Automotive sits just below the corpus-wide average of 45.9% across all 157 sites — the closest any category comes to the corpus mean.
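The arithmetic behind these rates can be reproduced in a few lines. The sketch below is illustrative only: the function name block_rate is hypothetical, and it assumes the denominator is the number of sites with a parseable robots.txt (9 for Automotive, 157 for the corpus), not the number of sites checked.

```python
def block_rate(blocking: int, parseable: int) -> float:
    """Percentage of sites with a parseable robots.txt that block at least one AI crawler."""
    if parseable <= 0:
        raise ValueError("parseable must be positive")
    return round(100 * blocking / parseable, 1)


# Counts taken from the report: 4 of 9 Automotive sites, 72 of 157 corpus sites.
automotive = block_rate(4, 9)
corpus = block_rate(72, 157)
gap = round(corpus - automotive, 1)

print(f"Automotive: {automotive}%  Corpus: {corpus}%  Gap: {gap} points")
```

Dividing by the 10 sites checked instead of the 9 with a parseable file would give 40%, which is why the denominator matters when comparing categories.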

Category Snapshot: Automotive

The table below summarizes the Automotive category results from the June 2026 sealed snapshot.

Category	Sites Checked	With robots.txt	Blocking Any AI	Block Rate
Automotive	10	9	4	44.4%

Nine of 10 Automotive sites returned a parseable robots.txt file, a coverage rate of 90%. One site, cars.com, returned no parseable robots.txt. Among the 9 with robots.txt files, the split is nearly even: 4 block at least one tracked AI crawler and 5 block none.

The Blockers: edmunds.com, kbb.com, motortrend.com, jdpower.com

Four Automotive sites have placed AI-crawler disallows in their robots.txt files: edmunds.com, kbb.com, motortrend.com, and jdpower.com. These are among the most authoritative editorial and valuation sources in the automotive space. Their blocking posture is consistent with a deliberate decision to protect the structured data, pricing models, and editorial content that differentiate their products, although the snapshot records only the directive, not the motive behind it.

A robots.txt disallow is an honor-system signal, not a technical lock. Compliant bots will respect it; non-compliant actors may not. The snapshot records declared intent, not enforced access control.
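To make the honor-system point concrete, here is a minimal sketch of the check a compliant crawler might run before fetching a page, using Python's standard urllib.robotparser. The robots.txt text and the example.com URL are hypothetical and are not taken from any site in the snapshot; the helper name may_fetch is also an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not copied from any site in the snapshot.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(SAMPLE_ROBOTS_TXT, "GPTBot", "https://example.com/reviews"))     # False
print(may_fetch(SAMPLE_ROBOTS_TXT, "ClaudeBot", "https://example.com/reviews"))  # True
```

Nothing in this check is enforced by the server. A crawler that never calls it, or ignores the result, can still request the page.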

The Allowers: caranddriver.com, autotrader.com, carfax.com, roadandtrack.com, truecar.com

Five Automotive sites returned parseable robots.txt files that issue no disallow directives against any of the 9 tracked AI crawlers: caranddriver.com, autotrader.com, carfax.com, roadandtrack.com, and truecar.com. These sites have, at the time of this snapshot, left the door open to AI training and retrieval crawlers under the honor-system framework.

Whether that openness reflects an intentional strategy or a posture that has not yet been updated is outside the scope of a robots.txt snapshot. What is confirmed is their declared posture as of June 13, 2026.

The No-Robots Site: cars.com

cars.com returned no parseable robots.txt file. In the absence of a robots.txt, compliant AI crawlers typically default to full access. For competitive-intelligence purposes, cars.com has an undeclared AI-crawling posture as of the snapshot date.

All 16 Categories: Cross-Category Ranking

The table below shows all 16 categories from the June 2026 Closing Web snapshot, ordered by block rate. Automotive sits in the middle of the pack, just below the corpus average.

Category	Sites Checked	With robots.txt	Blocking Any AI	Block Rate
News	20	15	13	86.7%
Food	10	10	7	70.0%
Tech	15	13	9	69.2%
Entertainment	9	9	6	66.7%
Healthcare	10	9	6	66.7%
Reference	14	11	6	54.5%
Automotive	10	9	4	44.4%
Social	10	10	4	40.0%
Sports	10	10	4	40.0%
Travel	9	9	3	33.3%
Legal	10	7	2	28.6%
Real Estate	10	7	2	28.6%
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
Education	9	7	1	14.3%
Government	9	8	1	12.5%

News leads all categories at 86.7%. Automotive ranks seventh of 16 and is the category closest to the corpus-wide average. The six categories above it (News, Food, Tech, Entertainment, Healthcare, and Reference) all block at rates above the corpus mean, from 54.5% to 86.7%. The nine below it (Social, Sports, Travel, Legal, Real Estate, Finance, Retail, Education, and Government) are progressively more permissive, down to 12.5% for Government.
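The ranking and the "closest to the mean" observation can both be recomputed from the table. The sketch below hard-codes the parseable and blocking counts from the table above; the variable and function names are illustrative choices, not part of the Closing Web tooling.

```python
# (category, sites with parseable robots.txt, sites blocking any tracked AI crawler)
CATEGORIES = [
    ("News", 15, 13), ("Food", 10, 7), ("Tech", 13, 9), ("Entertainment", 9, 6),
    ("Healthcare", 9, 6), ("Reference", 11, 6), ("Automotive", 9, 4),
    ("Social", 10, 4), ("Sports", 10, 4), ("Travel", 9, 3), ("Legal", 7, 2),
    ("Real Estate", 7, 2), ("Finance", 11, 2), ("Retail", 12, 2),
    ("Education", 7, 1), ("Government", 8, 1),
]


def rate(row: tuple) -> float:
    """Unrounded block rate in percent for one table row."""
    _, parseable, blocking = row
    return 100 * blocking / parseable


# Corpus totals are the column sums: 157 parseable sites, 72 blockers.
total_parseable = sum(row[1] for row in CATEGORIES)
total_blocking = sum(row[2] for row in CATEGORIES)
corpus_rate = 100 * total_blocking / total_parseable

ranked = sorted(CATEGORIES, key=rate, reverse=True)
automotive_rank = [name for name, _, _ in ranked].index("Automotive") + 1
closest = min(CATEGORIES, key=lambda row: abs(rate(row) - corpus_rate))[0]

print(total_parseable, total_blocking, round(corpus_rate, 1))
print(f"Automotive rank: {automotive_rank}; closest to corpus mean: {closest}")
```

Summing the columns first doubles as a consistency check that the category rows add up to the 157-site and 72-site corpus totals.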

For readers who want to compare Automotive against a lower-blocking category, the Real Estate category report covers a sector with a block rate of 28.6%.

Corpus-Wide Bot and Operator Leaderboards

The following tables capture which AI bots and which operators are most frequently blocked across all 157 sites in the full corpus — not just Automotive. These are corpus-wide figures.

Bot	Sites Blocking (of 157)	Block Rate
CCBot	58	36.9%
ClaudeBot	53	33.8%
GPTBot	45	28.7%
Bytespider	44	28.0%
PerplexityBot	42	26.8%
Meta-ExternalAgent	39	24.8%
Applebot-Extended	39	24.8%
Google-Extended	37	23.6%
Amazonbot	31	19.7%

CCBot, operated by Common Crawl, is the most widely blocked bot across all 157 sites, with 58 sites issuing a disallow. ClaudeBot follows with 53 sites. GPTBot and Bytespider are both blocked by more than a quarter of the full corpus. Amazonbot is the least-blocked of the 9 tracked bots at 19.7%.

Operator	Sites Blocking (of 157)
Common Crawl	58
Anthropic	55
OpenAI	47
Meta	45
ByteDance	44
Perplexity	42
Apple	39
Google	37
Cohere	36
Diffbot	36
Amazon	31
Mistral	15

Common Crawl and Anthropic are the most frequently blocked operators across the 157-site corpus. Mistral, with only 15 sites blocking its crawlers, is the least-blocked of the 12 tracked operators. The spread from 58 to 15 illustrates how unevenly operator-by-operator blocking is distributed across the web.

These leaderboard figures are corpus-wide. The Automotive category contributed 4 blocking sites, each with its own specific combination of named bots — the leaderboard does not break out per-category bot-level counts.

Additionally, across the full 157-site corpus, 27 sites (17.2%) have deployed an llms.txt file — a newer, opt-in mechanism for communicating AI-readable content preferences. That figure is a corpus-wide count from the same sealed snapshot.

For a comparison of how a high-blocking category differs from Automotive in its approach, the Food category report covers a sector where 7 of 10 sites actively block AI crawlers.

Methodology

The US Tech Automations Research team fetched the publicly accessible robots.txt file for each of the 182 sites in the June 2026 Closing Web corpus. Each file was parsed for user-agent strings matching any of the 9 tracked AI crawlers. A site is counted as "blocking" if at least one of those 9 bots is named in a disallow directive. A site is counted as "allowing" if a parseable robots.txt was found and none of the 9 bots appear in a disallow. A site is counted as "no robots.txt" if the file was absent or unparseable at snapshot time.
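The counting rule above can be sketched as a small classifier. This is an illustration under stated assumptions, not the project's actual parser: it assumes a wildcard group (User-agent: *) does not count as naming a bot, that an empty Disallow value does not count as a block, and that a file with no User-agent, Allow, or Disallow line is treated as unparseable. The function name classify is hypothetical.

```python
from typing import Optional

# The 9 tracked crawlers named in the report, lowercased for matching.
TRACKED_BOTS = {
    "ccbot", "claudebot", "gptbot", "bytespider", "perplexitybot",
    "meta-externalagent", "applebot-extended", "google-extended", "amazonbot",
}


def classify(robots_txt: Optional[str]) -> str:
    """Return 'blocking', 'allowing', or 'no robots.txt' for one site's file."""
    if robots_txt is None:  # file absent
        return "no robots.txt"
    agents = []             # user-agents of the group currently being read
    in_rules = False        # True once the current group has a rule line
    saw_directive = False   # assumption: no directives at all means unparseable
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:    # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            saw_directive = True
        elif field in ("allow", "disallow"):
            in_rules = True
            saw_directive = True
            # Assumption: an empty Disallow value is not a block.
            if field == "disallow" and value and TRACKED_BOTS.intersection(agents):
                return "blocking"
    return "allowing" if saw_directive else "no robots.txt"


print(classify("User-agent: GPTBot\nDisallow: /\n"))
print(classify("User-agent: *\nDisallow: /admin\n"))
print(classify(None))
```

Changing any of the three assumptions, for example counting a wildcard Disallow: / as a block, would change the category counts, so the rule is worth pinning down before comparing against the sealed figures.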

Nothing is estimated, modeled, or extrapolated. Every count is a direct, verbatim read from the snapshot sealed June 13, 2026 under sha 9ceca3bdf0dfeaca. No inferences are drawn about enforcement, legal standing, or actual crawler behavior. robots.txt communicates declared intent — not guaranteed outcome.

FAQ

Q: Does blocking a bot in robots.txt actually prevent it from crawling?

A: No. robots.txt is an honor-system protocol. Compliant bots read and respect the file; non-compliant actors can and do ignore it. This snapshot records what the 9 Automotive sites with a parseable robots.txt have declared, not what any specific crawler has actually done. Legal and technical enforcement options exist independently of robots.txt.

Q: Why is the Automotive block rate almost exactly at the corpus average?

A: The snapshot records postures, not motives, so any explanation is a hypothesis. Automotive is unusual in how closely it mirrors the corpus mean of 45.9%. One possible reading is a genuinely mixed industry: editorial and valuation sites (edmunds.com, kbb.com, motortrend.com, jdpower.com) hold different data assets than marketplace and utility sites (autotrader.com, carfax.com, truecar.com), and those business models may be driving different blocking decisions. The pattern is not clean, however: caranddriver.com and roadandtrack.com are editorial sites, and both block none of the 9 tracked crawlers.

Q: What should I read into a site having no robots.txt file?

A: Among the 10 Automotive sites in this snapshot, only cars.com has no parseable robots.txt. That means no honor-system signal was found at snapshot time. Compliant crawlers typically default to full access when no file is present. The site therefore has an undeclared posture, not necessarily an open one, since technical access controls can exist independently of robots.txt.

Q: Are the bot leaderboard numbers specific to Automotive sites?

A: No. The bot and operator leaderboards reflect counts across all 157 sites in the full corpus. They show which bots and operators face the widest blocking across the entire study — not just within Automotive. For Automotive-specific blocking, only 4 sites are relevant.

Q: How often does a site's robots.txt change?

A: robots.txt files can be updated at any time without notice. This snapshot reflects the state of 10 Automotive sites on June 13, 2026. Any site could add or remove AI-crawler disallows the following day. Monitoring for changes on a recurring basis is the only way to maintain an accurate picture of the current posture.

Put AI-Access Data to Work

The Automotive category sits within 1.5 percentage points of the corpus average, which makes it an especially useful reference point for teams that need to calibrate their AI-access monitoring strategy against a known midpoint.

An SEO or content lead at an automotive media company can use this data to map exactly which competitors are open to AI indexing and which have locked down. With 4 blockers, 5 allowers, and 1 site with no declared posture, the field is divided. A recurring automated job that re-fetches all 10 Automotive robots.txt files weekly and diffs them against this sealed baseline, alerting whenever a previously open site (caranddriver.com, autotrader.com, carfax.com, roadandtrack.com, truecar.com) adds an AI disallow, turns a static snapshot into a living signal.
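The diff step of such a job can be sketched as follows. The baseline postures are the ones reported in this snapshot; everything else is an assumption for illustration, including the function names and the idea that a separate fetch-and-classify step supplies the current postures. Fetching and alerting are deliberately left out.

```python
# Postures reported in the June 13, 2026 snapshot for the 10 Automotive sites.
BASELINE = {
    "edmunds.com": "blocking",
    "kbb.com": "blocking",
    "motortrend.com": "blocking",
    "jdpower.com": "blocking",
    "caranddriver.com": "allowing",
    "autotrader.com": "allowing",
    "carfax.com": "allowing",
    "roadandtrack.com": "allowing",
    "truecar.com": "allowing",
    "cars.com": "no robots.txt",
}


def posture_changes(baseline: dict, current: dict) -> dict:
    """Map each site whose posture differs from the baseline to (old, new)."""
    return {
        site: (baseline[site], status)
        for site, status in current.items()
        if site in baseline and baseline[site] != status
    }


def newly_blocking(baseline: dict, current: dict) -> list:
    """Sites that were not blocking in the baseline but are blocking now."""
    changes = posture_changes(baseline, current)
    return sorted(site for site, (_, new) in changes.items() if new == "blocking")


# Hypothetical later observation: one previously open site adds an AI disallow.
hypothetical_current = {**BASELINE, "truecar.com": "blocking"}
print(newly_blocking(BASELINE, hypothetical_current))  # ['truecar.com']
```

Keeping the comparison separate from the fetch makes it easy to run the same diff weekly or monthly and to log the full posture_changes output alongside any alert.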

A publisher RevOps lead at an automotive data company can track whether AI platforms are drawing from open-access sites more heavily than from blocked ones. As more automotive editorial and pricing content gets blocked, retrieval gaps emerge that affect AI-generated answers about vehicle specs, pricing, and reviews. A monthly cadence job comparing current robots.txt states against this June 2026 baseline (sha 9ceca3bdf0dfeaca) makes that drift visible before it becomes a competitive intelligence blind spot.

A retrieval or data engineer building an automotive knowledge base or RAG pipeline needs to know — today and on an ongoing basis — which of these 10 sources have declared off-limits status under the honor system. Automating that re-check monthly and logging changes against the sealed baseline means compliance posture is always current. For a reference on how a more aggressive category handles the same monitoring challenge, see the Healthcare category report, where 6 of 9 sites block AI crawlers.

US Tech Automations builds agentic workflows that automate this recurring monitoring — fetching, parsing, diffing, and alerting on robots.txt changes so teams act on live signals rather than static point-in-time snapshots. See how agentic workflows handle AI-access monitoring at scale.

See where Automotive sites fit in the broader trend in our study of how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 9ceca3bdf0dfeaca).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Automotive Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-automotive-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 9ceca3bdf0dfeaca
