Who Blocks Amazonbot? 22 of 107 Top Sites Do

Jun 13, 2026

As of June 13, 2026, 22 of the 107 top sites that returned a parseable robots.txt file had blocked the Amazonbot user-agent with at least one Disallow: / directive. That places Amazonbot below Diffbot (30 sites) and the cohere-ai crawler (27 sites) among the 12 AI operators tracked in this corpus, but the named blockers list still includes some of the most prominent publishers on the web.

"Blocking" means a site's robots.txt explicitly names the Amazonbot user-agent string and pairs it with a Disallow directive on at least one path. robots.txt communicates intent; it is not a technical enforcement mechanism.

Methodology and Data Integrity

This report is based on a point-in-time fetch of public robots.txt files from 122 prominent sites, sealed on June 13, 2026. Of those 122, 107 returned a parseable robots.txt. All counts reflect stated access policy as of that date only.

Every figure in this report is a verbatim count from the snapshot: nothing is estimated, modeled, or extrapolated. No percentages have been derived from sub-groups after the fact. The methodology treats each disallow rule as a binary presence: either the Amazonbot user-agent appears with a disallow path in a site's robots.txt, or it does not.

robots.txt is an honor-system standard that communicates a site operator's stated intent. It is not a technical enforcement mechanism. A compliant crawler will respect the directives; a non-compliant one will not. These numbers will not change as sites later edit their files — the snapshot is sealed.
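The binary presence test described above can be sketched in a few lines of Python. This is a minimal illustration, not the parser used to build the snapshot: the function name `explicitly_blocks` is hypothetical, and it assumes that any non-empty Disallow rule in a group naming Amazonbot counts as a block, while a wildcard `User-agent: *` group does not.

```python
def explicitly_blocks(robots_txt: str, agent: str = "Amazonbot") -> bool:
    """Return True if a group that names `agent` has a non-empty Disallow rule.

    Assumption: a wildcard group (User-agent: *) does not count, because the
    report only counts sites that name the crawler explicitly.
    """
    wanted = agent.lower()
    group_agents = []   # user-agents named by the group currently being read
    in_rules = False    # True once the current group has seen an Allow/Disallow
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts after rules
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value and wanted in group_agents:
                return True
    return False


sample = """\
User-agent: *
Disallow: /private

User-agent: Amazonbot
User-agent: GPTBot
Disallow: /
"""
print(explicitly_blocks(sample))                          # True
print(explicitly_blocks("User-agent: *\nDisallow: /\n"))  # False
```

The standard library's `urllib.robotparser` would answer a different question (whether a given URL may be fetched, wildcard rules included), which is why this sketch reads the groups directly.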

22 of 107 sites with parseable robots.txt files block Amazonbot as of June 13, 2026.

How Often Amazonbot Is Refused

Amazon's AI crawler operates under the Amazonbot user-agent string, tracked as a single agent in this corpus. The block count maps directly to sites that named it.

User-Agent	Sites Blocking
Amazonbot	22

For context: 48 of 107 sites block at least one AI crawler of any kind. Amazonbot at 22 sits below the corpus median for the headline operators, indicating that Amazon has a lower recognition profile in webmaster AI-access policy than some of the longer-established operators in this study.

22 of 107 top sites with a parseable robots.txt had explicitly blocked Amazonbot as of June 13, 2026. News and Tech together account for 16 of those 22 blocks.

22 of 107 sites block Amazonbot — 20.6% of the panel.

News drives 10 of Amazonbot's 22 blocks.

9 of 107 sites block all 9 headline AI crawlers.

Which Industries Block Amazonbot

News and Tech publishers drive the majority of Amazonbot blocks, matching the pattern seen across virtually every AI operator in this corpus. The category breakdown reveals some notable structural differences from higher-block-rate peers.

Category	Sites Blocking Amazonbot
News	10
Tech	6
Social	2
Reference	1
Retail	1
Travel	1
Government	1

News leads with 10 sites — the same absolute News count as the cohere-ai crawler, even though cohere-ai's total is higher at 27. Tech adds 6, and Social contributes 2 — tumblr.com and medium.com, suggesting that user-generated-content platforms are specifically concerned about Amazon crawling community-authored content.

The Retail entry is particularly striking: ebay.com blocks Amazonbot. eBay and Amazon are direct marketplace rivals, and eBay has a clear business reason to restrict a crawler operated by its primary competitor. This is one of the clearest examples in the corpus of robots.txt being used as a competitive instrument rather than a bandwidth or legal measure.

Government and Travel each contribute 1 block (congress.gov and tripadvisor.com respectively). The Reference block is healthline.com, which has appeared as a blocker for other operators and is consistent with health-content publishers taking a cautious stance toward AI data extraction.

ebay.com blocks 8 of 9 headline crawlers, so its Amazonbot rule sits inside a broad AI-access policy. The marketplace rivalry with Amazon is a plausible motive, but robots.txt itself does not record one.

The Named Sites That Block Amazonbot

All 22 blocking sites are identified in the sealed snapshot. The table below shows all 22, ordered by headline-crawlers-blocked count.

Site	Category	Headline Crawlers Blocked (of 9)
bbc.com	News	9
bloomberg.com	News	9
usatoday.com	News	9
cnn.com	News	8
forbes.com	News	8
theatlantic.com	News	8
wired.com	Tech	8
arstechnica.com	Tech	8
cnet.com	Tech	8
zdnet.com	Tech	8
mashable.com	Tech	8
ebay.com	Retail	8
congress.gov	Government	8
washingtonpost.com	News	7
theguardian.com	News	7
newsweek.com	News	7
healthline.com	Reference	7
tripadvisor.com	Travel	7
apnews.com	News	6
tumblr.com	Social	6
medium.com	Social	6
venturebeat.com	Tech	4

The top three — bbc.com, bloomberg.com, usatoday.com — are broad-spectrum blockers that appear at the top of nearly every operator list in this corpus. They block all 9 headline AI crawlers, so Amazonbot is caught in a blanket policy rather than a targeted decision.

ebay.com blocks Amazonbot with a headline score of 8 — the only Retail entry in the 22-site blocker list.

The presence of theguardian.com here is a data point worth noting separately. At 7 headline crawlers blocked, The Guardian is a partial blocker that has made selective choices. Its inclusion in the Amazonbot list — despite not running a blanket deny policy — suggests some publishers are paying attention to Amazon's crawler specifically.

venturebeat.com sits at a headline score of 4, making it one of the more selective blockers in the corpus. Its presence indicates deliberate attention to Amazonbot rather than a side effect of a comprehensive AI-access deny list.
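As a cross-check, the category counts and the split between blanket and selective blockers can be recomputed from the table above. The sketch below copies the 22 rows verbatim; the variable names and the choice to label a score below 9 as "selective" are illustrative assumptions, not part of the sealed snapshot.

```python
from collections import Counter

# (site, category, headline crawlers blocked of 9), copied from the table above
BLOCKERS = [
    ("bbc.com", "News", 9), ("bloomberg.com", "News", 9),
    ("usatoday.com", "News", 9), ("cnn.com", "News", 8),
    ("forbes.com", "News", 8), ("theatlantic.com", "News", 8),
    ("wired.com", "Tech", 8), ("arstechnica.com", "Tech", 8),
    ("cnet.com", "Tech", 8), ("zdnet.com", "Tech", 8),
    ("mashable.com", "Tech", 8), ("ebay.com", "Retail", 8),
    ("congress.gov", "Government", 8), ("washingtonpost.com", "News", 7),
    ("theguardian.com", "News", 7), ("newsweek.com", "News", 7),
    ("healthline.com", "Reference", 7), ("tripadvisor.com", "Travel", 7),
    ("apnews.com", "News", 6), ("tumblr.com", "Social", 6),
    ("medium.com", "Social", 6), ("venturebeat.com", "Tech", 4),
]

# Tally blockers per category
by_category = Counter(category for _, category, _ in BLOCKERS)

# Assumption: "blanket" means all 9 headline crawlers are blocked
blanket = [site for site, _, score in BLOCKERS if score == 9]
selective = [site for site, _, score in BLOCKERS if score < 9]

print(len(BLOCKERS))                              # 22
print(by_category["News"], by_category["Tech"])   # 10 6
print(len(blanket), len(selective))               # 3 19
```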

For teams building on any Amazon AI product that relies on crawled content, comparing the Amazonbot list to the OpenAI GPTBot list reveals which publishers have extended their AI-access policy to include Amazon specifically versus those running a blanket block. MistralAI-User, with only 12 blockers, marks the low end of the corpus and suggests how much shorter the Amazonbot list might be if fewer publishers had the crawler on their radar.

Corpus-Wide Access Policy Context

Of the 122 sites checked in this sealed snapshot, 107 returned a parseable robots.txt. Of those 107, 48 block at least one AI crawler of any kind. 20 sites publish an llms.txt or equivalent structured access file. Only 9 sites block all 9 headline AI crawlers tracked — the most restrictive tier, at 8.4% of the parseable-robots corpus.

Amazonbot at 22 sits in the lower half of the 12 tracked operators by block count. The 26 sites between Amazonbot's 22 and the 48-site any-AI-crawler total represent properties that restrict some operators but have not added an Amazonbot rule. Whether that gap closes over time is outside the scope of this point-in-time snapshot.
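The percentages and the 26-site gap quoted in this section follow directly from the snapshot counts. A short arithmetic check, using only figures stated in this report (the constant and function names are illustrative):

```python
# Counts stated in this report
PARSEABLE_SITES = 107      # sites with a parseable robots.txt
AMAZONBOT_BLOCKERS = 22    # sites blocking Amazonbot
ANY_AI_BLOCKERS = 48       # sites blocking at least one AI crawler
ALL_NINE_BLOCKERS = 9      # sites blocking all 9 headline crawlers


def share(count: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(share(AMAZONBOT_BLOCKERS, PARSEABLE_SITES))   # 20.6
print(share(ALL_NINE_BLOCKERS, PARSEABLE_SITES))    # 8.4
print(ANY_AI_BLOCKERS - AMAZONBOT_BLOCKERS)         # 26
```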

The 9 sites that block all 9 headline AI crawlers make up 8.4% of the parseable-robots corpus. In the named-blockers table, only 3 of Amazonbot's 22 blockers (bbc.com, bloomberg.com, usatoday.com) carry a 9-of-9 score, so 3 of the 22 blocks come from that most-restrictive tier. The remaining 19 block Amazonbot while leaving at least one headline crawler unblocked, although 10 of them score 8 of 9 and sit close to a blanket policy. The ebay.com Retail block is the case where an operator-specific motive is easiest to argue, given the marketplace rivalry. That distinction, blanket vs. deliberate, matters for teams trying to interpret the risk profile of a given source domain in an Amazonbot-dependent retrieval pipeline.

Put This Data to Work

For a retrieval-pipeline engineer building on publicly crawled content from Amazon AI products, the 22-site block list is a meaningful constraint. News publishers represent 10 of the 22 blocks, meaning a significant share of authoritative editorial content is off-limits to any crawler that honors those sites' stated policies.

The competitive retail dynamic between Amazon and ebay.com is a concrete example of a broader possibility: robots.txt can serve as a competitive tool. A team building on Amazon-crawled data should audit whether sites that compete with Amazon's ecosystem are systematically excluded.

US Tech Automations builds automated policy-monitoring workflows that schedule periodic robots.txt fetches, diff them against a baseline, and alert when a target site adds or removes an Amazonbot block. For enterprise teams managing content-sourcing pipelines, that monitoring layer is the difference between discovering a policy change the day it happens versus months later when retrieval quality has already degraded.

US Tech Automations can integrate monitoring into an existing stack — Slack, webhook, or a scheduled digest — so data engineering teams never have to manually audit robots.txt files.
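The baseline-diff step of such a monitoring workflow can be illustrated with a small Python sketch. It is not the US Tech Automations implementation: the function name `diff_block_status` is hypothetical, the domains are placeholders, and fetching, scheduling, and alert delivery are left out.

```python
from typing import Dict, List, Tuple


def diff_block_status(
    baseline: Dict[str, bool], current: Dict[str, bool]
) -> List[Tuple[str, str]]:
    """Compare two {site: blocks_amazonbot} maps and list what changed.

    Assumption: a site missing from either map is treated as not blocking.
    """
    changes = []
    for site in sorted(baseline.keys() | current.keys()):
        before = baseline.get(site, False)
        after = current.get(site, False)
        if before != after:
            changes.append((site, "added" if after else "removed"))
    return changes


# Placeholder data for illustration only, not taken from the snapshot
baseline = {"news.example": True, "shop.example": False}
current = {"news.example": False, "shop.example": True}

for site, change in diff_block_status(baseline, current):
    print(f"{site}: Amazonbot block {change}")
# news.example: Amazonbot block removed
# shop.example: Amazonbot block added
```

Keeping the comparison separate from fetching and alerting means the same function can feed a Slack message, a webhook, or a scheduled digest.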

Frequently Asked Questions

Q: Does blocking Amazonbot in robots.txt stop Amazon from crawling a site?

A: It communicates intent under the honor-system robots.txt protocol. A crawler that respects the standard will honor the Disallow directive. robots.txt does not enforce compliance at a network or legal level — it records the site's stated preference.

Q: Does blocking Amazonbot affect Amazon product search or AWS services?

A: Amazonbot is Amazon's AI-specific web crawler, distinct from the crawlers Amazon uses for product indexing, Alexa rankings, or other Amazon web properties. Blocking Amazonbot should not affect product listings, AWS services, or standard Amazon integrations.

Q: Why does ebay.com block Amazonbot?

A: The sealed snapshot identifies ebay.com as a blocker with a headline score of 8. The most straightforward explanation is competitive: eBay and Amazon are direct marketplace rivals, and eBay has a business interest in restricting Amazon's automated access to its product and pricing data.

Q: Is this data current?

A: This report is a point-in-time snapshot sealed on June 13, 2026 (snapshot sha 741353c4304216ee). robots.txt files change over time and these figures are fixed to that date; they will not be updated as sites modify their files.

Q: How many total crawlers and operators were tracked?

A: 12 operators and 21 individual crawler user-agents were tracked across 122 prominent sites in 10 content categories; 107 returned parseable robots.txt files.

Q: How does Amazonbot compare to the lowest-blocked operator?

A: The least-blocked operator in this corpus is MistralAI-User at 12 sites. Amazonbot at 22 sites has nearly double that total, indicating a meaningfully higher profile among webmasters managing AI-access policy as of June 13, 2026.

Key Takeaways

22 of 107 sites with parseable robots.txt files block Amazonbot as of June 13, 2026.
News accounts for 10 of those 22 blocks; Tech adds 6 — together the majority of all Amazonbot refusals.
ebay.com is the only Retail blocker, with a headline score of 8 — a signal of deliberate competitive policy rather than blanket AI-access restriction.
Social platforms tumblr.com and medium.com both block Amazonbot, indicating that user-generated-content platforms are actively managing AI-crawl access.
48 of 107 sites block at least one AI crawler; Amazonbot at 22 sites sits in the lower tier of the 12 operators tracked.
theguardian.com appears in the Amazonbot list as a selective partial blocker (7 of 9 headline crawlers), suggesting some publishers are paying deliberate attention to Amazon AI access specifically.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 741353c4304216ee).

Cite this report

US Tech Automations Research, 2026-06 edition. “Who Blocks Amazonbot? 22 of 107 Top Sites Do.” https://ustechautomations.com/resources/blog/who-blocks-amazonbot-2026

Sealed snapshot sha256: 741353c4304216ee
