Do Auction Sites Block AI Crawlers? 2 of 7 Do

Jun 18, 2026

The auction trade is where the world's most famous houses meet a long tail of online marketplaces, and their robots.txt files split along an unexpected line. The marquee names mostly leave the door open. The two that gate could not be more different from each other — one shuts out a single crawler, the other shuts out nearly every one tracked.

2 of 7 auction sites block at least one AI crawler.

Of the auction sites we checked, 7 returned a parseable robots.txt — the root-level file that tells automated agents which paths they may fetch — and 2 of those disallow an AI crawler. That works out to a 28.6% block rate, just above the corpus figure of 27.2%. Every number here is read straight from the sealed file; nothing is estimated, modeled, or extrapolated.

A robots.txt block is not a wall. It is a posted request — an honor-system line that names a crawler and asks it to stay out — and what makes auctions interesting is how few of the prestige houses bothered to post one. The two that did, bonhams.com and dorotheum.com, sit at opposite ends of the strictness range, which is the real story of this slice.

The Two Blockers Sit at Opposite Extremes

What separates this category from a tidy "houses gate, marketplaces do not" narrative is that both blockers are old-line European auction houses, and they took completely different approaches.

bonhams.com is the minimalist. Its robots.txt disallows exactly one AI crawler: Bytespider, the agent operated by ByteDance. Every other tracked token — the OpenAI, Anthropic, Google, and Amazon crawlers — is left free to fetch. A single-bot block like this usually means a site reacted to one specific crawler's behavior rather than adopting a blanket anti-AI stance.

dorotheum.com is the maximalist, and it is the most comprehensive block in this category by a wide margin. Its file disallows GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended — spanning the OpenAI, Anthropic, Google, Common Crawl, ByteDance, Meta, Amazon, and Apple operators in one sweep. When a site names that many tokens, it is making a deliberate policy choice to keep its catalog out of AI training pipelines across the board.

The two auction blockers — bonhams.com and dorotheum.com — are both European houses, yet one gates a single crawler and the other gates eight.

The five that allow everything are a deliberate mix of the famous and the niche: sothebys.com, phillips.com, bringatrailer.com, saffronart.com, and artsy.net. Sotheby's and Phillips are two of the largest names in fine art and luxury sales; Bring a Trailer is the dominant online collector-car marketplace; Saffronart specializes in South Asian art; and Artsy is an art-discovery and sales platform. None of them disallows an AI agent. For a marketplace whose lots turn over constantly, being readable by an AI assistant keeps individual listings eligible to surface when a buyer asks an answer engine about a piece.

Reading the 28.6% Honestly

A 28.6% block rate puts auctions a hair above the corpus line, and the honest read is that this is a category split between two postures rather than one trending in a single direction. The denominator matters here. We do not divide by every auction site we looked at, which is 10; we divide by the ones that published a parseable robots.txt, which is 7. That keeps the rate clean: a site with no readable policy is counted as neither an allow nor a block.
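The denominator rule can be made concrete with a short sketch. The function name `block_rate` and the outcome labels are illustrative choices of ours, not the tooling behind the sealed snapshot; the counts are the ones reported in this article.

```python
from typing import Optional


def block_rate(outcomes: list[str]) -> Optional[float]:
    """Percent of readable sites that block; 'silent' sites are excluded."""
    readable = [o for o in outcomes if o in ("allow", "block")]
    if not readable:
        return None  # no parseable files, so no rate can be stated
    blocked = sum(1 for o in readable if o == "block")
    return round(100 * blocked / len(readable), 1)


# Auction slice as reported: 2 blockers, 5 allowers, 3 silent domains.
auctions = ["block"] * 2 + ["allow"] * 5 + ["silent"] * 3
auction_rate = block_rate(auctions)        # 2 of 7 readable files

# Corpus as reported: 305 blockers among 1123 readable files.
corpus_rate = round(100 * 305 / 1123, 1)
```

Dividing by all 10 auction domains instead would give 20%, which would quietly treat the three unreadable sites as allowers.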

A pulled building permit is a request to do work, not a finished job; in the same spirit, a robots.txt block is a request, not an enforced barrier. The standard is voluntary. bonhams.com and dorotheum.com publish lines asking specific crawlers to stay out, and each crawler decides on its own whether to honor them. We are reporting what the sites declared, not what any crawler actually did.
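To show what honoring the request looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is a hypothetical file written in the single-crawler style described above; it is not the content of any site in this report, and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one named AI crawler is shut out, everyone else
# is only kept away from an admin path. NOT a real site's robots.txt.
EXAMPLE_ROBOTS = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS.splitlines())

url = "https://example.com/lots/123"  # placeholder URL
bytespider_ok = parser.can_fetch("Bytespider", url)  # named and disallowed
gptbot_ok = parser.can_fetch("GPTBot", url)          # falls under the * group
```

A cooperative crawler runs a check like this before each fetch and skips the URL when the answer is false. Nothing in the file stops a crawler that never asks.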

Three more auction domains complicate the picture by saying nothing parseable at all. christies.com returned a redirect that did not resolve to a file, while catawiki.com and artnet.com each refused our request. With no readable file at seal time, all three are logged as silent — excluded from the rate entirely, never counted as an allow or a block.

That Christie's, one of the two most storied houses in the world, lands in the silent bucket rather than the allow or block column is its own small caution: absence of a readable policy is not the same as an open-door policy.

Auction sites post a 28.6% AI-crawler block rate.

So the category reads as ordinary against the corpus, and that ordinariness is itself the signal. Auctions are not a sector mounting a coordinated defense of their listing data, nor one throwing the gates wide. The number is simply the result of two houses choosing to gate, one narrowly and one broadly, while the rest of the readable field stays open.

Where Auctions Sit Among Similar Categories

A 28.6% reading lands auctions mid-pack, sharing its exact rate with a cluster of unrelated verticals. The focused window below shows auctions beside its nearest neighbors in the ranking, verbatim from the sealed snapshot — category name first, no rank column.

Category	Sites	With robots.txt	Block ≥1 crawler	Block rate
Beekeeping	10	10	3	30%
Auctions	10	7	2	28.6%
Legal	10	7	2	28.6%
RealEstate	10	7	2	28.6%
Pets	10	7	2	28.6%
Chess	10	7	2	28.6%
Fragrance	10	7	2	28.6%
Crafts	10	8	2	25%

Auctions share their 28.6% mark with Legal, Real Estate, Pets, Chess, and Fragrance, a crowded band where two blockers out of seven readable files is the common result. It is a sign that auctions are unremarkable against the average, behaving like a typical content-and-commerce category rather than a privacy-sensitive or fiercely protective one. The contrast with how fragrance sites handle AI crawlers is instructive: same headline rate, but there two scent databases drive the number, while here two old-world houses do. The extremes table shows what the ends of the ranking look like:

Category	Sites	With robots.txt	Block ≥1 crawler	Block rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
Grocery	10	7	1	14.3%
Hotels	10	3	0	0%

Auctions sit far below Gaming and News, where the overwhelming majority of policied sites gate, and well above the zero-block floor that categories like hotel booking sites define, where not a single readable file disallows a crawler.

Which Operators Get Gated Most

The two auction blockers add to a much larger corpus pattern, and knowing which operators are gated most often shows a house which crawlers its peers tend to name first. The cut below shows the most-disallowed operators across all 1123 sites with a parseable robots.txt: operator name first, then the number of sites disallowing it, then the rate.

Operator	Sites disallowing	Rate
Common Crawl	228	20.3%
Anthropic	217	19.3%
OpenAI	209	18.6%
Meta	196	17.5%
ByteDance	195	17.4%

Common Crawl tops the corpus blocklist, with Anthropic and OpenAI close behind. dorotheum.com's sweeping file disallows all five of these operators at once, joining the broad pattern of gating the highest-volume training crawlers. bonhams.com, by contrast, touches only the ByteDance line — the operator behind Bytespider, which 195 of the 1123 readable sites disallow.
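The overlap between the two auction files and the corpus top five can be checked with a small sketch. The token-to-operator pairs and the counts are taken from this article; the dictionary and variable names are our own, and the token list covers only the eight tokens named here, not necessarily every tracked token.

```python
# Token -> operator, as paired in this article.
TOKEN_OPERATOR = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "Google-Extended": "Google",
    "CCBot": "Common Crawl",
    "Bytespider": "ByteDance",
    "Meta-ExternalAgent": "Meta",
    "Amazonbot": "Amazon",
    "Applebot-Extended": "Apple",
}

# Most-disallowed operators: sites disallowing, out of 1123 readable files.
TOP_FIVE = {"Common Crawl": 228, "Anthropic": 217, "OpenAI": 209,
            "Meta": 196, "ByteDance": 195}
READABLE_SITES = 1123


def operators(tokens):
    """Set of operators behind a list of crawler tokens."""
    return {TOKEN_OPERATOR[t] for t in tokens}


dorotheum_ops = operators(TOKEN_OPERATOR)     # all eight tokens named above
bonhams_ops = operators(["Bytespider"])

dorotheum_top5 = dorotheum_ops & set(TOP_FIVE)
bonhams_top5 = bonhams_ops & set(TOP_FIVE)
rates = {op: round(100 * n / READABLE_SITES, 1) for op, n in TOP_FIVE.items()}
```

Under these inputs `dorotheum_top5` contains all five operators, `bonhams_top5` contains only ByteDance, and `rates` reproduces the percentages in the table.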

Corpus-wide, 305 of 1123 sites block at least one AI crawler.

How the Auction Snapshot Was Sealed

These figures come from one point-in-time crawl of public robots.txt files, sealed June 18, 2026 under snapshot sha 74d390d8f5175d21. For each auction domain we fetched robots.txt at the root, parsed its user-agent and disallow directives, and recorded whether any AI crawler token carried a Disallow. We report verbatim counts; nothing is estimated, modeled, or extrapolated. Domains with no parseable file — christies.com, catawiki.com, and artnet.com — are logged as silent, neither allow nor block.

A sealed snapshot is a single, content-addressed read: anyone holding sha 74d390d8f5175d21 can re-derive the same seven readable files and the same two blockers. The counting rule is strict and worth stating plainly. A block is an explicit Disallow aimed at a named AI agent — GPTBot, ClaudeBot, Bytespider, and the other tracked tokens. A site can disallow administrative or search paths without naming an AI agent, and that does not count as a block here. Only a directive that names one moves a site into the blocker column, which is why the auction count is a clean 2.
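The counting rule can be sketched as a small parser. This is an illustration under stated assumptions, not the code that produced the snapshot: the token set holds only the eight tokens named in this article, a wildcard `User-agent: *` group is treated as not naming an AI agent, and an empty `Disallow:` is treated as no block. The function name `blocked_ai_tokens` is hypothetical.

```python
# Lower-cased tokens named in this article (the tracked list may be longer).
TRACKED_TOKENS = {
    "gptbot", "claudebot", "google-extended", "ccbot",
    "bytespider", "meta-externalagent", "amazonbot", "applebot-extended",
}


def blocked_ai_tokens(robots_txt: str) -> set[str]:
    """Return tracked tokens named in a group with a non-empty Disallow."""
    blocked: set[str] = set()
    agents: list[str] = []   # user-agents of the group being read
    in_rules = False         # True once the group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a user-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:
                blocked.update(a for a in agents if a in TRACKED_TOKENS)
    return blocked


# Hypothetical file: a search path closed to everyone, two AI agents named.
sample = """\
User-agent: *
Disallow: /search

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
"""
result = blocked_ai_tokens(sample)   # the * group does not count
is_blocker = bool(result)
```

A site moves into the blocker column only when the returned set is non-empty, so a file that closes `/search` or `/admin` to all agents stays an allow.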

US Tech Automations runs this read across 1374 sites in 138 categories, of which 1123 returned a parseable robots.txt. Auctions contribute 7 of those readable files, and we report the slice at exactly that size.

The method deliberately does not retry a slow host until a file appears, does not follow a redirect into a different domain's policy, and does not infer a block from a site that merely looks unfriendly to bots — which is exactly why christies.com, having answered with a redirect that did not resolve to a file, sits in the silent bucket rather than the allow column.
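Those three refusals can be sketched as a classification step applied to the outcome of a single request. The `FetchResult` fields, the status and body checks, and the treatment of a leading `www.` as the same host are all assumptions made for illustration; the report does not publish its implementation, and a real "parseable" test would need to be stricter than "status 200 with a non-empty body".

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class FetchResult:
    requested_url: str
    final_url: str          # where the one and only request ended up
    status: Optional[int]   # None if the request was refused or timed out
    body: Optional[str]     # robots.txt text, if any came back


def host(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' dropped (an assumption)."""
    return (urlsplit(url).hostname or "").lower().removeprefix("www.")


def classify(result: FetchResult, blocks_ai_agent: bool) -> str:
    """Return 'silent', 'block', or 'allow' for one domain."""
    if result.status != 200 or not result.body:
        return "silent"     # no retry, no guessing from an unfriendly reply
    if host(result.final_url) != host(result.requested_url):
        return "silent"     # never adopt another domain's policy
    return "block" if blocks_ai_agent else "allow"


# Hypothetical outcomes on placeholder domains.
robots = "https://example.com/robots.txt"
refused = classify(FetchResult(robots, robots, None, None), False)
cross = classify(
    FetchResult(robots, "https://other.example/robots.txt", 200, "User-agent: *\nDisallow:\n"),
    False,
)
readable = classify(FetchResult(robots, robots, 200, "User-agent: Bytespider\nDisallow: /\n"), True)
```

The point of the ordering is that the silent checks run before any allow or block decision, so an unreadable site can never be counted on either side of the rate.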

Frequently Asked Questions

Q: Which two auction sites block AI crawlers?

A: bonhams.com and dorotheum.com — both European auction houses. They are the two domains among the 7 with a parseable robots.txt that disallow an AI crawler, together making the 28.6% block rate. bonhams.com blocks only Bytespider, while dorotheum.com blocks eight separate AI tokens.

Q: Why does dorotheum.com block so many more crawlers than bonhams.com?

A: The two houses made different policy calls. dorotheum.com disallows GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Meta-ExternalAgent, Amazonbot, and Applebot-Extended — a deliberate blanket choice to keep its catalog out of AI training. bonhams.com names only Bytespider, the signature of a reaction to one specific crawler rather than a broad stance.

Q: Why are Christie's, Catawiki, and Artnet not counted?

A: None of the three returned a parseable robots.txt at the seal. christies.com answered with a redirect that did not resolve to a file, and catawiki.com and artnet.com each refused our request. With no readable file, all three are logged as silent and excluded from the rate rather than counted as allows or blocks.

Q: Does a robots.txt block actually stop an AI crawler?

A: Not by force. robots.txt is an honor-system standard: a cooperative crawler reads the file and complies, but the file enforces nothing technically. bonhams.com and dorotheum.com signal that named AI agents should stay out of their listings; each crawler decides whether to honor that request.

Put AI-Access Data to Work

For an auction-house digital or e-commerce lead — the person responsible for whether lots, lot descriptions, and provenance pages surface in AI shopping and answer engines — this snapshot is the baseline. The prestige field is mostly open: sothebys.com, phillips.com, and bringatrailer.com all allow every crawler, while only dorotheum.com gates broadly.

Set a recurring crawl that re-reads robots.txt for your own domain plus a peer set, and alert the moment a direct competitor adds an AI crawler token to its disallow list — because in a category this close to the corpus line, one house changing posture moves the whole picture.
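The alerting half of that crawl reduces to a set difference between two reads. The sketch below assumes each read has already been reduced to a mapping from domain to the set of AI crawler tokens it disallows; the function name `new_ai_blocks` and the placeholder domains are hypothetical.

```python
def new_ai_blocks(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each domain to tokens disallowed now but not in the previous read."""
    alerts: dict[str, set[str]] = {}
    for domain, tokens in current.items():
        added = tokens - previous.get(domain, set())
        if added:
            alerts[domain] = added
    return alerts


# Hypothetical reads a week apart, using placeholder domains.
previous = {"house-a.example": {"Bytespider"}, "house-b.example": set()}
current = {"house-a.example": {"Bytespider", "GPTBot"}, "house-b.example": set()}

alerts = new_ai_blocks(previous, current)
```

Here only `house-a.example` raises an alert, for the newly added `GPTBot` token. A domain that appears for the first time is compared against an empty set, so every token it disallows is reported as new.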

A market-intelligence or competitive-research analyst tracking the collectibles and fine-art space is the second fit: they can monitor all ten auction domains in this slice to catch when a silent site like christies.com finally publishes a readable policy, or when a single-bot blocker like bonhams.com widens its list, since either shift changes what AI assistants can say about a sale.

The catalog-protection instinct here rhymes with a far more guarded vertical — stock-media libraries gate at fully half their readable files, the posture an auction house would move toward if it ever decided its lot data needed the same fence. US Tech Automations runs these scheduled robots.txt crawls with change alerts so a policy shift surfaces the week it lands rather than at the next manual audit. See how the agentic monitoring works.

Corpus-wide, 298 of 1123 sites publish an llms.txt file.

Key Takeaways

Of the 7 auction sites with a parseable robots.txt, 2 block at least one AI crawler: a 28.6% rate, just above the 27.2% corpus figure.
The two blockers are European houses at opposite extremes: bonhams.com disallows only Bytespider, while dorotheum.com disallows eight AI tokens across eight operators.
The five allowers include marquee names — sothebys.com, phillips.com, bringatrailer.com, saffronart.com, and artsy.net — none of which gate any crawler.
christies.com, catawiki.com, and artnet.com returned no parseable file and are excluded from the block-rate math as silent.
Corpus-wide, 305 of 1123 sites (27.2%) gate at least one crawler, so auctions land just above the line.

See where auction sites fit in the broader trend in our study of how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 18, 2026 (snapshot sha 74d390d8f5175d21).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Auction Sites Block AI Crawlers? 2 of 7 Do.” https://ustechautomations.com/resources/blog/do-auction-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 74d390d8f5175d21
