How Many Sites Block Applebot-Extended? robots.txt Data

Jun 13, 2026

Key Takeaways

31 of 107 top sites block Applebot-Extended, a 29% block rate.

48 of 107 sites block at least one AI crawler.

Of the 122 prominent sites in our starting universe, 107 returned a parseable robots.txt file. Of those 107, exactly 31 block Applebot-Extended — a block rate of 29%. Among the 9 crawlers measured in this edition, Applebot-Extended ranks fifth, behind GPTBot (33 sites) and ahead of Meta-ExternalAgent (30 sites).

Applebot-Extended is the Apple AI-training opt-out token. Unlike some crawlers that gather content for general indexing, Applebot-Extended is specifically the token publishers use to signal that they do not want their content used for Apple AI-training purposes. Apple as an operator is blocked by 31 sites — matching the Applebot-Extended per-bot count, indicating that this token is the primary mechanism publishers use when targeting Apple AI access.

Of 107 prominent sites with a parseable robots.txt, 31 block Applebot-Extended — a 29% block rate, fifth highest among 9 crawlers measured in June 2026.

48 of 107 sites (44.9%) block at least one AI crawler; the Applebot-Extended block rate of 29% falls below that corpus-wide figure.


What Is Applebot-Extended and Why Do Publishers Block It

Applebot-Extended is the Apple AI-training opt-out token. Publishers include it in their robots.txt to specifically opt out of having their content used for Apple AI-training workloads. It is distinct from the standard Applebot user-agent, which governs general web indexing by Apple. The -Extended suffix signals the AI-training-specific context.

This design, a dedicated token for AI training versus general indexing, makes Applebot-Extended a useful case study. A publisher who blocks Applebot-Extended is making a targeted statement about AI-training data use, not a general statement about Apple search crawling. That specificity is visible in the data: Apple the operator is blocked by 31 sites, exactly matching the Applebot-Extended count, which indicates that the counted rules target the AI-training token rather than all Apple crawl activity.

The block rate of 29% is below the corpus-wide figure of 44.9%: of the 48 sites that block at least one AI crawler, 17 (roughly a third) do not block Applebot-Extended. Whether that reflects awareness gaps, deliberate policy distinctions, or simply the timing of when rules were added cannot be determined from the robots.txt text alone.

For a comparison with the fourth-ranked GPTBot from OpenAI, see how many sites block GPTBot. For the full ranking across all 9 crawlers, see the leaderboard table in this report.


Methodology

US Tech Automations Research collected publicly accessible robots.txt files from 122 prominent sites on June 13, 2026. For each site, we fetched the file, parsed its directives, and checked whether Applebot-Extended would be covered by a broad disallow — either via a dedicated User-agent: Applebot-Extended section or a catch-all User-agent: * section with a broad disallow.

Of the 122 sites, 107 returned a parseable robots.txt file. The remaining 15 are excluded from all percentages and counts. Every count in this report is a raw read of public file text; nothing is estimated, modeled, or extrapolated. The snapshot is a point-in-time capture from June 13, 2026, sealed with sha256 hash 741353c4304216ee.

| Metric | Value |
| --- | --- |
| Sites in starting universe | 122 |
| Sites with a parseable robots.txt | 107 |
| Sites blocking Applebot-Extended | 31 |
| Block rate | 29% |
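
The coverage check described in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used for this report: the helper names `parse_groups` and `blocks_token` are hypothetical, a "broad disallow" is assumed to mean a literal `Disallow: /` line, and the catch-all `*` group is assumed to apply only when the file has no dedicated Applebot-Extended group.

```python
def parse_groups(text):
    """Split robots.txt text into (agents, rules) groups."""
    groups = []
    agents, rules = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def blocks_token(text, token="Applebot-Extended"):
    """True if the token is covered by a `Disallow: /` rule.

    Assumption: a dedicated group takes precedence over the `*` group.
    """
    groups = parse_groups(text)
    token = token.lower()
    dedicated = [rules for agents, rules in groups if token in agents]
    catch_all = [rules for agents, rules in groups if "*" in agents]
    chosen = dedicated or catch_all
    return any(
        field == "disallow" and value == "/"
        for rules in chosen
        for field, value in rules
    )


sample = "User-agent: Applebot-Extended\nDisallow: /\n"
print(blocks_token(sample))
```

Sites whose files disallow only specific paths would not count as blockers under this sketch; a production check would need a documented rule for partial disallows.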

Sites That Block Applebot-Extended

The 31 sites that explicitly block Applebot-Extended are concentrated in news media, technology publications, and entertainment. The health and reference segments also appear. The relative absence of e-commerce and financial destinations from the blocker list is a notable pattern for this crawler — those categories skew toward the allower side.

Major news outlets blocking Applebot-Extended include nytimes.com, washingtonpost.com, theguardian.com, bbc.com, cnn.com, apnews.com, bloomberg.com, forbes.com, theatlantic.com, usatoday.com, newsweek.com, and vox.com. Entertainment titles rollingstone.com, variety.com, hollywoodreporter.com, and billboard.com are in the blocker list.

Technology media blocking Applebot-Extended includes techcrunch.com, theverge.com, wired.com, arstechnica.com, cnet.com, zdnet.com, mashable.com, and venturebeat.com. The health segment is represented by healthline.com. The reference category is represented by dictionary.com.

User-generated and social content platforms quora.com, tumblr.com, and medium.com block Applebot-Extended. Review platform tripadvisor.com also blocks it. The e-commerce destination ebay.com is in the blocker list.

Notable allowers for Applebot-Extended span a very broad territory. Major news outlets reuters.com, wsj.com, businessinsider.com, latimes.com, and time.com all permit it. Technology media gizmodo.com, engadget.com, hackernews.com, slashdot.org, and github.com do not block it. Community platforms reddit.com, linkedin.com, pinterest.com, substack.com, wordpress.com, blogger.com, and vimeo.com allow Applebot-Extended.

Apart from ebay.com, the e-commerce category (amazon.com, walmart.com, target.com, bestbuy.com, etsy.com, homedepot.com, wayfair.com, ikea.com, nordstrom.com, nike.com, and shopify.com) does not block Applebot-Extended. Financial services sites chase.com, bankofamerica.com, wellsfargo.com, fidelity.com, paypal.com, nerdwallet.com, bankrate.com, morningstar.com, marketwatch.com, fool.com, and coinbase.com all allow it. Reference and health sites wikipedia.org, britannica.com, merriam-webster.com, investopedia.com, webmd.com, medlineplus.gov, and cdc.gov do not block it.

Government portals usa.gov, irs.gov, sec.gov, whitehouse.gov, congress.gov, census.gov, nasa.gov, and uspto.gov all allow Applebot-Extended. Educational sites mit.edu, harvard.edu, stanford.edu, coursera.org, edx.org, duolingo.com, and scholar.google.com permit it. Entertainment platforms netflix.com, spotify.com, youtube.com, hulu.com, espn.com, and twitch.tv do not block Applebot-Extended. Travel destinations expedia.com, booking.com, airbnb.com, kayak.com, marriott.com, hilton.com, yelp.com, and lonelyplanet.com also allow it.


Cross-Bot Leaderboard (all 107 sites)

The full ranked leaderboard across the same 107-site corpus places Applebot-Extended in fifth position. The table below enables direct comparison with all 8 sibling crawlers.

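| Bot | Sites Blocking | Block Rate |
| --- | --- | --- |
| CCBot | 40 | 37.4% |
| ClaudeBot | 38 | 35.5% |
| Bytespider | 37 | 34.6% |
| GPTBot | 33 | 30.8% |
| Applebot-Extended | 31 | 29.0% |
| Meta-ExternalAgent | 30 | 28.0% |
| PerplexityBot | 29 | 27.1% |
| Google-Extended | 25 | 23.4% |
| Amazonbot | 22 | 20.6% |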

Applebot-Extended at 31 is two sites behind GPTBot and one site ahead of Meta-ExternalAgent, making this a tight cluster in the middle of the leaderboard. The top three crawlers (CCBot, ClaudeBot, Bytespider) are separated from Applebot-Extended by a gap of six to nine sites, suggesting that publishers who address AI-crawler policy do not uniformly extend that policy to every bot.
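
The rates and gaps in this leaderboard follow directly from the per-bot counts and the 107-site denominator. The Python sketch below recomputes them from the published counts; the function names `block_rate` and `leaderboard` are illustrative only.

```python
TOTAL_SITES = 107  # sites with a parseable robots.txt

# Per-bot counts as published in the leaderboard.
BLOCK_COUNTS = {
    "CCBot": 40,
    "ClaudeBot": 38,
    "Bytespider": 37,
    "GPTBot": 33,
    "Applebot-Extended": 31,
    "Meta-ExternalAgent": 30,
    "PerplexityBot": 29,
    "Google-Extended": 25,
    "Amazonbot": 22,
}


def block_rate(count, total=TOTAL_SITES):
    """Share of parseable sites blocking a crawler, as a percentage to one decimal."""
    return round(100 * count / total, 1)


def leaderboard(counts):
    """Rows of (rank, bot, sites, rate), highest count first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, bot, n, block_rate(n))
        for rank, (bot, n) in enumerate(ordered, start=1)
    ]


rows = leaderboard(BLOCK_COUNTS)
baseline = BLOCK_COUNTS["Applebot-Extended"]
for rank, bot, n, rate in rows:
    # Gap is the difference in blocking sites relative to Applebot-Extended.
    print(f"{rank}  {bot:<20} {n:>3}  {rate:>5.1f}%  gap: {n - baseline:+d}")
```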

For the sixth-ranked Meta-ExternalAgent, see how many sites block Meta-ExternalAgent. For the full CCBot story, see how many sites block CCBot.


Operator Leaderboard (all 107 sites)

This table aggregates blocking by operator, counting a site once if it blocks at least one crawler from that organization.

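| Rank | Operator | Sites Blocking |
| --- | --- | --- |
| 1 | Common Crawl | 40 |
| 2 | Anthropic | 39 |
| 3 | ByteDance | 37 |
| 4 | OpenAI | 35 |
| 4 | Meta | 35 |
| 6 | Apple | 31 |
| 7 | Diffbot | 30 |
| 8 | Perplexity | 29 |
| 9 | Cohere | 27 |
| 10 | Google | 25 |
| 11 | Amazon | 22 |
| 12 | Mistral | 12 |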

Apple ranks sixth at 31 on the operator leaderboard — matching the Applebot-Extended per-bot count exactly. When the operator count equals the per-bot count, it indicates that Applebot-Extended is the only Apple crawler token appearing in disallow rules across the 107 sites. No additional Apple-attributed token pushes the operator count higher.
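
The counting rule for this table (a site counts once per operator if it blocks at least one of that operator's crawlers) can be illustrated with a short Python sketch. The site names and blocked-token sets below are made up for illustration and are not rows from the corpus. The report does not list which extra tokens it attributes to each operator, so the `ChatGPT-User` entry is an assumption included only to show how an operator total can exceed a single bot's total.

```python
from collections import defaultdict

# Token -> operator for the nine crawlers named in this report.
OPERATOR_OF = {
    "CCBot": "Common Crawl",
    "ClaudeBot": "Anthropic",
    "Bytespider": "ByteDance",
    "GPTBot": "OpenAI",
    "Applebot-Extended": "Apple",
    "Meta-ExternalAgent": "Meta",
    "PerplexityBot": "Perplexity",
    "Google-Extended": "Google",
    "Amazonbot": "Amazon",
    # Assumed second OpenAI token, for illustration only.
    "ChatGPT-User": "OpenAI",
}


def operator_counts(blocked_by_site):
    """Count sites per operator, counting each site at most once per operator."""
    sites_by_operator = defaultdict(set)
    for site, tokens in blocked_by_site.items():
        for token in tokens:
            operator = OPERATOR_OF.get(token)
            if operator is not None:
                sites_by_operator[operator].add(site)
    return {op: len(sites) for op, sites in sites_by_operator.items()}


# Hypothetical sample, not data from the 107-site corpus.
sample = {
    "site-a.example": {"GPTBot", "ChatGPT-User", "Applebot-Extended"},
    "site-b.example": {"ChatGPT-User"},
    "site-c.example": {"Applebot-Extended"},
}

counts = operator_counts(sample)
gptbot_sites = sum("GPTBot" in tokens for tokens in sample.values())
print(counts["OpenAI"], gptbot_sites)  # operator total vs. GPTBot-only total
print(counts["Apple"])                 # equals the Applebot-Extended total
```

In the sample, the OpenAI operator total is higher than the GPTBot total because one site blocks only the second token, while the Apple total matches the Applebot-Extended total because no other Apple token appears.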

Comparing Apple (31) with OpenAI (35) and Anthropic (39) shows that Apple attracts fewer disallow rules at the operator level than these two, even though Apple is a prominent consumer technology company. The ranking reflects current publisher policy as of the June 13, 2026 snapshot, not a structural or permanent ordering.

At the bottom, Mistral is blocked by only 12 sites — less than half the Apple operator count. The spread from 40 (Common Crawl) to 12 (Mistral) underscores how differentiated publisher policy is across the 12 operators in this edition.


Frequently Asked Questions

Q: What is the difference between Applebot and Applebot-Extended?

A: Applebot is the general Apple web crawler used for indexing and search. Applebot-Extended is the specific token Apple introduced for publishers to opt out of having their content used for Apple AI-training purposes. Blocking Applebot-Extended in robots.txt addresses AI-training use; it does not necessarily block general Apple indexing. This report measures only Applebot-Extended.

Q: Does blocking Applebot-Extended prevent Apple from using a site's content for AI?

A: robots.txt is an honor-system standard. A compliant crawler respects disallow directives for Applebot-Extended, but compliance is not technically enforced. Whether Apple respects a given site's robots.txt rule is verifiable only through server log analysis. The robots.txt entry represents a publisher preference signal, not a technical barrier.

Q: Why do the Apple operator count and the Applebot-Extended per-bot count match exactly?

A: When both numbers equal 31, it means no site in the corpus blocks an Apple-attributed crawler token other than Applebot-Extended. Every site counted in the Apple operator total was counted specifically for an Applebot-Extended rule. If Apple were to introduce additional AI-adjacent crawler tokens, publishers would need separate rules to cover them.

Q: What does the corpus-wide 44.9% figure mean relative to the Applebot-Extended block rate?

A: 48 of 107 sites (44.9%) block at least one AI crawler of any kind. The Applebot-Extended block rate of 29% is well below that line. A site that blocks only Applebot-Extended is counted in both figures, but 17 of the 48 sites in the 44.9% group do not block Applebot-Extended. The gap illustrates that a site can have an AI-blocking policy without targeting every specific crawler.

Q: How can publishers verify that their Applebot-Extended rule is correctly formed?

A: A correctly formed rule requires a User-agent: Applebot-Extended directive followed by Disallow: / (or the relevant paths). Testing requires fetching your own robots.txt and parsing it against the exact user-agent string. Parsing this at scale across a defined site universe, on a recurring schedule, is exactly the kind of check that can be automated for any site portfolio.
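
As a first check, a rule can be tested locally with Python's standard-library `urllib.robotparser`. This is a minimal sketch rather than a full validator: the helper name `is_disallowed` is hypothetical, and the standard-library parser matches user-agent names more loosely than an exact token comparison, so a passing result should be confirmed by reading the file.

```python
from urllib.robotparser import RobotFileParser


def is_disallowed(robots_txt, path="/", agent="Applebot-Extended"):
    """True if the given robots.txt text disallows `path` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


rules = """\
User-agent: Applebot-Extended
Disallow: /
"""

print(is_disallowed(rules))
```

To test a live file instead of a string, the same class can load it with `set_url()` and `read()` in place of `parse()`.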


Put AI-Access Data to Work

A publisher RevOps lead overseeing multiple content brands needs assurance that each property's robots.txt policy is current, correctly formed, and covers the crawlers the editorial team intends to block. One misconfigured rule, or a rule added years ago that no longer covers all the relevant tokens, can undo a stated AI-access policy.

US Tech Automations automates this audit continuously: agentic workflows fetch robots.txt, parse per-bot rules against a defined crawler token list, flag malformed or missing directives, and route alerts to the right team. No manual checking, no stale policy audits. Explore the agentic workflow platform to add AI-access policy monitoring to your publisher operations.


Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 741353c4304216ee).

Cite this report

US Tech Automations Research, 2026-06 edition. “How Many Sites Block Applebot-Extended? robots.txt Data.” https://ustechautomations.com/resources/blog/how-many-sites-block-applebot-extended-2026

Sealed snapshot sha256: 741353c4304216ee

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.