Who Blocks Apple's Applebot-Extended? 31 of 107 Sites Do
Apple's AI crawler, Applebot-Extended, is blocked by 31 of 107 prominent sites in our June 2026 robots.txt snapshot — a 29% refusal rate. That is lower than ByteDance's 34.6% and Meta's 32.7% from the same dataset, but still places Apple in the upper tier of restricted operators. The figure is notable given Apple's consumer-brand reputation: even a company with deep product trust faces broad content-publisher resistance when it fields an AI training crawler.
"Blocking" here means a site's robots.txt names Applebot-Extended in a User-agent line and gives it a Disallow rule covering the whole site (Disallow: /) or a significant portion of it. Apple operates two Applebot variants: the standard Applebot (used for Siri, Spotlight, and Safari search features) and Applebot-Extended (used for AI training). This study tracks Applebot-Extended specifically because it is the AI-facing crawler. Blocking it does not affect standard Applebot indexing.
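The definition above can be checked mechanically. The sketch below is a minimal Python illustration that assumes "blocked" means the site-wide case only (an explicit Applebot-Extended group with Disallow: /); rules that cover "a significant portion" of a site need a judgment call it does not model. The function names are hypothetical. It matches user-agent tokens exactly instead of using Python's urllib.robotparser, which, as we read its source, applies a group whenever the group's agent name is a substring of the crawler name, so an Applebot group would also be applied to Applebot-Extended.

```python
from collections import defaultdict


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lower-cased user-agent token to the (field, value) rules of its group.

    Consecutive User-agent lines share the rules that follow them, in the
    spirit of RFC 9309. Comments and fields other than Allow/Disallow are ignored.
    """
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    current_agents: list[str] = []
    seen_rule = False  # True once the current group has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current_agents, seen_rule = [], False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return dict(groups)


def blocks_site_wide(robots_txt: str, token: str) -> bool:
    """True if the group that names `token` exactly disallows the root path "/".

    Matching is exact and case-insensitive: an "Applebot" group is NOT treated
    as naming "Applebot-Extended", and a "*" wildcard group is not counted.
    An "Allow: /" in the same group cancels the block (Allow wins ties).
    """
    rules = parse_groups(robots_txt).get(token.lower(), [])
    return ("disallow", "/") in rules and ("allow", "/") not in rules


sample = """\
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Disallow: /
"""
print(blocks_site_wide(sample, "Applebot-Extended"))  # True
print(blocks_site_wide(sample, "Applebot"))           # False
```

The sample file is the pattern this report counts as a block: Apple's AI use is refused while standard Applebot stays allowed.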
Applebot-Extended is blocked at 31 of 107 sites — a 29% refusal rate.
All figures are verbatim counts from public robots.txt files fetched and sealed point-in-time on June 13, 2026, across a curated set of 122 prominent sites; 107 returned a parseable robots.txt and percentages are over those 107. robots.txt is an honor-system standard — it measures the site operator's stated intent, not a technical firewall. These numbers will not change as sites later edit their files.
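Every rate in this report divides by the 107 parseable files rather than the 122 curated domains. The short Python check below reproduces the headline percentages from counts quoted in this report; the only number it adds is the 122-domain comparison, included to show why the denominator matters.

```python
# Counts quoted in this report (snapshot of June 13, 2026).
CURATED_SITES = 122
PARSEABLE_SITES = 107  # sites returning HTTP 200 with a parseable robots.txt


def rate(count: int, base: int = PARSEABLE_SITES) -> float:
    """Block rate as a percentage, rounded to one decimal place."""
    return round(100 * count / base, 1)


blocked_sites = {
    "Apple (Applebot-Extended)": 31,
    "Meta": 35,
    "ByteDance (Bytespider)": 37,
    "Any of the 21 tracked AI crawlers": 48,
}

for label, count in blocked_sites.items():
    print(f"{label}: {count}/{PARSEABLE_SITES} = {rate(count)}%")

# Dividing by all 122 curated domains instead would understate the Apple rate.
print(f"Apple over all curated domains: {rate(31, CURATED_SITES)}%")
```

Run as-is, it prints 29.0%, 32.7%, 34.6%, and 44.9%, matching the report's figures, and 25.4% for the 122-domain base.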
How Often Apple Is Refused
Apple runs a single AI-specific crawler tracked in this study: Applebot-Extended. All 31 blocks come from this one user-agent, so the operator count and the per-bot count are identical. There is no secondary crawler muddying the numbers.
| User-Agent | Sites Blocking |
|---|---|
| Applebot-Extended | 31 |
That clean single-bot picture makes Apple's data straightforward to interpret. When a site blocks Applebot-Extended, the intent is unambiguous: the site wants to keep Apple's AI away from its content while leaving it open to the standard Applebot crawler behind Apple's search and Siri features. This targeted precision distinguishes Apple's blocker profile from operators like Meta, where publishers must navigate multiple user-agent strings.
"31 of 107 prominent sites — 29% — explicitly exclude Applebot-Extended from their content as of June 13, 2026, according to our sealed robots.txt snapshot."
In the broader corpus context, 48 of 107 sites (44.9%) block at least one of the 21 tracked AI crawlers. Apple's 29% is well below that any-block rate, but 31 sites across 7 content categories still represent a deliberate, widespread policy response to Apple's AI data-collection ambitions.
31 Applebot-Extended blocks span 7 content categories across the 107-site corpus.
"News alone accounts for 12 of the 31 Applebot-Extended blocks — the single largest category share of any operator in this snapshot, relative to total block count."
For a direct peer comparison, see the Meta AI crawler report (35 sites, 32.7%) and the ByteDance Bytespider data (37 sites, 34.6%) from this same sealed dataset.
Which Industries Block Apple
News leads with 12 blocks, making it the single largest category by a wide margin. Tech follows with 8, Entertainment contributes 4, and Reference adds 3 — a category distribution that differs slightly from ByteDance and Meta in giving Reference a larger relative share.
| Category | Sites Blocking Applebot-Extended |
|---|---|
| News | 12 |
| Tech | 8 |
| Entertainment | 4 |
| Reference | 3 |
| Social | 2 |
| Retail | 1 |
| Travel | 1 |
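The category table reconciles exactly with the 31-block total. A minimal Python tally, restating the table as a dictionary, shows each category's share of Apple's own blocks (a share of 31, not of the 107-site corpus):

```python
# Category counts restated from the table above.
category_blocks = {
    "News": 12, "Tech": 8, "Entertainment": 4, "Reference": 3,
    "Social": 2, "Retail": 1, "Travel": 1,
}

total = sum(category_blocks.values())
assert total == 31, "categories should reconcile with the 31-site total"

# Share of Apple's blocks per category, largest first.
for category, count in sorted(category_blocks.items(), key=lambda kv: -kv[1]):
    print(f"{category:<13}{count:>3}  {100 * count / total:5.1f}%")

news_and_tech = category_blocks["News"] + category_blocks["Tech"]
print(f"News + Tech: {news_and_tech}/{total} = {100 * news_and_tech / total:.1f}%")
```

News and Tech together come to 20 of 31, or 64.5%, just under two-thirds.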
The News-heavy pattern mirrors every other major operator in this study. Publishers such as the Washington Post, The Guardian, and AP News treat their archives as licensable assets, and they have been public about resisting AI training use without compensation or consent.
Reference at 3 sites (Healthline, Quora, Dictionary.com) is an elevated share compared with many other operators. Reference sites contain definitional, encyclopedic, or health content that is particularly high-value for AI training. That value proposition likely prompts proactive exclusion even from publishers with lower overall AI-block rates.
Notably absent from Apple's blocker set, compared to Meta and ByteDance: Finance and Government categories contribute no Apple blockers at all. Congress.gov and Fool.com, which block those other operators, do not appear in Apple's 31-site list. That gap may reflect timing — Apple's AI crawler is newer in the market — or a deliberate assessment that Apple's AI products pose lower immediate risk to those particular content categories.
The Social category contributes 2 sites and Retail just 1, consistent with a narrower footprint in commerce and community platforms relative to operators like Meta, where the social connection creates more direct competitive tension.
The Named Sites That Block Apple
The 12 representative sites below are drawn from the 31 total Applebot-Extended blockers and ranked by how many of the study's 9 headline AI crawlers each one blocks, a proxy for how comprehensive each site's AI-exclusion policy is.
| Site | Category | Headline Crawlers Blocked (of 9) |
|---|---|---|
| bbc.com | News | 9 |
| bloomberg.com | News | 9 |
| usatoday.com | News | 9 |
| nytimes.com | News | 8 |
| cnn.com | News | 8 |
| wired.com | Tech | 8 |
| arstechnica.com | Tech | 8 |
| ebay.com | Retail | 8 |
| rollingstone.com | Entertainment | 8 |
| variety.com | Entertainment | 8 |
| healthline.com | Reference | 7 |
| tripadvisor.com | Travel | 7 |
BBC, Bloomberg, and USA Today again show maximum scores of 9. Their Applebot-Extended block is one component of a total AI-exclusion policy. Sites with 8 — The New York Times, CNN, Wired, Ars Technica, eBay, Rolling Stone, Variety — follow a similarly broad playbook.
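The "Headline Crawlers Blocked (of 9)" column is a per-site count, and the table is sorted on it. A minimal Python sketch of that scoring, assuming each site's robots.txt has already been reduced to the set of user-agent tokens it blocks. The report does not list its 9 headline crawlers, so HEADLINE_CRAWLERS is a hypothetical placeholder built from well-known AI crawler tokens, and the example domains are made up. Ties are broken alphabetically here for determinism.

```python
# Hypothetical placeholder: the report does not publish its 9 "headline"
# crawlers, so these real-world tokens only illustrate the ranking step.
HEADLINE_CRAWLERS = {
    "gptbot", "claudebot", "ccbot", "google-extended", "applebot-extended",
    "bytespider", "meta-externalagent", "perplexitybot", "amazonbot",
}


def rank_by_headline_blocks(blocked_by_site: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Score each site by how many headline crawlers it blocks; highest first, ties by name."""
    scores = {
        site: len({token.lower() for token in tokens} & HEADLINE_CRAWLERS)
        for site, tokens in blocked_by_site.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


demo = {
    "news.example": {"GPTBot", "ClaudeBot", "Applebot-Extended"},
    "shop.example": {"Applebot-Extended"},
    "blog.example": {"GPTBot", "SomeOtherBot"},  # SomeOtherBot is not a headline crawler
}
print(rank_by_headline_blocks(demo))
# [('news.example', 3), ('blog.example', 1), ('shop.example', 1)]
```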
News (12) and Tech (8) produce 20 of 31 total Applebot-Extended blocks.
The remaining blockers on the 31-site list include forbes.com, theatlantic.com, zdnet.com, mashable.com, hollywoodreporter.com, billboard.com, washingtonpost.com, theguardian.com, newsweek.com, vox.com, theverge.com, apnews.com, techcrunch.com, tumblr.com, medium.com, quora.com, venturebeat.com, and dictionary.com. AP News (apnews.com, 6 headline blocks) and Dictionary.com (3 headline blocks) are two of the more distinctive entries in Apple's named-blocker set compared with other operators. For the named-site list of another operator in the same snapshot, see the Anthropic ClaudeBot report.
Methodology and Data Integrity
The Closing Web snapshot was assembled by fetching the public /robots.txt file from each of the 122 curated domains on June 13, 2026. Each file was parsed into a structured user-agent-to-disallow map and sealed under snapshot sha 741353c4304216ee. Only sites that returned an HTTP 200 response with a parseable robots.txt were included in the denominator; 107 of 122 qualified.
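The inclusion rule in this paragraph (HTTP 200 and a parseable file) is easy to state in code. A minimal Python sketch using only the standard library follows; the report does not define "parseable" further, so the field check in is_parseable is an assumption, as is the placeholder User-Agent string, and both function names are hypothetical.

```python
import urllib.error
import urllib.request

ROBOTS_FIELDS = {"user-agent", "allow", "disallow", "sitemap"}


def is_parseable(status: int, body: str) -> bool:
    """Assumed inclusion rule: HTTP 200 plus at least one recognizable robots.txt field."""
    if status != 200:
        return False
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" in line and line.split(":", 1)[0].strip().lower() in ROBOTS_FIELDS:
            return True
    return False


def fetch_robots(domain: str, user_agent: str = "robots-snapshot-sketch/0.1",
                 timeout: float = 10.0) -> tuple[int, str]:
    """Fetch https://<domain>/robots.txt and return (status, body).

    urllib follows redirects by default. A status of 0 marks a network failure.
    """
    request = urllib.request.Request(f"https://{domain}/robots.txt",
                                     headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code, ""
    except OSError:  # DNS errors, refused connections, timeouts
        return 0, ""
```

A driver loop would call fetch_robots for each curated domain and keep only those where is_parseable returns True; in this snapshot that filter left 107 of 122.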
Every figure in this report is a verbatim count drawn from that sealed snapshot; nothing is estimated, modeled, or extrapolated. Operator block counts are deduplicated at the domain level. Because Apple operates a single tracked AI crawler, no deduplication was required: the 31-site operator count equals the Applebot-Extended per-bot count exactly.
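Domain-level deduplication matters for multi-bot operators even though it is a no-op for Apple. A minimal Python sketch follows; the bot-to-operator mapping is a hypothetical excerpt (the study's full 21-bot mapping is not reproduced here), and the domains are made up.

```python
# Hypothetical token-to-operator mapping; the study tracks 21 bots across
# 12 operators, but its full mapping is not reproduced here.
BOT_TO_OPERATOR = {
    "applebot-extended": "Apple",
    "meta-externalagent": "Meta",
    "facebookbot": "Meta",
}


def operator_block_counts(blocked_by_site: dict[str, set[str]],
                          bot_to_operator: dict[str, str]) -> dict[str, int]:
    """Count sites per operator, so a site blocking two of one operator's bots counts once."""
    sites_per_operator: dict[str, set[str]] = {}
    for site, tokens in blocked_by_site.items():
        for token in tokens:
            operator = bot_to_operator.get(token.lower())
            if operator is not None:
                sites_per_operator.setdefault(operator, set()).add(site)
    return {op: len(sites) for op, sites in sites_per_operator.items()}


demo = {
    "a.example": {"meta-externalagent", "FacebookBot"},  # two Meta bots, one site
    "b.example": {"Applebot-Extended"},
    "c.example": {"meta-externalagent", "Applebot-Extended"},
}
print(operator_block_counts(demo, BOT_TO_OPERATOR))  # {'Meta': 2, 'Apple': 2}
```

Summing Meta's per-bot counts would give 3 here; the domain-level count is 2. With a single mapped token, Apple's operator count always equals its per-bot count.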
The snapshot also tracks 21 bots across 12 operators, with 48 of 107 sites (44.9%) blocking at least one. The 9 starred sites (8.4% of the 107-site base) represent the most prominent properties in the corpus; all 9 appear somewhere in the study's blocker lists, and the majority appear in Apple's 31-site set. The 20 sites that have published an llms.txt file (18.7%) represent a separate, voluntary disclosure layer that this report does not conflate with robots.txt blocking.
Put This Data to Work
Applebot-Extended's distinct user-agent string makes it one of the easier AI crawlers to monitor in a robots.txt pipeline. Because Apple documents a clear split between its search crawler and its AI crawler, a well-configured watch process can separate the two signals, for example by tracking whether a site that originally blocked only generic Applebot has since added Applebot-Extended.
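Separating the two Apple signals amounts to classifying each fetch by how it treats Applebot and Applebot-Extended, then comparing consecutive fetches. A minimal Python sketch, assuming each robots.txt has already been parsed into a mapping from lower-cased user-agent token to (field, value) rules and counting only site-wide Disallow: / rules; the stance labels and function names are hypothetical.

```python
Groups = dict[str, list[tuple[str, str]]]  # lower-cased agent token -> (field, value) rules


def apple_stance(groups: Groups) -> str:
    """Classify a parsed robots.txt by how it treats Apple's two user agents."""

    def blocks_root(token: str) -> bool:
        rules = groups.get(token, [])
        return ("disallow", "/") in rules and ("allow", "/") not in rules

    search, ai = blocks_root("applebot"), blocks_root("applebot-extended")
    if search and ai:
        return "blocks-both"
    if ai:
        return "ai-only"
    if search:
        return "search-only"
    return "neither"


def newly_blocks_apple_ai(before: Groups, after: Groups) -> bool:
    """True when a site that blocked only generic Applebot now blocks Applebot-Extended."""
    return apple_stance(before) == "search-only" and apple_stance(after) in ("blocks-both", "ai-only")


before = {"applebot": [("disallow", "/")]}
after = {"applebot": [("disallow", "/")], "applebot-extended": [("disallow", "/")]}
print(apple_stance(before), "->", apple_stance(after))  # search-only -> blocks-both
print(newly_blocks_apple_ai(before, after))              # True
```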
US Tech Automations builds exactly that kind of monitoring pipeline. The typical setup is a nightly scheduled job that fetches /robots.txt from a configured site list, parses user-agent blocks into a structured diff, and alerts a Slack channel or webhook when Applebot-Extended rules are added, removed, or modified. For a content team or SEO lead, that means you know within 24 hours if a publisher you partner with — or compete against — changes its Apple AI policy.
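The diff step at the heart of such a nightly job can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a description of any production pipeline: each fetch is assumed to be already parsed into a mapping from lower-cased user-agent token to (field, value) rules, the function names are hypothetical, and scheduling (for example a cron entry) is left out. The alert body uses the JSON shape Slack incoming webhooks accept, a single "text" field.

```python
import json
import urllib.request
from typing import Optional

Groups = dict[str, list[tuple[str, str]]]  # lower-cased agent token -> (field, value) rules


def diff_agent_rules(old: Groups, new: Groups, token: str = "applebot-extended") -> Optional[str]:
    """Return "added", "removed", "modified", or None for one site's rules for `token`."""
    old_rules, new_rules = old.get(token), new.get(token)
    if old_rules is None and new_rules is None:
        return None
    if old_rules is None:
        return "added"
    if new_rules is None:
        return "removed"
    # Compare sorted copies so that merely reordering lines is not reported.
    return None if sorted(old_rules) == sorted(new_rules) else "modified"


def alert_body(site: str, change: str, agent: str = "Applebot-Extended") -> bytes:
    """JSON body for a Slack incoming webhook."""
    return json.dumps({"text": f"{site}: {agent} rules {change}"}).encode("utf-8")


def post_alert(webhook_url: str, body: bytes) -> None:
    """POST the alert; urllib raises URLError/HTTPError on failure."""
    request = urllib.request.Request(webhook_url, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=10):
        pass
```

A nightly run would call diff_agent_rules for each site against the previous night's parse and pass any non-None result to alert_body and post_alert.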
For retrieval-pipeline engineers building LLM-powered products, the Applebot-Extended block list is a useful negative signal: a site that already restricts Apple is likely running an AI-access review process and may soon add or tighten rules for other operators. A correlation layer that models this "bellwether" pattern across all 12 operators provides advance warning of policy tightening.
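One simple starting point for that bellwether idea is to measure, among sites that block Apple, how often each other operator is also blocked, then flag Apple-blocking sites that have not yet blocked the operators most often blocked alongside it. This Python sketch is a heuristic with hypothetical names, a hypothetical 0.5 threshold, and made-up domains; it is not a fitted correlation model, and with only 31 Apple-blocking sites the rates would be noisy.

```python
from collections import Counter


def co_block_rates(operators_by_site: dict[str, set[str]], anchor: str = "Apple") -> dict[str, float]:
    """Among sites blocking `anchor`, the share that also block each other operator."""
    anchored = [ops for ops in operators_by_site.values() if anchor in ops]
    if not anchored:
        return {}
    counts = Counter(op for ops in anchored for op in ops if op != anchor)
    return {op: n / len(anchored) for op, n in counts.items()}


def likely_next_blocks(operators_by_site: dict[str, set[str]], anchor: str = "Apple",
                       threshold: float = 0.5) -> dict[str, list[str]]:
    """For each anchor-blocking site, not-yet-blocked operators whose co-block rate meets the threshold."""
    rates = co_block_rates(operators_by_site, anchor)
    watch: dict[str, list[str]] = {}
    for site, ops in operators_by_site.items():
        if anchor not in ops:
            continue
        missing = sorted(op for op, r in rates.items() if r >= threshold and op not in ops)
        if missing:
            watch[site] = missing
    return watch


demo = {
    "a.example": {"Apple", "Meta", "ByteDance"},
    "b.example": {"Apple", "Meta"},
    "c.example": {"Apple"},
    "d.example": {"Meta"},
}
print(likely_next_blocks(demo))  # {'c.example': ['Meta']}
```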
The Reference category outlier (3 blocks, higher than most operators proportionally) is actionable for any team in health tech, edtech, or consumer information. If your content competes with Healthline, Quora, or Dictionary.com in search, understanding how those publishers position their robots.txt toward AI crawlers informs your own positioning strategy. See the Perplexity crawler report for comparison — another operator where Reference sites punch above their numeric weight.
Frequently Asked Questions
Q: Does blocking Applebot-Extended affect Apple Search or Siri results?
A: No. Apple distinguishes between Applebot (used for Spotlight Search, Safari Suggestions, and Siri) and Applebot-Extended (used for AI training). Blocking Applebot-Extended should not affect standard Applebot indexing or your content's presence in Apple search features.
Q: Does blocking Applebot-Extended actually prevent Apple from training on my content?
A: robots.txt is an honor-system signal. A correctly configured Applebot-Extended exclusion tells Apple's compliant crawler to stay out, but it is not a technical barrier. The file measures stated intent; Apple's crawler infrastructure is expected to respect it.
Q: Why does Reference rank higher for Apple than for some other operators?
A: At 3 blocks out of 31, Reference is a small but notable slice of Apple's blocker set. The specific sites — Healthline, Quora, and Dictionary.com — contain high-utility definitional and health content that holds particular value for AI training. Publishers in those niches appear to be tracking AI crawler developments carefully.
Q: Is 29% a low block rate compared to other AI operators?
A: It is lower than ByteDance (34.6%) and Meta (32.7%) in this same snapshot, and well below the "any AI block" rate of 44.9% across the full corpus. However, 31 blocked sites across 7 categories is still a broad and deliberate pattern — not a marginal figure.
Q: How current is this data?
A: All figures come from a single fetch on June 13, 2026, sealed under snapshot sha 741353c4304216ee. robots.txt files can change at any time after that date; this report reflects only the state on June 13, 2026.
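The published value 741353c4304216ee is 16 hexadecimal characters, while a full SHA-256 digest is 64, so it appears to be a shortened prefix. The report does not say exactly which bytes were hashed. The Python sketch below assumes a canonical JSON encoding of the parsed snapshot (sorted keys, compact separators) and shows how a reader could seal a snapshot of their own and compare it against a published prefix; it will not reproduce this report's hash unless the serialization matches exactly. The function names and example data are hypothetical.

```python
import hashlib
import json


def seal(snapshot: dict) -> str:
    """Full SHA-256 hex digest of a canonical JSON encoding of `snapshot`."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_published(snapshot: dict, published: str) -> bool:
    """True if the digest starts with a published (possibly shortened) hash."""
    return seal(snapshot).startswith(published.strip().lower())


snapshot = {
    "a.example": {"applebot-extended": [["disallow", "/"]]},
    "b.example": {"*": [["disallow", "/private/"]]},
}
digest = seal(snapshot)
print(digest[:16], matches_published(snapshot, digest[:16]))
```

Sorting keys makes the digest independent of the order in which domains were fetched, so the same parsed data always seals to the same value.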
Key Takeaways
31 of 107 sites with parseable robots.txt (29%) explicitly block Applebot-Extended as of June 13, 2026.
Apple runs a single tracked AI crawler, so the operator count and per-bot count are identical at 31.
News (12 sites) and Tech (8 sites) together account for 20 of the 31 Applebot-Extended blocks, nearly two-thirds (64.5%).
Reference (3 sites) is proportionally elevated versus other operators, reflecting the high AI-training value of definitional and health content.
Apple's 29% block rate is lower than ByteDance (34.6%) and Meta (32.7%) in the same snapshot, but still spans 7 content categories.
Finance and Government are absent from Apple's 31-site list — a notable contrast with ByteDance and Meta at the same snapshot date.
Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 741353c4304216ee).
Cite this report
US Tech Automations Research, 2026-06 edition. “Who Blocks Apple's Applebot-Extended? 31 of 107 Sites Do.” https://ustechautomations.com/resources/blog/who-blocks-apple-applebot-extended-2026
Sealed snapshot sha256: 741353c4304216ee
Machine-readable data: CSV · JSON · All research & methodology
About the Author

Helping businesses leverage automation for operational efficiency.