Who Blocks PerplexityBot? 29 of 107 Top Sites Do

Jun 13, 2026

Perplexity's crawlers are blocked by 29 of 107 prominent sites in our June 2026 robots.txt snapshot — a 27.1% refusal rate. Perplexity enters this study as one of the smaller pure-AI-search operators, yet it still draws exclusions from a significant cross-section of publishers. The company operates two tracked user-agents — PerplexityBot and Perplexity-User — each of which generates its own block signal, making the per-bot numbers an interesting window into how sites think about different modes of AI access.

"Blocking" means a site's robots.txt explicitly names one of Perplexity's user-agents in a group with a Disallow rule, typically Disallow: /, that covers a meaningful section of the site. The operator figure of 29 is the deduplicated count of domains that block at least one Perplexity user-agent. The per-bot counts are tallied separately, so their sum can exceed 29: a site that blocks both crawlers is counted once under each bot.
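The definition above can be sketched in code. This is a minimal illustration, not the parser used for the report: the function name explicitly_blocked_agents and the sample file are hypothetical, and the sketch assumes the simplest case only, a group that names the agent and contains a whole-site Disallow: /. It ignores wildcard groups, Allow overrides, and partial-path rules.

```python
def explicitly_blocked_agents(robots_txt: str, agents: list[str]) -> set[str]:
    """Return the agents named in a group that contains `Disallow: /`.

    Simplified sketch: wildcard (*) groups are not counted as explicit
    blocks, and Allow overrides or partial paths are ignored.
    """
    wanted = {a.lower(): a for a in agents}
    blocked: set[str] = set()
    group_agents: list[str] = []  # user-agents named in the current group
    in_rules = False              # True once the group has seen a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(wanted[a] for a in group_agents if a in wanted)
    return blocked


# Hypothetical robots.txt used only to exercise the function.
SAMPLE = """\
User-agent: *
Disallow: /private

User-agent: PerplexityBot
User-agent: GPTBot
Disallow: /

User-agent: Perplexity-User
Disallow: /tmp
"""

result = explicitly_blocked_agents(SAMPLE, ["PerplexityBot", "Perplexity-User"])
print(result)
```

In this sample only PerplexityBot qualifies: Perplexity-User is named, but its group restricts /tmp rather than the whole site, and the wildcard group does not count as naming either agent.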

29 of 107 sites with parseable robots.txt block PerplexityBot as of June 13, 2026.

All figures are verbatim counts from public robots.txt files fetched and sealed point-in-time on June 13, 2026, across a curated set of 122 prominent sites; 107 returned a parseable robots.txt and percentages are over those 107. robots.txt is an honor-system standard — it measures the site operator's stated intent, not a technical firewall. These numbers will not change as sites later edit their files.

How Often Perplexity Is Refused

Perplexity runs two crawlers in this dataset: PerplexityBot (its primary indexing agent) and Perplexity-User (which emulates a human user session for real-time retrieval). PerplexityBot is blocked at 29 sites — equal to the operator total — while Perplexity-User is blocked at 17 sites. The gap between those two figures tells a specific story: some site operators are comfortable allowing user-mode access but want to restrict indexing-mode crawls, or their robots.txt templates have not been updated to include the Perplexity-User string.

User-Agent	Sites Blocking
PerplexityBot	29
Perplexity-User	17
Operator total (deduplicated)	29

The operator total matching the PerplexityBot count means every site that blocks Perplexity does so via PerplexityBot; there are no sites in this corpus that block Perplexity-User alone without also blocking PerplexityBot.
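The relationship between the three published counts follows from inclusion-exclusion, and a short check makes the arithmetic explicit. The variable names below are illustrative; the numbers are the ones in the table above.

```python
# Counts from the table above.
bot_blockers = 29      # sites blocking PerplexityBot
user_blockers = 17     # sites blocking Perplexity-User
operator_total = 29    # deduplicated: sites blocking at least one of the two
parseable_sites = 107  # denominator for all percentages

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
both = bot_blockers + user_blockers - operator_total
bot_only = bot_blockers - both
user_only = user_blockers - both
refusal_rate = round(100 * operator_total / parseable_sites, 1)

print(both, bot_only, user_only, refusal_rate)
```

The result is 17 sites blocking both agents, 12 blocking PerplexityBot only, and none blocking Perplexity-User only, with an operator-level rate of 27.1%.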

"29 of 107 prominent sites — 27.1% — block PerplexityBot, with 17 of those 29 also blocking Perplexity-User, as recorded in our point-in-time June 13, 2026 snapshot."

The dual-crawler structure is worth understanding if your team manages a robots.txt policy. Blocking PerplexityBot addresses standard web crawling; blocking Perplexity-User addresses the retrieval mode that powers Perplexity's live answer engine. A site that blocks only PerplexityBot may still have its content surfaced in Perplexity answers if the Perplexity-User string is not also excluded. The 12 sites that block PerplexityBot but not Perplexity-User may be unaware of this distinction.

17 of 29 PerplexityBot blockers also block Perplexity-User; 12 do not.

"The 12-site gap between PerplexityBot blocks (29) and Perplexity-User blocks (17) represents policies that address indexing but leave real-time retrieval open — a split not seen in operators with a single crawler string."

For a comparison with operators that use a single tracked crawler, see the Apple Applebot-Extended report (31 sites, single bot) and the ByteDance Bytespider data (37 sites, single bot).

Which Industries Block Perplexity

News dominates again at 12 blocks — the highest single-category count in Perplexity's profile. Tech follows at 6, Entertainment at 4. The distribution is slightly tighter across the remaining categories than for ByteDance or Meta, reflecting Perplexity's smaller overall footprint.

Category	Sites Blocking Perplexity Crawlers
News	12
Tech	6
Entertainment	4
Reference	2
Retail	2
Travel	1
Social	1
Government	1

News accounts for a large share of the operator total: 12 of the 29 blockers are News properties. News publishers are Perplexity's most consistent opponents in this robots.txt data, which makes strategic sense. Perplexity's core product is an AI answer engine that synthesizes news and reference content in real time; publishers whose revenue depends on original reporting have direct competitive reasons to exclude it.
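The category shares can be recomputed directly from the table above. The sketch below uses only the published per-category counts; the dictionary name is illustrative.

```python
# Per-category counts copied from the category table.
category_blocks = {
    "News": 12,
    "Tech": 6,
    "Entertainment": 4,
    "Reference": 2,
    "Retail": 2,
    "Travel": 1,
    "Social": 1,
    "Government": 1,
}

total = sum(category_blocks.values())  # should equal the 29-site operator total
shares = {
    category: round(100 * count / total, 1)
    for category, count in category_blocks.items()
}

print(total)
for category, share in shares.items():
    print(f"{category}: {share}%")
```

The categories sum to 29, matching the operator total, and News works out to roughly 41% of all Perplexity blockers.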

Tech media at 6 sites — Wired, Ars Technica, CNET, ZDNet, Mashable, The Verge — reflects a pattern seen across all operators in this study. These publications cover AI closely and tend to adopt explicit AI-crawler policies faster than publishers in less tech-adjacent categories.

Government (congress.gov, 1 site) and Social (linkedin.com, 1 site) make smaller appearances, consistent with their presence across other operator datasets. Reference at 2 — Quora and Investopedia — adds a finance-adjacent signal: Investopedia is the only explicitly financial-reference site to appear in Perplexity's named-blocker list.

Finance appears in some other operator profiles but not in Perplexity's category breakdown. The Travel category contributes 1 site (Yelp), and the Retail category contributes 2 (including Amazon, a large commerce platform that also appears in several sibling reports).

The Named Sites That Block Perplexity

The following 12 sites are drawn from the 29 total Perplexity blockers, selected for the highest headline-crawler block counts.

Site	Category	Headline Crawlers Blocked (of 9)
bbc.com	News	9
bloomberg.com	News	9
usatoday.com	News	9
nytimes.com	News	8
cnn.com	News	8
wired.com	Tech	8
arstechnica.com	Tech	8
ebay.com	Retail	8
rollingstone.com	Entertainment	8
congress.gov	Government	8
amazon.com	Retail	7
linkedin.com	Social	7

BBC, Bloomberg, and USA Today score 9 of 9, the maximum AI-exclusion breadth, which indicates that their Perplexity block is one component of a blanket AI-access policy. Amazon's score of 7 is notable: it suggests a mid-tier policy posture rather than the maximum-exclusion stance of the top-scoring publishers.

News accounts for 12 of the 29 Perplexity blockers, the top category.

The full 29-site list also includes: Forbes, The Atlantic, CNET, ZDNet, Mashable, variety.com, hollywoodreporter.com, billboard.com, washingtonpost.com, theguardian.com, newsweek.com, vox.com, theverge.com, apnews.com, yelp.com, quora.com, and investopedia.com. The presence of AP News (apnews.com, 6 headline blocks) and Investopedia (4 headline blocks) makes Perplexity's named-blocker set slightly distinct from those of some other operators. For a full named-site comparison with another news-heavy operator, see the Anthropic ClaudeBot report.

Methodology and Data Integrity

The Closing Web snapshot was assembled by fetching the public /robots.txt file from each of the 122 curated domains on June 13, 2026. Each file was parsed into a structured user-agent-to-disallow map and sealed under snapshot sha 741353c4304216ee. Only sites that returned an HTTP 200 response with a parseable robots.txt were included in the denominator; 107 of 122 qualified.
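The denominator filter and the sealing step can be sketched as follows. This is an assumption-heavy illustration: the report does not describe how its snapshot identifier is derived, so the canonical-JSON-plus-SHA-256 approach, the function name seal_snapshot, and the sample domains are all hypothetical.

```python
import hashlib
import json


def seal_snapshot(fetched):
    """Filter fetch results to the denominator and hash them.

    `fetched` maps domain -> (http_status, rules), where `rules` is a
    user-agent-to-disallow map, or None if the body could not be parsed.
    """
    # Keep only HTTP 200 responses with a parseable robots.txt.
    included = {
        domain: rules
        for domain, (status, rules) in fetched.items()
        if status == 200 and rules is not None
    }
    # Canonical serialization so the same data always hashes identically.
    canonical = json.dumps(included, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return included, digest


# Hypothetical fetch results for three placeholder domains.
fetched = {
    "a.example": (200, {"perplexitybot": ["/"]}),
    "b.example": (404, None),   # no robots.txt served
    "c.example": (200, None),   # HTTP 200, but the body was not parseable
}

included, digest = seal_snapshot(fetched)
print(len(included), "of", len(fetched), "sites in the denominator")
print(digest)
```

Sorting keys before hashing is the design choice that matters here: it makes the digest independent of fetch order, so re-running the same snapshot produces the same identifier.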

Every figure in this report is a verbatim count drawn from that sealed snapshot. Nothing is estimated, modeled, or extrapolated. Operator block counts are deduplicated at the domain level so that a site blocking both PerplexityBot and Perplexity-User counts as 1 toward the 29-site operator total. The per-bot counts (29 and 17) are not additive and should not be summed to produce a headline figure.
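Domain-level deduplication can be illustrated with a few (domain, bot) records. The domains below are placeholders, not sites from the corpus, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical block records: one row per (domain, blocked user-agent).
records = [
    ("a.example", "PerplexityBot"),
    ("a.example", "Perplexity-User"),
    ("b.example", "PerplexityBot"),
    ("b.example", "PerplexityBot"),  # duplicate row, must not double-count
]

unique = set(records)

# Per-bot counts: each domain counts once per bot it blocks.
per_bot = Counter(bot for _, bot in unique)

# Operator total: each domain counts once, however many bots it blocks.
operator_total = len({domain for domain, _ in unique})

print(dict(per_bot), operator_total)
```

Here the per-bot counts sum to 3 while the operator total is 2, which is why the per-bot figures should not be added together.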

The snapshot tracks 21 bots and 12 operators total. The 48 sites (44.9%) that block at least one AI crawler set the ceiling for any single operator's block rate; Perplexity's 27.1% sits below it. The 9 starred sites (8.4%) are the most prominent properties in the corpus; all appear in the study's aggregate blocker lists. The 20 sites with an llms.txt file (18.7%) are a separate signal not conflated here with robots.txt exclusion.

Put This Data to Work

The PerplexityBot vs. Perplexity-User gap is the most actionable data point in this report for teams that manage content access policy. A site that blocks only PerplexityBot while leaving Perplexity-User unrestricted is signaling either an incomplete policy or a deliberate choice to allow real-time retrieval while blocking indexing crawls. Understanding which side of that line your competitors sit on is valuable competitive intelligence.

US Tech Automations builds monitoring pipelines that parse this per-user-agent granularity automatically. A nightly fetch of /robots.txt across a site list, combined with a diff engine that flags user-agent-level changes, gives a content or SEO lead a daily report on which operators any given publisher is restricting — and whether they changed the Perplexity-User directive independently of PerplexityBot. That level of signal is not available from manual sampling.
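A user-agent-level diff of the kind described above might look like the following. This is a minimal sketch, not the production pipeline: the function name diff_agent_rules is hypothetical, and it assumes each night's robots.txt has already been parsed into a map from lowercase user-agent to a list of Disallow paths.

```python
def diff_agent_rules(old, new):
    """Report per-user-agent changes between two parsed robots.txt maps.

    Both arguments map a user-agent string to a list of Disallow paths.
    Agents whose rules are unchanged are omitted from the result.
    """
    changes = {}
    for agent in sorted(old.keys() | new.keys()):
        before = set(old.get(agent, ()))
        after = set(new.get(agent, ()))
        if before != after:
            changes[agent] = {
                "added": sorted(after - before),
                "removed": sorted(before - after),
            }
    return changes


# Hypothetical example: a site adds a Perplexity-User block overnight.
yesterday = {"perplexitybot": ["/"]}
today = {"perplexitybot": ["/"], "perplexity-user": ["/"]}

changes = diff_agent_rules(yesterday, today)
print(changes)
```

Because the diff is keyed by user-agent, a change to the Perplexity-User directive shows up on its own even when the PerplexityBot rules are untouched.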

For data teams building retrieval-augmented generation (RAG) systems, the Perplexity block list is a useful proxy for content that may be difficult to license or access via API. News publishers blocking Perplexity tend to be the same ones with active licensing negotiations or paywalls. A scoring layer that incorporates robots.txt posture as one signal in a broader content-access risk model — pairing robots.txt data with paywall detection, rate-limit signals, and known licensing agreements — is a natural extension of this dataset.
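One way such a scoring layer could be structured is shown below. Everything here is hypothetical: the signal names, the weights, and the licensing discount are placeholders chosen for illustration, not values derived from this dataset.

```python
from dataclasses import dataclass


@dataclass
class AccessSignals:
    blocks_ai_crawler: bool      # robots.txt posture
    has_paywall: bool            # paywall detection
    rate_limited: bool           # rate-limit signals
    has_license_agreement: bool  # known licensing agreement


# Hypothetical weights; a real model would calibrate these against outcomes.
WEIGHTS = {"blocks_ai_crawler": 0.4, "has_paywall": 0.3, "rate_limited": 0.2}
LICENSE_DISCOUNT = 0.5  # assumed: an existing license halves the risk


def access_risk(signals: AccessSignals) -> float:
    """Combine the boolean signals into a rough 0-to-1 risk score."""
    score = sum(w for name, w in WEIGHTS.items() if getattr(signals, name))
    if signals.has_license_agreement:
        score *= LICENSE_DISCOUNT
    return round(score, 2)


blocked_and_paywalled = AccessSignals(True, True, False, False)
print(access_risk(blocked_and_paywalled))
```

The robots.txt signal carries the largest weight in this sketch only to reflect the framing of this report; treating it as one input among several is the point of the design.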

If your team monitors AI-access policy across the 12 operators in this study, the Meta AI crawler report is a useful contrast case for another dual-user-agent operator, and the OpenAI GPTBot data provides the highest-block-rate benchmark in the corpus.

Frequently Asked Questions

Q: Does blocking PerplexityBot stop my content from appearing in Perplexity answers?

A: Blocking PerplexityBot prevents Perplexity's standard indexing crawler from fetching your content. However, Perplexity also operates Perplexity-User for real-time retrieval. A complete exclusion requires blocking both user-agents. This data shows that 12 of the 29 sites blocking PerplexityBot do not also block Perplexity-User.
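A policy that names both user-agents can be checked locally with Python's standard urllib.robotparser module. The robots.txt text and the URL below are illustrative examples, not a recommendation for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Example policy: one group naming both Perplexity user-agents.
ROBOTS_TXT = """\
User-agent: PerplexityBot
User-agent: Perplexity-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # hypothetical URL
verdicts = {
    agent: parser.can_fetch(agent, url)
    for agent in ("PerplexityBot", "Perplexity-User", "Googlebot")
}
print(verdicts)
```

Under this policy both Perplexity agents are disallowed, while an agent the file does not name, such as Googlebot, remains allowed. As the report notes, robots.txt is an honor-system standard, so this expresses intent rather than enforcing access control.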

Q: What is the difference between PerplexityBot and Perplexity-User?

A: PerplexityBot is Perplexity's standard web crawler, similar in behavior to a search engine spider. Perplexity-User emulates a browser user-agent and is used for real-time content retrieval to power live answers. The two serve different functions in Perplexity's product architecture.

Q: Does blocking Perplexity crawlers hurt my search SEO?

A: No. Google, Bing, and Apple each use separate, independent user-agent strings for their search indexing crawlers. Blocking PerplexityBot or Perplexity-User has no known effect on Google Search, Bing, or any other major organic search ranking.

Q: Why do News sites dominate Perplexity's block list?

A: Perplexity's core product — an AI answer engine that synthesizes and cites published content — directly competes with the news publishers it indexes. Publishers like the New York Times, Washington Post, and AP News have financial and editorial incentives to prevent their content from being absorbed and summarized without compensation or traffic referral. 12 of the 29 Perplexity blocks fall in the News category.

Q: How current is this data?

A: All figures come from a single fetch on June 13, 2026, sealed under snapshot sha 741353c4304216ee. robots.txt files can change at any time after that date; this report reflects only the state captured on June 13, 2026.

Key Takeaways

29 of 107 sites with parseable robots.txt (27.1%) block at least one Perplexity crawler as of June 13, 2026.
PerplexityBot is blocked at 29 sites; Perplexity-User at 17 — the gap means 12 sites restrict indexing but not real-time retrieval.
News is Perplexity's largest blocker category, with 12 of the 29 blocking sites.
AP News and Investopedia are distinctive named blockers not commonly seen in other operators' lists at the same frequency.
48 of the 107-site corpus block at least one AI crawler (44.9%); Perplexity's 27.1% places it in the lower tier of the 12 tracked operators by block rate.
The 12-site gap between PerplexityBot and Perplexity-User blocks is the most actionable policy signal in this dataset for site operators reviewing their own robots.txt coverage.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 741353c4304216ee).

Cite this report

US Tech Automations Research, 2026-06 edition. “Who Blocks PerplexityBot? 29 of 107 Top Sites Do.” https://ustechautomations.com/resources/blog/who-blocks-perplexitybot-2026

Sealed snapshot sha256: 741353c4304216ee