Do Travel Sites Block AI Crawlers? Sealed robots.txt Data

Jun 13, 2026

Travel sites occupy a distinctive position in the AI-crawling debate. Booking platforms, hotel chains, and review aggregators collectively hold some of the most commercially valuable content on the web: user reviews, price data, destination guides, and accommodation details that AI assistants actively seek to surface. Yet when US Tech Automations Research checked the robots.txt files of 9 prominent Travel sites in June 2026, the majority remained open to AI crawlers.

3 of 9 Travel sites block at least one AI crawler — a 33.3% rate.

That places Travel in sixth position among the 10 categories tracked in this Closing Web edition, below the corpus-wide 44.9% average (48 of 107 sites) but above the Finance, Retail, Education, and Government categories. The split within Travel is sharp: review-heavy properties restrict crawlers, while booking and hospitality platforms do not.

All data in this report comes from a sealed snapshot of public robots.txt files. Nothing is estimated, modeled, or extrapolated. Every figure is a direct verbatim count from the snapshot sealed June 13, 2026 (sha 741353c4304216ee).

What the Travel Data Shows

All 9 Travel sites in this corpus returned a parseable robots.txt file, a 100% response rate that Travel shares with Entertainment (9 of 9) and Social (10 of 10). No Travel site is missing a robots.txt. Of those 9, exactly 3 impose any AI-crawler block.

| Metric | Count |
| --- | --- |
| Travel sites checked | 9 |
| Sites with parseable robots.txt | 9 |
| Sites blocking at least one AI crawler | 3 |
| Block rate | 33.3% |
| Sites with no robots.txt | 0 |

The 3 blockers are tripadvisor.com, yelp.com, and lonelyplanet.com. The 6 non-blockers are expedia.com, booking.com, airbnb.com, kayak.com, marriott.com, and hilton.com.

All 9 Travel sites checked returned a parseable robots.txt, a response completeness matched only by Entertainment and Social.

The clean robots.txt return rate is itself informative: every Travel site in this corpus has made an active, machine-readable statement about crawl policy. The question is what that policy says.

The Blockers: tripadvisor.com, yelp.com, lonelyplanet.com

The 3 blocking sites share a content profile: user-generated reviews and editorial travel writing. tripadvisor.com and yelp.com are two of the largest repositories of user-written reviews on the internet. lonelyplanet.com publishes destination guides produced by travel writers and editorial staff.

For review and editorial platforms, the concern about AI-crawler access is legible. User reviews represent a crowdsourced knowledge base built over years. When AI systems train on or retrieve that content, they can synthesize answers about hotels or restaurants without the user ever visiting the source site. That reduces the platform's direct traffic and weakens the commercial case for hosting the reviews in the first place.

lonelyplanet.com's blocking posture is consistent with its editorial model. Destination guides represent significant writer labor; allowing AI systems to freely ingest that content for training or summarization is a meaningful policy question for any editorial travel brand.

tripadvisor.com, yelp.com, and lonelyplanet.com all restrict at least one AI crawler.

The specific bots restricted by each site are not broken out per-site in this sealed dataset — only the binary "blocks any" count is sealed. What the data does confirm is that all 3 chose to restrict, while the 6 non-blockers chose not to.

The Non-Blockers: Booking Platforms and Hotel Chains

The 6 non-blocking sites span booking platforms (expedia.com, booking.com, kayak.com), home-rental (airbnb.com), and major hotel chains (marriott.com, hilton.com). None of their robots.txt files meets this report's blocking criterion for any of the tracked AI-crawler bot strings.

expedia.com is the only Travel site in this corpus with an llms.txt file — a voluntary AI-access declaration separate from robots.txt.

The open posture of booking platforms is worth examining. These sites exist to match users with reservations; their public-facing content is largely structured listings — prices, availability windows, property descriptions — rather than original editorial. Restricting AI crawlers from that content may offer less marginal protection than it would for a review or editorial property.

airbnb.com, marriott.com, and hilton.com also operate behind login and booking flows for their most sensitive data. robots.txt governs what automated systems see on the public-facing site; booking-flow data, guest records, and pricing algorithms are protected by authentication rather than crawl-policy files.

The decision not to block is not necessarily passive. For a booking platform, being accessible to AI assistants that surface travel recommendations may represent upside — if users ask an AI where to book a hotel and the platform is in the training set, that could generate referral-adjacent value. The sealed data records the policy choice; motivation is inferred.

Cross-Category Rankings

Travel blocks AI crawlers at 33.3%, below the 44.9% corpus average.

Travel ranks sixth of 10 categories. At 33.3%, it sits below the corpus-wide 44.9% average but above Finance, Retail, Education, and Government.

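| Category | Sites Checked | With robots.txt | Any Blocker | Block Rate |
| --- | --- | --- | --- | --- |
| News | 20 | 17 | 14 | 82.4% |
| Tech | 15 | 13 | 9 | 69.2% |
| Entertainment | 9 | 9 | 6 | 66.7% |
| Reference | 14 | 11 | 6 | 54.5% |
| Social | 10 | 10 | 4 | 40% |
| Travel | 9 | 9 | 3 | 33.3% |
| Finance | 12 | 11 | 2 | 18.2% |
| Retail | 15 | 12 | 2 | 16.7% |
| Education | 9 | 7 | 1 | 14.3% |
| Government | 9 | 8 | 1 | 12.5% |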
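
The block rates in the table use sites with a parseable robots.txt as the denominator, not sites checked. A short Python check, using only the figures in the table, reproduces each percentage, Travel's rank, and the corpus-wide totals. The names `ROWS` and `block_rate` are illustrative and not part of the published dataset.

```python
# Rows copied from the table: (category, sites checked, with robots.txt, any blocker)
ROWS = [
    ("News", 20, 17, 14),
    ("Tech", 15, 13, 9),
    ("Entertainment", 9, 9, 6),
    ("Reference", 14, 11, 6),
    ("Social", 10, 10, 4),
    ("Travel", 9, 9, 3),
    ("Finance", 12, 11, 2),
    ("Retail", 15, 12, 2),
    ("Education", 9, 7, 1),
    ("Government", 9, 8, 1),
]


def block_rate(parseable: int, blockers: int) -> float:
    """Blockers as a percentage of sites with a parseable robots.txt."""
    return round(100 * blockers / parseable, 1)


# Per-category rate, then categories ordered from highest to lowest rate.
rates = {name: block_rate(parseable, blockers) for name, _, parseable, blockers in ROWS}
ranking = sorted(rates, key=rates.get, reverse=True)

# Corpus-wide totals are plain column sums.
total_checked = sum(row[1] for row in ROWS)
total_parseable = sum(row[2] for row in ROWS)
total_blockers = sum(row[3] for row in ROWS)
corpus_rate = block_rate(total_parseable, total_blockers)

print(rates["Travel"], ranking.index("Travel") + 1, corpus_rate)
# prints: 33.3 6 44.9
```

The column sums also tie the table to the methodology: 122 sites checked, 107 with a parseable robots.txt, and 48 blockers.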

The category split is revealing. The highest-blocking categories — News (82.4%), Tech (69.2%), Entertainment (66.7%) — are dominated by editorial and content-creation platforms. Travel sits in a middle band, split between editorial-review properties that block and booking-transaction platforms that do not.

News sites lead the corpus at 82.4%, reflecting how strongly journalism and publishing have moved to protect content from AI training. Travel at 33.3% reflects a more divided sector.

Corpus-Wide Operator Leaderboard (All 107 Sites)

These counts are corpus-wide — the most-blocked AI operators across all 107 parseable sites, not Travel-specific.

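| AI Operator | Sites Blocking (of 107) |
| --- | --- |
| Common Crawl | 40 |
| Anthropic | 39 |
| ByteDance | 37 |
| OpenAI | 35 |
| Meta | 35 |
| Apple | 31 |
| Diffbot | 30 |
| Perplexity | 29 |
| Cohere | 27 |
| Google | 25 |
| Amazon | 22 |
| Mistral | 12 |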

Common Crawl leads the 12 tracked operators with 40 blocks across the 107 parseable sites, followed by Anthropic at 39 and ByteDance at 37. These are global figures that span all categories: a Travel site that blocks any of these operators contributes to these totals.

Anthropic is blocked by 39 of 107 sites corpus-wide.

Only 3 of 9 Travel sites block AI crawlers.

Across all 107 sites in the corpus, 48 block at least one AI crawler — 44.9%.

Travel at 33.3% sits below that line. The 3 Travel blockers represent a subset of the 48 total cross-corpus blockers.

Methodology

US Tech Automations Research fetched the robots.txt file for each of the 122 sites in the Closing Web corpus on June 13, 2026. Responses were categorized as parseable (the site returned a robots.txt file with valid syntax), absent, or error. For the 107 parseable responses, we checked for 21 known AI-crawler bot strings across 12 operators. A site counts as "blocking" if any Disallow directive covers "/" under any AI-crawler user-agent.
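
The blocking rule can be sketched in a few lines of Python. This is one reading of the rule, not the pipeline used for the report: it treats "covers /" as a literal `Disallow: /` and counts only groups that name an AI bot explicitly, so a blanket `User-agent: *` block does not count. The bot list is an illustrative subset; the 21 strings checked in the report are not listed here.

```python
# Illustrative subset only: the report checks 21 bot strings, which it does not enumerate.
AI_BOT_STRINGS = {"gptbot", "claudebot", "ccbot"}


def blocks_ai_crawler(robots_txt: str, bots=AI_BOT_STRINGS) -> bool:
    """Return True if any group naming a listed bot contains 'Disallow: /'."""
    group_agents = []   # user-agents named by the group currently being read
    in_rules = False    # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            names_ai_bot = any(agent in bots for agent in group_agents)
            if field == "disallow" and value == "/" and names_ai_bot:
                return True
    return False


sample = (
    "User-agent: *\n"
    "Disallow: /private\n"
    "\n"
    "User-agent: GPTBot\n"
    "User-agent: ClaudeBot\n"
    "Disallow: /\n"
)
print(blocks_ai_crawler(sample))  # prints: True
```

A parser that falls back to the `*` group, such as Python's `urllib.robotparser`, would answer a slightly different question, so the choice of interpretation should be stated alongside any count.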

The snapshot is point-in-time and sealed at sha 741353c4304216ee. Nothing is estimated, modeled, or extrapolated; all figures are verbatim counts. The llms.txt entry for expedia.com was recorded as a separate boolean and does not affect the blocking count.
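
The report does not describe how the seal is computed. As a hedged illustration of what sealing a snapshot can mean, the Python sketch below hashes a canonical JSON form of the records and keeps the first 16 hex characters. The `seal` function, the record fields, and the canonicalization are all assumptions, so this will not reproduce the published value.

```python
import hashlib
import json


def seal(records: list) -> str:
    """Hypothetical seal: first 16 hex chars of SHA-256 over canonical JSON."""
    # Sort records and keys so the same data always serializes to the same bytes.
    canonical = json.dumps(
        sorted(records, key=lambda record: record["site"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Two records using the blocking outcomes stated in this report; field names are invented.
snapshot = [
    {"site": "tripadvisor.com", "blocks_any": True},
    {"site": "expedia.com", "blocks_any": False},
]

sealed = seal(snapshot)
reordered = seal(list(reversed(snapshot)))                             # same data, new order
tampered = seal([{"site": "tripadvisor.com", "blocks_any": False}, snapshot[1]])

print(sealed == reordered, sealed == tampered)  # prints: True False
```

The point of a seal of this kind is that any later change to a recorded value produces a different digest, which is what lets a reader confirm that the figures match the snapshot.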

Who This Is For

This report is relevant for:

  • SEO and content teams at travel publishers and booking platforms tracking AI-access posture across competitors

  • Product leads at travel review platforms evaluating whether to follow tripadvisor.com and yelp.com

  • Data and retrieval teams at AI companies monitoring which travel sources restrict crawler access

  • Competitive intelligence teams mapping robots.txt policy trends in the travel sector

Whether the interest is protecting content or maintaining AI accessibility, understanding the current posture of named peers is the starting point.

Automating AI-Access Monitoring in Travel

The 3 blockers in Travel — tripadvisor.com, yelp.com, lonelyplanet.com — have made an active robots.txt policy choice. The 6 non-blockers have made an equally active one. Both choices can change without announcement; robots.txt files are updated at the webmaster level and rarely publicized.

For a competitive-intelligence or SEO team tracking travel-sector AI policy, the question is not just "what is the policy today" but "when did it change." Manual re-checks across a domain watchlist do not scale. US Tech Automations builds workflows that schedule robots.txt fetches, parse changes in Disallow directives for specific bot strings, and route alerts to the people who need them.
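
The change-detection step of such a workflow can be sketched as a comparison between two stored copies of a site's robots.txt. This Python example is a minimal illustration, not the US Tech Automations workflow: fetching, scheduling, and alert routing are left out, and the function names and sample file contents are hypothetical.

```python
WATCHED_BOTS = ("GPTBot", "ClaudeBot")  # bot strings named in this report; extend as needed


def blocked_bots(robots_txt: str, bots=WATCHED_BOTS) -> set:
    """Return the watched bots that some group disallows from '/'."""
    watched = {bot.lower(): bot for bot in bots}
    blocked, agents, in_rules = set(), [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(watched[a] for a in agents if a in watched)
    return blocked


def policy_changes(old_txt: str, new_txt: str) -> dict:
    """Compare two snapshots and report bots whose site-wide block status changed."""
    before, after = blocked_bots(old_txt), blocked_bots(new_txt)
    return {"newly_blocked": sorted(after - before), "unblocked": sorted(before - after)}


# Hypothetical file contents for two consecutive checks of the same site.
yesterday = "User-agent: *\nDisallow: /checkout\n"
today = yesterday + "\nUser-agent: GPTBot\nDisallow: /\n"

print(policy_changes(yesterday, today))
# prints: {'newly_blocked': ['GPTBot'], 'unblocked': []}
```

Comparing parsed block status rather than raw file text keeps alerts focused on policy changes, since unrelated edits such as a new sitemap line would not trigger one.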

The same automation applies across all 10 categories in this corpus. A team monitoring Finance sites can use the same workflow pattern — as documented in the Finance report — to stay current on policy shifts without manually re-checking dozens of URLs.

For an SEO director at a travel booking platform, knowing the day a competitor changes its stance on GPTBot or ClaudeBot is a concrete strategic signal that can inform content positioning and AI-partnership decisions.

Key Takeaways

  • All 9 Travel sites in this corpus returned a parseable robots.txt, with no missing files.

  • 3 of those 9 block at least one AI crawler: tripadvisor.com, yelp.com, and lonelyplanet.com.

  • The 6 non-blockers are expedia.com, booking.com, airbnb.com, kayak.com, marriott.com, and hilton.com.

  • Travel ranks 6th of 10 categories at 33.3% — below the 44.9% corpus-wide average.

  • expedia.com is the only Travel site with an llms.txt file.

  • The blocker/non-blocker split maps closely onto editorial-review vs. booking-transaction content models.

Review and editorial travel properties block AI crawlers; booking platforms and hotel chains do not — as of June 2026.

FAQ

Q: Why do tripadvisor.com and yelp.com block AI crawlers while booking platforms do not?

A: The sealed data records the policy choice, not the stated rationale. What the data shows is a pattern: sites whose primary value is user-generated or editorial content block AI crawlers, while sites whose primary value is transactional listings do not. Review content represents an aggregated corpus that AI systems can summarize into direct answers; that makes the policy calculus different from a hotel booking flow where the valuable data is behind authentication.

Q: Does blocking AI crawlers in robots.txt guarantee those crawlers stay out?

A: No. robots.txt is an honor-system protocol. Compliant crawlers — operated by companies that have agreed to respect the standard — will follow Disallow directives. Non-compliant crawlers are unaffected. Stronger protections (authentication, rate limiting, IP blocking) provide technical enforcement that robots.txt does not.

Q: What is expedia.com signaling with its llms.txt file?

A: llms.txt is a voluntary, emerging convention where a site publishes a plain-text file describing its content and preferred AI-access terms. The presence of llms.txt alongside a non-blocking robots.txt suggests expedia.com wants to actively engage with AI operators rather than restrict them — communicating what the site contains and how it prefers that content to be used. The convention is not standardized or enforced; it is advisory.

Q: How does Travel compare to Entertainment in this corpus?

A: Entertainment sits at 66.7% (6 of 9 parseable sites blocking), placing it third overall. See the Entertainment report for a detailed breakdown. Travel at 33.3% has the same sample size (9 sites with parseable robots.txt) but a very different outcome: Travel has 3 blockers where Entertainment has 6. The difference reflects Entertainment sites' heavy investment in original content: streaming platforms, music publishers, and entertainment trade press all have strong incentives to control AI access to their catalogs and coverage.

Q: Will these numbers change over time?

A: This report reflects a point-in-time snapshot sealed June 13, 2026. robots.txt files are not static; webmasters update them in response to legal developments, policy shifts, and commercial negotiations with AI operators. The sealed snapshot guarantees the figures in this report are accurate as of that date. Future editions of the Closing Web series will track changes over time.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 13, 2026 (snapshot sha 741353c4304216ee).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Travel Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-travel-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 741353c4304216ee

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.