Do Wedding Sites Block AI Crawlers? 4 of 8 Do
Wedding content sits at an unusual intersection of high commercial value and deeply personal user intent, and the robots.txt data from the June 2026 Closing Web edition reflects that tension directly. Of 10 Wedding sites included in the snapshot, 8 returned a parseable robots.txt file. Of those 8, exactly 4 block at least one AI crawler, a 50% block rate. That puts the Wedding category at an even split and above the corpus-wide rate of 39.3% (139 of the 354 sites with parseable robots.txt files in this edition).
A robots.txt file is a publicly accessible text document that site operators use to communicate crawl-access policies to automated bots — including AI training agents. It is an honor-system standard: compliant bots follow the directives, but the mechanism is advisory, not technically enforced. What it does reveal is a site operator's stated intent, which is exactly what this sealed-snapshot dataset captures.
4 of 8 Wedding sites block at least one AI crawler.
Wedding sites post a 50% AI-crawler block rate.
Corpus-wide, 139 of 354 sites block at least one AI crawler.
Key Takeaways
The Wedding category is evenly split at the policy level, with half its parseable-robots.txt sites actively restricting AI crawlers and the other half allowing them.
4 of 8 Wedding sites with a parseable robots.txt block at least one AI crawler.
The Wedding block rate of 50% is above the corpus-wide rate of 39.3% across all 354 sites.
Across all 354 corpus sites, CCBot is the most-blocked AI bot — blocked by 109 sites at a rate of 30.8%.
The split between blockers and allowers in the Wedding category is not random. It maps closely to the underlying business model of each property. The blocking sites tend to be editorial-driven marketplaces and media brands; the allowing sites lean toward platform-model vendors and inspiration aggregators.
"4 of 8 Wedding sites with parseable robots.txt files block at least one AI crawler, placing the category at a 50% block rate."
"Across all 354 sites in the June 2026 Closing Web snapshot, 139 block at least one AI crawler — a 39.3% corpus-wide rate that Wedding exceeds."
The Blocking Sites and the Allowers
The four Wedding sites actively blocking AI crawlers through robots.txt directives are weddingwire.com, brides.com, greenweddingshoes.com, and offbeatbride.com. Three of these are established editorial and vendor-directory properties that have built substantial libraries of reviews, vendor listings, and planning content over many years. The fourth — offbeatbride.com — is a long-standing editorial brand with a distinctive voice and a loyal readership. Both types of properties have strong reason to restrict AI training access: their value lies in curated, searchable content that competes directly with the outputs AI models produce.
The four Wedding sites with parseable robots.txt files that allow every crawler are zola.com, bridalguide.com, junebugweddings.com, and stylemepretty.com. Zola is primarily a registry and planning tool; its open posture may reflect a business model less reliant on organic editorial search traffic. Bridalguide, junebugweddings, and stylemepretty are inspiration-focused properties; their robots.txt files impose no AI-crawler restrictions at this snapshot date.
The remaining two sites — theknot.com and weddingchicks.com — returned no parseable robots.txt at the time of the snapshot. This does not indicate a policy decision one way or the other; it simply means no readable robots.txt was available. The dataset records what is present, not what is absent.
weddingwire.com, brides.com, greenweddingshoes.com, and offbeatbride.com are the four Wedding sites blocking at least one AI crawler.
The per-site picture across the Wedding category is summarized below.
| Site | Has robots.txt | Blocks AI Crawler |
|---|---|---|
| weddingwire.com | Yes | Yes |
| brides.com | Yes | Yes |
| greenweddingshoes.com | Yes | Yes |
| offbeatbride.com | Yes | Yes |
| zola.com | Yes | No |
| bridalguide.com | Yes | No |
| junebugweddings.com | Yes | No |
| stylemepretty.com | Yes | No |
| theknot.com | No | — |
| weddingchicks.com | No | — |
For comparison, the Outdoors category shows a 60% block rate among sites with parseable robots.txt (3 of 5), driven by similar dynamics around proprietary editorial and user-generated content. And Genealogy at 40% offers a parallel case of a category split along commercial vs. open-mission lines.
Where Wedding Lands in the 40-Category Ranking
The Wedding category's 50% block rate ranks above the corpus average but below the top tier of restrictive categories. Gaming (88.9%) and News (82.4%) anchor the high end; Nonprofit, Streaming, and Dating anchor the low end at 0%.
| Category | Sites Checked | With robots.txt | Any Blocker | Block Rate |
|---|---|---|---|---|
| Gaming | 9 | 9 | 8 | 88.9% |
| News | 20 | 17 | 14 | 82.4% |
| Food | 10 | 10 | 7 | 70% |
| Tech | 15 | 13 | 9 | 69.2% |
| Entertainment | 9 | 9 | 6 | 66.7% |
| Healthcare | 10 | 9 | 6 | 66.7% |
| Music | 10 | 9 | 6 | 66.7% |
| Parenting | 10 | 8 | 5 | 62.5% |
| Outdoors | 10 | 5 | 3 | 60% |
| Reference | 14 | 11 | 6 | 54.5% |
| Science | 10 | 10 | 5 | 50% |
| Wedding | 10 | 8 | 4 | 50% |
| Automotive | 10 | 9 | 4 | 44.4% |
| HomeGarden | 10 | 9 | 4 | 44.4% |
| Fashion | 9 | 7 | 3 | 42.9% |
| Social | 10 | 10 | 4 | 40% |
| Sports | 10 | 10 | 4 | 40% |
| Fitness | 10 | 10 | 4 | 40% |
| Photography | 10 | 10 | 4 | 40% |
| Genealogy | 10 | 10 | 4 | 40% |
| Jobs | 10 | 8 | 3 | 37.5% |
| Travel | 9 | 9 | 3 | 33.3% |
| Weather | 10 | 6 | 2 | 33.3% |
| Beauty | 10 | 6 | 2 | 33.3% |
| Legal | 10 | 7 | 2 | 28.6% |
| RealEstate | 10 | 7 | 2 | 28.6% |
| Pets | 10 | 7 | 2 | 28.6% |
| Crafts | 10 | 8 | 2 | 25% |
| Finance | 12 | 11 | 2 | 18.2% |
| Retail | 15 | 12 | 2 | 16.7% |
| Education | 9 | 7 | 1 | 14.3% |
| Government | 9 | 8 | 1 | 12.5% |
| Crypto | 9 | 8 | 1 | 12.5% |
| Books | 9 | 8 | 1 | 12.5% |
| Religion | 10 | 9 | 1 | 11.1% |
| Insurance | 10 | 9 | 1 | 11.1% |
| Productivity | 10 | 10 | 1 | 10% |
| Nonprofit | 10 | 6 | 0 | 0% |
| Streaming | 10 | 10 | 0 | 0% |
| Dating | 10 | 5 | 0 | 0% |
Wedding at 50% sits alongside Science at 50% and well above Finance (18.2%) and Retail (16.7%), both of which have large, complex sites with more ambiguous content-protection rationale. The fact that Wedding matches Science — a category with substantial academic and institutional content — suggests that wedding planning content has a higher perceived value for protection than many consumer categories.
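The Block Rate column can be reproduced from the two count columns beside it. The sketch below assumes the rate is simply blockers divided by sites with a parseable robots.txt, rounded to one decimal place; the function name `block_rate` is illustrative and not part of the dataset.

```python
def block_rate(blockers: int, with_robots: int) -> float:
    """Percentage of parseable-robots.txt sites that block at least one AI crawler."""
    if with_robots <= 0:
        raise ValueError("need at least one site with a parseable robots.txt")
    return round(100 * blockers / with_robots, 1)

# (sites with robots.txt, sites blocking any AI crawler), copied from the table above
rows = {
    "Gaming": (9, 8),
    "Science": (10, 5),
    "Wedding": (8, 4),
    "Finance": (11, 2),
    "Retail": (12, 2),
}

rates = {name: block_rate(blockers, with_robots)
         for name, (with_robots, blockers) in rows.items()}
corpus_rate = block_rate(139, 354)  # corpus-wide counts quoted in this report

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"Corpus: {corpus_rate}%")
```

Because the denominators are small (8 sites for Wedding), a single site changing its policy moves the category rate by 12.5 percentage points.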
Who Gets Disallowed — Corpus-Wide Bot Data
The bot-level blocking data covers all 354 sites with parseable robots.txt files, not just the Wedding category. These figures represent corpus-wide counts.
| Bot | Sites Blocking (of 354) | Block Rate |
|---|---|---|
| CCBot | 109 | 30.8% |
| ClaudeBot | 96 | 27.1% |
| GPTBot | 83 | 23.4% |
| Bytespider | 83 | 23.4% |
| Meta-ExternalAgent | 78 | 22% |
| Google-Extended | 76 | 21.5% |
| Applebot-Extended | 74 | 20.9% |
| PerplexityBot | 73 | 20.6% |
| Amazonbot | 64 | 18.1% |
CCBot is blocked by 109 of 354 corpus sites, making it the most-targeted bot. ClaudeBot follows at 96 sites. GPTBot and Bytespider are tied at 83. At the operator level across all 354 sites: Common Crawl faces blocks from 109 sites, Anthropic from 104, Meta from 89, and OpenAI from 87. The pattern across operators suggests that many blocking sites target several AI companies simultaneously rather than singling out one.
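A quick arithmetic check supports the multi-bot reading. Summing the nine per-bot counts in the table and dividing by the 139 sites that block at least one AI crawler gives the average number of listed bots each blocking site disallows. This is a back-of-the-envelope sketch using only the figures quoted in this report; the variable names are illustrative.

```python
# Per-bot blocking counts, copied from the corpus-wide table above
bot_blocks = {
    "CCBot": 109,
    "ClaudeBot": 96,
    "GPTBot": 83,
    "Bytespider": 83,
    "Meta-ExternalAgent": 78,
    "Google-Extended": 76,
    "Applebot-Extended": 74,
    "PerplexityBot": 73,
    "Amazonbot": 64,
}
blocking_sites = 139  # sites blocking at least one AI crawler

total_blocks = sum(bot_blocks.values())
avg_bots_per_blocking_site = round(total_blocks / blocking_sites, 1)

print(total_blocks, avg_bots_per_blocking_site)
```

An average well above one listed bot per blocking site is consistent with operators disallowing several AI crawlers at once, although an average alone does not show how the blocks are distributed across individual sites.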
Reading the Sealed Numbers
This report reflects a point-in-time snapshot sealed June 14, 2026 — nothing is estimated, modeled, or extrapolated. Every count in this post is a verbatim figure from the sealed dataset identified by snapshot sha 27ca61d890a647db.
The methodology:
Crawl. Each site's robots.txt was fetched directly from its canonical location on June 14, 2026 as part of the 418-site, 40-category Closing Web edition.
Parse. Files were parsed for user-agent directives that name known AI crawlers using a fixed token list.
Classify. Each site was assigned to one of three states: no parseable robots.txt, parseable with at least one AI-crawler block directive, or parseable with no AI-crawler block directives.
Seal. The complete dataset was content-hashed and stored with sha 27ca61d890a647db for auditability.
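The Parse and Classify steps can be sketched roughly as follows. This is a minimal illustration, not the pipeline behind the report. It makes three assumptions: the token list is taken to be the nine bots named in the corpus-wide table, because the report does not publish its fixed list; a "block" is taken to mean a `Disallow: /` rule in a group that names the bot; and fetching is left out, with `None` standing in for a missing or unreadable file. The function names are hypothetical.

```python
# Assumed token list: the nine bots from the corpus-wide table.
AI_BOTS = ("CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
           "Google-Extended", "Applebot-Extended", "PerplexityBot", "Amazonbot")

def blocked_ai_bots(robots_txt):
    """Return the AI bots named in a group that contains 'Disallow: /'."""
    known = {bot.lower(): bot for bot in AI_BOTS}
    blocked = set()
    agents = []        # user-agent tokens of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.update(known[a] for a in agents if a in known)
    return blocked

def classify(robots_txt):
    """Assign one of the three states described in the Classify step."""
    if robots_txt is None:
        return "no parseable robots.txt"
    if blocked_ai_bots(robots_txt):
        return "blocks at least one AI crawler"
    return "no AI-crawler block directives"

sample = "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin\n"
print(sorted(blocked_ai_bots(sample)), "->", classify(sample))
```

Under these assumptions a wildcard `User-agent: *` group does not count as an AI-crawler block, because it does not name a known AI crawler, and partial disallows such as `Disallow: /private` are ignored.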
The 40 categories span 418 sites total; 354 returned a parseable robots.txt; 139 of those block at least one AI crawler — a 39.3% corpus-wide rate.
Frequently Asked Questions
Q: How does the Wedding category compare to categories that deal with similarly personal content?
A: Wedding at 50% sits well above Dating (0%), which shares the personal life-events space, and below Parenting (62.5%) in both rate and absolute block count (4 blocking sites versus 5). The contrast with Dating is striking: not a single Dating site in this snapshot blocks an AI crawler, while Wedding sits at 50%. The difference likely reflects the editorial and vendor-directory nature of Wedding content versus the matchmaking-platform model of Dating sites.
Q: What does it mean that theknot.com has no parseable robots.txt?
A: It means the snapshot did not find a readable robots.txt file at theknot.com on June 14, 2026. This report makes no claim about theknot.com allowing or blocking AI crawlers — only that no parseable robots.txt was present. Any other access controls the site may use are outside the scope of this dataset.
Q: Why would a wedding inspiration site choose to allow AI crawlers?
A: Platform and inspiration businesses that rely on user-generated content and broad distribution may view AI crawler access as neutral or positive — AI-generated results that cite their content may drive referral traffic. Properties with strong brand recognition and direct booking flows may be less concerned about AI training on their content than editorial businesses that depend on organic search for discovery.
Q: If a site blocks one AI crawler, does that mean it blocks all of them?
A: Not necessarily. Robots.txt files can target specific user-agent strings, and the directives for each bot are independently specified. A site that blocks CCBot may or may not block GPTBot. This dataset records whether each site blocks at least one AI crawler — individual bot-level counts are tracked at the corpus level, not per category, in this edition.
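The per-bot behavior is easy to see with Python's standard-library `urllib.robotparser`. The robots.txt content below is a made-up example, not taken from any site in the snapshot: it disallows CCBot entirely while leaving other agents under a general rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for CCBot, one default group for everyone else
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

ccbot_allowed = parser.can_fetch("CCBot", "/")    # matched by the CCBot group
gptbot_allowed = parser.can_fetch("GPTBot", "/")  # falls through to the * group

print("CCBot may fetch /:", ccbot_allowed)
print("GPTBot may fetch /:", gptbot_allowed)
```

In this example the file would count as blocking at least one AI crawler even though GPTBot remains free to fetch the site root.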
Put AI-Access Data to Work
The Wedding category findings are directly actionable for three types of organizations.
Wedding vendor marketplace operators need to track when direct competitors shift their AI-access posture. If weddingwire.com lifts its block on GPTBot, that is a meaningful competitive signal — either the market has shifted or a licensing deal was reached. The workflow: re-crawl the eight Wedding sites with parseable robots.txt weekly, alert the product and SEO team on any disallow-rule change, and compare against the sealed June 14 baseline from this snapshot.
Bridal media content strategists at editorial properties like brides.com and greenweddingshoes.com need to know when peer publications adjust their stance. If allowers like stylemepretty.com add blocking directives, that signals industry consensus is shifting. The monitoring workflow: scan all 10 Wedding sites weekly, flag any new bot name appearing in a disallow directive, and route the finding to the editorial director and legal team for licensing consideration. See also how Genealogy sites balance open access with content protection as a structural parallel.
AI developer relations teams at operators like Anthropic (ClaudeBot, blocked by 96 corpus sites) and OpenAI (GPTBot, blocked by 83 corpus sites) can use category-level data to prioritize partnership conversations. Wedding is a mid-blocking category above the corpus average — a useful reference point for content-licensing discussions with editorial properties in adjacent lifestyle verticals.
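The comparison step in the monitoring workflows above amounts to diffing each site's set of blocked bots against the sealed baseline. The sketch below assumes both snapshots have already been reduced to a mapping of site to blocked-bot names; the function name `diff_policies` and the `.example` hostnames are placeholders, and the bot sets shown are invented for illustration rather than drawn from the dataset.

```python
def diff_policies(baseline, current):
    """Return (site, newly_blocked, newly_allowed) for each site whose policy changed."""
    changes = []
    for site in sorted(set(baseline) | set(current)):
        before = baseline.get(site, set())
        after = current.get(site, set())
        if before != after:
            changes.append((site, sorted(after - before), sorted(before - after)))
    return changes

# Placeholder data: not real policies of any site in this report
baseline = {"site-a.example": {"GPTBot", "CCBot"}, "site-b.example": set()}
current = {"site-a.example": {"CCBot"}, "site-b.example": {"ClaudeBot"}}

changes = diff_policies(baseline, current)
for site, added, removed in changes:
    print(f"{site}: newly blocked {added}, newly allowed {removed}")
```

A site that lifts a block shows up in the third element and a site that adds one shows up in the second, which matches the two signals described for marketplace operators and content strategists.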
US Tech Automations automates this monitoring through scheduled robots.txt crawls, real-time change alerts, and an AI-access policy dashboard keyed to sealed baselines. Start tracking AI-access drift in your category at /platform/agentic-workflows.
Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 27ca61d890a647db).
Cite this report
US Tech Automations Research, 2026-06 edition. “Do Wedding Sites Block AI Crawlers? 4 of 8 Do.” https://ustechautomations.com/resources/blog/do-wedding-sites-block-ai-crawlers-2026
Sealed snapshot sha256: 27ca61d890a647db