
Do Crafts Sites Block AI Crawlers? Sealed robots.txt Data

Jun 14, 2026

The Crafts category presents a picture that is at once simple and structurally interesting. Of 10 Crafts sites checked in the June 2026 Closing Web snapshot, 8 returned a parseable robots.txt — joann.com and craftgawker.com did not — and of those 8, exactly 2 carry at least one AI-crawler disallow directive: thesprucecrafts.com and allfreesewing.com. That yields a 25% block rate, comfortably below the 42% corpus average across all 293 sites.

What makes this noteworthy is not the blocking sites but the allower set. Ravelry.com, instructables.com, michaels.com, hobbylobby.com, craftsy.com, and makezine.com all allow every tracked AI crawler without restriction. That list includes the two largest US craft retailers (Michaels and Hobby Lobby), the dominant knitting and crochet community (Ravelry), and a major DIY project repository (Instructables). The commercial and community pillars of the Crafts category are open; the blockers are editorial media properties, not retailers or platforms.

2 of 8 Crafts sites block at least one AI crawler.

Crafts sites carry a 25% AI-crawler block rate.

Across 293 sites, 123 block at least one AI crawler — a 42% rate.

Key Takeaways

2 of 8 Crafts sites with a parseable robots.txt block at least one AI crawler.

Thesprucecrafts.com and allfreesewing.com carry at least one AI-crawler disallow directive.

The Crafts block rate of 25% sits well below the 42% corpus average across all 293 sites.

Across all 293 sites in this June 2026 edition, 123 block at least one AI crawler.

"2 of 8 Crafts sites with parseable robots.txt block at least one AI crawler — a 25% block rate as of June 14, 2026."

"Joann.com and craftgawker.com returned no parseable robots.txt file; they are excluded from the block-rate denominator."

Who Gates the Crawlers Here — and Who Does Not

The two blocking sites share a structural similarity: both are content-editorial properties whose primary product is written instructions and patterns. The Spruce Crafts is a Dotdash Meredith property, a large publisher that has adopted AI-crawler restrictions across several of its brands; its disallow stance reflects a corporate policy more than a crafts-specific concern. AllFreeSewing.com is a pattern repository that may be protecting the user-generated content it hosts from being reproduced in AI outputs without compensation to its contributor community.

The six allowers span the full commercial and community spectrum. Ravelry.com is a remarkable case — it hosts hundreds of thousands of user-created knitting and crochet patterns, yet its robots.txt does not disallow AI crawlers. Instructables.com, owned by Autodesk, similarly hosts a vast trove of user-contributed how-to content and remains open. Michaels and Hobby Lobby are retailers whose web presence is primarily product catalog and store-locator content; robots.txt blocking offers them no obvious IP protection. Craftsy.com and Makezine.com round out the open set as a video-course platform and a maker-culture magazine, respectively.

The pattern here — editorial publishers more likely to block, community platforms less likely — echoes what we see in other content categories. Retailers and community hubs prioritize discoverability; media companies with ad-supported content have stronger incentives to keep AI crawlers out of their text archives.

For a category where community content hosts are largely open, compare the pet sites report, where a 28.6% block rate reflects a similar mix of open platforms and selective editorial publishers. The religion sites report is also relevant — at 11.1%, it shows that community-mission sites with large free content libraries tend toward openness even more strongly.

The Crafts Block Rate in Full Corpus Context

The table below covers all 32 categories in the June 2026 edition. Crafts sits in the lower third of the distribution.

Category	Sites Checked	With robots.txt	Blocking	Block Rate
Gaming	9	9	8	88.9%
News	20	17	14	82.4%
Food	10	10	7	70%
Tech	15	13	9	69.2%
Entertainment	9	9	6	66.7%
Healthcare	10	9	6	66.7%
Music	10	9	6	66.7%
Parenting	10	8	5	62.5%
Reference	14	11	6	54.5%
Science	10	10	5	50%
Automotive	10	9	4	44.4%
HomeGarden	10	9	4	44.4%
Fashion	9	7	3	42.9%
Social	10	10	4	40%
Sports	10	10	4	40%
Fitness	10	10	4	40%
Photography	10	10	4	40%
Jobs	10	8	3	37.5%
Travel	9	9	3	33.3%
Weather	10	6	2	33.3%
Legal	10	7	2	28.6%
RealEstate	10	7	2	28.6%
Pets	10	7	2	28.6%
Crafts	10	8	2	25%
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
Education	9	7	1	14.3%
Government	9	8	1	12.5%
Crypto	9	8	1	12.5%
Religion	10	9	1	11.1%
Nonprofit	10	6	0	0%
Streaming	10	10	0	0%

Crafts at 25% falls in a cluster with Finance (18.2%), Retail (16.7%), and Pets (28.6%) — sectors where the majority of players see some benefit in being found by AI systems. The contrast with the top of the table is stark: Gaming (88.9%) and News (82.4%) are categories where text and IP-intensive content is the core product and scraping litigation is well-established.
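The block-rate column can be reproduced from the other columns. The sketch below is illustrative only: the `CategoryRow` class and its field names are hypothetical, and the four rows are copied from the table above. It shows that the denominator is the "With robots.txt" count, not the "Sites Checked" count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryRow:
    """One table row. Field names are illustrative, not from the dataset."""
    name: str
    checked: int      # sites checked
    with_robots: int  # sites that returned a parseable robots.txt
    blocking: int     # sites with at least one AI-crawler disallow

    def block_rate(self) -> float:
        """Blocking sites as a percentage of sites with a parseable file."""
        if self.with_robots == 0:
            return 0.0
        return 100.0 * self.blocking / self.with_robots


# Four rows copied from the category table.
rows = [
    CategoryRow("Gaming", 9, 9, 8),
    CategoryRow("Pets", 10, 7, 2),
    CategoryRow("Crafts", 10, 8, 2),
    CategoryRow("Retail", 15, 12, 2),
]

for row in sorted(rows, key=CategoryRow.block_rate, reverse=True):
    print(f"{row.name:<8} {row.blocking}/{row.with_robots} = {row.block_rate():.1f}%")

# Corpus-wide figure quoted in the report: 123 blocking of 293 with a file.
corpus_rate = 100.0 * 123 / 293
print(f"Corpus   123/293 = {corpus_rate:.1f}%")
```

Run as written, this prints 88.9%, 28.6%, 25.0%, and 16.7% for the four categories and 42.0% for the corpus, matching the published figures. Dividing Crafts by all 10 checked sites instead would give 20%, which is not the rate this report uses.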

The Operator-Level Picture Across All 293 Sites

These figures are corpus-wide counts from all 293 sites with parseable robots.txt files in this edition.

Operator	Sites Blocking (of 293)
Common Crawl	97
Anthropic	93
Meta	80
OpenAI	77
ByteDance	75
Perplexity	69
Apple	67
Google	66
Cohere	63
Diffbot	60
Amazon	56
Mistral	23

Bot Token	Sites Blocking (of 293)	Block Rate
CCBot	97	33.1%
ClaudeBot	87	29.7%
Bytespider	75	25.6%
GPTBot	74	25.3%
Meta-ExternalAgent	70	23.9%
PerplexityBot	68	23.2%
Applebot-Extended	67	22.9%
Google-Extended	66	22.5%
Amazonbot	56	19.1%

Common Crawl remains the most-blocked operator across the full corpus (97 sites), reflecting its role as a foundational dataset for many AI training pipelines. Mistral appears least (23 sites) — a function of its relative newness in a landscape where robots.txt files are often updated reactively, not proactively.

Methodology — How the Snapshot Was Sealed

The Closing Web snapshot was sealed June 14, 2026 (sha a5ca246fbdc79954). For each of the 339 domains, the team fetched the robots.txt at the domain root and parsed every User-agent block. A domain is classified as "blocking" if any of the 9 tracked AI crawler tokens appears in a disallow directive. A domain that returns no parseable robots.txt is recorded separately and is excluded from the block-rate denominator, which is why the Crafts denominator is 8, not 10.

Nothing is estimated, modeled, or extrapolated. Every count is a verbatim read from the sealed file set. The 42% corpus rate (123 of 293) reflects this same methodology applied across all 32 categories.
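The classification rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline that produced the snapshot: the token set is assumed to be the nine tokens in the bot-token table, a wildcard `User-agent: *` group is assumed not to count as an AI-crawler block, and an empty `Disallow:` is treated as no block. The function names are hypothetical.

```python
from typing import Optional, Set

# Assumption: the nine tracked tokens are those in the bot-token table.
TRACKED_TOKENS = {
    "ccbot", "claudebot", "bytespider", "gptbot", "meta-externalagent",
    "perplexitybot", "applebot-extended", "google-extended", "amazonbot",
}


def blocked_tokens(robots_txt: str) -> Set[str]:
    """Return tracked tokens whose User-agent group has a non-empty Disallow."""
    blocked = set()
    group_agents = []  # agents named by the current group
    in_rules = False   # True once the current group has seen a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:
                blocked.update(a for a in group_agents if a in TRACKED_TOKENS)
    return blocked


def classify(robots_txt: Optional[str]) -> str:
    """Label a domain; None stands for 'no parseable robots.txt'."""
    if robots_txt is None:
        return "no-parseable-file"
    return "blocking" if blocked_tokens(robots_txt) else "allowing"


# Hypothetical file contents, not taken from any site in the snapshot.
sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""
print(sorted(blocked_tokens(sample)))               # ['ccbot', 'gptbot']
print(classify(sample))                             # blocking
print(classify("User-agent: *\nDisallow: /cart/"))  # allowing
print(classify(None))                               # no-parseable-file
```

Only domains labeled "blocking" or "allowing" would enter the block-rate denominator; the third label corresponds to the sites recorded separately.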

Robots.txt is an honor-system standard. A disallow directive is a public declaration, not a technical barrier. Crawlers that respect the standard will skip disallowed paths; crawlers that do not may still access them. This data measures declared intent, not compliance or enforcement.
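To make the honor-system point concrete, here is a minimal sketch of the check a compliant crawler runs before fetching, using Python's standard-library `urllib.robotparser`. The robots.txt content and the URL are hypothetical. Nothing forces a crawler to run this check, which is why a disallow directive is a declaration and not a barrier.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: CCBot is shut out entirely, everyone else
# is only kept out of /admin/.
robots_txt = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/patterns/scarf"
print(parser.can_fetch("CCBot", url))   # False: the CCBot group disallows /
print(parser.can_fetch("GPTBot", url))  # True: falls back to the * group
print(parser.can_fetch("GPTBot", "https://example.com/admin/login"))  # False
```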

Frequently Asked Questions

Q: Why does the Crafts block rate use 8 as the denominator rather than 10?

A: Joann.com and craftgawker.com returned no parseable robots.txt during the June 14, 2026 snapshot. A site with no parseable file cannot declare either a block or an allow, so both are excluded from the rate calculation. The 25% block rate is computed over the 8 sites that did return a parseable file.

Q: Could Ravelry block AI crawlers in the future, changing the rate?

A: Yes. Robots.txt is updated at the domain owner's discretion. This snapshot is point-in-time. No trends or forecasts are possible from a single cross-sectional observation. Re-running the snapshot at a later date and comparing the two reads is how drift is detected.
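Comparing two reads can be as simple as a per-domain set difference. The sketch below assumes each snapshot is stored as a mapping from domain to the set of tracked tokens it disallows; that storage shape, the function name, and the example domains are all hypothetical.

```python
from typing import Dict, Set

Snapshot = Dict[str, Set[str]]  # domain -> tracked tokens it disallows


def detect_drift(before: Snapshot, after: Snapshot) -> Dict[str, Dict[str, Set[str]]]:
    """Report tokens newly blocked or newly unblocked, per changed domain."""
    drift = {}
    for domain in sorted(before.keys() | after.keys()):
        old = before.get(domain, set())
        new = after.get(domain, set())
        if old != new:
            drift[domain] = {"newly_blocked": new - old, "unblocked": old - new}
    return drift


# Made-up data for two reads of the same two domains.
earlier = {"patterns.example": set(), "tutorials.example": {"CCBot"}}
later = {"patterns.example": {"GPTBot", "CCBot"}, "tutorials.example": {"CCBot"}}

drift = detect_drift(earlier, later)
for domain, change in drift.items():
    print(domain, sorted(change["newly_blocked"]), sorted(change["unblocked"]))
# patterns.example ['CCBot', 'GPTBot'] []
```

A domain that keeps the same policy in both reads, like `tutorials.example` here, does not appear in the result.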

Q: Why would a craft retailer like Michaels or Hobby Lobby allow AI crawlers?

A: Retailers benefit from their product catalog and store information appearing in AI-generated answers about where to buy supplies. Blocking AI crawlers could reduce their visibility in those contexts, which is a distribution cost with no obvious offsetting protection benefit for product data they already publish publicly.

Q: How does the Crafts block rate fit into the broader corpus?

A: At 25%, Crafts sits in the lower third of the 32-category ranking. The corpus average is 42% across 293 sites. Crafts is well below that line, in a cluster with Retail (16.7%), Finance (18.2%), and Pets (28.6%), sectors that share a discoverability-first orientation. For a useful contrast, our fitness sites report covers a category at 40%, close to the corpus average.

Q: What is the significance of the two missing robots.txt files in the Crafts sample?

A: Sites without a parseable robots.txt have not published a machine-readable AI-access policy. In the absence of a robots.txt, well-behaved crawlers typically treat all paths as accessible by default. The two sites without files — joann.com and craftgawker.com — are neither blocking nor explicitly allowing; they simply have no declared policy as of this snapshot date.

Put AI-Access Data to Work

The Crafts category data is directly useful for three types of practitioners, each with a recurring workflow.

An SEO or content-strategy lead at a craft media brand or publisher needs to understand which peer domains have placed AI-access restrictions on their content. If The Spruce Crafts is disallowing CCBot while Instructables is not, content that appears in AI-generated craft tutorials will disproportionately surface from the open domains. A practical workflow: re-check the Crafts category robots.txt set weekly; flag any newly blocking domain whose disappearance from AI outputs changes your competitive landscape; adjust your content brief to target topics where open-access peers are thin.

A publisher RevOps lead at a pattern or tutorial platform — think Ravelry or Craftsy — should audit their own robots.txt monthly against the operator leaderboard (Common Crawl at 97 corpus-wide, Anthropic at 93) and confirm that their current policy is intentional. An operator they meant to allow or block could be missing from the file due to a stale configuration. A recurring monthly audit against the sealed baseline catches silent drift.
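One way to catch an operator that is missing from the file is to list the tracked tokens that no `User-agent` line names. The sketch below is a hedged illustration: the function name is hypothetical, the token list is assumed to be the nine tokens in the bot-token table, and the robots.txt content is made up.

```python
from typing import List

# Assumption: the audit list is the nine tokens from the bot-token table.
AUDIT_TOKENS = [
    "CCBot", "ClaudeBot", "Bytespider", "GPTBot", "Meta-ExternalAgent",
    "PerplexityBot", "Applebot-Extended", "Google-Extended", "Amazonbot",
]


def unmentioned_tokens(robots_txt: str, tokens: List[str]) -> List[str]:
    """Return tokens that no User-agent line in the file names explicitly."""
    named = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return [t for t in tokens if t.lower() not in named]


# Hypothetical robots.txt that names only two of the nine tokens.
own_file = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

missing = unmentioned_tokens(own_file, AUDIT_TOKENS)
print(missing)
```

Each token in the result is governed only by the wildcard group, or by the default of full access if there is none. Whether that is intentional is the question the monthly audit is meant to answer.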

A retrieval or data-pipeline engineer building a craft-content knowledge base or recommendation system needs to know which Crafts domains are currently crawl-permitted. The 6 allower sites in this snapshot form a confirmed starting set. A useful workflow: maintain a domain allowlist derived from this snapshot; re-verify each domain weekly; remove any newly blocking domain from the fetch queue before the next indexing run.

US Tech Automations builds and operates exactly this kind of automated monitoring — scheduled robots.txt fetches, per-domain change detection, and policy-drift alerts the moment a new disallow entry appears. Set up a monitoring workflow at agentic workflows.

Zoom out: Crafts is just one vertical in a much larger picture — our cross-industry study measures how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha a5ca246fbdc79954).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Crafts Sites Block AI Crawlers? Sealed robots.txt Data.” https://ustechautomations.com/resources/blog/do-crafts-sites-block-ai-crawlers-2026

Sealed snapshot sha256: a5ca246fbdc79954
