Do Origami Sites Block AI Crawlers? 2 of 8 Do

Jun 14, 2026

Origami sites mostly leave the door open to AI. Of the 10 Origami sites we checked, 8 returned a parseable robots.txt, and only 2 of those 8 tell at least one AI crawler to stay out. That is a 25% block rate, just under the 28.7% posted across the full corpus.

This is a sealed-snapshot report. Every figure below is a verbatim count read from public robots.txt files captured on June 14, 2026, and frozen under snapshot sha 5d5458529dab2773. We did not survey site owners or model anything; we read what the files say. The full edition spans 1197 sites across 120 categories, 993 of which returned a parseable robots.txt — Origami is one narrow slice of that whole.

2 of 8 Origami sites block at least one AI crawler.

The two sites that gate are origamiusa.org and paperkawaii.com. The published-policy sites that allow every crawler are origami-resource-center.com, happyfolding.com, origamiway.com, origamispirit.com, origami-instructions.com, and origami.me. Two more — gilad.co.il and origamiclub.com — returned no robots.txt at all, which is a different state from an explicit allow.

A single sentence captures the slice for anyone skimming: of the eight Origami sites that publish a robots.txt, two ask an AI crawler to stay away and six leave every crawler welcome, putting the category just under the corpus norm. A robots.txt file, for the unfamiliar, is a plain-text file at a domain's root that lists which automated agents may or may not fetch which paths — the web's standard, voluntary way of stating crawl preferences.

What the 25% Origami Block Rate Actually Means

A robots.txt block is a request, not a wall. The file is an honor-system standard: well-behaved crawlers read it and comply, but nothing in the protocol forces compliance. So when origamiusa.org disallows an AI user-agent, it is publishing a preference that cooperating bots respect.
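To make the honor-system point concrete, here is a minimal sketch of the check a cooperating crawler runs before fetching a page, using Python's standard urllib.robotparser. The sample file contents, the example.org URL, and the GPTBot and SomeOtherBot tokens are illustrative assumptions, not a copy of any site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI user-agent is asked to stay out,
# every other agent is left unrestricted (an empty Disallow allows all).
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Answer the question a cooperating crawler asks before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "https://example.org/diagrams/crane"  # placeholder URL
gptbot_allowed = may_fetch(SAMPLE_ROBOTS_TXT, "GPTBot", URL)
other_allowed = may_fetch(SAMPLE_ROBOTS_TXT, "SomeOtherBot", URL)
```

The check lives entirely in the crawler's own code. A client that never runs it is not stopped by the file, which is why the counts in this report measure stated preference rather than enforcement.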

The Origami slice sits below the corpus line. Corpus-wide, 285 of 993 sites block at least one AI crawler. That works out to a 28.7% block rate across the full snapshot, and Origami's 25% lands just under it.
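The two rates can be reproduced from the tallies quoted above. This is a small sketch, not the report's own tooling: the function name and the one-decimal rounding are assumptions chosen to match the published figures. Note that the denominator is sites with a parseable robots.txt (8), not sites checked (10).

```python
def block_rate(blocking: int, with_robots: int) -> float:
    """Percent of sites with a parseable robots.txt that block >= 1 AI crawler."""
    return round(100 * blocking / with_robots, 1)


origami_rate = block_rate(2, 8)        # Origami: 2 of 8
corpus_rate = block_rate(285, 993)     # full snapshot: 285 of 993
gap_points = round(corpus_rate - origami_rate, 1)  # difference in percentage points
```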

Of 8 Origami sites with a published robots.txt, 6 allow every AI crawler we tracked.

For a craft built on freely shared diagrams and folding tutorials, a permissive posture is unsurprising. Diagram libraries, instruction archives, and community hubs generally want their material found, indexed, and surfaced — and that instinct extends to answer engines now reading the same pages.

Origami's 25% block rate sits below the corpus-wide 28.7%, with origamiusa.org and paperkawaii.com the only gatekeepers.

You can see the same pattern across adjacent hobbies. We measured it in our look at how leathercraft supply sites handle AI crawlers, where a single gatekeeper sets the rate for an otherwise open vertical. The openness also rhymes with what we found in our report on embroidery sites and AI crawlers, another diagram-and-pattern craft.

The two gatekeepers are worth reading individually. origamiusa.org is a long-standing membership organization with a deep archive of diagrams and event material, so a protective stance over that catalog is consistent with a publisher guarding original work. paperkawaii.com is a creator-led tutorial site, where the author may simply prefer that AI systems not ingest the original folding instructions wholesale. Neither posture is unusual; what is notable is how few sites in the category share it.

Where Origami Sits Among Similar Hobbies

Origami clusters with several other low-block categories. The focused window below shows Origami and its nearest neighbors in the block-rate ranking — the categories folding in right around the same level.

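Category | Sites | With robots.txt | Block ≥1 crawler | Block rate
Knitting | 9 | 7 | 2 | 28.6%
Crafts | 10 | 8 | 2 | 25%
InteriorDesign | 4 | 4 | 1 | 25%
Space | 9 | 8 | 2 | 25%
BoardGames | 10 | 8 | 2 | 25%
DiscGolf | 10 | 8 | 2 | 25%
Origami | 10 | 8 | 2 | 25%
HR | 10 | 9 | 2 | 22.2%
Skiing | 10 | 9 | 2 | 22.2%
Archery | 10 | 9 | 2 | 22.2%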

Origami shares its 25% rate with Crafts, InteriorDesign, Space, BoardGames, and DiscGolf, a tight band made up mostly of maker and tabletop hobbies that treat their content the same way. The folding-paper world reads less like a guarded publisher and more like a shared library.

For contrast, the extremes are stark.

Category | With robots.txt | Block ≥1 crawler | Block rate
Gaming | 9 | 8 | 88.9%
News | 16 | 13 | 81.3%
Pickleball | 10 | 0 | 0%
Tea | 10 | 0 | 0%

Gaming and News gate aggressively; hobby and lifestyle verticals like Pickleball and Tea do not gate at all in this snapshot. Origami, at 25%, sits much closer to the second group. A similar enthusiast openness shows up in our report on how candlemaking sites treat AI crawlers, where the published policies left every bot welcome.

Corpus-wide, 285 of 993 sites block at least one AI crawler.

That corpus figure is the yardstick for every category in the edition. Against it, the gap between Gaming's 88.9% and Origami's 25% is the real story of this snapshot: gating is concentrated in a handful of high-stakes verticals — news, games, large tech and media properties — while the long tail of hobbies and crafts stays open. Origami is squarely in that long tail, and its handful of published policies makes the category an unusually clean place to watch for the first sign of that pattern shifting.

Who Gets Disallowed Across the Corpus

When a site does gate, which crawlers does it name? The leaderboard below is corpus-wide — counted across all 993 sites, not just Origami — and shows the operators most often disallowed.

Operator | Sites disallowing (across all 993 sites)
Common Crawl | 211
Anthropic | 201
OpenAI | 193
Meta | 184
ByteDance | 183

Common Crawl leads at 211 sites, with Anthropic and OpenAI close behind. These are the operators a gating site is most likely to name. In Origami, only origamiusa.org and paperkawaii.com do any naming at all, so the category contributes little to these totals.

How the Snapshot Was Sealed

The method is deliberately boring, because boring is what makes the numbers trustworthy. We fetched the public robots.txt file for each site in the category, parsed every User-agent and Disallow directive, and recorded whether any line targets an AI crawler we track. A site counts as a blocker if it disallows at least one such agent on any path; everything else is an allow or a no-file state.
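The rule in that paragraph can be sketched as a small classifier. This is one possible reading under stated assumptions, not the report's actual parser: the report does not list the agents it tracks, so the AI_AGENTS set below is a hypothetical stand-in, and the sketch assumes that only groups naming a tracked agent explicitly count (a wildcard "User-agent: *" group does not).

```python
from typing import Optional

# Assumed stand-in for "AI crawlers we track"; the real list is not published here.
AI_AGENTS = {"gptbot", "claudebot", "ccbot", "bytespider", "meta-externalagent"}


def classify(robots_txt: Optional[str]) -> str:
    """Return 'no-file', 'block', or 'allow' for one site's robots.txt."""
    if robots_txt is None:
        return "no-file"
    agents = []        # user-agents named by the current group
    in_rules = False   # True once the group has an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # A non-empty Disallow on any path, in a group naming a tracked agent.
            if field == "disallow" and value and any(a in AI_AGENTS for a in agents):
                return "block"
    return "allow"
```

Keeping "no-file" as its own state mirrors the report's point that a missing robots.txt is not the same thing as an explicit allow.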

The reading is then content-hashed and frozen, so the snapshot sha 5d5458529dab2773 pins exactly these bytes. If a robots.txt changes tomorrow, this report does not; it remains the June 14, 2026 state. The counts are plain tallies: 10 sites checked, 8 with a parseable file, 2 blocking. Nothing is estimated, modeled, or extrapolated. The only arithmetic is dividing one tally by another to state a rate, and no Origami figure is inferred from the corpus totals.
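The report does not say how its snapshot sha is computed, so the following is only a sketch of the general idea of content-hashing a set of captures. The seal function, the NUL separators, the domain ordering, and the truncation to 16 hex characters (chosen to match the length of the published identifier) are all assumptions.

```python
import hashlib
from typing import Dict


def seal(captures: Dict[str, bytes]) -> str:
    """Hash every captured robots.txt body into one short snapshot id."""
    digest = hashlib.sha256()
    for domain in sorted(captures):          # fixed order makes the id reproducible
        digest.update(domain.encode("utf-8"))
        digest.update(b"\0")
        digest.update(captures[domain])
        digest.update(b"\0")
    return digest.hexdigest()[:16]           # assumed truncation


# Hypothetical captured bytes, for illustration only.
snapshot = {
    "example-a.org": b"User-agent: GPTBot\nDisallow: /\n",
    "example-b.org": b"User-agent: *\nDisallow:\n",
}
sealed_id = seal(snapshot)
changed_id = seal({**snapshot, "example-b.org": b"User-agent: *\nDisallow: /\n"})
```

Any change to a captured file, even a single byte, produces a different id, which is what lets a later reading be compared against the frozen baseline.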

One honest caveat shapes how to read Origami specifically. With only eight published policies in the category, a single site changing its file moves the rate by a visible amount. That sensitivity is exactly why the drift-monitoring use case below matters: in a small category, one new disallow line is a real signal, not noise.
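That sensitivity is easy to quantify from the tallies. The sketch below uses a helper name of my own choosing and shows how far the Origami rate would move if one site added or removed a block, compared with the effect of one site on the corpus rate.

```python
def rate_after_change(blocking: int, with_robots: int, delta: int) -> float:
    """Block rate in percent after `delta` sites start (+) or stop (-) blocking."""
    return 100 * (blocking + delta) / with_robots


baseline = rate_after_change(2, 8, 0)     # today's Origami rate
one_more = rate_after_change(2, 8, +1)    # a third site adds a disallow
one_fewer = rate_after_change(2, 8, -1)   # one gatekeeper drops its rule
step_origami = 100 / 8                    # points moved by one Origami site
step_corpus = round(100 / 993, 2)         # points moved by one site corpus-wide
```

One Origami site moves the category rate by 12.5 percentage points, while one site moves the corpus rate by roughly a tenth of a point.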

There is a practical reading for anyone in the folding world. Six of the eight published policies leave Origami instruction content open to AI answer engines, so a beginner asking an assistant how to fold a crane may well be getting an answer drawn from these very sites.

The two that gate, a membership archive and a creator's tutorial site, are the ones with the most original work to protect, which is a coherent pattern even at this small scale. Whether more sites join them is a question a single snapshot cannot answer — but a sealed baseline read again over time can show the drift the moment it starts.

Origami sites post a 25% AI-crawler block rate.

Key Takeaways

  • Of 10 Origami sites checked, 8 published a parseable robots.txt and 2 of those block at least one AI crawler — a 25% rate.

  • The two gatekeepers are origamiusa.org and paperkawaii.com; the rest of the published policies welcome every crawler.

  • Origami's 25% sits just below the 28.7% corpus rate and clusters with Crafts, BoardGames, and DiscGolf.

  • Across all 993 sites, Common Crawl is the most-disallowed operator at 211 sites.

  • robots.txt is an honor-system signal, so these counts measure stated intent, not enforced blocking.

Frequently Asked Questions

Q: Does blocking a crawler in robots.txt actually stop it?

A: Not on its own. robots.txt is a published preference under an honor-system standard — compliant crawlers read it and back off, but the file has no power to enforce anything. A disallow line on origamiusa.org signals intent that cooperating AI operators respect, nothing more.

Q: Which Origami sites block AI crawlers?

A: Two of the eight sites with a published policy: origamiusa.org and paperkawaii.com. The other six published policies — including origami-resource-center.com, happyfolding.com, and origami.me — allow every AI crawler we tracked.

Q: Why does Origami block AI crawlers less than the overall web?

A: Origami's 25% rate sits under the 28.7% corpus rate. The craft runs on freely shared diagrams and tutorials, so most community and instruction sites want their pages found and surfaced rather than walled off — a posture it shares with Crafts and BoardGames at the same level.

Q: What about the Origami sites with no robots.txt?

A: Two sites, gilad.co.il and origamiclub.com, returned no robots.txt at all. That is not the same as an allow rule — there is simply no published file to read. We count those separately and never infer intent from their absence; nothing is estimated, modeled, or extrapolated.

Q: How does Origami compare to its nearest-neighbor categories?

A: Origami's 25% matches Crafts, BoardGames, and DiscGolf exactly, and sits just above the 22.2% band of HR and Skiing. It is a tight cluster of maker and tabletop hobbies that all gate lightly. The category is firmly in the open half of the corpus, far from the Gaming and News end where most sites block.

Q: Does a 25% block rate hurt how Origami pages show up in AI answers?

A: Only for the two sites that gate. origamiusa.org and paperkawaii.com signal that compliant AI crawlers should skip them, which can keep their pages out of retrieval-based answers. The six allowers — origami.me, happyfolding.com, and the rest — remain fully eligible, so most Origami instruction content stays available to answer engines.

Put AI-Access Data to Work

The first buyer for this slice is a horizontal one: an AI-search and GEO (generative engine optimization) agency tracking which client-eligible corpora stay open to retrieval. For an agency managing dozens of craft and hobby clients, the recurring job is to re-crawl this Origami set weekly and alert the moment a named allower, say origamiway.com or origami.me, adds an AI user-agent to its disallow list, because a newly gated page can drop out of answer-engine eligibility for crawlers that honor the file. The same watch runs across every category at once, with the 25% baseline here as the anchor for drift.
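The drift check described here amounts to comparing a later reading against the sealed baseline. A minimal sketch follows: the baseline states are the ones this report lists for the 10 Origami sites, while the later crawl is hypothetical and exists only to show the alert firing. The function name and the state labels are assumptions.

```python
from typing import Dict, List

# Sealed June 14, 2026 states as reported for the Origami category.
BASELINE: Dict[str, str] = {
    "origamiusa.org": "block",
    "paperkawaii.com": "block",
    "origami-resource-center.com": "allow",
    "happyfolding.com": "allow",
    "origamiway.com": "allow",
    "origamispirit.com": "allow",
    "origami-instructions.com": "allow",
    "origami.me": "allow",
    "gilad.co.il": "no-file",
    "origamiclub.com": "no-file",
}


def newly_gated(baseline: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Domains that block now but did not block in the baseline."""
    return sorted(
        domain
        for domain, state in current.items()
        if state == "block" and baseline.get(domain) != "block"
    )


# Hypothetical later crawl: one allower has added an AI disallow line.
later_crawl = {**BASELINE, "origamiway.com": "block"}
alerts = newly_gated(BASELINE, later_crawl)
```

Treating a move from "no-file" to "block" as newly gated is a design choice in this sketch; a stricter monitor might report that transition separately.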

A category-native second ideal customer profile (ICP) is an origami-paper and supply ecommerce lead who sells into this audience. That buyer can monitor whether the open instruction sites their customers rely on, such as happyfolding.com and origamiway.com, stay crawlable, since AI-surfaced tutorials shape where folders learn and buy. US Tech Automations automates that monitoring with scheduled robots.txt crawls, change alerts, and an AI-access dashboard. See how agentic monitoring workflows run on this kind of signal.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 5d5458529dab2773).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do Origami Sites Block AI Crawlers? 2 of 8 Do.” https://ustechautomations.com/resources/blog/do-origami-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 5d5458529dab2773

Machine-readable data: CSV · JSON · All research & methodology

About the Author

Garrett Mullins
Workflow Specialist

Helping businesses leverage automation for operational efficiency.