Do HR Sites Block AI Crawlers? 2 of 9 Do

Jun 14, 2026

HR technology is a category where most of the major platforms want to be found — and the robots.txt data confirms it. Of the 10 HR sites we checked, 9 returned a parseable robots.txt file, and only 2 of those 9 block at least one AI crawler. That is a 22.2% block rate, well below the corpus-wide average of 33.4% across all 479 sites.
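The arithmetic behind those two percentages can be sketched as follows. The function name `block_rate` is hypothetical and used only for illustration; the one point it encodes from the report is that the denominator is the number of sites with a parseable robots.txt (9), not the number of sites checked (10).

```python
def block_rate(blocking_sites: int, parseable_sites: int) -> float:
    """Percentage of sites with a parseable robots.txt that block any AI crawler."""
    if parseable_sites <= 0:
        raise ValueError("need at least one parseable robots.txt")
    return round(100 * blocking_sites / parseable_sites, 1)


hr_rate = block_rate(2, 9)          # HR: 2 blockers among 9 parseable files
corpus_rate = block_rate(160, 479)  # corpus: 160 blockers among 479 sites
print(hr_rate, corpus_rate)         # 22.2 33.4
```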

The distinctive finding here is which sites block and which do not: the 2 blockers are trade editorial outlets, while the 7 allowers are mainly enterprise HR software platforms and a professional association, whose growth models depend on discoverability, plus one trade news outlet.

A robots.txt file is the standard plain-text document that domain operators publish to direct crawlers — AI training bots, retrieval-augmented generation spiders, and search engines alike — on what they may access. Compliance is voluntary; the standard is honored by reputable operators but not technically enforced. What the file reveals is the site operator's stated posture toward automated content access at the time of the snapshot.
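As a rough illustration of what such a file looks like and how a well-behaved crawler reads it, the sketch below feeds a made-up robots.txt to Python's standard-library `urllib.robotparser`. The sample file and the `example.com` URL are assumptions for illustration only; they are not taken from any site in this report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, leave everyone else open.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching; nothing enforces the answer.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/articles/1")
other_allowed = parser.can_fetch("Googlebot", "https://example.com/articles/1")
print(gptbot_allowed, other_allowed)  # False True
```

The check happens entirely on the crawler's side, which is why the file records a stated posture rather than an access control.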

Key Takeaways

2 of 9 HR sites with a parseable robots.txt block at least one AI crawler.

The HR block rate of 22.2% sits well below the corpus-wide average of 33.4% across all 479 sites.

tlnt.com and hrexecutive.com are the 2 HR blockers; the other 7 sites with a parseable robots.txt remain fully open.

Of 10 HR sites checked, 9 returned a parseable robots.txt; 2 of those 9 block at least one AI crawler.
Both blockers are trade editorial outlets; the 7 allowers are five enterprise HR software platforms, one professional association (SHRM), and one trade news outlet (HRDive).
gusto.com returned no parseable robots.txt; no stance can be attributed to it from this snapshot.
Corpus-wide, 160 of 479 sites (33.4%) block at least one AI crawler across all 56 categories.
CCBot (Common Crawl) is the most-blocked bot across the full 479-site corpus at 124 sites.

The Sites Behind the 2 of 9

The 2 sites that block at least one AI crawler are tlnt.com and hrexecutive.com. Both are trade publications serving HR professionals — TLNT (talent management editorial) and HR Executive (industry news and event coverage). Their blocking posture is consistent with the pattern seen in other editorial-heavy categories: proprietary articles, exclusive research reports, and subscriber-content paywalls are assets worth protecting from AI training pipelines.

"2 of 9 HR sites with a parseable robots.txt block at least one AI crawler as of June 14, 2026 — a 22.2% rate, well below the corpus-wide 33.4%."

The 7 sites that allow all crawlers are shrm.org, workday.com, bamboohr.com, adp.com, greenhouse.io, lever.co, and hrdive.com. This group is a cross-section of the HR ecosystem: SHRM is the dominant professional association; Workday, BambooHR, ADP, Greenhouse, and Lever are HR software platforms; HRDive is a trade news outlet that chose a permissive stance despite being in the same media category as the 2 blockers.

2 of 9 HR sites block at least one AI crawler — tlnt.com and hrexecutive.com.

shrm.org, workday.com, bamboohr.com, adp.com, greenhouse.io, lever.co, and hrdive.com each allow all tracked AI crawlers.

The software platforms form the permissive core of this category. Enterprise HR platforms grow through content marketing (thought leadership articles, salary benchmarks, compliance guides) that they actively want AI assistants and search engines to surface to HR practitioners. Blocking AI crawlers would undercut their top-of-funnel content strategy. That commercial logic explains why the software platforms and SHRM are permissive while the 2 editorial blockers are not; HRDive, a trade news outlet, is the one permissive site that does not fit the pattern.

The remaining site — gusto.com — returned no parseable robots.txt in this snapshot. Gusto is a payroll and HR platform, and its absence from the policy landscape simply means no explicit robots.txt signal was detected. We cannot attribute blocking or allowing to gusto.com from this data.

What This Block Rate Signals for HR

The 22.2% HR block rate sits below the corpus average by a meaningful margin but is not near the zero-block floor occupied by Toy, Construction, Manufacturing, and Logistics. It reflects a category in transition: the dominant platforms are open, but the trade press is beginning to restrict access.

This split may widen as AI tools increasingly compete with HR publications for practitioner attention, though a single snapshot cannot show a trend. Platforms like Workday and ADP want their content cited in AI answers; trade magazines like HR Executive and TLNT want their content licensed or paywalled, not scraped for training. The divergence in robots.txt posture maps onto that commercial tension.

Compare the HR picture to accounting sites, where 4 of 8 sites block (50%) — a much higher rate, driven partly by financial standards bodies and subscription publications that treat their content as licensed intellectual property rather than open marketing material.

The HR block rate of 22.2% reflects a category where software platforms dominate and trade editorial is only beginning to restrict AI access.

For a category where the editorial vs. platform split has resolved further toward open access, the toy category's 0% illustrates what a fully permissive landscape looks like when every participant benefits from wide AI discoverability.

How HR Compares to Its Nearest Neighbors

The focused window below centers on HR's position in the block-rate ranking, showing the categories immediately above and below it — all values verbatim from the sealed allCategoriesRanked data.

Category	Sites Checked	Sites with robots.txt	Sites Blocking Any AI Crawler	Block Rate
Pets	10	7	2	28.6%
Crafts	10	8	2	25%
HR	10	9	2	22.2%
Finance	12	11	2	18.2%
Retail	15	12	2	16.7%
Education	9	7	1	14.3%
Government	9	8	1	12.5%

HR sits in a band of categories (Pets, Crafts, Finance, Retail, Education, Government) where blocking is present but uncommon. Notably, Finance (18.2%) and Retail (16.7%) are both below HR despite being larger, more commercially significant categories, which suggests that the finance and retail sites in this corpus are even more permissive than the HR sector.

At the opposite end of the corpus, the high-blocking categories are:

Category	Block Rate
Gaming	88.9%
News	82.4%
Food	70%

Those three categories share concentrated proprietary content — live scores, breaking news, recipes and editorial — that sites actively protect. HR's content profile is more distributed, with software platforms publishing largely for discovery rather than protection.

"HR's 22.2% block rate sits below the corpus average, driven by 7 permissive software and association platforms versus 2 trade editorial blockers."

Reading the Bot Leaderboard

Because only 2 HR sites block any crawler, the per-category bot breakdown is not granular enough to reveal patterns. The corpus-wide leaderboard — across all 479 sites — shows which bots face the most resistance and frames what an HR site would be joining if it chose to add blocks.

Bot	Sites Blocking (all 479 corpus sites)	Share
CCBot	124	25.9%
ClaudeBot	108	22.5%
GPTBot	97	20.3%
Bytespider	96	20%
Meta-ExternalAgent	86	18%
Applebot-Extended	83	17.3%
Google-Extended	83	17.3%
PerplexityBot	75	15.7%
Amazonbot	73	15.2%

CCBot leads with 124 sites because Common Crawl has been a disallow target for years, predating the current AI-training debate. ClaudeBot and GPTBot reflect blocks added specifically in response to generative AI training since 2023, at 108 and 97 sites respectively across the corpus. For HR, the implication is that any trade publication that decides to restrict AI access is most likely to start with CCBot and the OpenAI/Anthropic crawlers, given how heavily those are targeted corpus-wide.

CCBot is blocked by 124 of 479 corpus sites — the top of the bot leaderboard in the June 2026 snapshot.

Methodology: How the Snapshot Was Sealed

US Tech Automations fetched the robots.txt file for each of the 572 sites across 56 categories, stored each file verbatim, and sealed the collection under snapshot hash 4e7c4a4a3c720f06 on June 14, 2026. Parsing applied a 9-token AI crawler recognition set — CCBot, ClaudeBot, GPTBot, Bytespider, Meta-ExternalAgent, Applebot-Extended, Google-Extended, PerplexityBot, and Amazonbot — and flagged any Disallow directive covering the root or full site as a block against that bot.
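The classification rule described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for the report: the function name `blocked_bots` is hypothetical, and two choices are assumptions, namely that only groups naming one of the 9 tokens count (a wildcard `User-agent: *` group is not attributed to any bot) and that "root or full site" means a `Disallow` value of `/` or `/*`.

```python
AI_BOTS = [
    "CCBot", "ClaudeBot", "GPTBot", "Bytespider", "Meta-ExternalAgent",
    "Applebot-Extended", "Google-Extended", "PerplexityBot", "Amazonbot",
]


def blocked_bots(robots_txt: str) -> set[str]:
    """Return the tracked AI bots that a robots.txt blocks at the site root."""
    blocked: set[str] = set()
    agents: list[str] = []   # user-agent tokens of the group being read
    in_rules = False         # True once the group has seen an Allow/Disallow
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value in ("/", "/*"):
                blocked.update(b for b in AI_BOTS if b.lower() in agents)
    return blocked


sample = "User-agent: GPTBot\nUser-agent: CCBot\nDisallow: /\n\nUser-agent: *\nDisallow: /private/\n"
print(sorted(blocked_bots(sample)))  # ['CCBot', 'GPTBot']
```

A site counts toward the block rate when this set is non-empty; partial rules such as `Disallow: /private/` do not count under the rule as stated.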

Sites returning no parseable file (gusto.com in this category) are listed separately and excluded from block rate calculations. Nothing is estimated, modeled, or extrapolated: every figure in this report is a verbatim count from the sealed snapshot.

Collect. Fetch https://<domain>/robots.txt for each corpus site; store the raw file.
Parse and classify. Match User-agent tokens against the 9-bot list; record root-level Disallow rules as blocks.
Seal. Compute sha256 across the collected files; publish hash 4e7c4a4a3c720f06 alongside the counts.
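One way the seal step could work is sketched below. The report does not say how the files are ordered or framed before hashing, nor how the 16-character published value relates to a full 64-character SHA-256 hex digest, so everything here (the function name `seal`, sorting by domain, the length-prefixed framing) is an assumption for illustration and will not reproduce the published hash.

```python
import hashlib


def seal(files: dict[str, bytes]) -> str:
    """Hash a collection of raw robots.txt files into one SHA-256 hex digest."""
    digest = hashlib.sha256()
    for domain in sorted(files):              # fixed order makes the seal reproducible
        body = files[domain]
        digest.update(domain.encode("utf-8"))
        digest.update(b"\0")                  # separator between name and content
        digest.update(len(body).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(body)
    return digest.hexdigest()


# Hypothetical two-file collection; the domains are placeholders.
snapshot = {
    "a.example": b"User-agent: GPTBot\nDisallow: /\n",
    "b.example": b"User-agent: *\nAllow: /\n",
}
print(len(seal(snapshot)))  # 64
```

Because the digest covers every stored byte, changing a single character in any file yields a different seal, which is what lets readers check that published counts match the stored files.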

The llms.txt signal is tracked separately at the corpus level: 102 of 479 sites (21.3%) publish an llms.txt file as of this snapshot. llms.txt is a distinct standard from robots.txt, so that figure is not conflated with the 22.2% HR block rate.

Frequently Asked Questions

Q: Why do HR software platforms allow AI crawlers when HR trade media restricts them?

A: The content economics differ. HR platforms — Workday, BambooHR, ADP, Greenhouse, Lever — publish content marketing (guides, benchmarks, compliance articles) that they want AI tools to surface in practitioner searches. Their product pages and documentation benefit from AI indexing. Trade publications like TLNT and HR Executive publish exclusive editorial whose commercial value depends on paywalls or licensing arrangements — AI training pipelines bypass that model.

Q: What does gusto.com's absence from the robots.txt index mean?

A: Gusto returned no parseable robots.txt in this snapshot. It is listed in noRobotsSites. This means no explicit signal about crawler access exists for that domain in this data. Most AI crawler operators default to treating a missing file as permissive, but Gusto's terms of service or other mechanisms may impose restrictions that are not visible in robots.txt.

Q: Is 22.2% likely to increase for HR sites?

A: This report covers a single sealed snapshot — June 14, 2026 — and makes no forward-looking claims. The cross-sectional data captures the state at one moment. Monitoring for drift requires ongoing re-crawls. What we can say from the data is that the current split between editorial blockers and platform allowers reflects a structural difference in content strategy that is unlikely to reverse quickly.

Q: Which bots would HR sites most likely target if they added restrictions?

A: Based on the corpus-wide leaderboard, CCBot (124 sites), ClaudeBot (108 sites), and GPTBot (97 sites) are the most commonly targeted. Sites that begin restricting AI access typically start with those three, given their prevalence in existing block lists and the public attention on Common Crawl, Anthropic, and OpenAI training pipelines.

Q: Does SHRM allow all AI crawlers despite publishing proprietary research?

A: According to this snapshot, shrm.org is in the allowerSites list — it returned a parseable robots.txt and that policy does not block any of the 9 tracked AI crawlers. SHRM does publish member-only and licensed research, but it has not extended that protection to the robots.txt layer as of June 14, 2026. The sealed data records its current posture; future updates could change this.

Put AI-Access Data to Work

A payroll-data ops lead at an HR platform has a direct use for this data: this is someone responsible for maintaining partner integrations and monitoring what competitive-intelligence tools and aggregators can access in the platform's public-facing pricing, feature, and help documentation.

The actionable workflow: weekly automated re-crawl of the 10 HR sites in this corpus, with an alert the moment any currently permissive site — particularly workday.com, bamboohr.com, or adp.com — adds a Disallow directive for a bot that the platform uses for competitive content monitoring. The trigger is a change in the robots.txt file; the cadence is weekly. A change by a major competitor signals a shift in their content-access posture that may warrant a policy review internally.
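The comparison at the heart of that alert can be sketched as a diff between two weekly results. The function name `new_blocks` and the placeholder domains are hypothetical, and the sample change is invented to show the mechanics; it does not describe any observed change on a real site.

```python
def new_blocks(
    previous: dict[str, set[str]],
    current: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per domain, the bots blocked this week that were not blocked last week."""
    alerts: dict[str, set[str]] = {}
    for domain, bots in current.items():
        added = bots - previous.get(domain, set())
        if added:
            alerts[domain] = added
    return alerts


# Hypothetical weekly results: domain -> set of blocked AI bots.
last_week = {"platform-a.example": set(), "outlet-b.example": {"CCBot"}}
this_week = {"platform-a.example": {"GPTBot"}, "outlet-b.example": {"CCBot"}}

print(new_blocks(last_week, this_week))  # {'platform-a.example': {'GPTBot'}}
```

Only additions trigger an alert in this sketch; a workflow that also cares about sites lifting a block would compute the reverse difference as well.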

An HR tech content-intelligence analyst tracking which sites remain open for training-data sourcing or retrieval-augmented generation over HR content can use the allowerSites list as a confirmed-permissive corpus. The 7 permissive sites in this snapshot represent a substantial body of professional HR content — compensation guidance, compliance updates, recruitment best practices — that is openly accessible via crawler. Monitoring for any shift to blocking converts the snapshot into an early-warning system.

A data-pipeline engineer building an AI assistant for HR professionals — one that answers questions about compliance, benefits, or talent strategy — needs to know which sources can be reliably crawled and which cannot. The two blocking sites (tlnt.com, hrexecutive.com) should be excluded or accessed through licensed means. The 7 permissive sites can be crawled according to their published policies. For a comparison with how a neighboring professional-services vertical handles the same question, see do accounting sites block AI crawlers.
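A source filter along those lines might look like the sketch below. The domain lists come from this snapshot; the function name `crawl_decision` and its three labels are hypothetical, and because the snapshot is dated, a real pipeline would still need to re-check each site's live robots.txt before fetching.

```python
# Domain lists as reported in the June 14, 2026 snapshot.
BLOCKERS = frozenset({"tlnt.com", "hrexecutive.com"})
ALLOWERS = frozenset({
    "shrm.org", "workday.com", "bamboohr.com", "adp.com",
    "greenhouse.io", "lever.co", "hrdive.com",
})


def crawl_decision(domain: str) -> str:
    """Classify a source for an HR assistant's ingestion pipeline."""
    domain = domain.strip().lower()
    if domain in BLOCKERS:
        return "exclude-or-license"   # blocks at least one tracked AI crawler
    if domain in ALLOWERS:
        return "crawl-per-policy"     # no tracked AI crawler blocked in the snapshot
    return "unknown"                  # e.g. gusto.com: no parseable robots.txt


for site in ("tlnt.com", "shrm.org", "gusto.com"):
    print(site, crawl_decision(site))
```

Treating gusto.com as "unknown" rather than permissive mirrors the report's position that no stance can be attributed to it from this data.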

US Tech Automations automates robots.txt monitoring with scheduled crawls, change-diff alerts, and a per-category AI-access dashboard so policy shifts are detected the moment they happen — without any manual re-checking.

See where HR sites fit in the broader trend in our study of how many top websites block AI crawlers.

Source: US Tech Automations Research — Closing Web edition; figures are verbatim counts from public robots.txt files sealed June 14, 2026 (snapshot sha 4e7c4a4a3c720f06).

Cite this report

US Tech Automations Research, 2026-06 edition. “Do HR Sites Block AI Crawlers? 2 of 9 Do.” https://ustechautomations.com/resources/blog/do-hr-sites-block-ai-crawlers-2026

Sealed snapshot sha256: 4e7c4a4a3c720f06
