The machines have moved in
Cloudflare sees over 200 trillion Internet interactions every single day. The data from Q1 2026 is unambiguous: we are witnessing the most significant platform shift in the history of the web — the transition from a human-centric Open Web to a machine-centric Programmable Web.
The tipping point has arrived. Cloudflare data on HTML page requests by client type shows that AI bots and non-AI automated bots combined now account for more than 50% of all HTML page requests. For the first time in the web's history, machines outnumber people as the primary consumers of Internet content — not in aggregate data pipelines, but at the level of individual page loads.
Within bot traffic, the share attributable to AI crawlers grew 4.3 percentage points quarter-over-quarter, from 17.5% in Q4 2025 to 21.8% in Q1 2026. Meanwhile, traditional search engine crawling — the traffic that has historically returned value to publishers — declined 5.2 percentage points. The machines are not replacing search. They are replacing the deal.
"Bot activity has completely decoupled from human utility. The crawl-to-referral ratio has inverted. We're calling this the Great Divergence."
— Matthew Prince, CEO, Cloudflare

The Great Divergence
The fundamental collapse is not hard to find in the data. The bot category breakdown tells the story of a web that has pivoted away from human utility:
Bot Traffic by Category — Q1 2026 vs Q4 2025
| Category | Q4 2025 | Q1 2026 | Change |
|---|---|---|---|
| Search Engine Crawler | 36.3% | 31.1% | ▼ −5.2pp |
| AI Crawler | 17.5% | 21.8% | ▲ +4.3pp |
| SEO & Analytics | 12.2% | 13.3% | ▲ +1.1pp |
| Advertising & Marketing | 13.0% | 10.6% | ▼ −2.4pp |
| Page Preview | 6.6% | 6.6% | → flat |
| Monitoring & Analytics | 3.6% | 3.7% | ▲ +0.1pp |
| AI Search | 2.2% | 3.2% | ▲ +1.0pp |
| Webhooks | 2.5% | 3.1% | ▲ +0.6pp |
| Aggregator | 2.7% | 2.5% | ▼ −0.2pp |
The Top Bot Operators
Among individual bot operators, Google still commands 35.5% of all verified bot traffic — but its share dropped 6.2 percentage points QoQ. OpenAI is now the #3 bot operator on the entire Internet, at 9.5%, and growing.
AI Services Are Now Internet Infrastructure
One measure of how deeply AI has embedded itself in the Internet: ChatGPT is the #8 most-trafficked service globally, ahead of Amazon Shopping, TikTok, and Netflix. It is the #12 most-visited domain in the world.
Note: rankings are from Cloudflare Radar Internet Services; a lower rank number means higher traffic.
DeepSeek surged from #9–10 in the AI ranking to #4 in the week of February 2, 2026 — immediately following global attention on DeepSeek-R1. A single model release can reshape the entire competitive landscape within days.
The Asymmetry of Extraction
The core dysfunction of the current AI web is not that bots exist — it's that they consume without reciprocating. Among all AI bot crawl activity in Q1 2026:
89.3% of all AI crawler requests are extractive, consuming content to build models that may eventually route around the source entirely. Only 8.1% powers a search product that sends users back to original content.
The Individual Crawler Pecking Order
The individual crawler ranking reveals the precise hierarchy of extraction. These are the nine most active crawlers on the web, ranked by share of all crawler traffic:
| # | Crawler | Operator | % of Crawler Traffic |
|---|---|---|---|
| 1 | Googlebot | Google | 23.7% |
| 2 | GPTBot | OpenAI | 17.0% |
| 3 | ClaudeBot | Anthropic | 14.4% |
| 4 | Meta-ExternalAgent | Meta | 12.6% |
| 5 | BingBot | Microsoft | 7.9% |
| 6 | Amazonbot | Amazon | 5.3% |
| 7 | FacebookExternalHit | Meta | 3.1% |
| 8 | YandexBot | Yandex | 2.6% |
| 9 | PetalBot | Huawei | 2.5% |
GPTBot and ClaudeBot together account for 31.4% of all web crawler traffic — nearly a third of the entire crawler ecosystem — yet the referral products attached to them remain nascent. They consume at the scale of Google; they return traffic at a fraction of it.
Google's Structural Crawl Advantage
Googlebot's dominance in crawler traffic understates its true advantage. Cloudflare data shows Googlebot successfully accessed roughly 1.7× as many unique pages as ClaudeBot, 1.76× as many as GPTBot, 3× as many as Meta-ExternalAgent, and 3.26× as many as Bingbot. The gap widens dramatically for smaller crawlers: Googlebot saw 14.9× more unique pages than Applebot, 167× more than PerplexityBot, and over 700× more than CCBot. Of all sampled unique URLs on Cloudflare's network, Googlebot crawled roughly 8%.
This coverage asymmetry is self-reinforcing. Publishers are effectively unable to block Googlebot because Google's 90%+ share of the search market means blocking it destroys organic traffic. Yet that same Googlebot — used for search indexing — is also the mechanism through which Google populates AI Overviews and AI Mode with live content, returning little if any traffic to the source. Publishers face an impossible choice: allow full extraction, or disappear from search. No other AI company operates with this structural immunity. The UK's Competition and Markets Authority (CMA) designated Google as having Strategic Market Status in search precisely because of this dynamic, and opened a consultation in January 2026 on conduct requirements — including whether Googlebot should be split into separate crawlers for search indexing vs. AI grounding.
The Industry Heatmap: What AI Is Actually Consuming
Cloudflare Radar data on AI bot vertical distribution reveals which industries are most exposed — and what kind of content they should be protecting.
What Each Vertical Should Protect
| Vertical | AI Crawl Share | Primary Content at Risk |
|---|---|---|
| Shopping & Retail | 31.1% | Product descriptions, pricing, imagery, reviews |
| Internet & Telecom | 16.7% | Infrastructure docs, API references, technical content |
| Computer & Electronics | 14.9% | Technical documentation, code, specs |
| News, Media & Publications | 9.2% | Articles, analysis, journalism — highest text quality per request |
| Business & Industry | 5.1% | B2B content, industry reports, market data |
| Travel & Tourism | 4.0% | Listings, reviews, itineraries, pricing |
| Finance | 2.9% | Market data, financial reporting, disclosures |
Shopping and retail is the most-crawled category at 31.1% — not because consumers are using AI to shop (yet), but because product data is rich training corpus for AI models that aim to assist with purchasing decisions. The crawl happens long before any agentic commerce product launches.
News and Media accounts for 9.2% of AI crawl activity — a figure that understates the impact. A retail product description is one sentence. A news article is thousands of words of structured, fact-checked prose. The training value per crawl request is not equal.
- HTML/Text — publishers, news, blogs. Language models want words.
- Product Data & Images — e-commerce. Vision models and shopping agents want catalog data.
- Video — entertainment platforms. ByteSpider (ByteDance) is the 7th-largest AI user agent at 3.4% of AI bot traffic.
- Code — developer platforms. GitHub Copilot ranked #6 in the global AI services category.
From Robots.txt to Honest Bot Detection
Publishers are responding with the only tool most of them have: robots.txt, a plain-text convention dating back to 1994. GPTBot and ClaudeBot are the most restricted crawlers in the robots.txt ecosystem. In Cloudflare's verified bot catalog, 37 of 200 tracked bots are now AI crawlers or AI assistants — a category that barely existed two years ago.
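In practice, the blocking pattern the data describes looks like this: a minimal robots.txt that refuses the two largest AI crawlers while leaving search indexing untouched. The file is illustrative, not a recommendation.

```text
# Refuse the two most active AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Search indexing remains allowed
User-agent: Googlebot
Allow: /

# Everyone else: default allow
User-agent: *
Allow: /
```

Matching is by User-agent token, and the file must be served from the site root at /robots.txt. Every directive here is a request, not an enforcement mechanism — which is exactly the problem the next section describes.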
Why Robots.txt Is Not Enough
First: compliance is voluntary. Our AI crawl timeseries shows that AI bot traffic hit its single-day peak of Q1 2026 on February 9 — even as blocking rates have been rising throughout the quarter. Volume is growing faster than restrictions can contain it.
Second: shadow scrapers exist. Our data shows 0.44% of AI bot crawl activity comes from crawlers declaring no purpose at all — Undeclared bots operating without transparency. Worse are the impersonators: scrapers that deliberately misidentify themselves, wearing a Googlebot mask to claim a permission they have never been granted.
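The Googlebot mask can be pulled off today with forward-confirmed reverse DNS, the verification method Google itself documents: resolve the client IP to a hostname, check that the hostname sits in Google's crawler domains, then resolve that hostname forward and confirm it maps back to the same IP. A minimal sketch in Python — the resolver hooks are injectable purely so the logic can be exercised without network access, and the domain suffixes follow Google's published guidance:

```python
import socket

def is_verified_googlebot(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for an IP claiming to be Googlebot.

    reverse(ip) -> hostname and forward(host) -> ip default to the system
    resolver. Returns True only if the PTR record lands in a Google crawler
    domain AND the forward lookup round-trips to the same IP.
    """
    reverse = reverse or (lambda i: socket.gethostbyaddr(i)[0])
    forward = forward or socket.gethostbyname
    try:
        host = reverse(ip)
        # Genuine Googlebot PTR records end in these domains.
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the caller.
        return forward(host) == ip
    except OSError:
        # Resolution failure means the claim cannot be verified.
        return False
```

The check is per-operator — each crawler operator publishes its own verification domains — so it does not scale to thousands of bots, and it says nothing about a crawler's purpose. That is the gap cryptographic identity is meant to close.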
Third: the past is already gone. Models trained on data crawled before publishers erected their defenses have already consumed that content. The window of prevention is partly closed.
Fourth: blocking is not neutral. Publishers overwhelmingly target AI crawlers — not Googlebot. Among Cloudflare customers using AI Crawl Control between July 2025 and January 2026, the number of websites actively blocking crawlers like GPTBot and ClaudeBot was nearly 7× higher than those blocking Googlebot. The asymmetry is not ignorance — it is economic coercion. Publishers cannot afford to disappear from Google Search, so they absorb the extraction.
The UK's CMA, in its January 2026 consultation on Google's conduct requirements, identified this dynamic explicitly: publishers "have no realistic option but to allow their content to be crawled for Google's general search." The CMA proposed publisher opt-out controls, but Cloudflare — along with major UK publishers including the Daily Mail Group, the Guardian, and the News Media Association — has argued the proposal does not go far enough. The only structurally effective remedy is mandatory crawler separation: requiring Googlebot to split into distinct crawlers for search indexing vs. AI grounding, so publishers can allow one while blocking the other. Unlike other AI operators (OpenAI and Anthropic each operate purpose-specific crawlers), Google uses a single dual-purpose bot — giving it access no competitor can match.
Web Bot Auth: Cryptographic Verification
This is why Cloudflare built Web Bot Auth — a new standard using cryptographic signatures to verify a bot's identity at the protocol level. If a bot claims to be Googlebot, it signs its requests with a key only Google holds. If the signature fails, the claim is false. Web Bot Auth does not rely on bots choosing to be honest; it makes dishonesty technically detectable.
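Under the hood, Web Bot Auth builds on the IETF HTTP Message Signatures framework (RFC 9421): the bot signs each request, and a Signature-Agent header points to the directory where the operator publishes its public keys. A signed request looks roughly like the following — header values are illustrative placeholders, and the exact covered components are defined by the evolving draft, not by this sketch:

```text
GET /article HTTP/1.1
Host: example.com
User-Agent: ExampleCrawler/1.0
Signature-Agent: "https://crawler-directory.example.com"
Signature-Input: sig1=("@authority" "signature-agent");created=1767225600;expires=1767226200;keyid="exampleKeyThumbprint";tag="web-bot-auth"
Signature: sig1=:base64SignatureBytesGoHere==:
```

On receipt, the origin fetches the public key referenced by keyid from the Signature-Agent directory, verifies the signature over the covered components, and treats a missing or failed signature as an unverified claim. Honesty stops being optional; it becomes checkable.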
AI agents — bots that actively browse, fill forms, and execute tasks — currently represent 0.16% of verified bot traffic. Small today, but the fastest-growing subcategory. These agents cannot be managed with robots.txt alone. Cryptographic identity is the necessary infrastructure for the Programmable Web.
AI Search: A New Referral Ecosystem
The AI Search category — crawlers attached to AI-powered search products that do return traffic to original sources — is one of the most strategically important in the ecosystem. It grew 1.0 percentage point QoQ, from 2.2% to 3.2% of bot traffic — in relative terms, the fastest-growing bot subcategory this quarter.
| Operator | Share of AI Search Traffic | Product |
|---|---|---|
| Apple | 58.8% | Applebot → Siri, Spotlight, Apple Intelligence |
| OpenAI | 41.0% | OAI-SearchBot → ChatGPT Search |
| Cloudflare | 0.13% | Cloudflare AI Search |
| Brave | 0.07% | Brave Search AI |
Apple is the dominant AI search crawler, driven by Applebot's role in powering Siri and Apple Intelligence. Apple's crawl activity was highly volatile in Q1 2026 — near zero on some days, spiking by late March — suggesting active product development. A company with 2.35 billion active devices and a first-party AI search product is a fundamentally different referral engine than anything that preceded it.
OAI-SearchBot spiked noticeably during peak AI news periods (Jan 22–24, Feb 9, Mar 10–11), suggesting its crawl intensity tracks closely with ChatGPT usage surges. As AI search products mature and publishers begin demanding referral as a condition of access, this is the category that matters most.
The AI Agent Layer: From Crawlers to Action
The most nascent but strategically significant development is the emergence of AI assistants as a distinct traffic category. These are not crawlers passively indexing content — they are agents actively executing tasks: browsing pages, filling forms, clicking buttons, completing purchases.
| Operator | Share of AI Assistant Traffic |
|---|---|
| OpenAI | 86.0% |
| Cloudflare (Browser Rendering) | 10.7% |
| DuckDuckGo | 2.5% |
| Mistral AI | 0.26% |
| Meta | 0.25% |
| Manus | 0.22% |
| Amazon | 0.03% |
| Devin AI | 0.03% |
The 21 verified AI assistant bots in our catalog include Amazon Bedrock AgentCore Browser (deployed across 9 AWS regions), Amazon's "Buy For Me" agent, Devin AI, CartAI, Apify, and Browserbase.
An agent that can complete a purchase does not need a publisher to survive — it disintermediates the human who would have visited the page, clicked the link, and converted. The "Zero-Click" future is not a search phenomenon alone. It is an agent phenomenon.
Security: Automated Aggression at Scale
The Programmable Web has a kinetic dimension that goes beyond data extraction. Q1 2026 saw a pronounced acceleration in application-layer attack volume, with the quarter's peak occurring on March 22 — the most intense single day of L7 attacks in our records. The final two weeks of March were the most sustained attack cluster of the quarter.
The Mirai Resurgence
At the network layer, Q1 2026 saw a dramatic shift in attack composition directly attributable to botnet evolution:
Mirai-family botnet floods grew from 7.9% to 34.4% of all network-layer attack traffic — a 4.3× increase in a single quarter. This is the clearest data fingerprint of the Aisuru-Kimwolf botnet: a coordinated army of 1–4 million malware-infected Android TV devices that produced the world-record 31.4 Tbps attack in late 2025. The botnet is still active, still expanding.
L7 Attack Composition — Q1 2026
| Mitigation Product | Q4 2025 | Q1 2026 | Change |
|---|---|---|---|
| WAF | 51.1% | 51.4% | ▲ +0.25pp |
| DDoS Protection | 43.5% | 44.3% | ▲ +0.77pp |
| Access Rules | 3.0% | 1.9% | ▼ −1.1pp |
| Bot Management | 0.47% | 0.59% | ▲ +0.11pp |
Q1 2026: Major Internet Disruptions
Our traffic anomaly data logged 100 verified disruption events in Q1 2026. The most significant:
- Critical — Feb 28, 07:00–07:15 UTC: 7+ ISPs offline within 15 minutes
- High — Mar 4–5, 16–17, and 21–22: outages of ~23h each (ETECSA plus country-level)
- High — Mar 15–17: country-level; 5 total disruptions in Q1
- High — Jan 12–26: continuous disruption; Mar 18: country-level outage (9.5h); Mar 20: TCI outage
The AI Inference Economy
Beyond what bots consume, Cloudflare's AI inference data reveals what developers are building on top of AI infrastructure in Q1 2026.
Top Models by Deployment
| # | Model | Share of Inference Accounts |
|---|---|---|
| 1 | Meta Llama 3 8B Instruct | 40.6% |
| 2 | Stable Diffusion XL 1.0 | 13.4% |
| 3 | Whisper (OpenAI) | 8.3% |
| 4 | Meta Llama 4 Scout 17B | 7.0% |
| 5 | M2M-100 1.2B (Translation) | 5.4% |
Text generation dominates at 62.9% of all AI inference, confirming that the primary use of AI infrastructure is language. Meta's Llama 3 is the most widely deployed model on Cloudflare's network at 40.6% of all inference accounts — Meta's open-weight strategy has produced the Internet's most-used inference model. The newly released Llama 4 Scout already accounts for 7.0% of inference accounts despite launching during the quarter — the fastest adoption curve we have observed for any new model.
Restoring the Balance
The data from Q1 2026 tells a coherent story. The web's economic engine — the deal between creators and distributors — is under structural strain. The machines have arrived in force. They are consuming at scale, returning at a trickle, and showing no signs of slowing.
The Great Divergence is not a coming threat. It is the current state.
Cloudflare's position is unchanged: the free lunch era of AI training is over. We are giving publishers the tools to create scarcity — the only leverage that creates value in a world of infinite automated extraction. We are building the infrastructure for the Programmable Web to be governed: cryptographic bot identity via Web Bot Auth, granular access controls, and a marketplace where the savings AI companies realize from efficient access flow back to the creators who made the content worth consuming.
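What "granular access controls" can look like in practice: Cloudflare's rules language exposes the verified-bot category as a matchable field, so a single custom rule expression can gate one bot class without touching another. A sketch, assuming the cf.verified_bot_category field and the category names used in the tables above — verify the exact field and value names against current Cloudflare documentation before deploying:

```text
(cf.verified_bot_category eq "AI Crawler")
```

Paired with a Block or Managed Challenge action, this refuses extractive AI crawlers while verified Search Engine Crawler and AI Search traffic — the categories that return referrals — passes untouched.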
The time to act is now. Gate your content. Set your price. Reclaim your independence before silence is treated as permission.