The second internet is here, and most websites are invisible to it
ChatGPT, Gemini, Claude, and Grok now answer questions for hundreds of millions of people every week. Most websites have done nothing to prepare for this. Here's what changed, why it matters, and what to do about it.
There are two internets now.
The first one — the one we've all built websites for — is read by people. They land on your homepage, scroll, click through to a product page, maybe read your blog, decide if they trust you. Twenty-five years of accumulated craft (design, copywriting, SEO) is aimed at that reader.
The second internet is read by machines. ChatGPT reaches your site, parses what it can, and decides whether to quote a sentence of yours in its answer to a real human's question. The human never visits you. They just hear what ChatGPT decided to say about you — paraphrased, summarised, sometimes wrong — and move on with their day.
These two internets share most of the same infrastructure. They run on the same web standards, the same DNS, the same TCP. But they read your site very differently, and what they reward is different too.
What's actually happening
Roughly 800 million people now use ChatGPT every week. Perplexity gets 100 million queries a month. Claude is increasingly the default for technical work. Google's AI Overviews appear above the search results for an estimated 20% of all queries, and for shopping queries, the number is higher.
For all of these systems, the pattern is the same: a person asks something, the AI answers, and one or two sources are cited as the basis for that answer. Sometimes the user clicks the citation. Often they don't. Either way, the citation is the prize.
Citations don't necessarily go to the highest-ranking page. They tend to go to the page the model can read most cleanly: the one with structured content, clear hierarchy, machine-readable summaries, and explicit signals about what the page is about.
This is why traditional SEO is not enough. SEO optimises for position in a list of blue links. AEO (answer engine optimisation) optimises for being the chosen citation in a generated answer. The two overlap, but they aren't the same.
What models actually look at
When a model decides whether to quote your page, it looks at things that historically didn't matter very much:
Your /llms.txt file. This is a new convention from llmstxt.org. It's a curated, Markdown-formatted index that tells LLMs which content on your site is worth ingesting. Almost nobody has one. The sites that do see meaningful citation lifts.
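To make the format concrete, here is a minimal sketch that assembles an llms.txt body, following the layout described at llmstxt.org: an H1 with the site name, a blockquote summary, then H2 sections listing links with optional notes. The `buildLlmsTxt` function, its input types, and the example.com URLs are hypothetical placeholders; for a small site, writing the file by hand is just as good.

```typescript
// Hypothetical input shapes for illustration; llms.txt itself is plain Markdown.
interface LlmsLink {
  title: string;
  url: string;
  notes?: string;
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

interface LlmsTxtInput {
  siteName: string;
  summary: string;
  sections: LlmsSection[];
}

// Builds an llms.txt body: "# Name", a "> summary" blockquote,
// then "## Section" headings with "- [title](url): notes" list items.
function buildLlmsTxt(input: LlmsTxtInput): string {
  const lines: string[] = [`# ${input.siteName}`, "", `> ${input.summary}`];
  for (const section of input.sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const suffix = link.notes ? `: ${link.notes}` : "";
      lines.push(`- [${link.title}](${link.url})${suffix}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Placeholder content; example.com stands in for your own site.
const llmsTxt = buildLlmsTxt({
  siteName: "Example Co",
  summary: "Example Co sells handmade ceramic mugs and ships across the EU.",
  sections: [
    {
      heading: "Products",
      links: [
        { title: "Mug catalogue", url: "https://example.com/mugs.md", notes: "All current products with prices" },
      ],
    },
    {
      heading: "Optional",
      links: [{ title: "Company history", url: "https://example.com/about.md" }],
    },
  ],
});

console.log(llmsTxt);
```

llmstxt.org gives a section named "Optional" a special meaning: its links can be skipped when a shorter context is needed, which makes it a reasonable home for background pages.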
Your robots.txt directives for AI crawlers specifically. Most robots.txt files were written a decade ago with Googlebot and Bingbot in mind. They say nothing about GPTBot, ClaudeBot, PerplexityBot, or Google-Extended, so the behaviour defaults to whatever each crawler decides. Sometimes that means your content is being used for training; sometimes it means you're being silently excluded.
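Here is a sketch of what addressing these crawlers explicitly might look like, generated in code so the policy lives in one place. The `buildRobotsTxt` helper is hypothetical, and the policy in the usage line (allow three crawlers, opt out of Google-Extended) is only an example, not a recommendation; the user-agent tokens are the four named above.

```typescript
type CrawlerPolicy = "allow" | "disallow";

// The AI user-agent tokens named in the article. Google-Extended is a control
// token read by Google's existing crawlers rather than a crawler of its own.
const AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"] as const;
type AiAgent = (typeof AI_AGENTS)[number];

// Emits one explicit group per AI agent, then a catch-all group.
// Agents missing from `policies` default to "allow" (a choice made for this sketch).
function buildRobotsTxt(
  policies: Partial<Record<AiAgent, CrawlerPolicy>>,
  sitemapUrl?: string,
): string {
  const groups: string[] = [];
  for (const agent of AI_AGENTS) {
    const rule = (policies[agent] ?? "allow") === "allow" ? "Allow: /" : "Disallow: /";
    groups.push(`User-agent: ${agent}\n${rule}`);
  }
  groups.push("User-agent: *\nAllow: /");
  let body = groups.join("\n\n") + "\n";
  if (sitemapUrl) body += `\nSitemap: ${sitemapUrl}\n`;
  return body;
}

// Example policy only: allow answer-engine crawlers, opt out of Google-Extended.
const robotsTxt = buildRobotsTxt({ "Google-Extended": "disallow" }, "https://example.com/sitemap.xml");
console.log(robotsTxt);
```

Naming each crawler in its own group removes the guesswork: a crawler that finds a group matching its token follows that group instead of the `*` rules.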
Whether your server can return Markdown when asked for it. AI crawlers increasingly send Accept: text/markdown and skip pages that respond with HTML soup full of script tags and tracking pixels. Your CMS probably can't do this. The fix is roughly twenty lines of edge middleware.
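A minimal sketch of that content negotiation, assuming a runtime with the standard Fetch API `Request`/`Response` classes (most edge platforms, Deno, Node 18+) and assuming you already have a Markdown version of each page to serve. The `prefersMarkdown` parser is deliberately simplified (it ignores wildcards like `text/*`), and `renderHtml` and `getMarkdown` are hypothetical hooks into your own stack.

```typescript
// Simplified Accept parsing: true when text/markdown is listed with q > 0 and
// at least the q-value of text/html. Wildcards (*/*, text/*) are ignored.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  let markdownQ = 0;
  let htmlQ = 0;
  for (const part of accept.split(",")) {
    const [mediaType, ...params] = part.split(";").map((s) => s.trim().toLowerCase());
    const qParam = params.find((p) => p.startsWith("q="));
    const q = qParam ? Number(qParam.slice(2)) : 1;
    if (mediaType === "text/markdown") markdownQ = q;
    if (mediaType === "text/html") htmlQ = q;
  }
  return markdownQ > 0 && markdownQ >= htmlQ;
}

// Hypothetical hooks: how HTML is rendered and where the Markdown twin of
// each path is stored are up to your own stack.
type HtmlRenderer = (request: Request) => Promise<Response>;
type MarkdownLookup = (path: string) => Promise<string | null>;

async function handle(
  request: Request,
  renderHtml: HtmlRenderer,
  getMarkdown: MarkdownLookup,
): Promise<Response> {
  if (prefersMarkdown(request.headers.get("accept"))) {
    const markdown = await getMarkdown(new URL(request.url).pathname);
    if (markdown !== null) {
      return new Response(markdown, {
        headers: { "content-type": "text/markdown; charset=utf-8", vary: "Accept" },
      });
    }
  }
  // Fall back to normal HTML, copying headers so Vary can be added safely.
  const html = await renderHtml(request);
  const headers = new Headers(html.headers);
  headers.append("vary", "Accept");
  return new Response(html.body, { status: html.status, statusText: html.statusText, headers });
}
```

Setting `Vary: Accept` on both branches matters if a CDN sits in front: without it, a cached Markdown response could be served to a browser, or a cached HTML page to a crawler that asked for Markdown.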
Your structured data. Schema.org JSON-LD is no longer just for rich snippets in Google's SERPs. LLMs use it as a hint about what your page is about: Organization, Product, Article, Recipe, FAQPage. Pages with valid JSON-LD are dramatically more likely to be cited correctly.
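For example, a minimal Organization block could be emitted like this. The `organizationJsonLd` helper and the example values are hypothetical; `@context`, `@type`, `name`, `url`, `logo`, and `sameAs` are standard Schema.org Organization properties. Check the output at validator.schema.org before shipping it.

```typescript
interface OrganizationInfo {
  name: string;
  url: string;
  logo?: string;
  sameAs?: string[]; // profile URLs that identify the same organisation
}

// Returns a <script type="application/ld+json"> tag describing an Organization.
function organizationJsonLd(org: OrganizationInfo): string {
  const data: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Organization",
    name: org.name,
    url: org.url,
  };
  if (org.logo) data.logo = org.logo;
  if (org.sameAs && org.sameAs.length > 0) data.sameAs = org.sameAs;
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

const orgTag = organizationJsonLd({
  name: "Example Co",
  url: "https://example.com",
  logo: "https://example.com/logo.png",
});
console.log(orgTag);
```

Put the tag in the page's `<head>`; the same pattern extends to Product, Article, Recipe, or FAQPage by changing `@type` and the properties that type expects.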
Your headings and content hierarchy. Models read documents the way humans skim — by heading, then by paragraph. If your homepage has four H1s and no H2s, it reads as a single undifferentiated lump. Models will rarely cite from an undifferentiated lump.
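A rough way to catch the four-H1 problem in a build step. This hypothetical `auditHeadings` helper is a regex heuristic over raw HTML, not a parser, so it will miscount headings inside comments or templates; treat its output as a prompt to look, not a verdict.

```typescript
interface HeadingReport {
  h1: number;
  h2: number;
  problems: string[];
}

// Counts opening <h1>/<h2> tags (with or without attributes) and flags
// the patterns described above: several H1s, no H1, or no H2s at all.
function auditHeadings(html: string): HeadingReport {
  const count = (level: number): number =>
    (html.match(new RegExp(`<h${level}(\\s[^>]*)?>`, "gi")) ?? []).length;
  const h1 = count(1);
  const h2 = count(2);
  const problems: string[] = [];
  if (h1 === 0) problems.push("No <h1>: the page never states its main topic.");
  if (h1 > 1) problems.push(`${h1} <h1> elements: consider one <h1> for the page topic.`);
  if (h2 === 0) problems.push("No <h2>: the content is not divided into sections.");
  return { h1, h2, problems };
}

const report = auditHeadings("<h1>Mugs</h1><h1 class='hero'>Sale</h1><p>Handmade in Leeds.</p>");
console.log(report.problems);
```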
None of these are mysterious. They're concrete, technical, and easy to fix in an afternoon. The hard part is knowing they exist.
Agents are coming next
Everything above describes how AI engines cite your site. The next wave is agents: AI systems that take actions on behalf of a user. "Book me a table at a restaurant in Soho with vegetarian options for tomorrow night." "Find me a developer portfolio template, and add this colour palette to it." "Compare these three insurance policies and tell me which one to buy."
For these tasks, agents don't read your homepage. They look for /.well-known/agent-skills/index.json — a machine-readable description of what your site can do — and decide whether to interact with you based on what's there. Most sites have nothing at that URL. The few that do are about to become a lot more useful, and a lot more visited.
Beyond agent-skills there are MCP (the Model Context Protocol, Anthropic's spec for connecting agents to tools), API catalogs (RFC 9727), and OAuth metadata (RFC 8414 and RFC 9728) for authenticated agent access. None of these matter to human visitors. All of them matter for the next decade of automated traffic.
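A small sketch for checking which of these discovery endpoints a site exposes. The agent-skills path is the one named above; the other three are the well-known URIs defined by RFC 9727 (`api-catalog`), RFC 8414 (`oauth-authorization-server`), and RFC 9728 (`oauth-protected-resource`), resolved here at the origin root only. The helper names are hypothetical, and a 2xx status only shows that something answers at the URL, not that its contents are valid.

```typescript
// Origin-level discovery paths; RFC 8414/9728 also allow path-specific
// variants for issuers or resources with a path component, not covered here.
const DISCOVERY_PATHS = {
  agentSkills: "/.well-known/agent-skills/index.json",
  apiCatalog: "/.well-known/api-catalog", // RFC 9727
  oauthAuthorizationServer: "/.well-known/oauth-authorization-server", // RFC 8414
  oauthProtectedResource: "/.well-known/oauth-protected-resource", // RFC 9728
} as const;

type DiscoveryKey = keyof typeof DISCOVERY_PATHS;

// Builds absolute URLs from any page URL on the site (path and query are dropped).
function discoveryUrls(site: string): Record<DiscoveryKey, string> {
  const origin = new URL(site).origin;
  const out = {} as Record<DiscoveryKey, string>;
  for (const key of Object.keys(DISCOVERY_PATHS) as DiscoveryKey[]) {
    out[key] = origin + DISCOVERY_PATHS[key];
  }
  return out;
}

// Reports which endpoints return a 2xx status. fetchFn is injectable so the
// check can run offline in tests; it defaults to the global fetch.
async function probeDiscovery(
  site: string,
  fetchFn: typeof fetch = fetch,
): Promise<Record<DiscoveryKey, boolean>> {
  const urls = discoveryUrls(site);
  const entries = await Promise.all(
    (Object.keys(urls) as DiscoveryKey[]).map(async (key) => {
      try {
        const res = await fetchFn(urls[key]);
        return [key, res.ok] as const;
      } catch {
        return [key, false] as const; // network errors count as "missing"
      }
    }),
  );
  return Object.fromEntries(entries) as Record<DiscoveryKey, boolean>;
}
```

Usage: `probeDiscovery("https://example.com").then(console.log)` prints one true/false per endpoint, which is a quick way to get the baseline the next section recommends.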
So what do you do?
The honest answer is: get a baseline of where your site is today, then fix what's broken in priority order.
That's the job we built AISEOLab for. Run a free scan and you'll see, in about thirty seconds, exactly what AI engines and agents see when they look at your site — what's there, what's missing, what's broken. Generate the missing files with one click. Monitor for regressions every day. The whole thing is free for one site, forever.
We're not the only way to do this. You can hand-write /llms.txt and a tightened robots.txt yourself in a couple of hours. You can validate your Schema.org JSON-LD at validator.schema.org. You can read the MCP spec and ship a server card. We'd rather you did any of those things than do nothing — the second internet is here whether you're ready for it or not.
But we built this to make the work boring and obvious instead of intimidating. That's our whole pitch. Nothing mystical. No trust-us-bro. Just here's what's missing, here's what to do about it, here's the file.
If that sounds useful, scan a site. If you'd like to talk to us, hello@aiseolab.ai. We answer every email.