How to automate SEO and AEO with Claude

Technical SEO checklist (AI-assisted)

Technical SEO is the work of making a site crawlable, indexable and renderable so that search engines and AI answer engines can actually read it. It is the foundation every ranking sits on. This guide covers the core checks (crawl budget, index coverage, rendering, robots, sitemaps and status codes) and how an AI agent runs them in minutes rather than the days a manual audit takes.

What is technical SEO and why does it matter for AI search?

Technical SEO is the work of making sure a search engine, and now an AI answer engine, can reach a page, store it, and read what is on it. It sits underneath the parts of SEO most people think about. Content SEO decides what you say and on-page SEO decides how you say it, but technical SEO decides the prior question that makes both of them count: can the machine actually get to the page and see it at all? A brilliant article that the crawler cannot reach, or that an engine refuses to index, or whose words appear only after a script runs, ranks for nothing. Technical SEO is the plumbing, and when the plumbing leaks, every other effort drains away with it.

It helps to picture three gates a page has to pass through before it can ever rank. The first is crawl: a bot has to be allowed to fetch the URL and has to be able to find it through a link or a sitemap. The second is index: once fetched, the engine has to decide the page is worth storing in its database and considering for queries. The third is render: the engine has to see the actual content, which is not guaranteed when that content is painted by JavaScript after the initial HTML arrives. Miss any one gate and the page is invisible no matter how good it is. Most of what falls under technical SEO is simply making sure none of those three gates is accidentally shut.

This matters more than ever for AI search, not less. An AI answer engine cannot cite a page it never fetched, never indexed, or could not read because the text was locked behind a script. The large language models behind tools like ChatGPT, Perplexity and Google’s AI Overviews are fed by crawlers, and those crawlers obey the same robots rules, follow the same links and hit the same status codes as a classic search bot. If anything they are less forgiving: several of them do not execute JavaScript at all, so a page that renders its content client-side can be perfectly visible to a human and completely blank to the model. Technical health is the price of admission to being quoted by an AI answer, and a site that skips it is invisible to the very systems everyone is now racing to appear in.

One distinction prevents a common confusion. Technical SEO is not the same as page speed. Core Web Vitals, the performance side of the picture covered in its own guide on loading, interactivity and visual stability, measure how fast and steady a page feels once it loads. Technical SEO here is the earlier question of whether the page can be reached and read at all. A page can be lightning fast and still be deindexed by a stray directive, and a slow page can rank fine if everything else is sound. The two reinforce each other, but they answer different questions, and fixing one does nothing for the other.

Which technical checks have the biggest ranking impact?

Most technical SEO problems trace back to a short list of high-impact issues, and a handful of them can sink an entire site overnight while costing almost nothing to fix. The honest way to prioritise is by blast radius: start with the checks that can deindex pages wholesale, then work down to the ones that quietly erode performance over time.

At the top of the list sits accidental blocking, because it is the cheapest mistake to make and the most expensive to ignore. A single Disallow: / in robots.txt tells every crawler to stay out of the whole site, and a stray noindex directive in a meta tag or HTTP header tells engines to drop a page they have already fetched. These two are different tools that people constantly confuse: robots.txt controls crawling (whether the bot may fetch the URL) while the robots meta tag controls indexing (whether a fetched page may be stored). The dangerous combination is blocking a page in robots.txt while also trying to noindex it, because if the bot cannot crawl the page it never sees the noindex, and the URL can linger in the index as a bare link. The first thing any audit checks is that nothing important is blocked and nothing important is noindexed, because no other optimisation matters if the page is shut out at the door.
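The crawl-versus-index distinction can be checked mechanically. The following is a minimal sketch using only Python's standard library; the function names (`is_crawl_allowed`, `is_noindexed`, `check_gates`) and the example URLs are illustrative assumptions, and the inputs are passed in as strings so that how robots.txt and the page are fetched is left open.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.directives.append(attributes.get("content", "").lower())


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Crawl gate: may this user agent fetch the URL at all?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_noindexed(html, headers=None):
    """Index gate: does a meta tag or X-Robots-Tag header say noindex?"""
    header_values = [
        value.lower()
        for name, value in (headers or {}).items()
        if name.lower() == "x-robots-tag"
    ]
    meta = RobotsMetaParser()
    meta.feed(html)
    meta.close()
    return any("noindex" in d for d in header_values + meta.directives)


def check_gates(robots_txt, url, html, headers=None):
    """Hypothetical helper that combines both gates into one verdict."""
    crawlable = is_crawl_allowed(robots_txt, url)
    noindex = is_noindexed(html, headers)
    if not crawlable and noindex:
        return "conflict: blocked from crawling, so the noindex is never seen"
    if not crawlable:
        return "blocked by robots.txt"
    if noindex:
        return "noindexed"
    return "open"
```

Keeping the two checks as separate functions mirrors the point of the paragraph: a URL can fail either gate independently, and the combination of both is its own finding.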

Canonicalisation is the next tier. A canonical tag tells engines which version of near-duplicate pages is the real one, consolidating the ranking signals of all the copies onto a single chosen URL. Sites generate duplicates constantly without meaning to: the same page reachable with and without a trailing slash, with and without www, over HTTP and HTTPS, or with tracking parameters stuck on the end. When canonicals are missing or point to the wrong URL, the engine has to guess, and it often guesses badly, splitting authority across copies or indexing the parameter-laden version instead of the clean one. Getting canonicals right is how you stop competing with yourself.
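One way to surface accidental duplicates is to normalise every crawled URL to a single form and group the variants that collapse together. The sketch below assumes one particular policy (HTTPS, no www, no trailing slash, tracking parameters dropped); that policy, the parameter list and the function names are illustrative assumptions, and a real site should substitute whatever its canonical form actually is.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; extend to match the site's own analytics setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def normalise_url(url):
    """Collapse protocol, www, trailing-slash and tracking-parameter variants."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_duplicates(urls):
    """Map each normalised URL to the crawled variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise_url(url)].append(url)
    # Only groups with more than one variant are duplicates worth reporting.
    return {canonical: variants for canonical, variants in groups.items() if len(variants) > 1}
```

Each group returned is a set of URLs that should all declare the same canonical; comparing the group key against the canonical tag each variant actually carries is the natural next check.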

Status codes and redirects form the third tier, and they are where crawl budget quietly bleeds. Every URL should return the right HTTP status: 200 for a live page, 301 for a permanent move, 404 or 410 for something genuinely gone. The traps are subtle. Redirect chains, where A points to B which points to C, waste crawl and dilute the signal that should pass straight through. Redirect loops break the page entirely. And the most insidious of all is the soft 404, a page with no real content that still returns a 200 status, so the engine keeps crawling and indexing emptiness instead of treating it as gone. For multilingual sites there is a fourth check, hreflang, which tells engines that the English and Turkish versions of a page are the same content for different audiences rather than duplicates competing with each other, and broken or non-reciprocal hreflang sends the wrong language to the wrong searcher. None of these is glamorous, but each is a leak, and a site that plugs them keeps every drop of authority it has earned.
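The redirect and soft 404 traps can be expressed as a small classifier. This is a sketch under stated assumptions: `responses` is a hypothetical mapping from URL to `(status, location)` collected by an earlier crawl, `bodies` maps URL to visible page text, and the soft 404 test (very few words or a "not found" phrase on a 200 page) is a rough heuristic with an arbitrary threshold, not how any search engine actually decides.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}
# Assumed phrases; tune them to the site's own empty-state templates.
NOT_FOUND_PHRASES = ("page not found", "no results found", "nothing here")


def trace_redirects(start, responses, max_hops=10):
    """Follow recorded redirects and return (hops, final_status, is_loop)."""
    hops = [start]
    while len(hops) <= max_hops:
        status, location = responses.get(hops[-1], (None, None))
        if status not in REDIRECT_CODES:
            return hops, status, False
        if location in hops:
            return hops + [location], None, True
        hops.append(location)
    return hops, None, False  # gave up: chain longer than max_hops


def looks_like_soft_404(status, body_text, min_words=50):
    """Heuristic: a 200 page with almost no text or a 'not found' message."""
    if status != 200:
        return False
    text = body_text.lower()
    return len(text.split()) < min_words or any(p in text for p in NOT_FOUND_PHRASES)


def classify(start, responses, bodies):
    hops, status, is_loop = trace_redirects(start, responses)
    if is_loop:
        return "redirect loop"
    if status is None:
        return "unresolved"
    if looks_like_soft_404(status, bodies.get(hops[-1], "")):
        return "soft 404"
    if len(hops) > 2:  # more than one redirect between start and destination
        return "redirect chain"
    return f"status {status}"
```

A single redirect followed by a live page is reported by its final status, since one hop is the intended behaviour of a 301; only two or more hops count as a chain here.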

How do crawl budget and index coverage actually work?

Crawl budget is the number of URLs a search engine is willing to fetch from your site in a given window, and it is shaped by two forces. The first is crawl rate, how fast the engine can fetch without overloading your server, which a slow or error-prone site lowers on its own. The second is crawl demand, how much the engine actually wants your pages, which rises with popularity and freshness and falls for stale or low-value URLs. For a small site of a few hundred pages, crawl budget is rarely the bottleneck, and worrying about it is usually wasted effort. For a large site of tens or hundreds of thousands of URLs it becomes decisive, because if the bot spends its budget on junk it never reaches the pages that matter.

This is where the soft 404s, redirect chains and infinite parameter URLs from the previous section turn into a real cost rather than a tidiness issue. A faceted navigation that generates a unique URL for every combination of filters can spawn millions of near-identical pages, and a crawler let loose on that maze burns its entire budget wandering combinations no one searches for, while your new product pages sit undiscovered. The fix is to stop the engine wasting itself: block the infinite spaces, return honest 404s for empty pages, collapse redirect chains, and keep the sitemap pointed only at the canonical URLs you actually want indexed. The sitemap’s job is narrow and often misunderstood. It is a discovery aid that suggests which URLs exist and when they last changed, not a ranking lever, and stuffing it with low-value or non-canonical URLs teaches the engine to trust it less.
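Keeping the sitemap pointed only at canonical URLs can be enforced at generation time. The sketch below assumes each crawled page is a dictionary with `url`, `canonical`, `status`, and optional `noindex` and `lastmod` keys; that record shape and the `build_sitemap` name are assumptions made for illustration, while the `urlset`, `url`, `loc` and `lastmod` elements come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Emit only live, indexable pages whose canonical points at themselves."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page["status"] != 200:
            continue  # redirects and errors do not belong in a sitemap
        if page.get("noindex"):
            continue  # do not suggest pages you are asking engines to drop
        if page["canonical"] != page["url"]:
            continue  # non-canonical variant, e.g. a parameter URL
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(entry, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")
```

Filtering on the same three conditions the audit checks means the sitemap cannot drift out of step with the site's own directives.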

Index coverage is the other half of the equation, and it is where crawling turns into actual presence. Being crawled does not guarantee being indexed. An engine fetches far more URLs than it keeps, and a page can fall into one of several limbo states: discovered but not yet crawled, crawled but not indexed because the engine judged it thin or duplicative, or indexed but barely shown. The gap between the URLs you have published and the URLs an engine has actually stored is the single most revealing number in technical SEO, because it tells you how much of your site is doing nothing. Search Console’s page indexing report names each state explicitly, so the work is to read that report, understand why each excluded page was excluded, and either fix the cause or accept that the page should not be indexed.
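The gap between published and stored URLs is plain set arithmetic once both lists exist. A minimal sketch, assuming `published` comes from your own crawl or sitemap and `indexed` from whatever index data you have gathered; the function name and the returned keys are illustrative.

```python
def index_coverage(published, indexed):
    """Compare the URLs you publish with the URLs an engine has stored."""
    published, indexed = set(published), set(indexed)
    return {
        # Published but not stored: the part of the site doing nothing.
        "missing": sorted(published - indexed),
        # Stored but no longer published: stale or unintended URLs.
        "unexpected": sorted(indexed - published),
        "coverage": len(published & indexed) / len(published) if published else 0.0,
    }


report = index_coverage(["/a", "/b", "/c", "/d"], ["/a", "/b", "/old"])
print(report["coverage"])  # 0.5
```

The `missing` list is the one to work through against the excluded-page reasons, while `unexpected` tends to reveal parameter URLs and old paths that should redirect or return 404.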

AI search adds a layer that did not exist a few years ago: a fleet of new crawlers with their own budgets and their own permissions. GPTBot, ClaudeBot, PerplexityBot and others fetch the web to feed answer engines, and each one is governed separately in your robots.txt. Alongside them sit control tokens such as Google-Extended, which does not crawl on its own but tells Google whether content fetched by its existing crawlers may be used for its generative AI models. This creates a decision most sites have never consciously made. Block these crawlers and you protect your content from being used to train or answer, but you also remove yourself from the AI answers that increasingly sit above the classic results. Allow them and you become eligible to be cited, at the cost of your content feeding those systems. There is no universally right answer, but there is a wrong way to arrive at one, which is to block them by accident with an over-broad rule and never notice you have vanished from the place your audience is starting to look.
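Whether the AI crawlers are blocked on purpose or by accident is easy to check from the robots.txt text itself. The sketch below uses Python's standard `urllib.robotparser`; the token list is taken from the names mentioned above, and the `ai_crawler_access` function and test URL are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this guide; add others that matter to your audience.
AI_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def ai_crawler_access(robots_txt, url="https://example.com/"):
    """Report, per AI token, whether robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in AI_TOKENS}


deliberate = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(ai_crawler_access(deliberate))

# An over-broad rule silently shuts out every token at once.
print(ai_crawler_access("User-agent: *\nDisallow: /\n"))
```

Running this against the live file on every deploy turns the accidental block into something a check catches, rather than something discovered when citations disappear.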

How does an AI agent run a technical SEO audit?

A technical audit is exhaustive, rule-bound and deeply repetitive, which is precisely the kind of work an AI agent does better than a person, and that is the forgehouse angle. A human auditing a large site clicks through URLs, copies status codes into a spreadsheet, squints at robots.txt, and runs out of patience long before the long tail. An agent does not get bored. It can crawl every URL on the site, record each one’s HTTP status, read its robots directives and canonical tag, check whether it appears in the sitemap, and build the whole picture in memory in a single pass.

The order of operations mirrors the three gates. First the agent walks the crawl: starting from the homepage, following internal links and the sitemap, it maps which URLs exist and which are reachable, surfacing orphan pages that no link points to and pages buried too many clicks deep. This is the same link-graph walk that powers the internal linking and topic-cluster audit, which is why the two checks belong in the same pass. Next it checks the index gate, comparing the URLs it found against what an engine has actually stored. The Search Console API exposes per-URL inspection programmatically, so the agent can ask, within the API's quota, whether Google has discovered, crawled and indexed each important URL and flag every URL that is published but missing, or indexed but should not be. The aggregate page indexing report itself lives in the Search Console interface, so per-URL inspection is the route an agent relies on. Finally it checks the render gate by fetching each page’s raw HTML and comparing it against the rendered DOM, catching content that appears only after JavaScript runs.
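The crawl-gate walk is a breadth-first search over the internal link graph. The following sketch assumes the graph has already been extracted into a dictionary mapping each URL to the URLs it links to; the function names and the `max_depth` threshold of three clicks are illustrative assumptions rather than a fixed rule.

```python
from collections import deque


def crawl_depths(link_graph, start):
    """Breadth-first walk: minimum number of clicks from start to each page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit_reachability(link_graph, sitemap_urls, start="/", max_depth=3):
    """Flag sitemap URLs no link reaches, and pages buried too deep."""
    depths = crawl_depths(link_graph, start)
    return {
        "orphans": sorted(set(sitemap_urls) - set(depths)),
        "too_deep": sorted(url for url, depth in depths.items() if depth > max_depth),
    }


graph = {"/": ["/a"], "/a": ["/b"], "/b": ["/c"], "/c": ["/d"]}
print(audit_reachability(graph, ["/a", "/lonely"]))
# {'orphans': ['/lonely'], 'too_deep': ['/d']}
```

Breadth-first order matters here because it guarantees the first time a page is seen is via its shortest click path, which is the depth worth reporting.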

What comes back is not a wall of green and red but a ranked, deduplicated fix list: these pages are accidentally noindexed, these return soft 404s, these have canonical tags pointing at the wrong URL, these redirect through three hops, this whole section is invisible without JavaScript. Each finding carries its severity and its fix, so the work is prioritised before a human ever looks at it. A page-by-page manual audit that would take days collapses into a pass measured in minutes, and because the agent runs the same checks every time, a regression shows up the moment it appears rather than months later when traffic has already dropped. If you would rather not run this crawl-and-fix audit across your whole site every month, Vorkaz can run the technical checks for you and fix what it finds.
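A ranked, deduplicated fix list is mostly a matter of data shape. This sketch assumes a hypothetical `Finding` record and a four-level severity scale; neither is prescribed by any tool, and the severities assigned in the example are placeholders.

```python
from dataclasses import dataclass

# Assumed severity scale, ordered by blast radius (lower sorts first).
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    url: str
    issue: str
    severity: str
    fix: str


def rank_findings(findings):
    """Drop repeated (url, issue) pairs, then sort worst-first."""
    unique = {}
    for finding in findings:
        unique.setdefault((finding.url, finding.issue), finding)
    return sorted(
        unique.values(),
        key=lambda f: (SEVERITY_ORDER[f.severity], f.url),
    )


findings = [
    Finding("/old", "redirect chain", "medium", "point /old straight at the final URL"),
    Finding("/pricing", "noindex", "critical", "remove the stray noindex directive"),
    Finding("/old", "redirect chain", "medium", "point /old straight at the final URL"),
]
for finding in rank_findings(findings):
    print(finding.severity, finding.url, finding.issue)
```

Because the output is ordered and stable, two runs can be diffed directly, which is what makes a regression visible the moment it appears.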

How do you fix render and JavaScript SEO issues?

The render gate is where the most expensive and least obvious technical problems live, because they are invisible to the human looking at the page. JavaScript SEO is the discipline of making sure an engine sees the same content a visitor does, and the reason it is hard is that engines render in two waves. A crawler fetches the raw HTML first and indexes whatever is in it immediately. Then, if it has spare resources, it queues the page for a rendering service that executes the JavaScript and sees the final result, but that second wave can lag and is never guaranteed. If your content exists only after the script runs, you are betting your ranking on a render pass that may be delayed or skipped, which is a far bigger gamble than it looks.

The mechanism behind the risk is straightforward once you name it. A client-side rendered (CSR) page ships a near-empty HTML shell and builds the content in the browser with JavaScript, so the first wave sees almost nothing. A server-side rendered (SSR) or statically generated (SSG) page ships the full content in the initial HTML, so the first wave sees everything and the render gate is never a question. This is why the fix for almost every JavaScript SEO problem is the same: get the content that matters into the HTML the server sends, through SSR, static generation or pre-rendering, so the engine never has to wait for a script to see your words. The way to test whether you have a problem is to look at the raw HTML directly rather than the rendered page, by viewing source or fetching the URL, and checking whether your headings and body text are actually there. Search Console’s URL inspection tool shows the HTML Google ended up with after rendering the page, so comparing it against the raw source shows exactly what depends on JavaScript.
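The raw-HTML test can be automated by parsing the server's response without running any script and checking that headings and body text are present. The sketch below uses Python's standard `html.parser`; the `raw_html_report` name and the 50-word threshold are assumptions, and the verdict is a heuristic signal for a human to confirm rather than a definitive classification.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Counts words and collects headings, ignoring script and style content."""

    SKIPPED = {"script", "style"}
    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self.word_count = 0
        self._skip_depth = 0
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # text inside <script> or <style> is not page content
        self.word_count += len(data.split())
        if self._in_heading:
            self.headings[-1] += data.strip()


def raw_html_report(html, min_words=50):
    """Summarise what a crawler that runs no JavaScript would see."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return {
        "headings": parser.headings,
        "word_count": parser.word_count,
        "likely_client_rendered": not parser.headings and parser.word_count < min_words,
    }
```

Feeding this the body of a plain HTTP fetch for each URL gives the first-wave view described above; a page flagged `likely_client_rendered` is a candidate for SSR, static generation or pre-rendering.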

The same principle catches a family of related traps. Content hidden behind an interaction that loads it on demand, like a tab whose text is fetched only when clicked, is invisible to a crawler that never clicks; content already present in the DOM but visually hidden behind a tab or accordion is fine, because it is in the HTML. Infinite scroll that loads more items as you scroll, with no paginated URLs behind it, hides everything past the first screen from a bot that does not scroll. Lazy-loaded images and sections need a fallback that exposes them to the crawler. The throughline is simple: anything that depends on a browser behaviour the bot does not perform, a click, a scroll, a script, is a risk, and the safe default is to make the important content present in the initial HTML without requiring any of them.

For AI search this is not a tiebreaker but a hard wall. Where Google at least attempts a second render wave, many of the crawlers feeding AI answer engines do not execute JavaScript at all, so a client-side rendered page is not merely slow to be seen by them, it is entirely blank. The same SSR or SSG fix that protects classic indexing is what makes your content readable to AI answer engines in the first place, which is one more reason the two goals point in the same direction rather than competing. This whole render-crawl-index discipline is one layer of the full AI SEO automation workflow, and the crawl-and-check connector that runs it is exactly what the SEO & AEO Pro Kit packages as a repeatable audit you can point at any site.
