The moment that started it
htmlctl is, at its core, a tool for deploying static sites with precision. Atomic releases. Byte-identical promotion. Deterministic HTML rendering. All the things that make a site reliable once it is live. What I had not thought enough about was what happens before a human visits the URL — in the half-second when a crawler, a social card renderer, or an AI indexer decides what your site is.
I pasted an htmlctl link into a tweet draft. The preview loaded. Blank rectangle. No title, no description, no image. The kind of result that makes you look like you do not care about your own product, even when you have spent weeks getting the deployment pipeline exactly right.
The fix for the tweet was ten minutes of YAML. But it made me realize that making it effortless — not just possible — required building something properly. Marketing should not be an afterthought glued onto deployment. It should be part of the same declarative model you use for everything else.
Three audiences your site already has
When you publish a site, you instinctively think about the human reader: the layout, the typography, the load time. What you may not immediately think about is that your URL is processed by at least two other audiences before most humans ever see it.
Who reads your site before your users do
Social media link renderers — Twitter, Slack, iMessage, LinkedIn — all fetch your page the moment someone pastes your URL. They look for og:title, og:image, twitter:card. If those are missing, you get a blank box instead of a rich preview. The first impression many people have of your content happens in that preview, before they click.
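What those renderers look for is a handful of meta tags in the page head. A minimal illustration (values are made up for this example, not htmlctl output):

```html
<head>
  <title>Shipping Discoverability</title>
  <!-- Open Graph tags: read by Slack, iMessage, LinkedIn, and most link unfurlers -->
  <meta property="og:title" content="Shipping Discoverability" />
  <meta property="og:description" content="Social cards, robots.txt, and sitemaps by default." />
  <meta property="og:image" content="https://example.com/og/shipping-discoverability.png" />
  <!-- Twitter-specific tags: select the large-card preview layout -->
  <meta name="twitter:card" content="summary_large_image" />
  <meta name="twitter:image" content="https://example.com/og/shipping-discoverability.png" />
</head>
```

If og:title and og:image are absent, the renderer has nothing to compose and falls back to the blank box described above.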
Search engines are doing something similar but at scale, continuously: following robots.txt to understand what you want indexed, reading sitemap.xml to discover every page you publish, checking canonical URLs to understand the authoritative version of each piece of content.
And now there is a third category: AI crawlers, RAG indexers, and agent-driven browsing tools. These are increasing in volume and in consequence. They also respect robots.txt. They also use canonical metadata to understand what they are reading. They are not going away.
"We don't just design for humans anymore. Every URL is a handshake with a machine before it reaches a person."
What we shipped
The goal was simple: make full discoverability the default, not a configuration project. Four features landed in quick succession.
Social Previews
Automatic OG Image Generation
1200×630 PNG cards generated at build time for every page. Served at /og/<pagename>.png. Auto-injected into og:image and twitter:image when a canonical URL is set.
Branding
Declarative Favicon Support
Favicons declared in website.yaml, source files in branding/. Materialized to /favicon.svg, /favicon.ico, /apple-touch-icon.png — part of desired state, tracked by diff.
Crawl Control
robots.txt Generation
Typed YAML config in website.yaml. Default: allow-all. Ordered group-based policy for fine-grained control. Generated at build time, promoted byte-identical.
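The allow-all default produces a file whose entire shape is three lines, something like this (host illustrative; the Sitemap: line is the one appended when sitemap generation is on):

```
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
```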
Discovery
sitemap.xml Generation
Auto-generated from all declared pages. Respects robots: noindex. Uses per-page canonical when available, falls back to publicBaseURL + route. Appends Sitemap: to robots.txt automatically.
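The output follows the standard sitemap protocol. Roughly what a two-page site would produce (URLs illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/shipping-discoverability</loc></url>
</urlset>
```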
Zero config for eighty percent of sites
The design philosophy was: automation by default, control when you need it. For the common case — a public site where everything should be indexed and previewed — the entire discoverability stack activates with about fifteen lines of YAML.
website.yaml — full discoverability stack
```yaml
spec:
  defaultStyleBundle: default
  head:
    icons:
      svg: branding/favicon.svg
      ico: branding/favicon.ico
      appleTouch: branding/apple-touch-icon.png
  seo:
    publicBaseURL: https://example.com
    robots:
      enabled: true   # generates /robots.txt with allow-all default
    sitemap:
      enabled: true   # generates /sitemap.xml + Sitemap: line in robots.txt
```
That is it. After htmlctl apply, the server generates /robots.txt, /sitemap.xml (with every crawlable page listed), /favicon.svg, /favicon.ico, and an OG card PNG for each page. Every page that has a canonical URL gets og:image and twitter:image injected automatically.
If you need more control — blocking specific paths from crawlers, excluding a page from the sitemap with robots: noindex, or providing your own OG image for a specific page — the primitives are all there. Control is available; it is just not required.
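As a sketch of what the fine-grained path could look like, an ordered group-based robots policy might be expressed like this. The exact key names are illustrative assumptions, not a documented schema; the structure mirrors robots.txt's own user-agent grouping:

```yaml
seo:
  robots:
    enabled: true
    groups:
      - userAgents: ["GPTBot"]    # block one AI crawler entirely
        disallow: ["/"]
      - userAgents: ["*"]         # everyone else: allow all except drafts
        disallow: ["/drafts/"]
```

Because groups are ordered, the first matching group wins, which is how robots.txt consumers resolve overlapping rules.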
The OG image pipeline
Social preview cards are the most immediately visible win. The server renders deterministic 1200×630 PNGs at build time using pure Go — no headless browser, no Puppeteer, no external service. Title, description, and site name are composed onto a dark-glass card with embedded fonts and served at /og/<pagename>.png.
The cache key is a hash of the card content. Identical input means an identical cache hit — no re-render on subsequent builds when metadata has not changed. Changing the title invalidates the cache for that page only. It is fast, deterministic, and runs entirely inside the same release build pipeline that already handles HTML rendering and asset copying.
Everything generated at build time
This is the architectural constraint that matters most. Every one of these artifacts — the OG PNGs, robots.txt, sitemap.xml, favicon files — is generated during htmlctl apply, at release build time. When you run htmlctl promote --from staging --to prod, those artifacts are copied byte-identical. No rebuild. No environment-specific rewriting.
This means the social preview you test on staging is precisely what production will serve. The robots.txt you inspect on staging is exactly what the Google bot will fetch. There is no hidden difference between environments, no surprise when you promote.
The one thing to be deliberate about: publicBaseURL and canonical URLs should reflect your production domain even in staging configs, because sitemap URLs derive from them and promoting carries those values unchanged. The server will warn you if it detects staging-host canonicals during a promote to production.
SEO in the age of AI is not optional
There is a version of this conversation where someone argues that SEO is old thinking — that AI search will replace traditional crawl-and-index. I think that argument misses what is actually happening. AI search still depends on crawl and index. The training data that powers large language models comes largely from the web. The retrieval pipelines that augment AI assistants at query time fetch from the same canonical web infrastructure that search engines have used for decades.
robots.txt is how you tell automated systems — whether they are Googlebot or an LLM training crawler — what you want indexed. sitemap.xml is how you proactively surface your content graph to any automated agent that will listen. Canonical URLs are how you tell the web that this page, not any staging or preview version, is the authoritative one. These are not legacy formats. They are the universal protocol layer for machine-readable discoverability.
If you are building a site with htmlctl today, you are presumably building it because you care about the content you publish. Getting that content in front of humans, search engines, and AI systems that can recommend it, cite it, or summarize it — all of that starts with the same fifteen lines of YAML. The automation handles the rest.
If you have an htmlctl site in production, add spec.seo.publicBaseURL to your website.yaml, enable robots and sitemap, drop your favicon files in branding/, and re-apply. Your next tweet will have a preview card. Your next page will appear in the sitemap. The machines will know you exist.