The moment that started it
htmlctl is, at its core, a tool for deploying static sites with precision. Atomic releases. Byte-identical promotion. Deterministic HTML rendering. All the things that make a site reliable once it is live. What I had not thought enough about was what happens before a human visits the URL — in the half-second when a crawler, a social card renderer, or an AI indexer decides what your site is.
I pasted an htmlctl link into a tweet draft. The preview loaded. Blank rectangle. No title, no description, no image. The kind of result that makes you look like you do not care about your own product, even when you have spent weeks getting the deployment pipeline exactly right.
The fix for the tweet was ten minutes of YAML. But it made me realize that making it effortless — not just possible — required building something properly. Marketing should not be an afterthought glued onto deployment. It should be part of the same declarative model you use for everything else.
Three audiences your site already has
When you publish a site, you instinctively think about the human reader: the layout, the typography, the load time. What you may not immediately think about is that your URL is processed by at least two other audiences before most humans ever see it.
Who reads your site before your users do
Social media link renderers — Twitter, Slack, iMessage, LinkedIn — all fetch your page the moment someone pastes your URL. They look for og:title, og:image, twitter:card. If those are missing, you get a blank box instead of a rich preview. The first impression many people have of your content happens in that preview, before they click.
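What those renderers look for is a handful of meta tags in the page head. A minimal illustration (values are made up for this example, not htmlctl output):

```html
<head>
  <title>Shipping Discoverability</title>
  <!-- Open Graph tags: read by Slack, iMessage, LinkedIn, and most link unfurlers -->
  <meta property="og:title" content="Shipping Discoverability" />
  <meta property="og:description" content="Social cards, robots.txt, and sitemaps by default." />
  <meta property="og:image" content="https://example.com/og/shipping-discoverability.png" />
  <!-- Twitter-specific tags: select the large-card preview layout -->
  <meta name="twitter:card" content="summary_large_image" />
  <meta name="twitter:image" content="https://example.com/og/shipping-discoverability.png" />
</head>
```

If og:title and og:image are absent, the renderer has nothing to compose and falls back to the blank box described above.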
Search engines are doing something similar but at scale, continuously: following robots.txt to understand what you want indexed, reading sitemap.xml to discover every page you publish, checking canonical URLs to understand the authoritative version of each piece of content.
And now there is a third category: AI crawlers, RAG indexers, and agent-driven browsing tools. These are increasing in volume and in consequence. They also respect robots.txt. They also use canonical metadata to understand what they are reading. They are not going away.
"We don't just design for humans anymore. Every URL is a handshake with a machine before it reaches a person."
What we shipped
The goal was simple: make full discoverability the default, not a configuration project. Four features landed in quick succession.
Social Previews
Automatic OG Image Generation
1200×630 PNG cards generated at build time for every page. Served at /og/<pagename>.png. Auto-injected into og:image and twitter:image when a canonical URL is set.
Branding
Declarative Favicon Support
Favicons declared in website.yaml, source files in branding/. Materialized to /favicon.svg, /favicon.ico, /apple-touch-icon.png — part of desired state, tracked by diff.
Crawl Control
robots.txt Generation
Typed YAML config in website.yaml. Default: allow-all. Ordered group-based policy for fine-grained control. Generated at build time, promoted byte-identical.
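The allow-all default produces a file whose entire shape is three lines, something like this (host illustrative; the Sitemap: line is the one appended when sitemap generation is on):

```
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
```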
Discovery
sitemap.xml Generation
Auto-generated from all declared pages. Respects robots: noindex. Uses per-page canonical when available, falls back to publicBaseURL + route. Appends Sitemap: to robots.txt automatically.
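The output follows the standard sitemap protocol. Roughly what a two-page site would produce (URLs illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/shipping-discoverability</loc></url>
</urlset>
```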
Zero config for eighty percent of sites
The design philosophy was: automation by default, control when you need it. For the common case — a public site where everything should be indexed and previewed — the entire discoverability stack activates with about fifteen lines of YAML.
website.yaml — full discoverability stack
```yaml
spec:
  defaultStyleBundle: default
  head:
    icons:
      svg: branding/favicon.svg
      ico: branding/favicon.ico
      appleTouch: branding/apple-touch-icon.png
  seo:
    publicBaseURL: https://example.com
    robots:
      enabled: true   # generates /robots.txt with allow-all default
    sitemap:
      enabled: true   # generates /sitemap.xml + Sitemap: line in robots.txt
```
That is it. After htmlctl apply, the server generates /robots.txt, /sitemap.xml (with every crawlable page listed), /favicon.svg, /favicon.ico, and an OG card PNG for each page. Every page that has a canonical URL gets og:image and twitter:image injected automatically.
If you need more control — blocking specific paths from crawlers, excluding a page from the sitemap with robots: noindex, or providing your own OG image for a specific page — the primitives are all there. Control is available; it is just not required.
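As a sketch of what the fine-grained path could look like, an ordered group-based robots policy might be expressed like this. The exact key names are illustrative assumptions, not a documented schema; the structure mirrors robots.txt's own user-agent grouping:

```yaml
seo:
  robots:
    enabled: true
    groups:
      - userAgents: ["GPTBot"]    # block one AI crawler entirely
        disallow: ["/"]
      - userAgents: ["*"]         # everyone else: allow all except drafts
        disallow: ["/drafts/"]
```

Because groups are ordered, the first matching group wins, which is how robots.txt consumers resolve overlapping rules.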
The OG image pipeline
Social preview cards are the most immediately visible win. The server renders deterministic 1200×630 PNGs at build time using pure Go — no headless browser, no Puppeteer, no external service. Title, description, and site name are composed onto a dark-glass card with embedded fonts and served at /og/<pagename>.png.
The cache key is a hash of the card content. Identical input means an identical cache hit — no re-render on subsequent builds when metadata has not changed. Changing the title invalidates the cache for that page only. It is fast, deterministic, and runs entirely inside the same release build pipeline that already handles HTML rendering and asset copying.
Everything generated at build time
This is the architectural constraint that matters most. Every one of these artifacts — the OG PNGs, robots.txt, sitemap.xml, favicon files — is generated during htmlctl apply, at release build time. When you run htmlctl promote --from staging --to prod, those artifacts are copied byte-identical. No rebuild. No environment-specific rewriting.
This means the social preview you test on staging is precisely what production will serve. The robots.txt you inspect on staging is exactly what the Google bot will fetch. There is no hidden difference between environments, no surprise when you promote.
The one thing to be deliberate about: publicBaseURL and canonical URLs should reflect your production domain even in staging configs, because sitemap URLs derive from them and promoting carries those values unchanged. The server will warn you if it detects staging-host canonicals during a promote to production.
SEO in the age of AI is not optional
There is a version of this conversation where someone argues that SEO is old thinking — that AI search will replace traditional crawl-and-index. I think that argument misses what is actually happening. AI search still depends on crawl and index. The training data that powers large language models comes largely from the web. The retrieval pipelines that augment AI assistants at query time fetch from the same canonical web infrastructure that search engines have used for decades.
robots.txt is how you tell automated systems — whether they are Googlebot or an LLM training crawler — what you want indexed. sitemap.xml is how you proactively surface your content graph to any automated agent that will listen. Canonical URLs are how you tell the web that this page, not any staging or preview version, is the authoritative one. These are not legacy formats. They are the universal protocol layer for machine-readable discoverability.
If you are building a site with htmlctl today, you are presumably building it because you care about the content you publish. Getting that content in front of humans, search engines, and AI systems that can recommend it, cite it, or summarize it — all of that starts with the same fifteen lines of YAML. The automation handles the rest.
If you have an htmlctl site in production, add spec.seo.publicBaseURL to your website.yaml, enable robots and sitemap, drop your favicon files in branding/, and re-apply. Your next tweet will have a preview card. Your next page will appear in the sitemap. The machines will know you exist.