Friday, May 29, 2026

We Don't Have Hooves: AI Logos, Cognitive Shedding, and the Exoskeleton Economy

I spent a weekend writing a small Rust tool that composites AI company logos onto a Redbubble full-print t-shirt canvas. Not because anyone asked. Halfway through tuning the layout I noticed something faintly absurd: every logo I was tiling looked like a tyre tread. OpenAI’s swirl is a directional rain pattern. Gemini’s four-pointed star is a Bridgestone Blizzak winter sipe. Anthropic’s ringed A is a sidewall stamp. Nvidia’s eye is a hub cap. Lay them next to each other on a 7000-pixel canvas and the whole thing reads less like an industry portrait and more like a Beaurepaires catalogue page.

Logos From Loops

The visual grammar of AI companies has converged on a small family of forms: radial symmetry, interlocking arcs, gradients from cool to warm, the occasional offset chevron. They are tread patterns. They are tread patterns because tread patterns are what you draw when you want to suggest motion, grip, and continuous contact with a surface you are moving over – which is exactly the brand promise of a model that turns prompts into outputs without you needing to know how the road underneath works.

The tool generates 21 of these tread-pattern logos – the leaders, challengers, niche players, and visionaries from the April 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, plus the dozen companies tracked by isaiprofitable.com – and tiles them across the canvas under four layouts: a jittered grid, a hex pack, an arabesque mask built from Lissajous curves under a wallpaper symmetry group, and a Voronoi packer that wraps cleanly at the edges. The arabesque mode is the one that looks most like a real design, which is fitting because the brand identities themselves are arabesques of the same underlying motif.
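As a rough illustration of two of those layouts, here is a minimal Python sketch. It is not the Rust crate's code, and every function name and parameter is an assumption made for the example. It computes hex-pack centres for a canvas and samples a Lissajous curve, then copies the curve under four-fold rotation as a simple stand-in for a full wallpaper symmetry group.

```python
import math


def hex_centres(width, height, pitch):
    """Centres of a hex-packed tiling: odd rows are shifted by half a pitch."""
    row_step = pitch * math.sqrt(3) / 2  # vertical distance between rows
    centres = []
    row = 0
    y = 0.0
    while y <= height:
        x = pitch / 2 if row % 2 else 0.0
        while x <= width:
            centres.append((x, y))
            x += pitch
        y += row_step
        row += 1
    return centres


def lissajous(a, b, delta, samples=360):
    """Sample x = sin(a*t + delta), y = sin(b*t) over one full period."""
    points = []
    for i in range(samples):
        t = 2 * math.pi * i / samples
        points.append((math.sin(a * t + delta), math.sin(b * t)))
    return points


def rotate_quarter_turns(points, turns):
    """Rotate about the origin by multiples of 90 degrees (exact, no trig rounding)."""
    out = list(points)
    for _ in range(turns % 4):
        out = [(-y, x) for (x, y) in out]
    return out


def fourfold_motif(a=3, b=2, delta=math.pi / 2, samples=360):
    """Four rotated copies of one curve: a toy symmetry, not a wallpaper group."""
    base = lissajous(a, b, delta, samples)
    return [rotate_quarter_turns(base, k) for k in range(4)]


if __name__ == "__main__":
    centres = hex_centres(7000, 7000, 1000)
    motif = fourfold_motif()
    print(len(centres), "tile centres;", len(motif), "curve copies per tile")
```

The point of the sketch is the one the post makes: the whole look comes from a couple of loops and two or three tunable numbers (pitch, frequency ratio, phase).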

Nothing in the code is novel. What is striking is how little code it takes to produce something that passes for a branding agency deliverable. Spend a few minutes tweaking curve density and symmetry group and you can cover most of the Fortune 500’s recent AI rebrand portfolio without engaging a designer. None of it requires aesthetic judgement. All of it requires a loop.

That is worth sitting with before moving on to the money.

The Gartner Thermometer and the Sequoia Deficit

The 2024 and 2025 Gartner Hype Cycles placed generative AI at or near the Peak of Inflated Expectations, the top of the curve before the long slide into the Trough of Disillusionment. That topological framing is rough but useful. More arresting is the arithmetic underneath it.

Sequoia Capital published a piece in mid-2024 estimating that AI infrastructure – primarily Nvidia GPU clusters – required roughly $600 billion in annual revenue to justify the capital being committed, while observable AI-attributable revenue across the industry was a fraction of that figure. The gap has a name in finance: “productive deployment lag.” In plain language: the shovels are being sold, the mines are being dug, and the gold has not appeared at the rate the tunnel length implies it should. On isaiprofitable.com, the only company with a green checkmark next to its name is Nvidia, the one selling the shovels.

The 21-tread-pattern t-shirt is a small monument to this gap. Nvidia at the hub, eleven others on the sidewall burning through capital at rates that make the GPU revenue arithmetic interesting, one wallpaper symmetry group holding them all together on a cotton substrate.

The more interesting question is not whether AI is profitable yet. It is why AI is already affecting how we think, even before the P&L sheets catch up.

The Fur Analogy and Cognitive Shedding

Around 1.5 million years ago, Homo erectus started losing body hair. The proximate cause was almost certainly thermoregulation: an upright, running hunter on the African savannah generates core heat faster than a fur coat can dissipate it under midday sun. So the fur went. But heat still needed managing. The solution was clothing, an external thermal regulation artifact that did the job the body had previously handled internally.

This is a pattern that runs through the entire history of human technology. We replace innate biological functions with external artifacts. We then build manufacturing chains around those artifacts. We build economic systems around those chains. The original biological capacity atrophies from disuse or is simply never developed in the first place, because the artifact is already there when the individual arrives.

We do not have hooves. We have shoes. We do not have gills. We have submarines. We do not have the magnetoreceptors that birds use on migration routes. We have GPS. In each case the external artifact is not merely a substitute; it is a platform. It invites elaboration. The shoe becomes the boot, the boot becomes the orthotic, the orthotic becomes a running shoe industry, and the running shoe industry eventually funds the biomechanics research that feeds back into the next generation of the artifact.

I spent six months in late 2022 and early 2023 as an Enterprise Architect contractor at Bridgestone Australia in Adelaide, wiring up the interfaces between SAP, Salesforce, and the mobile mechanic booking platforms that schedule a van to a customer’s driveway to swap a set of tyres. The work was unglamorous integration plumbing – canonical data models, idempotent message contracts, the usual distributed-systems hygiene – but it gave me a tourist’s window into how a tyre and car parts business actually operates. Every node in the system was eventually a physical thing: a passenger radial sitting on a pallet in a Wingfield warehouse, a wheel alignment slot at a Beaurepaires bay, a truck retread coming back from its second life on a B-double. The software was a thin coordination layer over a very heavy industrial substrate.

The history of the company that owns that substrate is the part that stuck with me. Bridgestone – ブリヂストン – started in 1931 not as a tyre company but as a spinoff of a 久留米 (Kurume), 九州 (Kyushu) maker of 足袋 (tabi) split-toe footwear. The founder, 石橋正二郎 (Shojiro Ishibashi), had been adding rubber soles to cotton tabi since 1923 under the Asahi brand, and rubber soles turned out to be the gateway material to rubber tyres. The name Bridgestone is a literal English inversion of his surname: 石 (ishi, stone) and 橋 (hashi, bridge), reversed and translated, picked partly because it sounded export-ready and partly to echo Firestone, the American tyre company he admired. (Bridgestone would eventually acquire Firestone in 1988 for $2.6 billion, which is the kind of historical loop that only happens in long-lived industrial companies.) The katakana spelling ブリヂストン preserves an archaic ヂ where modern Japanese would write ジ, a piece of pre-war orthography the brand keeps the way Mitsubishi keeps its three diamonds.

What I took away from those six months is that Bridgestone is not really a tyre conglomerate that used to make shoes. It is a rubber compounding and supply-chain organisation that noticed rubber was useful for several different external human exoskeletons – first feet, then cars, then conveyor belts, hoses, seismic isolators under buildings, and now sustainably-sourced guayule and dandelion-derived natural rubber to hedge against Hevea brasiliensis geopolitics. The material is the through-line. The product is whichever exoskeleton the century happens to need.

I always jokingly refer to Bridgestone as “Big Rubber”.

The Car Was Always an Exoskeleton

The automotive vocabulary has colonised AI discourse so completely it has become invisible. We “steer” language models. We apply “guardrails.” We talk about “autopilot” and “co-pilot” and “lanes” and “on-ramps.” There is a vehicle for sale right now whose entire value proposition is that the steering wheel becomes optional once the AI is mature enough. Almost nobody stops to ask why this particular metaphor landed so naturally.

Here is one answer: cars and AI are both exoskeletons.

A car is not a vehicle in the narrow sense. It is a kinetic exoskeleton that extends the range, speed, load-bearing capacity, and weather tolerance of a soft-bodied primate who would otherwise top out at 5 km/h on a good day. The car does not move instead of the human; it extends the human’s movement by several orders of magnitude at the cost of an elaborate support infrastructure: roads, fuel networks, insurance markets, traffic law, emissions regulation, and, yes, tyre companies.

AI is doing the same thing to cognition. The language model does not think instead of the human; it extends the human’s language-generation and pattern-retrieval capacity by several orders of magnitude at the cost of an equally elaborate support infrastructure: GPU clusters, training pipelines, safety teams, regulatory frameworks, and eventually a Gartner hype cycle or two. The car vocabulary arrived so naturally because the thing being described was already familiar. We built the kinetic exoskeleton first. The cognitive one came later, but the shape of the dependency relationship – human plus artifact plus infrastructure – is identical.

What Gets Shed

The cognitive functions most visibly affected are the ones that involve generation under uncertainty: drafting a first sentence, naming a variable, composing an email, sketching a logo. These are precisely the tasks that feel slow and effortful without assistance, and precisely the tasks that AI handles smoothly. The friction is real. The assistance is real. The atrophy risk is also real.

Musicians who practise scales develop neural pathways that do not form if the scales are skipped. Writers who struggle through first drafts develop a tolerance for ambiguity that does not develop if a model always resolves it for them. The logos from the code above look entirely plausible. They look like every other AI logo in the current design vocabulary. They were generated without any cognitive friction whatsoever, which is both the point and the problem. The friction was where the design decision lived.

This does not mean AI assistance is bad, any more than wearing shoes is bad. It means the physiotherapy question matters: are you also doing the cognitive equivalent of the exercises that keep the underlying capacity from going the way of the body hair, or the way of your quads now that you no longer spend two days chasing a speared zebra until it runs out of steam and dies?

The Shoe Company Principle

Ishibashi’s insight in 1931 was not that feet needed covering; the Asahi tabi side of the business already had that locked up. His insight was that rubber was a platform material for exoskeletal extension, and that the same compounding science, vulcanisation know-how, and distribution muscle that made a good rubber-soled tabi could make a good passenger tyre. He followed the material from one form of human locomotion extension to the next, and nearly a hundred years later that bet is still paying out across tyres, conveyor belts, and seismic isolators.

The AI companies following the GPU stack from language generation to reasoning to planning to embodied robotics are doing exactly this. The H100 is the rubber: a platform material whose applications are not exhausted by the first product built on it. The chat interfaces, the co-pilots, the logo generators – these are the 足袋. What comes next on that manufacturing chain – the 自動車 (jidousha, automobile) tyre equivalent for cognition – is the part worth watching, and worth surviving the trough of disillusionment for.

Whether the current crop of AI businesses is profitable by the time the next Gartner report drops is the wrong question. Whether the infrastructure being built can outlast the hype cycle long enough to produce the genuine platform shift is the question Ishibashi would have asked, probably while measuring rubber tensile strength in a factory that used to make footwear. Japanese has a phrase for that patient industrial cultivation: 改善 (kaizen, continuous improvement). It is not the same word as ハイプ (haipu, hype), and the two have rarely been observed in the same building.

The last shoe company that asked the wrong question about its own product is still making pneumatic rubber cylinders for the machines that replaced the horses we no longer needed to breed. Every AI logo on the t-shirt is a tread pattern drawn by people who do not yet know which century of exoskeleton they are designing for. That beats a PowerPoint ecosystem built around the profitability question, and it is the bet I would rather be inside of when the trough arrives.


The code is at ai_logos_shirt – a small Rust crate, MIT licensed, build instructions in the README. Ping me if you have strong opinions about cognitive atrophy, the Sequoia revenue gap, or split-toe rubber footwear.

Saturday, May 23, 2026

From Digital Atlas to Gazet: Building a Geospatial Agent Stack for Enterprise AI

The same thought keeps coming back whenever I look at geospatial AI: too much useful work is still trapped in slow, manual interfaces.

Someone opens a portal, clicks through search forms, downloads files, copies a bounding box, jumps to a geocoder, then checks an internal approvals tool. None of this is mysterious; it is just fragmented.

That is why MCP matters here. Not because every problem needs another chatbot, but because a composed agent stack can turn UI choreography into a question, a traceable tool chain, and a defensible answer.

The shape of that stack is getting clearer:

  1. A discovery layer for authoritative datasets and services.
  2. A place-resolution layer for turning ambiguous human geography into valid geometry.
  3. A retrieval and processing layer for actual data operations.
  4. A governance layer so this can survive contact with enterprise reality.
  5. An orchestration layer that routes work to the right model and the right tool at the right time.

Themes carried from the earlier LinkedIn post

The LinkedIn thread that led to this draft had four recurring ideas, and they still hold:

  1. Most geospatial AI pain is not in model quality. It is in UI choreography.
  2. Narrow components with explicit contracts beat one giant, vague “do-everything” agent.
  3. Portals are temporary. Durable value sits in services, workflows, and evidence trails.
  4. The end state is not one-shot answers. It is persistent monitoring plus governed recomputation.

This post extends those ideas with concrete stack boundaries, standards, and orchestration patterns.

Digital Atlas of Australia as the discovery layer

In daa_mcp, I wrapped the Digital Atlas of Australia search API as a Go MCP server. The target is the Digital Atlas endpoint at https://digital.atlas.gov.au/api/search/v1, with a very narrow tool surface:

  • atlas_list_collections
  • atlas_search_items
  • atlas_answer_query

That narrowness is deliberate. Discovery tools should be boring, explicit, and provenance-heavy. If an agent cannot reliably name the dataset, map, or document it found, and where it came from, it should not improvise higher-order spatial reasoning.

This is the first important pattern for enterprise AI engineering: do not ask the model to remember the catalog. Ask the catalog.

In practice, a user should be able to ask “what hazard datasets are relevant around this place?” and get source URLs, approximate bounds, and the inventory used to produce the answer. That alone is a major improvement over portal-clicking.
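A minimal sketch of what "ask the catalog" might look like from Python. The base URL is the one named above. The `/collections` and `/collections/{id}/items` paths and the `q`, `bbox`, and `limit` parameters are assumptions borrowed from OGC API - Records conventions, not confirmed details of this endpoint. The `dataset` collection id and the sample feature are hypothetical. The sketch only builds URLs and shapes results, and it makes no network calls.

```python
from urllib.parse import urlencode

BASE_URL = "https://digital.atlas.gov.au/api/search/v1"  # endpoint named in the post


def collections_url(base=BASE_URL):
    # Assumed path, following OGC API - Records conventions.
    return f"{base}/collections"


def items_url(collection, query, bbox=None, limit=10, base=BASE_URL):
    # "q", "bbox" and "limit" are assumed parameter names for this API.
    params = {"q": query, "limit": limit}
    if bbox is not None:
        # Order assumed to be west,south,east,north.
        params["bbox"] = ",".join(f"{v:g}" for v in bbox)
    return f"{base}/collections/{collection}/items?{urlencode(params)}"


def with_provenance(features, request_url):
    """Keep only what makes an answer checkable: title, source link, bounds."""
    hits = []
    for feature in features:
        props = feature.get("properties", {})
        links = feature.get("links", [])
        hits.append({
            "title": props.get("title", "(untitled)"),
            "source_url": links[0]["href"] if links else None,
            "bbox": feature.get("bbox"),
        })
    return {"query_url": request_url, "hits": hits}


if __name__ == "__main__":
    url = items_url("dataset", "flood hazard", bbox=(138.4, -35.1, 138.8, -34.7))
    sample = [{  # hypothetical response feature, for illustration only
        "properties": {"title": "Example hazard layer"},
        "links": [{"href": "https://example.org/item/1"}],
        "bbox": [138.0, -36.0, 139.0, -34.0],
    }]
    print(with_provenance(sample, url))
```

Returning the query URL alongside the hits is the provenance habit in miniature: the answer carries the request that produced it.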

DevSeed’s Gazet is the missing place-resolution layer

The more interesting development lately is not another catalog wrapper. It is the rise of narrower geospatial AI components that do one thing well.

Development Seed’s Gazet is a very good example. It is a lightweight natural-language geocoder and GIS operations system over Overture and Natural Earth parquet datasets. The point is not to sound impressive. The point is to turn a phrase like “the northern half of India” or “districts along the Ganges” into valid geometry and spatial SQL.

That matters because geospatial systems usually break before the raster math starts. They break at the human-language boundary: fuzzy place names, overlapping regions, ambiguous jurisdictions, rivers that matter politically but not administratively, borders people describe one way and datasets encode another.

DevSeed’s public write-up on Gazet is refreshing because it avoids frontier-model mysticism. They fine-tuned a Qwen 3.5 0.8B model, generated around 70k validated training pairs, and built a system that runs on CPU, packages into a single container, and emits valid geometry. The model card shows an 812 MB Q8 GGUF artifact. That is software engineering.

The surrounding design is exactly the structure geospatial AI needs:

  • fuzzy candidate search against Overture divisions and Natural Earth
  • DuckDB-backed in-memory spatial SQL
  • GeoJSON output rather than vague textual summaries
  • a path toward execution-feedback loops and CodeAct-style tool use

Gazet is not an MCP server today, but it is exactly the kind of component that should sit behind one. A Digital Atlas MCP answers “what data exists?”. A Gazet-style place tool answers “what geometry did the user mean?”. Keeping those jobs separate reduces hallucination at both layers.
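To make the contract concrete, here is a toy Python sketch of "phrase in, valid geometry out". It does not reproduce how Gazet works (no model, no DuckDB, no Overture lookup). The function name is an assumption, and the bounding box is a rough illustrative rectangle rather than a real boundary. It only shows the shape of the output a place-resolution tool should hand to the next layer: GeoJSON, not prose.

```python
import json


def northern_half(bbox, label):
    """Return the northern half of a west,south,east,north box as a GeoJSON Feature."""
    west, south, east, north = bbox
    mid = (south + north) / 2
    # Exterior ring is closed and wound counter-clockwise, as RFC 7946 recommends.
    ring = [
        [west, mid],
        [east, mid],
        [east, north],
        [west, north],
        [west, mid],
    ]
    return {
        "type": "Feature",
        "properties": {"phrase": label, "method": "bbox split (toy example)"},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


if __name__ == "__main__":
    rough_box = (68.0, 8.0, 97.0, 36.0)  # illustrative rectangle only
    feature = northern_half(rough_box, "the northern half of India")
    print(json.dumps(feature, indent=2))
```

A real resolver would split the actual administrative polygon, which is exactly the part worth delegating to a narrow, validated component.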

Skills and MCPs are complementary, not interchangeable

Development Seed’s public skills repository reinforces the same architectural idea from another angle. Skills package workflow knowledge and output structure; MCP servers package live tool access.

That distinction matters. A skill can tell an agent how to build a VEDA story, export an issue to markdown, or set up a Python repo. An MCP server lets that same agent call live APIs, inspect datasets, or resolve fresh operational context.

Enterprise AI work needs both. One gives you procedure. The other gives you reach.

STAC, GDAL, and the rest of the operational path

Once discovery and place resolution are handled, the rest of the geospatial chain looks familiar again.

STAC remains the right interoperability layer for item and asset selection. GDAL, QGIS-style tooling, OSM connectors, Earthdata accessors, and similar MCPs remain the right place for the sharp-edged operational work: clipping, reprojection, vector joins, raster access, metadata inspection, and export.

This is why I still do not buy the idea of a single universal geospatial MCP. Real systems are layered:

Layer                  | Job                                              | Example
-----------------------|--------------------------------------------------|--------------------------------------
Discovery              | Find authoritative datasets, maps, and documents | daa_mcp
Place resolution       | Turn language into geometry                      | Gazet-style geocoder / gazetteer
Catalog handoff        | Resolve items and assets                         | STAC MCP
Workflow orchestration | Schedule and monitor long-running jobs           | Argo Workflows via MCP
Spatial compute        | Transform and analyze data                       | GDAL / QGIS / OSM style MCPs
Presentation           | Package outputs for humans                       | VEDA-style skills, notebooks, apps

The protocol is the plumbing. The quality comes from giving each part of the stack a narrow responsibility.

This is the same systems lesson in geospatial form: trying to make one component do discovery, place semantics, heavy compute, and governance usually creates brittle architecture. Small explicit layers look less magical, but they survive production.

EO processing standards are now part of the same stack

This is where real operations begin. Once a question needs heavy lifting, you need a standards-aligned processing path, not just another API call.

I think of it as three aligned pieces:

  • eoAPI as the service substrate (STAC, tiles, PostGIS-adjacent query surfaces)
  • openEO UDFs for custom analysis logic that does not fit built-in processes
  • OGC API - Processes as a portable execution contract for geospatial jobs

This matters because of execution portability, not standards purity. If a workflow is represented through openEO process graphs, UDF hooks, and OGC process semantics, you are no longer trapped in one vendor UI or one internal notebook convention.

The openEO side is especially useful because it is explicit about chunked execution and long-running back-end jobs. run_udf and related process patterns (apply_dimension, apply_neighborhood, reduce_dimension) are built for data-cube scale operations where synchronous request/response is the wrong abstraction.
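A hedged sketch of what such a process graph can look like when built as a plain Python dict: load a collection, run a UDF along the time dimension via apply_dimension, and save the result. The collection id, extents, and UDF source are hypothetical placeholders. Which collections, runtimes, and output formats exist depends on the back end, so treat the argument values as assumptions.

```python
import json


def build_process_graph(collection_id, bbox, temporal_extent, udf_source):
    """Flat openEO-style process graph: load -> apply_dimension(run_udf) -> save."""
    west, south, east, north = bbox
    return {
        "load": {
            "process_id": "load_collection",
            "arguments": {
                "id": collection_id,
                "spatial_extent": {"west": west, "south": south, "east": east, "north": north},
                "temporal_extent": list(temporal_extent),
            },
        },
        "apply": {
            "process_id": "apply_dimension",
            "arguments": {
                "data": {"from_node": "load"},
                "dimension": "t",
                # Child process graph: the UDF receives the data along "t".
                "process": {
                    "process_graph": {
                        "udf": {
                            "process_id": "run_udf",
                            "arguments": {
                                "data": {"from_parameter": "data"},
                                "udf": udf_source,
                                "runtime": "Python",
                            },
                            "result": True,
                        }
                    }
                },
            },
        },
        "save": {
            "process_id": "save_result",
            "arguments": {"data": {"from_node": "apply"}, "format": "GTiff"},
            "result": True,
        },
    }


if __name__ == "__main__":
    graph = build_process_graph(
        "EXAMPLE_COLLECTION",              # hypothetical collection id
        (138.4, -35.1, 138.8, -34.7),
        ("2025-01-01", "2025-12-31"),
        "# UDF source code goes here",     # placeholder, not a working UDF
    )
    print(json.dumps(graph, indent=2))
```

Because the graph is data rather than a script, it can be stored, diffed, reviewed, and resubmitted to a different back end, which is the portability argument in practice.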

OGC API - Processes complements this by giving a common HTTP interface for process descriptions, inputs, execution, and result retrieval. That is the bridge between geospatial compute and enterprise service design: you can expose processing as a governed capability, not just a script somebody knows how to run.
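A small sketch of that HTTP contract, following the resource layout in OGC API - Processes Part 1: an execution request posted to the process, then a job resource for status and results. The base URL, process id, job id, and input names are hypothetical. The code only builds the request description and does not send it.

```python
import json


def execute_request(base_url, process_id, inputs, asynchronous=True):
    """Describe a POST to /processes/{id}/execution without sending it."""
    headers = {"Content-Type": "application/json"}
    if asynchronous:
        # Asks the server to create a job instead of blocking on the result.
        headers["Prefer"] = "respond-async"
    return {
        "method": "POST",
        "url": f"{base_url}/processes/{process_id}/execution",
        "headers": headers,
        "body": json.dumps({"inputs": inputs}),
    }


def job_urls(base_url, job_id):
    """Where to poll for status and where to fetch results for a job."""
    return {
        "status": f"{base_url}/jobs/{job_id}",
        "results": f"{base_url}/jobs/{job_id}/results",
    }


if __name__ == "__main__":
    base = "https://example.org/ogcapi"            # hypothetical deployment
    request = execute_request(
        base,
        "zonal-stats",                              # hypothetical process id
        {"collection": "example", "bbox": [138.4, -35.1, 138.8, -34.7]},
    )
    print(request["method"], request["url"])
    print(job_urls(base, "job-123"))                # hypothetical job id
```

Everything a gateway needs to govern (who may call which process, with what inputs) is visible in that one request shape.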

eoAPI then provides a practical packaging layer around STAC-first infrastructure (pgSTAC, stac-fastapi, titiler-pgstac, tipg). Data discovery and data services stay first-class while compute orchestration grows around them.

Long-running analysis needs workflow orchestration, not prompt retries

When EO analysis becomes asynchronous (tiling, temporal composites, model inference over large extents, zonal stats at scale), agent architecture must move from “call tool, get answer” to “submit job, monitor, collect artifacts, summarize result”.

This is exactly where Argo Workflows patterns fit, and why Argo-flavored MCP servers are interesting. The MCP Market Argo server by jakkaj exposes workflow submission/status/result retrieval through JSON-RPC. That maps naturally to agent behavior:

  1. build a workflow payload from resolved geometry + selected assets
  2. submit to workflow engine
  3. poll status with bounded retries and timeout policy
  4. fetch outputs and register artifact locations
  5. return a grounded summary with links to outputs and provenance
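The five steps above can be sketched as a submit-and-poll loop with bounded retries. This is a minimal illustration, not the API of any particular Argo MCP server: the client class, its method names, and the artifact URL are all hypothetical, and only the phase names follow Argo Workflows' own vocabulary.

```python
import time

TERMINAL_PHASES = {"Succeeded", "Failed", "Error"}


class FakeWorkflowClient:
    """Stand-in for a real workflow client; replays a scripted list of phases."""

    def __init__(self, phases):
        self._phases = list(phases)

    def submit(self, payload):
        return "wf-001"

    def status(self, workflow_id):
        # Advance through the script, then keep repeating the last phase.
        return self._phases.pop(0) if len(self._phases) > 1 else self._phases[0]

    def outputs(self, workflow_id):
        return ["s3://example-bucket/composite.tif"]


def run_job(client, payload, max_polls=10, interval_s=5.0, sleep=time.sleep):
    """Submit, poll a bounded number of times, then collect artifact locations."""
    workflow_id = client.submit(payload)
    for attempt in range(1, max_polls + 1):
        phase = client.status(workflow_id)
        if phase in TERMINAL_PHASES:
            break
        sleep(interval_s)
    else:
        raise TimeoutError(f"{workflow_id} still running after {max_polls} polls")
    if phase != "Succeeded":
        raise RuntimeError(f"{workflow_id} ended in phase {phase}")
    return {
        "workflow_id": workflow_id,
        "phase": phase,
        "polls": attempt,
        "artifacts": client.outputs(workflow_id),
    }


if __name__ == "__main__":
    client = FakeWorkflowClient(["Pending", "Running", "Succeeded"])
    payload = {"geometry": "resolved GeoJSON here", "assets": ["item-1"]}
    print(run_job(client, payload, sleep=lambda seconds: None))
```

The bounded loop matters more than it looks: an agent that retries a prompt on failure has no timeout policy, while a job runner that gives up after a fixed number of polls does.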

Critically, outputs here are not only text. They can be Cloud Optimized GeoTIFFs, STAC Items, GeoParquet extracts, vector tiles, QA reports, or model feature embeddings. For geospatial agents, the right response class is generated artifacts plus explainable metadata, not just a paragraph.

If you combine this with gateway policy (auth, quotas, audit) and model routing, you end up with a more durable operating model:

  • language models interpret intent and coordinate tools
  • workflow systems execute expensive, long-running spatial jobs
  • standards-based APIs keep outputs portable across teams and platforms

That is much closer to “EO production system” than “chatbot with map screenshots”.

Why enterprises should care: this is really about UI replacement

The bigger opportunity is not just geospatial search. It is enterprise task conversion.

Most corporate and government knowledge work still lives in forms, portals, dashboards, policy systems, and browser tabs. Much of it is manual API orchestration done by humans through UIs.

We should be blunt about portal economics. Most are disposable visual sales tools, one level up from a PowerPoint deck. They are useful while an initiative is being socialized or funded, but once that phase passes, maintenance becomes hard to justify.

The exception is when the interface itself is the monetized product. If the UI is a booking or experience engine, like Airbnb, Uber, or Netflix, the interface is the business and ongoing investment is obvious. If it is only a discovery surface, the long-term value is usually in the underlying services and automations, not the glossy front end.

That is where agent stacks become interesting: not “replace staff with a bot”, but “replace repetitive navigation with question answering backed by orchestrated tools”.

For example:

  • use a Digital Atlas MCP to find relevant public datasets
  • use a Gazet-style tool to resolve the place phrase into geometry
  • use STAC or downstream data APIs to locate assets
  • use GDAL or analytics tools to extract what matters
  • use an internal engineering or policy system to fetch approvals, tickets, or previous decisions
  • return one answer with links, bounds, intermediate evidence, and the exact tools invoked
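One way to picture "one answer with the exact tools invoked" is a thin trace wrapper around each call. The sketch below is illustrative only: the three tool functions are hypothetical stubs standing in for MCP calls, and the returned values are made up for the example.

```python
class ToolTrace:
    """Records every tool call so the final answer can show its working."""

    def __init__(self):
        self.steps = []

    def call(self, name, fn, **kwargs):
        result = fn(**kwargs)
        self.steps.append({"tool": name, "args": kwargs, "result": result})
        return result


# Hypothetical stubs; real versions would call MCP servers or internal APIs.
def find_datasets(topic):
    return ["example-flood-extent-layer"]


def resolve_place(phrase):
    return {"bbox": [138.4, -35.1, 138.8, -34.7]}


def fetch_approvals(dataset):
    return {"dataset": dataset, "status": "approved"}


def answer_question(topic, place_phrase):
    trace = ToolTrace()
    datasets = trace.call("discovery", find_datasets, topic=topic)
    place = trace.call("place_resolution", resolve_place, phrase=place_phrase)
    approval = trace.call("approvals", fetch_approvals, dataset=datasets[0])
    return {
        "summary": f"{len(datasets)} dataset(s) found, approval {approval['status']}",
        "bounds": place["bbox"],
        "evidence": trace.steps,  # intermediate results, in call order
    }


if __name__ == "__main__":
    result = answer_question("flood hazard", "north of the river")
    for step in result["evidence"]:
        print(step["tool"], "->", step["result"])
```

The answer and its evidence travel together, so a reviewer can replay the chain instead of trusting the summary.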

That is a much more defensible enterprise AI story than a generic assistant sitting on top of a screen scraper.

The enterprise layer above the tools: gateway patterns

Of course, the moment you try to productionize this, you run into the boring parts that actually matter: auth, quotas, observability, data leakage, audit, policy, and cost.

At that point, the conversation should move from product selection to gateway patterns. Martin Fowler’s gateway pattern framing is useful because it centers controlled mediation between clients and backend capabilities, which maps directly onto agent-to-tool and agent-to-service boundaries.

The first pattern is identity mediation. Agent calls need to carry a clear identity boundary between user, agent, and downstream systems, with policy checks at each boundary.

The second pattern is protocol normalization. Teams will mix chat payloads, tool calls, MCP traffic, and legacy APIs. The gateway has to normalize those flows so routing and controls are consistent.

The third pattern is cost and budget governance. Token and request spend cannot be an afterthought. Per-key, per-team, and per-tier budgets need to be enforceable in the traffic path, not in a monthly spreadsheet.
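A minimal sketch of budget enforcement in the traffic path, assuming a simple in-memory token counter per key. The class, key names, and limits are hypothetical. A real gateway would persist counters and reset them per billing window.

```python
class BudgetGate:
    """Deny a request before it is routed if it would exceed the key's budget."""

    def __init__(self, limits):
        self.limits = dict(limits)              # key -> max tokens per window
        self.spent = {key: 0 for key in limits}

    def authorize(self, key, tokens):
        if key not in self.limits:
            return False                        # unknown keys are denied outright
        if self.spent[key] + tokens > self.limits[key]:
            return False                        # would overspend: deny, do not route
        self.spent[key] += tokens
        return True

    def remaining(self, key):
        return self.limits[key] - self.spent[key]


if __name__ == "__main__":
    gate = BudgetGate({"team-geo": 10_000, "team-policy": 2_000})
    for key, tokens in [("team-geo", 6_000), ("team-geo", 5_000), ("team-policy", 500)]:
        verdict = "allow" if gate.authorize(key, tokens) else "deny"
        print(key, tokens, verdict, "remaining:", gate.remaining(key))
```

The check happens before the request is forwarded, which is the difference between a budget and a report.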

The fourth pattern is safety policy composition. Guardrails are rarely one thing. In practice you need layered controls: authentication, content moderation, prompt-injection checks, data-loss prevention, and explicit deny behavior.

The fifth pattern is telemetry with replayable evidence. A useful gateway record is not just “request succeeded”. It is who called what, with which policy version, routed to which provider or tool, at what cost, with what outcome.
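As a sketch of what such a record could carry, here is a small dataclass serialized as one JSON line per call. The field names and sample values are assumptions for illustration, not the schema of any specific gateway.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GatewayRecord:
    timestamp: str        # ISO 8601, supplied by the gateway
    user: str             # who asked
    agent: str            # which agent acted on their behalf
    target: str           # provider or tool the call was routed to
    policy_version: str   # policy bundle in force at call time
    tokens: int           # cost of the call
    outcome: str          # e.g. "allowed", "denied", "error"


def to_log_line(record):
    """Stable key order so lines can be diffed and replayed."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = GatewayRecord(
        timestamp="2026-05-23T09:30:00Z",
        user="analyst@example.org",
        agent="geo-agent",
        target="mcp:daa_mcp/atlas_search_items",
        policy_version="policy-v12",      # hypothetical version label
        tokens=1840,
        outcome="allowed",
    )
    print(to_log_line(record))
```

Recording the policy version with each call is what makes the evidence replayable: the same request can be re-evaluated against the rules that actually applied.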

I still mention Kong mainly as a signal of category maturity, not as a single implementation path. The same pattern language also shows up in Otoroshi’s AI gateway model: provider abstraction, OpenAI-compatible route surfaces, API-key enforcement, metadata-driven budgets, moderation and regex guardrails, plus extension-based control points.

Whether a team uses Kong, Otoroshi, or another gateway stack is less important than whether these patterns are explicit. Enterprises want MCPs and agent flows, but they need them behind durable control planes instead of scattered sidecars with weak visibility.

Once agents start calling internal systems, this layer becomes unavoidable. Someone will want to know who called what, with which identity, against which policy, and how many tokens or requests that answer burned to produce.

Small models are not a concession, they are often the right architecture

The other part of this conversation that is becoming more honest is model size.

Development Seed’s Gazet makes the point very clearly: many geospatial tasks are structured enough that a small model, grounded to the right schema and surrounded with the right tooling, is better engineering than throwing a giant model at the problem.

That also lines up with what NVIDIA has been highlighting around agentic workloads. Their recent MiniMax M2.7 material is not about a universal oracle; it is about efficient agentic harnesses, tool calling, and orchestration-friendly serving with explicit tool-call parsers, auto-tool-choice, and a runtime story through vLLM, SGLang, NIM, and NemoClaw.

MiniMax M2.7 is not “small” in total parameter count, but it is architected in the direction the market is moving: sparse activation, efficient serving, orchestration hooks, and infrastructure that treats tools as first-class. The important lesson is not the specific model; it is the pattern.

The pattern looks like this:

  • small or moderate specialist models where the task is structured
  • larger reasoning models only where the ambiguity genuinely demands them
  • explicit tool routing rather than one giant model pretending to be every subsystem

For geospatial and enterprise workflows, that is just common sense.

The real end state

The end state here is not “chat with your maps”.

It is something more useful:

  1. take a question expressed in normal language
  2. resolve the place semantics properly
  3. discover the authoritative data sources
  4. retrieve or compute what is needed, including expensive long-running analytics when required
  5. enforce enterprise controls on the path
  6. return an answer that is fast, inspectable, and grounded in tools

Beyond one-shot questions, run persistent agents that watch for change over time. For hazard operations, that means continuously tracking weather, flood signals, fire risk, and related event streams, then escalating when deviation from steady state crosses defined thresholds.

The same pattern applies to slower domains such as seasonal wheat growth. A persistent agent should infer baseline behavior from historical and current EO catalogs, run heavy analysis when confidence drops, and react when observed trajectories diverge from expected seasonal dynamics.

This is where long-running workflows and persistent agents work together: the workflow engine handles expensive catalog-scale computation, while the agent handles temporal monitoring, trigger logic, and action routing.
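A minimal sketch of the trigger logic on the agent side, assuming a plain z-score against a historical baseline. The thresholds, the function names, and the sample vegetation-index values are illustrative assumptions. A real monitor would use a seasonal model rather than a flat mean.

```python
import math
import statistics


def deviation_score(history, observed):
    """How many standard deviations the observation sits from the baseline."""
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return 0.0 if observed == mean else math.inf
    return abs(observed - mean) / spread


def decide(history, observed, recompute_at=2.0, escalate_at=4.0):
    """Map a deviation onto an action: do nothing, rerun analysis, or page a human."""
    score = deviation_score(history, observed)
    if score >= escalate_at:
        return "escalate"
    if score >= recompute_at:
        return "recompute"
    return "steady"


if __name__ == "__main__":
    baseline = [0.50, 0.52, 0.48, 0.51, 0.49]   # made-up index values
    for value in (0.50, 0.54, 0.30):
        print(value, "->", decide(baseline, value))
```

Only the middle outcome submits an expensive workflow, which is how the agent and the workflow engine split the work: cheap checks run continuously, heavy computation runs on demand.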

That operational loop was the key point in the earlier LinkedIn post, and it is worth repeating: in enterprise geospatial systems, value compounds when agents watch state over time and selectively escalate computation, not when they generate one more static summary.

That is what turns manual UI work into operational question answering.

For geospatial work, it means less tab-hopping and more reproducible spatial reasoning. For enterprise AI more broadly, it means agents stop being a decorative chat layer and start looking more like a disciplined API and workflow orchestration fabric.

Better than another dashboard, frankly.

If you are building in this direction, the interesting design questions are no longer “which model is smartest?” They are:

  • where should discovery stop and reasoning begin?
  • which place-resolution tasks deserve their own narrow model?
  • which tools should be exposed directly as MCPs versus hidden behind workflow skills?
  • what sits at the gateway for auth, quotas, audit, and A2A telemetry?
  • which monitoring loops should run continuously, and what deviations should trigger expensive recomputation or human escalation?

That is engineering, not theatre.


Wednesday, May 13, 2026

AI-assisted PCB design for energy monitoring (and scaling up “don’t trust the autorouter”)

There is a specific smell to late-stage PCB work. A mix of coffee, solder mask anxiety, and the faint optimism that one more route pass will magically make analog behave. Then comes the poking around with oscilloscope probes and poring over datasheets, only to find that you failed to pull reset low, and out comes the magnet wire for a bodge.

Over the last week I have been building out an ADE9000 breakout with an AI-assisted workflow wrapped around KiCad scripting. The results are useful, occasionally impressive, and still very far from “hands-off” for precision energy monitoring.

If you grew up with “don’t trust the autorouter” as muscle memory, good news: the saying still holds. We just scaled the blast radius from one menu click to an entire AI pipeline.

[Image: AI-aided KiCad schematic work in VS Code and KiCad]

What changed on the board, not just on a slide deck

The recent commit sequence in ADE9000_Breakout tells the story better than any marketing copy:

  • d1e3352 (2026-05-07): “Initial AI Creation”
  • 043161c (2026-05-08): “Complete routing”
  • b8c8132 (2026-05-09): “Reroute with JST connector for SPI”
  • a469246 (2026-05-09): “Move decoupling closer”
  • 84b86e7 (2026-05-12): “Relayout using connectors for analog inputs”
  • 7d4b420 (2026-05-16): “Rerouted with Groundplanes and new skills”
  • c40e40a (2026-05-16): “Added License”
  • 9b2d97e (2026-05-17): “Add project-local 3D CAD assets”
  • cf8ba2c (2026-05-17): “Fix CT jack STEP orientation”
  • f5a11b2 (2026-05-17): “Remove CT jack pad overlays”

This was not a single-shot generation. It was iterative engineering with scripting and machine assistance in the loop:

  • Placement and connector mapping in scripts like place_pcb.py.
  • Deterministic route passes in route_pcb.py.
  • Critical route seeding and patching in seed_critical_routes.py and patch_remaining_routes.py.
  • Mechanical/silk cleanup with apply_board_markings.py and move_refs_to_silkscreen.py.
  • Continuous ERC/DRC artifacts (erc.json, drc.json) committed as hard checkpoints.
  • 3D STEP model handling pushed into the reusable layout and size-shape skills so mechanical review can keep pace with PCB edits.
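
One way to make those committed reports behave like checkpoints is a small gate script that refuses to pass while errors remain. A minimal sketch, assuming drc.json follows the JSON report layout that recent kicad-cli versions emit (top-level violations and unconnected_items lists whose entries carry a severity field); the function names, the threshold, and the sample data are hypothetical and not taken from the repo:

```python
import json
from collections import Counter


def summarize_drc(report: dict) -> Counter:
    """Count report entries by severity across the assumed report sections."""
    counts = Counter()
    for section in ("violations", "unconnected_items"):
        for item in report.get(section, []):
            counts[item.get("severity", "unknown")] += 1
    return counts


def gate(report: dict, max_errors: int = 0) -> bool:
    """Return True when the report is clean enough to commit as a checkpoint."""
    return summarize_drc(report)["error"] <= max_errors


# Illustrative fragment only, not real output from the ADE9000 project.
sample = json.loads("""
{
  "violations": [
    {"type": "clearance", "severity": "error"},
    {"type": "silk_overlap", "severity": "warning"}
  ],
  "unconnected_items": [
    {"type": "unconnected_items", "severity": "error"}
  ]
}
""")

print(summarize_drc(sample), gate(sample))
```

Run against the committed file with json.load, the same gate could sit in a pre-commit hook so a route pass that adds errors never becomes the new baseline.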

AI-aided ADE9000 PCB layout in KiCad

That pattern matters. The AI and automation stack gave speed and repeatability, but the engineering value came from repeated correction passes driven by board physics.

The same iteration loop is now visible in whatnick-energy-monitor-skills:

  • 335a718 (2026-05-16): Initial whatnick energy monitor skills
  • 78bfc44 (2026-05-17): Document STEP model workflow
  • 79d925d (2026-05-17): Clarify connector STEP model matching
  • af57c2d (2026-05-17): Document connector CAD alignment workflow
  • 32adeff (2026-05-17): Correct ADE9000 jack CAD guidance
  • d4167f6 (2026-05-17): Remove ADE9000 jack overlay guidance

That repo now captures reusable circuit, layout, routing, and board-shape guidance, plus the ADE9000 project overlay. The 3D model side was not a first-pass success either. STEP exports and connector models took multiple iterations before the project-local paths, model matching, and export workflow were stable enough to trust in a mechanical review. The later commits are the best evidence: first add local CAD assets, then discover the CT jack STEP orientation is wrong, then remove pad overlays that made the model look plausible while hiding the fact that the footprint and mechanical story needed to line up properly.

This is where AI-assisted hardware feels different from AI-assisted code. In software, a bad abstraction usually fails in tests or production logs. In PCB work, a bad abstraction can become a connector body floating neatly above the wrong pads in the 3D viewer. It looks professional right up until the part, enclosure, or cable says otherwise.

Pivoting the ADE9000 board toward the larger analog-connector layout

NotebookLM research: good at methodology, weaker on analog edge cases

I started on this adventure with posts from Samuel Beek promoting AI PCB design. I blended in my research background and curated a set of academic papers on AI and algorithmic approaches to PCB place-and-route. I then queried my pre-created NotebookLM notebook “PCB Place and Route Algorithms” specifically for mixed-signal energy-monitoring constraints.

The useful part of that synthesis:

  • Current AI placement/routing research still optimizes mostly for geometric proxies (wirelength, overlap, congestion).
  • Newer approaches improve constraint capture and collaboration, but are explicitly human-in-the-loop.
  • Model quality degrades on low-frequency pattern classes and novel interface combinations (cold-start behavior).

The important caveat from the same query was even more telling: source coverage was strong on AI placement methodology, but thinner on precision metering specifics like safety creepage strategy, return-current choreography around split references, and “what actually ruins ENOB on a real board at 2am”.

That gap is exactly the point.

Schematik, SnapMagic, and the new hardware workflow stack

Samuel Beek’s Schematik story is almost too perfect as an origin myth for this moment. An AI-generated wiring guide for a home-grown door opener took out the fuses in his apartment, which is a fairly direct way for physics to reject your prompt. Schematik has since raised $4.6 million from Lightspeed Venture Partners and is positioning itself as a “Cursor for Hardware”: describe a device in plain language, get a bill of materials, purchase links, and assembly instructions.

The interesting engineering choice is the safety boundary. Schematik is deliberately aiming at low-voltage circuits, typically 3-5 V IoT and maker projects, because that is where the promise is large and the downside can still be bounded. That is a sensible line. It is also a reminder that “AI for hardware” is not one market. The assistant that helps someone build an MP3 player is not the same system I would trust near mains metering, isolated sensing, or a DIN rail enclosure without a lot more constraint machinery around it.

SnapMagic is attacking a nearby but different layer of the stack. Its pitch is an AI copilot for electronics design, built on the huge CAD model base that started as SnapEDA: symbols, footprints, 3D models, part discovery, BOM optimization, supply-chain substitution, and integration with existing EDA tools including KiCad. That matters because a surprising amount of PCB time is not heroic analog insight; it is finding the right model, importing it cleanly, checking whether the footprint is sane, and making sure the thing still exists at Mouser or Digi-Key.

Schematik is closer to natural-language project generation. SnapMagic is closer to CAD-data and component-selection acceleration. My little ADE9000 workflow sits in the garage between them: NotebookLM for research synthesis, KiCad Python for deterministic changes, Freerouting for a first pass, project-local skills for repeatable domain knowledge, and human review for the parts that still smell like physics.

That is the practical opening for individual designers. You do not need to wait for one perfect vendor platform. You can build a small, opinionated workflow from pieces you already control:

  1. Put your design rules and recurring board-family choices into a local skill or checklist.
  2. Use AI for datasheet digestion, net naming, script generation, and alternative exploration.
  3. Keep KiCad, ERC, DRC, STEP export, and git history as the accountability layer.
  4. Treat external libraries like SnapMagic/SnapEDA as accelerators, then verify footprints, symbols, pin numbers, and 3D models against datasheets and mechanical reality.
  5. Let autorouters and AI propose routes, but hand-check return currents, decoupling loops, creepage, clocks, testability, and enclosure fit.

That sounds less magical than “hardware Cursor”, but it is much closer to how individual designers can safely get leverage today.
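
Step 4 is the easiest one to partly automate. A minimal sketch, assuming you have typed the pin table from the datasheet into a dict and pulled the pin-number-to-name mapping out of the downloaded library part by whatever means you prefer; the pin names and numbers below are made up for illustration and are not the ADE9000 pinout:

```python
def check_pinout(datasheet_pins: dict[str, str], library_pins: dict[str, str]) -> list[str]:
    """Compare pin number -> pin name from the datasheet against the library part."""
    problems = []
    # Pins the datasheet lists but the library part lacks.
    for pin in sorted(datasheet_pins.keys() - library_pins.keys()):
        problems.append(f"pin {pin} ({datasheet_pins[pin]}) missing from library part")
    # Pins the library part has that the datasheet does not mention.
    for pin in sorted(library_pins.keys() - datasheet_pins.keys()):
        problems.append(f"pin {pin} not in datasheet")
    # Pins present in both but with different names.
    for pin in sorted(datasheet_pins.keys() & library_pins.keys()):
        if datasheet_pins[pin] != library_pins[pin]:
            problems.append(
                f"pin {pin}: datasheet says {datasheet_pins[pin]}, "
                f"library says {library_pins[pin]}"
            )
    return problems


# Hypothetical three-pin part with a swapped and a misnumbered pin.
datasheet = {"1": "VDD", "2": "GND", "3": "SCLK"}
library = {"1": "VDD", "2": "SCLK", "4": "GND"}
for line in check_pinout(datasheet, library):
    print(line)
```

It only catches naming and numbering mismatches. Pad geometry and the 3D model still need eyes on the mechanical drawing.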

Why this hurts more on energy monitoring boards

An ADE9000-style board is not just “digital plus some analog”. It is a negotiated peace treaty between:

  • tiny differential analog signals,
  • noisy clocks and digital SPI edges,
  • shared ground structures,
  • high-voltage interfacing constraints,
  • and assembly realities.

AI can route what it can score. Physics punishes what you forgot to score.

For energy monitoring, the failure modes are often subtle first and expensive later:

  • A seemingly short route with a terrible return path becomes an EMI antenna.
  • “Close enough” decoupling in XY turns into high loop inductance in 3D.
  • Digital fanout convenience leaks noise into the measurement front end.
  • Clearance passes in one view while creepage silently fails along real surfaces.

The scaled-up autorouter adage

Classic autorouter distrust was about ugly traces and cleanup effort.

AI-assisted distrust is about false confidence.

You now get cleaner visuals, plausible routing, and confidence scores. The board can look more “engineered” while still violating analog intent. That is worse than obviously bad output, because it delays the moment when a human gets suspicious.

In the ADE9000_Breakout commits, that showed up as repeated topology and placement adjustments:

  • decoupling moved closer,
  • connector strategy revised,
  • critical nets explicitly seeded,
  • remaining logical gaps patched after Freerouting import,
  • then relayout for analog input connector realism.

ADE9000 input clamp and analog connector pivot

None of that is anti-AI. It is pro-accountability.

A practical review checklist I now treat as mandatory

Before I trust an AI-assisted pass on a precision board, I manually review:

  1. Return current continuity under each critical signal path.
  2. Analog/digital ground interaction at the exact stitch points, not just net names.
  3. Decoupling loop geometry (pin, cap, via topology), not just nearest-component distance.
  4. Clock and fast digital net proximity to high-impedance analog channels.
  5. Creepage and clearance across real isolation boundaries and along surfaces.
  6. Testpoint access and assembly risks (reworkability, tombstoning, awkward probe points).

If any of those rely on “the model probably understood that”, I assume it did not.
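
Item 4 is a good example of something a default DRC run will not flag but a few lines of geometry can. A minimal sketch, assuming track segments on the same layer have already been exported as endpoint pairs in millimetres and that centreline distance is a good enough proxy; the coordinates, the 2 mm keep-out, and the function names are illustrative assumptions, not values from the ADE9000 layout:

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest distance from point p to a line segment."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment (a via-like point)
        return math.hypot(p[0] - ax, p[1] - ay)
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: which side of line a-b the point c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_distance(s1: Segment, s2: Segment) -> float:
    """Shortest centreline distance between two segments (0.0 if they cross)."""
    a, b = s1
    c, d = s2
    if _orient(a, b, c) * _orient(a, b, d) < 0 and _orient(c, d, a) * _orient(c, d, b) < 0:
        return 0.0
    return min(
        point_segment_distance(a, s2), point_segment_distance(b, s2),
        point_segment_distance(c, s1), point_segment_distance(d, s1),
    )


def too_close(aggressors: list[Segment], victims: list[Segment], keepout_mm: float):
    """List (aggressor index, victim index, distance) pairs inside the keep-out."""
    return [
        (i, j, dist)
        for i, agg in enumerate(aggressors)
        for j, vic in enumerate(victims)
        if (dist := segment_distance(agg, vic)) < keepout_mm
    ]


clock = [((0.0, 0.0), (10.0, 0.0))]
analog = [((0.0, 1.5), (10.0, 1.5)), ((0.0, 5.0), (10.0, 5.0))]
print(too_close(clock, analog, keepout_mm=2.0))
```

The check ignores track width, layer stack-up, and shielding by ground pours, so a hit is a prompt to go and look, not a verdict.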

What AI is genuinely good for in this workflow

After using it in anger, the wins are real:

  • Faster exploration of placement variants.
  • Deterministic scriptable edits to keep design intent reproducible.
  • Constraint management that catches obvious misses early.
  • Better ergonomics for repeated board evolutions.
  • Mechanical and export workflows, including STEP models, can be standardized, but they still need several passes to reconcile footprint libraries, connector variants, and CAD export paths.

This is similar to CNC in machining. You still need a machinist mindset. You just get to fail faster and with better logs.

The part that still needs engineers

The physics does not care whether a trace came from a human, an RL policy, or a nicely branded copilot.

Energy metering boards live or die on the details that are hardest to encode as generic reward functions. The edge where analog integrity, EMC, and safety overlap is still mostly tacit knowledge earned through measurements, bring-up scars, and post-mortems.

So yes, use AI aggressively for PCB work. I certainly am.

Just keep the old sign above the bench.

Don’t trust the autorouter.

Now it applies to systems, not just traces.

If you are building similar mixed-signal boards, I would love to compare review checklists and failure cases that escaped DRC but showed up on the bench.

Friday, May 1, 2026

Using Aluminium Instead of Carbon to Make Silicon

I have been thinking a lot about sovereign capability lately, not in the abstract flag-waving sense, but in the boring physical sense of what materials sit near each other, what energy sources are nearby, what ports exist, and what loops can actually close.

That line of thought clicked into a more concrete shape after reading the paper Carbon-Neutral Silicon via Aluminothermic Reduction? Exploring Industrial Symbiosis through Life Cycle Assessment, the Australian Silicon Action Plan, and then updating my Aluminium + Silicon Sovereign ecosystem slide deck to reflect it.

The core idea is simple enough to explain to a high-school chemistry class: we normally reduce quartz to silicon with carbon. What if, in the right industrial setting, we used aluminium as the reductant instead?

The Conventional Route Uses Carbon

Silicon does not come out of the ground in neat shiny wafers. It starts as silica or quartz, and the conventional metallurgical route is carbothermic reduction: take quartz, add a carbon source, add a lot of heat, and accept a pile of carbon dioxide as part of the bargain.

That bargain made sense when the objective function was mostly “make silicon cheaply”. It makes less sense when we also care about carbon intensity, geopolitical fragility, and whether a country with abundant ore, sunshine, and smelting know-how can turn those endowments into a durable manufacturing base.

The Alternative Route Uses Aluminium

The paper explores aluminothermic reduction, using an aluminium source as the reductant material instead of carbon. More specifically, it looks at aluminium dross as an industrial symbiosis input rather than a pristine, purpose-made feedstock. That detail matters. This is not a fantasy process that assumes some magical zero-cost aluminium stream falls from the sky. It starts from a messy industrial byproduct and asks whether a better loop can be built around it.
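
The difference between the two routes already shows up in the idealised stoichiometry. A back-of-envelope sketch, assuming the textbook overall reactions SiO2 + 2C -> Si + 2CO and 3SiO2 + 4Al -> 3Si + 2Al2O3, pure reagents, and complete conversion; real furnaces, slag chemistry, and dross composition (which the paper models and this does not) will move these numbers:

```python
# Standard atomic masses in g/mol.
M_SI, M_C, M_AL, M_O = 28.086, 12.011, 26.982, 15.999
M_CO2 = M_C + 2 * M_O
M_AL2O3 = 2 * M_AL + 3 * M_O


def carbothermic_per_tonne_si() -> dict[str, float]:
    """SiO2 + 2C -> Si + 2CO, with the CO assumed to end up as CO2."""
    return {
        "carbon_t": 2 * M_C / M_SI,   # tonnes of carbon consumed per tonne Si
        "co2_t": 2 * M_CO2 / M_SI,    # tonnes of CO2 released per tonne Si
    }


def aluminothermic_per_tonne_si() -> dict[str, float]:
    """3SiO2 + 4Al -> 3Si + 2Al2O3, no carbon in the reduction step itself."""
    return {
        "aluminium_t": 4 * M_AL / (3 * M_SI),   # tonnes of Al consumed per tonne Si
        "alumina_t": 2 * M_AL2O3 / (3 * M_SI),  # tonnes of Al2O3 produced per tonne Si
    }


for name, value in {**carbothermic_per_tonne_si(), **aluminothermic_per_tonne_si()}.items():
    print(f"{name}: {value:.2f}")
```

On these assumptions a tonne of silicon costs roughly 0.86 t of carbon and about 3.1 t of CO2 on the carbothermic route, against roughly 1.28 t of aluminium in and 2.4 t of alumina out on the aluminothermic one. The stoichiometry says nothing about the emissions embodied in the aluminium, which is why the source of that aluminium matters so much.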

The headline result is strong enough to justify attention: the authors find that the aluminothermic route can reduce global warming impact and cumulative energy demand by up to 80% relative to the reference route.

That is the part that makes you sit up.

The useful thing about the paper is that it does not stop at the good news. Some impacts get worse, especially if the aluminium scrap would otherwise have displaced something valuable elsewhere, and because this route still needs extra input materials. So this is not free decarbonisation. It is a real industrial trade-off.

That makes the paper more useful, not less. Serious policy should be built on “this looks promising, but here are the hotspots” rather than on conference-hall hydrogen hallucinations.

Why This Starts To Look Real

In updating that slide deck, I kept coming back to the same question: what happens if you stop treating aluminium, silicon, and solar panels as separate industries?

That is the framing I find compelling, because it turns this from a chemistry curiosity into an engineering and logistics problem.

Around 80% of a typical solar panel by mass is aluminium plus silicon. If a country is serious about energy sovereignty, it should be thinking not just about installing more panels, but about building the material loops that sit behind them. Solar farms are not merely generators. They are future material stockpiles sitting in the sun.

Once you see that, a different policy picture appears:

  • quartz becomes not just a mining input but a strategic silicon feedstock.
  • bauxite and aluminium refining become adjacent to solar manufacturing rather than unrelated heavy industry.
  • end-of-life panels become future reductant, frame stock, and silicon feed instead of landfill problems.
  • smelters, ports, and renewable generation start to look like parts of the same machine.

This is where using aluminium instead of carbon to reduce quartz stops being an isolated chemistry trick and starts looking like something you could build an industry around.

If I Had To Put Pins On A Map

The notebook I pulled together on domestic solar manufacturing helped sharpen this. Once you stop talking in continent-sized blobs and start naming actual places, a few candidates jump out.

1. Kemerton and south-west WA

This is the least speculative option because Simcoa at Kemerton already exists and is still Australia’s only operating silicon manufacturer. The Silicon Action Plan notes Simcoa is producing about 52,000 tonnes of metallurgical silicon a year, mining its own quartz and running an established smelter operation.

That matters because south-west WA also has the Darling Range bauxite mines, alumina refineries at Wagerup, Pinjarra and Worsley, the SWIS grid, and Bunbury port infrastructure all in the same broad industrial neighborhood. If you wanted to trial aluminium-assisted silicon reduction somewhere in Australia, starting near the one place that already knows how to make silicon seems less heroic than starting from a blank paddock.

2. Townsville and the Lansdown precinct

If Kemerton is the incumbent, Townsville is the “someone is actually trying to draw the whole supply chain in one industrial estate” option. The major project write-up and Solar Sunshot coverage point to the Lansdown Eco-Industrial Precinct near Townsville as the proposed site for a quartz-to-metallurgical-silicon campus plus silicon ingot and wafer manufacturing.

What I like about Townsville is not that it is magically complete today. It is that the logic is visible. There is Queensland quartz, there are nearby solar resources, there is port access, and CopperString plus the Northern Queensland REZ story gives you a plausible path to much more electricity than the region has today. It is easier to imagine an aluminothermic pilot piggybacking on a place already trying to integrate quartz, silicon and wafer production than on a site that only knows one piece of the story.

3. Gladstone

Gladstone feels like the heavy-industry answer. It already has alumina, aluminium-adjacent infrastructure, deep-water port capability, and a lot of people thinking about how to decarbonise industrial heat without hollowing out the place. The Climateworks work on Gladstone is interesting here because it frames the region not just as a load, but as a flexible industrial node that could soak up and shape renewable power.

Gladstone is weaker than Kemerton on current silicon capability, but stronger on industrial mass. If you needed somewhere that already thinks in terms of furnaces, refineries, export terminals and gigawatts rather than artisan clean-tech vibes, Gladstone is on the shortlist.

4. Mourilyan and Weipa as upstream feedstock pieces

I would not put the whole chain in one place just to satisfy a PowerPoint aesthetic. Sometimes the better answer is a linked corridor rather than one mega-site.

The Mourilyan silica sands project is interesting because it gives Far North Queensland a high-purity silica input close to road and port infrastructure. Pair that with Cape York bauxite and alumina flows coming through Weipa and Yarwun and you start to see a north-to-central Queensland materials story, even if the final smelting and wafering steps land further south.

That sort of arrangement is less neat on a map, but a lot more believable in real life.

I have spent enough time around electronics, energy monitoring, and hardware supply chains to be skeptical of national capability claims built on nothing more than a minister at a lectern. Sovereign capability usually comes from embracing the mess: furnaces, scrap streams, slag reprocessing, aging solar farms, logistics yards, and the boring people who know how to keep them running through summer.

The paper explicitly highlights recirculating carbonation gases, reprocessing byproduct slags, and using surplus aluminium scrap as some of the most important improvement levers. Those are exactly the kinds of details that separate a sovereign ecosystem from a PowerPoint ecosystem.

The Fallen Leaves Analogy Is Better Than the Circular Economy Cliche

One line from the slides stuck with me: every 10 years or so, as panel efficiency degrades or silicon technology advances, you recycle the aluminium and silicon into a new panel. Build enough installed capacity and after 30-35 years you have not just electricity generation, but a meaningful stockpile of reusable material.

That feels less like a recycling slogan and more like a forest floor. Fallen leaves are not waste. They are deferred structure. The same could be true of first-generation solar deployments if we design the industrial loop ahead of time rather than pretending recycling will somehow organize itself later.

This is also where the sovereign-policy lens improves the climate-policy lens. A circular loop that produces domestic industrial feedstock, manufacturing resilience, export optionality, and lower carbon intensity is politically sturdier than one justified only as moral sacrifice.

What I Would Actually Like To See Next

If this idea is to move from interesting paper to something testable, I would want to see a few things next:

  • a serious Australian material flow analysis for quartz, aluminium scrap, aluminium dross, solar panel retirements, and metallurgical silicon demand.
  • a location-based study around Kemerton, Townsville-Lansdown and Gladstone rather than a placeless national average.
  • explicit comparison against the alternative use of the aluminium scrap streams, because the paper shows this assumption drives a lot of the environmental trade-off.
  • a pilot framed as industrial symbiosis infrastructure, not just as a decarbonisation demonstration.

The real question is not “can we make a greener tonne of silicon?” It is “can we build a self-reinforcing aluminium-silicon-energy system that compounds capability over decades?”

Final Thought

I like this idea because it is neither purely green-tech optimism nor old-school extractive nostalgia. It says something more interesting: a country with abundant sun, bauxite, quartz, and engineering talent should be able to turn one generation of solar build-out into the feedstock for the next.

Using aluminium instead of carbon to reduce quartz to silicon will not solve everything. The paper is clear about the trade-offs, and that honesty is part of why it is worth reading. But as a way of connecting chemistry, recycling, heavy industry, solar deployment, and geography into one coherent story, it has teeth.

That is usually a sign the idea deserves a prototype.

Friday, April 3, 2026

From SaaS to Serviced Software -- the code was never the hard part

Sometime in the last two years the bottleneck shifted. I used to spend most of a greenfield week staring at an empty editor, scaffolding routes, arguing with ORMs, and coaxing CSS into something that did not look like a government form from 2004. Today I can prompt my way to a working CRUD app with auth, a reasonable schema, and even half-decent styling before lunch. The code, it turns out, was never the hard part. The hard part was – and still is – keeping the thing alive once real users touch it.

The SaaS mental model

“Software as a Service” trained an entire generation to think the value lives in the application layer. Build a clever feature, wrap it in a subscription, ship a landing page. The implicit promise: we write the software, you pay monthly, everyone wins. That framing put the spotlight squarely on creation – new features, new integrations, new UI polish.

But anyone who has operated a SaaS product past the euphoric launch week knows where the hours actually go:

  • Rotating secrets and patching CVEs at 11pm on a Friday
  • Chasing down why the invoice PDF lambda timed out in ap-southeast-2 but not us-east-1
  • Fighting Terraform drift after someone clicked through the console “just this once”
  • Explaining to a customer why their data export is 48 hours stale because the Celery worker OOM-killed itself

The ratio of build-time to keep-it-running-time was already lopsided. AI just made it more obvious.

Enter the vibe-coded prototype

Large language models have compressed the “zero to working prototype” phase from weeks to hours. Cursor, Copilot, Aider, v0, Bolt – pick your weapon. The scaffolding phase that used to justify a two-pizza team for a quarter now fits in a solo weekend sprint. I have experienced this first-hand: prompting out a FastAPI backend with DynamoDB tables, an SPA frontend, and a deployment pipeline that mostly works. The code is not elegant. It does not need to be. It is structurally correct enough to demo and iterate.

This is genuinely magical. It is also genuinely dangerous, because it creates an illusion of completeness. The prototype works on your laptop, passes the happy path tests the LLM also generated, and looks great in the demo. Then production happens.

Production is where software goes to get serviced

Here is where the mental model needs updating. We are not really selling Software as a Service anymore. We are selling Serviced Software – and the distinction matters.

In the SaaS framing, the software is the product and the service is the delivery mechanism. In the Serviced Software framing, the service is the product and the software is just the substrate it runs on. Customers do not care that your backend is FastAPI or Express or Rails. They care that:

  1. It is up when they need it (hosting, redundancy, failover)
  2. Their data is safe (encryption at rest and in transit, access controls, backups that actually restore)
  3. It stays current (dependency updates, OS patches, framework migrations)
  4. It costs a predictable amount (no surprise egress bills, no runaway autoscaling)
  5. Someone answers the phone when it breaks at 3am

None of those are code problems. They are operational problems. And they are the problems that AI is worst at solving, because they require sustained human judgement over months and years, not a one-shot generation pass.

The maintenance asymmetry

There is a well-known asymmetry in software engineering: building version 1.0 is perhaps 20% of the total lifetime cost. The remaining 80% is maintenance, evolution, and eventual decommissioning. AI has dramatically reduced the cost of that first 20%. But it has done almost nothing for the other 80%.
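
The arithmetic behind that asymmetry is worth making explicit. A small sketch, taking the 20/80 split above at face value and treating the AI speed-up on the build phase as a hypothetical parameter:

```python
def lifetime_cost(build_share: float = 0.2, build_speedup: float = 1.0) -> float:
    """Relative lifetime cost when only the build phase gets cheaper.

    build_share: fraction of lifetime cost spent building version 1.0.
    build_speedup: factor by which the build phase becomes cheaper (1.0 = no change).
    """
    if not 0.0 <= build_share <= 1.0:
        raise ValueError("build_share must be between 0 and 1")
    if build_speedup <= 0.0:
        raise ValueError("build_speedup must be positive")
    maintenance_share = 1.0 - build_share  # untouched by faster code generation
    return build_share / build_speedup + maintenance_share


for speedup in (1, 2, 10, 100):
    print(f"{speedup:>3}x cheaper build -> lifetime cost {lifetime_cost(0.2, speedup):.3f}")
```

A build phase that is ten times cheaper takes the total from 1.00 to 0.82, and no speed-up on the build phase alone can take it below 0.80. It is the same shape as Amdahl's law: the part you did not accelerate sets the floor.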

If anything, AI makes the maintenance problem worse. When code is cheap to produce, people produce more of it. More repos, more microservices, more side projects that “just need a small server.” Each one becomes a maintenance liability. Each one needs patching, monitoring, log rotation, certificate renewal, database vacuuming. The open-source maintainer burnout problem I wrote about after PyConAU 2019 is now everyone’s problem, because everyone is now a maintainer of their own vibe-coded fleet.

I keep thinking about the analogy to 3D printing. When desktop printers got cheap, everyone printed trinkets for a month. Then the printers gathered dust because the hard part was never fabrication – it was design, finishing, and material science. The bottleneck moved upstream and downstream simultaneously, leaving the newly-cheap middle step feeling oddly irrelevant.

What “Serviced Software” looks like in practice

If you accept that the value has shifted from code creation to code stewardship, a few things follow:

Platform engineering matters more than feature engineering. Internal developer platforms (Backstage, Port, Humanitec) that abstract away infrastructure and enforce guardrails are more valuable than another AI code assistant. The companies investing in paved roads for deployment, observability, and incident response will win over those investing in faster code generation.

Managed services eat custom code. Every line of custom infrastructure code is a future maintenance burden. I learned this the hard way running microservices from folders on EC2 – cron jobs, plain-text .env files, manual venv management. It worked, but every operational incident was my problem. The appeal of managed Postgres over self-hosted, or Vercel over hand-rolled CI/CD, is not laziness. It is recognizing where your scarce operational attention should go.

Security becomes the primary differentiator. When everyone can generate a working app, the ones that survive are the ones that do not get breached. Supply chain attacks, dependency confusion, credential leaks in AI-generated code that helpfully hardcoded an API key – these are the failure modes of the vibe-coding era. Security is not a feature you bolt on; it is the service layer that justifies the subscription.
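
A first line of defence against the hardcoded-key failure mode is cheap to build. A deliberately naive sketch, assuming two illustrative regular expressions; dedicated secret scanners cover far more patterns, and this is not a substitute for one:

```python
import re

# Illustrative patterns only; real scanners ship many more rules.
PATTERNS = {
    # Shape of an AWS access key ID: "AKIA" followed by 16 upper-case letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A secret-looking name assigned a quoted literal of at least 8 characters.
    "assigned_secret": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


generated = (
    "import os\n"
    'API_KEY = "sk-test-1234567890"\n'      # hardcoded literal: flagged
    'api_key = os.environ["API_KEY"]\n'     # read from the environment: not flagged
)
print(find_hardcoded_secrets(generated))
```

Wired into CI or a pre-commit hook, even a crude check like this catches the most embarrassing class of leak before it reaches a public repo.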

Cost modeling is a core engineering skill. I wrote about EKS baseline costs being $70/month for personal projects back in 2020. That sensitivity to operational cost used to be a niche concern. Now that anyone can spin up infrastructure with a prompt, understanding what it costs to keep running is table stakes. Cloud bills are the new technical debt – invisible until they are catastrophic.

Tactical tornados vs strategic maintainers

Every engineering org has its archetypes. The tactical tornado is the superstar feature developer who can bang out a new module in a weekend, leave a trail of impressed stakeholders, and move on to the next shiny thing. They are celebrated in sprint reviews, promoted quickly, and held up as the template for “10x engineers.” Product managers love them because they turn roadmap dreams into demo-able reality at terrifying speed. AI amplifies the tornado: give them Copilot and a weekend and they will generate an entire product surface.

Then Monday arrives. The tornado has moved on to the next feature. Someone else inherits the code – no tests beyond the happy path, no runbooks, secrets in environment variables that nobody documented, an autoscaling policy copied from a blog post that assumed a different traffic shape. The tactical tornado created value in a burst. The strategic maintainer captures it over months.

Strategic maintainers are the people who go deep on the two or three features that actually drive revenue. They understand the edge cases customers hit at 2am. They know which database index is holding the query plan together and what happens when the table crosses 50 million rows. They are the ones who turn a flashy demo into a reliable product – incrementally tightening error handling, adding observability, negotiating with product managers about which “small” feature requests would actually require rewriting the payment flow.

Product managers sit in the middle of this tension. A good PM dreams up the features that will deliver the most value and sequences them so the team can ship without drowning in maintenance debt. A less experienced PM treats every sprint as a feature factory, stacking new work on top of un-serviced foundations, because the roadmap rewards visible output over invisible resilience. In the Serviced Software framing, the PM’s job is not just “what should we build next?” but “what is costing us the most to keep running, and is it worth it?”

The industry has historically rewarded tornados and feature-shipping PMs disproportionately. Promotions go to the person who launched the thing, not the person who kept it alive for three years. AI will sharpen this imbalance unless orgs explicitly revalue the strategic maintainer. When code generation is cheap, the scarce skill is not writing new software – it is understanding existing software deeply enough to keep it healthy.

The human layer

There is a deeper point here that goes beyond tooling. The shift from SaaS to Serviced Software is really a shift from building to caring. Building is exciting, creative, dopamine-rich. Caring is routine, patient, often invisible. Our industry has always undervalued the people who keep the lights on relative to the people who ship new features. AI will widen that gap unless we consciously correct for it.

The sysadmin, the SRE, the on-call engineer, the person who actually reads the CVE advisories and patches before the exploit drops – these roles are becoming more important, not less. The code is becoming commodity. The care is becoming scarce.

Where this goes

I do not think SaaS as a business model disappears. But I think the honest version of what customers are paying for will increasingly sound like: “We run and maintain this software so you do not have to.” That is not a new idea – managed hosting has existed forever. What is new is that the software layer itself is becoming thin enough that the operational layer dominates the value proposition.

We are entering an era where the question is not “can you build it?” but “can you keep it running, secure, updated, and affordable for the next five years?” The answer to that question has never been a one-shot prompt. It is a sustained commitment – and that, for now, remains stubbornly human.

Saturday, March 7, 2026

Two Agents, One Codebase: An F1 Race Team Approach to Porting ACOLITE to Rust

In Formula 1, every team fields two drivers. Not as a backup plan – as a strategy. One driver pushes the pace, forcing rivals to respond. The other holds position, manages tyres, and covers the alternative strategy. They share telemetry, they share a garage, but they are running different races on the same track. The team wins when both cars score points, not when one driver tries to do everything.

Porting a scientific Python codebase to Rust feels remarkably similar. You need the aggressive driver – the one who charges into unfamiliar code and lays down fast laps of Rust implementation. And you need the calculating driver – the one who reads the data, watches for degradation, and calls out when the numerical precision is drifting. Two AI coding agents, paired like Norris and Piastri, sharing a codebase but operating on different parts of the problem.

The Starting Grid: Why Rust for ACOLITE?

ACOLITE is RBINS’ atmospheric correction toolkit for aquatic remote sensing. It handles everything from Landsat and Sentinel-2 to hyperspectral sensors like PACE OCI (286 bands) and PRISMA (239 bands). The Dark Spectrum Fitting (DSF) algorithm is elegant – image-based, no external atmospheric inputs – but in Python, processing a full PACE scene involves reading 291 NetCDF variables, interpolating multi-dimensional LUTs, and correcting each pixel’s reflectance through a chain of gas transmittance, Rayleigh scattering, and aerosol models. On a decent machine, this takes around 230 seconds.

The seed was planted at FOSS4G 2025 in Auckland when Leo Hardtke ran a tutorial on Earth Observation processing with Rust. It was plagued by Nix environment issues (as I noted in my conference write-up), but when the code ran, it was fast. Zero-cost abstractions and fearless concurrency are not just slogans at that point – they are wall-clock seconds you are not spending waiting for your atmospheric correction to finish.

I had also been watching Rob Woodcock’s acolite-mp branch, which tackled the same performance problem from within Python. His approach was clever: per-band parallelism with memory budgets tuned to cloud CPU-to-RAM ratios (2, 4, or 8 GiB per core), replacing NumPy’s interpolation with the multithreaded pyinterp, and carefully managing the GIL contention that Python’s threading model inflicts on you. He got Sentinel-2 from 791s down to 197s and Landsat from 312s to 99s on a 24-core i9 – roughly a 3-4x speedup.

But the GIL is still there. The memory model is still Python’s. And as Rob himself noted, “further performance improvements are possible but require more extensive changes to the file handling” and “there is a fair amount of GIL contention which limits threading being caused by some structural choices in the implementation.” At some point, you are fighting the language rather than the problem.

Rust sidesteps all of this. No GIL. No garbage collector. Rayon gives you data-parallel iterators that map across bands or tiles with work-stealing. Memory usage is deterministic and known at compile time – you can profile it statically before deploying, which is a sentence that makes no sense in Python-land but is table stakes in systems programming.

The Pit Crew: Two Agents via ACP

Here is where the teammate analogy really kicks in. In F1, a team with only one driver is not half a team – it is no team at all. You cannot run a split strategy with a single car. You cannot use one driver to hold up a rival while the other pulls a gap. The performance of the pair exceeds the sum of the individuals because they create options that a solo driver simply cannot.

Porting 40,000+ lines of scientific Python to Rust is the same. A single AI agent writing Rust will drift – the implementation slowly diverging from Python’s numerical behaviour until your reflectance values are off by just enough to be scientifically useless. You need the second driver to keep it honest.

The solution I landed on was a multi-agent orchestration harness using the Agent Client Protocol (ACP), a JSON-RPC 2.0 protocol over NDJSON stdio that lets coding agents communicate in a structured way:

| Agent | Role | F1 Equivalent |
|---|---|---|
| Kiro | Executor – writes Rust code, runs tests, reads files | Lead driver – pushes the pace, sets fast laps |
| Copilot | Proposer – reviews output, suggests next steps, cross-checks Python | Second driver – covers the strategy, watches the gaps |
| Human | Approver – filters proposals before dispatch | Team principal – makes the call on when to pit |
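The "JSON-RPC 2.0 over NDJSON" framing mentioned above boils down to one JSON document per line on stdin/stdout. Here is a minimal sketch of that framing. The session/prompt method name comes from the workflow in this post, but the params shape shown is an assumption for illustration and should be checked against the ACP specification.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def encode_request(method, params):
    """Serialise one JSON-RPC 2.0 request as a single NDJSON line."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    # json.dumps escapes newlines inside strings, so the result stays on one line.
    return json.dumps(msg, separators=(",", ":")) + "\n"


def decode_lines(stream_text):
    """Parse NDJSON text into a list of messages, skipping blank lines."""
    return [json.loads(line) for line in stream_text.splitlines() if line.strip()]


# Hypothetical params shape, used here only to exercise the framing.
line = encode_request(
    "session/prompt",
    {"sessionId": "demo", "prompt": [{"type": "text", "text": "Port the PACE loader"}]},
)
print(line, end="")
print(decode_lines(line)[0]["method"])
```

Because every message ends at a newline, the reader side can process the agent's stdout line by line without any length-prefix bookkeeping.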

The workflow per sensor port looks like this:

  1. Human provides --task to the orchestrator (tools/agent_harness.py)
  2. Kiro receives the task via ACP session/prompt and starts writing code
  3. Kiro streams output via session/update chunks
  4. Output goes to Copilot for review against the Python source
  5. Copilot proposes ACTION: lines – “fix the gas transmittance interpolation order”, “the Rayleigh LUT needs pressure stacking”
  6. Human approves or rejects
  7. Approved actions go back to Kiro
  8. Repeat until regression tests pass or maximum cycles reached
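The eight steps above can be sketched as a small control loop. This is not the code in tools/agent_harness.py; the function names, the callables standing in for the two agents, and the stop-when-nothing-is-approved rule are all assumptions made for illustration.

```python
def parse_actions(review_text):
    """Return the text of every line that starts with 'ACTION:'."""
    actions = []
    for line in review_text.splitlines():
        line = line.strip()
        if line.startswith("ACTION:"):
            actions.append(line[len("ACTION:"):].strip())
    return actions


def run_cycles(task, executor, proposer, approve, tests_pass, max_cycles=5):
    """Drive executor -> proposer -> approval until tests pass or cycles run out.

    Returns (cycles_used, succeeded).
    """
    prompt = task
    for cycle in range(1, max_cycles + 1):
        output = executor(prompt)                      # steps 2-3: executor works on the prompt
        if tests_pass():                               # step 8: stop once regression tests pass
            return cycle, True
        proposals = parse_actions(proposer(output))    # steps 4-5: reviewer proposes ACTION: lines
        approved = [a for a in proposals if approve(a)]  # step 6: human filter
        if not approved:
            return cycle, False                        # assumption: nothing approved ends the run
        prompt = "\n".join(approved)                   # step 7: approved actions go back
    return max_cycles, False


# Stub agents so the loop can be exercised without any real tooling.
state = {"runs": 0}


def fake_executor(prompt):
    state["runs"] += 1
    return f"patched after: {prompt}"


def fake_proposer(output):
    return "Looks close.\nACTION: fix the gas transmittance interpolation order\nnote: ignored"


result = run_cycles(
    "port PACE loader",
    fake_executor,
    fake_proposer,
    approve=lambda action: True,
    tests_pass=lambda: state["runs"] >= 2,
)
print(result)
```

Keeping the approval step as a plain callable makes it easy to swap an interactive prompt for an automatic filter while testing the loop itself.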

This is not vibe coding. This is a two-car team running a split strategy.

Think about how McLaren or Red Bull operate. The lead driver qualifies on pole and sets the pace in clean air. The second driver starts on a different tyre compound, runs a longer first stint, and emerges from the pits into a different part of the field. They are solving complementary problems – one optimises for raw speed, the other for strategic coverage. Neither is redundant.

Kiro is the lead driver. It attacks the Rust implementation aggressively – writing loaders, porting DSF algorithms, wiring up rayon parallelism. It sets fast laps. It also occasionally bins it into the gravel trap by hallucinating a NumPy broadcasting rule that does not exist in ndarray.

Copilot is the second driver. It reads the Python source, cross-references the Rust output, and spots where the gap to parity is growing. “The gas transmittance interpolation order is wrong” is exactly the kind of radio call a second driver makes – not flashy, but it prevents a DNF.

The human is the team principal. You do not override the drivers on every corner, but you make the strategic calls: do we pit now and fix this RMSE regression, or do we push on and address it in the next stint? Is a 0.002 RMSE difference in Sentinel-2 reflectance acceptable? (It is – that is within float32 precision.) When do we switch from tiled DSF to fixed DSF mode for this sensor?

Together, they converge faster than either alone, for the same reason that two cars gathering tyre data in free practice gives the team more information than one car doing twice as many laps.

The Telemetry: Regression Tests Against Real Data

In F1, both drivers generate telemetry. The team overlays their data – braking points, throttle application, cornering speed – to find where one is faster and why. The overlay is the truth. Not the driver’s feeling, not the engineer’s simulation, but what the car actually did on the track.

Regression tests are our telemetry overlay. The Python ACOLITE output is Driver 1’s trace. The Rust output is Driver 2’s. We overlay them pixel-by-pixel, band-by-band, and look at the delta. When the traces diverge, something real has changed and we need to understand whether it is a genuine improvement or an error we need to correct.

There are currently 141 Python regression tests that compare Rust output against Python output pixel-by-pixel across real satellite scenes:

  • Landsat 8/9: 13 regression + 13 Rust-vs-Python + 7 benchmark tests
  • Sentinel-2 A/B: 19 regression + 15 Rust-vs-Python + 9 benchmark tests
  • PACE OCI: 17 regression + 14 Rust-vs-Python + 12 DSF comparison + 12 ROI + 10 full-scene tests

The tolerances are tight. Sentinel-2 achieves RMSE < 0.002 (physics-equivalent). Landsat gets RMSE < 0.02. PACE full-scene (1710 x 1272 pixels x 291 bands) hits mean RMSE of 0.004 with 100% of pixels within 0.05 of Python. Correlation coefficients are R > 0.999 across all sensors.

These are not toy tests on synthetic data. They run against actual L1 scenes downloaded from USGS and NASA. When the tests break, something real is wrong.
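To make the three headline metrics concrete, here is a minimal, standard-library-only sketch of how RMSE, the fraction of samples within a tolerance, and the Pearson correlation could be computed for one band. The function name and the sample values are illustrative assumptions; the real tests operate on full arrays from actual scenes.

```python
import math


def compare(reference, candidate, tolerance):
    """Return (rmse, fraction_within_tolerance, pearson_r) for two equal-length sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(reference)
    diffs = [c - r for r, c in zip(reference, candidate)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    within = sum(abs(d) <= tolerance for d in diffs) / n
    mean_r = sum(reference) / n
    mean_c = sum(candidate) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in zip(reference, candidate))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    # Note: a constant input makes the denominator zero; callers should guard against that.
    pearson = cov / math.sqrt(var_r * var_c)
    return rmse, within, pearson


# Made-up reflectance samples standing in for Python and Rust outputs.
python_rhos = [0.010, 0.020, 0.030, 0.040]
rust_rhos = [0.011, 0.019, 0.031, 0.040]
rmse, within, r = compare(python_rhos, rust_rhos, tolerance=0.05)
print(f"RMSE={rmse:.5f} within={within:.0%} r={r:.4f}")
```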

The Performance Gap: Where the Seconds Go

| Sensor | Scene Size | Rust | Python | Speedup |
|---|---|---|---|---|
| Landsat 8 | 62M px x 7 bands | 66s | 180s | 2.7x |
| Landsat 9 | 62M px x 7 bands | 56s | 180s | 3.2x |
| Sentinel-2 A | 30M px x 11 bands | 52s | 182s | 3.5x |
| Sentinel-2 B | 30M px x 11 bands | 64s | 173s | 2.7x |
| PACE OCI (full) | 1710 x 1272 x 291 bands | 84s | 230s | 2.7x |
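The speedup column is just Python wall-clock time divided by Rust wall-clock time. A quick sketch that recomputes it from the timings in the table, and also shows the Rust run as a share of the Python run:

```python
# sensor: (rust_seconds, python_seconds), copied from the table above
timings = {
    "Landsat 8": (66, 180),
    "Landsat 9": (56, 180),
    "Sentinel-2 A": (52, 182),
    "Sentinel-2 B": (64, 173),
    "PACE OCI (full)": (84, 230),
}


def speedup(rust_s, python_s):
    """Speedup factor of the Rust run relative to the Python run."""
    return python_s / rust_s


for sensor, (rust_s, python_s) in timings.items():
    share = rust_s / python_s  # fraction of the Python wall-clock time still needed
    print(f"{sensor}: {speedup(rust_s, python_s):.1f}x, {share:.0%} of Python wall-clock time")
```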

The PACE result is particularly satisfying. The key optimisation was switching from 291 per-band NetCDF reads to 3 bulk detector reads, then applying rayon-parallel atmospheric correction across tiles. Load is 12 seconds, AC is 34 seconds, write is 35 seconds. That write phase for a 291-band hyperspectral cube goes to GeoZarr V3 with gzip compression – try doing that in a Python event loop without your memory allocator throwing a tantrum.

Energy Efficiency: The Fuel Strategy Nobody Talks About

Here is the part where I get philosophical – and where the F1 analogy turns from metaphor into mirror.

Formula 1 underwent a fuel efficiency revolution in 2014. The FIA introduced hybrid power units, capped fuel flow at 100 kg/hour (monitored 2,200 times per second), and forced teams to extract maximum performance from minimum fuel. The result was not slower cars – it was faster cars that used less. The 2026 regulations go further: fossil carbon is prohibited entirely, the MGU-K will deliver three times the electrical power (350kW vs today’s 120kW), producing up to 1,000 horsepower while burning sustainable fuel. Less fuel, more power. That is not a trade-off – it is an engineering constraint that drives innovation.

The same constraint applies to scientific computing; we just pretend it does not. Cloud computing bills are denominated in dollars, but the underlying unit is energy. Every CPU cycle your atmospheric correction burns is energy drawn from a power grid somewhere. When you are processing continental-scale Sentinel-2 archives or the full PACE ocean colour mission, that energy adds up. Python is the V10 era of scientific computing – glorious, unrestricted, and profligate with resources.

Rust is the hybrid power unit. Its advantage is not just speed – it is energy per unit of work. A 3x speedup roughly translates to using a third of the compute time, which means a third of the energy, a third of the carbon footprint, and a third of your AWS bill. The Rust Foundation and others have pointed to studies showing compiled languages like Rust and C using an order of magnitude less energy than interpreted languages for equivalent workloads. Just as F1 teams discovered that fuel efficiency constraints forced them to build fundamentally better engines, switching to Rust forces you to think about memory layout, allocation patterns, and data flow in ways that Python’s garbage collector lets you ignore – until the bill arrives.

And here is the irony that would make an F1 sustainability officer wince: Earth observation processing is meant to monitor the planet’s health. Burning excess energy to do it is like running your emissions-monitoring car on leaded fuel. F1 recognised that the sport’s 20-car grid is only 1% of its total carbon footprint, but pursued fuel efficiency anyway because the technology trickles down. The same logic applies to EO processing pipelines. The individual savings per scene are modest, but at continental archive scale they compound – just like how F1’s hybrid innovations now power road cars from Ferrari’s SF90 to the electric components in every modern turbo engine.

Static memory profiling makes this tangible. In Rust, I can tell you at compile time that a Sentinel-2 full-scene atmospheric correction will peak at approximately N gigabytes of memory, because the allocations are deterministic. In Python, you find out at runtime – usually when the OOM killer visits your pod. F1 teams know their fuel load to the gram before the formation lap. Rust gives you the same certainty for compute.

Kubernetes 1.35 and Vertical Pod Autoscaling

This deterministic memory behaviour dovetails nicely with Kubernetes 1.35’s improvements to Vertical Pod Autoscaler (VPA). VPA watches your pod’s actual resource usage and adjusts CPU and memory requests/limits accordingly. When your workload has predictable resource usage – as Rust workloads tend to – VPA converges quickly to the right allocation instead of oscillating between OOM kills and wasted headroom.

For a processing pipeline that ingests satellite scenes of varying sizes (a Landsat scene is 62 million pixels across 7 bands; a PACE scene is 2.2 million pixels across 291 bands), VPA can right-size pods per sensor type. Rust’s static memory profile means the VPA recommendations stabilise fast, which means tighter bin-packing, which means more scenes processed per node, which means lower cost per scene.
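As a rough feel for why per-sensor sizing matters, here is a back-of-envelope sketch of the raw data cube size for the two scene shapes above. It assumes every sample is held in memory as a 4-byte float32 and that the whole cube is resident at once; neither is a statement about how the Rust port actually allocates, and real peak usage also depends on tiling, LUTs, and output buffers.

```python
BYTES_PER_FLOAT32 = 4  # assumption: reflectance stored as float32


def cube_gib(pixels, bands, bytes_per_sample=BYTES_PER_FLOAT32):
    """Size in GiB of a pixels x bands cube held entirely in memory."""
    return pixels * bands * bytes_per_sample / 2**30


# Scene shapes taken from the paragraph above.
scenes = {
    "Landsat": (62_000_000, 7),
    "PACE OCI": (1710 * 1272, 291),
}

for name, (pixels, bands) in scenes.items():
    print(f"{name}: {cube_gib(pixels, bands):.2f} GiB raw cube")
```

The PACE cube comes out larger than the Landsat one despite having far fewer pixels, which is the kind of per-sensor difference a single fixed memory request would get wrong.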

Compare this to Python pods where memory usage is non-deterministic, garbage collection spikes are unpredictable, and the VPA has to overprovision to avoid OOM. The 2 GiB/core cloud ratio that Rob’s acolite-mp was carefully designed around becomes less of a constraint when your language does not waste half of it on interpreter overhead.

Out-of-Band Development: Preventing Merge Conflicts with Upstream

One design decision I am particularly happy with is keeping the Rust port on a separate feature branch (feature/rust-port) and treating it as out-of-band from the Python codebase. ACOLITE upstream is actively maintained by Quinten Vanhellemont at RBINS, with regular additions of new sensors, algorithm refinements, and bug fixes. A traditional “rewrite in Rust” approach would create an immediate fork that diverges with every upstream commit.

Instead, the Rust code lives in src/, benches/, and tests/ directories that do not exist in upstream Python ACOLITE. The Python code in acolite/ stays untouched. The regression tests are the synchronisation mechanism – they import both the Python ACOLITE modules and the compiled Rust binary, run the same scene through both, and compare outputs.

When upstream adds a new sensor or changes a gas transmittance coefficient, the regression tests fail in the Rust port. That failure is the trigger: it goes into the agent harness as a --task, Kiro investigates the numerical difference, Copilot cross-references the upstream commit, and the fix lands in Rust without touching a single Python file. No merge conflicts. No rebasing nightmares. Just tests that enforce parity.

This is how you keep an acceleration layer in sync with a moving target – you do not try to merge them. You test them against each other.

What Is Next: The Gap to Full Sensor Parity

The roadmap has the current state at 48 Rust tests, 141 Python regression tests, and three sensors fully validated (Landsat 8/9, Sentinel-2 A/B, PACE OCI). The architecture – loader, AC, writer – is clean and extensible. But three sensors out of 30+ is a qualifying lap, not a race win. Here is what closing the gap to full ACOLITE parity actually looks like.

Sensor Coverage: 3 down, 30+ to go

Python ACOLITE supports a sprawling constellation of sensors. The Rust port has ticked off the three highest-priority ones but the remaining fleet breaks into tiers:

Tier 1 – Near-term (shared loader patterns exist):

| Sensor | Bands | Loader Type | Blocker |
|---|---|---|---|
| Sentinel-3 OLCI | 21 | NetCDF | Sensor def exists, needs full pipeline |
| PRISMA | 239 | HDF5 | Shares pattern with PACE |
| DESIS | 235 | HDF5 | Shares pattern with PACE |
| EnMAP | 224 | HDF5 | Shares pattern with PACE |
| EMIT | 285 | NetCDF | Similar to PACE OCI |

These are the low-hanging fruit. The PACE port proved out the NetCDF and hyperspectral GeoZarr writer path; PRISMA/DESIS/EnMAP share the HDF5 loader pattern. Each is a well-scoped --task for the agent harness – Kiro writes the loader and wires up the AC pipeline, Copilot validates against Python output on a reference scene.

Tier 2 – Medium-term (new loader work required):

| Sensor | Bands | Notes |
|---|---|---|
| Landsat 5 TM / 7 ETM+ | 7-8 | Older calibration metadata formats |
| PlanetScope (Dove/SuperDove) | 4-8 | Commercial format, GeoTIFF based |
| WorldView-2/3 | 6-29 | Multi-resolution, pan-sharpening |
| Pleiades | 5-7 | DIMAP format |
| QuickBird-2 | 5 | Legacy but still used |
| VIIRS (NPP/J1/J2) | 22 | Swath-based HDF5, three platforms |
| Aqua/Terra MODIS | 36 | HDF4/HDF-EOS |
| GOCI-2 | 12 | Korean ocean colour mission |

Each of these needs a dedicated loader – different metadata formats, different calibration approaches, different file layouts. The atmospheric correction core (DSF, gas transmittance, Rayleigh, aerosol models) is shared, but getting the radiometrically calibrated top-of-atmosphere reflectance array into the pipeline is the per-sensor work.

Tier 3 – Geostationary and niche (lowest priority for aquatic applications):

GOES ABI, Himawari AHI, MTG-I FCI, SEVIRI, Sentinel-3 SLSTR, AMAZONIA-1 WFI, CHRIS, HYPERION, HICO, HyperField, HYPSO, Tanager. Some of these (HYPERION, HICO) are decommissioned but their archives are still processed. Others (Tanager at 420 bands, HYPSO at 120) are newer hyperspectral missions that would benefit most from Rust’s performance advantage.

Beyond Loaders: The Algorithm Gap

Sensor parity is not just about reading files. Python ACOLITE has several processing features the Rust port does not yet implement:

  • ROI subsetting: Limit processing to a bounding box or polygon – critical for operational workflows that do not need a full scene
  • Ancillary data retrieval: NCEP ozone, pressure, and wind speed from NASA OBPG; currently the Rust port uses default values
  • DEM-derived pressure: Copernicus DEM at 30/90m for surface pressure estimation in mountainous coastal regions
  • Glint correction: Sun glint removal for low-latitude ocean scenes
  • RAdCor adjacency correction: The physics-based adjacency effect correction developed under the STEREO program
  • TACT thermal processing: Surface temperature from Landsat thermal bands via libRadtran – this one is architecturally interesting because it requires calling an external Fortran radiative transfer code
  • Interface reflectance (rsky): Sky reflection correction at the air-water interface
  • L2W water products: Chlorophyll-a (OC algorithms), TSS (Nechad, Dogliotti), turbidity, Secchi depth – the derived products that downstream scientists actually use

The L2W gap is the most consequential. Most ACOLITE users do not care about surface reflectance per se; they want chlorophyll maps or turbidity time series. Until the Rust port can produce L2W outputs, it remains a fast atmospheric correction engine rather than a complete aquatic remote sensing toolkit.

The Realistic Path

Closing this gap is not a sprint; it is an endurance race – appropriately enough. The agent harness makes each sensor port a repeatable, testable unit of work. The pattern is established: write loader, wire to AC pipeline, run regression tests against Python, fix deltas, validate on real data. Each sensor port takes the agents a day or two of focused work plus human review.

At the current pace, Tier 1 sensors are within reach in the near term. Tier 2 will follow as the loader library matures. The algorithm features (ancillary data, glint, TACT) are orthogonal to sensor coverage and can be developed in parallel. L2W is the final milestone – when the Rust port can ingest a Sentinel-2 scene and produce a chlorophyll-a map that matches Python to within measurement uncertainty, the port will be race-ready for production.

Each of these is a --task for the agent harness. Two drivers, one constructor’s championship. The lead driver pushes into unfamiliar sensor territory, the second driver validates against Python, and the telemetry overlay catches every divergence before it compounds into a retirement.

If the intersection of Rust, Earth observation, and AI-assisted development interests you, the code is all on GitHub. Feel free to ping me with ideas, bug reports, or competing approaches – especially if you have a cleverer way to handle the N-dimensional LUT interpolation. That one was a fun 3 days of Rapid Rust Rewrite fuelled by AI Amphetamine Analogs.

Friday, February 27, 2026

Copilot + Arduino CLI + Saleae: Driver Development for V93XX

I have been spending time building out the V93XX_Arduino library for Vangotech energy-monitoring ASICs, and this is one of those projects where tooling makes or breaks momentum.

The combination that worked best for me was Copilot, Arduino CLI, and a Saleae logic analyzer.

The short version: Copilot gets you to a plausible driver quickly, but the logic analyzer gets you to a correct driver.

V93XX driver workflow walkthrough

Why this trio works

Developing protocol drivers is usually a game of “spec says one thing, silicon does another thing”.

If you rely only on serial logs, you can miss timing and framing issues. If you rely only on captures, you can waste hours writing boilerplate and one-off scripts. Using these three together gives a tighter loop:

  1. Copilot proposes code and tests from your intent
  2. Arduino CLI compiles and flashes quickly from scripts
  3. Saleae confirms framing, parity, baud behavior, and CRC bytes
  4. Copilot helps refactor after you learn what the bus is doing

In practice, this turned the V9381 UART ChecksumMode and waveform/FFT work from a stop-start activity into a repeatable pipeline.

Project context: V93XX_Arduino

Repository: whatnick/V93XX_Arduino

The library currently covers:

  • V93XX_UART for V9360 and V9381 UART modes (including address pins and checksum modes)
  • V93XX_SPI for V9381 SPI paths (faster acquisition, with clock rates in the MHz range)
  • Examples for baseline comms, waveform capture, and FFT

Useful docs in the repo:

Practical workflow

1) Start with a machine-checkable baseline

Before touching hardware, run the local CRC and framing tests.

python tools/test_checksum_mode.py

If this is red, do not flash anything yet.

2) Use Copilot for structured implementation work

I get better results when prompts are specific and constrained. Example prompt:

Implement V9381 UART read path with explicit ChecksumMode behavior.
Keep Clean mode strict and Dirty mode permissive.
Add unit tests for CRC-8 edge cases and frame parsing.
Do not change public method names.

Copilot is strongest here at:

  • Building test scaffolding quickly
  • Keeping repetitive register access code consistent
  • Producing incremental refactors after captures reveal protocol edge cases

3) Consume datasheets with PDF MCP and keep them in-repo

Before generating more driver code, I now ingest the vendor datasheet with pdf-reader-mcp so Copilot can work from extracted register tables and command framing notes.

My workflow is:

  1. Commit the original datasheet PDF to the repository, for example under docs/datasheets/v93xx/.
  2. Use PDF MCP to extract the high-value sections (register map, UART/SPI timing, checksum/CRC rules, waveform buffer details).
  3. Save extracted artifacts back into the repo as text or markdown notes, for example docs/datasheets/v93xx/register_notes.md.
  4. Prompt Copilot using both code and extracted notes so generated changes are tied to concrete datasheet language.

This gives you a versioned paper trail from silicon docs to driver behavior, which is very useful when reconciling captures from the logic analyzer.

4) Compile and flash with Arduino CLI

For ESP32-S3 targets, this keeps the build deterministic and scriptable:

arduino-cli board list
arduino-cli compile --fqbn esp32:esp32:esp32s3 examples/V9381_UART_DIRTY_MODE
arduino-cli upload --fqbn esp32:esp32:esp32s3 --port COM3 examples/V9381_UART_DIRTY_MODE

PowerShell users can run the project helper script:

.\tools\run_automated_tests.ps1 -Port COM3

That pipeline can run unit tests, compile, upload, serial verification, and optional capture/analysis phases.
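Outside PowerShell, the same compile-and-upload steps can be driven from a few lines of Python. This is a minimal sketch and not the project's helper script: the function names are hypothetical, and only the arduino-cli invocations already shown above are used.

```python
import subprocess


def build_commands(sketch, fqbn, port):
    """Return the compile and upload argv lists for one sketch."""
    compile_cmd = ["arduino-cli", "compile", "--fqbn", fqbn, sketch]
    upload_cmd = ["arduino-cli", "upload", "--fqbn", fqbn, "--port", port, sketch]
    return [compile_cmd, upload_cmd]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True raises CalledProcessError on a non-zero exit."""
    for cmd in commands:
        runner(cmd, check=True)


commands = build_commands(
    sketch="examples/V9381_UART_DIRTY_MODE",
    fqbn="esp32:esp32:esp32s3",
    port="COM3",
)
for cmd in commands:
    print(" ".join(cmd))
# Call run_all(commands) on a machine with arduino-cli installed and a board attached.
```

Passing argv lists rather than a shell string avoids quoting problems with port names and paths, and the injectable runner makes the sequencing testable without hardware.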

5) Validate on-wire behavior with Saleae

The logic analyzer phase is where protocol assumptions get tested.

For UART work:

  • Confirm bus settings (baud, parity, stop bits) match the target mode
  • Verify frame boundaries and inter-frame timing
  • Check CRC byte placement and value against expected payload sum/complement logic
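When checking frame boundaries and inter-frame timing in a capture, it helps to know how long one byte should take on the wire. A small sketch of that arithmetic follows; the 19200 baud, 8 data bits, and parity settings are illustrative values, not taken from the V93XX datasheet.

```python
def frame_bits(data_bits=8, parity=True, stop_bits=1):
    """Bits on the wire per UART character: start + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def frame_time_us(baud, data_bits=8, parity=True, stop_bits=1):
    """Duration of one UART character in microseconds."""
    return frame_bits(data_bits, parity, stop_bits) * 1_000_000 / baud


# Illustrative settings only; use the values for your target mode.
with_parity = frame_time_us(19200, parity=True)
without_parity = frame_time_us(19200, parity=False)
print(f"with parity: {with_parity:.1f} us, without: {without_parity:.1f} us")
```

If the measured character time in the capture matches the no-parity figure while the driver is configured for parity, that mismatch is a likely explanation for CRC failures that look random.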

If using the helper scripts in this repo:

python tools/capture_v9381_uart.py
python tools/analyze_checksum_captures.py [capture_dir]

This gives a concrete report of expected vs observed CRC and whether behavior matches Clean vs Dirty semantics.
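The expected-vs-observed comparison and the Clean/Dirty split can be sketched as follows. The checksum here is a stand-in, the complement of the 8-bit payload sum with the check byte assumed to be last in the frame, and is not claimed to be the V93XX algorithm. The mode behavior (Clean rejects a mismatch, Dirty accepts the payload but flags it) is an interpretation of "strict" and "permissive", and all names are hypothetical.

```python
from enum import Enum


class ChecksumMode(Enum):
    CLEAN = "clean"  # strict: a mismatch is an error
    DIRTY = "dirty"  # permissive: a mismatch is reported but the payload is returned


def sum_complement(payload: bytes) -> int:
    """Stand-in checksum: bitwise complement of the 8-bit sum of the payload."""
    return (~sum(payload)) & 0xFF


def check_frame(frame: bytes, mode: ChecksumMode):
    """Split a frame into payload and trailing check byte; return (payload, checksum_ok)."""
    if len(frame) < 2:
        raise ValueError("frame must contain at least one payload byte and a check byte")
    payload, observed = frame[:-1], frame[-1]
    expected = sum_complement(payload)
    ok = observed == expected
    if not ok and mode is ChecksumMode.CLEAN:
        raise ValueError(
            f"checksum mismatch: expected 0x{expected:02X}, observed 0x{observed:02X}"
        )
    return payload, ok


payload = bytes([0x7D, 0x01, 0x02])
good = payload + bytes([sum_complement(payload)])
bad = payload + b"\x00"

print(check_frame(good, ChecksumMode.CLEAN))
print(check_frame(bad, ChecksumMode.DIRTY))
```

Running the same captured frames through both modes shows quickly whether Dirty mode is masking a real framing or parity problem.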

What changed in my debugging habits

Old approach

  • Edit sketch
  • Upload
  • Stare at serial output
  • Guess

New approach

  • Ask Copilot for narrow, test-backed change
  • Build and flash with Arduino CLI from script
  • Capture with Saleae and compare against expected frame math
  • Feed mismatch details back into Copilot for targeted fixes

It feels closer to hardware TDD than trial-and-error firmware hacking.

Trade-offs and cost notes

  • Arduino CLI is free and excellent for repeatable CI-style local loops
  • Copilot is a paid productivity tool, but pays for itself when protocol code churn is high
  • Saleae hardware is not cheap, but it can save days when parity/timing/CRC bugs are subtle

If you are doing serious serial or SPI driver work, a logic analyzer is not optional for long.

Troubleshooting notes that keep coming up

  1. Wrong serial configuration (especially parity) can look like random CRC failure.
  2. Upload success does not imply runtime protocol correctness.
  3. Dirty mode can hide issues by design; keep a Clean mode path for strict validation.
  4. Keep datasheet extracts in-repo so Copilot prompts reference real tables, not memory.
  5. A scripted workflow (pdf-reader-mcp + arduino-cli + capture + analysis) beats ad-hoc manual steps every time.

Closing

For this V93XX work, the winning combination was AI-assisted coding plus old-school instrumentation.

Copilot helps me move faster, Arduino CLI keeps the loop reproducible, and Saleae captures keep me honest.

If you are building protocol drivers, treat those as complementary tools, not alternatives.