
Engineering doctrine that doesn't go stale

Engineering standards go stale in one of two ways. They stay too abstract and nobody knows how to apply them. Or they hard-code specific tools and become obsolete the next time the stack changes.

I've been building a reference library that tries to avoid both failure modes by keeping principles and tooling in separate layers that evolve at different rates.

The core split

Principles describe outcomes, constraints, and trade-offs. Tooling shows one possible way to satisfy them with concrete products. Swapping tooling should never require rewriting principles.

A principle like 'validate before packaging, verify after deployment' is true regardless of whether you're on GitHub Actions, Azure DevOps, Jenkins, or a custom deploy script. The order of operations is the principle. The specific jobs and YAML are tooling.
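One way that ordering might look on one particular platform (GitHub Actions here purely as an illustration — the job names and make targets are hypothetical, and the same shape ports to Jenkins or Azure DevOps):

```yaml
# Hypothetical workflow: the job ORDER is the principle;
# every name in this file is swappable tooling.
name: release
on:
  push:
    tags: ['v*']

jobs:
  validate:              # validate BEFORE packaging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint test contracts   # hypothetical one-command gate

  package:
    needs: validate      # packaging cannot start until validation passes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make package

  deploy:
    needs: package
    runs-on: ubuntu-latest
    steps:
      - run: make deploy

  verify:                # verify AFTER deployment
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - run: make verify   # health checks against the running environment
```

Swapping platforms rewrites every line of this file, but none of the dependency chain it encodes.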

A principle like 'use GitHub Actions with these specific action versions' is not a principle — it's tooling pretending to be a principle. Teams on different platforms can't adopt it, and it needs updating every time the toolchain moves.

Principles vs tooling — how to tell them apart
| Principle | Tooling equivalent (don't write this) |
| --- | --- |
| Validate before packaging, verify after deployment | Use the deploy-prod GitHub Action with wait-for-health set to 120s |
| Contracts must be validated in CI | Run scripts/validate_contracts.py in the quality.yml workflow |
| Build evidence must be traceable to a commit | Pin all GitHub Actions to their full SHA digest |
| Caches are accelerators, not dependencies | Configure cache: keys in GitHub Actions to use the lockfile hash |

The three-layer structure

Repository structure
| Layer | Path | Changes how often |
| --- | --- | --- |
| Timeless principles | doctrine/principles/ | Rarely — only when we learn something structural |
| Illustrative tooling | doctrine/tooling/ | When the stack, platform, or team context changes |
| Estate supplements | doctrine/tooling/estates/ | Per org, per cloud, per team — optional overrides |

Estate supplements are the key to making this usable in practice. An org running entirely on Azure has different tooling examples than one running on AWS. Neither invalidates the principles. The estate supplement captures the org-specific mapping without polluting the canonical principle files.
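As a sketch of what such a supplement could contain — the filename, org name, and principle keys below are all hypothetical, not part of the actual library:

```yaml
# Hypothetical estate supplement: doctrine/tooling/estates/azure.yml
# Maps canonical principles to this org's concrete tooling without
# touching the principle files themselves.
estate: contoso-azure            # hypothetical org identifier
principles:
  validate-before-packaging:
    tooling: Azure Pipelines 'Validate' stage gating the 'Package' stage
  build-evidence-traceable-to-commit:
    tooling: Azure Artifacts with provenance metadata keyed by commit SHA
  verify-after-deployment:
    tooling: App Service health probe plus a post-deploy smoke stage
```

Each entry points back at a principle by name, so an Azure team and an AWS team can diff their supplements against the same canonical files.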

Build surfaces — making the implicit explicit

One of the more useful concepts in the doctrine is 'named build surfaces'. Every repository owns a set of surfaces; the problem isn't missing surfaces — it's hidden ones.

Build surfaces every repo should name explicitly
| Surface | What it is |
| --- | --- |
| Local developer entrypoint | The one command a new contributor runs to build and test locally |
| Quality gate | The CI job that must pass before merge — lint, tests, contracts, deny |
| Release surface | How the artefact is built and packaged — reproducible, from a tagged commit |
| Deploy surface | How the artefact reaches a running environment — promotion, not rebuild |
| Verification surface | How you know the deployed thing is healthy — not just that the deploy succeeded |
| Execution surface | Scheduled scans, queued automation, recurring runbooks — first-class, not hidden in deploy pipelines |

The principle is: define the surfaces you own, not the ones you wish you had. Hidden surfaces are the problem. A repo with no declared verification surface usually has no verification — or has it buried in an oncall runbook that nobody reads.
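One way to make surface ownership explicit is a small declaration file per repo — the filename and every path below are hypothetical, a sketch of the idea rather than a format the library prescribes:

```yaml
# Hypothetical per-repo declaration (e.g. surfaces.yml): each owned
# surface names its entrypoint, so an absent key is a visible gap
# rather than a silent one.
surfaces:
  local: make dev                  # one command for a new contributor
  quality-gate: ci/quality.yml     # must pass before merge
  release: ci/release.yml          # reproducible build from a tagged commit
  deploy: ci/deploy.yml            # promotes the built artefact, never rebuilds
  verification: make verify        # checks the deployed thing is healthy
  execution: ci/scheduled.yml      # recurring scans and runbooks, first-class
```

A repo that can't fill in the verification line has just discovered its hidden surface.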

The adoption playbook

The playbook is dependency-aware, not dogmatic. Skip phases that are already healthy.

Suggested adoption order
| Phase | Focus | Why first |
| --- | --- | --- |
| 1 | Quality gate — one command that fails on fmt/lint/tests for main | Creates safety to change process; reproducible failures |
| 2 | Trunk-oriented integration — short-lived branches, PR review, green main | Reduces drift and batch risk; smaller PRs merge faster |
| 3 | Contracts at boundaries — API or event schemas validated in CI | Stops tribal JSON; catches contract breaks before production |
| 4 | Observability baseline — correlated logs/traces for main paths | Makes incidents diagnosable without archaeology |
| 5 | Reliability habits — incident severity, blameless reviews, error budgets | Ties delivery cadence to actual risk |

The adoption order exists because dependencies are real. You can't have useful observability if you don't have contract-validated events to observe. You can't have meaningful reliability metrics if your quality gate doesn't catch regressions before they ship.
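Phase 1 can be as small as a single CI job wrapping a single command — sketched below on GitHub Actions, with the `make check` target being a hypothetical wrapper for fmt, lint, and tests:

```yaml
# Hypothetical phase-1 quality gate: one job, one command,
# fails the merge on any fmt/lint/test regression.
name: quality
on:
  pull_request:
    branches: [main]

jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make check   # hypothetical target: fmt + lint + tests
```

Because the gate is one command, a contributor can run the same entrypoint locally, which is what makes CI failures reproducible at the desk.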

What it's for

The library is designed to be forked or referenced by teams. Take the principles wholesale — they're platform-agnostic. Replace the tooling examples with your stack. Add an estate supplement for your org's specific constraints. Hand new leaders the one-pager (minimum-viable-doctrine template) before the full tree.

The goal is doctrine that stays useful as the stack evolves, rather than becoming a historical artefact that everyone ignores because it still references Jenkins pipelines from 2019.
