Engineering standards go stale in one of two ways. They stay too abstract and nobody knows how to apply them. Or they hard-code specific tools and become obsolete the next time the stack changes.
I've been building a reference library that tries to avoid both failure modes by keeping principles and tooling in separate layers that evolve at different rates.
The core split
Principles describe outcomes, constraints, and trade-offs. Tooling shows one possible way to satisfy them with concrete products. Swapping tooling should never require rewriting principles.
A principle like 'validate before packaging, verify after deployment' is true regardless of whether you're on GitHub Actions, Azure DevOps, Jenkins, or a custom deploy script. The order of operations is the principle. The specific jobs and YAML are tooling.
A principle like 'use GitHub Actions with these specific action versions' is not a principle — it's tooling pretending to be a principle. Teams on different platforms can't adopt it, and it needs updating every time the toolchain moves.
| Principle | Tooling equivalent (don't write this) |
|---|---|
| Validate before packaging, verify after deployment | Use the deploy-prod GitHub Action with wait-for-health set to 120s |
| Contracts must be validated in CI | Run scripts/validate_contracts.py in the quality.yml workflow |
| Build evidence must be traceable to a commit | Pin all GitHub Actions to their full SHA digest |
| Caches are accelerators, not dependencies | Configure cache: keys in GitHub Actions to use the lockfile hash |
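The split in the table can be sketched as data. This is a hypothetical model, not anything from the library itself: the `Principle` record never names a product, while the tooling examples live in a separate mapping keyed by platform, so swapping platforms touches only the tooling layer.

```python
from dataclasses import dataclass

# Hypothetical sketch of the principle/tooling split as code.
# Principles are platform-agnostic records; tooling is a separate,
# swappable mapping. Platform names and strings here are invented.

@dataclass(frozen=True)
class Principle:
    id: str
    statement: str  # an outcome or constraint, never a product name

# Canonical principles: change rarely.
PRINCIPLES = {
    "deploy-order": Principle(
        "deploy-order", "Validate before packaging, verify after deployment"
    ),
    "ci-contracts": Principle("ci-contracts", "Contracts must be validated in CI"),
}

# Illustrative tooling: one possible way to satisfy each principle per platform.
TOOLING = {
    "github-actions": {
        "deploy-order": "deploy-prod action with wait-for-health: 120s",
        "ci-contracts": "run scripts/validate_contracts.py in quality.yml",
    },
    "jenkins": {
        "deploy-order": "post-deploy health stage in the Jenkinsfile",
        "ci-contracts": "contract-validation stage before the package stage",
    },
}

def implementation(platform: str, principle_id: str) -> str:
    """Resolve a principle to its tooling example for one platform."""
    if principle_id not in PRINCIPLES:
        raise KeyError(f"unknown principle: {principle_id}")
    return TOOLING[platform][principle_id]

# Moving from GitHub Actions to Jenkins changes TOOLING, not PRINCIPLES.
jenkins_deploy = implementation("jenkins", "deploy-order")
```

Note the direction of the dependency: tooling entries point at principle ids, never the other way around, which is exactly why a stack change can't force a principle rewrite.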
The three-layer structure
| Layer | Path | Changes how often |
|---|---|---|
| Timeless principles | doctrine/principles/ | Rarely — only when we learn something structural |
| Illustrative tooling | doctrine/tooling/ | When the stack, platform, or team context changes |
| Estate supplements | doctrine/tooling/estates/ | Per org, per cloud, per team — optional overrides |
Estate supplements are the key to making this usable in practice. An org running entirely on Azure has different tooling examples than one running on AWS. Neither invalidates the principles. The estate supplement captures the org-specific mapping without polluting the canonical principle files.
Build surfaces — making the implicit explicit
One of the more useful concepts in the doctrine is the named build surface. Every repository owns a set of surfaces. The problem isn't missing surfaces — it's hidden ones.
| Surface | What it is |
|---|---|
| Local developer entrypoint | The one command a new contributor runs to build and test locally |
| Quality gate | The CI job that must pass before merge — lint, tests, contracts, deny |
| Release surface | How the artefact is built and packaged — reproducible, from a tagged commit |
| Deploy surface | How the artefact reaches a running environment — promotion, not rebuild |
| Verification surface | How you know the deployed thing is healthy — not just that the deploy succeeded |
| Execution surface | Scheduled scans, queued automation, recurring runbooks — first-class, not hidden in deploy pipelines |
The principle is: define the surfaces you own, not the ones you wish you had. A repo with no declared verification surface usually has no verification — or has it buried in an oncall runbook that nobody reads.
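Making hidden surfaces visible can be as simple as a manifest plus a check. This is a hypothetical sketch — the manifest format and surface names are mine, loosely following the table above — but it shows the distinction that matters: a surface declared as not-owned is different from a surface never mentioned.

```python
# Hypothetical sketch: a per-repo surface manifest and a check that
# surfaces hidden (undeclared) surfaces. The manifest format is invented.

KNOWN_SURFACES = {
    "local-entrypoint", "quality-gate", "release",
    "deploy", "verification", "execution",
}

def undeclared_surfaces(manifest: dict) -> set[str]:
    """Return known surface types the repo never mentions.

    Declaring a surface as None is an explicit "we don't own this",
    which is not the same as staying silent about it.
    """
    declared = set(manifest.get("surfaces", {}))
    return KNOWN_SURFACES - declared

repo_manifest = {
    "surfaces": {
        "local-entrypoint": "make check",
        "quality-gate": "ci/quality.yml",
        "release": "ci/release.yml",
        "deploy": None,  # explicitly not owned: promotion happens downstream
    }
}

# verification and execution were never mentioned: those are the hidden ones.
missing = undeclared_surfaces(repo_manifest)
```

A check like this turns "we never thought about verification" into a reviewable diff, which is the whole point of naming surfaces.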
The adoption playbook
The playbook is dependency-aware, not dogmatic. Skip phases that are already healthy.
| Phase | Focus | Why first |
|---|---|---|
| 1 | Quality gate — one command that fails on fmt/lint/tests for main | Creates safety to change process; reproducible failures |
| 2 | Trunk-oriented integration — short-lived branches, PR review, green main | Reduces drift and batch risk; smaller PRs merge faster |
| 3 | Contracts at boundaries — API or event schemas validated in CI | Stops tribal JSON; catches contract breaks before production |
| 4 | Observability baseline — correlated logs/traces for main paths | Makes incidents diagnosable without archaeology |
| 5 | Reliability habits — incident severity, blameless reviews, error budgets | Ties delivery cadence to actual risk |
The adoption order exists because dependencies are real. You can't have useful observability if you don't have contract-validated events to observe. You can't have meaningful reliability metrics if your quality gate doesn't catch regressions before they ship.
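The skip-if-healthy rule composes cleanly with the dependency order: keep the sequence fixed, filter out what's already in place. A minimal sketch, with phase names abbreviated from the table above:

```python
# Hypothetical sketch of the dependency-aware playbook: phases stay in
# dependency order, and any phase the team already does well is skipped
# rather than redone. Phase names are abbreviations of the table above.

PHASES = [
    ("quality-gate", "one command that fails on fmt/lint/tests"),
    ("trunk-integration", "short-lived branches, PR review, green main"),
    ("contracts", "API/event schemas validated in CI"),
    ("observability", "correlated logs/traces for main paths"),
    ("reliability", "incident severity, blameless reviews, error budgets"),
]

def adoption_plan(already_healthy: set[str]) -> list[str]:
    """Preserve the dependency order; skip phases that are already healthy."""
    return [name for name, _ in PHASES if name not in already_healthy]

# A team with a solid quality gate and trunk discipline starts at contracts.
plan = adoption_plan({"quality-gate", "trunk-integration"})
```

Because the filter never reorders, a team can't accidentally adopt observability before the contracts it depends on — it can only skip a phase by asserting that phase is already healthy.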
What it's for
The library is designed to be forked or referenced by teams. Take the principles wholesale — they're platform-agnostic. Replace the tooling examples with your stack. Add an estate supplement for your org's specific constraints. Hand new leaders the one-pager (minimum-viable-doctrine template) before the full tree.
The goal is doctrine that stays useful as the stack evolves, rather than becoming a historical artefact that everyone ignores because it still references Jenkins pipelines from 2019.