CellOS: no ambient authority in CI runners

Every CI runner starts a job with a full environment. Every variable set earlier in the workflow, every credential configured at the repo level, every secret the job was given — all of it is in the environment of every process that runs. A compromised npm package in step 5 inherits the AWS credentials configured in step 1.

Authority in CI pipelines is ambient by default. Every process can see everything unless you actively prevent it.

CellOS is built around the opposite assumption: no ambient authority. Every unit of execution declares exactly what it needs. Everything else is withheld.

The execution cell

The core primitive is the execution cell — a least-authority compute unit defined by a spec. The spec is a contract: it declares what the cell is allowed to do before it starts running.

[Code block not rendered in the source page: cell spec]

This cell gets exactly one secret. It can reach one host on one port. Everything else — other secrets, arbitrary network access, the host filesystem — is not available. The contract is declared in advance, enforced at exec time, and recorded in the audit trail.
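
The deny-by-default shape of such a contract can be sketched in Rust. This is a minimal model under stated assumptions, not the CellOS schema: the struct and field names, the `NPM_TOKEN` secret name, and the `registry.npmjs.org:443` destination are all hypothetical and chosen only for illustration.

```rust
/// Hypothetical egress rule: one host, one port.
#[derive(Debug, Clone, PartialEq)]
pub struct EgressRule {
    pub host: String,
    pub port: u16,
}

/// Hypothetical in-memory model of a cell's authority contract.
/// Field names are illustrative, not the real spec format.
#[derive(Debug, Clone, Default)]
pub struct Authority {
    /// Names of the secrets the cell may receive.
    pub secret_refs: Vec<String>,
    /// Destinations the cell may connect to.
    pub egress: Vec<EgressRule>,
}

impl Authority {
    /// Deny by default: a secret is available only if it was declared.
    pub fn allows_secret(&self, name: &str) -> bool {
        self.secret_refs.iter().any(|s| s == name)
    }

    /// Deny by default: a destination is reachable only if it was declared.
    pub fn allows_egress(&self, host: &str, port: u16) -> bool {
        self.egress.iter().any(|r| r.host == host && r.port == port)
    }
}

/// Example contract with exactly one secret, one host, and one port.
/// The concrete values are made up for this sketch.
pub fn example_authority() -> Authority {
    Authority {
        secret_refs: vec!["NPM_TOKEN".to_string()],
        egress: vec![EgressRule {
            host: "registry.npmjs.org".to_string(),
            port: 443,
        }],
    }
}
```

With this model, `example_authority().allows_secret("AWS_ACCESS_KEY_ID")` is false simply because nothing declared it. There is no blocklist to maintain.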

What ambient authority looks like in practice

A standard GitHub Actions workflow:

[Code block not rendered in the source page: GitHub Actions workflow]

This is the secret spray problem. npm run build inherits the AWS credentials configured two steps earlier. A compromised postinstall hook finds them in the environment and has everything it needs to exfiltrate them.

With a cell spec:

[Code block not rendered in the source page: cell spec for the same build]

The supervisor calls env_clear() before execve. The only key available to npm is the one explicitly listed in secretRefs — a short-lived OIDC token scoped to the declared audience. AWS_ACCESS_KEY_ID is never in the child's environment.
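
The clear-then-allowlist pattern can be sketched with Rust's standard `std::process::Command`, whose `env_clear()` and `envs()` methods are real. Everything else here is an assumption for illustration: the helper names `declared_env` and `run_with_env`, and the idea that resolved secrets arrive as a plain map. The actual supervisor resolves them through a broker.

```rust
use std::collections::BTreeMap;
use std::io;
use std::process::{Command, Output};

/// Keep only the keys named in `secret_refs`. Anything else in `available`
/// (for example AWS credentials set by an earlier step) is dropped.
/// Hypothetical helper, not a CellOS function.
pub fn declared_env(
    available: &BTreeMap<String, String>,
    secret_refs: &[&str],
) -> BTreeMap<String, String> {
    secret_refs
        .iter()
        .filter_map(|key| {
            available
                .get(*key)
                .map(|value| ((*key).to_string(), value.clone()))
        })
        .collect()
}

/// Run `program` with an environment that contains only `env`.
/// `env_clear()` removes every inherited variable before `envs()` adds
/// the declared ones back.
pub fn run_with_env(
    program: &str,
    args: &[&str],
    env: &BTreeMap<String, String>,
) -> io::Result<Output> {
    Command::new(program)
        .args(args)
        .env_clear()
        .envs(env)
        .output()
}
```

Because the child's environment is built up from nothing, a variable leaks only if someone lists it. Note that `env_clear()` also removes `PATH`, so this sketch is safest with an absolute program path such as `/bin/sh`.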

Cross-run contamination

Shared runners accumulate state. A pull request job writes a malicious file to /tmp. The next job — a release build from main — runs on the same host and sources cached tooling from /tmp. This is a real attack class with a GitHub advisory behind it.

On the hardened Linux path, the supervisor mounts a fresh empty tmpfs over the cell's working directory before execve. The previous run's workspace data is not visible. When the cell exits, the mount namespace is destroyed and the tmpfs is discarded with it.

The isolation is proven with a cross-run test: run A writes a sentinel file; run B asserts it is absent. The test runs in CI.
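
The shape of that sentinel test can be sketched without root privileges. This is not the tmpfs and mount-namespace mechanism: the `Workspace` type below is a hypothetical stand-in that uses an ordinary temporary directory, created empty per run and deleted when the run ends, so that the run A / run B assertion can be shown end to end.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a cell-private workspace: a directory that is
/// created empty for one run and deleted when the run ends.
pub struct Workspace {
    path: PathBuf,
}

impl Workspace {
    /// Create an empty workspace at a fixed location for this process,
    /// so consecutive runs share the same path the way jobs share a host.
    pub fn create(name: &str) -> io::Result<Self> {
        let dir = format!("cell-sketch-{}-{}", std::process::id(), name);
        let path = std::env::temp_dir().join(dir);
        // Remove anything left behind at this path, then start empty.
        if path.exists() {
            fs::remove_dir_all(&path)?;
        }
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for Workspace {
    /// Teardown: discard the workspace contents when the run ends.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Run A writes a sentinel; run B reports whether it can still see it.
pub fn sentinel_visible_in_next_run() -> io::Result<bool> {
    {
        let run_a = Workspace::create("workspace")?;
        fs::write(run_a.path().join("sentinel"), b"from run A")?;
    } // run A ends here and its workspace is destroyed

    let run_b = Workspace::create("workspace")?;
    Ok(run_b.path().join("sentinel").exists())
}
```

The assertion a test would make is that `sentinel_visible_in_next_run()` returns `false`. The real test makes the same assertion against the kernel-enforced mount instead of a directory cleanup.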

Destruction that means something

Most ephemeral compute cleans up on a best-effort basis. CellOS defines what destroyed means precisely, per layer:

Destruction semantics per layer

| Layer | Destroyed means | How it's proven |
| --- | --- | --- |
| Process tree | SIGKILL; no orphan supervisors retaining capabilities | Supervisor exit code + lifecycle events |
| Secrets | TTL at broker; broker-side materialized secrets revoked after teardown | residue.rs: broker empty after destroy+revoke; two-cell isolation test |
| Filesystem (workspace) | Cell-private tmpfs discarded when mount namespace exits | supervisor_linux_private_workspace.rs: run B gets empty workspace |
| Network | Private net namespace and nft rules removed; cell's network identity does not persist | supervisor_linux_network_policy.rs: no host-loopback reachability from child |
| Audit | Teardown event emitted; final residue class recorded (none / documented exception) | CloudEvents lifecycle trail; JSONL sink |
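
The Secrets row can be illustrated with a small in-memory broker. The sketch below is an assumption-heavy model, not the code in residue.rs: only the name `revoke_for_cell` comes from this post, the `Broker` type and its other methods are hypothetical, and TTL expiry is left out to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory secret broker, keyed by cell id.
#[derive(Default)]
pub struct Broker {
    /// cell id -> (secret name -> secret value)
    materialized: HashMap<String, HashMap<String, String>>,
}

impl Broker {
    /// Record a secret as materialized for one cell.
    pub fn materialize(&mut self, cell: &str, name: &str, value: &str) {
        self.materialized
            .entry(cell.to_string())
            .or_default()
            .insert(name.to_string(), value.to_string());
    }

    /// Revoke everything held for `cell` and return how many secrets were
    /// removed. Calling it again is harmless and returns 0 (idempotent).
    pub fn revoke_for_cell(&mut self, cell: &str) -> usize {
        self.materialized.remove(cell).map_or(0, |s| s.len())
    }

    /// Number of secrets still held for `cell` (its residue).
    pub fn residue(&self, cell: &str) -> usize {
        self.materialized.get(cell).map_or(0, |s| s.len())
    }

    /// True when no cell has any secret left at the broker.
    pub fn is_empty(&self) -> bool {
        self.materialized.is_empty()
    }
}
```

A two-cell isolation check then reads naturally: materialize a secret for two cells, revoke one, and assert that the other cell's residue is unchanged while the revoked cell's residue is zero.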

Observable execution

Every cell emits structured CloudEvents over its lifecycle. The events flow to NATS JetStream or a JSONL file for SIEM ingestion.

CloudEvents lifecycle trail

| Event | When emitted | Key fields |
| --- | --- | --- |
| cell.identity.v1.materialized | OIDC or secret broker resolved successfully | run_id, identity type, audience |
| cell.command.v1.started | Child process spawned | run_id, argv, working_dir |
| cell.network_policy.v1.applied | Network namespace and egress rules configured | run_id, egress_rules, netns_active |
| cell.export.v2.completed | Artifacts exported to declared sink | run_id, destination, bytes |
| cell.command.v1.completed | Child process exited | run_id, exit_code, duration_ms |
| cell.teardown.v1.completed | All resources destroyed | run_id, residue_class |
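
To make the JSONL sink concrete, here is a rough sketch of one lifecycle event serialized as a single line. It uses the four attributes CloudEvents 1.0 requires (`specversion`, `id`, `source`, `type`) and puts the key fields under `data`. The function name, the `source` value, and the choice to encode every data field as a string are assumptions made to keep the example dependency-free; the real schema may differ.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSONL line for a lifecycle event.
/// Hypothetical helper: all `data` values are written as JSON strings.
pub fn lifecycle_event_line(
    id: &str,
    source: &str,
    event_type: &str,
    run_id: &str,
    fields: &[(&str, &str)],
) -> String {
    // run_id goes first so every event can be correlated by it.
    let mut data = format!("\"run_id\":\"{}\"", json_escape(run_id));
    for (key, value) in fields {
        data.push_str(&format!(
            ",\"{}\":\"{}\"",
            json_escape(key),
            json_escape(value)
        ));
    }
    format!(
        "{{\"specversion\":\"1.0\",\"id\":\"{}\",\"source\":\"{}\",\"type\":\"{}\",\"data\":{{{}}}}}",
        json_escape(id),
        json_escape(source),
        json_escape(event_type),
        data
    )
}
```

Newlines inside values are escaped, so each event stays on one line, which is what line-oriented SIEM ingestion expects.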

Standard mode emits events but may not enforce network isolation at the kernel level. Hardened mode requires CELLOS_SUBPROCESS_UNSHARE to include both net and mnt, and enforces the full claim. The enforcement contract in the docs lists exactly which conditions are required for each capability.
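
A precondition check for hardened mode might look like the sketch below. The variable name CELLOS_SUBPROCESS_UNSHARE and the `net` and `mnt` values come from this post; the comma-separated format and both function names are assumptions, so check the enforcement contract in the docs for the actual syntax.

```rust
/// Returns true when the value names both the `net` and `mnt` namespaces.
/// Assumes a comma-separated list such as "net,mnt" (format not confirmed).
pub fn hardened_requested(unshare_value: Option<&str>) -> bool {
    let value = match unshare_value {
        Some(v) => v,
        // Variable unset: hardened mode was not requested.
        None => return false,
    };
    let flags: Vec<&str> = value
        .split(',')
        .map(str::trim)
        .filter(|flag| !flag.is_empty())
        .collect();
    flags.contains(&"net") && flags.contains(&"mnt")
}

/// Convenience wrapper that reads the variable from the process environment.
pub fn hardened_requested_from_env() -> bool {
    let value = std::env::var("CELLOS_SUBPROCESS_UNSHARE").ok();
    hardened_requested(value.as_deref())
}
```

Keeping the parsing in a pure function makes the rule easy to test: a value that lists only `net` does not count as hardened, because the workspace isolation depends on `mnt` as well.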

Scored by Claude Code

This assessment was independently reviewed and scored by Claude Code against the cellos-lite codebase, test suite, docs/guarantee-matrix.md, and docs/break-attempts.md.

Independent assessment by Claude Code

| Property | Score | Evidence |
| --- | --- | --- |
| No ambient secrets in child process | 9/10 | supervisor_no_ambient_env.rs: proves host env vars not in secretRefs do not appear in child; CELLOS_SECRET_* carrier prefix also absent |
| Cross-run workspace isolation | 8/10 | supervisor_linux_private_workspace.rs: run A sentinel absent in run B; workspace_is_empty_on_start confirmed on hardened path |
| Network containment | 7/10 | Private netns + best-effort nft on hardened path; nftRulesApplied=false is possible if nft unavailable; docs are explicit about this bound |
| Destruction semantics | 8/10 | residue.rs covers host+broker empty after destroy; two-cell isolation; idempotent revoke_for_cell |
| Observability | 9/10 | Structured CloudEvents with stable schema on every lifecycle event; JetStream + JSONL sinks; correlation ID across events |
| Authority contract as first-class spec | 9/10 | spec.authority is declared before execution; validated against JSON Schema in CI; not a flag bag added after the fact |

The governance loop

CellOS is the execution layer in a closed loop:

| Layer | Role |
| --- | --- |
| taudit | Scans pipeline YAML, builds the authority graph, flags where privilege leaks across trust boundaries |
| tsafe | Constrains secrets to the specific steps that need them; exec injects only what was declared |
| CellOS | Enforces execution: the cell gets what the spec authorised, nothing else; teardown removes every trace |

taudit findings route to the right tool. Scope findings carry a TsafeRemediation. Isolation findings carry a CellosRemediation. The loop closes: detect, constrain, isolate, observe again.
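
The routing rule can be sketched as a tagged enum. Only the type names `TsafeRemediation` and `CellosRemediation` come from this post; their fields, the `Finding` and `Tool` enums, and the `route` function are hypothetical and exist only to show the dispatch.

```rust
/// Remediation for a scope finding. Fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct TsafeRemediation {
    pub step: String,
    pub secret: String,
}

/// Remediation for an isolation finding. Fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct CellosRemediation {
    pub job: String,
}

/// A finding carries the remediation for the tool that can fix it.
#[derive(Debug, Clone, PartialEq)]
pub enum Finding {
    Scope(TsafeRemediation),
    Isolation(CellosRemediation),
}

/// The tool a finding is routed to.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tool {
    Tsafe,
    Cellos,
}

/// Route by finding kind: scope problems go to tsafe,
/// isolation problems go to CellOS.
pub fn route(finding: &Finding) -> Tool {
    match finding {
        Finding::Scope(_) => Tool::Tsafe,
        Finding::Isolation(_) => Tool::Cellos,
    }
}
```

Encoding the remediation in the variant means a finding cannot be routed to a tool that has no way to act on it, and the compiler flags any new finding kind that `route` does not handle.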

Current state

cellos-lite enforces per-run isolation and authority scoping at the process and Linux namespace level, with auditable lifecycle and event emission. The authority bundle is a first-class contract in the spec. MicroVM-class isolation (Firecracker) is the next milestone; the semantic layer is stable now and carries forward regardless of the isolation primitive underneath it.
