Memory for agents that doesn't quietly turn opinions into facts

I have been writing a thing called Cortex. It is a local-first memory substrate for AI agents — cargo-buildable, runs on a laptop, no cloud account — and the shortest pitch I can give it is this: agent memory today silently promotes model guesses into truth, and I think the right response is a substrate that makes promotion an explicit, auditable step.

That sentence sounds abstract until you have watched it happen. Most agent memory in 2026 is some shape of "the model said this last time, save it for next time" — usually a vector store keyed on prose, sometimes a SQL table keyed on whatever the model decided was important. The model writes; the model reads; nothing in between asks whether the claim was true, whether the operator ever agreed it was true, or whether anyone should be allowed to act on it as if it were true.

Six conversations later your agent has a "memory" that says "Ryan prefers Postgres for new projects", which it inferred from a single sarcastic remark, and now every project plan it generates routes through Postgres. The claim was never confirmed. There is no audit trail. The model has been talking to itself, with delay.

Cortex is what I have been building for about a month as the alternative shape.

The trust boundary, in one sentence

Rust owns validation, storage, scoring, and audit. Models propose interpretations.

That sentence is load-bearing. It means every piece of state in Cortex has a known origin (operator, sensor, model), a known trust class (raw fact vs. candidate inference vs. promoted principle), and a known auditor (the Rust crate that signed off on it landing in that bucket). A model can suggest "Ryan prefers Postgres" all it wants. The substrate accepts the suggestion as a candidate, scored against contradicting evidence, with the operator’s actual stated preference (if any) ranked above it. The substrate does not let the candidate cross into the principle tier — the tier the model gets to read back as a load-bearing fact — without an explicit operator step.
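To make the boundary concrete, here is a minimal sketch of an admission gate. Every name in it (`Origin`, `TrustClass`, `Claim`, `admit`, `Rejected`) is hypothetical and chosen for illustration; it is not Cortex's actual API, and it leaves out scoring entirely. It only shows that origin and trust class can be checked in code before anything lands.

```rust
// Hypothetical types for illustration only; not Cortex's real API.

/// Who produced a piece of state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    Operator,
    Sensor,
    Model,
}

/// Which bucket a claim is asking to land in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustClass {
    RawFact,
    CandidateInference,
    PromotedPrinciple,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Claim {
    pub text: String,
    pub origin: Origin,
    pub class: TrustClass,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Rejected {
    /// A model asked for anything other than the candidate tier.
    ModelMayOnlyPropose,
    /// A non-operator asked for the principle tier.
    PrincipleNeedsOperator,
}

/// Single entry point for new state. The check lives here, not in the model.
pub fn admit(text: &str, origin: Origin, requested: TrustClass) -> Result<Claim, Rejected> {
    match (origin, requested) {
        (Origin::Model, TrustClass::RawFact)
        | (Origin::Model, TrustClass::PromotedPrinciple) => Err(Rejected::ModelMayOnlyPropose),
        (Origin::Sensor, TrustClass::PromotedPrinciple) => Err(Rejected::PrincipleNeedsOperator),
        _ => Ok(Claim {
            text: text.to_string(),
            origin,
            class: requested,
        }),
    }
}
```

Under these assumptions, `admit("Ryan prefers Postgres", Origin::Model, TrustClass::CandidateInference)` succeeds, while the same call requesting `TrustClass::PromotedPrinciple` returns `Err(Rejected::ModelMayOnlyPropose)`.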

It is the boring distinction between a witness and a judge. Today’s "agentic memory" lets witnesses promote themselves.

Three layers, and one of them never moves

There is a hash-chained append-only event log at the bottom. Every observation the agent makes — every tool call result, every conversation turn, every operator instruction — lands there as an immutable event. That layer never edits and never deletes; the only operations are append and verify. If the chain hash breaks, the system refuses to proceed. This is the layer you reach for when an auditor asks "what actually happened" — not what the model summarised, what happened.
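The append-and-verify idea fits in a few lines. The sketch below is an illustration under stated assumptions, not Cortex's implementation: the `Event` and `EventLog` types are hypothetical, and the digest uses the standard library's `DefaultHasher` purely to keep the example dependency-free. `DefaultHasher` is not cryptographic, so a real log would use something like SHA-256 instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One immutable entry. `prev_hash` links it to the entry before it.
#[derive(Debug, Clone)]
pub struct Event {
    pub payload: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Stand-in digest over (previous hash, payload). NOT cryptographic:
/// an assumption made only so the sketch needs no external crates.
fn digest(prev_hash: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

#[derive(Debug, Default)]
pub struct EventLog {
    events: Vec<Event>,
}

impl EventLog {
    /// Append is the only write operation; there is no edit or delete.
    pub fn append(&mut self, payload: &str) -> u64 {
        // The first event chains from a fixed genesis value of 0.
        let prev_hash = self.events.last().map_or(0, |e| e.hash);
        let hash = digest(prev_hash, payload);
        self.events.push(Event {
            payload: payload.to_string(),
            prev_hash,
            hash,
        });
        hash
    }

    /// Recompute every link. On failure, return the index of the first
    /// entry whose link or digest no longer matches.
    pub fn verify(&self) -> Result<(), usize> {
        let mut expected_prev = 0u64;
        for (i, e) in self.events.iter().enumerate() {
            if e.prev_hash != expected_prev || e.hash != digest(e.prev_hash, &e.payload) {
                return Err(i);
            }
            expected_prev = e.hash;
        }
        Ok(())
    }
}
```

A caller that treats `Err(_)` from `verify` as fatal gets the "refuses to proceed" behaviour: changing any stored payload makes that entry's digest stop matching, and changing a stored hash breaks the link to the next entry.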

Above that, a derivation layer turns raw events into candidate memories. A candidate memory is "Ryan said the word 'Postgres' negatively in this transcript span, here is the span". A candidate is reproducible — you can rerun the derivation against the underlying events and get the same candidate — and it carries an explicit provenance chain back to the events it was derived from. Candidates are the layer the model is allowed to write into. They are also the layer that is not trusted on read-back.
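Reproducibility and provenance are easier to see in code. The sketch below assumes a deliberately trivial derivation, a case-insensitive keyword match, standing in for whatever the real derivation does; it does not attempt the "negatively" part of the example above. The `Event`, `Candidate`, and `derive_mentions` names are hypothetical.

```rust
/// Hypothetical minimal event: a sequence number plus a transcript line.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub seq: u64,
    pub text: String,
}

/// A candidate memory plus its provenance: the events it was derived from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub claim: String,
    pub derived_from: Vec<u64>,
}

/// A pure function of its inputs: no clock, no randomness, no hidden state.
/// Rerunning it over the same events yields an identical candidate.
pub fn derive_mentions(events: &[Event], term: &str) -> Option<Candidate> {
    let needle = term.to_lowercase();
    let derived_from: Vec<u64> = events
        .iter()
        .filter(|e| e.text.to_lowercase().contains(&needle))
        .map(|e| e.seq)
        .collect();
    if derived_from.is_empty() {
        return None; // nothing in the log supports a candidate
    }
    Some(Candidate {
        claim: format!("transcript mentions '{}'", term),
        derived_from,
    })
}
```

Because the candidate carries `derived_from`, anyone reviewing it can pull up the exact events behind it and rerun the derivation to confirm nothing was added along the way.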

Above that, promoted principles: facts the operator has reviewed and accepted. Promotion is a discrete operation, takes operator input, and is reversible. A principle that turns out to be wrong gets revoked with a recorded reason, and the revocation is itself an event in the log below. The model reads principles as trusted facts; candidates reach it only as suggestions, never as ground truth.
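A promotion gate with those properties could look roughly like the following. This is a sketch under assumptions, not the actual promotion ceremony: the `Status`, `LogEntry`, `Memory`, and `GateError` types are hypothetical, the operator is identified by a plain string, and the log is a `Vec` rather than the hash-chained log.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Candidate,
    Principle { approved_by: String },
    Revoked { reason: String },
}

/// What gets written to the event log for each state change.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LogEntry {
    Promoted { claim: String, operator: String },
    Revoked { claim: String, reason: String },
}

#[derive(Debug)]
pub struct Memory {
    pub claim: String,
    pub status: Status,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GateError {
    MissingOperator,
    NotACandidate,
    NotAPrinciple,
    MissingReason,
}

/// Promotion needs an operator and only applies to candidates.
pub fn promote(m: &mut Memory, operator: &str, log: &mut Vec<LogEntry>) -> Result<(), GateError> {
    if operator.trim().is_empty() {
        return Err(GateError::MissingOperator);
    }
    if m.status != Status::Candidate {
        return Err(GateError::NotACandidate);
    }
    m.status = Status::Principle { approved_by: operator.to_string() };
    log.push(LogEntry::Promoted { claim: m.claim.clone(), operator: operator.to_string() });
    Ok(())
}

/// Revocation needs a reason and is itself recorded as an event.
pub fn revoke(m: &mut Memory, reason: &str, log: &mut Vec<LogEntry>) -> Result<(), GateError> {
    if reason.trim().is_empty() {
        return Err(GateError::MissingReason);
    }
    if !matches!(m.status, Status::Principle { .. }) {
        return Err(GateError::NotAPrinciple);
    }
    m.status = Status::Revoked { reason: reason.to_string() };
    log.push(LogEntry::Revoked { claim: m.claim.clone(), reason: reason.to_string() });
    Ok(())
}
```

In this sketch neither function has a path that changes `status` without also pushing a log entry, which is the property that keeps the "where did that come from" question answerable later.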

What gets sent to the model, and why

A practical thing this lets me do: every prompt Cortex assembles for a model comes with an explanation. Not a debug log — a structured artifact that says here is what was included, here is what was excluded, here is what was redacted, here is what was deemed trustworthy and at what tier. The explanation is generated by the same Rust code that did the assembly, so it cannot drift from what was actually sent.
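One way to get an explanation that cannot drift is to build it in the same loop that builds the prompt. The sketch below illustrates that idea only; the `Item`, `Decision`, and `Assembled` types, the `[FACT]` and `[SUGGESTION]` labels, and the item budget are all assumptions made for the example, not Cortex's real prompt format.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Principle,
    Candidate,
}

#[derive(Debug, Clone)]
pub struct Item {
    pub text: String,
    pub tier: Tier,
    pub sensitive: bool,
}

/// What happened to one input item during assembly.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Included(Tier),
    Redacted,
    Excluded(&'static str),
}

#[derive(Debug)]
pub struct Assembled {
    pub prompt: String,
    /// One entry per input item, keyed by its index so that redacted
    /// text is never copied into the explanation itself.
    pub explanation: Vec<(usize, Decision)>,
}

/// Build the prompt and its explanation in a single pass.
pub fn assemble(items: &[Item], max_items: usize) -> Assembled {
    let mut prompt = String::new();
    let mut explanation = Vec::new();
    let mut used = 0;
    for (i, item) in items.iter().enumerate() {
        let decision = if item.sensitive {
            Decision::Redacted
        } else if used >= max_items {
            Decision::Excluded("over item budget")
        } else {
            used += 1;
            // The tier label travels with the text into the prompt.
            let label = match item.tier {
                Tier::Principle => "FACT",
                Tier::Candidate => "SUGGESTION",
            };
            prompt.push_str(&format!("[{}] {}\n", label, item.text));
            Decision::Included(item.tier)
        };
        explanation.push((i, decision));
    }
    Assembled { prompt, explanation }
}
```

The design choice worth noticing is that each `Decision` is computed in the same branch that does or does not write to `prompt`, so there is no second code path that could describe the prompt differently from how it was built.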

When the agent does something wrong, the diagnosis stops being "the model hallucinated", which is true but not actionable, and becomes a set of questions: what did the prompt say, which of those claims were principles and which were candidates, and was anything redacted that the operator wanted included? Those are answerable questions. They route to one of two causes: either a principle was wrong (revoke it and re-examine the promotion decision), or the candidate-to-principle gate let something through that it shouldn’t have (a substrate bug, and fixable).

This is the part that makes Cortex feel different to use. Agent failures stop being mystical.

What it is not

It is not a cloud product. There is no shared instance, no team mode, no SaaS. The whole substrate runs against a SQLite file and an append-only JSONL boundary on your laptop. If you want collaborative memory across multiple operators you will not get it here, at least not today.

It is not a vector database. The retrieval layer is there but the memory model is structured — typed events, typed candidates, typed principles — and the read paths exercise that structure. A vector index is one of several retrieval surfaces, not the primary one.

It is not an agent framework. Cortex does not run your agent loop. It is the memory substrate — a library plus a CLI plus a set of typed contracts — that you plug into whichever agent runtime you are using. I happen to also have a sibling runtime (axiom) that consumes Cortex’s contracts, but Cortex itself is agnostic about what is on the other side of the wire.

And it is not finished. The honest state today is: the trust-boundary plumbing works, the event log is solid, the candidate-and-principle tiers are real and have working promotion ceremonies. The retrieval surface is sparser than I want. The operator UI is a CLI. The threat model lives in docs/THREATS.md and gets updated when I find a new one.

Why I think this is the right shape

The trade I am making is: more operator work today in exchange for a system where you can still answer "where did that come from" in a year. That trade is worth it for the kind of agent work I care about — multi-session, multi-tool, accumulating context that has to stay honest over time. It is probably the wrong trade for a one-shot chatbot, and that is fine.

The substrate-shaped take on agent memory is: the model is the smartest component in the loop, and is also the least trustworthy, and the right architecture is the one that keeps both things true at once. Cortex is my attempt at a substrate that lets the smartest component get smarter without its unverified guesses bleeding into ground truth.

If that resonates, the README has a 60-second offline walkthrough and the source is on GitHub. If it does not, that is also useful signal — I would rather hear "the trade is wrong for my agent" than "I tried it and got confused by what it was".