AI agents have a memory problem. Here’s how I fixed mine — and open-sourced the solution.

The Problem Nobody Talks About

If you’ve spent serious time with AI agents, you’ve hit this wall.

You build something smart. Your agent knows your codebase, your team, your patterns. It makes good decisions. Then the session ends — and the next day, you’re starting from scratch. It doesn’t know what it did yesterday. It doesn’t know what it learned last week. It doesn’t even know its own name unless you tell it again.

This isn’t a prompt engineering problem. It’s an architectural one.

Stateless sessions mean every conversation is the agent’s first conversation. You compensate by stuffing context into system prompts that balloon out of control. You paste in notes, paste in history, paste in documentation — hoping the model can piece it together. It’s exhausting. It doesn’t scale. And it breaks the thing that makes agents valuable: the ability to compound knowledge over time.

I’ve been building a multi-agent AI system at Microsoft — a chief of staff architecture where different agents handle different domains, orchestrated by a central coordinator. The agents were good. But they kept forgetting. Every session, I was re-briefing the same agents on the same context. The knowledge existed. It just had nowhere to live.

The Flat File Phase

My first fix was what most people do — a flat markdown file. Curated context every agent read at startup: who’s on the team, what we’re working on, key decisions, conventions. I maintained it by hand.

It worked for a while. But it kept growing. What do you keep? What do you cut? When does a decision from two weeks ago stop mattering? There was no principled way to consolidate it. I was the consolidation engine, and an unreliable one: I skipped entries, let the file bloat past its useful size, and forgot to prune things that no longer applied.

That’s when it clicked. I was doing manually what the human brain does automatically every single night.

The Neuroscience Bridge

During sleep, your brain replays the day’s experiences, scores them for importance, consolidates the ones that matter into long-term memory, and lets the rest fade. There’s real research behind this — Kim & Park (2025) mapped how NREM and REM sleep phases handle different stages of memory consolidation, from hippocampal replay to cortical integration.¹

The more I read, the more obvious it became: this was the architecture I needed. Not a bigger context window. Not better retrieval. A consolidation loop.

I looked at existing tools. Vector databases. RAG pipelines. Long-context models. Summary injection. None of it was purpose-built for the problem of agent memory — persistent, structured, semantically searchable, self-maintaining knowledge that evolves with the agent over time.

So I built it.

Introducing Myelin

Myelin is a knowledge graph memory system for AI agents. It’s named after the neural sheath that wraps axons in your brain — the structure that makes frequently used neural pathways conduct signals faster. The more a path is used, the better it works. That’s exactly how Myelin models memory.

The core idea: agents should get smarter across sessions, not reset.

It’s local-first, open source (Apache-2.0), and runs entirely on your machine. No cloud APIs, no data leaving your environment. Install it, wire it to your agent, and it starts building a knowledge graph from your code, your documents, your agent’s own observations — and that graph persists, evolves, and enriches every subsequent session.

The Architecture

Myelin has three pipelines, a graph layer, and a Copilot CLI extension.

Code Indexing — myelin parse

Parses your codebase using tree-sitter and writes structural knowledge to the graph. Supported languages: C#, TypeScript/TSX, Python, Go, JSON, YAML, Dockerfile, PowerShell, Bicep. Every class, method, function, interface, and config file becomes a node. Structural relationships — File defines Class, Class contains Method, Module depends on Interface — become edges.

When an agent boots, it can query the graph for “what are the authentication patterns in this codebase?” and get a grounded, current answer.

Document Ingestion — myelin ingest

Reads documents, meeting notes, architecture records — anything text-based — and runs zero-shot named entity recognition using GLiNER. It extracts relationships between entities using embedding-based comparison against 37 prototype sentences, and writes the result to the graph. No training data needed. No API calls. Runs entirely on-device with a ~600MB ONNX model.
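The prototype-comparison idea can be sketched with toy three-dimensional vectors standing in for real MiniLM embeddings. The prototype sentences, labels, and vectors below are hypothetical stand-ins, not Myelin’s actual 37 prototypes:

```typescript
// Sketch of embedding-based relation classification against labeled
// prototypes. Toy vectors play the role of sentence embeddings.

type Vec = number[];

const dot = (a: Vec, b: Vec) => a.reduce((s, x, i) => s + x * b[i], 0);
const norm = (a: Vec) => Math.sqrt(dot(a, a));
const cosine = (a: Vec, b: Vec) => dot(a, b) / (norm(a) * norm(b));

// Hypothetical prototypes, each labeled with a relation type.
const prototypes: { label: string; embedding: Vec }[] = [
  { label: "works_on", embedding: [0.9, 0.1, 0.0] },
  { label: "decided",  embedding: [0.1, 0.9, 0.1] },
  { label: "uses",     embedding: [0.0, 0.2, 0.9] },
];

// Classify a candidate sentence by its nearest prototype.
function classifyRelation(sentenceEmbedding: Vec): string {
  let best = prototypes[0];
  for (const p of prototypes) {
    if (cosine(sentenceEmbedding, p.embedding) >
        cosine(sentenceEmbedding, best.embedding)) {
      best = p;
    }
  }
  return best.label;
}

// A sentence whose embedding sits closest to the "uses" prototype.
console.log(classifyRelation([0.05, 0.1, 0.95])); // "uses"
```

Because the prototypes are compared by cosine similarity, no training step is needed — adding a new relation type is just adding a new labeled sentence.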

Memory Consolidation — myelin consolidate

This is the one I’m most proud of. It’s modeled directly on the neuroscience research.

NREM phase — replay, extract, score, transfer. The system replays agent session logs, runs NER to extract entities and decisions, and scores each node for salience using a dual-signal model:

  • Importance (modeled on dopamine) — decisions, bugs, and security issues score high
  • Novelty (modeled on norepinephrine) — first-time events and surprises score high
  • Final salience = 0.7 × importance + 0.3 × novelty

Nodes are written to the graph with reinforcement over duplication — if a node already exists, its salience is boosted instead of a duplicate being created.
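The scoring and write path can be sketched as follows. Only the 0.7/0.3 weights come from the design above; the node shape and the reinforcement boost rule are illustrative assumptions:

```typescript
// Sketch of NREM salience scoring plus reinforce-over-duplicate writes.

interface MemoryNode {
  id: string;
  salience: number; // in [0, 1]
}

// Dual-signal salience, as described: 0.7 × importance + 0.3 × novelty.
const salience = (importance: number, novelty: number) =>
  0.7 * importance + 0.3 * novelty;

const graph = new Map<string, MemoryNode>();

// Upsert: boost an existing node's salience instead of duplicating it.
function writeNode(id: string, importance: number, novelty: number): MemoryNode {
  const incoming = salience(importance, novelty);
  const existing = graph.get(id);
  if (existing) {
    // Reinforcement: nudge toward 1, capped (illustrative boost rule).
    existing.salience = Math.min(1, existing.salience + 0.5 * incoming);
    return existing;
  }
  const node = { id, salience: incoming };
  graph.set(id, node);
  return node;
}

writeNode("decision:use-sqlite", 0.9, 0.6); // salience ≈ 0.81
writeNode("decision:use-sqlite", 0.9, 0.6); // reinforced, not duplicated
```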

REM phase — decay, prune, refine. Homeostatic decay applies a temporal forgetting curve. Nodes below the salience threshold AND older than the age cutoff get pruned. The graph stays lean and relevant rather than accumulating noise indefinitely.
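A minimal sketch of the decay-and-prune step, assuming an exponential half-life forgetting curve; the half-life, threshold, and age-cutoff values are placeholders, not Myelin’s defaults:

```typescript
// Sketch of REM-phase homeostatic decay followed by pruning of nodes that
// are BOTH below the salience threshold AND older than the age cutoff.

interface MemoryItem {
  id: string;
  salience: number;
  ageDays: number;
}

// Temporal forgetting curve: salience halves every `halfLifeDays`.
function decay(salience: number, ageDays: number, halfLifeDays = 14): number {
  return salience * Math.pow(0.5, ageDays / halfLifeDays);
}

function prune(nodes: MemoryItem[], threshold = 0.2, maxAgeDays = 30): MemoryItem[] {
  return nodes
    .map((n) => ({ ...n, salience: decay(n.salience, n.ageDays) }))
    // Keep a node if it is still salient OR still recent.
    .filter((n) => n.salience >= threshold || n.ageDays <= maxAgeDays);
}

const survivors = prune([
  { id: "rule:naming",    salience: 0.9, ageDays: 7 },  // decays to ~0.64, kept
  { id: "noise:typo-fix", salience: 0.3, ageDays: 60 }, // decays to ~0.015 and old: pruned
]);
console.log(survivors.map((n) => n.id)); // ["rule:naming"]
```

The AND condition matters: a recent low-salience node survives long enough for reinforcement to rescue it, while stale noise is guaranteed to fall away.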

The result is a graph that remembers what matters, forgets what doesn’t, and gets richer over time.

Fully Local NLP

This was a hard constraint from day one. I was not going to build an agent memory system that requires sending your codebase and notes to a third-party API.

Myelin ships with:

  • GLiNER (gliner_small-v2.1) — zero-shot named entity recognition via ONNX runtime. The model takes entity labels as input and scores spans across your text. It finds People, Tools, Decisions, Patterns, Bugs, Initiatives, and more — with no training data specific to your domain.
  • all-MiniLM-L6-v2 — 384-dimensional sentence embeddings for semantic search. ~80MB. Lazy-loaded on first use.
  • sqlite-vec — KNN vector search directly inside SQLite using L2 distance on normalized vectors (equivalent to cosine similarity). Hybrid retrieval: semantic search first, FTS5 keyword fallback.
  • tree-sitter — incremental AST parsing for the code indexing pipeline. Language grammars loaded per-language, ~5MB each.
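The sqlite-vec detail is worth unpacking: for unit vectors, ‖a − b‖² = 2 − 2·cos(a, b), so ranking by ascending L2 distance is identical to ranking by descending cosine similarity. A quick numeric check of the identity:

```typescript
// Why L2 distance on normalized vectors stands in for cosine similarity.

type Vec = number[];

const dot = (a: Vec, b: Vec) => a.reduce((s, x, i) => s + x * b[i], 0);

function normalize(v: Vec): Vec {
  const n = Math.sqrt(dot(v, v));
  return v.map((x) => x / n);
}

function l2Squared(a: Vec, b: Vec): number {
  return a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0);
}

const a = normalize([3, 4, 0]);
const b = normalize([0, 4, 3]);

const cos = dot(a, b);        // cosine similarity of the unit vectors
const dist2 = l2Squared(a, b);

// The identity ||a - b||^2 = 2 - 2·cos(a, b) holds to floating-point precision.
console.log(Math.abs(dist2 - (2 - 2 * cos)) < 1e-12); // true
```

This is why normalizing embeddings at write time is enough — the vector index never needs a cosine mode.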

The total footprint is real but reasonable. And it runs offline, always.

The Knowledge Graph

Under the hood: SQLite with WAL mode, FTS5 virtual tables for full-text search, and sqlite-vec for vector KNN. Everything in one file. No database server. No configuration.

Node types span two domains:

  • Knowledge: Person, Tool, Decision, Pattern, Bug, Initiative, Meeting, Rule, Convention, Concept
  • Code: Class, Method, Interface, Function, Config, File, Enum

Every node has a salience score in [0, 1], a confidence score, a source agent tag, a category, and a namespace for partitioning across repos. Edges use composite keys, so multiple relationship types can exist between the same pair of nodes.
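Those shapes can be sketched as types. The field names below are a plausible reading of the description, not Myelin’s actual schema:

```typescript
// Sketch of the node and edge shapes described above (field names assumed).

type NodeType =
  | "Person" | "Tool" | "Decision" | "Pattern" | "Bug" | "Initiative"
  | "Meeting" | "Rule" | "Convention" | "Concept"
  | "Class" | "Method" | "Interface" | "Function" | "Config" | "File" | "Enum";

interface GraphNode {
  id: string;
  type: NodeType;
  salience: number;    // in [0, 1]
  confidence: number;  // in [0, 1]
  sourceAgent: string; // which agent wrote the node
  category: string;
  namespace: string;   // partitions the graph across repos
}

interface GraphEdge {
  from: string;
  to: string;
  relation: string;
}

// Composite key (from, relation, to): the same pair of nodes can hold
// several relationship types without colliding.
const edgeKey = (e: GraphEdge) => `${e.from}|${e.relation}|${e.to}`;

const edges = new Map<string, GraphEdge>();
const e1 = { from: "auth.ts", relation: "defines", to: "AuthService" };
const e2 = { from: "auth.ts", relation: "imports", to: "AuthService" };
edges.set(edgeKey(e1), e1);
edges.set(edgeKey(e2), e2); // same pair, different relation: both kept
```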

The Copilot CLI Extension

The integration most people will care about: a drop-in extension for the GitHub Copilot CLI that gives every agent persistent memory.

After running myelin setup-extension, your agents automatically get:

  • 5 tools: query (semantic search), boot (load agent-specific context), log (write observations), show (inspect nodes), stats (graph metrics)
  • 4 lifecycle hooks: onSessionStart injects boot context before the agent speaks. onUserPromptSubmitted enriches each message with semantic context. onSessionEnd writes a structured session summary. onErrorOccurred logs errors for consolidation.

The result: your agents boot with their accumulated knowledge already loaded. They log what they learn. Consolidation extracts it. The graph enriches the next boot. It’s a closed loop.

What Self-Evolving Actually Means

I want to be precise about this claim because it gets thrown around loosely.

An agent in Myelin is self-evolving in this specific sense: it produces structured logs of its own observations and decisions → consolidation runs (on a schedule, or manually) → NER and relationship extraction build graph nodes from those logs → salience scoring determines what’s important → the graph is enriched → the next boot injects that enriched context back into the agent.
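The whole loop fits in a few lines once every stage is stubbed out. Only the shape of the cycle comes from the description above; the IDs and signal values here are placeholders:

```typescript
// End-to-end sketch of the consolidation loop with stubbed stages.

interface SessionLog { agent: string; entries: string[]; }
interface MemNode { id: string; salience: number; }

// NER + relationship extraction (stubbed): one node per log entry.
const extract = (log: SessionLog): MemNode[] =>
  log.entries.map((_, i) => ({ id: `${log.agent}:${i}`, salience: 0 }));

// Salience scoring with stubbed importance/novelty signals.
const score = (n: MemNode): MemNode =>
  ({ ...n, salience: 0.7 * 0.8 + 0.3 * 0.5 });

let graph: MemNode[] = [];

// Consolidation enriches the graph from the agent's own logs.
function consolidate(log: SessionLog): void {
  graph = graph.concat(extract(log).map(score));
}

// The next boot injects the enriched context back into the agent.
function boot(agent: string): MemNode[] {
  return graph.filter((n) => n.id.startsWith(agent));
}

consolidate({ agent: "chief-of-staff", entries: ["Decided to use SQLite"] });
console.log(boot("chief-of-staff").length); // 1
```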

It’s not magic. It’s a well-defined loop with neuroscience-inspired scoring at the core. The agent doesn’t get smarter in the model sense. But it accumulates institutional memory in a way that makes each session more grounded than the last.

My chief of staff agent has been running in this loop. She knows my team, my recurring meetings, my preferred communication patterns, my project priorities — not because I told her, but because she logged it herself and consolidation surfaced it.

Getting Started

Myelin is open source under Apache-2.0.

# Install globally
npm install -g github:shsolomo/myelin

# Set up the Copilot CLI extension
myelin setup-extension

# Parse your codebase
myelin parse --dir ./src --language typescript --namespace myrepo

# Ingest documents
myelin ingest ./docs/architecture.md

# Run consolidation
myelin consolidate --agent myagent

GitHub: github.com/shsolomo/myelin

I built this because I needed it. I’m open-sourcing it because I’m confident I’m not the only one who does.

If you’re building AI agents and you’re tired of re-briefing them every session, this is the tool I wish I’d had when I started.

What’s Next

Myelin is shipped and running in production, but the roadmap is deep. Here’s where it’s headed:

Phase 1 — Stabilize: Windows + Node 24 install fix, consolidation resilience (locking, backup, integrity checks), and getting-started documentation for beta testers.

Phase 2 — Procedural Memory: This is the big one. Right now, all knowledge decays over time — that’s by design; it prevents noise accumulation. But some knowledge should be permanent: rules, conventions, identity. Phase 2 adds pinned nodes, graduation criteria (nodes that survive enough consolidation cycles get promoted to constitutional memory), and integration with Ian Philpot’s prefrontal system POC for drift detection.

Phase 3 — Smarter Extraction: Hybrid NER + LLM relationship extraction for higher-quality edges. Perspective-aware retrieval so each agent’s boot context is tuned to their domain. Two-pass consolidation that scans session transcripts to catch what logs miss.

Phase 4 — Multi-User Scale: Sensitivity classification on nodes, secret scanning before consolidation, and a merge strategy for shared team graphs. This is the long game — Myelin working for teams, not just individuals.

Each phase maps to a GitHub milestone. The full issue breakdown with dependency graphs is in the repo.

Contributing

Issues, PRs, and feedback welcome. The graph is ready.


  1. Kim, J., & Park, M. (2025). Systems memory consolidation during sleep: oscillations, neuromodulators, and synaptic remodeling. BMB Reports, 58(10), 425-436.