Everything agents need to manage memory.
Architecture
How it fits in your stack.
Agents Memory runs as a managed service you call over HTTP. It connects to your agent logic — not directly to your LLM. You call it before and after each LLM interaction.
At write time, your agent sends interactions to Agents Memory: messages, events, extracted facts. These are processed, scored, and stored — in session state or long-term storage depending on content and configuration.
At read time, your agent queries Agents Memory before constructing its prompt. The API returns ranked memory items with metadata. You decide how to inject them.
Request flow
Your Agent
↓ retrieve(user_id, query)
Agents Memory API
↓ Conversation + Visitor/Merchant
Ranked context returned
↓ ranked context
Your Agent builds prompt
↓ prompt + context
LLM (OpenAI / Anthropic / …)
↓ response
Your Agent calls add()
Agents Memory API
Sub-200ms total overhead per round-trip
The Digital Brain
Every region mapped to a real system.
Agents Memory's architecture is a direct analogy of biological memory. Each brain region that handles a distinct memory function maps to a purpose-built subsystem.
Prefrontal Cortex
Working Memory
Session state, last N turns, auto-TTL. Fast hot storage for active context.
Neocortex
Frequency tracking and confidence scoring on every stored fact.
Sleep Consolidation
Auto-triggers after every ingest, merging conversation data into persistent profiles.
Forgetting Curve
Importance fades over time; frequency protects strong neural pathways from decay.
Amygdala
Importance score (0.0–1.0) ensures high-impact events persist longer than low-signal noise.
Core Capabilities
Five memory primitives. One API.
Session Memory
01
Context that survives the conversation.
Session memory captures the full thread of a conversation — what the user asked, what the agent responded, what actions were taken. It's scoped to a session ID and persists beyond the LLM's context window.
Use it to maintain coherent multi-turn conversations, resume interrupted sessions, and pass relevant history to downstream agents in a pipeline.
Key behaviors
Automatic message threading
Token-aware summarization for long sessions
Session expiry controls
Resumable across deployments
Long-term Memory
02
What the agent should remember forever.
Long-term memory stores facts, preferences, and conclusions that should persist across sessions. It accumulates over time and is not tied to a single conversation.
Agents Memory applies scoring to determine what deserves long-term retention versus what's transient. You can also write to long-term memory explicitly via API.
Key behaviors
User-attributed fact storage
Memory scoring and relevance weighting
Manual and automatic write modes
Configurable retention policies
User Memory
03
A persistent profile for every user.
User memory is a structured, evolving record of what an agent knows about a specific user: preferences, past behavior, stated goals, and inferred patterns.
It's separate from conversation history. It answers the question: what do I know about this person? — not what did we talk about last time?
Key behaviors
Per-user memory namespacing
Preference and trait extraction
Exportable and deletable for GDPR/CCPA
Versioned updates — no silent overwrites
Temporal Fact Tracking
04
Facts that know when they're true.
Every fact is stored with valid_from and valid_to timestamps. When a new fact contradicts an existing one, Agents Memory detects the conflict, versions the old fact, and surfaces the current truth.
This eliminates the silent drift problem — where an agent confidently cites outdated information. The full version history is queryable so you can see exactly how a fact evolved over time.
Key behaviors
Contradiction detection on every ingest
valid_from / valid_to timestamp tracking
Full version history per fact
Confidence scoring (0.0–1.0) with each write
Retrieval Engine
05
Relevant context, ranked and ready.
The retrieval engine surfaces the right memory at the right time. Given a query or current message, it returns ranked memory items — a mix of recent history, long-term facts, and user context — formatted for prompt injection.
It's not raw similarity search. It combines semantic relevance with recency, user scope, and memory type weighting.
Key behaviors
Hybrid retrieval (semantic + recency + type)
Configurable context window budget
Returns memory with source metadata
Streaming support for real-time agents
Positioning
Not a replacement. The missing piece.
LLMs generate responses. Vector databases store and search embeddings. Agents Memory manages what your agent remembers and when.
If you're already using Weaviate, Pinecone, or pgvector for document retrieval — keep using them. Agents Memory handles a different problem: agent state, user context, and conversational memory. These are complementary.
If you don't have a vector store, you don't need to add one. Agents Memory handles its own storage and retrieval internally.
LLM
Generates responses
Vector DB
Document retrieval (RAG)
Agents Memory
Persistence, user context, statefulness
Example Workflow
A typical agent interaction.
1
User sends message
Your agent receives it
2
Agent calls .retrieve()
Gets ranked context: preferences, session summary, long-term facts
3
Build the prompt
Inject context + current message, call your LLM
4
LLM responds
Agent sends response to user
5
Agent calls .add()
Exchange stored, scored, and updated in memory
Start building agents that remember.
Free tier available. No credit card required.
By signing up, you agree to our Terms of Service and Privacy Policy.