Technology — Gnosis Memory

How Gnosis Works

Gnosis is a remote cloud MCP server. Your AI assistant connects to it over HTTPS, stores memories, and retrieves them via semantic search. There is no local install, no Docker, no vector database to manage. Just a URL.

Under the hood, four pieces of engineering make this work well.

Topic-Landscape Architecture

The core problem with AI memory is discovery: how does the AI know what you've stored without reading everything? Most memory systems solve this with random samples — dump a block of arbitrary memories into context and hope something relevant shows up.

Structured Discovery, Not Random Sampling

At session start, init_core_memories returns a structured topic landscape — a complete map of your knowledge organized into macro-topic clusters with memory counts and active task counts, topic keywords with density, type distributions, active tasks with progress indicators, and your behavioral preferences. Every part of the response is purposeful. Nothing is random. Nothing is wasted.

Compare that to systems that spend their token budget on a random grab bag of memories that miss the mark most of the time and give the AI no way to search for what it actually needs. Gnosis gives your AI a map and a search engine. Other systems give it a handful of confetti.

How the topic landscape is structured

The landscape is a compressed representation of your entire memory corpus. It includes:

Macro-topic clusters with memory counts and active task counts — your AI sees gnosis(1501, 3 tasks) and knows there's a large body of knowledge with open work items on that topic
Topic keywords with density counts — a flat list of every topic in your corpus, ranked by frequency, so the AI knows what search terms will find results
Type distributions — how many facts, decisions, tasks, and preferences exist, giving the AI a sense of corpus shape
Active tasks with progress indicators — incomplete work surfaces automatically
Behavioral preferences — communication style, workflow rules, and constraints that shape the AI's behavior from turn one

The format is content-agnostic. It works identically whether your corpus is deeply technical, personal, or a mix. The AI doesn't need to see your memories to know what you've stored — the AI navigates the landscape and searches when it needs specifics.

Preview-then-retrieve: how search stays efficient

When your AI searches, Gnosis returns compressed previews — not full memories. Your AI scans the previews, picks the ones it needs, and retrieves only those in full:

Breadth first — 32 previews fit in roughly 400 tokens. Most memory systems return 10 full chunks at 3,000–5,000 tokens — most of which the AI ignores
Depth on demand — your AI reads the previews, identifies the 2–3 it actually needs, and retrieves only those. The AI decides what to read, not the system
Piggybacked initialization — the first search can ride along with the init call, eliminating a network round-trip and the token overhead of a separate tool invocation

The key insight is who decides what to read. In most memory systems, the system guesses which chunks are relevant. In Gnosis, your AI makes that choice — informed by previews that are cheap enough to scan in bulk.

Quality By Design

Gnosis has no server-side LLM. It is a GDPR data processor — it stores what it's told to store. So how did an internal audit find 99.8% of stored content grading B+ or better?

Protocol-Guided Quality

Every MCP tool description is deeply refined to guide the calling LLM toward writing structured, specific, searchable memories. The descriptions encode creation guidelines, topic conventions, type taxonomy, and quality heuristics directly into the tool schema — the same schema the LLM reads before deciding what to write.

This works even with small 8B-parameter models. The quality isn't enforced by a gate — the interface itself guides the AI toward good output. Combined with server-side deduplication that prevents redundant storage, the result is a clean memory corpus without any content filtering.

What makes a high-quality memory

Every memory is guided toward a specific structure:

Front-loaded summary — the first 50 characters must be an executive summary that stands alone. An AI deciding whether to retrieve a memory reads only this preview — if the summary is vague, the memory is effectively invisible
Type discipline — each memory is classified as a fact, preference, decision, path, or task. Each type has structural requirements that make it searchable in predictable ways
Topic keywords — single lowercase words chosen to match future search queries. If searching "redis" should find a memory about caching, "redis" must be in the topics — even if the memory is primarily about sessions
Self-contained content — present tense, includes the rationale, names the subject. No memory should require reading another memory to understand

These conventions aren't enforced by a server-side filter. They're embedded in the tool descriptions that every LLM reads before calling memory_add. The LLM follows them because the interface makes them the path of least resistance. Internal audit across 2,095 memories: 99.8% graded B+ or better. Only 0.2% needed improvement.

Meeting models halfway

The protocol design guides quality on the input side. But what about when a model gets the format wrong despite having the right intent?

Small models — and even frontier models having a bad day — regularly produce close-but-wrong tool calls: memory_replace instead of memory_edit, new_content instead of content, topics as a comma-separated string instead of an array, or arrays double-serialized as JSON strings. Each mistake normally wastes a round trip — the model gets an error, spends tokens understanding it, and tries again.

Gnosis intercepts these at a normalization layer before they reach the service. Hallucinated tool names map to the correct tool. Wrong parameter names are remapped. Malformed arrays are recovered. Invalid type values resolve to their closest valid equivalent. Production logs show 30+ distinct error patterns that this layer catches, each one a saved round trip.

How deduplication works

Two-tier deduplication prevents your memory corpus from filling with redundant entries:

Hash fast-path — exact text matches are caught instantly by content hashing. Zero overhead, zero false positives
Semantic similarity — new memories are embedded and compared against existing memories. Above a similarity threshold, the duplicate is rejected and the existing memory is returned so the AI can update it instead
Update, don't duplicate — when a duplicate is caught, the AI receives the existing memory's ID and content. The AI can refine or replace the existing memory rather than creating a near-duplicate — the corpus grows in accuracy, not noise
No false suppression — semantic similarity uses a conservative threshold to avoid blocking genuinely new memories that happen to be on a similar topic. Better to store a near-duplicate than silently drop new knowledge

The result: a corpus that gets more accurate over time as redundant entries are caught and consolidated.

Your AI Stays in Control

Most memory services are black boxes. Your data goes in, something happens behind the scenes, and you hope for the best. You can't see what was stored, can't see what was silently dropped, and can't see what a server-side model decided was “important enough” to keep.

AI-Directed Storage

Your AI decides what to remember. The AI chooses what to store, how to categorize it, and what to search for. Gnosis never touches that editorial judgment — what goes into your memory is between you and your AI.

The tool-use problem. Getting LLMs to reliably use external tools is one of the hardest problems in AI integration. LLMs don't always call tools when they should. They sometimes hallucinate tool capabilities. They lose track of available tools as conversations grow long and context fills up.

Protocol as guidance. Gnosis solves this through deeply refined tool descriptions that align with how LLMs naturally make decisions. Rather than fighting the model's behavior, the protocol makes good memory practice the path of least resistance. The AI stores what matters because the interface makes it easy to store well and hard to store poorly.

Retrieval That Earns Trust

Speed is table stakes. Trust is the real challenge. Semantic search with cross-encoder reranking delivers sub-100ms results — but if those results are wrong, speed makes the problem worse, not better.

The memory poisoning problem. A factually incorrect memory doesn't just give a wrong answer once — the incorrect memory poisons future conversations. The AI treats retrieved memories as ground truth: if a memory says "use library X" and library X was deprecated six months ago, the AI will confidently recommend it, argue for it, and build on it. One bad memory cascades through every session that retrieves it.

Silence over noise. This is why Gnosis returns nothing rather than returning low-confidence matches. Your AI learns that when Gnosis returns a result, the result is worth reading — and when Gnosis returns nothing, the information genuinely isn't stored, not that the search failed.

Full Transparency

Every operation is visible. Every memory_add call appears in your conversation as a visible tool call. Every search result comes back where you can read it. If your AI stores something wrong, you see the mistake and correct it on the spot.

Why this matters. Memory services that operate invisibly — silently extracting, filtering, or modifying what gets stored — create a system where nobody can audit what the AI "knows." When the AI makes a mistake because of a bad memory, you can't trace the cause. When the service silently drops something important, you don't notice until the AI forgets it.

Errors are caught at the source. With Gnosis, you see the memory being created, you see the content, you see the topics assigned. If the AI stores something wrong — and it will, because all LLMs make mistakes — you catch the error immediately and correct it. The correction replaces the bad memory. The system gets more accurate over time because mistakes are visible.

Data Sovereignty

Processor, not controller. Under GDPR, Gnosis acts on instructions from you and your AI, never on its own judgment. There is no server-side LLM deciding what's "important enough" to keep. No invisible filtering. No editorial layer between your AI and your memories.

Full authority stays with you. You control what gets stored, how memories are organized, when they're deleted, and where they go. One-click export gives you your entire memory corpus as JSON. Account deletion is permanent and auditable. Your memories are encrypted with keys derived from your own credentials — Gnosis holds the data, but you hold the keys.

Constraints that protect. This is a GDPR architecture decision, not a product limitation. A processor that doesn't inspect content can't be compelled to filter content. With Advanced Protection enabled, a system that can't decrypt your data can't be ordered to disclose it. Default-tier users have strong encryption with two-party custody — valid legal process can reach that content.

Token Efficient

Context windows are expensive. Every token spent on memory infrastructure is a token your AI can't use for reasoning. Gnosis is designed to minimize overhead while maximizing the signal your AI receives.

Compressed Topic Landscape

Instead of dumping random memories into context, Gnosis returns a structured map of your knowledge — topic clusters with counts and type distributions. Your AI searches intelligently from this map instead of scanning everything. Less context, better results.

Refined tool descriptions guide LLMs to write memories that are already well-structured and searchable. This means fewer retrieval round-trips, less redundant storage, and higher hit rates on the first search. The efficiency compounds: better memories in means fewer tokens spent finding them later.

Where the token savings come from

Efficiency isn't one optimization — it's a stack of them, each compounding on the last:

Topic landscape vs random injection — structured discovery replaces the common approach of injecting random memory samples. The AI gets a map of everything you know, not a random handful
Preview-then-retrieve — 32 compressed previews in ~400 tokens vs 3,000–5,000 tokens for 10 full chunks from a typical memory system. An order of magnitude less context for better results
Optimized response formats — every response format is tuned for the widest range of models. The challenge isn't just compression — the challenge is finding formats that small models parse as reliably as large ones. No nested JSON, no schema negotiation
Field-specific updates — when refining an existing memory, only the changed fields are rewritten. A topic adjustment costs a fraction of regenerating the entire memory
Piggybacked initialization — the first search can ride along with the init call, eliminating a network round-trip and the overhead of a separate tool invocation

Production search latency is sub-100ms p95. Topic landscape initialization delivers a complete knowledge map in a design target of ~150 tokens, where comparable systems use ~800 tokens of random samples.

The search pipeline

When your AI searches, a multi-stage pipeline finds the best matches in sub-100ms — less time than one thinking token from your LLM:

Embedding — the search query is converted to a vector representation in the same mathematical space your memories live in. The embedding model supports 100+ languages natively — cross-lingual search works without translation
Vector similarity — finds the nearest memories by meaning, not keywords. A search for "database performance" finds memories about query optimization even if they never use the word "performance"
Topic matching — a parallel path that finds memories by their topic tags, catching results that vector search might rank lower
Reciprocal rank fusion — merges the vector and topic results into a single ranked list, combining the strengths of both approaches
Cross-encoder reranking — a dedicated model reads each candidate alongside your query and scores relevance directly. Like the embedding model, the reranker supports 100+ languages. More accurate than vector similarity alone, because the reranker sees the full text of both query and memory together

The pipeline is adaptive. Small result sets skip reranking entirely — no point scoring 3 candidates when Gnosis can return them all. A confidence floor rejects low-quality matches rather than always returning something. Your AI learns that results from Gnosis are worth reading, and that an empty result means the information genuinely isn't stored.

Encryption Architecture

Memory content is encrypted at rest using AES-256-GCM with per-user keys. Gnosis uses a tiered model. Default tier: session keys are derived from your OAuth credentials by the Cloudflare Worker and delivered to the storage service per-request — they are ephemeral at the storage layer and never persisted there. Gnosis and Cloudflare acting together can derive these keys (two-party custody), so default-tier content is reachable by legal process. Advanced Protection (user PIN, OPAQUE zero-knowledge): Gnosis cannot decrypt — an architectural constraint, not a policy promise.

Vector embeddings are stored unencrypted because similarity search requires mathematical operations on raw vectors. Embeddings are lossy, non-reversible projections — useful for matching, but the original text cannot be directly decoded from an embedding.

What this means for search

Encrypting content at rest has a deliberate consequence for how search works. Traditional approaches are off the table:

No full-text search — keyword matching (BM25) requires a plaintext index. Encrypted content can't be indexed. There is no searchable plaintext copy of your memories anywhere in the system
No lexical fallback — most search systems fall back to keyword matching when semantic search misses. Gnosis can't do that — the entire retrieval path runs on vector similarity and cross-encoder reranking
Decryption only at delivery — content is decrypted only for the final results your AI actually receives. The search pipeline itself never sees plaintext — the pipeline operates on vectors and scores

The tradeoff is explicit: search quality depends entirely on embedding quality and reranker accuracy. In exchange, your memory content is never exposed in a searchable index. This is why Gnosis invests heavily in reranker quality — the reranker isn't an optional refinement, it's the only semantic layer between your query and your memories.

Compliance advantages

Encryption at rest provides concrete legal protections beyond the security benefit:

GDPR Article 25 — privacy by design and by default
GDPR Article 32 — encryption at rest is explicitly listed as an appropriate technical measure
GDPR Article 34(3)(a) — encrypted data breaches do not require individual user notification
US state safe harbors — multiple state breach notification laws exempt encrypted data

The encryption architecture was designed from day one to support SOC 2 and HIPAA certification. The remaining work is auditing and certification, not redesign.

Full details on our Security page, including threat model and what Gnosis does and doesn't protect against.

Cross-Platform by Default

MCP is an open protocol, and Gnosis implements Streamable HTTP transport with OAuth 2.1 auto-discovery. Most clients just need the URL — https://gnosismemory.com — and handle the rest automatically.

Currently verified across 13+ clients: Claude, ChatGPT, Gemini, Cursor, VS Code, Copilot CLI, Cline, Roo Code, OpenCode, Vibe, Goose, grok-cli, and mcp-remote as a universal bridge. Clients that don't support native HTTP can use mcp-remote as a stdio-to-Streamable-HTTP adapter.

Why cross-platform is harder than it sounds

MCP is a standard, but every client implements it differently. Making one server work reliably across 13+ clients means solving compatibility problems that the protocol specification doesn't cover:

Transport string fragmentation — VS Code expects http, Cline expects streamableHttp, Roo Code expects streamable-http, and others use their own variants. The wrong string causes silent failures with no error message
Auth flow differences — some clients support OAuth auto-discovery natively, others need manual token configuration, and others use bridge adapters like mcp-remote to translate between transport types
Config format fragmentation — Claude uses claude_desktop_config.json, VS Code uses .mcp.json, Gemini CLI uses settings.json with different field names for the same concepts
Mobile sync behavior — ChatGPT mobile inherits MCP config from the web interface automatically, Claude mobile syncs from your account settings. Each platform handles sync differently

None of this is glamorous engineering. The work is testing every client, documenting every config format, and handling every edge case. The result is that your memories follow you across devices and providers because Gnosis has already solved the compatibility problems you'd otherwise hit yourself.

Small models welcome

The entire interface was designed and tested with models as small as 8 billion parameters running on consumer GPUs. Natural language descriptions instead of rigid JSON schemas. Active instructions instead of passive suggestions. Three required fields, server-computed everything else.

Beyond the interface design, an intent acceptance layer catches common formatting mistakes that would otherwise waste a round trip: hallucinated tool names route to the correct tool, misspelled parameter names are remapped, and malformed arrays are recovered from double-serialization. All three required fields still need to be present — the normalizer fixes how they're formatted, not whether they exist.

If your AI can read a tool description and fill in three fields, it can use Gnosis. No frontier model required.

How portability works under the hood

Your memory corpus is stored centrally, authenticated by your OAuth identity. The data format is designed for portability:

Encrypted content is just bytes — the encrypted payload is storage-agnostic. Your data can be exported, backed up, or migrated to a different backend without re-encryption
Vectors are standardized floats — the embedding format is the same mathematical representation used across the industry. Not tied to a proprietary index
Keys stay with you — encryption keys are derived from your identity, not from the storage system. Your data remains yours regardless of where the data is hosted

One-click export downloads your entire corpus as JSON. You can inspect your data, back it up, or take it somewhere else.