Team memory
Team memory is the difference between “another AI agent” and “an AI agent that gets your codebase.”
The problem it solves
When a developer uses Copilot, Cursor, or Claude individually, each session is isolated. The architectural decisions made in last week’s meeting aren’t in anyone’s AI context. Coding conventions discovered by one dev aren’t shared with anyone else’s AI. The “why we chose X over Y” reasoning is forgotten the moment the conversation ends.
Teams collaborate in human channels (meetings, Slack, Teams). But the AI side of the modern dev workflow is single-player. Cascade fixes that with team memory.
How it works
When you run cascade init, Cascade creates a team-memory/ directory containing a README.md and five starter files:
    team-memory/
    ├── README.md
    ├── conventions.md   ← coding style, file layout, naming
    ├── decisions.md     ← architectural decisions and why
    ├── glossary.md      ← domain terms unique to your business
    ├── constraints.md   ← performance, security, compliance
    └── prior-work.md    ← summaries of recently shipped stories

Every Cascade stage reads relevant excerpts from these files as grounding context for its LLM calls. So:
- The story extractor knows your domain glossary and won’t invent new names for existing concepts.
- The planner knows what’s already been built and avoids duplicating it.
- The coder knows your conventions and produces code that fits your style, not generic textbook style.
- The tester knows your test patterns and produces tests that match.
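As a rough illustration of the first step, here is a minimal sketch of loading the starter files into memory before any stage uses them. The helper name `load_team_memory` and the `MEMORY_FILES` list are hypothetical, chosen for this example; they are not Cascade's actual API.

```python
from pathlib import Path

# The five content files created by `cascade init` (README.md is left out here).
MEMORY_FILES = [
    "conventions.md",
    "decisions.md",
    "glossary.md",
    "constraints.md",
    "prior-work.md",
]


def load_team_memory(root: str = "team-memory") -> dict[str, str]:
    """Hypothetical helper: return {filename: text} for the files that exist."""
    memory: dict[str, str] = {}
    for name in MEMORY_FILES:
        path = Path(root) / name
        if path.is_file():
            memory[name] = path.read_text(encoding="utf-8")
    return memory
```

Missing files are simply absent from the result, so a partially populated directory still loads.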
What to put in each file
conventions.md
Be specific. Examples beat abstractions.
    - Python: snake_case for functions, PascalCase for classes
    - File layout: API routes in src/api/routes/, models in src/models/
    - Database: singular table names (user, not users)
    - Error handling: raise specific exceptions, never bare except
    - No global mutable state
    - Type hints required on all public functions

decisions.md
ADR-style log. What you chose, why, and what alternatives you rejected.
    ## [2026-01-15] Postgres over MongoDB
    We needed schema enforcement and relational guarantees. Postgres
    won. All new persistence uses SQLAlchemy + Alembic migrations.
    We don't introduce new MongoDB anywhere.

glossary.md
Terms unique to your product or codebase.
    **Workspace**: A user's top-level container. Each user can have many
    workspaces. Not a folder, not a UI panel.

    **Run**: One execution of a workflow. Has start time, end time, status,
    and produces artifacts.

constraints.md
Non-functional requirements that affect every design decision.
    ## Performance
    - API response time p99 under 200ms
    - Max 5 DB queries per request
    - No N+1 queries

    ## Security
    - All API endpoints require auth except /health
    - No PII in logs
    - All secrets via env vars, never in code

prior-work.md
Brief summaries of shipped work so Cascade doesn’t duplicate it.
    ## [2026-04-22] Cursor pagination on /api/users
    Added cursor-based pagination with ?limit and ?after.
    Pattern lives in src/api/pagination.py. Extending to other
    endpoints? Use the same pattern. Don't introduce offset-based.

How much detail is enough
Aim for at least 50 substantive lines across all five files before relying on Cascade for real work. Below that threshold, the LLM is mostly working from generic best practices. Above it, you’ll see the difference.
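One way to check the 50-line threshold is sketched below. The text does not define "substantive", so this example assumes it means any line that is neither blank nor a Markdown heading; both function names are hypothetical.

```python
def count_substantive_lines(text: str) -> int:
    """Count lines that are neither blank nor Markdown headings (an assumed definition)."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def meets_threshold(files: dict[str, str], minimum: int = 50) -> bool:
    """True when the files together contain at least `minimum` substantive lines."""
    total = sum(count_substantive_lines(text) for text in files.values())
    return total >= minimum
```

Under this definition, a heading-only template counts as zero, which matches the intent that empty starter files should not count toward the total.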
How team memory grows
In v0.1, you maintain it manually. After every architecture meeting, add to decisions.md. After every PR merges, summarize it in prior-work.md. Whenever a new domain term emerges, add it to glossary.md.
In v0.2 and beyond, Cascade will suggest updates automatically based on the meetings it processes and the PRs that merge. That’s on the roadmap; for now, treat team memory as living documentation that you tend to.
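Until automatic suggestions arrive, the manual upkeep can be scripted. The sketch below formats an entry with the dated `## [YYYY-MM-DD] Title` heading used in the decisions.md and prior-work.md examples and appends it to a file. The helpers `format_decision` and `append_entry` are hypothetical and not part of Cascade.

```python
from datetime import date


def format_decision(title: str, body: str, on: date) -> str:
    """Format one entry with a `## [YYYY-MM-DD] Title` heading."""
    return f"## [{on.isoformat()}] {title}\n{body.strip()}\n"


def append_entry(path: str, entry: str) -> None:
    """Append an entry to a memory file, separated from earlier text by a blank line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write("\n" + entry)
```

For example, `append_entry("team-memory/decisions.md", format_decision("Postgres over MongoDB", "We needed schema enforcement.", date(2026, 1, 15)))` adds one dated entry to the end of the log.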
Keeping context windows honest
Naively dumping every team-memory file into every LLM call would bloat prompts, slow responses, raise costs, and dilute the model’s attention across irrelevant material. Cascade avoids that by treating team memory as a bounded, ranked context budget rather than a raw blob.
The rules:
- Hard character budget. Team memory is capped at 20,000 characters per LLM call by default (roughly 5,000 tokens, a small fraction of any modern model’s context window). Configurable via memory.max_chars_per_call in cascade.yaml.
- Per-file proportional truncation. When your files exceed the budget, each file is truncated to its proportional share so one bloated file can’t eat the whole window.
- Stage-aware selection. Different pipeline stages prioritize different files. The extractor weights glossary.md; the planner reaches for decisions.md and prior-work.md; the coder leans on conventions.md and constraints.md. No stage receives the entire library.
- Empty files are skipped. A starter template with no real content costs zero context.
- Structured grounding, not raw paste. Cascade sends the files as a labeled grounding block (headings, bullets, ADR entries) so the model sees signal, not markdown formatting noise.
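The budget and truncation rules above can be illustrated with a short sketch. It assumes "proportional share" means a share proportional to each file's length, and it truncates by keeping the start of each file; Cascade's actual strategy may differ. The function name and the `max_chars` parameter are hypothetical, with the default mirroring memory.max_chars_per_call.

```python
def truncate_to_budget(files: dict[str, str], max_chars: int = 20_000) -> dict[str, str]:
    """Fit files into max_chars, giving each file a share proportional to its size."""
    # Skip files with no real content so they cost nothing.
    non_empty = {name: text for name, text in files.items() if text.strip()}
    total = sum(len(text) for text in non_empty.values())
    if total <= max_chars:
        return non_empty
    result: dict[str, str] = {}
    for name, text in non_empty.items():
        # Integer floor keeps the combined length within the budget.
        share = max_chars * len(text) // total
        result[name] = text[:share]
    return result
```

With a 30,000-character file and a 10,000-character file under the default budget, this yields 15,000 and 5,000 characters respectively, so the total stays at 20,000.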
The result: team memory grows over time without prompts growing with it. A team with 200 KB of accumulated decisions and conventions still produces a tight, focused prompt on every call.
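Stage-aware selection and labeled grounding can be sketched in the same spirit. The mapping below simplifies the "prioritize" behavior described above into a hard selection, and the `### filename` label format is an assumption made for this example. The tester stage's file priorities are not stated in the text, so it is omitted.

```python
# Assumed stage-to-file priorities, taken from the rules above.
STAGE_FILES = {
    "extractor": ["glossary.md"],
    "planner": ["decisions.md", "prior-work.md"],
    "coder": ["conventions.md", "constraints.md"],
}


def build_grounding_block(stage: str, files: dict[str, str]) -> str:
    """Join the stage's non-empty files into one labeled block."""
    sections = []
    for name in STAGE_FILES.get(stage, []):
        text = files.get(name, "").strip()
        if text:
            # Label each section with its source file (assumed format).
            sections.append(f"### {name}\n{text}")
    return "\n\n".join(sections)
```

Because each stage only pulls its own files and empty ones are dropped, the block stays small even as the overall library grows.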