@leanlabsinnov/codegraph
Live, queryable knowledge graph for your codebase. Indexes JS/TS into an embedded graph DB with embeddings and exposes an MCP server Claude Code and Cursor can call.
CodeGraph
Turn any JS/TS/Python codebase into a live, queryable knowledge graph — then give your AI assistant a way to navigate it.
CodeGraph parses your repository with tree-sitter, stores the resulting symbols and edges in an embedded Kuzu graph database alongside vector embeddings, then exposes a local MCP server that Claude Code, Cursor, and Windsurf can call to answer structural questions about your code.
Zero infrastructure. The graph lives at ~/.codegraph/. No Docker, no external services, no cloud.
What you can ask
Once connected, ask your AI assistant questions like:
- “What calls `useAuth` in this repo?”
- “Show me the full component tree rooted at `App`.”
- “What’s the blast radius of renaming `formatPrice`?”
- “Find all symbols semantically similar to ‘JWT auth helper’.”
- “What are the transitive dependencies of `src/lib/db.ts`?”
Behind the scenes, the assistant picks from 10 typed MCP tools that translate to Cypher queries against your indexed graph — no LLM hallucination about your code structure.
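To make the tool-to-Cypher translation concrete, here is a minimal sketch of how a tool such as `find_callers` might build a parameterized query. Only the `CALLS` edge name comes from this README. The `Symbol` node label, the `name` and `path` properties, and the `buildFindCallersQuery` helper are assumptions for illustration, and the queries CodeGraph actually runs may look different.

```typescript
// Hypothetical sketch: turn a typed tool call into a parameterized Cypher query.
// The Symbol label and the name/path properties are assumed, not taken from CodeGraph's schema.
interface CypherQuery {
  text: string;
  params: Record<string, string>;
}

function buildFindCallersQuery(symbolName: string): CypherQuery {
  // The $name parameter keeps user input out of the query text itself.
  const text = [
    "MATCH (caller:Symbol)-[:CALLS]->(callee:Symbol)",
    "WHERE callee.name = $name",
    "RETURN caller.name AS caller, caller.path AS path",
  ].join("\n");
  return { text, params: { name: symbolName } };
}

const callersQuery = buildFindCallersQuery("useAuth");
```

Because the tool input only ever travels as a parameter, the query shape stays fixed no matter what the assistant passes in.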
Install
npm i -g @leanlabsinnov/codegraph
Requires Node.js 20+. Works on macOS, Linux, and Windows.
Quickstart
The fastest way to get started is codegraph run — a single command that handles setup, indexing, and serving:
codegraph run ~/my-project # setup + index + serve
codegraph run ~/my-project --watch # …and auto re-index on file changes
It will prompt for an LLM provider and API key if you haven’t configured one yet, run a quick self-test, incrementally index the repo, and boot the MCP server.
Manual setup (step by step)
# 1. Pick an LLM provider
codegraph config llm set byo-openai # also: byo-anthropic, byo-google, local-ollama
export OPENAI_API_KEY=sk-...
# 2. Verify the connection (5-token gen + 1 embedding round-trip)
codegraph config llm test
# 3. Index a repo — parses, extracts symbols/edges, and embeds everything
codegraph index ~/my-project
# 4. Boot the MCP server
codegraph serve
# → MCP server: http://127.0.0.1:3748/mcp
# → Bearer token: see ~/.codegraph/config.json
Then point your AI client at http://127.0.0.1:3748/mcp with the bearer token. See docs/clients.md for copy-paste config snippets for Claude Code, Cursor, and Windsurf.
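As a rough illustration of what a client needs in order to connect, the sketch below assembles the server URL and a standard HTTP bearer `Authorization` header. The `McpEndpoint` shape and the `buildMcpEndpoint` helper are hypothetical, and the token value is a placeholder. The exact config format for each client is in docs/clients.md.

```typescript
// Hypothetical helper: assemble the URL and Authorization header for the local MCP server.
interface McpEndpoint {
  url: string;
  headers: Record<string, string>;
}

function buildMcpEndpoint(token: string, host = "127.0.0.1", port = 3748): McpEndpoint {
  const trimmed = token.trim();
  if (trimmed === "") {
    throw new Error("Bearer token is empty; check ~/.codegraph/config.json");
  }
  return {
    url: `http://${host}:${port}/mcp`,
    // Standard HTTP bearer scheme: "Authorization: Bearer <token>".
    headers: { Authorization: `Bearer ${trimmed}` },
  };
}

// Placeholder token for illustration only.
const endpoint = buildMcpEndpoint("example-token");
```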
Commands
| Command | Description |
|---|---|
| `codegraph run <path>` | All-in-one: setup, incremental index, serve. Add `--watch` to auto re-index on changes |
| `codegraph run <path> --watch` | Same as above, plus a file watcher that re-indexes after a 2s debounce |
| `codegraph index <path>` | Walk the repo, parse JS/TS/Python, embed every symbol, write to the graph |
| `codegraph index <path> --incremental` | Only re-index files that changed since the last run |
| `codegraph index <path> --no-embed` | Parse only: faster, but semantic search is disabled |
| `codegraph status <path>` | Node/edge counts and embedding coverage for the indexed repo |
| `codegraph wipe [path]` | Delete a repo’s graph rows (`--yes` skips confirmation), or the whole graph dir |
| `codegraph serve [--port N] [--host H]` | Boot the MCP server (default port 3748) |
| `codegraph doctor` | Health check: Node version, config, API keys, Kuzu write, LLM round-trip |
| `codegraph config show` | Print the resolved `~/.codegraph/config.json` |
| `codegraph config llm set [preset]` | Switch LLM preset (interactive picker when no arg) |
| `codegraph config llm test` | Round-trip the configured provider (one generation + one embedding) |
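The 2s debounce in `--watch` mode means a burst of file saves leads to one re-index rather than many. The sketch below models that behavior as a pure function over change-event timestamps. It assumes a trailing-edge debounce in which every new change resets the timer, which may not match CodeGraph's watcher exactly, and `debounceFireTimes` is a hypothetical name.

```typescript
// Trailing-edge debounce modeled on timestamps (milliseconds).
// Returns the times at which a re-index would fire for the given change events.
function debounceFireTimes(eventTimesMs: number[], waitMs = 2000): number[] {
  const sorted = [...eventTimesMs].sort((a, b) => a - b);
  const fires: number[] = [];
  sorted.forEach((time, i) => {
    const next = sorted[i + 1];
    // Fire only if no further change arrives before the timer expires.
    if (next === undefined || next - time >= waitMs) {
      fires.push(time + waitMs);
    }
  });
  return fires;
}

// Three saves within one second, then a single save much later:
// the first burst collapses into one re-index, the late save triggers a second.
const fireTimes = debounceFireTimes([0, 400, 900, 10_000]);
```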
MCP Tools
The server exposes 10 tools over SSE on http://127.0.0.1:3748/mcp:
| Tool | Description |
|---|---|
| `search_symbol` | Find symbols by name (exact or prefix match, optional kind/path filter) |
| `find_file` | Locate files by path fragment |
| `search_semantic` | Vector similarity search across all embedded symbols |
| `get_file_context` | All imports, exports, and defined symbols for a file |
| `find_callers` | Who calls a given function or symbol (via `CALLS` edges) |
| `get_component_tree` | Recursive `RENDERS` descendants from a root component |
| `affected_by` | Nodes reachable from a symbol via `CALLS`/`IMPORTS`/`RENDERS` |
| `get_dependencies` | Direct and transitive `IMPORTS` of a file |
| `blast_radius` | Reverse-BFS upstream dependent count (`CALLS` + `IMPORTS` + `RENDERS`) |
| `nl_query` | Natural language → Cypher via LLM → validated → executed (read-only guard) |
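The reverse BFS behind `blast_radius` can be illustrated on a plain in-memory edge list. This is a sketch of the general technique, not CodeGraph's implementation (which runs against the Kuzu graph); the `Edge` shape, the `blastRadius` function, and the sample symbol names are hypothetical.

```typescript
// Reverse BFS over dependency edges: collect everything that transitively depends on `target`.
// An edge { from, to } stands for "from CALLS / IMPORTS / RENDERS to".
interface Edge {
  from: string;
  to: string;
}

function blastRadius(edges: Edge[], target: string): Set<string> {
  // Index edges by destination so they can be walked backwards.
  const dependentsOf = new Map<string, string[]>();
  for (const { from, to } of edges) {
    const list = dependentsOf.get(to) ?? [];
    list.push(from);
    dependentsOf.set(to, list);
  }

  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependentsOf.get(current) ?? []) {
      // Skip the target itself (cycles) and anything already visited.
      if (dependent !== target && !seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return seen;
}

const sampleEdges: Edge[] = [
  { from: "Cart", to: "formatPrice" },
  { from: "Checkout", to: "Cart" },
  { from: "App", to: "Checkout" },
  { from: "App", to: "Header" },
];
// Cart, Checkout, and App depend on formatPrice; Header does not.
const upstream = blastRadius(sampleEdges, "formatPrice");
```

The size of the returned set corresponds to the upstream dependent count the tool reports.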
LLM Providers
| Preset | Generation model | Embedding model | Dimensions |
|---|---|---|---|
| `byo-openai` | gpt-4o-mini | text-embedding-3-small | 1536 |
| `byo-anthropic` | claude-3-5-haiku-latest | text-embedding-3-small (OpenAI) | 1536 |
| `byo-google` | gemini-1.5-flash-latest | text-embedding-004 | 768 |
| `local-ollama` | qwen2.5-coder:14b | nomic-embed-text | 768 |
Switch providers with codegraph config llm set. Switching providers triggers a re-embed: every vector is tagged with provider:model:dimension, so vectors produced by a different provider or model never silently pollute search results.
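The tagging idea can be sketched in a few lines. The `provider:model:dimension` tag format comes from the description above; the `StoredVector` shape, the helper names, and the provider strings in the example are assumptions for illustration.

```typescript
// Sketch of provider-tagged embeddings: only vectors whose tag matches the
// active embedder are eligible for similarity search.
interface EmbeddingConfig {
  provider: string;
  model: string;
  dimension: number;
}

interface StoredVector {
  symbolId: string;
  tag: string;
  values: number[];
}

function embeddingTag(cfg: EmbeddingConfig): string {
  return `${cfg.provider}:${cfg.model}:${cfg.dimension}`;
}

// Keep only vectors produced by the currently configured embedder.
function searchableVectors(all: StoredVector[], active: EmbeddingConfig): StoredVector[] {
  const tag = embeddingTag(active);
  return all.filter((v) => v.tag === tag && v.values.length === active.dimension);
}

// Hypothetical provider name; the model and dimension match the byo-openai row above.
const activeEmbedder: EmbeddingConfig = {
  provider: "openai",
  model: "text-embedding-3-small",
  dimension: 1536,
};
const activeTag = embeddingTag(activeEmbedder);
```

Comparing vectors of different dimensions or from different models would produce meaningless similarity scores, which is why stale vectors are excluded rather than mixed in.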
Architecture
codegraph CLI
│
▼
ingestion ──── web-tree-sitter (parse JS/TS/Python)
│ ──── LLM router (embed all non-File symbols)
│
▼
Kuzu graph DB (~/.codegraph/graph)
│ ──── Symbol nodes (File, Function, Class, Interface,
│ Component, Route, Variable)
│ ──── Rel tables (IMPORTS, CALLS, RENDERS,
│ INHERITS, DEFINES, EXPORTS)
│
▼
MCP server (SSE · http://127.0.0.1:3748/mcp)
│ ──── 10 MCP tools (typed Cypher + vector search)
│ ──── in-memory LRU result cache (30 s TTL)
│ ──── bearer-token auth
▼
Claude Code / Cursor / Windsurf
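The in-memory LRU result cache with a 30 s TTL shown in the diagram can be approximated as follows. This is a generic sketch of the technique rather than the server's actual cache: the `LruTtlCache` class, its capacity parameter, and the injectable clock are assumptions made for illustration and testability.

```typescript
// Minimal LRU cache with a time-to-live, using Map insertion order to track recency.
// The clock is injectable so expiry can be exercised without waiting.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries: number, ttlMs = 30_000, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in a Map is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

A short TTL keeps repeated tool calls cheap while limiting how long a stale answer can survive after the graph is re-indexed.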
How indexing works
- Walk: gitignore-aware file walk, filtered to `.ts`/`.tsx`/`.js`/`.jsx`/`.py`
- Parse: per-file `web-tree-sitter` parse with lazy WASM grammar loading
- Extract: 5-pass AST extraction per JS/TS file:
  - Declarations → nodes + `DEFINES`/`EXPORTS`/`INHERITS` edges
  - Import statements → `IMPORTS` edges
  - Call expressions → `CALLS` edges
  - JSX elements → `RENDERS` edges
  - Route detection (Express + Next.js App/Pages router)
- Resolve: cross-file edge resolution, tsconfig path alias support
- Embed: batch of 100 symbols per LLM call, format: `"${kind} ${name}\n${signature}\n${leadingComment}"`
- Write: `deleteByRepo()` + `upsertNodes()` + `upsertEdges()` in Kuzu
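The Embed step above can be sketched as two small functions: one that builds the text sent to the embedding model and one that splits symbols into batches of 100. The text format and batch size come from the list above. The `SymbolInfo` shape and the function names are hypothetical, and the real pipeline may handle missing signatures or comments differently.

```typescript
// Sketch of the embed step: format each symbol, then split the work into batches.
interface SymbolInfo {
  kind: string;
  name: string;
  signature: string;
  leadingComment: string;
}

// Builds the text for one symbol in the "${kind} ${name}\n${signature}\n${leadingComment}" format.
function embeddingText(s: SymbolInfo): string {
  return `${s.kind} ${s.name}\n${s.signature}\n${s.leadingComment}`;
}

// Splits items into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize = 100): T[][] {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const exampleText = embeddingText({
  kind: "Function",
  name: "formatPrice",
  signature: "(cents: number) => string",
  leadingComment: "// Formats a price in cents as a display string",
});
```

With 250 symbols, `toBatches` yields three batches of 100, 100, and 50, so the index run makes three embedding calls instead of 250.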
Project Structure
codegraph/
├── packages/
│ ├── cli/ @leanlabsinnov/codegraph — published CLI (bundles all below)
│ ├── ingestion/ @codegraph/ingestion — tree-sitter parse + embed engine
│ ├── graph-db/ @codegraph/graph-db — Kuzu embedded DB client
│ ├── mcp-server/ @codegraph/mcp-server — MCP SSE server + 10 tools
│ ├── llm-router/ @codegraph/llm-router — multi-provider LLM abstraction
│ └── shared/ @codegraph/shared — types, schemas, constants
├── docs/
│ └── clients.md — client setup (Claude Code, Cursor, Windsurf)
├── fixtures/
│ ├── sample-app/ — deterministic Next.js + Express test fixture
│ └── sample-python/
└── scripts/
├── smoke-mcp.ts
└── smoke-tree-sitter.ts
Development
Prerequisites
- Node.js 20+
- pnpm 9+
Setup
git clone https://github.com/Cirilcetra/codegraph.git
cd codegraph
pnpm install
cp .env.example .env # add your API key
pnpm build
Scripts
| Script | Description |
|---|---|
pnpm build | Build all packages |
pnpm dev | Watch-mode build across all packages |
pnpm test | Run all tests (vitest) |
pnpm test:watch | Watch-mode tests |
pnpm typecheck | Type-check all packages |
pnpm lint | Biome lint |
pnpm format | Biome format (write) |
pnpm smoke | Run both smoke tests |
Running the MCP server locally
pnpm build
node packages/cli/dist/cli.js serve
Troubleshooting
Run codegraph doctor first. It catches the most common issues: a missing API key, unwritable storage, or the wrong Node version.
For client-specific issues (token config, SSE connection, Cursor MCP setup), see docs/clients.md.
Roadmap
- Incremental delta re-indexing (`codegraph run --watch` / `codegraph index --incremental`)
- All-in-one `codegraph run` command with auto-setup, serve, and file watcher
- HNSW vector index (blocked on Kuzu upstream fixes #5965 / #6040)
- Web-based graph visualizer (Phase 4)
- Managed hosted option
Contributing
PRs welcome. Please run pnpm lint && pnpm typecheck && pnpm test before opening one.
License
MIT — see LICENSE