Run without an API key
Most AI dev tools require you to provision a new API key, justify the cost to your finance team, and renew it forever. Cascade has two paths that skip that entirely.
Option 1: Claude Code SDK
Section titled “Option 1: Claude Code SDK”If you already have Claude Code installed, Cascade can use it as the LLM transport. Your existing Claude subscription does the work. No additional API key.
# Install Claude Code if you haven't already# (follow the official Claude Code installation guide first)
# Install the Cascade extra for Claude Code supportpip install "cascade-agent[claude-code]"
# Configure Cascade to use itcascade configure llm claude_code --set-defaultThat’s it. No API key flag.
Verify
Section titled “Verify”cascade configure showYou should see:
LLM providers: claude_code: key=(not set) model=(default) base_url=(default)The (not set) for the key is expected. Claude Code handles auth internally.
Caveats
Section titled “Caveats”- Slightly less reliable structured output than the direct Anthropic API. Cascade asks Claude to return JSON in a code fence and parses it; the direct API uses native tool-use which is stricter. Most of the time this difference doesn’t matter, but for very long or complex schemas you may see occasional parse errors.
- Subject to your Claude Code subscription’s usage limits.
Option 2: Ollama or vLLM (fully local)
Section titled “Option 2: Ollama or vLLM (fully local)”If you want zero external dependencies, run an LLM locally with Ollama or vLLM. Cascade talks to them via their OpenAI-compatible API.
Setup with Ollama
Section titled “Setup with Ollama”# Install Ollama from https://ollama.ai# Pull a modelollama pull llama3.1
# Configure Cascadecascade configure llm ollama --model llama3.1 --set-defaultSetup with vLLM (or any OpenAI-compatible local server)
Section titled “Setup with vLLM (or any OpenAI-compatible local server)”cascade configure llm ollama \ --model your-model-name \ --base-url http://localhost:8000/v1 \ --set-defaultThe ollama provider in Cascade is just an OpenAI-compatible client with sensible Ollama defaults. Any local server that speaks OpenAI’s chat completion API will work.
Caveats
Section titled “Caveats”- Quality scales with the model. Small local models (under 7B params) struggle with complex stories. Use the largest model your hardware can run.
- Structured output reliability varies by model. Models trained with strong JSON-mode support (Llama 3.1 8B+, Qwen 2.5 7B+) work well.
- Slower than the API providers, especially on CPU.
Which to choose
Section titled “Which to choose”| Situation | Best choice |
|---|---|
| You have Claude Code installed and you’re an individual developer | Claude Code SDK |
| Your company blocks SaaS but you have GPUs | Ollama / vLLM with a 70B+ model |
| You want maximum reliability and don’t mind paying | Direct Anthropic / OpenAI / Google API |
| You’re testing Cascade with no commitment | Claude Code SDK or Ollama |
Switching providers per call
Section titled “Switching providers per call”Even with a default set, you can override per command:
cascade prompt "Add health endpoint" --model claude-opus-4-7Or change the default any time:
cascade configure llm anthropic --key sk-ant-xxx --set-default