Skip to content

Run without an API key

Most AI dev tools require you to provision a new API key, justify the cost to your finance team, and renew it forever. Cascade has two paths that skip that entirely.

If you already have Claude Code installed, Cascade can use it as the LLM transport. Your existing Claude subscription does the work. No additional API key.

Terminal window
# Install Claude Code if you haven't already
# (follow the official Claude Code installation guide first)
# Install the Cascade extra for Claude Code support
pip install "cascade-agent[claude-code]"
# Configure Cascade to use it
cascade configure llm claude_code --set-default

That’s it. No API key flag.

Terminal window
cascade configure show

You should see:

LLM providers:
claude_code: key=(not set) model=(default) base_url=(default)

The (not set) for the key is expected. Claude Code handles auth internally.

  • Slightly less reliable structured output than the direct Anthropic API. Cascade asks Claude to return JSON in a code fence and parses it; the direct API uses native tool-use which is stricter. Most of the time this difference doesn’t matter, but for very long or complex schemas you may see occasional parse errors.
  • Subject to your Claude Code subscription’s usage limits.

If you want zero external dependencies, run an LLM locally with Ollama or vLLM. Cascade talks to them via their OpenAI-compatible API.

Terminal window
# Install Ollama from https://ollama.ai
# Pull a model
ollama pull llama3.1
# Configure Cascade
cascade configure llm ollama --model llama3.1 --set-default

Setup with vLLM (or any OpenAI-compatible local server)

Section titled “Setup with vLLM (or any OpenAI-compatible local server)”
8000/v1
cascade configure llm ollama \
--model your-model-name \
--base-url http://localhost:8000/v1 \
--set-default

The ollama provider in Cascade is just an OpenAI-compatible client with sensible Ollama defaults. Any local server that speaks OpenAI’s chat completion API will work.

  • Quality scales with the model. Small local models (under 7B params) struggle with complex stories. Use the largest model your hardware can run.
  • Structured output reliability varies by model. Models trained with strong JSON-mode support (Llama 3.1 8B+, Qwen 2.5 7B+) work well.
  • Slower than the API providers, especially on CPU.
SituationBest choice
You have Claude Code installed and you’re an individual developerClaude Code SDK
Your company blocks SaaS but you have GPUsOllama / vLLM with a 70B+ model
You want maximum reliability and don’t mind payingDirect Anthropic / OpenAI / Google API
You’re testing Cascade with no commitmentClaude Code SDK or Ollama

Even with a default set, you can override per command:

Terminal window
cascade prompt "Add health endpoint" --model claude-opus-4-7

Or change the default any time:

Terminal window
cascade configure llm anthropic --key sk-ant-xxx --set-default