AI Routing

Automatically route every AI call to the best model using your own API keys. Reduce spend. Improve quality. Stay vendor-neutral.

How it works

  1. Classify — We detect your prompt type (code, structured_extraction, summarization, creative, long_context_reasoning, general_qa) with a hybrid classifier: fast rules for obvious cases, embedding similarity otherwise. Every response includes task_type and classification_confidence (0–1). Use force_model to override if you disagree.
  2. Score — Models are scored by strategy (cost, quality, latency). Quality uses a per-model, per-task-type reliability matrix (deterministic, no ML).
  3. Select — Best model within your cost cap is chosen. For lowest_cost, the literal cheapest model wins; other strategies use a weighted score (cost, quality, latency).
  4. Execute — We call the provider with your stored API key and return the result (or stream it).

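The request and response fields named in the steps above can be sketched as types. Field names (`strategy`, `max_cost`, `force_model`, `task_type`, `classification_confidence`, `model_used`) come from this page; the sample values are illustrative only.

```typescript
type Strategy = "lowest_cost" | "balanced" | "max_reliability" | "fastest";

// Body sent to POST /api/route (fields per the steps above).
interface RouteRequest {
  prompt: string;
  strategy: Strategy;
  max_cost?: number;    // optional per-request cost cap
  force_model?: string; // e.g. "provider:modelId" to bypass routing
}

// Subset of the routed response described above.
interface RouteResponse {
  result: string;
  model_used: string;
  task_type: string;                 // e.g. "code", "summarization"
  classification_confidence: number; // 0–1
}

const req: RouteRequest = {
  prompt: "Summarize this changelog in three bullets.",
  strategy: "balanced",
  max_cost: 0.01,
};
```

See the Routing API Reference for the full response shape.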
Strategies

Choose how we prioritize models:

  • lowest_cost — Prioritize cheapest models (DeepSeek, Groq, GPT-4.1-mini)
  • balanced — Balance cost, quality, and latency
  • max_reliability — Prioritize quality over cost
  • fastest — Prioritize low-latency models (Groq, Gemini Flash)

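One way to picture how a strategy turns into a pick, per the scoring step above: lowest_cost selects the literal cheapest model, while other strategies combine cost, quality, and latency in a weighted score. The weights and candidate numbers below are made up for illustration; the service's real reliability matrix and weights are internal.

```typescript
interface Candidate { model: string; cost: number; quality: number; latency: number }

// Hypothetical per-strategy weights (not the service's real values).
const weights: Record<string, { cost: number; quality: number; latency: number }> = {
  balanced:        { cost: 1,   quality: 1,   latency: 1   },
  max_reliability: { cost: 0.2, quality: 2,   latency: 0.5 },
  fastest:         { cost: 0.2, quality: 0.5, latency: 2   },
};

function pick(candidates: Candidate[], strategy: string): string {
  if (strategy === "lowest_cost") {
    // Literal cheapest model wins, no weighting.
    return candidates.reduce((a, b) => (b.cost < a.cost ? b : a)).model;
  }
  const w = weights[strategy];
  // Higher quality is better; lower cost and latency are better.
  const score = (c: Candidate) =>
    w.quality * c.quality - w.cost * c.cost - w.latency * c.latency;
  return candidates.reduce((a, b) => (score(b) > score(a) ? b : a)).model;
}

const candidates: Candidate[] = [
  { model: "model-a", cost: 1.0, quality: 5, latency: 2 },
  { model: "model-b", cost: 0.1, quality: 3, latency: 1 },
];
const cheapest = pick(candidates, "lowest_cost");         // "model-b"
const mostReliable = pick(candidates, "max_reliability"); // "model-a"
```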
Plans and request limits

Your plan sets the monthly routed-request cap and the number of providers you can connect. Limits are enforced: requests over the cap return HTTP 429 with X-RateLimit-* headers. Usage resets at the end of each month (UTC).

  • Starter — 50,000 requests/month; up to 3 connected providers (OpenAI, Anthropic, Google).
  • Growth — 200,000 requests/month; unlimited providers; Control Center (request logs, visibility).
  • Scale — 750,000 requests/month; higher limits; see Pricing for details.
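
Since over-cap requests return 429 with X-RateLimit-* headers, clients can read those headers to back off gracefully. This sketch assumes the common X-RateLimit-Limit / -Remaining / -Reset naming; the exact header set isn't spelled out here, so verify it against the Routing API Reference.

```typescript
interface RateLimitInfo { limit: number; remaining: number; reset: number }

// Extract rate-limit state from response headers (names assumed, see lead-in).
// Header keys are lowercased, as fetch's Headers.get() is case-insensitive.
function readRateLimit(headers: Record<string, string>): RateLimitInfo {
  return {
    limit: Number(headers["x-ratelimit-limit"] ?? 0),
    remaining: Number(headers["x-ratelimit-remaining"] ?? 0),
    reset: Number(headers["x-ratelimit-reset"] ?? 0),
  };
}

const info = readRateLimit({
  "x-ratelimit-limit": "50000",
  "x-ratelimit-remaining": "120",
  "x-ratelimit-reset": "1735689600",
});
```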

Control Center

Growth and Scale plans include the Control Center: a dashboard with request logs (model, provider, strategy, task_type, latency, cost), model usage, provider exposure, and strategy enforcement visibility. Use it to audit spend and tune routing.

Provider keys

Add your API keys at /account/keys. Keys are encrypted at rest and never exposed. We support OpenAI, Anthropic, Google (Gemini), Groq, and DeepSeek.

Force model override

Bypass routing and force a specific model: pass force_model: "provider:modelId" or force_model: "provider". The request is still normalized and logged.
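
A client-side sanity check for the two accepted shapes might look like this. The validator is hypothetical (the service does its own validation); only the "provider:modelId" / "provider" formats come from the docs.

```typescript
// Accepts "provider" or "provider:modelId"; rejects anything else.
function isValidForceModel(value: string): boolean {
  return /^[a-z0-9_-]+(:[a-z0-9._-]+)?$/i.test(value);
}
```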

Failover

If the chosen model fails before streaming starts, we automatically retry with the second-ranked model. The response includes fallback_used: true when this happens.
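
Clients that care about which model actually served a request should check fallback_used. The field name comes from the docs above; the helper around it is illustrative.

```typescript
interface RoutedResult { model_used: string; fallback_used?: boolean }

// Summarize a routed result for logging, noting when failover kicked in.
function describeRoute(r: RoutedResult): string {
  return r.fallback_used
    ? `served by fallback model ${r.model_used}`
    : `served by ${r.model_used}`;
}
```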

API endpoints

All authenticated endpoints require Authorization: Bearer YOUR_JWT (Supabase session). Full details in the Routing API Reference.

  • POST /api/route — Non-streaming. Send prompt, strategy, optional max_cost and force_model. Get back result, model_used, task_type, classification_confidence, alternatives, etc.
  • POST /api/recommend — Same body as /api/route. Returns recommended model, cost, task_type, classification_confidence, and alternatives only; no LLM execution.
  • POST /api/route/stream — Same body. Returns Server-Sent Events: meta (model, task_type, classification_confidence, cost), then delta chunks, then done.
  • POST /api/route/demo — No auth required. Public demo: our keys, cheap models only, 5 requests per IP per day, with limits of 800 input and 500 output tokens. Optional current_model for a savings comparison.
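
For /api/route/stream, the meta → delta → done sequence above can be consumed with a small SSE parser. The event names come from this page; the wire framing ("event:" / "data:" lines) and the delta payload shape ({"text": ...}) are assumptions to verify against the Routing API Reference.

```typescript
interface StreamResult { meta: unknown; text: string; done: boolean }

// Parse a raw SSE buffer into the meta payload, accumulated text, and done flag.
function parseSse(raw: string): StreamResult {
  const result: StreamResult = { meta: null, text: "", done: false };
  let event = "";
  for (const line of raw.split("\n")) {
    if (line.startsWith("event:")) {
      event = line.slice(6).trim();
    } else if (line.startsWith("data:")) {
      const data = line.slice(5).trim();
      if (event === "meta") result.meta = JSON.parse(data);
      else if (event === "delta") result.text += JSON.parse(data).text;
      else if (event === "done") result.done = true;
    }
  }
  return result;
}

const sample =
  'event: meta\ndata: {"model":"model-x"}\n\n' +
  'event: delta\ndata: {"text":"Hello"}\n\n' +
  'event: delta\ndata: {"text":" world"}\n\n' +
  'event: done\ndata: {}\n';
const parsed = parseSse(sample); // text: "Hello world", done: true
```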