Routing API Reference
Intelligent AI routing. We classify your prompt, score models by strategy, and execute using your stored API keys. Requires connected provider keys at /account/keys.
Using LangChain or the OpenAI SDK? StepBlend supports an OpenAI-compatible endpoint: POST /api/v1/chat/completions. Set base_url and use your StepBlend JWT as api_key. See OpenAI-compatible API for full docs and examples.
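With the OpenAI SDK you would set `base_url` to `https://stepblend.com/api/v1` and pass your StepBlend JWT as `api_key`. As a minimal stdlib-only sketch of the equivalent raw request (`build_chat_completions_request` is a hypothetical helper for illustration, not part of any SDK; see the OpenAI-compatible API docs for the authoritative field list):

```python
import json

def build_chat_completions_request(jwt: str, messages: list, **extra):
    """Assemble URL, headers, and JSON body for StepBlend's
    OpenAI-compatible endpoint. Any extra OpenAI-style fields
    (model, temperature, stream, ...) are passed through as-is."""
    url = "https://stepblend.com/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {jwt}",  # StepBlend JWT, not a provider key
        "Content-Type": "application/json",
    }
    body = json.dumps({"messages": messages, **extra}).encode("utf-8")
    return url, headers, body
```

Send the result with any HTTP client (e.g. `urllib.request` or `requests`).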
POST /api/route
Non-streaming. Returns full response in JSON.
POST https://stepblend.com/api/route

Headers

Authorization: Bearer YOUR_JWT — Supabase session token
Content-Type: application/json
Request body
prompt (required) — Your prompt text
strategy (optional) — lowest_cost | balanced | max_reliability | fastest. Default: balanced
max_cost (optional) — Max USD per request (e.g. 0.01)
force_model (optional) — Override: provider:modelId or provider
Example
curl -X POST https://stepblend.com/api/route \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
  -d '{"prompt": "Summarize this article...", "strategy": "balanced", "max_cost": 0.01}'

Response (200)
The model is chosen using pre-run cost and latency estimates. After the run, actual_cost is the cost for this request (real input + output tokens). Alternatives include estimatedCost (pre-run) and cost_for_this_request (same input + actual output tokens) for apples-to-apples comparison.
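Because `actual_cost` and each alternative's `cost_for_this_request` are computed over the same input and actual output tokens, they can be compared directly. A minimal sketch (the helper name and sample response are illustrative; it assumes alternative entries carry `cost_for_this_request` as described above):

```python
def cheapest_for_this_request(response: dict) -> dict:
    """Given a /api/route response, compare the chosen model's actual_cost
    against each alternative's cost_for_this_request (same input and actual
    output tokens, so the comparison is apples-to-apples)."""
    best = {
        "provider": response["provider"],
        "modelId": response["model_used"],
        "cost": response["actual_cost"],
    }
    for alt in response.get("alternatives", []):
        cost = alt.get("cost_for_this_request")
        if cost is not None and cost < best["cost"]:
            best = {"provider": alt["provider"], "modelId": alt["modelId"], "cost": cost}
    return best
```

Useful for auditing whether the pre-run estimate picked the model that turned out cheapest.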
estimated_cost — Pre-run estimate (used for model selection)
actual_cost — Cost for this request after completion (optional)
cost_note — Explains that selection is based on pre-run estimates
task_type — Classification: code | structured_extraction | summarization | creative | long_context_reasoning | general_qa
classification_confidence — 0–1; rule-based = 1, embedding-based = cosine similarity to category
metrics.latency_ms — Total response time (ms)
metrics.ttft_ms — Time to first token (ms), when available
{
"result": "string",
"model_used": "gpt-4.1-mini",
"provider": "openai",
"estimated_cost": 0.00042,
"actual_cost": 0.00038,
"cost_note": "Model selected using pre-run cost and latency estimates...",
"fallback_used": false,
"reasoning": "Task: summarization. Selected openai...",
"task_type": "summarization",
"classification_confidence": 0.92,
"alternatives": [...],
"metrics": { "latency_ms": 1200, "ttft_ms": 420 }
}

POST /api/recommend
Recommendation only — no LLM execution. Returns which model would be chosen, cost, task_type, classification_confidence, and alternatives. Use this to preview routing before calling /api/route.
POST https://stepblend.com/api/recommend

Headers

Authorization: Bearer YOUR_JWT — Supabase session token
Content-Type: application/json
Request body
Same as /api/route: prompt (required), strategy, max_cost, force_model.
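Since the request body is identical, a common pattern is to preview with /api/recommend and only call /api/route when the preview fits your budget. A sketch of that gate, using the documented response fields (the thresholds and helper name are illustrative, not part of the API):

```python
def should_execute(recommendation: dict, budget_usd: float,
                   min_confidence: float = 0.5) -> bool:
    """Decide whether a /api/recommend preview justifies a real
    /api/route call: cheap enough and classified confidently enough."""
    return (recommendation["estimated_cost"] <= budget_usd
            and recommendation["classification_confidence"] >= min_confidence)
```

If it returns False, you can retry the preview with a different strategy or force_model before spending tokens.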
Response (200)
{
"model_used": "gemini-2.5-flash",
"provider": "google",
"estimated_cost": 0.00012,
"reasoning": "Task: general_qa. Selected google...",
"task_type": "general_qa",
"classification_confidence": 0.85,
"alternatives": [
{ "provider": "deepseek", "modelId": "deepseek-chat", "estimatedCost": 0.00014, "score": 0.68, "reliability_delta": -0.01 }
]
}

POST /api/route/stream
Streaming response. Same request as /api/route. Returns Server-Sent Events: meta (model, cost, task_type, classification_confidence), then delta (content chunks), then done. Use for low-latency UX.
POST https://stepblend.com/api/route/stream

Headers

Authorization: Bearer YOUR_JWT
Content-Type: application/json
SSE events
event: meta — model_used, provider, estimated_cost, task_type, classification_confidence
event: delta — content (string)
event: done — optional ttft_ms, latency_ms (full stream duration), actual_cost (cost for this request after completion). All measured from the real provider stream, not the SSE wrapper.
event: error — message
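Assuming the standard `event:` / `data:` SSE framing with JSON payloads and blank-line separators (the exact wire format is not spelled out above, so treat this as a sketch), the stream can be consumed like this:

```python
import json

def parse_sse(raw: str) -> list:
    """Minimal parser for the documented SSE events (meta/delta/done/error).
    Assumes 'event: X' / 'data: {json}' lines, events separated by blank lines."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events
```

Concatenate the `content` of the delta events to rebuild the full result; the done event carries the final timing and cost fields.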
POST /api/route/demo
Public demo — no auth. Uses our demo keys; limited to cheap models (e.g. gpt-4.1-mini, gemini-2.5-flash, deepseek-chat, llama-3.3-70b). Rate limit: 5 requests per IP per day. Max 800 input tokens, 500 output tokens. Optional current_model for savings comparison.
POST https://stepblend.com/api/route/demo

Headers

No Authorization. Content-Type: application/json.
Request body
prompt (required)
strategy (optional) — default: balanced
current_model (optional) — e.g. GPT-4, Claude 3.5 Sonnet; used to compute savings_vs_current
Response (200)
Same shape as /api/route plus: demo_mode: true, savings_vs_current (if current_model was sent), rate_limit (remaining, limit). 429 when rate limit exceeded.
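A sketch of reading the demo-specific fields out of a response (field names follow the shape documented above; the helper and the sample values are illustrative):

```python
def summarize_demo_response(resp: dict) -> str:
    """Render a one-line summary of a /api/route/demo response:
    model used, optional savings_vs_current, and remaining demo quota."""
    parts = [f"{resp['provider']}:{resp['model_used']}"]
    if "savings_vs_current" in resp:
        # Value format is illustrative; sent only when current_model was supplied.
        parts.append(f"savings vs current: {resp['savings_vs_current']}")
    rl = resp.get("rate_limit", {})
    if rl:
        parts.append(f"demo requests left today: {rl['remaining']}/{rl['limit']}")
    return ", ".join(parts)
```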
Error responses
All endpoints return JSON on error. The body includes an error field (string); many responses also include an optional message with details.
400 Bad Request
Invalid JSON — Request body is not valid JSON.
prompt is required — Missing or empty prompt.
Invalid strategy — strategy must be one of: lowest_cost, balanced, max_reliability, fastest.
Demo only: Prompt too long — Demo allows up to 800 input tokens; response includes message with limit details.
401 Unauthorized
Returned by /api/route, /api/recommend, /api/route/stream, and /api/v1/chat/completions (the demo endpoint does not require auth).
Missing Authorization header — Send Authorization: Bearer YOUR_JWT.
Invalid or expired token — JWT is invalid or expired; re-authenticate.
429 Too Many Requests
- /api/route, /api/route/stream, /api/v1/chat/completions — Monthly routing limit reached. Body: error, message, current, limit. Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix timestamp). Resets at end of month (UTC).
- /api/route/demo — Demo rate limit (5 requests per IP per day). Body: error, message. Same rate-limit headers.
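Since X-RateLimit-Reset is a Unix timestamp, a client can compute a safe retry delay directly from the 429 response headers. A sketch (assumes the exact header-name casing shown above; most HTTP clients normalize header case for you):

```python
import time

def seconds_until_reset(headers: dict, now=None) -> float:
    """Given a 429 response's headers, return how long to wait before
    retrying, based on the X-RateLimit-Reset Unix timestamp."""
    reset = int(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```

Sleep for that many seconds (or schedule the retry) instead of hammering the endpoint.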
500 Internal Server Error
Server or routing failure. Body: error (e.g. "Routing failed", "Configuration error", "Demo execution failed"). Retry or contact support.
Getting your JWT

Sign in first, then use the "Get your API token" button to fetch and copy your JWT (your Supabase session token).