Routing API Reference
Intelligent AI routing. We classify your prompt, score models by strategy, and execute using your stored API keys. Requires connected provider keys at /account/keys.
Using LangChain or the OpenAI SDK? StepBlend supports an OpenAI-compatible endpoint: POST /api/v1/chat/completions. Set base_url and use your StepBlend JWT as api_key. See OpenAI-compatible API for full docs and examples.
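With the OpenAI SDK you would set `base_url` to `https://stepblend.com/api/v1` and pass your StepBlend JWT as `api_key`. As a minimal stdlib-only sketch of the equivalent raw request (`build_chat_completions_request` is a hypothetical helper for illustration, not part of any SDK; see the OpenAI-compatible API docs for the authoritative field list):

```python
import json

def build_chat_completions_request(jwt: str, messages: list, **extra):
    """Assemble URL, headers, and JSON body for StepBlend's
    OpenAI-compatible endpoint. Any extra OpenAI-style fields
    (model, temperature, stream, ...) are passed through as-is."""
    url = "https://stepblend.com/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {jwt}",  # StepBlend JWT, not a provider key
        "Content-Type": "application/json",
    }
    body = json.dumps({"messages": messages, **extra}).encode("utf-8")
    return url, headers, body
```

Send the result with any HTTP client (e.g. `urllib.request` or `requests`).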
POST /api/route
Non-streaming. Returns full response in JSON.
POST https://stepblend.com/api/route

Headers

Authorization: Bearer YOUR_JWT — Supabase session token
Content-Type: application/json
Request body
prompt (required) — Your prompt text
strategy (optional) — lowest_cost | balanced | max_reliability | fastest. Default: balanced
max_cost (optional) — Max USD per request (e.g. 0.01)
force_model (optional) — Override: provider:modelId or provider
Example
curl -X POST https://stepblend.com/api/route \
-H "Authorization: Bearer YOUR_JWT" \
-H "Content-Type: application/json" \
  -d '{"prompt": "Summarize this article...", "strategy": "balanced", "max_cost": 0.01}'

Response (200)
The model is chosen using pre-run cost and latency estimates. After the run, actual_cost is the cost for this request (real input + output tokens). Alternatives include estimatedCost (pre-run) and cost_for_this_request (same input + actual output tokens) for apples-to-apples comparison.
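Because `actual_cost` and each alternative's `cost_for_this_request` are computed over the same input and actual output tokens, they can be compared directly. A minimal sketch (the helper name and sample response are illustrative; it assumes alternative entries carry `cost_for_this_request` as described above):

```python
def cheapest_for_this_request(response: dict) -> dict:
    """Given a /api/route response, compare the chosen model's actual_cost
    against each alternative's cost_for_this_request (same input and actual
    output tokens, so the comparison is apples-to-apples)."""
    best = {
        "provider": response["provider"],
        "modelId": response["model_used"],
        "cost": response["actual_cost"],
    }
    for alt in response.get("alternatives", []):
        cost = alt.get("cost_for_this_request")
        if cost is not None and cost < best["cost"]:
            best = {"provider": alt["provider"], "modelId": alt["modelId"], "cost": cost}
    return best
```

Useful for auditing whether the pre-run estimate picked the model that turned out cheapest.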
estimated_cost — Pre-run estimate (used for model selection)
actual_cost — Cost for this request after completion (optional)
cost_note — Explains that selection is based on pre-run estimates
task_type — Classification: code | structured_extraction | summarization | creative | long_context_reasoning | general_qa
classification_confidence — 0–1; rule-based = 1, embedding-based = cosine similarity to category
metrics.latency_ms — Total response time (ms)
metrics.ttft_ms — Time to first token (ms), when available
{
"result": "string",
"model_used": "gpt-4.1-mini",
"provider": "openai",
"estimated_cost": 0.00042,
"actual_cost": 0.00038,
"cost_note": "Model selected using pre-run cost and latency estimates...",
"fallback_used": false,
"reasoning": "Task: summarization. Selected openai...",
"task_type": "summarization",
"classification_confidence": 0.92,
"alternatives": [...],
"metrics": { "latency_ms": 1200, "ttft_ms": 420 }
}

POST /api/recommend
Recommendation only — no LLM execution. Returns which model would be chosen, cost, task_type, classification_confidence, and alternatives. Use this to preview routing before calling /api/route.
POST https://stepblend.com/api/recommend

Headers

Authorization: Bearer YOUR_JWT — Supabase session token
Content-Type: application/json
Request body
Same as /api/route: prompt (required), strategy, max_cost, force_model.
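Since the request body is identical, a common pattern is to preview with /api/recommend and only call /api/route when the preview fits your budget. A sketch of that gate, using the documented response fields (the thresholds and helper name are illustrative, not part of the API):

```python
def should_execute(recommendation: dict, budget_usd: float,
                   min_confidence: float = 0.5) -> bool:
    """Decide whether a /api/recommend preview justifies a real
    /api/route call: cheap enough and classified confidently enough."""
    return (recommendation["estimated_cost"] <= budget_usd
            and recommendation["classification_confidence"] >= min_confidence)
```

If it returns False, you can retry the preview with a different strategy or force_model before spending tokens.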
Response (200)
{
"model_used": "gemini-2.5-flash",
"provider": "google",
"estimated_cost": 0.00012,
"reasoning": "Task: general_qa. Selected google...",
"task_type": "general_qa",
"classification_confidence": 0.85,
"alternatives": [
{ "provider": "deepseek", "modelId": "deepseek-chat", "estimatedCost": 0.00014, "score": 0.68, "reliability_delta": -0.01 }
]
}

POST /api/route/stream
Streaming response. Same request as /api/route. Returns Server-Sent Events: meta (model, cost, task_type, classification_confidence), then delta (content chunks), then done. Use for low-latency UX.
POST https://stepblend.com/api/route/stream

Headers

Authorization: Bearer YOUR_JWT
Content-Type: application/json
SSE events
event: meta — model_used, provider, estimated_cost, task_type, classification_confidence
event: delta — content (string)
event: done — optional ttft_ms, latency_ms (full stream duration), actual_cost (cost for this request after completion). All measured from the real provider stream, not the SSE wrapper.
event: error — message
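Assuming the standard `event:` / `data:` SSE framing with JSON payloads and blank-line separators (the exact wire format is not spelled out above, so treat this as a sketch), the stream can be consumed like this:

```python
import json

def parse_sse(raw: str) -> list:
    """Minimal parser for the documented SSE events (meta/delta/done/error).
    Assumes 'event: X' / 'data: {json}' lines, events separated by blank lines."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            events.append((event, data))
    return events
```

Concatenate the `content` of the delta events to rebuild the full result; the done event carries the final timing and cost fields.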
POST /api/route/demo
Public demo — no auth. Uses our demo keys; limited to cheap models (e.g. gpt-4.1-mini, gemini-2.5-flash, deepseek-chat, llama-3.3-70b). Rate limit: 5 requests per IP per day. Max 800 input tokens, 500 output tokens. Optional current_model for savings comparison.
POST https://stepblend.com/api/route/demo

Headers

No Authorization. Content-Type: application/json.
Request body
prompt (required)
strategy (optional) — default: balanced
current_model (optional) — e.g. GPT-4, Claude 3.5 Sonnet; used to compute savings_vs_current
Response (200)
Same shape as /api/route plus: demo_mode: true, savings_vs_current (if current_model was sent), rate_limit (remaining, limit). 429 when rate limit exceeded.
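A sketch of reading the demo-specific fields out of a response (field names follow the shape documented above; the helper and the sample values are illustrative):

```python
def summarize_demo_response(resp: dict) -> str:
    """Render a one-line summary of a /api/route/demo response:
    model used, optional savings_vs_current, and remaining demo quota."""
    parts = [f"{resp['provider']}:{resp['model_used']}"]
    if "savings_vs_current" in resp:
        # Value format is illustrative; sent only when current_model was supplied.
        parts.append(f"savings vs current: {resp['savings_vs_current']}")
    rl = resp.get("rate_limit", {})
    if rl:
        parts.append(f"demo requests left today: {rl['remaining']}/{rl['limit']}")
    return ", ".join(parts)
```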
Error responses
All endpoints return JSON on error. The body includes an error field (string); many responses also include an optional message with details.
400 Bad Request
Invalid JSON — Request body is not valid JSON.
prompt is required — Missing or empty prompt.
Invalid strategy — strategy must be one of: lowest_cost, balanced, max_reliability, fastest.
Demo only: Prompt too long — Demo allows up to 800 input tokens; response includes message with limit details.
401 Unauthorized
Returned by /api/route, /api/recommend, /api/route/stream, and /api/v1/chat/completions (the demo endpoint does not require auth).
Missing Authorization header — Send Authorization: Bearer YOUR_JWT.
Invalid or expired token — JWT is invalid or expired; re-authenticate.
429 Too Many Requests
- /api/route, /api/route/stream, /api/v1/chat/completions — Monthly routing limit reached. Body: error, message, current, limit. Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset (Unix timestamp). Resets at end of month (UTC).
- /api/route/demo — Demo rate limit (5 requests per IP per day). Body: error, message. Same rate-limit headers.
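Since X-RateLimit-Reset is a Unix timestamp, a client can compute a safe retry delay directly from the 429 response headers. A sketch (assumes the exact header-name casing shown above; most HTTP clients normalize header case for you):

```python
import time

def seconds_until_reset(headers: dict, now=None) -> float:
    """Given a 429 response's headers, return how long to wait before
    retrying, based on the X-RateLimit-Reset Unix timestamp."""
    reset = int(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now
    return max(0.0, reset - now)
```

Sleep for that many seconds (or schedule the retry) instead of hammering the endpoint.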
500 Internal Server Error
Server or routing failure. Body: error (e.g. "Routing failed", "Configuration error", "Demo execution failed"). Retry or contact support.
Getting your JWT

Sign in first, then use the "Get your API token" button to fetch and copy your JWT (your Supabase session token).