Early adopter pricing — first 50 customers lock this rate for life.
Pricing
Predictable pricing. No token markup.
You bring your own API keys. We optimize every call.
All plans support the OpenAI-compatible endpoint (/api/v1/chat/completions).
Free
Free
ForeverTry routing with your own keys. Perfect for testing and small projects.
- Up to 1,000 routed requests / month (enforced)
- Up to 2 connected providers (enforced)
- Deterministic routing: lowest_cost, balanced, max_reliability, fastest
Starter
For early-stage AI SaaS teams validating model optimization.
$49/ month
- Up to 50,000 routed requests / month (enforced)
- Up to 3 connected providers — OpenAI, Anthropic, Google (enforced)
- Deterministic routing: lowest_cost, balanced, max_reliability, fastest
- Streaming + non-streaming endpoints (POST /api/route, /api/route/stream)
- Cost estimation per request + model comparison (alternatives)
- Confidence scoring + task_type in response
- Pre-stream fallback protection
- Recommendation-only endpoint (POST /api/recommend)
- Email support
Best for teams spending $1k–$5k/month on LLM APIs.
Most Popular
Growth
For production AI systems requiring reliability and visibility.
$149/ month
- Up to 200,000 routed requests / month (enforced)
- Unlimited connected providers (OpenAI, Anthropic, Google, Groq, DeepSeek)
- Advanced cost controls (max_cost enforcement)
- force_model override
- Request logs stored per request (view in Control Center)
- Latency metrics per request (metrics.latency_ms)
- Fallback indicator (fallback_used in response)
- Priority email support
Best for teams spending $5k–$25k/month on LLM APIs.
Scale
For high-volume vertical AI platforms and infrastructure teams.
$399/ month
- Up to 750,000 routed requests / month (enforced)
- Higher rate limits
- 99.9% routing availability SLA
- Request logs (view in Control Center)
- Provider performance metrics (roadmap)
- Dedicated Slack support
- Early access to optimization updates
Best for teams spending $25k+/month on LLM APIs.
Contact SalesWhat unlocks in Growth?
- Full Control Center dashboard
- Detailed request logs
- Provider exposure tracking
- Strategy enforcement visibility
- Operational metrics per request
Starter gives you routing.
Growth gives you control.
Enterprise
Custom pricing
- Unlimited routing volume
- Dedicated infrastructure
- Custom provider integrations
- VPC / private deployment options
- Custom SLA
- Dedicated technical support
How Billing Works
- StepBlend charges a fixed monthly subscription.
- You provide your own API keys.
- You are billed directly by OpenAI, Anthropic, Google, etc.
- StepBlend never marks up compute costs.
- No per-token billing from us.
Infrastructure pricing. No surprises.
Frequently Asked Questions
- Do you charge per token?
- No. You are billed directly by model providers using your own API keys.
- What counts as a routed request?
- Each call to the OpenAI-compatible endpoint (/api/v1/chat/completions) or the native API (/api/route or /api/route/stream) counts as one routed request. Fallback retries count as a single request.
- Do I need multiple providers connected?
- Full optimization works best with at least two providers connected.
- Can I force a specific model?
- Yes. Use force_model to override routing.
- Is routing deterministic?
- Yes. The same input, strategy, and constraints will produce the same model selection.
- What happens when I hit my monthly request cap?
- The API returns 429 with X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers. Your limit resets at the end of the current month (UTC).
Ready to optimize your AI spend?
Try the Optimizer free. Add your keys and see routing in action.
Try Optimizer