cost-control · api · routing

How to Set Max Cost per LLM Request

February 2026 · 5 min read

Setting a max cost per request is one of the fastest ways to get LLM cost control without changing how your app calls models. You define a ceiling (e.g. $0.01 or $0.05 per call); the router only picks models under that cap. If none fit, the request fails instead of overspending.

Why cap per request?

  • No surprise bills. A single long prompt or bug can otherwise burn through budget in one go.
  • Predictable unit economics. You know the worst-case cost per request, which helps with pricing and quotas.
  • Safe defaults. New features or tenants can use a low cap until you’re ready to raise it.

How it works

  1. You send a max cost (in dollars) with each request—e.g. in the request body as max_cost or in your routing config.
  2. The router estimates cost per candidate model (using prompt length and expected output).
  3. It filters to models under your cap and picks the best one by strategy (e.g. lowest cost, balanced).
  4. After the call, you get actual cost (real tokens × model price) for logging and dashboards.

If no model is under the cap, the router returns an error instead of calling a model. That way you never exceed the limit you set.
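The estimate-filter-pick loop above can be sketched in a few lines of Python. The model names and per-token prices below are made up for illustration, not real pricing:

```python
# Hypothetical per-1K-token prices for three candidate models.
MODELS = {
    "mini-model": {"input_per_1k": 0.00015, "output_per_1k": 0.0006},
    "mid-model": {"input_per_1k": 0.003, "output_per_1k": 0.015},
    "premium-model": {"input_per_1k": 0.01, "output_per_1k": 0.03},
}

def estimate_cost(model, prompt_tokens, expected_output_tokens):
    # Pre-call estimate: prompt length plus expected output, priced per token.
    p = MODELS[model]
    return (prompt_tokens / 1000) * p["input_per_1k"] + (
        expected_output_tokens / 1000
    ) * p["output_per_1k"]

def pick_model(prompt_tokens, expected_output_tokens, max_cost):
    candidates = [
        (estimate_cost(m, prompt_tokens, expected_output_tokens), m)
        for m in MODELS
    ]
    # Keep only models whose estimated cost fits under the cap.
    under_cap = [(cost, m) for cost, m in candidates if cost <= max_cost]
    if not under_cap:
        # No model fits: fail instead of overspending.
        raise ValueError("no model under max_cost")
    # "lowest_cost" strategy: cheapest estimated model wins.
    return min(under_cap)[1]
```

With a 2,000-token prompt, 500 expected output tokens, and a $0.01 cap, only the cheap tier survives the filter; drop the cap low enough and the call fails rather than overspends.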

Example: setting a cap in the API

With a routing layer like StepBlend, you pass max_cost in the request body. For example:

{
  "prompt": "Summarize this document...",
  "strategy": "lowest_cost",
  "max_cost": 0.01
}

Only models with estimated cost ≤ $0.01 are considered. The routing API docs describe the full request shape and response (including actual_cost and which model was used).
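In Python, sending that request body might look like the sketch below. The endpoint URL and auth header are placeholders (the real request shape is in the routing API docs):

```python
import json
import urllib.request

def build_request(prompt, max_cost, strategy="lowest_cost"):
    # Same shape as the JSON body above; max_cost is in dollars.
    return {"prompt": prompt, "strategy": strategy, "max_cost": max_cost}

def route(prompt, max_cost, api_key, url="https://api.example.com/v1/route"):
    # Placeholder endpoint; substitute the real one from the API docs.
    body = json.dumps(build_request(prompt, max_cost)).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # Response includes the model used and actual_cost.
        return json.load(resp)
```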

Choosing a cap

  • $0.001–0.005: Very cheap models only (e.g. mini/Flash tiers). Good for high-volume, low-stakes work.
  • $0.01–0.02: Mix of cheap and mid-tier. Good default for many apps.
  • $0.05+: Allows premium models when needed. Use for critical or complex tasks.

Combine with strategy: e.g. lowest_cost + max_cost: 0.01 for batch jobs, balanced + max_cost: 0.05 for user-facing flows.
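One convenient pattern is to keep those strategy-plus-cap pairs as named presets per use case, so the cap travels with the workload instead of being hardcoded at every call site. A minimal sketch (the preset names and values are examples, not recommendations):

```python
# Per-use-case routing presets: each workload carries its own cap.
PRESETS = {
    "batch": {"strategy": "lowest_cost", "max_cost": 0.01},
    "user_facing": {"strategy": "balanced", "max_cost": 0.05},
}

def routing_params(use_case):
    # Look up the strategy and cost cap for a workload.
    return PRESETS[use_case]
```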

Where to see cost

After each request you get estimated (pre-call) and actual (post-call) cost. Log these in your app or use a Control Center to see spend by request, model, and time. That visibility is what makes the cap meaningful—you can confirm no request exceeded your limit.
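A small logging helper makes that confirmation concrete. This is a sketch, assuming you record estimated cost, actual cost, and the cap per request (field names are illustrative):

```python
import logging

def log_cost(request_id, estimated, actual, max_cost):
    # Record pre-call estimate vs. post-call actual for dashboards.
    logging.info(
        "req=%s estimated=$%.4f actual=$%.4f cap=$%.2f",
        request_id, estimated, actual, max_cost,
    )
    # With pre-call filtering, actual cost should never exceed the cap;
    # return False so callers can flag it if it ever does.
    return actual <= max_cost
```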

For plans and limits, see pricing. To try caps in the UI, use the Optimizer.

Ready to add control to your AI calls?

Route through one endpoint. Set cost caps, pick strategies, and see spend—your API keys, no token resale.

Try the Optimizer
