The control plane for AI inference.

The right model per request. About half the bill.

Sending every request to one flagship model is how the bill balloons. Amperes routes each request to the cheapest model that can do the job and escalates to the strong model only when the output needs it, for roughly half the cost at today's prices.

Indexed projection at current provider prices, versus sending all traffic to a flagship model (e.g. Claude Opus or GPT‑5). Lighter chat and agent traffic typically saves more; the free shadow audit measures your exact number. Our 10,000‑prompt benchmark hit 98% in the extreme single‑model case — see Benchmarks.

How Amperes picks the model.

Every request runs the same pipeline in a few milliseconds. Nothing is guessed, and every decision is logged with the reasoning behind it.

01 Classify: Complexity tier and task type, scored in-process in under a millisecond.

02 Govern: Optional PII, region, and HIPAA checks run before anything leaves.

03 Score: Every eligible model is ranked on cost, task fit, health, and latency.

04 Route: The best-value model that clears the constraints handles the request.

05 Escalate: If the output looks low-confidence, it retries on the strong model.

06 Log: The decision and its reasoning are written to a tamper-evident trail.
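
The route and escalate steps can be pictured with a short sketch. Everything in it is hypothetical: the `call_model` callable, the confidence value it returns, and the 0.7 threshold are assumptions for illustration, not Amperes' actual API or tuning.

```python
from typing import Callable, Tuple

# Hypothetical signature: (model_name, prompt) -> (output, confidence in [0, 1]).
ModelCall = Callable[[str, str], Tuple[str, float]]


def route_with_escalation(
    prompt: str,
    cheap_model: str,
    strong_model: str,
    call_model: ModelCall,
    min_confidence: float = 0.7,  # assumed threshold, for illustration only
) -> Tuple[str, str, bool]:
    """Try the cheap model first; retry on the strong model if confidence is low.

    Returns (model_used, output, escalated).
    """
    output, confidence = call_model(cheap_model, prompt)
    if confidence >= min_confidence:
        return cheap_model, output, False
    # Low confidence: retry once on the strong model.
    output, _ = call_model(strong_model, prompt)
    return strong_model, output, True
```

Passing the model call in as a parameter keeps the sketch free of any provider SDK, so it can be exercised with a stub.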

The score, weighted

Cost 45% · Task fit 25% · Health 20% · Latency 10%
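
As a rough illustration of how weights like these combine, here is a minimal sketch of a weighted sum. It assumes each sub-score is already normalized to 0..1 with higher meaning better (so a cheaper model gets a higher cost score); that normalization, the model names, and the sample numbers are assumptions, not something the page specifies.

```python
# Weights taken from the breakdown above.
WEIGHTS = {"cost": 0.45, "task_fit": 0.25, "health": 0.20, "latency": 0.10}


def weighted_score(subscores: dict) -> float:
    """Combine normalized sub-scores (0..1, higher is better) into one score."""
    return sum(WEIGHTS[key] * subscores[key] for key in WEIGHTS)


def rank_models(candidates: dict) -> list:
    """Return model names ordered from best to worst weighted score."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)


# Hypothetical candidates with made-up sub-scores.
candidates = {
    "small-model": {"cost": 0.95, "task_fit": 0.60, "health": 0.90, "latency": 0.90},
    "flagship-model": {"cost": 0.10, "task_fit": 0.95, "health": 0.95, "latency": 0.50},
}
ranking = rank_models(candidates)
```

With these sample numbers the small model scores 0.8475 against 0.5225 for the flagship, so it ranks first. A heavier task-fit requirement would be handled upstream, by removing models that fail the constraints before scoring.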

The winning model's full score breakdown comes back on every response in the x-router-policy header, so routing stays explainable, not a black box.

See your savings before you change a thing.

Send a sample of last week's AI requests. We'll email back what you'd save with Amperes, along with proof that quality holds. No setup.

What you see day-to-day

The dashboard.

One screen: where your AI money goes, what you're saving, and any problems — live.

amperes.pro/dashboard · live_routing · illustrative

Requests / hr: 12,847 (↑ 8.2% vs last hour)

Avg cost / req: $0.0019 (↓ ~50% vs baseline)

Escalation rate: 3.4% (→ steady)

P95 latency: 1.2 s (↓ 180 ms)

Time	Task	Tier	Model	Cost	Saved	Notes
14:03:12	extraction	low	gpt-5-nano	$0.0011	$0.0039	json
14:03:09	coding	med	claude-sonnet-4-6	$0.0052	$0.0049	
14:03:05	planning	high	claude-opus-4-7	$0.0189	—	escalated
14:03:02	qa	low	llama-3.1-8b	<$0.0001	$0.0009	
14:02:58	summarization	low	claude-haiku-4-5	$0.0010	$0.0034	
14:02:54	extraction	low	gpt-5-nano	$0.0011	$0.0039	pii redacted

CRITICAL openai/gpt-5-mini · p50 latency up 233%

Baseline 1,500 ms → recent 5,000 ms. 50/50 samples. Detected 11:47.

→ demoted in scorer · webhook fired to on-call

WARN anthropic/claude-sonnet-4-6 · error rate +6.8 pp

Baseline 0.4% → recent 7.2%. Detected 11:39.

→ health weight × 0.6 · 38% of coding moved to opus
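
The two alerts above report change in different units: the latency alert is a relative change, while the error-rate alert is a difference in percentage points. A small sketch of both calculations, using the illustrative figures shown (the function names are made up for this example):

```python
def relative_change_pct(baseline: float, recent: float) -> float:
    """Relative change from baseline to recent, in percent."""
    return (recent - baseline) / baseline * 100.0


def percentage_point_delta(baseline_pct: float, recent_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return recent_pct - baseline_pct


# Latency alert: 1,500 ms -> 5,000 ms.
print(round(relative_change_pct(1500, 5000)))      # 233
# Error-rate alert: 0.4% -> 7.2%.
print(round(percentage_point_delta(0.4, 7.2), 1))  # 6.8
```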

Your compliance controls, enforced at the proxy.

Switch on the guardrails your security team needs, per account. Each one is enforced before a request ever reaches a provider.

PII

Detection & redaction

Eleven categories, including Luhn-checked card numbers. Block, redact, or allow per policy.
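
For readers unfamiliar with the Luhn check, here is a minimal sketch of the standard algorithm. It shows why a checksum cuts false positives when scanning for card numbers; it is not Amperes' detector, and the function name is an assumption.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A widely used test number such as `4111 1111 1111 1111` passes, while changing its last digit makes the check fail, so a random 16-digit string is unlikely to be flagged as a card.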

Residency

Region & HIPAA routing

Pin traffic to allowed regions or HIPAA-eligible models. Fail-closed by design.
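
"Fail-closed" means that when no model satisfies the policy, the request is refused instead of falling back to a non-compliant model. A minimal sketch of that behavior, assuming a hypothetical model catalog with `region` and `hipaa_eligible` fields (the catalog shape and names are illustrative only):

```python
class NoEligibleModel(Exception):
    """Raised when no model satisfies the residency policy."""


def eligible_models(models, allowed_regions, require_hipaa=False):
    """Return names of models that satisfy the policy; raise if there are none."""
    eligible = [
        model["name"]
        for model in models
        if model["region"] in allowed_regions
        and (model["hipaa_eligible"] or not require_hipaa)
    ]
    if not eligible:
        # Fail closed: refuse the request instead of relaxing the policy.
        raise NoEligibleModel("no model satisfies the region/HIPAA policy")
    return eligible


# Hypothetical catalog entries.
catalog = [
    {"name": "model-a", "region": "eu", "hipaa_eligible": False},
    {"name": "model-b", "region": "us", "hipaa_eligible": True},
]
```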

Audit

Tamper-evident log

Every routing decision is hash-chained, so any later edit or deletion is detectable.
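
Hash chaining in general works by storing, with each entry, a hash that covers both the entry and the previous entry's hash. The sketch below illustrates the idea with SHA-256; the entry layout and function names are assumptions, not Amperes' log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, decision: dict) -> str:
    payload = json.dumps(decision, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, decision: dict) -> None:
    """Append a decision, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "decision": decision,
        "prev_hash": prev_hash,
        "hash": _entry_hash(prev_hash, decision),
    })


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["decision"]):
            return False
        prev_hash = entry["hash"]
    return True
```

In a scheme like this, detecting removal of the most recent entries also requires keeping the latest hash somewhere outside the log.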

Attribution

Per-team cost

Signed request identity slices spend by team or user, without one API key per team.
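
One common way to sign a request identity is an HMAC over the team and user, computed with a secret shared between the caller and the proxy. The page does not describe the actual scheme, so the message layout and function names below are assumptions.

```python
import hashlib
import hmac


def sign_identity(secret: bytes, team: str, user: str) -> str:
    """Return a hex HMAC-SHA256 signature over a 'team:user' identity."""
    message = f"{team}:{user}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_identity(secret: bytes, team: str, user: str, signature: str) -> bool:
    """Check a signature in constant time before attributing spend."""
    expected = sign_identity(secret, team, user)
    return hmac.compare_digest(expected, signature)
```

Because the identity is verified per request, a single API key can serve many teams while spend is still attributed to the team named in the signed identity.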

Available per account. Off by default on trial keys, so nothing touches your traffic until you switch it on.

Cut your AI bill.
Keep the quality that matters.

One line. Three wins.

Plug in

We route

You save

See it on
your traffic.
