Use cases

Workloads we route.

Different workloads need different trade-offs between cost, quality, and speed. One router covers all six — what changes is the policy and which models are eligible. These are the shapes most inference traffic falls into.

01 · Support & chat

High volume, mostly easy answers.

Easy turns go to a cheap model; the router escalates to a strong one only when the cheap model's answer hedges. PII is redacted inline.

cheap-first · auto-escalate
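
As a rough illustration of cheap-first with auto-escalation (a sketch under assumptions, not the router's actual logic): the model names, the `call_model` callable, and the phrase-based hedge check below are all hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical hedge markers; a production detector would likely be more robust.
HEDGE_MARKERS = ("i'm not sure", "i am not sure", "i cannot determine", "it depends")


def looks_hedged(answer: str) -> bool:
    """Return True if the answer contains any of the hedge markers."""
    text = answer.lower()
    return any(marker in text for marker in HEDGE_MARKERS)


def cheap_first(
    prompt: str,
    call_model: Callable[[str, str], str],  # assumed signature: (model, prompt) -> answer
    cheap: str = "cheap-model",             # hypothetical model name
    strong: str = "strong-model",           # hypothetical model name
) -> tuple[str, str]:
    """Try the cheap model first; escalate only if its answer hedges."""
    answer = call_model(cheap, prompt)
    if looks_hedged(answer):
        return strong, call_model(strong, prompt)
    return cheap, answer
```

The strong model is only paid for on the turns where the cheap answer hedges, which is what keeps the common case cheap.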

02 · RAG & document Q&A

Long context, structured extraction.

Cost scales with input tokens. Requests are scored by task, and the router upshifts automatically as the context grows.

task-affinity routing
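
One way to picture task scoring with context-based upshift is the sketch below. The tier names, task labels, and token thresholds are assumptions made up for illustration, not published values.

```python
TIERS = ["small", "medium", "large"]  # hypothetical tier names, cheapest first

# Hypothetical starting tier (index into TIERS) for each task type.
TASK_BASE_TIER = {"lookup": 0, "extraction": 1, "synthesis": 2}

# Hypothetical input-token thresholds; each one crossed moves up one tier.
UPSHIFT_THRESHOLDS = (8_000, 64_000)


def pick_tier(task: str, input_tokens: int) -> str:
    """Start from the task's base tier, then upshift as the context grows."""
    base = TASK_BASE_TIER.get(task, 1)  # unknown tasks default to the middle tier
    bumps = sum(input_tokens > t for t in UPSHIFT_THRESHOLDS)
    return TIERS[min(base + bumps, len(TIERS) - 1)]
```

For example, under these assumed thresholds a `lookup` over 1,000 input tokens stays on the smallest tier, while the same task over 100,000 tokens lands on the largest.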

03 · Code & copilot

Snippets cheap, refactors strong.

Completions to cheap models, refactors to Sonnet, hard bugs to Opus. Streaming passes straight through.

per-step model choice

04 · Agentic workflows

Plan with strong, format with cheap.

5–15 calls per task, mostly formatting. Route each step to the right tier; tag spend per team.

step-aware routing
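
A minimal sketch of step-aware routing with per-team spend tagging, assuming each call is labelled with a step kind and a team. The `Step` type, tier names, and per-call prices are hypothetical and exist only to make the example concrete.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical per-call prices in USD, for illustration only.
PRICE = {"cheap": 0.001, "strong": 0.02}

# Hypothetical mapping from step kind to tier: plan with strong, format with cheap.
STEP_TIER = {"plan": "strong", "format": "cheap"}


@dataclass
class Step:
    kind: str  # e.g. "plan" or "format"
    team: str  # tag used to attribute spend


def route_steps(steps):
    """Pick a tier for each step and accumulate spend per team tag."""
    spend = defaultdict(float)
    routed = []
    for step in steps:
        tier = STEP_TIER.get(step.kind, "strong")  # unknown kinds default to strong
        routed.append(tier)
        spend[step.team] += PRICE[tier]
    return routed, dict(spend)
```

Because most calls in a task are formatting steps, sending only the planning step to the strong tier is where most of the saving would come from in this sketch.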

05 · Batch processing

Async, high volume, latency-tolerant.

Runs the aggressive policy — maximum tier-down, with quality checked against the shadow baseline.

aggressive policy
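
The quality check against a shadow baseline could look something like the sketch below. It assumes quality is a numeric score per request and that a tier-down is accepted when the mean score drops by no more than a tolerance; both the scoring scheme and the `max_drop` value are assumptions, not documented behavior.

```python
from statistics import mean


def tier_down_ok(candidate_scores, baseline_scores, max_drop=0.02):
    """Accept a tier-down if mean quality stays within max_drop of the baseline.

    candidate_scores: quality scores from the cheaper tier (hypothetical metric)
    baseline_scores:  quality scores from the shadow baseline on the same traffic
    """
    return mean(baseline_scores) - mean(candidate_scores) <= max_drop
```

A stricter tolerance trades some savings for quality; an aggressive policy would sit at the looser end of that range.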

06 · Regulated workloads

HIPAA, finance, gov.

In-VPC mode runs inside your own AWS account — data never leaves your cloud. Same routing, same audit.

In-VPC · HIPAA

Your workload looks different?

The shadow audit characterizes your real traffic and proposes the right policy. Most inference traffic falls into one of these shapes.

Book a demo