Documentation

Amperes API reference

OpenAI-compatible inference control plane. Drop-in replacement for api.openai.com with smart routing, governance, and observability.

Quickstart

Amperes runs in one of two deployment modes today. The API is identical across both — only the base_url changes.

If you want…	Use	Onboarding
Fastest start, full visibility	Hosted	~5 minutes · we provision a key
Zero data egress, your AWS account	In-VPC deploy	~30 min · CloudFormation in your AWS account

An in-process SDK that keeps the routing decision inside your service (we only receive metadata) is in development. Book a demo if you'd like to be on the early-access list.

1. Get an API key

Self-serve: Sign in with Google and create a key from your account in about 30 seconds. Keys are shown once, and you can create, name, and revoke them yourself.

Larger volume, more providers, or an in-VPC deployment? Book a demo — limits are sized per pilot, and In-VPC keys come with the CloudFormation stack.

2. Set environment variables

$ export AMPERES_API_KEY="amperes-..."
$ export AMPERES_BASE_URL="https://api.amperes.pro/v1"  # Hosted SaaS
# Or your in-VPC ALB URL: https://amperes.<your-domain>.internal/v1

3. Swap one line in your code

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["AMPERES_API_KEY"],
    base_url=os.environ["AMPERES_BASE_URL"],  # ← only change
)

# with_raw_response exposes the HTTP headers; .parse() returns the usual ChatCompletion
raw = client.chat.completions.with_raw_response.create(
    model="auto",
    messages=[{"role": "user", "content": "Parse this CSV..."}],
)
response = raw.parse()
print(response.choices[0].message.content)
print(raw.headers.get("x-router-model"))  # which model we routed to

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.AMPERES_API_KEY,
  baseURL: process.env.AMPERES_BASE_URL, // ← only change
});

const response = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Parse this CSV...' }],
});
console.log(response.choices[0].message.content);

$ curl "$AMPERES_BASE_URL/chat/completions" \
    -H "Authorization: Bearer $AMPERES_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "auto",
      "messages": [{"role": "user", "content": "Parse this CSV..."}]
    }'

That's it. Existing call sites work without modification. The response body has the same shape as OpenAI's; routing metadata is carried in additional x-router-* response headers.

Authentication

All non-public endpoints require a bearer token in the Authorization header.

Authorization: Bearer amperes-...

Keys are per-customer. Lost or compromised keys can be rotated by booking a quick call. We store only a SHA-256 hash of each key and compare with hmac.compare_digest on lookup, so a leaked database does not expose the keys themselves.
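The hash-then-compare scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not Amperes' actual code: the function names and the example key are hypothetical.

```python
import hashlib
import hmac


def hash_key(api_key: str) -> str:
    """Return the hex SHA-256 digest that would be stored instead of the key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    return hmac.compare_digest(hash_key(presented), stored_digest)


# Hypothetical stored record: only the digest is persisted, never the key.
stored = hash_key("amperes-example-key")
```

The constant-time comparison avoids leaking, through response timing, how many leading characters of a digest matched.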

What's public

One endpoint requires no auth: GET /health. Use it for load-balancer liveness probes.

Deployment modes

Same routing engine. Two places it can live today. The choice is about where your prompt data is processed — not what the product does.

Hosted

~5-minute onboarding · your traffic passes through our proxy

The fastest start. We run the proxy, dashboard, and audit log on AWS in us-east-2. By default we store only prompt hashes (SHA-256); full prompt storage is opt-in per customer via store_full_prompts=true.

SOC 2 Type I is planned but not yet started. If procurement needs an attestation today, use In-VPC mode, which keeps your data in your own AWS account.

$ curl https://api.amperes.pro/v1/chat/completions \
    -H "Authorization: Bearer $AMPERES_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "auto", "messages": [...]}'

Good fit for: AI-native startups and mid-market teams where the data-residency question is "US is fine."

In-VPC deploy · CloudFormation

~30-minute deploy · zero data egress to Amperes

Amperes runs as an ECS Fargate task inside your AWS account. The stack (infra/aws/cloudformation.yaml) provisions a VPC, RDS Postgres (encrypted at rest), a Bedrock VPC endpoint, a scoped IAM role, and an internal ALB.

Your inference traffic stays inside your VPC end to end — we never see a token. We can publish the proxy image to your private ECR or supply a tagged build on request.

$ aws cloudformation deploy \
    --template-file infra/aws/cloudformation.yaml \
    --stack-name amperes-prod \
    --parameter-overrides \
      VpcId=vpc-0123abcd \
      SubnetIds=subnet-aaa,subnet-bbb \
      ContainerImage=<your-ecr-uri>/amperes-proxy:<tag> \
      BedrockRegion=us-east-1 \
    --capabilities CAPABILITY_NAMED_IAM

Your application talks to the internal ALB URL (for example https://amperes.internal.acme.com/v1). All other code is unchanged from the Hosted snippet above — same headers, same response shape.

Good fit for: healthcare (HIPAA), fintech, insurance, government, any F500 where a CISO has to sign off. Book a demo to get the template walked through.

Chat completions

POST /v1/chat/completions

The primary routing endpoint. Mirrors OpenAI's chat completion API; accepts every field OpenAI does. The proxy classifies, picks a model, calls upstream, and returns the response unchanged.

Request body

Same as OpenAI's spec, with two notable behaviors:

model accepts the literal string "auto" to let the router decide. Naming a specific model ID is currently advisory — the router may still re-route for cost, provider health, or availability. The model actually used is always returned in the x-router-model response header; when it differs from the one you named, x-router-requested-model and x-router-model-overridden: true are also set. (Hard model pinning is on the roadmap.)
Credential / mock / callback fields are stripped before reaching upstream.
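Because a named model is advisory, a client may want to detect when the router chose something else. The sketch below reads the three headers described above; the function name is hypothetical, the model names are placeholders, and it assumes header names have already been lower-cased (a plain dict, unlike most HTTP client header objects, is case-sensitive).

```python
from typing import Mapping, Optional, Tuple


def routing_override(headers: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (requested, served) if the router overrode the named model, else None."""
    if headers.get("x-router-model-overridden") != "true":
        return None
    return headers["x-router-requested-model"], headers["x-router-model"]


# Placeholder model names, for illustration only.
overridden = routing_override({
    "x-router-model": "model-b",
    "x-router-requested-model": "model-a",
    "x-router-model-overridden": "true",
})
not_overridden = routing_override({"x-router-model": "model-b"})
```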

Streaming

Set "stream": true for SSE. The proxy forwards each chunk as it arrives, preserves OpenAI's data: {...}\n\n framing, and emits data: [DONE] at the end. Time to first token (TTFT) and total latency are logged separately.
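The OpenAI SDKs handle this framing for you when you pass stream=True. If you consume the stream by hand, the parsing looks roughly like the sketch below. The function names and sample lines are illustrative; the delta.content path follows OpenAI's chunk format.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta.content pieces of a chat-completion stream."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written sample stream, not captured output.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
```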

Tools

Tool calls work end-to-end. Models that don't support tools are filtered out of the candidate set automatically.

Embeddings

POST /v1/embeddings

Passthrough to your preferred embedding provider with governance enforcement. PII detection and region constraints apply. HIPAA-only customers receive 403 until a HIPAA-eligible embedding provider is configured.

Why governance on embeddings?

Embeddings move prompt data through the same network and providers as chat. Without governance, a HIPAA customer could embed patient notes through a model that is not HIPAA-eligible, which carries the same compliance risk as a chat completion. We close that gap.

Model registry

GET /v1/models

OpenAI-shape model list with our routing metadata attached. Each model entry:

{
  "id": "claude-sonnet-4-6",
  "object": "model",
  "owned_by": "anthropic",
  "tier": "medium",
  "context_window": 200000,
  "supports_tools": true,
  "supports_streaming": true,
  "supports_structured_output": true,
  "cost_per_1m_input_usd": 3.00,
  "cost_per_1m_output_usd": 15.00,
  "regions": ["us", "eu", "global"],
  "hipaa_compliant": false,
  "soc2_compliant": true
}
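As a rough illustration of how a client might use these fields, the sketch below estimates the cost of a request from the per-million-token prices and filters a model list down to tool-capable entries. The function names and token counts are made up for the example; the prices are the ones shown in the entry above.

```python
from typing import List


def estimate_cost_usd(model: dict, input_tokens: int, output_tokens: int) -> float:
    """Price a request from the registry's per-1M-token input and output rates."""
    return (
        input_tokens * model["cost_per_1m_input_usd"]
        + output_tokens * model["cost_per_1m_output_usd"]
    ) / 1_000_000


def tool_capable(models: List[dict]) -> List[dict]:
    """Keep only models whose registry entry advertises tool support."""
    return [m for m in models if m.get("supports_tools")]


sonnet = {
    "id": "claude-sonnet-4-6",
    "supports_tools": True,
    "cost_per_1m_input_usd": 3.00,
    "cost_per_1m_output_usd": 15.00,
}

# 2,000 input tokens and 500 output tokens: (2000*3 + 500*15) / 1e6 = 0.0135 USD
cost = estimate_cost_usd(sonnet, input_tokens=2000, output_tokens=500)
```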

Response headers

Every chat completion (streaming or not) carries control-plane headers. Read them in your client to audit routing decisions:

Header	Value type	Description
x-router-model	string	The model that served this request
x-router-tier	low/medium/high	Classified complexity
x-router-task-type	string	coding / extraction / qa / planning / etc.
x-router-cost-usd	float	What this request actually cost
x-router-baseline-cost-usd	float	What the same prompt on Opus would have cost
x-router-policy	string	Decision reasoning with score breakdown
x-router-escalated	true / absent	Cheap-first failed confidence check; escalated to strong model
x-router-pii-detected	true / absent	PII detected and handled per policy
x-router-pii-categories	comma-list	email, phone, ssn, credit_card, …
x-router-governance-action	redacted / blocked	What we did with PII
x-router-agent-step	string	planning / retrieval_synthesis / formatting / tool_use
x-router-drift-active	comma-list	Model ids currently flagged by the drift detector (sent only when there is one)
x-router-request-id	uuid	For correlation with our audit log
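A client-side audit hook could condense these headers into a small record. The sketch below is one way to do it, assuming lower-cased header names and plain decimal strings for the cost values (the exact number formatting is not specified here). The function name and example values are hypothetical.

```python
from typing import Mapping


def summarize_routing(headers: Mapping[str, str]) -> dict:
    """Pull the cost and governance fields out of lower-cased response headers."""
    cost = float(headers["x-router-cost-usd"])
    baseline = float(headers["x-router-baseline-cost-usd"])
    saved = baseline - cost
    categories = headers.get("x-router-pii-categories", "")
    return {
        "model": headers["x-router-model"],
        "saved_usd": saved,
        "saved_pct": 100.0 * saved / baseline if baseline > 0 else 0.0,
        # Headers documented as "true / absent" are simply missing when false.
        "escalated": headers.get("x-router-escalated") == "true",
        "pii_categories": [c.strip() for c in categories.split(",") if c.strip()],
    }


# Illustrative header values, not real output.
example = summarize_routing({
    "x-router-model": "claude-sonnet-4-6",
    "x-router-cost-usd": "0.002",
    "x-router-baseline-cost-usd": "0.010",
    "x-router-pii-categories": "email, phone",
})
```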

Errors

OpenAI-shape error responses. Same JSON envelope, same client retry logic.

{
  "error": {
    "message": "Rate limit: 60 requests/minute for chat",
    "type": "rate_limit_error",
    "code": 429
  }
}

Status	type	When
400	invalid_request_error	Bad JSON, missing fields, malformed messages
401	authentication_error	Missing / invalid Bearer token
403	permission_error	Tenant isolation, PII block, HIPAA mismatch, region mismatch
413	request_too_large	Body exceeds 5 MB
429	rate_limit_error	Per-customer rate limit; `Retry-After` header included
502	upstream_error	All routing candidates failed; non-retryable provider error
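The OpenAI SDKs already retry some failures on their own. If you write your own loop, a retry decision based on the table above might look like this sketch. It treats only 429 as retryable and assumes Retry-After carries a number of seconds; whether to retry a 502 is a judgment call this sketch leaves out. The function name, attempt cap, and backoff are assumptions.

```python
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    max_attempts: int = 3,
) -> Optional[float]:
    """Return how long to wait before retrying, or None to give up."""
    if attempt >= max_attempts or status != 429:
        return None
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    return float(2 ** attempt)  # fallback: exponential backoff
```

Here attempt counts the retries already made, starting at 0.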

Policies

A policy is a tier-stratified allowlist of candidate models. The router picks the best candidate via multi-objective scoring.

Policy	Profile
balanced	Default. Cheapest viable per tier across providers.
aggressive	Maximum savings. Downshifts tiers when possible.
conservative	Quality-first. Upshifts tiers when in doubt.
anthropic_only	Anthropic direct + Bedrock-Claude.
openai_only	OpenAI direct.
bedrock_only	AWS Bedrock-Claude only. HIPAA-friendly.
local_only	On-prem models in your VPC. $0/token. HIPAA + EU + air-gap.
hybrid_local_cloud	Local for 80% of traffic; cloud Opus for the hardest 20%.

Task types

Detected from prompt content (regex + keyword scoring; embedding fallback). Drives task-affinity scoring in the policy.

qa · summarization · extraction · structured_output · coding · reasoning · planning · agentic · creative · tool_use · long_context · general
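To give a feel for regex and keyword scoring, here is a deliberately tiny sketch. The patterns are invented for illustration and cover only three of the task types; the actual rules and the embedding fallback are not shown in this reference.

```python
import re

# Hypothetical patterns; the real detector's rules are not documented here.
TASK_PATTERNS = {
    "coding": [r"\bdef \w+\(", r"\bstack trace\b", r"\brefactor\b"],
    "summarization": [r"\bsummar(y|ize|ise)\b", r"\btl;dr\b"],
    "extraction": [r"\bextract\b", r"\bparse\b", r"\bcsv\b"],
}


def detect_task_type(prompt: str) -> str:
    """Score each task type by how many of its patterns match; default to general."""
    text = prompt.lower()
    scores = {
        task: sum(1 for pattern in patterns if re.search(pattern, text))
        for task, patterns in TASK_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```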

Confidence escalation

When eligible (tier > low, no tools, no structured output), the router calls a cheaper model first and inspects the response. Signals that trigger escalation:

Hedge markers ("I'm not sure", "perhaps")
Refusal markers ("I cannot", "I'm unable")
Truncation, repetition, or very short output
finish_reason of length or content_filter
Expected tool call missing
Response echoes the prompt

These signals are combined into a confidence score. When the score falls below a configurable threshold (default 0.55), we re-route to the strong model. Both calls are logged, and the dashboard tracks escalation rate so you can validate that the trade is net-positive.
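A confidence check of this kind can be sketched as below. Only the marker phrases and the 0.55 default come from this page; the penalty weights, the function names, and the way signals are combined are assumptions made for the example, and the truncation, repetition, and missing-tool-call signals are omitted.

```python
HEDGES = ("i'm not sure", "perhaps")
REFUSALS = ("i cannot", "i'm unable")


def confidence(response_text: str, finish_reason: str, prompt: str) -> float:
    """Start at 1.0 and subtract a hypothetical penalty per triggered signal."""
    text = response_text.lower()
    score = 1.0
    if any(marker in text for marker in HEDGES):
        score -= 0.3
    if any(marker in text for marker in REFUSALS):
        score -= 0.5
    if finish_reason in ("length", "content_filter"):
        score -= 0.5
    if len(text.split()) < 3:  # very short output
        score -= 0.3
    if prompt.strip() and prompt.strip().lower() in text:  # echoes the prompt
        score -= 0.3
    return max(score, 0.0)


def should_escalate(response_text: str, finish_reason: str, prompt: str,
                    threshold: float = 0.55) -> bool:
    """Escalate to the strong model when confidence drops below the threshold."""
    return confidence(response_text, finish_reason, prompt) < threshold
```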

Provider failover

We track rolling-window error rate and p99 latency per provider. As a provider degrades, its score drops and traffic shifts to healthier candidates; when it goes "down", its models are excluded until health recovers.
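The rolling-window idea can be illustrated with a fixed-size deque of recent outcomes. This sketch tracks error rate only (p99 latency is left out), and the class name, window size, and 50% "down" threshold are assumptions rather than documented values.

```python
from collections import deque


class ProviderHealth:
    """Track the last `window` call outcomes for one provider."""

    def __init__(self, window: int = 100, down_threshold: float = 0.5):
        self.outcomes = deque(maxlen=window)  # True = success, False = error
        self.down_threshold = down_threshold

    def record(self, ok: bool) -> None:
        self.outcomes.append(ok)  # oldest outcome falls off automatically

    @property
    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    @property
    def is_down(self) -> bool:
        return self.error_rate >= self.down_threshold

    def score_multiplier(self) -> float:
        """Scale a candidate's score down as errors rise; zero means excluded."""
        return 0.0 if self.is_down else 1.0 - self.error_rate


health = ProviderHealth(window=4)
for ok in (True, True, True, False):
    health.record(ok)
degraded_multiplier = health.score_multiplier()  # 1 error in 4 calls

health.record(False)  # window is now [True, True, False, False]
down_now = health.is_down
```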

PII detection

Detection is regex-based across 11 categories; credit card matches are additionally Luhn-validated. Actions are configurable per customer:

action	behavior
redact	Replace matched spans with `[REDACTED_<CATEGORY>]` before sending upstream. Default.
block	403 the request entirely. Use for HIPAA / PCI workloads where leaks are unrecoverable.
allow	Log the detection but pass through. Audit trail only.
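The redact action can be sketched for two of the categories. The Luhn check is the standard algorithm; the regular expressions, the function names, and the upper-cased category labels are assumptions for illustration and are much simpler than a production detector.

```python
import re

# Simplified, hypothetical patterns for two of the categories.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators


def luhn_ok(number: str) -> bool:
    """Standard Luhn checksum over the digits in `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def redact(text: str) -> str:
    """Replace emails and Luhn-valid card numbers with category placeholders."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return CARD_RE.sub(
        lambda m: "[REDACTED_CREDIT_CARD]" if luhn_ok(m.group()) else m.group(),
        text,
    )
```

The Luhn step is what keeps digit runs such as order numbers from being redacted as cards.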

Region routing

Models carry region tags (us, eu, apac, global). Customers configure allowed_regions. Models whose region list doesn't intersect allowed_regions are filtered from the candidate set. An empty allowed_regions means no restriction.
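The filter amounts to a set intersection, sketched below against registry-style entries. The function name and model ids are hypothetical. The sketch applies the intersection rule literally, so a model tagged only global is kept only if global is in the allowed list; whether the router treats global specially is not stated here.

```python
from typing import Iterable, List


def filter_by_region(models: Iterable[dict], allowed_regions: Iterable[str]) -> List[dict]:
    """Keep models whose region tags intersect the customer's allowed regions."""
    allowed = set(allowed_regions)
    if not allowed:
        return list(models)  # empty constraint means no restriction
    return [m for m in models if allowed & set(m["regions"])]


# Placeholder registry entries.
candidates = [
    {"id": "model-us", "regions": ["us"]},
    {"id": "model-multi", "regions": ["us", "eu", "global"]},
    {"id": "model-apac", "regions": ["apac"]},
]
eu_only = [m["id"] for m in filter_by_region(candidates, ["eu"])]
```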

HIPAA mode

Set require_hipaa_models=true on the customer config and the router only selects models flagged hipaa_compliant: true — today the AWS Bedrock-hosted family (Claude Haiku / Sonnet / Opus, plus Amazon Nova Lite / Micro). The filter runs before scoring, so a HIPAA-on customer cannot route to a non-compliant model regardless of policy.

Audit log

Append-only record of every routing decision, escalation, fallback, PII redaction, and policy block. Queryable via GET /admin/audit?days=30. Exportable to S3 via POST /admin/export/s3.

Rate limits

Per-customer sliding-window. Default 60 req/min for chat, separate bucket for embeddings. Configurable on the CustomerConfig. 429 includes a Retry-After header.

In In-VPC mode the rate limit is yours to tune — you control the ECS task scaling.
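For readers unfamiliar with sliding-window limiting, the sketch below shows the general technique for a single bucket. It is not Amperes' implementation: the class name is hypothetical, time is passed in explicitly to keep the example deterministic, and a real deployment would keep one limiter per customer and per endpoint bucket.

```python
import math
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window_s`-second span."""

    def __init__(self, limit: int = 60, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.hits = deque()  # timestamps of accepted requests, oldest first

    def check(self, now: float) -> Optional[int]:
        """Return None if the request is allowed, else a Retry-After in seconds."""
        while self.hits and now - self.hits[0] >= self.window_s:
            self.hits.popleft()  # drop timestamps that left the window
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return None
        # Rejected: the oldest hit must expire before another request fits.
        return max(1, math.ceil(self.hits[0] + self.window_s - now))


limiter = SlidingWindowLimiter(limit=2, window_s=60.0)
results = [limiter.check(t) for t in (0.0, 1.0, 2.0, 60.0)]
```

With a limit of 2, the third request at t=2 is rejected with a 58-second Retry-After, and the request at t=60 is accepted because the hit from t=0 has left the window.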