

ClearMaas can try multiple models in order until one succeeds. This is useful for resilience (when one provider is throttling or down) and for cost control (prefer a cheaper model and fall back to a stronger one only when needed).

How to use

Put an ordered list of model IDs in extra_body.models and set extra_body.route to "fallback". The top-level model field is still required, but when a chain is present ClearMaas follows the chain's order rather than the model field; repeat your preferred primary as the chain's first entry, as in the example below.
from openai import OpenAI

# Point the OpenAI SDK at the ClearMaas gateway
# (use the base URL and API key from your own setup).
client = OpenAI(base_url="...", api_key="...")

response = client.chat.completions.create(
  model="openai/gpt-4o",
  messages=[{"role": "user", "content": "..."}],
  extra_body={
      "models": ["openai/gpt-4o", "anthropic/claude-haiku-4.5", "google/gemini-2.5-pro"],
      "route": "fallback",
  },
)

Rules

  • Maximum 5 models in the chain. Extras are silently truncated.
  • Recommended: all models in a chain should be the same endpoint type (all chat, or all image). Mixing a chat model with an image model won’t crash the gateway, but the fallback that actually serves the request needs to match the endpoint you called (e.g. if you call /v1/chat/completions, only chat models in the chain are usable).
  • Fallback behavior:
    • Unresolvable clearmaas/{name} entries (bad name, disabled router) are silently skipped.
    • Models the calling key cannot access (model-allowlist mismatch) are silently skipped.
    • When the primary model fails upstream (5xx / 429 / network error), the next chain entry is tried.
    • The request fails only when every chain entry has been exhausted.
    • Streaming caveat: once any byte of the response has been sent to the client, fallback can no longer kick in. If the upstream drops mid-stream, the client sees a truncated stream, not a transparent retry on the next model.
  • Billing is for the model that actually served the response, at that model's rate, not the primary's.
  • extra_body.route must be exactly "fallback" for the chain to activate. With any other value, or no route at all, the chain is ignored and only the top-level model is used.
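Taken together, the chain rules can be sketched as a small simulation. This is a hypothetical model of the behavior, not gateway source; `resolvable` and `allowed` are stand-ins for router name resolution and the calling key's model allowlist:

```python
MAX_CHAIN = 5  # entries beyond this are silently truncated

def resolve_chain(models, route, resolvable, allowed):
    """Models that would actually be attempted, per the rules above."""
    if route != "fallback":
        return models[:1]  # chain ignored; only the top-level model is used
    chain = models[:MAX_CHAIN]
    # unresolvable or unauthorized entries are silently skipped
    return [m for m in chain if m in resolvable and m in allowed]

def attempt(chain, call):
    """Try each entry in order; fail only when every entry is exhausted."""
    last_error = None
    for model in chain:
        try:
            return model, call(model)  # first success wins
        except RuntimeError as err:    # stand-in for a 5xx / 429 / network error
            last_error = err
    raise RuntimeError("all chain entries exhausted") from last_error
```

Note that truncation happens before skipping, so an inaccessible entry inside the first five still consumes a chain slot.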

How to tell which model served the response

Check the X-Clear-Fallback-Level and X-Clear-Fallback-Model response headers. See Response Headers.
response = client.chat.completions.with_raw_response.create(...)
level = int(response.headers.get("X-Clear-Fallback-Level", "0"))
served_by = response.headers.get("X-Clear-Fallback-Model", "primary")
# level 0 / "primary" means the top-level model answered;
# otherwise served_by is the fallback model name
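If you check these headers in several places, a small helper (hypothetical, not part of any SDK) keeps the logic in one spot:

```python
import warnings

def check_fallback(headers):
    """Read the ClearMaas fallback headers from a raw response."""
    level = int(headers.get("X-Clear-Fallback-Level", "0"))
    model = headers.get("X-Clear-Fallback-Model", "primary")
    if level > 0:
        # a fallback served the request; billing is at this model's rate
        warnings.warn(f"served by fallback level {level}: {model}")
    return level, model
```

Call it as `level, model = check_fallback(response.headers)` after a `with_raw_response` request.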

When not to use this

If you want ClearMaas to automatically pick the cheapest available model without writing a chain, use clearmaas/auto instead. Fallback chains are for cases where you want explicit control over the ordering.