Reasoning models spend extra compute on a hidden “thinking” pass before producing the final answer. They’re slower and more expensive but solve harder problems. ClearMaas provides one unified syntax for controlling reasoning effort across every provider — pick whichever form fits your client.
Two ways to set effort
1. The reasoning_effort field (OpenAI shape)
Pass it on a Chat Completions request. Values: `low`, `medium`, `high` (and `minimal` / `max` on some models).

- OpenAI o-series and gpt-5-pro family: forwarded as native `reasoning_effort`.
- Anthropic Claude: mapped to `thinking: {type: "enabled", budget_tokens: ...}` with budgets `low` → 1280, `medium` → 2048, `high` → 4096. For `claude-opus-4.6` specifically, mapped to `thinking: {type: "adaptive"}` plus `output_config.effort`.
- Google Gemini: mapped to `generationConfig.thinkingConfig` with `includeThoughts: true` and a thinking level / budget derived from the effort.
- xAI Grok: forwarded for the grok-3-mini family (which accepts `reasoning_effort` natively).
- DeepSeek reasoner: the model reasons by design; `reasoning_effort` is a no-op.
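The Anthropic mapping above can be sketched as follows. The budget numbers and the adaptive special case come from this page; the function name and dict structure are illustrative assumptions, not the gateway's actual internals.

```python
# Illustrative sketch of the Anthropic effort mapping described above.
# Budgets (low→1280, medium→2048, high→4096) are from the docs; the
# function itself is hypothetical, not gateway code.
ANTHROPIC_BUDGETS = {"low": 1280, "medium": 2048, "high": 4096}

def map_effort_to_anthropic(model: str, effort: str) -> dict:
    """Translate an OpenAI-style reasoning_effort into Anthropic params."""
    if model == "claude-opus-4.6":
        # claude-opus-4.6 gets adaptive thinking plus output_config.effort.
        return {
            "thinking": {"type": "adaptive"},
            "output_config": {"effort": effort},
        }
    return {
        "thinking": {
            "type": "enabled",
            "budget_tokens": ANTHROPIC_BUDGETS[effort],
        }
    }

print(map_effort_to_anthropic("claude-sonnet-4.6", "medium"))
```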
2. The -{effort} model-name suffix
You can also bake the effort into the model name. Recognized suffixes:
`-minimal` / `-low` / `-medium` / `-high` / `-max`.
Reasoning model families in this deployment
- OpenAI: `openai/o1`, `o1-pro`, `openai/o3`, `o3-mini`, `o3-mini-high`, `openai/o4-mini`, `o4-mini-high`, `openai/gpt-5-pro` and the `gpt-5.x-pro` family
- Anthropic: `anthropic/claude-sonnet-4.6`, `claude-opus-4.6`, `claude-opus-4.7`, etc. — pair with `reasoning_effort` or the `-{effort}` suffix.
- Google: `google/gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-3-pro-preview`, etc. — pair with `reasoning_effort` or the `-{effort}` suffix.
- DeepSeek: `deepseek/deepseek-reasoner` — reasoner-by-design.
- xAI: `grok/grok-4-fast-reasoning`, `grok-4-1-fast-reasoning`; `grok/grok-3-mini` paired with `reasoning_effort: low` or `high`.

See `/v1/models` for the live catalog.
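The two control surfaces compose into equivalent requests. The bodies below are illustrative Chat Completions payloads only; the model ID is taken from the list above, and actually sending them requires an OpenAI-compatible client pointed at your deployment.

```python
# Two equivalent ways to request high effort on the same model, per the
# docs above. These are plain request bodies; endpoint and auth details
# are deployment-specific and omitted here.
via_field = {
    "model": "anthropic/claude-sonnet-4.6",
    "reasoning_effort": "high",  # OpenAI-shape field
    "messages": [{"role": "user", "content": "Prove it."}],
}

via_suffix = {
    "model": "anthropic/claude-sonnet-4.6-high",  # effort baked into the name
    "messages": [{"role": "user", "content": "Prove it."}],
}

print(via_field["model"], "|", via_suffix["model"])
```

The suffix form is handy for clients that only expose a model-name setting.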
Reasoning trace in the response
For the OpenAI Responses API, the model’s hidden reasoning is returned as `reasoning` items in the response output. For Anthropic via native `/v1/messages`, thinking arrives as `content_block` entries of type `thinking`. The gateway also surfaces a `reasoning_content` field on chat-completion responses where the upstream provides one.
You can display the trace for transparency or ignore it in production.
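Reading the trace off a chat-completion response can look like this; the response fragment is a made-up example whose shape is assumed from the `reasoning_content` description above.

```python
# Hypothetical chat-completion response fragment; the reasoning_content
# field is the one the gateway surfaces, per the docs above.
response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "The answer is 42.",
            "reasoning_content": "First, consider the question...",
        }
    }]
}

message = response["choices"][0]["message"]
# May be absent when the upstream model exposes no trace, hence .get().
trace = message.get("reasoning_content")
if trace:
    print("trace:", trace)
print("answer:", message["content"])
```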
Billing
Reasoning tokens are tracked separately in `completion_tokens_details.reasoning_tokens` on the response `usage` object — see Operations / Billing & Usage.
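Pulling that counter out of a usage object looks like this. The usage fragment is invented, and the split between visible and reasoning tokens assumes, as in OpenAI's own accounting, that reasoning tokens are included in `completion_tokens`.

```python
# Illustrative usage object; reasoning tokens live at
# completion_tokens_details.reasoning_tokens, as described above.
usage = {
    "prompt_tokens": 120,
    "completion_tokens": 900,
    "completion_tokens_details": {"reasoning_tokens": 640},
    "total_tokens": 1020,
}

reasoning = usage["completion_tokens_details"]["reasoning_tokens"]
# Assumes reasoning tokens count toward completion_tokens (OpenAI-style).
visible = usage["completion_tokens"] - reasoning
print("reasoning:", reasoning, "visible:", visible)
```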