Audio input is supported by Gemini multimodal models. Two paths:Documentation Index
Fetch the complete documentation index at: https://docs.clearmaas.com/llms.txt
Use this file to discover all available pages before exploring further.
Path 1: OpenAI-shape input_audio on /v1/chat/completions
The gateway translates the OpenAI input_audio content part to
Gemini’s inline_data automatically. The format field maps to
the right MIME type (mp3 → audio/mp3, wav → audio/wav, etc.).
Path 2: Native /v1beta/ with inline_data
If you’re already on Gemini’s native protocol, pass inline_data
directly — no translation involved.
Supported model families
Gemini multimodal models accept inline audio — for examplegoogle/gemini-2.5-flash and the Gemini 3.x line. Behavior matches
Google’s published Gemini API exactly.