Text-to-speech

ClearMaas exposes two paths for text-to-speech depending on which provider’s TTS model you want to use.

OpenAI-shape: `/v1/audio/speech`

Use this with OpenAI’s TTS model family — openai/tts-1, openai/tts-1-hd, openai/gpt-4o-mini-tts, and similar:

curl https://api.clearmaas.com/v1/audio/speech \
  -H "Authorization: Bearer sk-clearmaas-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/tts-1",
    "input": "Hello, world!",
    "voice": "alloy"
  }' \
  --output speech.mp3

The response is binary audio. The Content-Type header tells you which format the upstream returned (audio/mpeg, audio/wav, audio/opus, or audio/flac).
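Because the format is only known once the response arrives, a client may want to choose the file extension from the Content-Type header instead of hard-coding `.mp3`. The following is a minimal sketch using only the Python standard library. The helper names `extension_for` and `synthesize`, the `.bin` fallback, and the `basename` parameter are illustrative assumptions, not part of the ClearMaas API.

```python
import json
import urllib.request

# Extensions for the Content-Type values listed above.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/opus": ".opus",
    "audio/flac": ".flac",
}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension (fallback: .bin)."""
    # Drop any parameters after ";" and normalise case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, ".bin")


def synthesize(text: str, api_key: str, basename: str = "speech") -> str:
    """POST to /v1/audio/speech, save the audio, and return the file path."""
    body = json.dumps({"model": "openai/tts-1", "input": text, "voice": "alloy"})
    request = urllib.request.Request(
        "https://api.clearmaas.com/v1/audio/speech",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Name the file after the format the upstream actually returned.
        path = basename + extension_for(response.headers.get("Content-Type", ""))
        with open(path, "wb") as f:
            f.write(response.read())
    return path
```

Calling `synthesize("Hello, world!", api_key)` with a valid key would write a file such as `speech.mp3` or `speech.wav`, depending on the header that comes back.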

Gemini TTS: native `/v1beta/`

Gemini TTS preview models (e.g. google/gemini-2.5-flash-preview-tts) are not served on /v1/audio/speech — call them through Gemini’s native surface instead:

curl "https://api.clearmaas.com/v1beta/models/google/gemini-2.5-flash-preview-tts:generateContent" \
  -H "Authorization: Bearer sk-clearmaas-..." \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Hello, world!"}]}],
    "generationConfig": {
      "responseModalities": ["AUDIO"],
      "speechConfig": {
        "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}}
      }
    }
  }'

Audio bytes come back as base64 inside the response’s inlineData field. See Google’s Gemini TTS docs for the full set of voice names and configuration options.
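Decoding that field could look roughly like the sketch below. It assumes the response follows the `candidates` → `content` → `parts` layout of Google's public Gemini API; this page only names `inlineData`, so the surrounding path is an assumption. The helper name `extract_audio` is hypothetical, and the `sample` dictionary, including its `mimeType` value, is a stand-in for a real response.

```python
import base64


def extract_audio(response: dict) -> tuple[bytes, str]:
    """Return (audio_bytes, mime_type) from the first part carrying inlineData."""
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and "data" in inline:
                # The audio payload is base64 text; decode it to raw bytes.
                return base64.b64decode(inline["data"]), inline.get("mimeType", "")
    raise ValueError("no inlineData part found in response")


# Stand-in response for illustration; a real one would come from the API call.
sample = {
    "candidates": [{
        "content": {"parts": [{
            "inlineData": {
                "mimeType": "audio/L16;codec=pcm;rate=24000",  # placeholder value
                "data": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
            }
        }]}
    }]
}

audio, mime_type = extract_audio(sample)
print(len(audio), mime_type)
```

Check the returned `mimeType` before saving the bytes. If it indicates raw PCM, the data will likely need a container such as WAV (for example via Python's `wave` module) before most players can open it.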
