Voices

Hyponema is vendor-neutral about voices. You pick a TTS provider and voice per agent, optionally with cascading fallbacks, and tune how the agent listens through a voice profile. You can also clone a branded voice into your provider account.

This guide covers the operator side. For clients, voice selection is part of agent configuration.

Voice catalog

The catalog lists every voice from every supported TTS provider you have credentials for: Cartesia, ElevenLabs, OpenAI, Deepgram, Groq, and Google Cloud. Bookmark voices you like for quick selection later.

Open Voices in the dashboard to browse, preview audio, and bookmark. Each voice carries a stable composite ID (provider:voice_id) that the agent’s voice stack references.
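As an illustration of the composite ID format, here is a minimal sketch of splitting and rebuilding a provider:voice_id string. The names VoiceRef, parse_voice_ref, and format_voice_ref are hypothetical helpers, not part of Hyponema, and splitting on the first colon is an assumption about how IDs containing colons would be handled.

```python
from typing import NamedTuple


class VoiceRef(NamedTuple):
    """Hypothetical holder for the two halves of a composite voice ID."""
    provider: str
    voice_id: str


def parse_voice_ref(composite: str) -> VoiceRef:
    # Split on the first colon only, in case a provider's voice ID
    # itself contains a colon (an assumption, not documented behavior).
    provider, sep, voice_id = composite.partition(":")
    if not sep or not provider or not voice_id:
        raise ValueError(f"expected 'provider:voice_id', got {composite!r}")
    return VoiceRef(provider, voice_id)


def format_voice_ref(ref: VoiceRef) -> str:
    # Inverse of parse_voice_ref.
    return f"{ref.provider}:{ref.voice_id}"
```

For example, parse_voice_ref("cartesia:my-voice") yields a VoiceRef with provider "cartesia" and voice_id "my-voice".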

Voice stack on the agent

A voice stack on an agent declares the providers and voices used at runtime:

STT — provider, model, primary languages, ordered fallbacks.
LLM — provider, model, ordered fallbacks (and optional custom OpenAI-SSE-compatible endpoint).
TTS — provider, voice, ordered fallbacks. Each fallback carries its own voice ID — voices do not map across providers.
Soft timeout — static filler text or LLM-generated filler when the user goes silent past after_ms.
Pronunciation dictionary (optional) — per-tenant, applied at synthesis.

Configure the stack from the agent’s Voice tab.
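To make the shape of a stack concrete, the sketch below models the fields listed above as a plain Python dictionary. Every key name, the after_ms value, and the angle-bracket placeholders are assumptions for illustration; this is not Hyponema's actual schema. The helper tts_chain is also hypothetical and only checks the rule that each TTS fallback carries its own voice ID.

```python
# Hypothetical shape: key names and placeholder values are illustrative only.
voice_stack = {
    "stt": {
        "provider": "deepgram",
        "model": "<stt-model>",
        "languages": ["en"],
        "fallbacks": [{"provider": "openai", "model": "<stt-model>"}],
    },
    "llm": {
        "provider": "openai",
        "model": "<llm-model>",
        "fallbacks": [{"provider": "groq", "model": "<llm-model>"}],
    },
    "tts": {
        "provider": "cartesia",
        "voice_id": "<cartesia-voice-id>",
        "fallbacks": [
            # Each fallback names its own voice; voices do not map across providers.
            {"provider": "elevenlabs", "voice_id": "<elevenlabs-voice-id>"},
        ],
    },
    "soft_timeout": {"after_ms": 2500, "filler": {"mode": "static", "text": "One moment."}},
    "pronunciation_dictionary": None,  # optional
}


def tts_chain(stack: dict) -> list[str]:
    """Return the TTS entries as composite IDs, primary first, then fallbacks in order."""
    tts = stack["tts"]
    entries = [tts, *tts.get("fallbacks", [])]
    chain = []
    for index, entry in enumerate(entries):
        if not entry.get("provider") or not entry.get("voice_id"):
            raise ValueError(f"TTS entry {index} needs its own provider and voice_id")
        chain.append(f"{entry['provider']}:{entry['voice_id']}")
    return chain
```

Calling tts_chain(voice_stack) returns the primary and fallback voices in the order the runtime would try them.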

Voice profiles

A voice profile is a per-workspace preset of voice-runtime knobs that an agent can reference: noise suppression, VAD timing, idle-prompt behavior, speech timeouts, audio recording defaults. Use them when several agents should share the same audio behavior.

Create one through Voices → New voice profile, then select it on the agent.

For per-conversation listening behavior (interruption policy, first-message protection, VAD sensitivity), see Listening profiles — that’s a separate concept.

Voice cloning (instant voice cloning)

Hyponema can drive instant voice cloning against ElevenLabs and Cartesia from inside the dashboard. Open Voices → Clone, upload an audio sample (at most 12 MB and roughly 5 minutes long), pick the provider, and submit.

What happens behind the scenes:

Hyponema forwards the audio sample to the provider.
The provider creates the voice in your provider account (BYO keys).
Hyponema persists only the reference (provider, voice_id, display name).
The original sample is discarded after the clone succeeds.
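The steps above can be sketched roughly as follows. This is a simplified model, not Hyponema's implementation: ClonedVoiceRef, clone_voice, and the create_voice callback (which stands in for the provider's cloning call made with your own keys) are hypothetical names, and the exact byte and duration cutoffs are assumptions derived from the limits quoted above.

```python
from dataclasses import dataclass
from typing import Callable

MAX_SAMPLE_BYTES = 12 * 1000 * 1000  # "12 MB"; decimal megabytes is an assumption
MAX_SAMPLE_SECONDS = 5 * 60          # "~5 minutes"; the exact cutoff is an assumption
CLONE_PROVIDERS = {"elevenlabs", "cartesia"}


@dataclass(frozen=True)
class ClonedVoiceRef:
    """The only data kept after cloning: a reference, never the audio."""
    provider: str
    voice_id: str
    display_name: str


def clone_voice(
    sample: bytes,
    duration_s: float,
    provider: str,
    display_name: str,
    create_voice: Callable[[bytes, str], str],
) -> ClonedVoiceRef:
    if provider not in CLONE_PROVIDERS:
        raise ValueError(f"cloning is not supported for provider {provider!r}")
    if len(sample) > MAX_SAMPLE_BYTES:
        raise ValueError("sample exceeds the size limit")
    if duration_s > MAX_SAMPLE_SECONDS:
        raise ValueError("sample exceeds the duration limit")
    # Forward the sample; the provider creates the voice in its own account
    # and returns the new voice ID.
    voice_id = create_voice(sample, display_name)
    # Only the reference is returned; the sample bytes are not stored anywhere.
    return ClonedVoiceRef(provider, voice_id, display_name)
```

The returned reference is all that needs to be persisted, which mirrors the point that the original sample is not retained.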

The cloned voice shows up in the catalog like any other voice and can be selected on an agent’s voice stack. You can also register a manually created provider voice (one you cloned in the provider dashboard) by adding it as a manual voice asset.

Preview and rate limits

Voice previews go through POST /workspaces/{ws}/voice-profiles/preview and return MPEG audio. The dashboard uses the same endpoint. Previews are rate-limited per workspace to protect provider quotas.
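A preview request might be assembled as in the sketch below, using only the Python standard library. Only the path and the MPEG response type come from this guide. The base URL, the bearer-token header, and the JSON body fields (voice, text) are assumptions; check the API reference for the real request schema.

```python
import json
import urllib.request


def build_preview_request(
    base_url: str, workspace_id: str, token: str, voice: str, text: str
) -> urllib.request.Request:
    # Path as documented: POST /workspaces/{ws}/voice-profiles/preview
    url = f"{base_url.rstrip('/')}/workspaces/{workspace_id}/voice-profiles/preview"
    # Body field names are hypothetical.
    body = json.dumps({"voice": voice, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
    )
```

Passing the result to urllib.request.urlopen would send it, and the response body could be written to an .mp3 file. Because previews are rate-limited per workspace, a caller should expect some requests to be rejected and retry later rather than in a tight loop.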

Fallback behavior

When the primary TTS provider fails (timeout, 5xx, rate limit), the runtime advances to the next entry in the fallback chain. The same applies to STT and LLM. A fallback chain is the difference between a clean handoff in production and a dropped call.

Test cascading by configuring a primary you can disable temporarily and confirming the trace shows a clean handoff.
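The cascading logic can be pictured with the following minimal sketch. It is a simplified model under stated assumptions, not the runtime's code: ProviderFailure stands in for a timeout, 5xx, or rate-limit error, and run_with_fallback and AllProvidersFailed are hypothetical names.

```python
from typing import Callable, Sequence


class ProviderFailure(Exception):
    """Stands in for a timeout, 5xx response, or rate-limit error."""


class AllProvidersFailed(Exception):
    """Raised when every entry in the chain has failed."""


def run_with_fallback(
    chain: Sequence[tuple[str, Callable[[str], bytes]]], text: str
) -> tuple[str, bytes, list[tuple[str, str]]]:
    failed: list[tuple[str, str]] = []
    for name, synthesize in chain:
        try:
            # The first entry that succeeds wins; later entries are never called.
            return name, synthesize(text), failed
        except ProviderFailure as exc:
            # Record the failure and advance to the next entry in the chain.
            failed.append((name, str(exc)))
    raise AllProvidersFailed(failed)
```

The list of failed attempts returned alongside the audio is the kind of record you would look for in a trace when confirming that a handoff happened.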

Operational guidance

BYO provider keys keep cost transparent — provider spend goes straight to your account.
Use a stable fallback voice across agents so users never hear a wildly different tone after a primary outage.
Re-run preview after switching pronunciation dictionary versions.
For phone agents, validate the cloned voice on a real PSTN call before going live; PSTN audio compression hides nuances that browser playback exposes.