Voice WebSocket

Voice sessions use a WebSocket URL returned by POST /sessions. The protocol is based on Pipecat RTVI events.

Text-only sessions use a different transport, HTTP Server-Sent Events, with a simplified event shape; see the Chat API reference for that surface. Event names and payloads do not match between the two transports.

| URL | Modality |
| --- | --- |
| wss://api.hyponema.ai/v1/sessions/{session_id}/ws?token=... | Voice or voice+text |
| wss://api.hyponema.ai/v1/sessions/{session_id}/chat?token=... | Text-only WebSocket |

The session_id in the path must match the session returned by POST /sessions.
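
As a minimal sketch of that check, a backend could parse the signed URL and compare the session ID in its path with the one it received from POST /sessions. The helper names `session_id_from_ws_url` and `url_matches_session` and the sample values are hypothetical; only the path layout is taken from the URL patterns above.

```python
from urllib.parse import parse_qs, urlsplit


def session_id_from_ws_url(url: str) -> str:
    """Extract {session_id} from .../v1/sessions/{session_id}/ws or /chat."""
    parts = urlsplit(url)
    if parts.scheme != "wss":
        raise ValueError("expected a wss:// URL")

    # Expected layout: v1 / sessions / {session_id} / ws|chat
    segments = parts.path.strip("/").split("/")
    if (
        len(segments) != 4
        or segments[:2] != ["v1", "sessions"]
        or segments[3] not in ("ws", "chat")
    ):
        raise ValueError(f"unexpected path: {parts.path}")

    if "token" not in parse_qs(parts.query):
        raise ValueError("missing token query parameter")
    return segments[2]


def url_matches_session(url: str, expected_session_id: str) -> bool:
    """True when the URL path carries the session ID the backend expects."""
    return session_id_from_ws_url(url) == expected_session_id
```

Running this before handing the URL to a client surfaces a mismatch early, rather than as a 4001 close after the socket opens.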

  1. Create a session from your backend.
  2. Pass the signed URL to the browser or client app.
  3. Open the WebSocket.
  4. Send {"type":"client-ready","data":{}}.
  5. Stream audio or text events.
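
Steps 4 and 5 can be sketched as follows. The `connection` argument stands in for whatever WebSocket client you use; the sketch assumes only that it exposes an async `send(str)` method. `RecordingConnection` is a hypothetical test double, not part of any SDK.

```python
import asyncio
import json
from typing import Iterable, List

# Step 4 payload, exactly as documented above.
CLIENT_READY = {"type": "client-ready", "data": {}}


class RecordingConnection:
    """Hypothetical stand-in for an open WebSocket; records outgoing frames."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    async def send(self, frame: str) -> None:
        self.sent.append(frame)


async def run_session(connection, events: Iterable[dict]) -> None:
    """Send client-ready first, then stream the prepared events in order."""
    await connection.send(json.dumps(CLIENT_READY))
    for event in events:
        await connection.send(json.dumps(event))


if __name__ == "__main__":
    conn = RecordingConnection()
    asyncio.run(run_session(conn, [{"type": "client-audio-data", "data": {"audio": "..."}}]))
    print(conn.sent)
```

This page does not say whether a client should wait for bot-ready before streaming; the sketch does not wait, so add that gate if your integration needs it.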

Use PCM 16-bit signed little-endian mono audio at 16 kHz. Send audio chunks as base64-encoded bytes inside client-audio-data events.

    {
      "type": "client-audio-data",
      "data": {
        "audio": "..."
      }
    }
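
To illustrate the encoding, the sketch below splits raw PCM bytes into fixed-duration chunks and wraps each one in a client-audio-data event. The 20 ms chunk size and the function name `audio_events` are assumptions made for the example; this page does not prescribe a chunk duration.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # 16 kHz, as required above
BYTES_PER_SAMPLE = 2     # 16-bit signed little-endian, mono


def audio_events(pcm: bytes, chunk_ms: int = 20) -> Iterator[str]:
    """Yield JSON client-audio-data frames for consecutive PCM chunks.

    chunk_ms is an assumed value; pick whatever suits your capture pipeline.
    """
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "type": "client-audio-data",
            "data": {"audio": base64.b64encode(chunk).decode("ascii")},
        })
```

At 16 kHz and 2 bytes per sample, a 20 ms chunk is 640 bytes, so one second of audio produces 50 frames.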

| Event | Meaning |
| --- | --- |
| bot-ready | Pipeline initialized. |
| bot-llm-text | Assistant text chunk. |
| bot-tts-audio | TTS audio chunk. |
| user-transcription | STT transcript update. |
| function-call | LLM invoked a tool. |
| function-call-result | Tool result returned. |
| metrics-data | Provider and runtime metrics. |
| error | Runtime error. |
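
One way to consume these events is a small dispatcher keyed on the event name. The sketch assumes incoming frames are JSON objects using the same `type` and `data` envelope as the client events shown earlier. It treats `data` as opaque, because the payload fields of each event are not listed here. `dispatch` is a hypothetical helper name.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Any], None]


def dispatch(frame: str, handlers: Dict[str, Handler]) -> bool:
    """Route one incoming frame by its "type"; return False if unhandled."""
    message = json.loads(frame)
    handler = handlers.get(message.get("type"))
    if handler is None:
        return False  # Unknown or uninteresting event; the caller may log it.
    # Payload shape is event-specific and not assumed here.
    handler(message.get("data"))
    return True
```

Returning a boolean instead of raising keeps the receive loop running when an event arrives that the client has no handler for.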

| Close code | Meaning |
| --- | --- |
| 4001 | Invalid token, expired token, or session mismatch. |
| 4003 | Origin denied. |
| 4404 | Agent, version, persona, user, or session not found. |
| 4500 | Voice runtime disabled. |
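
A client can turn these codes into readable diagnostics with a simple lookup. This is a minimal sketch: the names `CLOSE_CODES`, `describe_close`, and `is_documented_rejection` are hypothetical, and the fallback text for unlisted codes is an assumption.

```python
CLOSE_CODES = {
    4001: "Invalid token, expired token, or session mismatch.",
    4003: "Origin denied.",
    4404: "Agent, version, persona, user, or session not found.",
    4500: "Voice runtime disabled.",
}


def describe_close(code: int) -> str:
    """Return the documented meaning, or a generic note for unlisted codes."""
    return CLOSE_CODES.get(code, f"Unlisted close code {code}.")


def is_documented_rejection(code: int) -> bool:
    """True when the server closed with one of the codes listed above."""
    return code in CLOSE_CODES
```

It may be reasonable to treat the listed codes as a signal to request a new session or fix configuration rather than retry the same URL, but that policy is an assumption and not something this page states.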

If the WebSocket drops before a graceful close, reconnect with the same session ID and token within the configured resume window. Hyponema can restore the parked LLM context for that session.
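
The reconnect behavior can be sketched as a retry loop bounded by the resume window. Everything specific here is an assumption: `connect` is a caller-supplied coroutine that reopens the same signed URL, `resume_window_s` must be set to your configured window (no default is given on this page), and catching `OSError` stands in for whatever exception your WebSocket client raises when a connection attempt fails.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def reconnect_within_window(
    connect: Callable[[], Awaitable[T]],
    resume_window_s: float,
    retry_delay_s: float = 0.5,
) -> Optional[T]:
    """Retry connect() until it succeeds or the resume window has elapsed.

    Returns the new connection, or None if the window expired first.
    """
    deadline = time.monotonic() + resume_window_s
    while time.monotonic() < deadline:
        try:
            return await connect()
        except OSError:
            # Assumed failure type; adapt to your client library.
            await asyncio.sleep(retry_delay_s)
    return None
```

When the function returns None, the window has passed and the client would need a fresh session from POST /sessions.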