Streaming (WebSocket)
Transcribe audio in real time using the WebSocket streaming API.
The streaming API lets you send audio data in real time and receive transcription results as they are produced. This is ideal for live transcription, voice interfaces, and real-time captioning.
Connection Lifecycle
A streaming session follows this lifecycle:
- Connect — Open a WebSocket connection and authenticate with the Authorization: Bearer header
- Ready — Server sends a ready message confirming the session is active
- Stream — Send audio chunks and receive transcription results
- End — Send an END command to finalize and receive remaining results
- Close — Server closes the connection after final results are sent
Connecting
Connect to the WebSocket endpoint and authenticate using the Authorization: Bearer header.
```
wss://asr.lesan.ai/v1/ws/transcribe?language=am&format=pcm_s16le
```
Note: Replace wss://asr.lesan.ai with your own WebSocket server URL if you are not targeting the production endpoint. Most browsers cannot set custom headers (including Authorization) on WebSocket connections; for browser apps, use a server-side proxy that injects the header.
Query parameters:
- language (optional) — Language code: am, ti, so, or en
- format (optional) — Audio format. Default: pcm_s16le. See Audio Formats
- turn_detection (optional) — Set to server_vad to enable server-side turn detection
- vad_type (optional) — VAD algorithm: energy (default) or silero
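Assembled in code, the endpoint URL with its query parameters might be built like this (a minimal sketch using the base URL and parameter values shown above; the helper name is our own):

```python
from urllib.parse import urlencode

def build_ws_url(base="wss://asr.lesan.ai/v1/ws/transcribe", language="am",
                 format="pcm_s16le", turn_detection=None, vad_type=None):
    """Build the streaming endpoint URL, omitting unset optional parameters."""
    params = {"language": language, "format": format}
    if turn_detection:
        params["turn_detection"] = turn_detection
    if vad_type:
        params["vad_type"] = vad_type
    return f"{base}?{urlencode(params)}"
```

For example, `build_ws_url(turn_detection="server_vad", vad_type="silero")` adds the VAD parameters to the default language/format pair.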
Client Commands
Send these text commands to control the streaming session:
FORMAT
Change the audio format mid-session. Send as a text message before the next audio chunk:
```
FORMAT:webm_opus
```
TRANSCRIBE
Force transcription of buffered audio without ending the session. Useful for getting intermediate results:
```
TRANSCRIBE
```
END
Signal that audio streaming is complete. The server will process any remaining audio and send final results before closing:
```
END
```
CLEAR
Discard all buffered audio without transcribing. Useful for cancelling or resetting:
```
CLEAR
```
PING
Keep the connection alive. The server will respond with a pong message:
```
PING
```
Server Messages
The server sends JSON messages with a type field:
ready
Sent when the connection is established and the session is ready to receive audio:
```json
{
  "type": "ready",
  "session_id": "sess_abc123",
  "format": "pcm_s16le",
  "sample_rate": 16000
}
```
chunk_received
Acknowledgement that an audio chunk was received and buffered:
```json
{
  "type": "chunk_received",
  "bytes_received": 32000,
  "buffer_duration_ms": 2000
}
```
transcription
A transcription result, either partial or final:
```json
{
  "type": "transcription",
  "transcription": "ሰላም እንዴት ነህ",
  "language": "am",
  "is_final": false,
  "is_turn": false,
  "duration_seconds": 2.5,
  "processing_time_seconds": 0.3,
  "audio_size_bytes": 80000,
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
- is_final — false for partial results that may change, true for final results
- is_turn — true if the server considers this transcription a completed VAD turn
- duration_seconds — Amount of audio included in this transcription event
- processing_time_seconds — Server processing time for this event
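A message handler typically routes on the is_final flag. A minimal sketch, using only the fields shown above:

```python
import json

def handle_transcription(raw):
    """Parse a server message; return (label, text) for transcription events, else None."""
    msg = json.loads(raw)
    if msg.get("type") != "transcription":
        return None
    label = "final" if msg.get("is_final") else "partial"
    return (label, msg["transcription"])
```

Partial results may be revised, so a UI would overwrite the current partial line and only append text once a final result arrives.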
turn_detected
Sent when Voice Activity Detection (VAD) detects a speaker turn boundary:
```json
{
  "type": "turn_detected",
  "transcription": "how are you",
  "language": "en",
  "turn_start_ms": 0,
  "turn_end_ms": 1500,
  "duration_seconds": 1.5,
  "processing_time_seconds": 0.2,
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
pong
Response to a PING command:
```json
{ "type": "pong" }
```
error
An error occurred during the session:
```json
{
  "type": "error",
  "error": "Unsupported format: flac",
  "session_id": "550e8400-e29b-41d4-a716-446655440000"
}
```
Session Results & Audio Playback
When the server sends a ready message, it includes a job_id. After the session ends (via the END command), the server finalizes the job and persists both the audio recording and transcript to cloud storage. You can retrieve the completed results by polling the transcription endpoint:
```
GET /v1/transcriptions/{job_id}
```
Once the job status is completed, the response includes storage URLs:
```json
{
  "id": "txn_550e8400-e29b-41d4-a716-446655440000",
  "object": "transcription",
  "status": "completed",
  "language": "am",
  "text": "ሰላም እንዴት ነህ",
  "segments": [...],
  "duration_seconds": 10.5,
  "audio_url": "lesan://streaming/audio/550e8400.opus",
  "audio_download_url": "/v1/transcriptions/txn_550e8400-.../audio",
  "result_url": "https://storage.../streaming/550e8400.json",
  "url": "/v1/transcriptions/txn_550e8400-..."
}
```
- audio_url — Canonical lesan:// URI. A stable, storage-agnostic identifier for the recorded audio. Never expires. Used internally by the API and by native SDKs.
- audio_download_url — Stable API path for downloading the audio. Returns a 302 redirect to a fresh signed URL (1-hour TTL) on every request. Web clients should use this for playback.
- result_url — URL to the full transcript JSON artifact in cloud storage.
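Since the job finishes asynchronously after END, clients typically poll until the status flips to completed. A sketch of the polling loop, with the HTTP call injected as a callable so the retry logic stands on its own (fetch is assumed to perform the authenticated GET /v1/transcriptions/{job_id} request and return the parsed JSON):

```python
import time

def poll_transcription(fetch, interval=2.0, max_attempts=30):
    """Call fetch() until the job reports status "completed", or give up."""
    for _ in range(max_attempts):
        data = fetch()
        if data.get("status") == "completed":
            return data
        time.sleep(interval)
    raise TimeoutError("transcription job did not complete in time")
```

Keeping the transport out of the loop also makes backoff easy to adjust; a production client might grow the interval between attempts.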
Audio playback in web apps
The audio_download_url works directly as an audio source. The browser follows the 302 redirect transparently:
```html
<!-- Use audio_download_url directly -->
<audio src="/v1/transcriptions/txn_550e8400-.../audio" controls></audio>
```
```js
// Fetch the transcription, then play audio
const res = await fetch(`/v1/transcriptions/${jobId}`, {
  headers: { Authorization: `Bearer ${apiKey}` }
});
const data = await res.json();
// audio_download_url is stable — store it, use it anytime
audioElement.src = data.audio_download_url;
```
Unlike signed cloud storage URLs, which expire after 1 hour, audio_download_url never expires. Each request generates a fresh signed URL server-side, so clients can store and reuse it indefinitely.
Turn Detection (VAD)
You can enable server-side turn detection (VAD) using query parameters. This helps split speech into turns and can trigger turn-level events.
- turn_detection — Set to server_vad to enable server VAD
- vad_type — Choose the VAD algorithm: energy (default) or silero
Streaming Audio Formats
The streaming API supports a subset of audio formats optimized for low-latency transmission. See the Audio Formats reference for full details.
- pcm_s16le (default) — Raw PCM, 16-bit signed little-endian. Lowest latency.
- wav — WAV container. Send the header with the first chunk.
- webm_opus — WebM with Opus codec. Best for browser-based streaming.
- opus_raw_16k — Raw Opus frames at 16kHz. Low bandwidth.
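For pcm_s16le, chunk sizing is simple arithmetic: 16-bit (2-byte) mono samples at 16 kHz come to 32,000 bytes per second, so a 250 ms chunk is 8,000 bytes — the chunk size used in the examples below. A small helper, assuming 16 kHz mono (the sample rate reported in the ready message):

```python
def pcm_chunk_bytes(duration_ms, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Bytes of raw PCM audio covering the given duration."""
    return int(sample_rate * bytes_per_sample * channels * duration_ms / 1000)
```

So pcm_chunk_bytes(250) returns 8000, and the 100-250 ms chunks recommended under Troubleshooting correspond to 3,200-8,000 bytes each.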
Complete Examples
Node.js (JavaScript)
```js
// Node.js example (Authorization header). For browsers, use a server-side proxy.
// npm i ws
import WebSocket from "ws";
import fs from "fs";

const WS_URL = "wss://asr.lesan.ai/v1/ws/transcribe?language=am&format=pcm_s16le";

const ws = new WebSocket(WS_URL, {
  headers: {
    Authorization: "Bearer YOUR_API_KEY"
  }
});

ws.on("open", () => {
  console.log("Connected");
  // Send PCM audio bytes in chunks
  const chunkSize = 8000;
  const audio = fs.readFileSync("recording.pcm");
  for (let i = 0; i < audio.length; i += chunkSize) {
    ws.send(audio.subarray(i, i + chunkSize));
  }
  ws.send("END");
});

ws.on("message", (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.type === "ready") {
    console.log("Session:", msg.session_id);
  }
  if (msg.type === "transcription") {
    const text = msg.text || msg.transcription;
    if (msg.is_final) console.log("Final:", text);
    else console.log("Partial:", text);
  }
  if (msg.type === "error") {
    console.error("Error:", msg.message || msg.error);
  }
});

ws.on("close", (code, reason) => {
  console.log("Closed:", code, reason.toString());
});
```
Python
```python
import asyncio
import websockets
import json

async def stream_audio(file_path, language="am"):
    url = "wss://asr.lesan.ai/v1/ws/transcribe"
    params = f"?language={language}&format=pcm_s16le"
    async with websockets.connect(
        url + params,
        extra_headers={"Authorization": "Bearer YOUR_API_KEY"}
    ) as ws:
        # Wait for ready message
        ready = json.loads(await ws.recv())
        assert ready["type"] == "ready"
        print(f"Session: {ready['session_id']}")

        # Send audio in chunks
        chunk_size = 8000  # 250ms of 16kHz 16-bit audio
        with open(file_path, "rb") as f:
            while chunk := f.read(chunk_size):
                await ws.send(chunk)
                # Check for messages (non-blocking)
                try:
                    msg = json.loads(await asyncio.wait_for(ws.recv(), timeout=0.01))
                    if msg["type"] == "transcription" and msg["is_final"]:
                        text = msg.get("transcription") or msg.get("text")
                        print(f"Transcription: {text}")
                except asyncio.TimeoutError:
                    pass

        # Signal end of audio
        await ws.send("END")

        # Collect remaining results
        async for message in ws:
            msg = json.loads(message)
            if msg["type"] == "transcription" and msg["is_final"]:
                text = msg.get("transcription") or msg.get("text")
                print(f"Final: {text}")

asyncio.run(stream_audio("recording.pcm"))
```
Connection Limits
- Max concurrent connections — 5 per API key
- Max session duration — 30 minutes
- Idle timeout — 300 seconds (5 minutes) without audio data or PING
- Max audio data rate — 1 MB/s
See the Rate Limits guide for full quota details.
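To stay under the idle timeout during long stretches of silence, a client can send PING on a timer while the main loop is quiet. A minimal sketch using asyncio, compatible with the websockets-based Python example above (the interval is our choice, well inside the 300-second limit):

```python
import asyncio

async def keepalive(ws, interval=60.0):
    """Send a PING command periodically so the idle timeout never fires."""
    while True:
        await asyncio.sleep(interval)
        await ws.send("PING")
```

Run it alongside the streaming loop with asyncio.create_task(keepalive(ws)) and cancel the task when the session ends; the server answers each PING with a pong message.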
Close Codes
The server uses these WebSocket close codes:
- 1000 — Normal closure after END command
- 1008 — Policy violation (invalid API key, insufficient permissions)
- 1011 — Internal server error
- 4000 — Invalid request (bad query parameters)
- 4001 — Authentication failed
- 4008 — Idle timeout (no data or PING for 300 seconds)
- 4029 — Rate limit exceeded (too many concurrent connections)
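A client's close handler might use these codes to decide whether reconnecting is worthwhile. A sketch; note that which codes count as retryable is a client-side policy assumption, not something the API specifies:

```python
# Assumption: transient failures are worth retrying, auth/request errors are not.
RETRYABLE_CLOSE_CODES = {1011, 4008, 4029}  # server error, idle timeout, rate limited

def should_reconnect(close_code):
    """True if a reconnect attempt (ideally after a backoff) is likely to succeed."""
    return close_code in RETRYABLE_CLOSE_CODES
```

Codes like 4001 (authentication failed) or 4000 (bad parameters) will fail the same way on every attempt, so retrying them only burns quota.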
Troubleshooting
- No transcription results — Check that the audio format matches the format parameter. Mismatched formats produce silence.
- Connection closes immediately — Verify your API key has the write scope. Check the close code.
- High latency — Use pcm_s16le format for lowest latency. Send smaller, more frequent chunks (100-250ms).
- Idle timeout — Send PING commands during silence to keep the connection alive.
- Garbled results — Ensure your audio bytes match the declared format (and its expected sample rate). Mismatches can cause pitch/speed issues.
See the Audio Formats reference for format details, or the Error Codes reference for error handling.