WebSocket Protocol
The SFVoPI WebSocket protocol defines 8 event types for bidirectional audio streaming: 5 incoming events from Superfone and 3 outgoing commands from your server.
Protocol Overview
| Direction | Event Type | Purpose |
|---|---|---|
| Incoming (Superfone → Your Server) | start | Stream established, ready for media |
| Incoming | media | Audio chunk from caller |
| Incoming | dtmf | DTMF key pressed by caller |
| Incoming | clearedAudio | Acknowledgment of clearAudio command |
| Incoming | playedStream | Acknowledgment of checkpoint command |
| Outgoing (Your Server → Superfone) | playAudio | Send audio to caller |
| Outgoing | clearAudio | Clear audio playback buffer |
| Outgoing | checkpoint | Mark a named checkpoint in stream |
All messages are JSON-encoded strings sent over the WebSocket connection.
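Since every message carries an event field as a discriminator, a typical handler parses each frame and branches on it. A minimal TypeScript dispatch sketch (handleEvent is a placeholder for your own per-event handlers):
// Parse a WebSocket frame and branch on the "event" discriminator
function onMessage(data: Buffer): void {
  const message = JSON.parse(data.toString());
  switch (message.event) {
    case 'start':
    case 'media':
    case 'dtmf':
    case 'clearedAudio':
    case 'playedStream':
      handleEvent(message); // placeholder for your per-event handlers
      break;
    default:
      console.warn(`Unknown event type: ${message.event}`);
  }
}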
Incoming Events (Superfone → Your Server)
1. start Event
Sent when the WebSocket connection is established and the audio stream is ready. This is the first event you'll receive.
When: Immediately after the WebSocket connection is established (or after a reconnection).
Fields:
| Field | Type | Description |
|---|---|---|
| event | string | Always "start" |
| streamId | string | Unique identifier for this audio stream (ULID format) |
| callId | string | UUID of the call |
Example:
{
"event": "start",
"streamId": "01JJXYZ123ABC456DEF789GHI",
"callId": "550e8400-e29b-41d4-a716-446655440000"
}
What to do:
- Store streamId and callId for this connection
- Initialize your audio processor (transcription, AI, recording)
- Start any timers or background tasks
- If this is a reconnection, resume processing from where you left off
TypeScript Interface:
interface StartEvent {
event: "start";
streamId: string;
callId: string;
}
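A minimal handler sketch for this event; the sessions map and initAudioProcessor helper are hypothetical stand-ins for your own session store and pipeline setup:
const sessions = new Map<string, { callId: string; startedAt: number }>();

function handleStartEvent(event: StartEvent): void {
  // If this streamId is already known, treat it as a reconnection
  // (an assumption; resume processing rather than re-initializing)
  if (sessions.has(event.streamId)) {
    return;
  }
  // Store streamId/callId to correlate later events and commands
  sessions.set(event.streamId, { callId: event.callId, startedAt: Date.now() });
  initAudioProcessor(event.streamId); // hypothetical helper
}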
2. media Event
Sent continuously during the call, containing audio chunks from the caller.
When: Typically every 20ms for the duration of the call, whether the caller is speaking or silent.
Fields:
| Field | Type | Description |
|---|---|---|
| event | string | Always "media" |
| streamId | string | Stream identifier from start event |
| media.payload | string | Base64-encoded PCM audio data |
| media.contentType | string | Audio codec: "audio/PCMU" or "audio/PCMA" |
| media.sampleRate | number | Sample rate in Hz: 8000 or 16000 |
Example:
{
"event": "media",
"streamId": "01JJXYZ123ABC456DEF789GHI",
"media": {
"payload": "//7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+...",
"contentType": "audio/PCMU",
"sampleRate": 8000
}
}
What to do:
- Decode base64 payload to get raw PCM audio bytes
- Process audio (transcribe, analyze, record)
- Optionally send audio back using the playAudio command
Decoding Audio:
- JavaScript
- TypeScript
- Python
const fs = require('fs');
// Decode base64 to Buffer
const audioBuffer = Buffer.from(mediaEvent.media.payload, 'base64');
// audioBuffer now holds the raw audio bytes in the codec given by media.contentType
// At 8000 Hz: 160 samples = 20ms of audio
// At 16000 Hz: 320 samples = 20ms of audio
// Example: Save to file
fs.appendFileSync('recording.pcm', audioBuffer);
// Example: Send to transcription service
await transcriptionService.processAudio(audioBuffer);
interface MediaEvent {
event: "media";
streamId: string;
media: {
payload: string;
contentType: "audio/PCMU" | "audio/PCMA";
sampleRate: 8000 | 16000;
};
}
function handleMediaEvent(event: MediaEvent): void {
// Decode base64 to Buffer
const audioBuffer = Buffer.from(event.media.payload, 'base64');
// Process audio
processAudio(audioBuffer, event.media.sampleRate);
}
import base64

def handle_media_event(event):
    # Decode base64 to bytes
    audio_bytes = base64.b64decode(event['media']['payload'])
    # audio_bytes now holds the raw audio bytes in the codec given by media.contentType
    # Process audio
    process_audio(audio_bytes, event['media']['sampleRate'])
TypeScript Interface:
interface MediaEvent {
event: "media";
streamId: string;
media: {
payload: string;
contentType: "audio/PCMU" | "audio/PCMA";
sampleRate: 8000 | 16000;
};
}
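Many speech and transcription APIs expect 16-bit linear PCM rather than G.711. If the decoded payload is µ-law (audio/PCMU), the standard G.711 expansion below converts it; this is generic audio code, not part of the SFVoPI API:
// Expand one G.711 mu-law byte into a 16-bit linear PCM sample
function muLawToLinear(muLawByte: number): number {
  const BIAS = 0x84;
  const u = ~muLawByte & 0xff; // mu-law bytes are stored complemented
  let sample = (((u & 0x0f) << 3) + BIAS) << ((u & 0x70) >> 4);
  sample -= BIAS;
  return u & 0x80 ? -sample : sample; // sign bit selects polarity
}

function muLawBufferToPcm(muLaw: Buffer): Int16Array {
  const pcm = new Int16Array(muLaw.length); // one output sample per input byte
  for (let i = 0; i < muLaw.length; i++) {
    pcm[i] = muLawToLinear(muLaw[i]);
  }
  return pcm;
}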
3. dtmf Event
Sent when the caller presses a key on their phone's keypad.
When: Immediately after a DTMF tone is detected by Superfone.
Fields:
| Field | Type | Description |
|---|---|---|
| event | string | Always "dtmf" |
| streamId | string | Stream identifier from start event |
| digit | string | Key pressed: "0" to "9", "*", "#", "A" to "D" |
Example:
{
"event": "dtmf",
"streamId": "01JJXYZ123ABC456DEF789GHI",
"digit": "5"
}
What to do:
- Handle IVR menu navigation ("Press 1 for sales, 2 for support")
- Collect input (account number, PIN, confirmation)
- Trigger actions (skip, replay, transfer)
Example Handler:
- JavaScript
- TypeScript
- Python
function handleDtmfEvent(event) {
console.log(`Caller pressed: ${event.digit}`);
switch (event.digit) {
case '1':
// Transfer to sales
playAudio(salesGreeting);
break;
case '2':
// Transfer to support
playAudio(supportGreeting);
break;
case '*':
// Go back to main menu
playAudio(mainMenu);
break;
case '#':
// Confirm input
processInput();
break;
default:
// Invalid input
playAudio(invalidInputMessage);
}
}
interface DtmfEvent {
event: "dtmf";
streamId: string;
digit: string;
}
function handleDtmfEvent(event: DtmfEvent): void {
console.log(`Caller pressed: ${event.digit}`);
// IVR menu logic
if (event.digit >= '1' && event.digit <= '9') {
handleMenuOption(parseInt(event.digit));
} else if (event.digit === '*') {
goBackToMainMenu();
} else if (event.digit === '#') {
confirmInput();
}
}
def handle_dtmf_event(event):
    digit = event['digit']
    print(f"Caller pressed: {digit}")
    if digit == '1':
        # Transfer to sales
        play_audio(sales_greeting)
    elif digit == '2':
        # Transfer to support
        play_audio(support_greeting)
    elif digit == '*':
        # Go back to main menu
        play_audio(main_menu)
    elif digit == '#':
        # Confirm input
        process_input()
TypeScript Interface:
interface DtmfEvent {
event: "dtmf";
streamId: string;
digit: string;
}
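For multi-digit input such as an account number or PIN, a common pattern is to buffer digits until a terminator key. A minimal sketch, assuming # submits and * clears (processInput is a placeholder); a production version would also add an inter-digit timeout:
let digitBuffer = '';

function collectDigit(event: DtmfEvent): void {
  if (event.digit === '#') {
    // '#' submits the collected input
    const input = digitBuffer;
    digitBuffer = '';
    processInput(input); // placeholder for your own logic
  } else if (event.digit === '*') {
    // '*' discards the buffer and starts over
    digitBuffer = '';
  } else {
    digitBuffer += event.digit;
  }
}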
4. clearedAudio Event
Acknowledgment that the audio playback buffer has been cleared in response to your clearAudio command.
When: Immediately after Superfone processes your clearAudio command.
Fields:
| Field | Type | Description |
|---|---|---|
| event | string | Always "clearedAudio" |
| streamId | string | Stream identifier from start event |
| sequenceNumber | number | Sequence number from your clearAudio command |
Example:
{
"event": "clearedAudio",
"streamId": "01JJXYZ123ABC456DEF789GHI",
"sequenceNumber": 42
}
What to do:
- Confirm that audio buffer was cleared
- Resume sending new audio (if needed)
- Update UI or logs
Use Case: You sent a long audio message, but the caller interrupted. You send clearAudio to stop playback, then send new audio based on the interruption.
TypeScript Interface:
interface ClearedAudioEvent {
event: "clearedAudio";
streamId: string;
sequenceNumber: number;
}
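One way to coordinate with this acknowledgment is to resolve a pending promise keyed by sequenceNumber, so interruption logic can await the clear before sending replacement audio. A minimal sketch:
const pendingClears = new Map<number, () => void>();

// Returns a promise that resolves when the clearedAudio ack for this
// sequence number arrives
function waitForClear(sequenceNumber: number): Promise<void> {
  return new Promise((resolve) => pendingClears.set(sequenceNumber, resolve));
}

function handleClearedAudioEvent(event: ClearedAudioEvent): void {
  pendingClears.get(event.sequenceNumber)?.();
  pendingClears.delete(event.sequenceNumber);
}
After sending clearAudio with sequence n, await waitForClear(n) and then send the new playAudio commands.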
5. playedStream Event
Acknowledgment that a checkpoint you marked with the checkpoint command has been reached (all audio before the checkpoint has finished playing).
When: After all audio sent before the checkpoint command has been played to the caller.
Fields:
| Field | Type | Description |
|---|---|---|
| event | string | Always "playedStream" |
| streamId | string | Stream identifier from start event |
| name | string | Checkpoint name from your checkpoint command |
Example:
{
"event": "playedStream",
"streamId": "01JJXYZ123ABC456DEF789GHI",
"name": "greeting_complete"
}
What to do:
- Trigger next action (e.g., start listening for response)
- Update state machine (e.g., move from "greeting" to "listening")
- Log timing metrics
Use Case: You send a greeting message, then send a checkpoint named "greeting_complete". When you receive playedStream with name: "greeting_complete", you know the greeting finished playing and can now listen for the caller's response.
TypeScript Interface:
interface PlayedStreamEvent {
event: "playedStream";
streamId: string;
name: string;
}
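Checkpoint names map naturally onto state-machine transitions. A minimal sketch, assuming hypothetical greeting_complete and response_complete checkpoints and a startListening helper:
type CallState = 'greeting' | 'listening' | 'responding';
let state: CallState = 'greeting';

function handlePlayedStreamEvent(event: PlayedStreamEvent): void {
  // Transition when the audio for the current state finishes playing
  if (event.name === 'greeting_complete' && state === 'greeting') {
    state = 'listening';
    startListening(); // hypothetical helper
  } else if (event.name === 'response_complete' && state === 'responding') {
    state = 'listening';
  }
}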
Outgoing Commands (Your Server → Superfone)
6. playAudio Command
Send audio to the caller. Audio is queued and played in the order received.
When: Whenever you want to play audio to the caller (TTS, pre-recorded message, AI-generated speech).
Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| event | string | Yes | Always "playAudio" |
| media.payload | string | Yes | Base64-encoded PCM audio data |
| media.contentType | string | Yes | Audio codec: "audio/PCMU" or "audio/PCMA" |
| media.sampleRate | number | Yes | Sample rate in Hz: 8000 or 16000 |
Example:
{
"event": "playAudio",
"media": {
"payload": "//7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+...",
"contentType": "audio/PCMU",
"sampleRate": 8000
}
}
Encoding Audio:
- JavaScript
- TypeScript
- Python
// Encode PCM audio to base64
const audioBuffer = fs.readFileSync('greeting.pcm');
const base64Audio = audioBuffer.toString('base64');
// Send playAudio command
const command = {
event: 'playAudio',
media: {
payload: base64Audio,
contentType: 'audio/PCMU',
sampleRate: 8000,
},
};
ws.send(JSON.stringify(command));
interface PlayAudioCommand {
event: "playAudio";
media: {
payload: string;
contentType: "audio/PCMU" | "audio/PCMA";
sampleRate: 8000 | 16000;
};
}
function sendAudioToCaller(audioBuffer: Buffer, ws: WebSocket): void {
const command: PlayAudioCommand = {
event: 'playAudio',
media: {
payload: audioBuffer.toString('base64'),
contentType: 'audio/PCMU',
sampleRate: 8000,
},
};
ws.send(JSON.stringify(command));
}
import base64
import json

def send_audio_to_caller(audio_bytes, ws):
    command = {
        'event': 'playAudio',
        'media': {
            'payload': base64.b64encode(audio_bytes).decode('utf-8'),
            'contentType': 'audio/PCMU',
            'sampleRate': 8000,
        },
    }
    ws.send(json.dumps(command))
Notes:
- Audio is queued and played in FIFO order
- Use the same codec and sample rate as specified in your Stream JSON response
- Audio chunks are typically 20ms (160 samples at 8000 Hz, 320 samples at 16000 Hz)
- You can send multiple playAudio commands in rapid succession (they will be queued); see the chunking sketch below
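Since Superfone queues audio, a long buffer can simply be split into 20ms frames and sent back-to-back. A minimal sketch, assuming one byte per sample (G.711 PCMU at 8000 Hz, so 160 bytes per frame); adjust the frame size for other formats:
import { WebSocket } from 'ws';

const FRAME_BYTES = 160; // 20ms at 8000 Hz, assuming one byte per sample

function playBuffer(ws: WebSocket, audio: Buffer): void {
  for (let offset = 0; offset < audio.length; offset += FRAME_BYTES) {
    const frame = audio.subarray(offset, offset + FRAME_BYTES);
    ws.send(JSON.stringify({
      event: 'playAudio',
      media: {
        payload: frame.toString('base64'),
        contentType: 'audio/PCMU',
        sampleRate: 8000,
      },
    }));
  }
}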
TypeScript Interface:
interface PlayAudioCommand {
event: "playAudio";
media: {
payload: string;
contentType: "audio/PCMU" | "audio/PCMA";
sampleRate: 8000 | 16000;
};
}
7. clearAudio Command
Clear the audio playback buffer. All queued audio is discarded immediately.
When: When you need to interrupt playback (e.g., caller interrupted, error occurred, new context).
Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| event | string | Yes | Always "clearAudio" |
| sequenceNumber | number | Yes | Unique sequence number for tracking acknowledgment |
Example:
{
"event": "clearAudio",
"sequenceNumber": 42
}
What happens:
- Superfone immediately stops playing audio to the caller
- All queued playAudio commands are discarded
- Superfone sends a clearedAudio event with the same sequenceNumber
Example:
- JavaScript
- TypeScript
- Python
let clearSequence = 0;
function clearAudioBuffer(ws) {
clearSequence++;
const command = {
event: 'clearAudio',
sequenceNumber: clearSequence,
};
ws.send(JSON.stringify(command));
console.log(`Sent clearAudio with sequence ${clearSequence}`);
}
// Handle acknowledgment
function handleClearedAudioEvent(event) {
console.log(`Audio cleared, sequence ${event.sequenceNumber}`);
// Now safe to send new audio
}
interface ClearAudioCommand {
event: "clearAudio";
sequenceNumber: number;
}
let clearSequence = 0;
function clearAudioBuffer(ws: WebSocket): void {
clearSequence++;
const command: ClearAudioCommand = {
event: 'clearAudio',
sequenceNumber: clearSequence,
};
ws.send(JSON.stringify(command));
}
import json

clear_sequence = 0

def clear_audio_buffer(ws):
    global clear_sequence
    clear_sequence += 1
    command = {
        'event': 'clearAudio',
        'sequenceNumber': clear_sequence,
    }
    ws.send(json.dumps(command))
    print(f"Sent clearAudio with sequence {clear_sequence}")
Use Case: Voice AI bot is speaking a long response, but the caller interrupts. You send clearAudio to stop playback, then send new audio based on the interruption.
TypeScript Interface:
interface ClearAudioCommand {
event: "clearAudio";
sequenceNumber: number;
}
8. checkpoint Command
Mark a named checkpoint in the audio stream. When all audio sent before this checkpoint has finished playing, Superfone sends a playedStream event.
When: When you need to know when specific audio has finished playing (e.g., greeting complete, question asked).
Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| event | string | Yes | Always "checkpoint" |
| streamId | string | Yes | Stream identifier from start event |
| name | string | Yes | Unique name for this checkpoint |
Example:
{
"event": "checkpoint",
"streamId": "01JJXYZ123ABC456DEF789GHI",
"name": "greeting_complete"
}
What happens:
- Superfone marks the current position in the audio queue
- When all audio before this checkpoint finishes playing, Superfone sends a playedStream event with the same name
Example:
- JavaScript
- TypeScript
- Python
// Send greeting audio
sendAudioToCaller(greetingAudio, ws);
// Mark checkpoint
const checkpoint = {
event: 'checkpoint',
streamId: currentStreamId,
name: 'greeting_complete',
};
ws.send(JSON.stringify(checkpoint));
// Later, when you receive playedStream event:
function handlePlayedStreamEvent(event) {
if (event.name === 'greeting_complete') {
console.log('Greeting finished playing, now listening for response');
startListening();
}
}
interface CheckpointCommand {
event: "checkpoint";
streamId: string;
name: string;
}
function sendGreetingWithCheckpoint(ws: WebSocket, streamId: string): void {
// Send greeting audio
sendAudioToCaller(greetingAudio, ws);
// Mark checkpoint
const checkpoint: CheckpointCommand = {
event: 'checkpoint',
streamId,
name: 'greeting_complete',
};
ws.send(JSON.stringify(checkpoint));
}
function handlePlayedStreamEvent(event: PlayedStreamEvent): void {
if (event.name === 'greeting_complete') {
startListening();
}
}
import json

def send_greeting_with_checkpoint(ws, stream_id):
    # Send greeting audio
    send_audio_to_caller(greeting_audio, ws)
    # Mark checkpoint
    checkpoint = {
        'event': 'checkpoint',
        'streamId': stream_id,
        'name': 'greeting_complete',
    }
    ws.send(json.dumps(checkpoint))

def handle_played_stream_event(event):
    if event['name'] == 'greeting_complete':
        print('Greeting finished playing, now listening for response')
        start_listening()
Use Cases:
- Turn-taking: Know when to stop speaking and start listening
- State machine: Transition between states (greeting → listening → responding)
- Timing metrics: Measure how long audio takes to play
- Synchronization: Coordinate audio playback with other actions
TypeScript Interface:
interface CheckpointCommand {
event: "checkpoint";
streamId: string;
name: string;
}
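For the timing-metrics use case, record when each checkpoint command is sent and compute the delta when its playedStream arrives. A minimal sketch:
const checkpointSentAt = new Map<string, number>();

function markCheckpointTimed(ws: WebSocket, streamId: string, name: string): void {
  checkpointSentAt.set(name, Date.now());
  ws.send(JSON.stringify({ event: 'checkpoint', streamId, name }));
}

function recordCheckpointLatency(event: PlayedStreamEvent): void {
  const sentAt = checkpointSentAt.get(event.name);
  if (sentAt !== undefined) {
    console.log(`${event.name} played after ${Date.now() - sentAt}ms`);
    checkpointSentAt.delete(event.name);
  }
}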
Complete Example
Here's a complete WebSocket server that handles all five incoming events and provides helpers for the three outgoing commands:
- JavaScript
- TypeScript
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 3000 });
wss.on('connection', (ws) => {
let streamId = null;
let callId = null;
let clearSequence = 0;
ws.on('message', (data) => {
const event = JSON.parse(data.toString());
switch (event.event) {
case 'start':
streamId = event.streamId;
callId = event.callId;
console.log(`Stream started: ${streamId}`);
break;
case 'media':
// Decode and process audio
const audioBuffer = Buffer.from(event.media.payload, 'base64');
processAudio(audioBuffer);
break;
case 'dtmf':
console.log(`DTMF pressed: ${event.digit}`);
handleDtmfInput(event.digit);
break;
case 'clearedAudio':
console.log(`Audio cleared: sequence ${event.sequenceNumber}`);
break;
case 'playedStream':
console.log(`Checkpoint reached: ${event.name}`);
handleCheckpoint(event.name);
break;
}
});
// Send audio to caller
function sendAudio(audioBuffer) {
const command = {
event: 'playAudio',
media: {
payload: audioBuffer.toString('base64'),
contentType: 'audio/PCMU',
sampleRate: 8000,
},
};
ws.send(JSON.stringify(command));
}
// Clear audio buffer
function clearAudio() {
clearSequence++;
const command = {
event: 'clearAudio',
sequenceNumber: clearSequence,
};
ws.send(JSON.stringify(command));
}
// Mark checkpoint
function markCheckpoint(name) {
const command = {
event: 'checkpoint',
streamId,
name,
};
ws.send(JSON.stringify(command));
}
});
import { WebSocketServer, WebSocket } from 'ws';
interface StartEvent {
event: "start";
streamId: string;
callId: string;
}
interface MediaEvent {
event: "media";
streamId: string;
media: {
payload: string;
contentType: string;
sampleRate: number;
};
}
interface DtmfEvent {
event: "dtmf";
streamId: string;
digit: string;
}
interface ClearedAudioEvent {
event: "clearedAudio";
streamId: string;
sequenceNumber: number;
}
interface PlayedStreamEvent {
event: "playedStream";
streamId: string;
name: string;
}
type IncomingEvent =
| StartEvent
| MediaEvent
| DtmfEvent
| ClearedAudioEvent
| PlayedStreamEvent;
const wss = new WebSocketServer({ port: 3000 });
wss.on('connection', (ws: WebSocket) => {
let streamId: string | null = null;
let callId: string | null = null;
let clearSequence = 0;
ws.on('message', (data: Buffer) => {
const event: IncomingEvent = JSON.parse(data.toString());
switch (event.event) {
case 'start':
streamId = event.streamId;
callId = event.callId;
console.log(`Stream started: ${streamId}`);
break;
case 'media':
const audioBuffer = Buffer.from(event.media.payload, 'base64');
processAudio(audioBuffer);
break;
case 'dtmf':
console.log(`DTMF pressed: ${event.digit}`);
handleDtmfInput(event.digit);
break;
case 'clearedAudio':
console.log(`Audio cleared: sequence ${event.sequenceNumber}`);
break;
case 'playedStream':
console.log(`Checkpoint reached: ${event.name}`);
handleCheckpoint(event.name);
break;
}
});
function sendAudio(audioBuffer: Buffer): void {
const command = {
event: 'playAudio',
media: {
payload: audioBuffer.toString('base64'),
contentType: 'audio/PCMU',
sampleRate: 8000,
},
};
ws.send(JSON.stringify(command));
}
function clearAudio(): void {
clearSequence++;
const command = {
event: 'clearAudio',
sequenceNumber: clearSequence,
};
ws.send(JSON.stringify(command));
}
function markCheckpoint(name: string): void {
if (!streamId) return;
const command = {
event: 'checkpoint',
streamId,
name,
};
ws.send(JSON.stringify(command));
}
});
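In production, it's worth guarding every outgoing command against a socket that has already closed. A small wrapper using the standard ws readyState check:
// Send a command only while the socket is open; returns whether it was
// sent so callers can decide how to recover
function safeSend(ws: WebSocket, command: object): boolean {
  if (ws.readyState !== WebSocket.OPEN) {
    console.warn('WebSocket not open; dropping command');
    return false;
  }
  ws.send(JSON.stringify(command));
  return true;
}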
Next Steps
- Audio Processing — Build audio processors with examples
- Overview — Learn about codecs, sample rates, and architecture
- Answer Webhook — Configure Stream JSON response
- Examples — See complete working examples