
WebSocket Protocol

The SFVoPI WebSocket protocol defines eight message types for bidirectional audio streaming: five incoming events sent by Superfone and three outgoing commands sent by your server.

Protocol Overview

| Direction | Event Type | Purpose |
|---|---|---|
| Incoming (Superfone → Your Server) | start | Stream established, ready for media |
| Incoming | media | Audio chunk from caller |
| Incoming | dtmf | DTMF key pressed by caller |
| Incoming | clearedAudio | Acknowledgment of clearAudio command |
| Incoming | playedStream | Acknowledgment of checkpoint command |
| Outgoing (Your Server → Superfone) | playAudio | Send audio to caller |
| Outgoing | clearAudio | Clear audio playback buffer |
| Outgoing | checkpoint | Mark a named checkpoint in stream |

All messages are JSON-encoded strings sent over the WebSocket connection.
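Because every frame is a JSON string, one pair of helpers can serialize outgoing commands and parse incoming events. A minimal sketch (the `socket` object is assumed to come from your WebSocket library, e.g. a `ws` connection):

```javascript
// Serialize a command object and send it as a JSON string.
// `socket` is any object with a send(string) method.
function sendCommand(socket, command) {
  socket.send(JSON.stringify(command));
}

// Parse an incoming frame into an event object; return null if malformed.
function parseEvent(rawData) {
  try {
    return JSON.parse(rawData.toString());
  } catch (err) {
    console.error('Ignoring malformed frame:', err.message);
    return null;
  }
}
```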


Incoming Events (Superfone → Your Server)

1. start Event

Sent when the WebSocket connection is established and the audio stream is ready. This is the first event you'll receive.

When: Immediately after WebSocket connection is established (or after reconnection).

Fields:

| Field | Type | Description |
|---|---|---|
| event | string | Always "start" |
| streamId | string | Unique identifier for this audio stream (ULID format) |
| callId | string | UUID of the call |

Example:

{
  "event": "start",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "callId": "550e8400-e29b-41d4-a716-446655440000"
}

What to do:

  • Store streamId and callId for this connection
  • Initialize your audio processor (transcription, AI, recording)
  • Start any timers or background tasks
  • If this is a reconnection, resume processing from where you left off
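The steps above can be sketched as a small per-connection session store, keyed by streamId so a reconnection resumes the existing state. This is a sketch; the session shape and the `processor` placeholder are assumptions, not part of the protocol:

```javascript
// Per-connection session state, keyed by streamId so a reconnection
// with the same streamId resumes instead of starting over.
const sessions = new Map();

function handleStartEvent(event) {
  let session = sessions.get(event.streamId);
  if (session) {
    // Reconnection: keep the existing state and resume processing.
    session.reconnects++;
    return session;
  }
  session = {
    streamId: event.streamId,
    callId: event.callId,
    startedAt: Date.now(),
    reconnects: 0,
    processor: null, // placeholder for your transcription/AI/recording pipeline
  };
  sessions.set(event.streamId, session);
  return session;
}
```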

TypeScript Interface:

interface StartEvent {
  event: "start";
  streamId: string;
  callId: string;
}

2. media Event

Sent continuously during the call, containing audio chunks from the caller.

When: Typically every 20ms for the duration of the call, whether the caller is speaking or silent.

Fields:

| Field | Type | Description |
|---|---|---|
| event | string | Always "media" |
| streamId | string | Stream identifier from start event |
| media.payload | string | Base64-encoded PCM audio data |
| media.contentType | string | Audio codec: "audio/PCMU" or "audio/PCMA" |
| media.sampleRate | number | Sample rate in Hz: 8000 or 16000 |

Example:

{
  "event": "media",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "media": {
    "payload": "//7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+...",
    "contentType": "audio/PCMU",
    "sampleRate": 8000
  }
}

What to do:

  • Decode base64 payload to get raw PCM audio bytes
  • Process audio (transcribe, analyze, record)
  • Optionally send audio back using playAudio command

Decoding Audio:

// Decode base64 to Buffer
const audioBuffer = Buffer.from(mediaEvent.media.payload, 'base64');

// audioBuffer is now raw PCM audio (16-bit signed little-endian)
// For 8000 Hz: 160 samples = 320 bytes = 20ms of audio
// For 16000 Hz: 320 samples = 640 bytes = 20ms of audio

// Example: Save to file
fs.appendFileSync('recording.pcm', audioBuffer);

// Example: Send to transcription service
await transcriptionService.processAudio(audioBuffer);
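Using the byte math in the comments above (16-bit samples, so 2 bytes per sample), you can convert a decoded chunk's length into milliseconds of audio, e.g. to track how much audio a stream has delivered. A small sketch:

```javascript
// Convert a decoded chunk's byte length to milliseconds of audio,
// assuming 16-bit (2-byte) samples as described above.
function chunkDurationMs(byteLength, sampleRate) {
  const bytesPerSecond = sampleRate * 2;
  return (byteLength / bytesPerSecond) * 1000;
}
```

For example, a 320-byte chunk at 8000 Hz and a 640-byte chunk at 16000 Hz both come out to 20ms, matching the figures above.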

TypeScript Interface:

interface MediaEvent {
  event: "media";
  streamId: string;
  media: {
    payload: string;
    contentType: "audio/PCMU" | "audio/PCMA";
    sampleRate: 8000 | 16000;
  };
}

3. dtmf Event

Sent when the caller presses a key on their phone's keypad.

When: Immediately after a DTMF tone is detected by Superfone.

Fields:

| Field | Type | Description |
|---|---|---|
| event | string | Always "dtmf" |
| streamId | string | Stream identifier from start event |
| digit | string | Key pressed: "0" to "9", "*", "#", "A" to "D" |

Example:

{
  "event": "dtmf",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "digit": "5"
}

What to do:

  • Handle IVR menu navigation ("Press 1 for sales, 2 for support")
  • Collect input (account number, PIN, confirmation)
  • Trigger actions (skip, replay, transfer)

Example Handler:

function handleDtmfEvent(event) {
  console.log(`Caller pressed: ${event.digit}`);

  switch (event.digit) {
    case '1':
      // Transfer to sales
      playAudio(salesGreeting);
      break;
    case '2':
      // Transfer to support
      playAudio(supportGreeting);
      break;
    case '*':
      // Go back to main menu
      playAudio(mainMenu);
      break;
    case '#':
      // Confirm input
      processInput();
      break;
    default:
      // Invalid input
      playAudio(invalidInputMessage);
  }
}
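Beyond single-key menus, dtmf events can be accumulated into multi-digit input such as an account number or PIN. A sketch of one convention (not part of the protocol): digits buffer per stream, `#` submits, `*` starts over:

```javascript
// In-progress digit buffers, one per stream.
const digitBuffers = new Map();

// Feed each dtmf digit in; returns the completed input string
// when '#' is pressed, otherwise null.
function collectDigit(streamId, digit) {
  const buffer = digitBuffers.get(streamId) || [];
  if (digit === '#') {
    digitBuffers.delete(streamId);
    return buffer.join('');
  }
  if (digit === '*') {
    digitBuffers.delete(streamId); // caller restarts input
    return null;
  }
  buffer.push(digit);
  digitBuffers.set(streamId, buffer);
  return null;
}
```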

TypeScript Interface:

interface DtmfEvent {
  event: "dtmf";
  streamId: string;
  digit: string;
}

4. clearedAudio Event

Acknowledgment that the audio playback buffer has been cleared in response to your clearAudio command.

When: Immediately after Superfone processes your clearAudio command.

Fields:

| Field | Type | Description |
|---|---|---|
| event | string | Always "clearedAudio" |
| streamId | string | Stream identifier from start event |
| sequenceNumber | number | Sequence number from your clearAudio command |

Example:

{
  "event": "clearedAudio",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "sequenceNumber": 42
}

What to do:

  • Confirm that audio buffer was cleared
  • Resume sending new audio (if needed)
  • Update UI or logs

Use Case: You sent a long audio message, but the caller interrupted. You send clearAudio to stop playback, then send new audio based on the interruption.
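One way to wait for this acknowledgment before sending new audio is to key a pending promise on the sequenceNumber. A sketch (the `ws` parameter is assumed to be your open socket):

```javascript
// Pending clearAudio acknowledgments, keyed by sequenceNumber.
const pendingClears = new Map();
let nextClearSequence = 0;

// Send clearAudio and resolve when the matching clearedAudio arrives.
function clearAudioAndWait(ws) {
  const sequenceNumber = ++nextClearSequence;
  ws.send(JSON.stringify({ event: 'clearAudio', sequenceNumber }));
  return new Promise((resolve) => {
    pendingClears.set(sequenceNumber, resolve);
  });
}

// Call this from your message handler when a clearedAudio event arrives.
function handleClearedAudio(event) {
  const resolve = pendingClears.get(event.sequenceNumber);
  if (resolve) {
    pendingClears.delete(event.sequenceNumber);
    resolve(event);
  }
}
```

With this in place, an interruption handler can `await clearAudioAndWait(ws)` and then immediately queue the replacement audio.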

TypeScript Interface:

interface ClearedAudioEvent {
  event: "clearedAudio";
  streamId: string;
  sequenceNumber: number;
}

5. playedStream Event

Acknowledgment that a checkpoint you marked with the checkpoint command has been reached (all audio before the checkpoint has finished playing).

When: After all audio sent before the checkpoint command has been played to the caller.

Fields:

| Field | Type | Description |
|---|---|---|
| event | string | Always "playedStream" |
| streamId | string | Stream identifier from start event |
| name | string | Checkpoint name from your checkpoint command |

Example:

{
  "event": "playedStream",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "name": "greeting_complete"
}

What to do:

  • Trigger next action (e.g., start listening for response)
  • Update state machine (e.g., move from "greeting" to "listening")
  • Log timing metrics

Use Case: You send a greeting message, then send a checkpoint named "greeting_complete". When you receive playedStream with name: "greeting_complete", you know the greeting finished playing and can now listen for the caller's response.
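The state-machine use above can be sketched as a small transition table driven by playedStream events (the state and checkpoint names here are illustrative, not defined by the protocol):

```javascript
// Which checkpoint name advances each state, and to where (illustrative).
const transitions = {
  greeting: { checkpoint: 'greeting_complete', next: 'listening' },
  responding: { checkpoint: 'response_complete', next: 'listening' },
};

// Return the next state for a playedStream event, or stay put
// if the checkpoint name doesn't match the current state.
function advanceState(currentState, playedStreamEvent) {
  const t = transitions[currentState];
  if (t && playedStreamEvent.name === t.checkpoint) {
    return t.next;
  }
  return currentState;
}
```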

TypeScript Interface:

interface PlayedStreamEvent {
  event: "playedStream";
  streamId: string;
  name: string;
}

Outgoing Commands (Your Server → Superfone)

6. playAudio Command

Send audio to the caller. Audio is queued and played in the order received.

When: Whenever you want to play audio to the caller (TTS, pre-recorded message, AI-generated speech).

Fields:

| Field | Type | Required | Description |
|---|---|---|---|
| event | string | Yes | Always "playAudio" |
| media.payload | string | Yes | Base64-encoded PCM audio data |
| media.contentType | string | Yes | Audio codec: "audio/PCMU" or "audio/PCMA" |
| media.sampleRate | number | Yes | Sample rate in Hz: 8000 or 16000 |

Example:

{
  "event": "playAudio",
  "media": {
    "payload": "//7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+...",
    "contentType": "audio/PCMU",
    "sampleRate": 8000
  }
}

Encoding Audio:

// Encode PCM audio to base64
const audioBuffer = fs.readFileSync('greeting.pcm');
const base64Audio = audioBuffer.toString('base64');

// Send playAudio command
const command = {
  event: 'playAudio',
  media: {
    payload: base64Audio,
    contentType: 'audio/PCMU',
    sampleRate: 8000,
  },
};

ws.send(JSON.stringify(command));

Notes:

  • Audio is queued and played in FIFO order
  • Use the same codec and sample rate as specified in your Stream JSON response
  • Audio chunks are typically 20ms (160 samples at 8000 Hz, 320 samples at 16000 Hz)
  • You can send multiple playAudio commands in rapid succession (they will be queued)
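Given the 20ms chunk sizes noted above (and assuming 16-bit samples, per the decoding notes in the media section), a long PCM buffer can be split into one playAudio command per chunk before sending. This is one reasonable approach, not a protocol requirement:

```javascript
// Split a PCM buffer into 20 ms chunks, each wrapped as a playAudio command.
// Assumes 16-bit (2-byte) samples.
function toPlayAudioCommands(pcmBuffer, sampleRate, contentType) {
  const bytesPerChunk = (sampleRate / 50) * 2; // 20 ms of 16-bit samples
  const commands = [];
  for (let offset = 0; offset < pcmBuffer.length; offset += bytesPerChunk) {
    commands.push({
      event: 'playAudio',
      media: {
        payload: pcmBuffer.subarray(offset, offset + bytesPerChunk).toString('base64'),
        contentType,
        sampleRate,
      },
    });
  }
  return commands;
}
```

Since playback is FIFO, the commands can then be sent back-to-back and will play in order.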

TypeScript Interface:

interface PlayAudioCommand {
  event: "playAudio";
  media: {
    payload: string;
    contentType: "audio/PCMU" | "audio/PCMA";
    sampleRate: 8000 | 16000;
  };
}

7. clearAudio Command

Clear the audio playback buffer. All queued audio is discarded immediately.

When: When you need to interrupt playback (e.g., caller interrupted, error occurred, new context).

Fields:

| Field | Type | Required | Description |
|---|---|---|---|
| event | string | Yes | Always "clearAudio" |
| sequenceNumber | number | Yes | Unique sequence number for tracking acknowledgment |

Example:

{
  "event": "clearAudio",
  "sequenceNumber": 42
}

What happens:

  1. Superfone immediately stops playing audio to the caller
  2. All queued playAudio commands are discarded
  3. Superfone sends clearedAudio event with the same sequenceNumber

Example:

let clearSequence = 0;

function clearAudioBuffer(ws) {
  clearSequence++;

  const command = {
    event: 'clearAudio',
    sequenceNumber: clearSequence,
  };

  ws.send(JSON.stringify(command));
  console.log(`Sent clearAudio with sequence ${clearSequence}`);
}

// Handle acknowledgment
function handleClearedAudioEvent(event) {
  console.log(`Audio cleared, sequence ${event.sequenceNumber}`);
  // Now safe to send new audio
}

Use Case: Voice AI bot is speaking a long response, but the caller interrupts. You send clearAudio to stop playback, then send new audio based on the interruption.

TypeScript Interface:

interface ClearAudioCommand {
  event: "clearAudio";
  sequenceNumber: number;
}

8. checkpoint Command

Mark a named checkpoint in the audio stream. When all audio sent before this checkpoint has finished playing, Superfone sends a playedStream event.

When: When you need to know when specific audio has finished playing (e.g., greeting complete, question asked).

Fields:

| Field | Type | Required | Description |
|---|---|---|---|
| event | string | Yes | Always "checkpoint" |
| streamId | string | Yes | Stream identifier from start event |
| name | string | Yes | Unique name for this checkpoint |

Example:

{
  "event": "checkpoint",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "name": "greeting_complete"
}

What happens:

  1. Superfone marks the current position in the audio queue
  2. When all audio before this checkpoint finishes playing, Superfone sends playedStream event with the same name

Example:

// Send greeting audio
sendAudioToCallee(greetingAudio, ws);

// Mark checkpoint
const checkpoint = {
  event: 'checkpoint',
  streamId: currentStreamId,
  name: 'greeting_complete',
};
ws.send(JSON.stringify(checkpoint));

// Later, when you receive playedStream event:
function handlePlayedStreamEvent(event) {
  if (event.name === 'greeting_complete') {
    console.log('Greeting finished playing, now listening for response');
    startListening();
  }
}

Use Cases:

  • Turn-taking: Know when to stop speaking and start listening
  • State machine: Transition between states (greeting → listening → responding)
  • Timing metrics: Measure how long audio takes to play
  • Synchronization: Coordinate audio playback with other actions

TypeScript Interface:

interface CheckpointCommand {
  event: "checkpoint";
  streamId: string;
  name: string;
}

Complete Example

Here's a complete WebSocket handler that covers all 8 message types: it processes the 5 incoming events and defines helpers for the 3 outgoing commands:

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 3000 });

wss.on('connection', (ws) => {
  let streamId = null;
  let callId = null;
  let clearSequence = 0;

  ws.on('message', (data) => {
    const event = JSON.parse(data.toString());

    switch (event.event) {
      case 'start':
        streamId = event.streamId;
        callId = event.callId;
        console.log(`Stream started: ${streamId}`);
        break;

      case 'media': {
        // Decode and process audio
        const audioBuffer = Buffer.from(event.media.payload, 'base64');
        processAudio(audioBuffer);
        break;
      }

      case 'dtmf':
        console.log(`DTMF pressed: ${event.digit}`);
        handleDtmfInput(event.digit);
        break;

      case 'clearedAudio':
        console.log(`Audio cleared: sequence ${event.sequenceNumber}`);
        break;

      case 'playedStream':
        console.log(`Checkpoint reached: ${event.name}`);
        handleCheckpoint(event.name);
        break;
    }
  });

  // Send audio to caller
  function sendAudio(audioBuffer) {
    const command = {
      event: 'playAudio',
      media: {
        payload: audioBuffer.toString('base64'),
        contentType: 'audio/PCMU',
        sampleRate: 8000,
      },
    };
    ws.send(JSON.stringify(command));
  }

  // Clear audio buffer
  function clearAudio() {
    clearSequence++;
    const command = {
      event: 'clearAudio',
      sequenceNumber: clearSequence,
    };
    ws.send(JSON.stringify(command));
  }

  // Mark checkpoint
  function markCheckpoint(name) {
    const command = {
      event: 'checkpoint',
      streamId,
      name,
    };
    ws.send(JSON.stringify(command));
  }
});

Next Steps

  • Audio Processing — Build audio processors with examples
  • Overview — Learn about codecs, sample rates, and architecture
  • Answer Webhook — Configure Stream JSON response
  • Examples — See complete working examples