WebSocket Protocol

The SFVoPI WebSocket protocol defines 8 event types for bidirectional audio streaming: 5 incoming events from Superfone and 3 outgoing commands from your server.

Protocol Overview

Direction	Event Type	Purpose
Incoming (Superfone → Your Server)	`start`	Stream established, ready for media
Incoming	`media`	Audio chunk from caller
Incoming	`dtmf`	DTMF key pressed by caller
Incoming	`clearedAudio`	Acknowledgment of `clearAudio` command
Incoming	`playedStream`	Acknowledgment of `checkpoint` command
Outgoing (Your Server → Superfone)	`playAudio`	Send audio to caller
Outgoing	`clearAudio`	Clear audio playback buffer
Outgoing	`checkpoint`	Mark a named checkpoint in stream

All messages are JSON-encoded strings sent over the WebSocket connection.
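
Because every frame is a JSON object carrying an `event` field, a receiver can validate and classify a message before dispatching it. The following is a minimal sketch, not part of any Superfone SDK: `parse_message` and the two name sets are hypothetical helpers built from the overview table above.

```python
import json

# Event names taken from the protocol overview table.
INCOMING_EVENTS = {"start", "media", "dtmf", "clearedAudio", "playedStream"}
OUTGOING_COMMANDS = {"playAudio", "clearAudio", "checkpoint"}


def parse_message(raw):
    """Parse one WebSocket text frame and return (event_name, message_dict).

    Raises ValueError for malformed JSON or an unknown incoming event type.
    """
    try:
        message = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    name = message.get("event")
    if name not in INCOMING_EVENTS:
        raise ValueError(f"unknown incoming event: {name!r}")
    return name, message
```

Rejecting unknown event names at this point keeps the per-event handlers free of defensive checks.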

Incoming Events (Superfone → Your Server)

1. `start` Event

Sent when the WebSocket connection is established and the audio stream is ready. This is the first event you'll receive.

When: Immediately after WebSocket connection is established (or after reconnection).

Fields:

Field	Type	Description
`event`	`string`	Always `"start"`
`streamId`	`string`	Unique identifier for this audio stream (ULID format)
`callId`	`string`	UUID of the call

Example:

{
  "event": "start",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "callId": "550e8400-e29b-41d4-a716-446655440000"
}

What to do:

Store streamId and callId for this connection
Initialize your audio processor (transcription, AI, recording)
Start any timers or background tasks
If this is a reconnection, resume processing from where you left off

TypeScript Interface:

interface StartEvent {
  event: "start";
  streamId: string;
  callId: string;
}
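
One way to follow the "store and resume" advice above is to keep per-call state in a registry. This sketch assumes that `callId` stays the same when a stream reconnects, which this page does not state explicitly; `CallSession`, `sessions`, and the `transcript` field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    call_id: str
    stream_id: str
    reconnects: int = 0
    transcript: list = field(default_factory=list)  # example per-call state


# Hypothetical in-memory registry keyed by callId.
sessions = {}


def handle_start_event(event):
    """Create a session for a new call, or resume the existing one."""
    session = sessions.get(event["callId"])
    if session is None:
        session = CallSession(call_id=event["callId"], stream_id=event["streamId"])
        sessions[event["callId"]] = session
    else:
        # Same call, new connection: keep accumulated state, update the stream.
        session.stream_id = event["streamId"]
        session.reconnects += 1
    return session
```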

2. `media` Event

Sent continuously during the call, containing audio chunks from the caller.

When: Typically every 20 ms for the duration of the call, whether the caller is speaking or silent.

Fields:

Field	Type	Description
`event`	`string`	Always `"media"`
`streamId`	`string`	Stream identifier from `start` event
`media.payload`	`string`	Base64-encoded audio data in the codec given by `media.contentType`
`media.contentType`	`string`	Audio codec: `"audio/PCMU"` or `"audio/PCMA"`
`media.sampleRate`	`number`	Sample rate in Hz: `8000` or `16000`

Example:

{
  "event": "media",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "media": {
    "payload": "//7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+...",
    "contentType": "audio/PCMU",
    "sampleRate": 8000
  }
}

What to do:

Decode base64 payload to get raw PCM audio bytes
Process audio (transcribe, analyze, record)
Optionally send audio back using playAudio command

Decoding Audio:

The snippets below show the same step in JavaScript, TypeScript, and Python, in that order.

const fs = require('fs');

// Decode base64 to a Buffer
const audioBuffer = Buffer.from(mediaEvent.media.payload, 'base64');

// audioBuffer now holds the audio bytes in the codec named by media.contentType.
// audio/PCMU and audio/PCMA are G.711 encodings with one byte per sample, so a
// 20 ms chunk is 160 bytes at 8000 Hz and 320 bytes at 16000 Hz. Expand the
// bytes to 16-bit linear PCM first if a downstream tool expects linear samples.

// Example: Save to file
fs.appendFileSync('recording.pcm', audioBuffer);

// Example: Send to transcription service (call this inside an async function)
await transcriptionService.processAudio(audioBuffer);

interface MediaEvent {
  event: "media";
  streamId: string;
  media: {
    payload: string;
    contentType: "audio/PCMU" | "audio/PCMA";
    sampleRate: 8000 | 16000;
  };
}

function handleMediaEvent(event: MediaEvent): void {
  // Decode base64 to Buffer
  const audioBuffer = Buffer.from(event.media.payload, 'base64');
  
  // Process audio
  processAudio(audioBuffer, event.media.sampleRate);
}

import base64

def handle_media_event(event):
    # Decode base64 to bytes
    audio_bytes = base64.b64decode(event['media']['payload'])
    
    # audio_bytes now holds audio in the codec named by media.contentType
    # (G.711 PCMU/PCMA uses one byte per sample)
    # Process audio
    process_audio(audio_bytes, event['media']['sampleRate'])

TypeScript Interface:

interface MediaEvent {
  event: "media";
  streamId: string;
  media: {
    payload: string;
    contentType: "audio/PCMU" | "audio/PCMA";
    sampleRate: 8000 | 16000;
  };
}
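
If the payload is G.711 mu-law, as the `audio/PCMU` content type suggests (one byte per sample), it has to be expanded before tools that expect 16-bit linear PCM can use it. The sketch below uses the standard G.711 mu-law expansion and only the Python standard library. `decode_pcmu_payload` is a hypothetical helper name, and `audio/PCMA` (A-law) would need a different expansion that is not shown here.

```python
import base64
import struct


def ulaw_byte_to_linear(u_val):
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u_val = ~u_val & 0xFF                      # mu-law bytes are stored inverted
    magnitude = ((u_val & 0x0F) << 3) + 0x84   # mantissa plus bias (132)
    magnitude <<= (u_val & 0x70) >> 4          # apply the 3-bit exponent
    return 0x84 - magnitude if u_val & 0x80 else magnitude - 0x84


def decode_pcmu_payload(payload_b64):
    """Return 16-bit signed little-endian PCM bytes for a base64 PCMU payload."""
    encoded = base64.b64decode(payload_b64)
    samples = [ulaw_byte_to_linear(b) for b in encoded]
    return struct.pack(f"<{len(samples)}h", *samples)
```

The output is twice the size of the input, so a 160-byte chunk at 8000 Hz becomes 320 bytes of linear PCM.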

3. `dtmf` Event

Sent when the caller presses a key on their phone's keypad.

When: Immediately after DTMF tone is detected by Superfone.

Fields:

Field	Type	Description
`event`	`string`	Always `"dtmf"`
`streamId`	`string`	Stream identifier from `start` event
`digit`	`string`	Key pressed: `"0"` to `"9"`, `"*"`, `"#"`, `"A"` to `"D"`

Example:

{
  "event": "dtmf",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "digit": "5"
}

What to do:

Handle IVR menu navigation ("Press 1 for sales, 2 for support")
Collect input (account number, PIN, confirmation)
Trigger actions (skip, replay, transfer)

Example Handler:

The handlers below are shown in JavaScript, TypeScript, and Python, in that order.

function handleDtmfEvent(event) {
  console.log(`Caller pressed: ${event.digit}`);
  
  switch (event.digit) {
    case '1':
      // Transfer to sales
      playAudio(salesGreeting);
      break;
    case '2':
      // Transfer to support
      playAudio(supportGreeting);
      break;
    case '*':
      // Go back to main menu
      playAudio(mainMenu);
      break;
    case '#':
      // Confirm input
      processInput();
      break;
    default:
      // Invalid input
      playAudio(invalidInputMessage);
  }
}

interface DtmfEvent {
  event: "dtmf";
  streamId: string;
  digit: string;
}

function handleDtmfEvent(event: DtmfEvent): void {
  console.log(`Caller pressed: ${event.digit}`);
  
  // IVR menu logic
  if (event.digit >= '1' && event.digit <= '9') {
    handleMenuOption(parseInt(event.digit, 10));
  } else if (event.digit === '*') {
    goBackToMainMenu();
  } else if (event.digit === '#') {
    confirmInput();
  }
}

def handle_dtmf_event(event):
    digit = event['digit']
    print(f"Caller pressed: {digit}")
    
    if digit == '1':
        # Transfer to sales
        play_audio(sales_greeting)
    elif digit == '2':
        # Transfer to support
        play_audio(support_greeting)
    elif digit == '*':
        # Go back to main menu
        play_audio(main_menu)
    elif digit == '#':
        # Confirm input
        process_input()

TypeScript Interface:

interface DtmfEvent {
  event: "dtmf";
  streamId: string;
  digit: string;
}
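
Collecting multi-digit input such as an account number means buffering `dtmf` events until the caller confirms. A minimal sketch follows, assuming `#` confirms and `*` clears the entry, as in the handlers above; `DtmfCollector` and `max_digits` are hypothetical names.

```python
class DtmfCollector:
    """Accumulate digits until '#' confirms or '*' clears the entry."""

    def __init__(self, max_digits=16):
        self.max_digits = max_digits
        self.digits = []

    def feed(self, digit):
        """Return the completed input on '#', otherwise None."""
        if digit == "#":
            completed = "".join(self.digits)
            self.digits.clear()
            return completed
        if digit == "*":
            self.digits.clear()
            return None
        # Ignore "A" to "D" and anything beyond the length limit.
        if digit.isdigit() and len(self.digits) < self.max_digits:
            self.digits.append(digit)
        return None
```

Call `feed(event['digit'])` from the `dtmf` handler and act only when it returns a string.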

4. `clearedAudio` Event

Acknowledgment that the audio playback buffer has been cleared in response to your clearAudio command.

When: Immediately after Superfone processes your clearAudio command.

Fields:

Field	Type	Description
`event`	`string`	Always `"clearedAudio"`
`streamId`	`string`	Stream identifier from `start` event
`sequenceNumber`	`number`	Sequence number from your `clearAudio` command

Example:

{
  "event": "clearedAudio",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "sequenceNumber": 42
}

What to do:

Confirm that audio buffer was cleared
Resume sending new audio (if needed)
Update UI or logs

Use Case: You sent a long audio message, but the caller interrupted. You send clearAudio to stop playback, then send new audio based on the interruption.

TypeScript Interface:

interface ClearedAudioEvent {
  event: "clearedAudio";
  streamId: string;
  sequenceNumber: number;
}

5. `playedStream` Event

Acknowledgment that a checkpoint you marked with the checkpoint command has been reached (all audio before the checkpoint has finished playing).

When: After all audio sent before the checkpoint command has been played to the caller.

Fields:

Field	Type	Description
`event`	`string`	Always `"playedStream"`
`streamId`	`string`	Stream identifier from `start` event
`name`	`string`	Checkpoint name from your `checkpoint` command

Example:

{
  "event": "playedStream",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "name": "greeting_complete"
}

What to do:

Trigger next action (e.g., start listening for response)
Update state machine (e.g., move from "greeting" to "listening")
Log timing metrics

Use Case: You send a greeting message, then send a checkpoint named "greeting_complete". When you receive playedStream with name: "greeting_complete", you know the greeting finished playing and can now listen for the caller's response.

TypeScript Interface:

interface PlayedStreamEvent {
  event: "playedStream";
  streamId: string;
  name: string;
}
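
The state-machine idea above can be sketched as a lookup from the current state and checkpoint name to the next state. The state names and the `response_complete` checkpoint are assumptions for illustration; only `greeting_complete` appears in this page.

```python
# Hypothetical mapping: (current state, checkpoint name) -> next state.
TRANSITIONS = {
    ("greeting", "greeting_complete"): "listening",
    ("responding", "response_complete"): "listening",
}


class CallStateMachine:
    def __init__(self, initial="greeting"):
        self.state = initial

    def on_played_stream(self, event):
        """Advance the state if this checkpoint is expected in the current state."""
        next_state = TRANSITIONS.get((self.state, event["name"]))
        if next_state is not None:
            self.state = next_state
        return self.state
```

Unexpected or repeated checkpoint names leave the state unchanged, which keeps a late acknowledgment from causing a spurious transition.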

Outgoing Commands (Your Server → Superfone)

6. `playAudio` Command

Send audio to the caller. Audio is queued and played in the order received.

When: Whenever you want to play audio to the caller (TTS, pre-recorded message, AI-generated speech).

Fields:

Field	Type	Required	Description
`event`	`string`	Yes	Always `"playAudio"`
`media.payload`	`string`	Yes	Base64-encoded audio data in the codec given by `media.contentType`
`media.contentType`	`string`	Yes	Audio codec: `"audio/PCMU"` or `"audio/PCMA"`
`media.sampleRate`	`number`	Yes	Sample rate in Hz: `8000` or `16000`

Example:

{
  "event": "playAudio",
  "media": {
    "payload": "//7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+/v7+...",
    "contentType": "audio/PCMU",
    "sampleRate": 8000
  }
}

Encoding Audio:

The snippets below show the same step in JavaScript, TypeScript, and Python, in that order.

const fs = require('fs');

// Read the encoded audio and convert it to base64
const audioBuffer = fs.readFileSync('greeting.pcm');
const base64Audio = audioBuffer.toString('base64');

// Send playAudio command
const command = {
  event: 'playAudio',
  media: {
    payload: base64Audio,
    contentType: 'audio/PCMU',
    sampleRate: 8000,
  },
};

ws.send(JSON.stringify(command));

interface PlayAudioCommand {
  event: "playAudio";
  media: {
    payload: string;
    contentType: "audio/PCMU" | "audio/PCMA";
    sampleRate: 8000 | 16000;
  };
}

function sendAudioToCallee(audioBuffer: Buffer, ws: WebSocket): void {
  const command: PlayAudioCommand = {
    event: 'playAudio',
    media: {
      payload: audioBuffer.toString('base64'),
      contentType: 'audio/PCMU',
      sampleRate: 8000,
    },
  };
  
  ws.send(JSON.stringify(command));
}

import base64
import json

def send_audio_to_callee(audio_bytes, ws):
    command = {
        'event': 'playAudio',
        'media': {
            'payload': base64.b64encode(audio_bytes).decode('utf-8'),
            'contentType': 'audio/PCMU',
            'sampleRate': 8000,
        },
    }
    
    ws.send(json.dumps(command))

Notes:

Audio is queued and played in FIFO order
Use the same codec and sample rate as specified in your Stream JSON response
Audio chunks are typically 20ms (160 samples at 8000 Hz, 320 samples at 16000 Hz)
You can send multiple playAudio commands in rapid succession (they will be queued)

TypeScript Interface:

interface PlayAudioCommand {
  event: "playAudio";
  media: {
    payload: string;
    contentType: "audio/PCMU" | "audio/PCMA";
    sampleRate: 8000 | 16000;
  };
}
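
Since chunks are typically 20 ms long, a longer clip can be split into frames before sending. The sketch below builds one `playAudio` command per frame. It assumes one byte per sample (G.711), exposed as the hypothetical `bytes_per_sample` parameter, and `build_play_audio_commands` is an illustrative helper rather than a Superfone API.

```python
import base64

FRAME_MS = 20  # chunk duration mentioned in the notes above


def build_play_audio_commands(audio, sample_rate=8000,
                              content_type="audio/PCMU", bytes_per_sample=1):
    """Split encoded audio into 20 ms playAudio command dicts.

    bytes_per_sample=1 assumes G.711 (one byte per sample); adjust it if
    your audio uses a different sample width.
    """
    frame_bytes = sample_rate * FRAME_MS // 1000 * bytes_per_sample
    commands = []
    for offset in range(0, len(audio), frame_bytes):
        chunk = audio[offset:offset + frame_bytes]
        commands.append({
            "event": "playAudio",
            "media": {
                "payload": base64.b64encode(chunk).decode("ascii"),
                "contentType": content_type,
                "sampleRate": sample_rate,
            },
        })
    return commands
```

Each returned dict can be serialized with `json.dumps` and sent in order; the final frame may be shorter than 20 ms.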

7. `clearAudio` Command

Clear the audio playback buffer. All queued audio is discarded immediately.

When: When you need to interrupt playback (e.g., caller interrupted, error occurred, new context).

Fields:

Field	Type	Required	Description
`event`	`string`	Yes	Always `"clearAudio"`
`sequenceNumber`	`number`	Yes	Unique sequence number for tracking acknowledgment

Example:

{
  "event": "clearAudio",
  "sequenceNumber": 42
}

What happens:

Superfone immediately stops playing audio to the caller
All queued playAudio commands are discarded
Superfone sends clearedAudio event with the same sequenceNumber

Example:

The snippets below show the same step in JavaScript, TypeScript, and Python, in that order.

let clearSequence = 0;

function clearAudioBuffer(ws) {
  clearSequence++;
  
  const command = {
    event: 'clearAudio',
    sequenceNumber: clearSequence,
  };
  
  ws.send(JSON.stringify(command));
  console.log(`Sent clearAudio with sequence ${clearSequence}`);
}

// Handle acknowledgment
function handleClearedAudioEvent(event) {
  console.log(`Audio cleared, sequence ${event.sequenceNumber}`);
  // Now safe to send new audio
}

interface ClearAudioCommand {
  event: "clearAudio";
  sequenceNumber: number;
}

let clearSequence = 0;

function clearAudioBuffer(ws: WebSocket): void {
  clearSequence++;
  
  const command: ClearAudioCommand = {
    event: 'clearAudio',
    sequenceNumber: clearSequence,
  };
  
  ws.send(JSON.stringify(command));
}

import json

clear_sequence = 0

def clear_audio_buffer(ws):
    global clear_sequence
    clear_sequence += 1
    
    command = {
        'event': 'clearAudio',
        'sequenceNumber': clear_sequence,
    }
    
    ws.send(json.dumps(command))
    print(f"Sent clearAudio with sequence {clear_sequence}")

Use Case: Voice AI bot is speaking a long response, but the caller interrupts. You send clearAudio to stop playback, then send new audio based on the interruption.

TypeScript Interface:

interface ClearAudioCommand {
  event: "clearAudio";
  sequenceNumber: number;
}
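
To know when it is safe to send new audio after an interruption, the sender can remember which sequence numbers are still unacknowledged. This is a sketch under the assumption that `send` is any callable accepting a JSON string (for example a WebSocket's send method); `ClearAudioTracker` and `safe_to_play` are hypothetical names.

```python
import json


class ClearAudioTracker:
    """Issue clearAudio sequence numbers and track which are unacknowledged."""

    def __init__(self, send):
        self._send = send          # callable that takes a JSON string
        self._next_sequence = 0
        self.pending = set()

    def clear(self):
        """Send clearAudio with a fresh sequence number and return it."""
        self._next_sequence += 1
        self.pending.add(self._next_sequence)
        self._send(json.dumps({
            "event": "clearAudio",
            "sequenceNumber": self._next_sequence,
        }))
        return self._next_sequence

    def on_cleared_audio(self, event):
        """Return True if the acknowledgment matches a clear we sent."""
        sequence = event["sequenceNumber"]
        if sequence in self.pending:
            self.pending.remove(sequence)
            return True
        return False

    @property
    def safe_to_play(self):
        return not self.pending
```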

8. `checkpoint` Command

Mark a named checkpoint in the audio stream. When all audio sent before this checkpoint has finished playing, Superfone sends a playedStream event.

When: When you need to know when specific audio has finished playing (e.g., greeting complete, question asked).

Fields:

Field	Type	Required	Description
`event`	`string`	Yes	Always `"checkpoint"`
`streamId`	`string`	Yes	Stream identifier from `start` event
`name`	`string`	Yes	Unique name for this checkpoint

Example:

{
  "event": "checkpoint",
  "streamId": "01JJXYZ123ABC456DEF789GHI",
  "name": "greeting_complete"
}

What happens:

Superfone marks the current position in the audio queue
When all audio before this checkpoint finishes playing, Superfone sends playedStream event with the same name

Example:

The snippets below show the same flow in JavaScript, TypeScript, and Python, in that order.

// Send greeting audio
sendAudioToCallee(greetingAudio, ws);

// Mark checkpoint
const checkpoint = {
  event: 'checkpoint',
  streamId: currentStreamId,
  name: 'greeting_complete',
};
ws.send(JSON.stringify(checkpoint));

// Later, when you receive playedStream event:
function handlePlayedStreamEvent(event) {
  if (event.name === 'greeting_complete') {
    console.log('Greeting finished playing, now listening for response');
    startListening();
  }
}

interface CheckpointCommand {
  event: "checkpoint";
  streamId: string;
  name: string;
}

function sendGreetingWithCheckpoint(ws: WebSocket, streamId: string): void {
  // Send greeting audio
  sendAudioToCallee(greetingAudio, ws);
  
  // Mark checkpoint
  const checkpoint: CheckpointCommand = {
    event: 'checkpoint',
    streamId,
    name: 'greeting_complete',
  };
  ws.send(JSON.stringify(checkpoint));
}

function handlePlayedStreamEvent(event: PlayedStreamEvent): void {
  if (event.name === 'greeting_complete') {
    startListening();
  }
}

import json

def send_greeting_with_checkpoint(ws, stream_id):
    # Send greeting audio
    send_audio_to_callee(greeting_audio, ws)
    
    # Mark checkpoint
    checkpoint = {
        'event': 'checkpoint',
        'streamId': stream_id,
        'name': 'greeting_complete',
    }
    ws.send(json.dumps(checkpoint))

def handle_played_stream_event(event):
    if event['name'] == 'greeting_complete':
        print('Greeting finished playing, now listening for response')
        start_listening()

Use Cases:

Turn-taking: Know when to stop speaking and start listening
State machine: Transition between states (greeting → listening → responding)
Timing metrics: Measure how long audio takes to play
Synchronization: Coordinate audio playback with other actions

TypeScript Interface:

interface CheckpointCommand {
  event: "checkpoint";
  streamId: string;
  name: string;
}
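
The timing-metrics use case can be sketched by recording when each checkpoint is sent and measuring the delay until its `playedStream` acknowledgment. The counter suffix is one way to keep names unique; `CheckpointTimer`, `mark`, and the injectable `clock` are hypothetical, and `send` is assumed to be any callable that accepts a JSON string.

```python
import itertools
import json
import time


class CheckpointTimer:
    def __init__(self, send, stream_id, clock=time.monotonic):
        self._send = send
        self._stream_id = stream_id
        self._clock = clock
        self._counter = itertools.count(1)
        self._sent_at = {}

    def mark(self, label):
        """Send a checkpoint with a unique name derived from label."""
        name = f"{label}_{next(self._counter)}"
        self._sent_at[name] = self._clock()
        self._send(json.dumps({
            "event": "checkpoint",
            "streamId": self._stream_id,
            "name": name,
        }))
        return name

    def on_played_stream(self, event):
        """Return seconds between mark() and playedStream, or None if unknown."""
        started = self._sent_at.pop(event["name"], None)
        if started is None:
            return None
        return self._clock() - started
```

The measured interval covers queueing and network delay as well as playback, so treat it as an upper bound on how long the audio itself took to play.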

Complete Example

Here's a complete WebSocket handler that processes all 5 incoming events and defines helpers for the 3 outgoing commands:

The handler is shown first in JavaScript, then in TypeScript.

const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 3000 });

wss.on('connection', (ws) => {
  let streamId = null;
  let callId = null;
  let clearSequence = 0;

  ws.on('message', (data) => {
    const event = JSON.parse(data.toString());

    switch (event.event) {
      case 'start':
        streamId = event.streamId;
        callId = event.callId;
        console.log(`Stream started: ${streamId}`);
        break;

      case 'media': {
        // Decode and process audio (braces scope the const to this case)
        const audioBuffer = Buffer.from(event.media.payload, 'base64');
        processAudio(audioBuffer);
        break;
      }

      case 'dtmf':
        console.log(`DTMF pressed: ${event.digit}`);
        handleDtmfInput(event.digit);
        break;

      case 'clearedAudio':
        console.log(`Audio cleared: sequence ${event.sequenceNumber}`);
        break;

      case 'playedStream':
        console.log(`Checkpoint reached: ${event.name}`);
        handleCheckpoint(event.name);
        break;
    }
  });

  // Send audio to caller
  function sendAudio(audioBuffer) {
    const command = {
      event: 'playAudio',
      media: {
        payload: audioBuffer.toString('base64'),
        contentType: 'audio/PCMU',
        sampleRate: 8000,
      },
    };
    ws.send(JSON.stringify(command));
  }

  // Clear audio buffer
  function clearAudio() {
    clearSequence++;
    const command = {
      event: 'clearAudio',
      sequenceNumber: clearSequence,
    };
    ws.send(JSON.stringify(command));
  }

  // Mark checkpoint
  function markCheckpoint(name) {
    if (!streamId) return; // no start event received yet
    const command = {
      event: 'checkpoint',
      streamId,
      name,
    };
    ws.send(JSON.stringify(command));
  }
});

import { WebSocketServer, WebSocket } from 'ws';

interface StartEvent {
  event: "start";
  streamId: string;
  callId: string;
}

interface MediaEvent {
  event: "media";
  streamId: string;
  media: {
    payload: string;
    contentType: string;
    sampleRate: number;
  };
}

interface DtmfEvent {
  event: "dtmf";
  streamId: string;
  digit: string;
}

interface ClearedAudioEvent {
  event: "clearedAudio";
  streamId: string;
  sequenceNumber: number;
}

interface PlayedStreamEvent {
  event: "playedStream";
  streamId: string;
  name: string;
}

type IncomingEvent =
  | StartEvent
  | MediaEvent
  | DtmfEvent
  | ClearedAudioEvent
  | PlayedStreamEvent;

const wss = new WebSocketServer({ port: 3000 });

wss.on('connection', (ws: WebSocket) => {
  let streamId: string | null = null;
  let callId: string | null = null;
  let clearSequence = 0;

  ws.on('message', (data: Buffer) => {
    const event: IncomingEvent = JSON.parse(data.toString());

    switch (event.event) {
      case 'start':
        streamId = event.streamId;
        callId = event.callId;
        console.log(`Stream started: ${streamId}`);
        break;

      case 'media': {
        // Braces scope the const to this case
        const audioBuffer = Buffer.from(event.media.payload, 'base64');
        processAudio(audioBuffer);
        break;
      }

      case 'dtmf':
        console.log(`DTMF pressed: ${event.digit}`);
        handleDtmfInput(event.digit);
        break;

      case 'clearedAudio':
        console.log(`Audio cleared: sequence ${event.sequenceNumber}`);
        break;

      case 'playedStream':
        console.log(`Checkpoint reached: ${event.name}`);
        handleCheckpoint(event.name);
        break;
    }
  });

  function sendAudio(audioBuffer: Buffer): void {
    const command = {
      event: 'playAudio',
      media: {
        payload: audioBuffer.toString('base64'),
        contentType: 'audio/PCMU',
        sampleRate: 8000,
      },
    };
    ws.send(JSON.stringify(command));
  }

  function clearAudio(): void {
    clearSequence++;
    const command = {
      event: 'clearAudio',
      sequenceNumber: clearSequence,
    };
    ws.send(JSON.stringify(command));
  }

  function markCheckpoint(name: string): void {
    if (!streamId) return;
    const command = {
      event: 'checkpoint',
      streamId,
      name,
    };
    ws.send(JSON.stringify(command));
  }
});

Next Steps

Audio Processing — Build audio processors with examples
Overview — Learn about codecs, sample rates, and architecture
Answer Webhook — Configure Stream JSON response
Examples — See complete working examples
