WebSocket API - ReadRealm

ReadRealm uses Socket.IO ^4.8 on top of NestJS WebSockets to power two real-time features:

Book chat — per-book discussion rooms with message history
Speech recognition — streaming audio to Azure for live transcription and AI voice responses

Both gateways currently allow all origins (cors: { origin: '*' }). Restrict this in production by setting the CORS_ORIGIN environment variable.
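
As a rough illustration of that restriction, the sketch below turns a CORS_ORIGIN value into something that could be passed as the gateway's cors.origin option. How ReadRealm actually reads the variable is not shown on this page, so the helper name resolveCorsOrigin, the comma-separated format, and the fallback to '*' are all assumptions.

```typescript
// Hypothetical helper: converts a CORS_ORIGIN value into a Socket.IO `cors.origin` setting.
// Assumptions: multiple origins are comma-separated, and an unset or empty value
// keeps the current allow-all behaviour ('*').
function resolveCorsOrigin(raw: string | undefined): '*' | string[] {
  if (raw === undefined) return '*';
  const origins = raw
    .split(',')
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);
  return origins.length === 0 ? '*' : origins;
}
```

The result could then be supplied as cors: { origin: resolveCorsOrigin(process.env.CORS_ORIGIN) } in the @WebSocketGateway options.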

Connection setup

Install the Socket.IO client

npm install socket.io-client

Connect to the server

Both gateways run on the same NestJS server, so one connection can serve both features. If you plan to use speech, create that connection with the websocket transport shown below.

import { io } from 'socket.io-client';

const socket = io('http://localhost:3000', {
  // Pass your JWT in the auth handshake for future auth middleware
  auth: { token: '<your-jwt>' },
});

socket.on('connect', () => {
  console.log('Connected:', socket.id);
});

For the speech gateway, force the websocket transport (the gateway configures transports: ['websocket']):

const speechSocket = io('http://localhost:3000', {
  transports: ['websocket'],
  auth: { token: '<your-jwt>' },
});

Listen for the connection status event

The speech gateway emits connectionStatus immediately on connect:

speechSocket.on('connectionStatus', (payload) => {
  // payload: { connected: true }
  console.log('Speech gateway ready:', payload);
});

Book chat events

Chat rooms are keyed by book: joining the room for book 42 puts you in the Socket.IO room book_42. All events are scoped to that room.
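
The naming rule can be written down as a one-line helper. This is only a sketch of the convention described above: the function name bookRoomName is hypothetical, and the integer check is an assumption about what counts as a valid bookId.

```typescript
// Hypothetical helper mirroring the naming rule described above: book 42 -> "book_42".
// Assumption: book IDs are integers; anything else is rejected.
function bookRoomName(bookId: number): string {
  if (!Number.isInteger(bookId)) {
    throw new RangeError(`bookId must be an integer, got ${bookId}`);
  }
  return `book_${bookId}`;
}
```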

Client → server

Event	Payload	Description
`joinRoom`	`{ bookId: number, userId: string, username: string }`	Join the room for a book. Triggers `previousMessages` and `userJoined`.
`leaveRoom`	`{ bookId: number, username: string }`	Leave the book room. Triggers `userLeft` for remaining members.
`chatMessage`	`{ bookId: number, userId: string, username: string, content: string }`	Send a message to the room. Triggers `newMessage` for all members.

Server → client

Event	Payload	Description
`previousMessages`	`Message[]`	Up to 50 past messages for the room, sorted newest-first. Sent only to the joining client.
`userJoined`	`{ username: string, timestamp: Date }`	Broadcast to the entire room when a user joins.
`userLeft`	`{ username: string, timestamp: Date }`	Broadcast to the remaining room members when a user leaves.
`newMessage`	`Message`	Broadcast to all room members when a message is saved.

Message object

Messages are persisted in MongoDB via Mongoose (@Schema({ timestamps: true })). Because Socket.IO sends payloads as JSON, Date values reach the client as ISO 8601 strings. Each Message object contains:

Field	Type	Description
`bookId`	`number`	The book this message belongs to.
`userId`	`string`	ID of the user who sent the message.
`username`	`string`	Display name of the sender.
`content`	`string`	Message text.
`createdAt`	`Date`	When the message was first saved. Added automatically by Mongoose.
`updatedAt`	`Date`	When the message was last modified. Added automatically by Mongoose.
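
A client-side typing of this object might look like the sketch below. The type and function names (WireMessage, ChatMessage, parseMessage, toChronological) are illustrative, not part of the ReadRealm API. The timestamp fields are typed as strings on the wire because Socket.IO's default parser sends JSON, and the sort assumes you want to render history oldest-first even though previousMessages arrives newest-first.

```typescript
// Shape of a chat message as it arrives over the socket.
// Assumption: Date fields arrive as ISO 8601 strings (JSON serialization).
interface WireMessage {
  bookId: number;
  userId: string;
  username: string;
  content: string;
  createdAt: string;
  updatedAt: string;
}

// Same message with timestamps converted to Date objects for client-side use.
interface ChatMessage {
  bookId: number;
  userId: string;
  username: string;
  content: string;
  createdAt: Date;
  updatedAt: Date;
}

// Convert wire timestamps into Date objects.
function parseMessage(raw: WireMessage): ChatMessage {
  return {
    ...raw,
    createdAt: new Date(raw.createdAt),
    updatedAt: new Date(raw.updatedAt),
  };
}

// Parse a `previousMessages` payload and return a copy sorted oldest-first for display.
function toChronological(history: WireMessage[]): ChatMessage[] {
  return history
    .map(parseMessage)
    .sort((a, b) => a.createdAt.getTime() - b.createdAt.getTime());
}
```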

Code example — joining a room and chatting

import { io } from 'socket.io-client';

const socket = io('http://localhost:3000');

// Register listeners before joining so that no event is missed

// Receive message history once joined (up to 50 messages, newest first)
socket.on('previousMessages', (messages) => {
  console.log('Previous messages:', messages);
});

// A user joined the room
socket.on('userJoined', ({ username, timestamp }) => {
  console.log(`${username} joined at ${timestamp}`);
});

// A user left the room
socket.on('userLeft', ({ username, timestamp }) => {
  console.log(`${username} left at ${timestamp}`);
});

// Incoming message
socket.on('newMessage', (message) => {
  console.log(`${message.username}: ${message.content}`);
});

// Join the chat room for book 42
socket.emit('joinRoom', {
  bookId: 42,
  userId: 'user_abc123',
  username: 'Ada',
});

// Send a message
socket.emit('chatMessage', {
  bookId: 42,
  userId: 'user_abc123',
  username: 'Ada',
  content: 'Has anyone reached chapter 10?',
});

// Leave the room, for example when the user navigates away from the book
function leaveBookRoom() {
  socket.emit('leaveRoom', { bookId: 42, username: 'Ada' });
}

Speech events

The speech gateway streams audio to Azure OpenAI Realtime (Whisper + GPT-4o) and returns transcripts and AI audio responses. It uses the websocket transport and has a 100 MB max buffer for audio payloads.

Start a session with start before sending any audio. Azure Realtime uses server-side VAD (voice activity detection), so you do not need to signal silence manually.
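
Since sendAudio only works while a session is active, a client may want to hold audio back until the server confirms the session through sessionStatus. The class below is a minimal sketch of such a guard; SpeechSessionGuard and its method names are hypothetical, and dropping audio (rather than queueing it) while inactive is a design assumption.

```typescript
// Hypothetical client-side guard: tracks `sessionStatus` and only builds
// `sendAudio` payloads while a session is active.
class SpeechSessionGuard {
  private active = false;

  // Call this from the `sessionStatus` listener.
  onSessionStatus(status: { active: boolean }): void {
    this.active = status.active;
  }

  isActive(): boolean {
    return this.active;
  }

  // Returns the `sendAudio` payload, or null when no session is active.
  buildSendAudio(base64Pcm: string): { audio: string } | null {
    return this.active ? { audio: base64Pcm } : null;
  }
}
```

In use, the sessionStatus listener would forward its payload to onSessionStatus, and the recording loop would emit sendAudio only when buildSendAudio returns a payload.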

Client → server

Event	Payload	Description
`start`	`{ systemMessage: string, temperature: number }`	Initialise a new Azure Realtime session. Both fields are required.
`sendAudio`	`{ audio: string }`	Send a Base64-encoded PCM audio chunk. Requires an active session.
`stop`	(none)	End the current session. Any buffered audio is flushed to Azure before closing.

Server → client

Event	Payload	Description
`connectionStatus`	`{ connected: true }`	Emitted immediately on connect.
`sessionStatus`	`{ active: boolean }`	Emitted after `start` (`active: true`) and `stop` (`active: false`).
`transcript`	`string`	One of: a streaming transcript delta, the user's transcribed input (sent with a prefix), or a status marker such as `<< Session Started >>` or `<< Speech Started >>`.
`audio`	`string`	Base64-encoded PCM audio delta from the AI response, or the string `'Session start'` / `'clear'` as control signals.
`state`	`InputState`	Numeric enum: `0` = Working, `1` = ReadyToStart, `2` = ReadyToStop.
`error`	`string`	Error message string when a session or audio operation fails.
`done`	(none)	Reserved for session completion signalling.
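
The state values and the dual-purpose audio payload can be modelled on the client roughly as follows. The numeric values and the two control strings come from the table above; the names classifyAudioEvent and canToggleRecording, and the idea that a record button is enabled only in the two Ready states, are assumptions made for illustration.

```typescript
// Numeric values as listed in the `state` row above.
const InputState = {
  Working: 0,
  ReadyToStart: 1,
  ReadyToStop: 2,
} as const;
type InputState = (typeof InputState)[keyof typeof InputState];

// An `audio` payload is either a control signal or a Base64 PCM delta.
type AudioEvent =
  | { kind: 'control'; signal: 'Session start' | 'clear' }
  | { kind: 'pcm'; base64: string };

function classifyAudioEvent(payload: string): AudioEvent {
  if (payload === 'Session start') return { kind: 'control', signal: 'Session start' };
  if (payload === 'clear') return { kind: 'control', signal: 'clear' };
  return { kind: 'pcm', base64: payload };
}

// Assumption: the UI lets the user start or stop recording only in a Ready state.
function canToggleRecording(state: InputState): boolean {
  return state === InputState.ReadyToStart || state === InputState.ReadyToStop;
}
```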

transcript events arrive incrementally. Accumulate deltas on the client to build the full response text. A transcript event whose payload is exactly ---\n signals the end of a response turn.
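
The accumulation rule can be sketched as a small class that splits the stream into turns. TranscriptAccumulator is a hypothetical name, and the sketch assumes the end-of-turn marker always arrives as its own event. It does not filter status markers such as << Session Started >>, which would be appended like any other delta.

```typescript
// End-of-turn marker described above.
const TURN_END = '---\n';

// Hypothetical helper that collects `transcript` deltas into completed turns.
class TranscriptAccumulator {
  private current = '';
  private readonly turns: string[] = [];

  // Feed each `transcript` payload in arrival order.
  push(delta: string): void {
    if (delta === TURN_END) {
      this.turns.push(this.current);
      this.current = '';
      return;
    }
    this.current += delta;
  }

  // Turns that have been closed by an end-of-turn marker.
  completedTurns(): readonly string[] {
    return this.turns;
  }

  // Text of the turn that is still streaming.
  pending(): string {
    return this.current;
  }
}
```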

Code example — live speech session

import { io } from 'socket.io-client';

// This example targets Node.js: it uses Buffer and process.stdout.

const socket = io('http://localhost:3000', {
  transports: ['websocket'],
});

// Placeholder: replace with your own playback, for example an AudioContext in the browser
function playAudioChunk(pcm: Buffer): void {
  console.log(`Received ${pcm.length} bytes of PCM audio`);
}

socket.on('connectionStatus', ({ connected }) => {
  if (!connected) return;

  // Start a session once connected
  socket.emit('start', {
    systemMessage: 'You are a helpful reading assistant.',
    temperature: 0.8,
  });
});

socket.on('sessionStatus', ({ active }) => {
  console.log('Session active:', active);
});

// Accumulate transcript deltas; '---\n' marks the end of a response turn
let transcript = '';
socket.on('transcript', (delta: string) => {
  if (delta === '---\n') {
    process.stdout.write('\n');
    transcript = '';
    return;
  }
  transcript += delta;
  process.stdout.write(delta); // stream to the console; update your UI here instead
});

// Receive AI audio response chunks (Base64 PCM)
socket.on('audio', (chunk: string) => {
  // Skip the control signals, which are not audio data
  if (chunk === 'clear' || chunk === 'Session start') return;
  // Decode and hand the raw PCM bytes to the playback function
  const pcm = Buffer.from(chunk, 'base64');
  playAudioChunk(pcm);
});

socket.on('error', (message: string) => {
  console.error('Speech error:', message);
});

// Send a recorded audio chunk (Base64 PCM, 16-bit, 24 kHz mono)
function sendAudioChunk(pcmBuffer: Buffer) {
  socket.emit('sendAudio', {
    audio: pcmBuffer.toString('base64'),
  });
}

// End the session
function stopSession() {
  socket.emit('stop');
}

Audio format

The server expects and produces 16-bit signed PCM at 24 kHz, mono (s16le). Encode your microphone stream to this format before calling sendAudio. Azure’s response audio arrives in the same format via audio delta events.
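
Microphone APIs usually deliver samples as floats in the range [-1, 1], so some conversion is needed before sendAudio. The sketch below shows one way to do it without relying on a particular runtime; all function names are illustrative. It assumes the input is already mono at 24 kHz (resampling is not shown) and includes a small Base64 encoder so it runs in both Node.js and browsers.

```typescript
const SAMPLE_RATE_HZ = 24000;
const BYTES_PER_SAMPLE = 2; // 16-bit mono

// Size in bytes of a raw PCM chunk of the given duration (before Base64 encoding).
function pcmBytesForDuration(ms: number): number {
  return Math.floor((SAMPLE_RATE_HZ * ms) / 1000) * BYTES_PER_SAMPLE;
}

// Convert float samples in [-1, 1] to 16-bit signed little-endian PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(scaled), true); // true = little-endian
  }
  return bytes;
}

const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Standard Base64 encoding with '=' padding.
function toBase64(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += BASE64_ALPHABET.charAt((triple >> 18) & 63);
    out += BASE64_ALPHABET.charAt((triple >> 12) & 63);
    out += i + 1 < bytes.length ? BASE64_ALPHABET.charAt((triple >> 6) & 63) : '=';
    out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(triple & 63) : '=';
  }
  return out;
}

// Build the `sendAudio` payload from float samples.
function buildSendAudioPayload(samples: Float32Array): { audio: string } {
  return { audio: toBase64(floatTo16BitPcm(samples)) };
}
```

In Node.js, Buffer.from(bytes).toString('base64') produces the same string as toBase64. At this format, 100 ms of audio is 4800 bytes of raw PCM.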
