ReadRealm uses Socket.IO ^4.8 on top of NestJS WebSockets to power two real-time features:
  • Book chat — per-book discussion rooms with message history
  • Speech recognition — streaming audio to Azure for live transcription and AI voice responses
Both gateways currently allow all origins (cors: { origin: '*' }). Restrict this in production by setting the CORS_ORIGIN environment variable.
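This page does not say how the server reads CORS_ORIGIN, so the following is only a sketch of one way to turn that variable into an origin setting. The helper name resolveCorsOrigin and the comma-separated format are assumptions, not part of ReadRealm.

```typescript
// Hypothetical helper: converts a CORS_ORIGIN value into an `origin` option.
// Assumption: the variable holds a comma-separated list of allowed origins.
function resolveCorsOrigin(raw: string | undefined): string | string[] {
  const origins = (raw ?? '')
    .split(',')
    .map((origin) => origin.trim())
    .filter((origin) => origin.length > 0);

  // Nothing configured: fall back to the permissive behaviour described above.
  if (origins.length === 0) return '*';
  return origins.length === 1 ? origins[0] : origins;
}

// Example value: CORS_ORIGIN="https://app.example.com, https://admin.example.com"
const corsOptions = {
  origin: resolveCorsOrigin('https://app.example.com, https://admin.example.com'),
};
```

An object of this shape could then be passed as the cors option of a NestJS gateway, for example @WebSocketGateway({ cors: corsOptions }).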

Connection setup

1. Install the Socket.IO client

npm install socket.io-client
2. Connect to the server

Both gateways run on the same NestJS server. Connect once and reuse the socket across features.
import { io } from 'socket.io-client';

const socket = io('http://localhost:3000', {
  // Pass your JWT in the auth handshake for future auth middleware
  auth: { token: '<your-jwt>' },
});

socket.on('connect', () => {
  console.log('Connected:', socket.id);
});
The speech gateway is configured with transports: ['websocket'], so force the websocket transport on the client when connecting to it:
const speechSocket = io('http://localhost:3000', {
  transports: ['websocket'],
  auth: { token: '<your-jwt>' },
});
3. Listen for the connection status event

The speech gateway emits connectionStatus immediately on connect:
speechSocket.on('connectionStatus', (payload) => {
  // payload: { connected: true }
  console.log('Speech gateway ready:', payload);
});

Book chat events

Chat rooms are keyed by book: joining the room for book 42 puts you in the Socket.IO room book_42. All events are scoped to that room.

Client → server

| Event | Payload | Description |
| --- | --- | --- |
| joinRoom | { bookId: number, userId: string, username: string } | Join the room for a book. Triggers previousMessages and userJoined. |
| leaveRoom | { bookId: number, username: string } | Leave the book room. Triggers userLeft for remaining members. |
| chatMessage | { bookId: number, userId: string, username: string, content: string } | Send a message to the room. Triggers newMessage for all members. |

Server → client

| Event | Payload | Description |
| --- | --- | --- |
| previousMessages | Message[] | Up to 50 past messages for the room, sorted newest-first. Sent only to the joining client. |
| userJoined | { username: string, timestamp: Date } | Broadcast to the entire room when a user joins. |
| userLeft | { username: string, timestamp: Date } | Broadcast to the remaining room members when a user leaves. |
| newMessage | Message | Broadcast to all room members when a message is saved. |

Message object

Messages are persisted in MongoDB via Mongoose (@Schema({ timestamps: true })). Each Message object contains:
| Field | Type | Description |
| --- | --- | --- |
| bookId | number | The book this message belongs to. |
| userId | string | ID of the user who sent the message. |
| username | string | Display name of the sender. |
| content | string | Message text. |
| createdAt | Date | Timestamp added automatically by Mongoose. |
| updatedAt | Date | Timestamp added automatically by Mongoose. |
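Because previousMessages is documented as newest-first, a chat UI that renders oldest-first needs to reorder the history. The sketch below types the Message fields listed above and sorts a copy by createdAt. It assumes the dates arrive as ISO strings once serialised over the socket, and the helper name toChronological is hypothetical.

```typescript
// Client-side view of a persisted message, following the field table above.
// Assumption: Date fields arrive as ISO 8601 strings after JSON serialisation.
interface Message {
  bookId: number;
  userId: string;
  username: string;
  content: string;
  createdAt: string;
  updatedAt: string;
}

// Hypothetical helper: returns a copy ordered oldest-first for display.
// The input array is left untouched.
function toChronological(messages: Message[]): Message[] {
  return [...messages].sort(
    (a, b) => new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime(),
  );
}
```

Sorting by timestamp rather than simply reversing the array keeps the result correct even if the server's ordering changes.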

Code example — joining a room and chatting

import { io } from 'socket.io-client';

const socket = io('http://localhost:3000');

// Join the chat room for book 42
socket.emit('joinRoom', {
  bookId: 42,
  userId: 'user_abc123',
  username: 'Ada',
});

// Receive message history once joined
socket.on('previousMessages', (messages) => {
  console.log('Previous messages:', messages);
});

// Someone else joined
socket.on('userJoined', ({ username, timestamp }) => {
  console.log(`${username} joined at ${timestamp}`);
});

// Incoming message
socket.on('newMessage', (message) => {
  console.log(`${message.username}: ${message.content}`);
});

// Send a message
socket.emit('chatMessage', {
  bookId: 42,
  userId: 'user_abc123',
  username: 'Ada',
  content: 'Has anyone reached chapter 10?',
});

// Leave the room
socket.emit('leaveRoom', { bookId: 42, username: 'Ada' });
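
The example above registers listeners and emits in sequence. When the UI should wait for the history before rendering, the join can be wrapped in a promise. This is a minimal sketch: joinRoomWithHistory, SocketLike, and the 5 second timeout are illustrative assumptions, and the structural SocketLike type is meant to be satisfied by a socket.io-client socket without importing the library here.

```typescript
// Minimal structural type covering the two socket methods this sketch uses.
interface SocketLike {
  emit(event: string, payload?: unknown): unknown;
  once(event: string, listener: (payload: any) => void): unknown;
}

interface JoinRoomPayload {
  bookId: number;
  userId: string;
  username: string;
}

// Hypothetical helper: joins a book room and resolves with the history
// that the server sends back in previousMessages.
function joinRoomWithHistory<T = unknown>(
  socket: SocketLike,
  payload: JoinRoomPayload,
  timeoutMs = 5000, // assumed default, not a documented server value
): Promise<T[]> {
  return new Promise<T[]>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error('Timed out waiting for previousMessages')),
      timeoutMs,
    );
    // Register the listener before emitting so the reply cannot be missed.
    socket.once('previousMessages', (messages: T[]) => {
      clearTimeout(timer);
      resolve(messages);
    });
    socket.emit('joinRoom', payload);
  });
}
```

A caller would then write something like const messages = await joinRoomWithHistory(socket, { bookId: 42, userId: 'user_abc123', username: 'Ada' }) and render the result.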

Speech events

The speech gateway streams audio to Azure OpenAI Realtime (Whisper + GPT-4o) and returns transcripts and AI audio responses. It uses the websocket transport and has a 100 MB max buffer for audio payloads.
Start a session with start before sending any audio. Azure Realtime uses server-side VAD (voice activity detection), so you do not need to signal silence manually.
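
Since sendAudio requires an active session, a client may want to hold audio back until the server confirms one. The sketch below shows one possible policy: queue chunks until sessionStatus reports active: true, then flush them. SpeechSession and SpeechSocketLike are hypothetical names, and queueing (rather than dropping) early audio is a design assumption.

```typescript
// Minimal structural type covering the socket methods this sketch uses.
interface SpeechSocketLike {
  emit(event: string, payload?: unknown): unknown;
  on(event: string, listener: (payload: any) => void): unknown;
}

// Hypothetical client-side guard around the start / sendAudio / stop events.
class SpeechSession {
  private readonly socket: SpeechSocketLike;
  private active = false;
  private pending: string[] = [];

  constructor(socket: SpeechSocketLike) {
    this.socket = socket;
    socket.on('sessionStatus', (status: { active: boolean }) => {
      this.active = status.active;
      if (status.active) this.flush();
      else this.pending = []; // session ended: discard anything still queued
    });
  }

  start(systemMessage: string, temperature: number): void {
    this.socket.emit('start', { systemMessage, temperature });
  }

  // Accepts a Base64-encoded PCM chunk; queued until the session is active.
  sendAudio(audio: string): void {
    if (!this.active) {
      this.pending.push(audio);
      return;
    }
    this.socket.emit('sendAudio', { audio });
  }

  stop(): void {
    this.socket.emit('stop');
  }

  private flush(): void {
    for (const audio of this.pending) this.socket.emit('sendAudio', { audio });
    this.pending = [];
  }
}
```

Dropping early chunks instead of queueing them would be equally valid if losing the first moments of speech is acceptable.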

Client → server

| Event | Payload | Description |
| --- | --- | --- |
| start | { systemMessage: string, temperature: number } | Initialise a new Azure Realtime session. Both fields are required. |
| sendAudio | { audio: string } | Send a Base64-encoded PCM audio chunk. Requires an active session. |
| stop | (none) | End the current session. Any buffered audio is flushed to Azure before closing. |

Server → client

| Event | Payload | Description |
| --- | --- | --- |
| connectionStatus | { connected: true } | Emitted immediately on connect. |
| sessionStatus | { active: boolean } | Emitted after start (active: true) and stop (active: false). |
| transcript | string | Streaming transcript delta, prefixed user input, or status markers such as << Session Started >> and << Speech Started >>. |
| audio | string | Base64-encoded PCM audio delta from the AI response, or the string 'Session start' / 'clear' as control signals. |
| state | InputState | Numeric enum: 0 = Working, 1 = ReadyToStart, 2 = ReadyToStop. |
| error | string | Error message string when a session or audio operation fails. |
| done | (none) | Reserved for session completion signalling. |
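
The state event carries a bare number, so mirroring the enum on the client keeps handlers readable. The sketch below copies the three documented values. The mapping to button states in controlsFor is an assumed interpretation of the names, not documented behaviour.

```typescript
// Mirrors the documented numeric values of the `state` event.
const InputState = { Working: 0, ReadyToStart: 1, ReadyToStop: 2 } as const;

interface Controls {
  canStart: boolean;
  canStop: boolean;
}

// Hypothetical UI mapping: which buttons to enable for each state.
// Assumption: Working means neither action should be offered.
function controlsFor(state: number): Controls {
  switch (state) {
    case InputState.ReadyToStart:
      return { canStart: true, canStop: false };
    case InputState.ReadyToStop:
      return { canStart: false, canStop: true };
    case InputState.Working:
    default:
      // Unknown values are treated like Working so the UI stays safe.
      return { canStart: false, canStop: false };
  }
}
```

A handler such as socket.on('state', (s: number) => render(controlsFor(s))) could then drive the UI, where render stands in for your own update function.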
transcript events arrive incrementally. Accumulate deltas on the client to build the full response text. A ---\n transcript event signals the end of a response turn.
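
Accumulating deltas, splitting on the end-of-turn marker, and setting status markers aside can be sketched as a small buffer class. TranscriptBuffer is a hypothetical name, and the sketch assumes that the ---\n marker and each << ... >> status marker arrive as a complete event of their own rather than split across deltas.

```typescript
const TURN_END = '---\n';
// Matches a whole event of the form "<< Something >>", with optional trailing whitespace.
const STATUS_MARKER = /^<<.*>>\s*$/;

// Hypothetical accumulator for `transcript` events.
class TranscriptBuffer {
  readonly turns: string[] = []; // completed response turns
  readonly statuses: string[] = []; // status markers, kept out of the text
  private current = '';

  push(delta: string): void {
    if (delta === TURN_END) {
      // End of a response turn: store the text gathered so far.
      this.turns.push(this.current);
      this.current = '';
    } else if (STATUS_MARKER.test(delta)) {
      this.statuses.push(delta.trim());
    } else {
      this.current += delta;
    }
  }

  // Text of the turn that is still streaming.
  get partial(): string {
    return this.current;
  }
}
```

Wiring it up is a single line, for example socket.on('transcript', (delta: string) => buffer.push(delta)), after which buffer.partial can feed a live caption and buffer.turns a message list.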

Code example — live speech session

import { io } from 'socket.io-client';

const socket = io('http://localhost:3000', {
  transports: ['websocket'],
});

socket.on('connectionStatus', ({ connected }) => {
  if (!connected) return;

  // Start a session once connected
  socket.emit('start', {
    systemMessage: 'You are a helpful reading assistant.',
    temperature: 0.8,
  });
});

socket.on('sessionStatus', ({ active }) => {
  console.log('Session active:', active);
});

// Accumulate transcript deltas
let transcript = '';
socket.on('transcript', (delta: string) => {
  transcript += delta;
  process.stdout.write(delta); // stream to UI
});

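// Receive AI audio response chunks (Base64 PCM)
socket.on('audio', (chunk: string) => {
  // Skip the control signals that share this event
  if (chunk === 'clear' || chunk === 'Session start') return;
  // Decode, then hand off to your own playback routine.
  // playAudioChunk is a placeholder you must provide, e.g. a wrapper
  // around an AudioContext or audio element.
  const pcm = Buffer.from(chunk, 'base64');
  playAudioChunk(pcm);
});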

socket.on('error', (message: string) => {
  console.error('Speech error:', message);
});

// Send a recorded audio chunk (Base64 PCM, 16-bit, 24 kHz mono)
function sendAudioChunk(pcmBuffer: Buffer) {
  socket.emit('sendAudio', {
    audio: pcmBuffer.toString('base64'),
  });
}

// End the session
function stopSession() {
  socket.emit('stop');
}

Audio format

The server expects and produces 16-bit signed PCM at 24 kHz, mono (s16le). Encode your microphone stream to this format before calling sendAudio. Azure’s response audio arrives in the same format via audio delta events.
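
Browser audio APIs usually deliver samples as 32-bit floats in the range [-1, 1], so they need converting before they match this format. The sketch below shows the float to s16le conversion and the duration arithmetic for a chunk (24,000 samples per second × 2 bytes = 48,000 bytes per second). The function names are hypothetical, and the sketch assumes the input is already mono at 24 kHz; resampling is not covered.

```typescript
const SAMPLE_RATE_HZ = 24000;
const BYTES_PER_SAMPLE = 2; // 16-bit samples

// Converts float samples in [-1, 1] to little-endian signed 16-bit PCM bytes.
// Assumption: the samples are already mono at 24 kHz.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * BYTES_PER_SAMPLE);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(scaled), true); // true = little-endian
  }
  return bytes;
}

// Playback duration of a PCM chunk in this format, in milliseconds.
function pcmDurationMs(byteLength: number): number {
  return (byteLength * 1000) / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE);
}
```

The resulting bytes can then be Base64-encoded and passed to sendAudio as in the example above, for instance with Buffer.from(bytes).toString('base64') in Node. At this rate a 4,800-byte chunk corresponds to 100 ms of audio.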