ReadRealm uses Socket.IO ^4.8 on top of NestJS WebSockets to power two real-time features:
- Book chat — per-book discussion rooms with message history
- Speech recognition — streaming audio to Azure for live transcription and AI voice responses
Both gateways currently allow all origins (cors: { origin: '*' }). Restrict this in production by setting the CORS_ORIGIN environment variable.
Connection setup
Connect to the server
Both gateways run on the same NestJS server. Connect once and reuse the socket across features.
For the speech gateway, force the websocket transport (the gateway configures transports: ['websocket']).
Book chat events
Chat rooms are keyed by book: joining the room for book 42 puts you in the Socket.IO room book_42. All events are scoped to that room.
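The room naming rule can be sketched as a one-line helper. The function name roomName is hypothetical, and the book_<id> pattern is inferred from the single example above rather than taken from the gateway source.

```typescript
// Hypothetical helper: derives the Socket.IO room name for a book.
// Assumes the "book_<id>" pattern implied by the example (book 42 -> book_42).
function roomName(bookId: number): string {
  return `book_${bookId}`;
}

console.log(roomName(42)); // prints "book_42"
```

Clients never need to compute this name themselves, since the server assigns the room on joinRoom; the helper is mainly useful for logging or server-side tests.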
Client → server
| Event | Payload | Description |
|---|---|---|
| joinRoom | { bookId: number, userId: string, username: string } | Join the room for a book. Triggers previousMessages and userJoined. |
| leaveRoom | { bookId: number, username: string } | Leave the book room. Triggers userLeft for remaining members. |
| chatMessage | { bookId: number, userId: string, username: string, content: string } | Send a message to the room. Triggers newMessage for all members. |
Server → client
| Event | Payload | Description |
|---|---|---|
| previousMessages | Message[] | Up to 50 past messages for the room, sorted newest-first. Sent only to the joining client. |
| userJoined | { username: string, timestamp: Date } | Broadcast to the entire room when a user joins. |
| userLeft | { username: string, timestamp: Date } | Broadcast to the remaining room members when a user leaves. |
| newMessage | Message | Broadcast to all room members when a message is saved. |
Message object
Messages are persisted in MongoDB via Mongoose (@Schema({ timestamps: true })). Each Message object contains:
| Field | Type | Description |
|---|---|---|
| bookId | number | The book this message belongs to. |
| userId | string | ID of the user who sent the message. |
| username | string | Display name of the sender. |
| content | string | Message text. |
| createdAt | Date | Timestamp added automatically by Mongoose. |
| updatedAt | Date | Timestamp added automatically by Mongoose. |
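As a rough client-side sketch, the Message shape can be mirrored in a TypeScript interface. Because previousMessages arrives newest-first, a chat view will usually want to reorder it before rendering. The helper name toChronological is hypothetical, and the sketch assumes that Date fields arrive as ISO 8601 strings after JSON serialization over the socket.

```typescript
// Client-side mirror of the Message fields listed above.
// Assumption: Date values are JSON-serialized, so they may arrive as strings.
interface Message {
  bookId: number;
  userId: string;
  username: string;
  content: string;
  createdAt: string | Date;
  updatedAt: string | Date;
}

// Hypothetical helper: returns a copy sorted oldest-first for display.
// The input array is left untouched.
function toChronological(messages: Message[]): Message[] {
  return [...messages].sort(
    (a, b) => new Date(a.createdAt).getTime() - new Date(b.createdAt).getTime(),
  );
}
```

Sorting by createdAt rather than simply reversing the array keeps the result correct even if the server ordering changes.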
Code example — joining a room and chatting
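The following is a minimal sketch of the join, history, send, and leave flow using the events documented above. To stay self-contained it is written against a small SocketLike interface (an assumption of this sketch) that a socket.io-client socket satisfies; joinBookChat, ChatUser, and ChatMessage are hypothetical names.

```typescript
// Minimal subset of the socket API used here; a socket.io-client Socket fits it.
interface SocketLike {
  on(event: string, handler: (payload: any) => void): unknown;
  emit(event: string, payload?: unknown): unknown;
}

interface ChatUser { userId: string; username: string; }
interface ChatMessage { bookId: number; userId: string; username: string; content: string; }

// Hypothetical helper: joins a book room and wires up the message events.
function joinBookChat(
  socket: SocketLike,
  bookId: number,
  user: ChatUser,
  onMessage: (message: ChatMessage) => void,
) {
  // History is sent once, newest-first, so replay it oldest-first.
  socket.on("previousMessages", (history: ChatMessage[]) => {
    [...history].reverse().forEach((message) => onMessage(message));
  });
  // Live messages are broadcast to every member of the room.
  socket.on("newMessage", onMessage);

  socket.emit("joinRoom", { bookId, userId: user.userId, username: user.username });

  return {
    send: (content: string) =>
      socket.emit("chatMessage", { bookId, userId: user.userId, username: user.username, content }),
    leave: () => socket.emit("leaveRoom", { bookId, username: user.username }),
  };
}
```

With socket.io-client, the socket would come from io("http://localhost:3000"), where the URL is a placeholder for your server address. For example, const room = joinBookChat(socket, 42, { userId: "u1", username: "ada" }, console.log) followed by room.send("Hello").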
Speech events
The speech gateway streams audio to Azure OpenAI Realtime (Whisper + GPT-4o) and returns transcripts and AI audio responses. It uses the websocket transport and has a 100 MB max buffer for audio payloads.
Client → server
| Event | Payload | Description |
|---|---|---|
| start | { systemMessage: string, temperature: number } | Initialise a new Azure Realtime session. Both fields are required. |
| sendAudio | { audio: string } | Send a Base64-encoded PCM audio chunk. Requires an active session. |
| stop | (none) | End the current session. Any buffered audio is flushed to Azure before closing. |
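Since both start fields are required, a client may want to check the payload before emitting. The guard below is a hedged sketch: StartPayload and isStartPayload are hypothetical names, and it checks only the types given in the table, not any value range the server might enforce.

```typescript
// Shape of the start payload as listed in the table above.
interface StartPayload {
  systemMessage: string;
  temperature: number;
}

// Hypothetical type guard: true only when both required fields are present
// with the documented types. No temperature range is assumed.
function isStartPayload(value: unknown): value is StartPayload {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const temperature = candidate["temperature"];
  return (
    typeof candidate["systemMessage"] === "string" &&
    typeof temperature === "number" &&
    Number.isFinite(temperature)
  );
}
```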
Server → client
| Event | Payload | Description |
|---|---|---|
| connectionStatus | { connected: true } | Emitted immediately on connect. |
| sessionStatus | { active: boolean } | Emitted after start (active: true) and stop (active: false). |
| transcript | string | Streaming transcript delta, prefixed user input, or status markers such as << Session Started >> and << Speech Started >>. |
| audio | string | Base64-encoded PCM audio delta from the AI response, or the string 'Session start' / 'clear' as control signals. |
| state | InputState | Numeric enum: 0 = Working, 1 = ReadyToStart, 2 = ReadyToStop. |
| error | string | Error message string when a session or audio operation fails. |
| done | (none) | Reserved for session completion signalling. |
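The numeric state values can be mirrored on the client so that handlers read by name rather than by number. This is a sketch only: the values come from the table above, while the const-object form and the stateLabel helper are choices made for this example.

```typescript
// Client-side mirror of the InputState values from the table above.
const InputState = { Working: 0, ReadyToStart: 1, ReadyToStop: 2 } as const;

// Hypothetical helper: maps a received state number to its name.
function stateLabel(state: number): string {
  const entry = Object.entries(InputState).find(([, value]) => value === state);
  return entry ? entry[0] : "Unknown";
}

console.log(stateLabel(1)); // prints "ReadyToStart"
```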
transcript events arrive incrementally. Accumulate deltas on the client to build the full response text. A ---\n transcript event signals the end of a response turn.
Code example — live speech session
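The following sketch starts a session, accumulates transcript deltas into complete turns, and exposes sendAudio and stop. It is written against a small SocketLike interface (an assumption of this sketch) that a socket.io-client socket satisfies; startSpeechSession is a hypothetical name. It also assumes the turn-end marker arrives as its own transcript event, and it does not filter status markers such as << Session Started >>.

```typescript
// Minimal subset of the socket API used here; a socket.io-client Socket fits it.
interface SocketLike {
  on(event: string, handler: (payload: any) => void): unknown;
  emit(event: string, payload?: unknown): unknown;
}

// Transcript payload that marks the end of a response turn.
const TURN_END = "---\n";

// Hypothetical helper: opens a speech session and reports each finished turn.
function startSpeechSession(
  socket: SocketLike,
  systemMessage: string,
  temperature: number,
  onTurn: (text: string) => void,
) {
  let current = "";

  socket.on("transcript", (delta: string) => {
    if (delta === TURN_END) {
      onTurn(current); // the turn is complete, hand over the accumulated text
      current = "";
    } else {
      current += delta; // keep building the current turn
    }
  });
  socket.on("error", (message: string) => console.error("speech error:", message));

  socket.emit("start", { systemMessage, temperature });

  return {
    // audio must already be Base64-encoded 16-bit PCM
    sendAudio: (audio: string) => socket.emit("sendAudio", { audio }),
    stop: () => socket.emit("stop"),
  };
}
```

With socket.io-client, the socket would come from io("http://localhost:3000", { transports: ["websocket"] }), where the URL is a placeholder for your server address.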
Audio format
The server expects and produces 16-bit signed PCM at 24 kHz, mono (s16le). Encode your microphone stream to this format before calling sendAudio. Azure’s response audio arrives in the same format via audio delta events.
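As a sketch of the encoding step, the function below converts floating-point samples (the form produced by the Web Audio API, in the range -1 to 1) into little-endian 16-bit signed PCM bytes. The name floatTo16BitPcm is hypothetical, and the sketch assumes the input is already mono at 24 kHz; resampling is not shown.

```typescript
// Converts float samples in [-1, 1] to s16le bytes.
// Assumption: the input is already mono and sampled at 24 kHz.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  samples.forEach((sample, i) => {
    // Clamp out-of-range input, then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true selects little-endian
  });
  return bytes;
}
```

The resulting bytes still need Base64 encoding before they go into the audio field of sendAudio, for example with btoa in a browser or Buffer.from(bytes).toString("base64") in Node.js.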