

ReadRealm converts book text into audio using Azure OpenAI TTS (the tts deployment). Two HTTP endpoints generate audio: GET /book/tts/stream/:title streams it for a book fetched from Gutendex, and POST /book/ebook returns it on demand for text supplied in the request. A separate real-time speech feature, backed by Azure Cognitive Services Realtime, runs over WebSockets.

Configuration

The TTS and real-time speech services are configured through environment variables:
| Variable | Used by | Description |
|---|---|---|
| AZURE_API_TTS_KEY | TTS service | API key for the Azure OpenAI TTS deployment |
| AZURE_API_TTS_ENDPOINT | TTS service | Endpoint URL for the Azure OpenAI TTS deployment |
| AZURE_API_TTS_MODEL | TTS service | Deployment name (e.g. tts) |
| AZURE_API_REALTIME_KEY | Real-time speech | API key for the Azure Realtime speech deployment |
| AZURE_API_REALTIME_ENDPOINT | Real-time speech | Endpoint URL for the Azure Realtime speech gateway |
| AZURE_API_REALTIME_MODEL | Real-time speech | Deployment name for real-time voice |
The API reads these variables through the NestJS ConfigService. Set them in your environment or in a .env file before starting it.
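A missing key usually surfaces only when the first speech request fails, so it can help to check the variables up front. The following is a minimal POSIX shell sketch; check_speech_env is a hypothetical helper, not part of ReadRealm, and the commented example assumes your .env file uses plain KEY=value lines that a shell can source.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of ReadRealm): report every
# documented speech variable that is unset or empty.
check_speech_env() {
  missing=0
  for name in AZURE_API_TTS_KEY AZURE_API_TTS_ENDPOINT AZURE_API_TTS_MODEL \
              AZURE_API_REALTIME_KEY AZURE_API_REALTIME_ENDPOINT AZURE_API_REALTIME_MODEL; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: export everything in .env (assumes shell-compatible KEY=value lines), then check.
#   set -a; . ./.env; set +a
#   check_speech_env || exit 1
```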

GET /book/tts/stream/:title

Looks up a book by title on Gutendex, fetches its plain-text content, and streams the TTS audio as a chunked audio/mpeg response. The connection stays open until the full audio is piped through.
Only the first 4,096 characters of the book text are sent to Azure OpenAI for conversion. Anything beyond that is dropped, so a long book yields audio for its opening passage only.
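To see in advance how much of a text file falls past this limit, you can count its characters with wc -m. A minimal sketch; tts_overflow is a hypothetical helper, and it assumes the server counts characters the same way wc -m does in your locale (this holds for plain ASCII text; in the C locale wc -m counts bytes).

```shell
#!/bin/sh
# Documented limit: only this many characters reach Azure OpenAI.
TTS_MAX_CHARS=4096

# Hypothetical helper: print how many characters of FILE would not be
# narrated (0 if the whole file fits). wc -m counts characters per locale.
tts_overflow() {
  chars=$(wc -m < "$1" | tr -d ' ')
  if [ "$chars" -gt "$TTS_MAX_CHARS" ]; then
    echo $((chars - TTS_MAX_CHARS))
  else
    echo 0
  fi
}

# Example:
#   tts_overflow alice.txt
```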

Path parameters

  • title (string, required): URL-encoded book title, for example alice%20in%20wonderland. The server decodes it before querying Gutendex.
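Titles containing spaces or punctuation have to be percent-encoded before they go into the path. A minimal POSIX shell sketch; urlencode is a hypothetical helper that keeps the RFC 3986 unreserved characters (letters, digits, -, ., _ and ~) and encodes every other byte as %XX, including each byte of a multi-byte UTF-8 character.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode a string for use as one URL path segment.
urlencode() {
  out=
  # od prints each input byte as two hex digits; tr upper-cases them.
  for byte in $(printf '%s' "$1" | od -An -v -tx1 | tr 'a-f' 'A-F'); do
    case "$byte" in
      3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A]|2D|2E|5F|7E)
        # Unreserved: turn the hex byte back into its character via an octal escape.
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")" ;;
      *)
        # Everything else (space, &, non-ASCII bytes, ...) becomes %XX.
        out="$out%$byte" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Example:
#   curl -o pride.mp3 "http://localhost:3000/book/tts/stream/$(urlencode 'pride and prejudice')"
```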

Response headers

| Header | Value |
|---|---|
| Content-Type | audio/mpeg |
| Transfer-Encoding | chunked |
| Cache-Control | no-cache |
| Content-Disposition | inline |
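When a client reports that playback never starts, the response headers are a quick first check. A minimal sketch; check_stream_headers is a hypothetical helper that reads a header dump written by curl -D and succeeds only if the MP3 content type and chunked transfer encoding are both present. It assumes an HTTP/1.1 connection, as with the plain http://localhost:3000 URL used on this page; HTTP/2 responses carry no Transfer-Encoding: chunked header.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the header dump in FILE advertises an
# MP3 body sent with chunked transfer encoding. Header names are matched
# case-insensitively, as HTTP requires.
check_stream_headers() {
  grep -qi '^content-type: *audio/mpeg' "$1" &&
    grep -qi '^transfer-encoding: *chunked' "$1"
}

# Example: save the headers alongside the audio, then check them.
#   curl -sS -D headers.txt -o alice.mp3 \
#     'http://localhost:3000/book/tts/stream/alice%20in%20wonderland'
#   check_stream_headers headers.txt && echo 'streaming headers OK'
```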

Response body

A binary MP3 audio stream piped directly from the Azure OpenAI response. Write it to a file or pipe it to a media player.
# Save to a file
curl -o alice.mp3 \
  'http://localhost:3000/book/tts/stream/alice%20in%20wonderland'

# Play inline with mpv (or any media player that reads stdin)
curl -s 'http://localhost:3000/book/tts/stream/alice%20in%20wonderland' | mpv -

Error responses

| Status | Condition |
|---|---|
| 404 | No book with the given title found on Gutendex, or the book has no plain-text content |
| 500 | Azure TTS call failed or a stream error occurred |
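Because curl -o writes an error body to the output file as well, a script that downloads audio should check the status code before treating the file as MP3. The sketch below assumes the API runs at http://localhost:3000, the address used in the examples on this page, overridable through a hypothetical BASE_URL variable; fetch_tts_stream and its exit codes are illustrative, not part of ReadRealm.

```shell
#!/bin/sh
BASE_URL=${BASE_URL:-http://localhost:3000}

# Hypothetical helper: download the stream for an already URL-encoded title.
# Exit codes: 0 = audio saved, 1 = HTTP error (file removed), 2 = curl itself failed.
fetch_tts_stream() {
  status=$(curl -sS -o "$2" -w '%{http_code}' "$BASE_URL/book/tts/stream/$1") || return 2
  case "$status" in
    200) return 0 ;;
    404) echo "no book with plain-text content found for '$1'" >&2 ;;
    *)   echo "TTS request failed with HTTP $status" >&2 ;;
  esac
  rm -f "$2"   # the saved body is an error document, not audio
  return 1
}

# Example:
#   fetch_tts_stream 'alice%20in%20wonderland' alice.mp3 && mpv alice.mp3
```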

POST /book/ebook

Generates TTS audio from a book body supplied directly in the request. Use this when you already have the book text and do not need the server to fetch it from Gutendex. If the textData field is an empty string, the server substitutes a default CreateBookDto instance before invoking Azure TTS.

Request body

  • id (number, required): Numeric book ID.
  • author (string, required): Author name.
  • title (string, required): Book title.
  • publicationDate (number, required): Publication year.
  • numOfPages (number, required): Page count.
  • coverImage (string, required): Cover image URL.
  • genre (string, required): Genre.
  • textData (string, optional): Plain-text content to synthesise. Only the first 4,096 characters are sent to Azure. If empty, a default book object is used.

curl -X POST http://localhost:3000/book/ebook \
  -H 'Content-Type: application/json' \
  -d '{
    "id": 28520,
    "author": "Lewis Carroll",
    "title": "Alice in Wonderland",
    "publicationDate": 1865,
    "numOfPages": 96,
    "coverImage": "https://covers.openlibrary.org/b/id/8739161-M.jpg",
    "genre": "Fantasy",
    "textData": "Alice was beginning to get very tired of sitting by her sister on the bank..."
  }'
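The inline string above is fine for a short excerpt, but book text read from a file must be escaped before it can sit inside a JSON string. A minimal POSIX sketch; json_string_from_file is a hypothetical helper that turns tabs into spaces, drops carriage returns and other control characters, escapes backslashes and double quotes, and joins lines with a literal \n. That covers typical plain-text e-books, but it is not a general-purpose JSON encoder, and the server still uses only the first 4,096 characters.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of FILE escaped for use inside a
# JSON string literal (without the surrounding quotes).
json_string_from_file() {
  # 1. tabs -> spaces; 2. drop CR and every other control character except newline;
  # 3. escape backslashes, then double quotes; 4. join lines with a literal \n.
  tr '\t' ' ' < "$1" |
    tr -d '\000-\010\013-\037' |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Example: build the request body from alice.txt and post it.
#   text=$(json_string_from_file alice.txt)
#   printf '{"id":28520,"author":"Lewis Carroll","title":"Alice in Wonderland","publicationDate":1865,"numOfPages":96,"coverImage":"https://covers.openlibrary.org/b/id/8739161-M.jpg","genre":"Fantasy","textData":"%s"}' "$text" > payload.json
#   curl -X POST http://localhost:3000/book/ebook \
#     -H 'Content-Type: application/json' --data-binary @payload.json -o alice.mp3
```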

Response

A binary audio response body (audio/mpeg) returned by the Azure OpenAI TTS API. The voice used is alloy and the output format is mp3.
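If a script saves this response with curl -o, a magic-byte check tells real audio apart from an error body written to the same file. A minimal sketch; looks_like_mp3 is a hypothetical heuristic that accepts either an ID3v2 tag header or an MPEG audio frame-sync pattern at the start of the file. It is not a full MP3 validator.

```shell
#!/bin/sh
# Hypothetical heuristic: does FILE start like an MP3 stream?
looks_like_mp3() {
  # The first three bytes as unsigned decimals, e.g. "73 68 51" for "ID3".
  set -- $(od -An -N3 -tu1 "$1" 2>/dev/null)
  [ "$#" -ge 2 ] || return 1
  # ID3v2 tag header: the ASCII bytes I (73), D (68), 3 (51).
  if [ "$1" -eq 73 ] && [ "$2" -eq 68 ] && [ "${3:-0}" -eq 51 ]; then
    return 0
  fi
  # MPEG frame sync: 0xFF, then a byte whose top three bits are set (0xE0 = 224).
  [ "$1" -eq 255 ] && [ $(( $2 & 224 )) -eq 224 ]
}

# Example:
#   looks_like_mp3 alice.mp3 || echo 'response is not MP3 audio' >&2
```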

Error responses

| Status | Condition |
|---|---|
| 404 | textData is absent and the fallback CreateBookDto also has no text |
| 500 | Azure TTS call failed |

Real-time speech (WebSocket)

ReadRealm also supports two-way real-time voice powered by the Azure Cognitive Services Realtime API. This uses a WebSocket connection managed by the SpeechRealtimeService:
  • The client streams raw PCM audio buffers to the server.
  • The server forwards them to Azure in fixed-size chunks (4,800 bytes at 24 kHz, mono).
  • Azure performs server-side VAD (voice activity detection) using whisper-1 for transcription.
  • Audio responses (response.audio.delta) and transcript deltas are forwarded back to the client in real time via the socket.
  • On stop, the session audio is saved to MP3 using FFmpeg (libmp3lame, 128 kbps).
For connection setup, event names, and payload formats, see WebSockets.
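The 4,800-byte chunk size corresponds to a fixed slice of audio once the sample format is known. Assuming 16-bit (2-byte) PCM samples, which the list above does not state, each chunk carries 4,800 / (24,000 × 1 × 2) s = 100 ms of audio. The sketch below computes that and splits a raw capture the same way; pcm_chunk_ms and split_pcm are hypothetical helpers, and the FFmpeg line only mirrors the documented settings (libmp3lame, 128 kbps), so the service's actual invocation may differ.

```shell
#!/bin/sh
# Hypothetical helper: milliseconds of audio in a PCM chunk.
# Args: BYTES SAMPLE_RATE CHANNELS BYTES_PER_SAMPLE
pcm_chunk_ms() {
  echo $(( $1 * 1000 / ($2 * $3 * $4) ))
}

# Hypothetical helper: cut a raw PCM file into 4,800-byte pieces named
# PREFIXaa, PREFIXab, ...; the last piece is shorter if the size is not a multiple of 4,800.
split_pcm() {
  split -b 4800 "$1" "$2"
}

# Example (assumes 16-bit little-endian samples, 24 kHz, mono):
#   pcm_chunk_ms 4800 24000 1 2      # 100
#   split_pcm session.pcm chunk.
#   ffmpeg -f s16le -ar 24000 -ac 1 -i session.pcm -c:a libmp3lame -b:a 128k session.mp3
```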

WebSocket events overview

| Direction | Event | Description |
|---|---|---|
| Server → Client | audio | Base64-encoded audio delta, or the control strings "Session start" and "clear" |
| Server → Client | transcript | Incremental transcript text or status markers |
| Server → Client | state | Input state change: 0 (Working), 1 (ReadyToStart), 2 (ReadyToStop) |
| Server → Client | error | Error message string |
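How a client reacts to these events is up to the client. As an illustration only, the sketch below assumes the payloads have already been received over the socket (see WebSockets for the transport) and shows one way to handle two of them: decoding base64 audio deltas into a raw PCM file and naming the numeric state values. The helper names, and the choice to empty the buffer on "Session start" and "clear", are assumptions rather than documented ReadRealm behaviour.

```shell
#!/bin/sh
# Hypothetical handler for an "audio" event payload.
# Args: PAYLOAD PCM_FILE
handle_audio_event() {
  case "$1" in
    # Assumption: both control strings mean "discard any buffered audio".
    "Session start"|clear) : > "$2" ;;
    # Anything else is a base64-encoded audio delta; append the decoded bytes.
    *) printf '%s' "$1" | base64 -d >> "$2" ;;
  esac
}

# Map a "state" event value to the name listed in the table above.
state_name() {
  case "$1" in
    0) echo Working ;;
    1) echo ReadyToStart ;;
    2) echo ReadyToStop ;;
    *) echo "unknown state: $1" >&2; return 1 ;;
  esac
}
```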

WebSocket reference

Full connection flow, client events, and payload schemas for both chat and real-time speech.