Live Captions

Prerequisites

Live Captions must be explicitly enabled for your organization before you can use it. Activation may be subject to additional pricing or service terms.

You can verify whether this feature is available at dashboard.nanostream.cloud/organisation.
In the Enabled Packages section, locate the entry for Live Captions. If it shows Upgrade needed, please contact us.

Screenshot: Enabled Packages

To activate Live Captions or learn more about available plans, feel free to reach out via nanocosmos.net/contact. We're happy to assist you in finding the best setup for your use case.

Live Caption Security Requirements

Live captions are only available for secure playback. Enabling live captions therefore also requires enabling the secure playback feature. Please reach out to our sales team via nanocosmos.net/contact or by email at sales(at)nanocosmos.net if you have any questions.
To learn more about secure playback, visit the dedicated article Secure Playback (H5Live).

Overview

Live Captions convert spoken audio into readable text in real time. This AI-driven feature enhances accessibility and content comprehension across a wide range of live-streaming scenarios, especially for:

  • Events with spoken content
  • Viewers in sound-off environments
  • Users who are hard of hearing
  • Corporate, educational, or public-facing broadcasts

The feature provides dynamic, accurate, and easy-to-follow text output that helps viewers stay engaged even without audio. Live captions start automatically as soon as the stream becomes active. The first caption lines typically appear within 5–7 seconds, depending on the selected ASR engine. Captions stop automatically when the stream ends.

To ensure low-latency, reliable delivery, all captions are produced and transmitted through a dedicated real-time output channel, separate from the video stream.

Please note

Live Captions and the caption switcher are not included in the default H5Live Player UI and are not embedded automatically. To allow viewers to enable, disable, or style captions, your playback environment must integrate caption handling explicitly. For implementation guidance or UI integration examples, please contact our support team via nanocosmos.net/support.

How It Works

During an active stream, the audio is forwarded to the selected Automatic Speech Recognition (ASR) engine. The engine converts speech into text and outputs a continuous caption stream. The H5Live Player synchronizes to this caption feed and displays it to viewers in real time.

Custom set-up for the ASR service

Some of our ASR services require 24 hours' advance notice. Please contact our sales team via nanocosmos.net/contact to find the best configuration for your business. They will also be happy to give you in-depth advice and recommendations on the ASR engines for your use case.

ASR Engines And Languages

Live Captions rely on Automatic Speech Recognition (ASR) engines to convert spoken audio into real-time text. This section explains which ASR provider is available and which languages it supports. It also covers the difference between source and target languages and how they are applied during live caption generation.

Supported ASR Engines

Deepgram is an enterprise-grade ASR engine designed for high accuracy and very low latency in real-time captioning. It uses neural-network models optimized for live audio and supports multiple languages for instant transcription.

Supported Languages

The source language is the spoken language of the incoming audio. This language is used by the ASR engine to interpret the speech and generate text. The target language defines the output language of the captions. Only engines that support translation can provide multiple output languages.

Region-specific language codes

Some ASR engines offer region-specific language codes (e.g. es-419 for Latin American Spanish). Use these variants when your target audience is primarily from a specific region and you want improved recognition of regional accents, vocabulary, and spelling conventions.

If you do not require a regional focus, the generic language code (e.g. es) is typically sufficient.
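
The relationship between a region-specific code and its generic code can be sketched in a few lines of shell. This is only an illustration: `generic_language_code` is a hypothetical helper name, and it assumes that the generic code is simply the part before the first hyphen, which holds for the codes listed in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any nanoStream tooling):
# strips the region suffix from a language code.
# Assumption: the generic code is everything before the first "-".
generic_language_code() {
  local code="$1"
  # ${code%%-*} removes the longest suffix starting with "-";
  # a code without a hyphen is returned unchanged.
  printf '%s\n' "${code%%-*}"
}

generic_language_code "es-419"   # prints: es
generic_language_code "de"       # prints: de
```

A helper like this may be handy when a regional variant is not needed and you want to fall back to the generic code.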

Language                  ID      Source  Target
Bulgarian                 bg
Catalan                   ca
Czech                     cs
Danish                    da
Danish (Denmark)          da-DK
Dutch                     nl
Flemish (Belgium)         nl-BE
English (Generic)         en
English (United States)   en-US
English (Australia)       en-AU
English (United Kingdom)  en-GB
English (India)           en-IN
English (New Zealand)     en-NZ
Estonian                  et
Finnish                   fi
French (Generic)          fr
French (Canada)           fr-CA
German (Generic)          de
German (Switzerland)      de-CH
Greek                     el
Hindi                     hi
Hungarian                 hu
Indonesian                id
Italian                   it
Japanese                  ja
Korean (Generic)          ko
Korean (South Korea)      ko-KR
Latvian                   lv
Lithuanian                lt
Malay                     ms
Norwegian                 no
Polish                    pl
Portuguese (Generic)      pt
Portuguese (Brazil)       pt-BR
Portuguese (Portugal)     pt-PT
Romanian                  ro
Russian                   ru
Slovak                    sk
Spanish (Generic)         es
Spanish (Latin America)   es-419
Swedish (Generic)         sv
Swedish (Sweden)          sv-SE
Turkish                   tr
Ukrainian                 uk
Vietnamese                vi

Multi-target translation

Because the nanoStream Live Captions feature supports translation, any source language listed above can be combined with any target language.

Missing a language?

If your desired language is not listed, please get in touch with us via nanocosmos.net/contact or by email at sales(at)nanocosmos.net.
We can evaluate your use case, discuss engine support, and, where possible, include you in upcoming beta programs for additional languages.

Managing Live Captions

Good to know

You can modify Live Captions settings at any time. However, the stream must be re-ingested for the changes to take effect.

API Integration

Live Captions and translation are controlled via Stream Options, managed through the bintu API.

You can access the requests with the following permission level:

nanoAdmin   nanoUser   nanoReadOnly

Supported API actions

  • POST → Add live captions to a stream
  • GET → Retrieve current caption settings
  • PUT → Update existing settings
  • DELETE → Remove live captions settings
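
As a rough sketch of how the four actions relate to each other, the helper below prints a curl command for a given HTTP method instead of sending it. It assumes that GET, PUT, and DELETE use the same `/stream/<id>/options` URL as the POST examples in this article, which should be verified against the bintu API documentation. `stream_options_command` is a hypothetical name, and `YOUR_STREAM_ID` and `REPLACE_WITH_YOUR_API_KEY` are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: prints the curl command rather than executing it.
# Assumption: all four methods share the /stream/<id>/options URL
# used by the POST examples; check the bintu API docs before relying on this.
BINTU_API_URL="https://bintu.nanocosmos.de"

stream_options_command() {
  local method="$1" stream_id="$2"
  # Accept only the four actions listed above.
  case "$method" in
    POST|GET|PUT|DELETE) ;;
    *) echo "unsupported method: $method" >&2; return 1 ;;
  esac
  printf "curl --request %s --url %s/stream/%s/options --header 'X-BINTU-APIKEY: REPLACE_WITH_YOUR_API_KEY'\n" \
    "$method" "$BINTU_API_URL" "$stream_id"
}

# Print the command that would retrieve the current caption settings.
stream_options_command GET YOUR_STREAM_ID
```

POST and PUT additionally need a `content-type: application/json` header and a JSON body, which this sketch leaves out for brevity.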

Parameters

  • YOUR_STREAM_ID: the unique ID of your stream in nanoStream Cloud
  • X-BINTU-APIKEY: your API key for authentication

Locate your API Key

To find your API key, please sign in to your nanoStream Cloud/Bintu account and copy your API key here.

Body

  • name: must always be set to "captions" to enable Live Captions for the stream.
  • options: contains all configuration options for AI processing: engine, sourceLanguage, and targetLanguages (optional; only needed if translation is required).

Transcription Request Example

bintu/post_stream_options.sh
curl --request POST \
--url https://bintu.nanocosmos.de/stream/YOUR_STREAM_ID/options \
--header 'X-BINTU-APIKEY: REPLACE_WITH_YOUR_API_KEY' \
--header 'content-type: application/json' \
--data '{"name":"captions","options":{"engine":"deepgram","sourceLanguage":"en"}}'

Translation Request Example

bintu/post_stream_options.sh
curl --request POST \
--url https://bintu.nanocosmos.de/stream/YOUR_STREAM_ID/options \
--header 'X-BINTU-APIKEY: REPLACE_WITH_YOUR_API_KEY' \
--header 'content-type: application/json' \
--data '{"name":"captions","options":{"engine":"deepgram","sourceLanguage":"en","targetLanguages":"en,de"}}'
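
The two request bodies above differ only in the optional targetLanguages field. A small helper can build either variant; this is a minimal sketch, where `captions_payload` is a hypothetical name and the engine is fixed to "deepgram" as in the examples. It performs no JSON escaping, so it is only suitable for plain language codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the JSON body for the "captions" stream option.
# Usage: captions_payload <sourceLanguage> [targetLanguages]
# targetLanguages is passed as a comma-separated string, as in the example above.
# No JSON escaping is done; pass plain language codes only.
captions_payload() {
  local src="$1" targets="${2:-}"
  if [ -n "$targets" ]; then
    # Translation: include the optional targetLanguages field.
    printf '{"name":"captions","options":{"engine":"deepgram","sourceLanguage":"%s","targetLanguages":"%s"}}\n' "$src" "$targets"
  else
    # Transcription only: omit targetLanguages.
    printf '{"name":"captions","options":{"engine":"deepgram","sourceLanguage":"%s"}}\n' "$src"
  fi
}

captions_payload "en"          # transcription body
captions_payload "en" "en,de"  # translation body
```

The output can be passed to curl via `--data "$(captions_payload en en,de)"`.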

Advanced Developer bintu API docs

For additional languages, advanced configuration options, and complete request/response samples, please refer to the official bintu API documentation: doc.pages.nanocosmos.de/bintuapi-docs
The API reference provides full details on all available endpoints and workflows for managing stream options, including the topic discussed in this section.

Need assistance?

We're here to support you throughout your Live Captions integration. If you would like to discuss your Live Captions and translation requirements, pricing, or custom solutions, feel free to contact our sales team via nanocosmos.net/contact or by email at sales(at)nanocosmos.net.

If you require technical help or want to report an issue, simply use our official support form.

We're happy to assist you in finding the best configuration for your workflow.