This documentation was AI-generated.

The ElevenLabs Speech to Text node transcribes audio into text. It sends the audio to the ElevenLabs API, which returns a written transcript. The node supports automatic language detection, speaker identification (diarization), and tagging of non-speech sounds such as music or laughter.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| audio | AUDIO | Yes | - | Audio to transcribe. |
| model | COMBO | Yes | "scribe_v2" | Model to use for transcription. Selecting this model reveals additional parameters. |
| tag_audio_events | BOOLEAN | No | - | Annotate non-speech sounds such as (laughter) or (music) in the transcript. This parameter is revealed when the "scribe_v2" model is selected. (default: False) |
| diarize | BOOLEAN | No | - | Annotate which speaker is talking. This parameter is revealed when the "scribe_v2" model is selected. (default: False) |
| diarization_threshold | FLOAT | No | 0.1 - 0.4 | Speaker separation sensitivity. Lower values are more sensitive to speaker changes. This parameter is revealed when the "scribe_v2" model is selected and diarize is enabled. (default: 0.22) |
| temperature | FLOAT | No | 0.0 - 2.0 | Randomness control. 0.0 uses the model default; higher values increase randomness. This parameter is revealed when the "scribe_v2" model is selected. (default: 0.0) |
| timestamps_granularity | COMBO | No | "word", "character", "none" | Granularity of the timestamps in the transcript. This parameter is revealed when the "scribe_v2" model is selected. (default: "word") |
| language_code | STRING | No | - | ISO 639-1 or ISO 639-3 language code (e.g., "en", "es", "fra"). Leave empty for automatic detection. (default: "") |
| num_speakers | INT | No | 0 - 32 | Maximum number of speakers to predict. Set to 0 for automatic detection. (default: 0) |
| seed | INT | No | 0 - 2147483647 | Seed for reproducibility (determinism is not guaranteed). (default: 1) |
The num_speakers parameter cannot be set to a value greater than 0 while diarize is enabled. Either disable diarize or set num_speakers to 0.
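The parameter rules above can be expressed as a small validation step. The sketch below is a hypothetical helper, build_scribe_request, and not the node's actual source. It checks the documented ranges and the diarize/num_speakers constraint, then omits inputs left at their "automatic" values (an empty language_code, 0 for num_speakers, 0.0 for temperature). The output keys, such as model_id, are an assumption modeled on the ElevenLabs speech-to-text API's request fields, and the language code check is only a loose format check.

```python
from typing import Any

# Ranges copied from the Inputs table above.
RANGES = {
    "diarization_threshold": (0.1, 0.4),
    "temperature": (0.0, 2.0),
    "num_speakers": (0, 32),
    "seed": (0, 2147483647),
}
TIMESTAMP_OPTIONS = {"word", "character", "none"}


def build_scribe_request(
    model: str = "scribe_v2",
    tag_audio_events: bool = False,
    diarize: bool = False,
    diarization_threshold: float = 0.22,
    temperature: float = 0.0,
    timestamps_granularity: str = "word",
    language_code: str = "",
    num_speakers: int = 0,
    seed: int = 1,
) -> dict[str, Any]:
    """Hypothetical helper: validate node inputs and build request fields.

    Not the node's real implementation. Output keys are assumed to match
    the ElevenLabs speech-to-text API request fields.
    """
    values = {
        "diarization_threshold": diarization_threshold,
        "temperature": temperature,
        "num_speakers": num_speakers,
        "seed": seed,
    }
    for name, (low, high) in RANGES.items():
        if not low <= values[name] <= high:
            raise ValueError(f"{name}={values[name]} is outside {low} - {high}")
    if timestamps_granularity not in TIMESTAMP_OPTIONS:
        raise ValueError(f"unsupported timestamps_granularity: {timestamps_granularity!r}")
    # Loose format check only: 2 letters (ISO 639-1) or 3 letters (ISO 639-3).
    if language_code and not (language_code.isalpha() and len(language_code) in (2, 3)):
        raise ValueError(f"not an ISO 639-1/639-3 code: {language_code!r}")
    # Documented constraint: num_speakers must stay 0 while diarize is enabled.
    if diarize and num_speakers > 0:
        raise ValueError("disable diarize or set num_speakers to 0")

    fields: dict[str, Any] = {
        "model_id": model,
        "tag_audio_events": tag_audio_events,
        "diarize": diarize,
        "timestamps_granularity": timestamps_granularity,
        "seed": seed,
    }
    # diarization_threshold only applies when diarize is enabled.
    if diarize:
        fields["diarization_threshold"] = diarization_threshold
    # "Automatic" sentinel values are omitted so the service picks a default.
    if temperature > 0.0:
        fields["temperature"] = temperature
    if language_code:
        fields["language_code"] = language_code
    if num_speakers > 0:
        fields["num_speakers"] = num_speakers
    return fields
```

For example, build_scribe_request(diarize=True, language_code="fra") includes diarization_threshold 0.22 and language_code "fra" but no num_speakers, while build_scribe_request(diarize=True, num_speakers=2) raises ValueError.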
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| text | STRING | The transcribed text from the audio. |
| language_code | STRING | The detected language code of the audio. |
| words_json | STRING | A JSON-formatted string containing detailed word-level information, including timestamps and speaker labels if enabled. |
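The exact schema of words_json is not documented here. Assuming it holds the word list returned by the ElevenLabs API, that is, a JSON array of objects with text, start, end, type, and (with diarize enabled) speaker_id keys, the hypothetical helper below groups consecutive entries into per-speaker segments. The sample data is invented for illustration; inspect real node output before relying on these keys.

```python
import json
from typing import Any


def speaker_segments(words_json: str) -> list[dict[str, Any]]:
    """Group consecutive word entries by speaker (hypothetical helper).

    Assumes each entry has a "text" key and optionally "start", "end",
    "type", and "speaker_id" keys; verify this against real node output.
    """
    segments: list[dict[str, Any]] = []
    for entry in json.loads(words_json):
        if entry.get("type") == "spacing":
            continue  # whitespace tokens are re-added by the join below
        speaker = entry.get("speaker_id")  # None when diarize is off
        if segments and segments[-1]["speaker"] == speaker:
            segments[-1]["words"].append(entry["text"])
            segments[-1]["end"] = entry.get("end")
        else:
            segments.append({
                "speaker": speaker,
                "start": entry.get("start"),
                "end": entry.get("end"),
                "words": [entry["text"]],
            })
    return [
        {"speaker": s["speaker"], "start": s["start"], "end": s["end"],
         "text": " ".join(s["words"])}
        for s in segments
    ]


# Invented sample data, shaped like the assumed words_json format.
sample = json.dumps([
    {"text": "Hello", "start": 0.0, "end": 0.4, "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "start": 0.4, "end": 0.5, "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.5, "end": 0.9, "type": "word", "speaker_id": "speaker_0"},
    {"text": "Hi", "start": 1.2, "end": 1.5, "type": "word", "speaker_id": "speaker_1"},
])
for seg in speaker_segments(sample):
    print(f"[{seg['start']}-{seg['end']}] {seg['speaker']}: {seg['text']}")
# [0.0-0.9] speaker_0: Hello there
# [1.2-1.5] speaker_1: Hi
```

Skipping spacing entries and re-joining words with single spaces keeps the segment text readable whether or not the node includes those entries in words_json.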