The ElevenLabs Text to Speech node converts written text into spoken audio using the ElevenLabs API. It allows you to select a specific voice and fine-tune various speech characteristics like stability, speed, and style to generate a customized audio output.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| voice | CUSTOM | Yes | N/A | Voice to use for speech synthesis. Connect from Voice Selector or Instant Voice Clone. |
| text | STRING | Yes | N/A | The text to convert to speech. |
| stability | FLOAT | No | 0.0 - 1.0 | Voice stability. Lower values give broader emotional range; higher values produce more consistent but potentially monotonous speech (default: 0.5). |
| apply_text_normalization | COMBO | No | "auto", "on", "off" | Text normalization mode. "auto" lets the system decide, "on" always applies normalization, "off" skips it. |
| model | DYNAMICCOMBO | No | "eleven_multilingual_v2", "eleven_v3" | Model to use for text-to-speech. Selecting a model reveals its specific parameters. |
| language_code | STRING | No | N/A | ISO-639-1 or ISO-639-3 language code (e.g., "en", "es", "fra"). Leave empty for automatic detection (default: ""). |
| seed | INT | No | 0 - 2147483647 | Seed for reproducibility (determinism not guaranteed) (default: 1). |
| output_format | COMBO | No | "mp3_44100_192", "opus_48000_192" | Audio output format. |
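
The two output_format values look like they follow a codec_samplerate_bitrate naming pattern. The following is a minimal sketch under that assumption; the AudioFormat type and parse_output_format helper are hypothetical names used for illustration and are not part of the node.

```python
from typing import NamedTuple


class AudioFormat(NamedTuple):
    """Hypothetical container for the parts of an output_format string."""
    codec: str
    sample_rate_hz: int
    bitrate_kbps: int


def parse_output_format(value: str) -> AudioFormat:
    # Assumes the pattern "<codec>_<sample rate in Hz>_<bitrate in kbps>".
    codec, sample_rate, bitrate = value.split("_")
    return AudioFormat(codec, int(sample_rate), int(bitrate))


fmt = parse_output_format("opus_48000_192")
print(fmt.codec, fmt.sample_rate_hz, fmt.bitrate_kbps)
```

A helper like this can be handy when a downstream step needs the sample rate or file extension without hard-coding it per format.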
Model-Specific Parameters

When the model parameter is set to "eleven_multilingual_v2", the following additional parameters become available:
  • speed: Speech speed. 1.0 is normal, <1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
  • similarity_boost: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
  • use_speaker_boost: Boost similarity to the original speaker voice (default: False).
  • style: Style exaggeration. Higher values increase stylistic expression but may reduce stability (default: 0.0, range: 0.0 - 0.2).
When the model parameter is set to "eleven_v3", the following additional parameters become available:
  • speed: Speech speed. 1.0 is normal, <1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
  • similarity_boost: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
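
The per-model parameter lists above can be expressed as a small lookup table that fills in defaults and rejects out-of-range or unsupported values. This is a sketch only: the defaults and ranges are copied from the lists above, while MODEL_FLOAT_PARAMS, MODEL_BOOL_PARAMS, and resolve_model_settings are hypothetical names and do not reflect how the node is implemented.

```python
# (default, minimum, maximum) per float parameter, taken from the lists above.
MODEL_FLOAT_PARAMS = {
    "eleven_multilingual_v2": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
        "style": (0.0, 0.0, 0.2),
    },
    "eleven_v3": {
        "speed": (1.0, 0.7, 1.3),
        "similarity_boost": (0.75, 0.0, 1.0),
    },
}

# Boolean parameters and their defaults, per model.
MODEL_BOOL_PARAMS = {
    "eleven_multilingual_v2": {"use_speaker_boost": False},
    "eleven_v3": {},
}


def resolve_model_settings(model: str, **overrides) -> dict:
    """Hypothetical helper: merge overrides with defaults and validate ranges."""
    if model not in MODEL_FLOAT_PARAMS:
        raise ValueError(f"unknown model: {model!r}")

    float_params = MODEL_FLOAT_PARAMS[model]
    bool_params = MODEL_BOOL_PARAMS[model]

    # Reject parameters the selected model does not expose.
    unknown = set(overrides) - set(float_params) - set(bool_params)
    if unknown:
        raise ValueError(f"{model} does not support: {sorted(unknown)}")

    settings = {}
    for name, (default, low, high) in float_params.items():
        value = float(overrides.get(name, default))
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} - {high}")
        settings[name] = value
    for name, default in bool_params.items():
        settings[name] = bool(overrides.get(name, default))
    return settings


print(resolve_model_settings("eleven_multilingual_v2", speed=1.1))
```

Keeping the ranges in one table makes the difference between the two models explicit: "eleven_v3" exposes only speed and similarity_boost, so passing style or use_speaker_boost for it raises an error in this sketch.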

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| audio | AUDIO | The generated audio from the text-to-speech conversion. |