ElevenLabsTextToSpeech - ComfyUI Built-in Node Documentation

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The ElevenLabs Text to Speech node converts written text into spoken audio using the ElevenLabs API. It allows you to select a specific voice and fine-tune various speech characteristics like stability, speed, and style to generate a customized audio output.

Inputs

Parameter	Data Type	Required	Range	Description
`voice`	CUSTOM	Yes	N/A	Voice to use for speech synthesis. Connect from Voice Selector or Instant Voice Clone.
`text`	STRING	Yes	N/A	The text to convert to speech.
`stability`	FLOAT	No	0.0 - 1.0	Voice stability. Lower values give a broader emotional range; higher values produce more consistent but potentially monotonous speech (default: 0.5).
`apply_text_normalization`	COMBO	No	`"auto"` `"on"` `"off"`	Text normalization mode. ‘auto’ lets the system decide, ‘on’ always applies normalization, ‘off’ skips it.
`model`	DYNAMICCOMBO	No	`"eleven_multilingual_v2"` `"eleven_v3"`	Model to use for text-to-speech. Selecting a model reveals its specific parameters.
`language_code`	STRING	No	N/A	ISO-639-1 or ISO-639-3 language code (e.g., ‘en’, ‘es’, ‘fra’). Leave empty for automatic detection (default: "").
`seed`	INT	No	0 - 2147483647	Seed for reproducibility (determinism not guaranteed) (default: 1).
`output_format`	COMBO	No	`"mp3_44100_192"` `"opus_48000_192"`	Audio output format.
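
The ranges and option lists in the table above can be checked before a workflow is queued. The following is a minimal sketch, not the node's own validation logic: `validate_common_inputs`, `COMBO_CHOICES`, and `MAX_SEED` are hypothetical names, and only the defaults the table states (`stability` 0.5, `seed` 1) are assumed.

```python
# Allowed values copied from the inputs table. All names here are hypothetical
# helpers for illustration; they are not part of ComfyUI or the ElevenLabs API.
COMBO_CHOICES = {
    "apply_text_normalization": ("auto", "on", "off"),
    "model": ("eleven_multilingual_v2", "eleven_v3"),
    "output_format": ("mp3_44100_192", "opus_48000_192"),
}
MAX_SEED = 2147483647


def validate_common_inputs(inputs: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []

    # stability: FLOAT in 0.0 - 1.0, default 0.5 per the table.
    stability = inputs.get("stability", 0.5)
    if not 0.0 <= stability <= 1.0:
        errors.append(f"stability {stability} is outside 0.0 - 1.0")

    # seed: INT in 0 - 2147483647, default 1 per the table.
    seed = inputs.get("seed", 1)
    if not 0 <= seed <= MAX_SEED:
        errors.append(f"seed {seed} is outside 0 - {MAX_SEED}")

    # COMBO inputs: only checked when supplied, since the table gives no defaults.
    for name, allowed in COMBO_CHOICES.items():
        if name in inputs and inputs[name] not in allowed:
            errors.append(f"{name} must be one of {allowed}, got {inputs[name]!r}")

    return errors


if __name__ == "__main__":
    for problem in validate_common_inputs({"stability": 1.5, "output_format": "wav"}):
        print(problem)
```

The `voice` and `text` inputs are left out of the check because the table gives no range for them.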

Model-Specific Parameters: When the `model` parameter is set to `"eleven_multilingual_v2"`, the following additional parameters become available:

`speed`: Speech speed. 1.0 is normal, <1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
`similarity_boost`: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
`use_speaker_boost`: Boost similarity to the original speaker voice (default: False).
`style`: Style exaggeration. Higher values increase stylistic expression but may reduce stability (default: 0.0, range: 0.0 - 0.2).

When the `model` parameter is set to `"eleven_v3"`, the following additional parameters become available:

`speed`: Speech speed. 1.0 is normal, <1.0 slower, >1.0 faster (default: 1.0, range: 0.7 - 1.3).
`similarity_boost`: Similarity boost. Higher values make the voice more similar to the original (default: 0.75, range: 0.0 - 1.0).
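
The way the selected model determines which extra parameters exist can be pictured as a lookup table. This is a minimal sketch built only from the defaults and ranges listed above; `MODEL_PARAMS` and `resolve_model_params` are hypothetical names and do not reflect how the node stores these values internally.

```python
# Defaults and ranges copied from the model-specific parameter lists.
# The structure and names are illustrative assumptions, not ComfyUI internals.
MODEL_PARAMS = {
    "eleven_multilingual_v2": {
        "speed": {"default": 1.0, "min": 0.7, "max": 1.3},
        "similarity_boost": {"default": 0.75, "min": 0.0, "max": 1.0},
        "use_speaker_boost": {"default": False},
        "style": {"default": 0.0, "min": 0.0, "max": 0.2},
    },
    "eleven_v3": {
        "speed": {"default": 1.0, "min": 0.7, "max": 1.3},
        "similarity_boost": {"default": 0.75, "min": 0.0, "max": 1.0},
    },
}


def resolve_model_params(model: str, overrides: dict = None) -> dict:
    """Merge user overrides with the defaults of the chosen model."""
    spec = MODEL_PARAMS[model]
    overrides = overrides or {}

    # Reject parameters the chosen model does not expose (e.g. style on eleven_v3).
    unknown = sorted(set(overrides) - set(spec))
    if unknown:
        raise ValueError(f"{model} does not expose: {', '.join(unknown)}")

    resolved = {}
    for name, rule in spec.items():
        value = overrides.get(name, rule["default"])
        # Only numeric parameters carry a min/max range.
        if "min" in rule and not rule["min"] <= value <= rule["max"]:
            raise ValueError(f"{name}={value} is outside {rule['min']} - {rule['max']}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    print(resolve_model_params("eleven_multilingual_v2", {"style": 0.1}))
    print(resolve_model_params("eleven_v3"))
```

With this layout, passing `style` or `use_speaker_boost` together with `"eleven_v3"` raises a `ValueError`, mirroring the fact that those two parameters are listed only for `"eleven_multilingual_v2"`.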

Outputs

Output Name	Data Type	Description
`audio`	AUDIO	The generated audio from the text-to-speech conversion.
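
The encoding of the `audio` output depends on the `output_format` input. The sketch below splits a format name into its parts, assuming the names follow a codec, sample rate (Hz), bitrate (kbps) pattern; the table above does not state this, so treat the field meanings as an assumption. `AudioFormat` and `parse_output_format` are hypothetical names.

```python
from typing import NamedTuple


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int  # assumed meaning of the second field
    bitrate_kbps: int    # assumed meaning of the third field


def parse_output_format(name: str) -> AudioFormat:
    """Split a name such as "mp3_44100_192" into its three underscore-separated parts."""
    codec, sample_rate, bitrate = name.split("_")
    return AudioFormat(codec, int(sample_rate), int(bitrate))


if __name__ == "__main__":
    for option in ("mp3_44100_192", "opus_48000_192"):
        print(parse_output_format(option))
```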
