This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The LTXVConcatAVLatent node combines a video latent representation and an audio latent representation into a single, concatenated latent output. It merges the samples tensors from both inputs and, if present, their noise_mask tensors as well, preparing them for further processing in a video generation pipeline.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| video_latent | LATENT | Yes | - | The latent representation of the video data. |
| audio_latent | LATENT | Yes | - | The latent representation of the audio data. |

Note: The samples tensors from the video_latent and audio_latent inputs are always concatenated. A noise_mask is handled only when at least one input provides one: the existing mask is used as-is, and the input without a mask gets a mask of ones with the same shape as its samples. The two masks are then concatenated in the same way as the samples.
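
The mask rule in the note above can be sketched in a few lines. This is a minimal illustration, not the node's actual implementation: flat Python lists stand in for the latent tensors, the function name concat_av_latent is hypothetical, and plain list concatenation is assumed as the joining step, since this page does not say how latents of different shapes are combined. Omitting noise_mask when neither input has one is inferred from the "if applicable" wording in the Outputs description.

```python
# Hypothetical sketch: flat lists stand in for the real latent tensors,
# and list concatenation stands in for the node's actual joining step.
def concat_av_latent(video_latent, audio_latent):
    """Merge two latent dicts following the rule described in the note."""
    output = {"samples": video_latent["samples"] + audio_latent["samples"]}

    video_mask = video_latent.get("noise_mask")
    audio_mask = audio_latent.get("noise_mask")

    # Assumption: a mask is produced only when at least one input has one.
    if video_mask is not None or audio_mask is not None:
        # Fill a missing mask with ones, matching that input's samples.
        if video_mask is None:
            video_mask = [1.0] * len(video_latent["samples"])
        if audio_mask is None:
            audio_mask = [1.0] * len(audio_latent["samples"])
        output["noise_mask"] = video_mask + audio_mask

    return output


video = {"samples": [0.1, 0.2, 0.3], "noise_mask": [0.0, 1.0, 1.0]}
audio = {"samples": [0.7, 0.8]}  # no noise_mask on the audio side
merged = concat_av_latent(video, audio)
print(merged["noise_mask"])  # [0.0, 1.0, 1.0, 1.0, 1.0]
```

In this example the audio input has no mask, so it receives two ones before the masks are joined; with no mask on either side, the result would contain only samples.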

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| latent | LATENT | A single latent dictionary containing the concatenated samples and, if applicable, the concatenated noise_mask from the video and audio inputs. |