The LTXVConcatAVLatent node combines a video latent representation and an audio latent representation into a single, concatenated latent output. It merges the
samples tensors from both inputs and, if present, their noise_mask tensors as well, preparing them for further processing in a video generation pipeline.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| video_latent | LATENT | Yes | - | The latent representation of the video data. |
| audio_latent | LATENT | Yes | - | The latent representation of the audio data. |

Note: The samples tensors from the video_latent and audio_latent inputs are concatenated. If at least one input contains a noise_mask, the masks are concatenated as well: an existing mask is used as-is, and an input without a mask is given a mask of ones with the same shape as its samples. If neither input has a noise_mask, the output does not include one.
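
The mask-handling rule above can be sketched in a few lines of Python. This is an illustration only, not the node's actual source: the function name concat_av_latent and the ones_like helper are hypothetical, and flat Python lists stand in for the tensors. The real node operates on tensors, and the exact concatenation it performs is not specified on this page.

```python
def concat_av_latent(video_latent, audio_latent):
    """Hypothetical sketch of the merge; flat lists stand in for tensors."""

    def ones_like(samples):
        # Assumed stand-in for "a mask of ones with the same shape as samples".
        return [1.0] * len(samples)

    # Samples are always concatenated (list concatenation here, by assumption).
    output = {"samples": video_latent["samples"] + audio_latent["samples"]}

    video_mask = video_latent.get("noise_mask")
    audio_mask = audio_latent.get("noise_mask")

    # A noise_mask is only produced when at least one input carries one.
    if video_mask is not None or audio_mask is not None:
        if video_mask is None:
            video_mask = ones_like(video_latent["samples"])
        if audio_mask is None:
            audio_mask = ones_like(audio_latent["samples"])
        output["noise_mask"] = video_mask + audio_mask

    return output


# Video has a mask, audio does not: audio gets a mask of ones.
video = {"samples": [0.1, 0.2, 0.3], "noise_mask": [0.0, 1.0, 1.0]}
audio = {"samples": [0.7, 0.8]}
merged = concat_av_latent(video, audio)
print(merged)
```

In this sketch the audio input has no noise_mask, so it is filled with ones before the two masks are joined. When neither input has a mask, the returned dictionary contains only samples.
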
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| latent | LATENT | A single latent dictionary containing the concatenated samples and, if applicable, the concatenated noise_mask from the video and audio inputs. |