The WanSoundImageToVideoExtend node extends image-to-video generation with audio conditioning and reference images. Given positive and negative conditioning, an initial video latent, and optional audio embeddings, it produces an extended, coherent video sequence that can be synchronized with audio cues.

Inputs

| Parameter | Data Type | Required | Range | Description |
| --- | --- | --- | --- | --- |
| `positive` | CONDITIONING | Yes | - | Positive conditioning prompts that guide what the video should include |
| `negative` | CONDITIONING | Yes | - | Negative conditioning prompts that specify what the video should avoid |
| `vae` | VAE | Yes | - | Variational autoencoder used to encode and decode video frames |
| `length` | INT | Yes | 1 to MAX_RESOLUTION | Number of frames to generate for the video sequence (default: 77, step: 4) |
| `video_latent` | LATENT | Yes | - | Initial video latent representation that serves as the starting point for the extension |
| `audio_encoder_output` | AUDIOENCODEROUTPUT | No | - | Optional audio embeddings that influence generation based on sound characteristics |
| `ref_image` | IMAGE | No | - | Optional reference image that provides visual guidance for the video generation |
| `control_video` | IMAGE | No | - | Optional control video that guides the motion and style of the generated video |
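The inputs above can be sketched as an entry in a ComfyUI API-format workflow, where each connection is a `[node_id, output_slot]` pair. This is a hypothetical helper, not part of ComfyUI itself; the node ids (`"10"`, `"11"`, ...) are placeholders for whatever upstream nodes supply the conditioning, VAE, and latent.

```python
def make_extend_node(positive, negative, vae, video_latent,
                     length=77, audio_encoder_output=None,
                     ref_image=None, control_video=None):
    """Build a hypothetical API-format entry for WanSoundImageToVideoExtend.

    Required inputs mirror the table above; optional inputs are
    omitted from the dict when they are not connected.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    inputs = {
        "positive": positive,
        "negative": negative,
        "vae": vae,
        "length": length,
        "video_latent": video_latent,
    }
    # Optional sockets: only present when a link is supplied.
    if audio_encoder_output is not None:
        inputs["audio_encoder_output"] = audio_encoder_output
    if ref_image is not None:
        inputs["ref_image"] = ref_image
    if control_video is not None:
        inputs["control_video"] = control_video
    return {"class_type": "WanSoundImageToVideoExtend", "inputs": inputs}

# Placeholder upstream node ids ("10"-"14") stand in for real nodes.
node = make_extend_node(
    positive=["10", 0], negative=["11", 0],
    vae=["12", 0], video_latent=["13", 0],
    ref_image=["14", 0],
)
```

Leaving `audio_encoder_output` unset simply omits the key, matching how optional node inputs behave when nothing is wired to them.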

Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| `positive` | CONDITIONING | Processed positive conditioning with video context applied |
| `negative` | CONDITIONING | Processed negative conditioning with video context applied |
| `latent` | LATENT | Generated video latent representation containing the extended video sequence |
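A typical next step is to feed all three outputs into a sampler. The sketch below wires the node's output slots (0, 1, 2, in table order) into a KSampler node in API-workflow form; the node ids and sampler settings are placeholder assumptions, not values prescribed by this node.

```python
# Placeholder id for the WanSoundImageToVideoExtend node in the graph.
extend_id = "20"

# Hypothetical wiring: slots 0-2 are positive, negative, and latent,
# matching the order of the outputs table above.
sampler = {
    "class_type": "KSampler",
    "inputs": {
        "model": ["1", 0],               # placeholder model-loader node
        "positive": [extend_id, 0],      # slot 0: processed positive
        "negative": [extend_id, 1],      # slot 1: processed negative
        "latent_image": [extend_id, 2],  # slot 2: extended video latent
        "seed": 0,
        "steps": 20,
        "cfg": 6.0,
        "sampler_name": "euler",
        "scheduler": "simple",
        "denoise": 1.0,
    },
}
```

Because the node returns its own processed conditioning, the sampler should take `positive` and `negative` from this node rather than from the original conditioning nodes upstream.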