The Wan22FunControlToVideo node prepares conditioning and latent representations for video generation with the Wan video model architecture. It takes positive and negative conditioning inputs, along with an optional reference image and control video, and produces the conditioning and empty latent tensor needed for video synthesis. The node handles spatial scaling and temporal dimensioning so that the resulting latent space matches the requested width, height, and frame count.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | Positive conditioning input for guiding the video generation |
| negative | CONDITIONING | Yes | - | Negative conditioning input for guiding the video generation |
| vae | VAE | Yes | - | VAE model used for encoding images to latent space |
| width | INT | No | 16 to MAX_RESOLUTION | Output video width in pixels (default: 832, step: 16) |
| height | INT | No | 16 to MAX_RESOLUTION | Output video height in pixels (default: 480, step: 16) |
| length | INT | No | 1 to MAX_RESOLUTION | Number of frames in the video sequence (default: 81, step: 4) |
| batch_size | INT | No | 1 to 4096 | Number of video sequences to generate (default: 1) |
| ref_image | IMAGE | No | - | Optional reference image for providing visual guidance |
| control_video | IMAGE | No | - | Optional control video for guiding the generation process |
Note: The length parameter is processed in chunks of 4 frames, and the node automatically handles temporal scaling for the latent space. When ref_image is provided, it influences the conditioning through reference latents. When control_video is provided, it directly affects the concat latent representation used in conditioning.
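The sizing described in the note can be sketched as a small helper. This is a hypothetical illustration (`wan_latent_shape` is not part of the node's API), assuming the common Wan latent layout of 16 channels with 8x spatial and 4x temporal compression:

```python
def wan_latent_shape(width, height, length, batch_size=1):
    """Hypothetical helper mirroring the node's latent sizing.

    Assumes the common Wan VAE layout: 16 latent channels,
    8x spatial compression, and 4x temporal compression
    (frames are processed in chunks of 4).
    """
    temporal = ((length - 1) // 4) + 1  # chunk-of-4 temporal scaling
    return (batch_size, 16, temporal, height // 8, width // 8)

# Default node settings: 832x480, 81 frames, batch size 1
print(wan_latent_shape(832, 480, 81))  # (1, 16, 21, 60, 104)
```

With the defaults, 81 frames collapse to 21 latent frames, which is why the length parameter steps in increments of 4.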

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Modified positive conditioning with video-specific latent data |
| negative | CONDITIONING | Modified negative conditioning with video-specific latent data |
| latent | LATENT | Empty latent tensor with appropriate dimensions for video generation |