This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The WanHuMoImageToVideo node converts images into video sequences by generating latent representations for the video frames. It processes conditioning inputs and can incorporate a reference image and audio embeddings to influence the video generation. The node outputs the modified conditioning data and a latent representation suitable for video synthesis.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | Positive conditioning input that guides the video generation toward desired content |
| negative | CONDITIONING | Yes | - | Negative conditioning input that steers the video generation away from unwanted content |
| vae | VAE | Yes | - | VAE model used for encoding reference images into latent space |
| width | INT | Yes | 16 to MAX_RESOLUTION | Width of the output video frames in pixels (default: 832, must be divisible by 16) |
| height | INT | Yes | 16 to MAX_RESOLUTION | Height of the output video frames in pixels (default: 480, must be divisible by 16) |
| length | INT | Yes | 1 to MAX_RESOLUTION | Number of frames in the generated video sequence (default: 97) |
| batch_size | INT | Yes | 1 to 4096 | Number of video sequences to generate simultaneously (default: 1) |
| audio_encoder_output | AUDIOENCODEROUTPUT | No | - | Optional audio encoding data that can influence video generation based on audio content |
| ref_image | IMAGE | No | - | Optional reference image used to guide the video generation style and content |
Note: When a reference image is provided, it is encoded and added to both the positive and negative conditioning. When audio encoder output is provided, it is processed and incorporated into the conditioning data.
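The documented ranges above can be checked before queuing a workflow. The sketch below is a minimal, hypothetical helper (not part of ComfyUI's API); the `MAX_RESOLUTION` value is an assumption, since the actual cap depends on your ComfyUI build.

```python
MAX_RESOLUTION = 16384  # assumption: placeholder for ComfyUI's MAX_RESOLUTION constant

def validate_inputs(width=832, height=480, length=97, batch_size=1):
    """Hypothetical helper: check the documented constraints for
    WanHuMoImageToVideo's integer inputs before queuing the node."""
    for name, value in (("width", width), ("height", height)):
        if not (16 <= value <= MAX_RESOLUTION) or value % 16 != 0:
            raise ValueError(f"{name} must be in 16..{MAX_RESOLUTION} and divisible by 16")
    if not (1 <= length <= MAX_RESOLUTION):
        raise ValueError(f"length must be in 1..{MAX_RESOLUTION}")
    if not (1 <= batch_size <= 4096):
        raise ValueError("batch_size must be in 1..4096")
    return {"width": width, "height": height, "length": length, "batch_size": batch_size}
```

With the defaults (832x480, 97 frames, batch of 1) the helper passes; a width of, say, 100 would be rejected because it is not divisible by 16.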

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Modified positive conditioning with reference image and/or audio embeddings incorporated |
| negative | CONDITIONING | Modified negative conditioning with reference image and/or audio embeddings incorporated |
| latent | LATENT | Generated latent representation containing the video sequence data |
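As a rough sketch of how the latent output relates to the pixel-space settings: assuming a Wan-style video VAE with 8x spatial and 4x temporal compression and a 16-channel latent (an assumption about this model family, not stated in this page; verify against your installed version), the latent dimensions can be estimated as:

```python
def latent_shape(batch_size=1, width=832, height=480, length=97):
    # Assumption: Wan-style VAE with 8x spatial compression, 4x temporal
    # compression (first frame kept), and 16 latent channels.
    t = (length - 1) // 4 + 1
    return (batch_size, 16, t, height // 8, width // 8)
```

Under those assumptions, the defaults (832x480, 97 frames) would yield a latent of shape (1, 16, 25, 60, 104).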