The WanImageToVideo node prepares the conditioning and latent inputs for Wan video generation. It allocates an empty video latent tensor and can optionally incorporate a starting image and a CLIP vision output to guide generation; when these are provided, the node modifies both the positive and negative conditioning inputs accordingly.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | Positive conditioning input for guiding the generation |
| negative | CONDITIONING | Yes | - | Negative conditioning input for guiding the generation |
| vae | VAE | Yes | - | VAE model for encoding images to latent space |
| width | INT | Yes | 16 to MAX_RESOLUTION | Width of the output video (default: 832, step: 16) |
| height | INT | Yes | 16 to MAX_RESOLUTION | Height of the output video (default: 480, step: 16) |
| length | INT | Yes | 1 to MAX_RESOLUTION | Number of frames in the video (default: 81, step: 4) |
| batch_size | INT | Yes | 1 to 4096 | Number of videos to generate in a batch (default: 1) |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision output for additional conditioning |
| start_image | IMAGE | No | - | Optional starting image to initialize the video generation |
Note: When start_image is provided, the node encodes the image sequence and applies masking to the conditioning inputs. The clip_vision_output parameter, when provided, adds vision-based conditioning to both positive and negative inputs.
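The size of the empty latent follows the Wan VAE's compression factors. The sketch below assumes 8× spatial downscaling, 4× temporal downscaling, and 16 latent channels (verify against your ComfyUI version):

```python
def wan_empty_latent_shape(batch_size: int, width: int, height: int, length: int):
    """Shape of the empty video latent, assuming a VAE with 8x spatial
    downscaling, 4x temporal downscaling, and 16 latent channels."""
    # The first frame is kept uncompressed in time; the rest are grouped in fours.
    latent_frames = ((length - 1) // 4) + 1
    return (batch_size, 16, latent_frames, height // 8, width // 8)

# Default node settings: 832x480, 81 frames, batch of 1
print(wan_empty_latent_shape(1, 832, 480, 81))  # -> (1, 16, 21, 60, 104)
```

This is also why the length parameter steps by 4: each step adds exactly one latent frame.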

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Modified positive conditioning with image and vision data incorporated |
| negative | CONDITIONING | Modified negative conditioning with image and vision data incorporated |
| latent | LATENT | Empty latent space tensor ready for video generation |
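When driving ComfyUI through its HTTP API, the node appears as one entry in the API-format workflow JSON, with connections written as `[node_id, output_index]` pairs. A sketch of such an entry (the upstream node IDs `"6"`, `"7"`, `"39"`, and `"52"` are placeholders for nodes in your own graph, not part of this reference):

```python
import json

# Hypothetical fragment of an API-format workflow: one WanImageToVideo node
# wired to assumed upstream text-encode, VAE-loader, and image-loader nodes.
node = {
    "class_type": "WanImageToVideo",
    "inputs": {
        "positive": ["6", 0],      # output 0 of a positive CLIPTextEncode node
        "negative": ["7", 0],      # output 0 of a negative CLIPTextEncode node
        "vae": ["39", 0],          # output 0 of a VAELoader node
        "width": 832,
        "height": 480,
        "length": 81,
        "batch_size": 1,
        "start_image": ["52", 0],  # optional: output 0 of a LoadImage node
    },
}
print(json.dumps(node, indent=2))
```

The node's three outputs (positive, negative, latent) would then typically feed a KSampler's conditioning and latent_image inputs.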