The HunyuanImageToVideo node converts images into video latent representations using the Hunyuan video model. It takes conditioning inputs and an optional starting image and produces video latents for further processing by downstream sampling nodes. The node supports several guidance types that control how the starting image influences the generated video.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | Positive conditioning input for guiding the video generation |
| vae | VAE | Yes | - | VAE model used for encoding images into latent space |
| width | INT | Yes | 16 to MAX_RESOLUTION | Width of the output video in pixels (default: 848, step: 16) |
| height | INT | Yes | 16 to MAX_RESOLUTION | Height of the output video in pixels (default: 480, step: 16) |
| length | INT | Yes | 1 to MAX_RESOLUTION | Number of frames in the output video (default: 53, step: 4) |
| batch_size | INT | Yes | 1 to 4096 | Number of videos to generate simultaneously (default: 1) |
| guidance_type | COMBO | Yes | "v1 (concat)", "v2 (replace)", "custom" | Method for incorporating the starting image into video generation |
| start_image | IMAGE | No | - | Optional starting image to initialize the video generation |
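The step of 4 on `length` and the step of 16 on `width`/`height` reflect the video VAE's compression into latent space. As a minimal sketch, assuming the 8x spatial and 4x temporal compression factors and the 16 latent channels commonly used by Hunyuan-style video VAEs (these factors are assumptions, not stated in this page), the latent dimensions for the default inputs work out as:

```python
def latent_shape(width, height, length, batch_size=1, channels=16):
    """Sketch of the latent tensor shape for given node inputs.

    Assumes a Hunyuan-style video VAE with 8x spatial compression,
    4x temporal compression (latent frames = (length - 1) // 4 + 1),
    and 16 latent channels. Illustrative only.
    """
    t = (length - 1) // 4 + 1
    return (batch_size, channels, t, height // 8, width // 8)

# Default node inputs: width=848, height=480, length=53
print(latent_shape(848, 480, 53))  # (1, 16, 14, 60, 106)
```

This is why `length` values snap to 1, 5, 9, 13, ...: each step of 4 pixel-space frames adds exactly one latent frame.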
Note: When start_image is provided, the node uses different guidance methods based on the selected guidance_type:
  • "v1 (concat)": Concatenates the image latent with the video latent
  • "v2 (replace)": Replaces initial video frames with the image latent
  • "custom": Uses the image as a reference latent for guidance
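The difference between the three modes can be sketched on simplified latents. This is an illustrative sketch, not the actual ComfyUI implementation: real latents are multi-dimensional tensors, and here frames are reduced to list elements along the time axis.

```python
def apply_guidance(video_latent, image_latent, guidance_type):
    """Sketch of how guidance_type combines an encoded start-image
    latent with the video latent (frames simplified to list items)."""
    if guidance_type == "v1 (concat)":
        # Concatenate the image latent ahead of the video latent
        # along the time axis, lengthening the sequence.
        return image_latent + video_latent
    if guidance_type == "v2 (replace)":
        # Overwrite the initial video frames with the image latent;
        # the sequence length is unchanged.
        return image_latent + video_latent[len(image_latent):]
    if guidance_type == "custom":
        # Latent is left untouched; the image instead travels through
        # the conditioning output as a reference latent.
        return video_latent
    raise ValueError(f"unknown guidance_type: {guidance_type}")

video = ["v0", "v1", "v2", "v3"]
image = ["img"]
print(apply_guidance(video, image, "v1 (concat)"))   # ['img', 'v0', 'v1', 'v2', 'v3']
print(apply_guidance(video, image, "v2 (replace)"))  # ['img', 'v1', 'v2', 'v3']
```

Note how "v1 (concat)" grows the sequence while "v2 (replace)" preserves its length, which is why the two modes interact differently with the `length` parameter.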

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Modified positive conditioning with image guidance applied when start_image is provided |
| latent | LATENT | Video latent representation ready for further processing by video generation models |