The HunyuanVideo15ImageToVideo node prepares conditioning and latent space data for video generation based on the HunyuanVideo 1.5 model. It creates an initial latent representation for a video sequence and can optionally integrate a starting image or a CLIP vision output to guide the generation process.

Inputs

| Parameter | Data Type | Required | Range | Description |
| --- | --- | --- | --- | --- |
| positive | CONDITIONING | Yes | - | The positive conditioning prompts that describe what the video should contain. |
| negative | CONDITIONING | Yes | - | The negative conditioning prompts that describe what the video should avoid. |
| vae | VAE | Yes | - | The VAE (Variational Autoencoder) model used to encode the starting image into the latent space. |
| width | INT | No | 16 to MAX_RESOLUTION | The width of the output video frames in pixels. Must be divisible by 16. (default: 848) |
| height | INT | No | 16 to MAX_RESOLUTION | The height of the output video frames in pixels. Must be divisible by 16. (default: 480) |
| length | INT | No | 1 to MAX_RESOLUTION | The total number of frames in the video sequence. (default: 33) |
| batch_size | INT | No | 1 to 4096 | The number of video sequences to generate in a single batch. (default: 1) |
| start_image | IMAGE | No | - | An optional starting image to initialize the video generation. If provided, it is encoded and used to condition the first frames. |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision embeddings to provide additional visual conditioning for the generation. |
Note: When a `start_image` is provided, it is automatically resized to the specified `width` and `height` using bilinear interpolation. If the image batch contains more frames than `length`, only the first `length` frames are used. The encoded image is then added to both the positive and negative conditioning as a `concat_latent_image`, together with a corresponding `concat_mask`.
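The input constraints and the frame selection described above can be sketched in plain Python. This is an illustrative sketch only, not the node's implementation: `check_video_dims` and `select_start_frames` are hypothetical helper names, and the image batch is modeled as a simple list of frames rather than a tensor.

```python
def check_video_dims(width: int, height: int, length: int, multiple: int = 16) -> None:
    """Raise ValueError if the values violate the ranges listed in the Inputs table.

    Hypothetical helper: the upper bound MAX_RESOLUTION is not checked here
    because its value is not given on this page.
    """
    for name, value in (("width", width), ("height", height)):
        if value < multiple:
            raise ValueError(f"{name} must be at least {multiple}, got {value}")
        if value % multiple != 0:
            raise ValueError(f"{name} must be divisible by {multiple}, got {value}")
    if length < 1:
        raise ValueError(f"length must be at least 1, got {length}")


def select_start_frames(frames: list, length: int) -> list:
    """Keep only the first `length` frames of an image batch (frames-first order assumed)."""
    return frames[:length]


# The documented defaults pass the check.
check_video_dims(width=848, height=480, length=33)

# A 5-frame batch with length=3 keeps frames 0, 1 and 2.
kept = select_start_frames(["f0", "f1", "f2", "f3", "f4"], 3)
```

A batch with fewer frames than `length` passes through the slice unchanged, which is consistent with a single still image being used as the start of a longer sequence.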

Outputs

| Output Name | Data Type | Description |
| --- | --- | --- |
| positive | CONDITIONING | The modified positive conditioning, which may now include the encoded starting image or CLIP vision output. |
| negative | CONDITIONING | The modified negative conditioning, which may now include the encoded starting image or CLIP vision output. |
| latent | LATENT | An empty latent tensor with dimensions configured for the specified batch size, video length, width, and height. |
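This page does not state how the latent dimensions are derived from the inputs. As a rough sketch of how such a shape is commonly computed for video VAEs, the hypothetical function below takes the channel count and the compression factors as explicit arguments. The axis order, the temporal formula, and the values used in the example call are all assumptions, not details confirmed by this page.

```python
def empty_latent_shape(
    batch_size: int,
    length: int,
    width: int,
    height: int,
    *,
    channels: int,         # assumption: number of latent channels
    spatial_factor: int,   # assumption: spatial downscaling of the VAE
    temporal_factor: int,  # assumption: temporal downscaling of the VAE
) -> tuple:
    """Return an assumed (batch, channels, frames, height, width) latent shape."""
    # Assumed convention: the first frame maps to its own latent frame, and each
    # further group of `temporal_factor` frames adds one more.
    latent_frames = (length - 1) // temporal_factor + 1
    return (
        batch_size,
        channels,
        latent_frames,
        height // spatial_factor,
        width // spatial_factor,
    )


# Example with the documented defaults and assumed factors (32 channels, 16x spatial, 4x temporal).
shape = empty_latent_shape(
    1, 33, 848, 480, channels=32, spatial_factor=16, temporal_factor=4
)
print(shape)  # (1, 32, 9, 30, 53)
```

Under these assumptions, requiring width and height to be divisible by 16 would keep the spatial latent dimensions exact integers.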