HunyuanVideo15ImageToVideo - ComfyUI Built-in Node Documentation

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The HunyuanVideo15ImageToVideo node prepares conditioning and latent space data for video generation based on the HunyuanVideo 1.5 model. It creates an initial latent representation for a video sequence and can optionally integrate a starting image or a CLIP vision output to guide the generation process.

Inputs

Parameter	Data Type	Required	Range	Description
`positive`	CONDITIONING	Yes	-	The positive conditioning prompts that describe what the video should contain.
`negative`	CONDITIONING	Yes	-	The negative conditioning prompts that describe what the video should avoid.
`vae`	VAE	Yes	-	The VAE (Variational Autoencoder) model used to encode the starting image into the latent space.
`width`	INT	No	16 to MAX_RESOLUTION	The width of the output video frames in pixels. Must be divisible by 16. (default: 848)
`height`	INT	No	16 to MAX_RESOLUTION	The height of the output video frames in pixels. Must be divisible by 16. (default: 480)
`length`	INT	No	1 to MAX_RESOLUTION	The total number of frames in the video sequence. (default: 33)
`batch_size`	INT	No	1 to 4096	The number of video sequences to generate in a single batch. (default: 1)
`start_image`	IMAGE	No	-	An optional starting image to initialize the video generation. If provided, it is encoded and used to condition the first frames.
`clip_vision_output`	CLIP_VISION_OUTPUT	No	-	Optional CLIP vision embeddings to provide additional visual conditioning for the generation.

Note: When a `start_image` is provided, it is automatically resized to match the specified `width` and `height` using bilinear interpolation. The first `length` frames of the image batch are used. The encoded image is then added to both the positive and negative conditioning as a `concat_latent_image` with a corresponding `concat_mask`.
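The way the encoded image is attached to both conditionings can be sketched in plain Python. This is an illustration under assumptions, not the node's actual code: it assumes a conditioning is a list of `[embedding, options]` pairs, uses strings as stand-ins for tensors, and `set_conditioning_values` is a hypothetical helper name.

```python
def set_conditioning_values(conditioning, values):
    """Return a copy of `conditioning` with `values` merged into every entry.

    Assumption: each entry is an [embedding, options_dict] pair.
    """
    updated = []
    for embedding, options in conditioning:
        merged = dict(options)  # shallow copy so the caller's input is not mutated
        merged.update(values)
        updated.append([embedding, merged])
    return updated


# Placeholder strings stand in for real tensors in this sketch.
positive = [["positive_embedding", {}]]
negative = [["negative_embedding", {}]]

extra = {
    "concat_latent_image": "encoded_start_image",  # VAE-encoded start image
    "concat_mask": "mask_for_start_frames",        # mask matching that latent
}

# The same values are attached to both the positive and negative conditioning.
positive_out = set_conditioning_values(positive, extra)
negative_out = set_conditioning_values(negative, extra)
```

Returning new lists rather than editing the inputs in place keeps the upstream conditioning reusable by other nodes in the same workflow.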

Outputs

Output Name	Data Type	Description
`positive`	CONDITIONING	The modified positive conditioning, which may now include the encoded starting image or CLIP vision output.
`negative`	CONDITIONING	The modified negative conditioning, which may now include the encoded starting image or CLIP vision output.
`latent`	LATENT	An empty latent tensor with dimensions configured for the specified batch size, video length, width, and height.
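
The size of the empty latent follows from the inputs. The arithmetic can be sketched in plain Python, under stated assumptions: the 16x spatial factor is inferred from the divisible-by-16 requirement on `width` and `height`, while the 32 latent channels and the 4x temporal compression are assumed values that this page does not state. `latent_shape` is a hypothetical helper, not part of ComfyUI.

```python
def latent_shape(batch_size=1, length=33, height=480, width=848,
                 channels=32, temporal_factor=4, spatial_factor=16):
    """Compute an assumed (batch, channels, frames, height, width) latent shape.

    `channels` and `temporal_factor` are assumptions for illustration;
    `spatial_factor` is inferred from the divisible-by-16 input constraint.
    """
    if width % spatial_factor or height % spatial_factor:
        raise ValueError("width and height must be divisible by the spatial factor")
    # The first frame gets its own latent frame; the rest are grouped in time.
    latent_frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


default_shape = latent_shape()
```

With the default inputs (848 x 480, 33 frames, batch size 1), this sketch gives `(1, 32, 9, 30, 53)` under those assumptions.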
