Kandinsky5ImageToVideo - ComfyUI Built-in Node Documentation

This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!

The Kandinsky5ImageToVideo node prepares conditioning and latent data for image-to-video generation with Kandinsky 5 models. It creates an empty video latent tensor and can optionally encode a starting image to guide the initial frames of the generated video, updating the positive and negative conditioning accordingly.
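In ComfyUI, a CONDITIONING value is a list of `[embedding, options]` pairs. The sketch below illustrates, under that assumption, how a node can attach extra data such as an encoded start latent to every entry of both the positive and negative conditioning without mutating its inputs. The helper `set_conditioning_values` and the key `start_latent` are hypothetical illustrations, not the exact function or keys this node uses.

```python
from typing import Any


def set_conditioning_values(conditioning: list, values: dict[str, Any]) -> list:
    """Return a copy of `conditioning` with `values` merged into each options dict.

    Hypothetical helper: assumes ComfyUI-style conditioning, i.e. a list of
    [embedding, options_dict] pairs.
    """
    updated = []
    for embedding, options in conditioning:
        new_options = dict(options)  # shallow copy so the caller's dict is untouched
        new_options.update(values)
        updated.append([embedding, new_options])
    return updated


# The same encoded start latent is attached to both positive and negative prompts.
# "start_latent" is a hypothetical key name used only for illustration.
positive = [["pos_embedding", {"pooled_output": None}]]
negative = [["neg_embedding", {}]]
encoded = "encoded_start_latent"  # stands in for the VAE-encoded start image
positive = set_conditioning_values(positive, {"start_latent": encoded})
negative = set_conditioning_values(negative, {"start_latent": encoded})
```

Copying each options dict matters because the same conditioning object may feed several nodes in a graph; editing it in place would leak the start latent into unrelated branches.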

Inputs

Parameter	Data Type	Required	Range	Description
`positive`	CONDITIONING	Yes	N/A	The positive conditioning prompts to guide the video generation.
`negative`	CONDITIONING	Yes	N/A	The negative conditioning prompts to steer the video generation away from certain concepts.
`vae`	VAE	Yes	N/A	The VAE model used to encode the optional starting image into the latent space.
`width`	INT	No	16 to 8192 (step 16)	The width of the output video in pixels (default: 768).
`height`	INT	No	16 to 8192 (step 16)	The height of the output video in pixels (default: 512).
`length`	INT	No	1 to 8192 (step 4)	The number of frames in the video (default: 121).
`batch_size`	INT	No	1 to 4096	The number of video sequences to generate simultaneously (default: 1).
`start_image`	IMAGE	No	N/A	An optional starting image. If provided, it is encoded and used to replace the noisy start of the model’s output latents.
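The `length` step of 4 (valid values 1, 5, 9, ..., 121) and the spatial step of 16 are consistent with a video VAE that compresses time by a factor of 4 and space by a factor of 8. The sketch below computes the resulting empty-latent shape under those assumptions. The compression factors, the 16 latent channels, and the function name `kandinsky5_latent_shape` are assumptions for illustration, not values stated on this page.

```python
def kandinsky5_latent_shape(
    width: int,
    height: int,
    length: int,
    batch_size: int = 1,
    channels: int = 16,        # assumed latent channel count
    temporal_factor: int = 4,  # assumed temporal compression of the VAE
    spatial_factor: int = 8,   # assumed spatial compression of the VAE
) -> tuple[int, int, int, int, int]:
    """Shape (B, C, T, H, W) of an empty video latent under the stated assumptions."""
    if width % 16 or height % 16:
        raise ValueError("width and height must be multiples of 16")
    if length < 1 or batch_size < 1:
        raise ValueError("length and batch_size must be >= 1")
    # The first frame maps to one latent frame; every further group of
    # `temporal_factor` frames adds one more, so lengths of the form 4k + 1
    # map cleanly onto latent frames.
    latent_frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


print(kandinsky5_latent_shape(768, 512, 121))  # (1, 16, 31, 64, 96)
```

With the default inputs, 121 frames collapse to 31 latent frames and the 768x512 frame becomes a 96x64 latent grid.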

Note: When `start_image` is provided, it is automatically resized to the specified `width` and `height` using bilinear interpolation. Only the first `length` frames of the image batch are encoded. The encoded latent is then injected into both the positive and negative conditioning to guide the video's initial appearance.
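The preprocessing in the note above can be sketched in NumPy, assuming ComfyUI's IMAGE layout of `(frames, height, width, channels)` with float values. The helpers `resize_bilinear` and `prepare_start_image` are hypothetical, and the half-pixel sampling convention is an assumption; ComfyUI's own resizing utility may differ in edge handling and cropping.

```python
import numpy as np


def resize_bilinear(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize one (H, W, C) float image using half-pixel sample centers."""
    in_h, in_w = image.shape[:2]
    # Map each output pixel center back into input coordinates, clamped to the image.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights, shape (out_h, 1, 1)
    wx = (xs - x0)[None, :, None]  # horizontal blend weights, shape (1, out_w, 1)
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
    bottom = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def prepare_start_image(images: np.ndarray, width: int, height: int, length: int) -> np.ndarray:
    """Keep the first `length` frames of an (N, H, W, C) batch and resize each one."""
    frames = images[:length]
    return np.stack([resize_bilinear(frame, height, width) for frame in frames])


batch = np.random.default_rng(0).random((3, 480, 640, 3))  # 3 frames, 640x480 RGB
prepared = prepare_start_image(batch, width=768, height=512, length=121)
print(prepared.shape)  # (3, 512, 768, 3)
```

Because `length` only caps the number of frames, a single still image yields a one-frame batch regardless of the requested video length.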

Outputs

Output Name	Data Type	Description
`positive`	CONDITIONING	The modified positive conditioning, potentially updated with encoded start image data.
`negative`	CONDITIONING	The modified negative conditioning, potentially updated with encoded start image data.
`latent`	LATENT	An empty (all-zero) video latent tensor, shaped according to `width`, `height`, `length`, and `batch_size`.
`cond_latent`	LATENT	The clean, encoded latent representation of the provided start images. This is used internally to replace the noisy beginning of the generated video latents.
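Conceptually, `cond_latent` overwrites the leading frames of the video latent along the time axis. The NumPy sketch below assumes a `(batch, channels, time, height, width)` latent layout; the function name `replace_latent_start` is hypothetical, and this is an illustration of the idea rather than the model's actual code path.

```python
import numpy as np


def replace_latent_start(noisy: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """Overwrite the leading time steps of `noisy` with the clean `cond` latent.

    Assumes both arrays use a (batch, channels, time, height, width) layout and
    that `cond` has a batch size of 1 or the same batch size as `noisy`.
    """
    if noisy.shape[1] != cond.shape[1] or noisy.shape[3:] != cond.shape[3:]:
        raise ValueError("channel and spatial dimensions must match")
    out = noisy.copy()  # leave the input latent unchanged
    t = min(cond.shape[2], noisy.shape[2])
    out[:, :, :t] = cond[:, :, :t]  # batch dimension broadcasts if cond has batch 1
    return out


rng = np.random.default_rng(0)
noisy = rng.standard_normal((2, 16, 31, 8, 12))
cond = np.zeros((1, 16, 1, 8, 12))  # e.g. a single encoded start frame
result = replace_latent_start(noisy, cond)
```

Only the first latent frame of each batch item is replaced here; the remaining 30 frames keep their noisy values for the sampler to denoise.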
