The HunyuanVideo15ImageToVideo node prepares conditioning and latent space data for video generation based on the HunyuanVideo 1.5 model. It creates an initial latent representation for a video sequence and can optionally integrate a starting image or a CLIP vision output to guide the generation process.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | The positive conditioning prompts that describe what the video should contain. |
| negative | CONDITIONING | Yes | - | The negative conditioning prompts that describe what the video should avoid. |
| vae | VAE | Yes | - | The VAE (Variational Autoencoder) model used to encode the starting image into the latent space. |
| width | INT | No | 16 to MAX_RESOLUTION | The width of the output video frames in pixels. Must be divisible by 16. (default: 848) |
| height | INT | No | 16 to MAX_RESOLUTION | The height of the output video frames in pixels. Must be divisible by 16. (default: 480) |
| length | INT | No | 1 to MAX_RESOLUTION | The total number of frames in the video sequence. (default: 33) |
| batch_size | INT | No | 1 to 4096 | The number of video sequences to generate in a single batch. (default: 1) |
| start_image | IMAGE | No | - | An optional starting image to initialize the video generation. If provided, it is encoded and used to condition the first frames. |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision embeddings to provide additional visual conditioning for the generation. |

When start_image is provided, it is automatically resized to the specified width and height using bilinear interpolation. If the image batch contains more than length frames, only the first length frames are used. The encoded image is then added to both the positive and negative conditioning as a concat_latent_image, together with a corresponding concat_mask.
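
The start-image handling described above can be illustrated with a minimal sketch. It is not the node's actual implementation: the helper names `select_start_frames` and `attach_concat_conditioning` are hypothetical, the resize and VAE encoding steps are omitted, and the code assumes that a CONDITIONING value is a list of `[embedding, options]` pairs in which `options` is a dictionary.

```python
def select_start_frames(frames, length):
    """Keep at most `length` leading frames of the image batch."""
    return frames[:length]


def attach_concat_conditioning(conditioning, concat_latent_image, concat_mask):
    """Return a copy of `conditioning` with both keys set on every entry.

    Assumption: `conditioning` is a list of [embedding, options] pairs.
    The input list and its option dictionaries are left unmodified.
    """
    updated = []
    for embedding, options in conditioning:
        new_options = dict(options)  # shallow copy so the input stays untouched
        new_options["concat_latent_image"] = concat_latent_image
        new_options["concat_mask"] = concat_mask
        updated.append([embedding, new_options])
    return updated


# Placeholder values stand in for real tensors in this sketch.
positive = [["positive_embedding", {}]]
negative = [["negative_embedding", {}]]
encoded_image, mask = "encoded_start_image", "mask"

# The same encoded image and mask are attached to both conditionings.
positive = attach_concat_conditioning(positive, encoded_image, mask)
negative = attach_concat_conditioning(negative, encoded_image, mask)
```

Returning new lists rather than editing the inputs in place means the same upstream conditioning can still be reused unchanged by other nodes.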
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | The modified positive conditioning, which may now include the encoded starting image or CLIP vision output. |
| negative | CONDITIONING | The modified negative conditioning, which may now include the encoded starting image or CLIP vision output. |
| latent | LATENT | An empty latent tensor with dimensions configured for the specified batch size, video length, width, and height. |
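
As a rough illustration of how the latent dimensions could relate to the inputs, the sketch below computes a latent shape from the node parameters. The function `empty_latent_shape` is hypothetical. The spatial factor of 16 is inferred from the requirement that width and height be divisible by 16; the channel count of 32 and the temporal factor of 4 are assumptions that this page does not state.

```python
def empty_latent_shape(width, height, length, batch_size=1,
                       channels=32, spatial_factor=16, temporal_factor=4):
    """Return (batch, channels, latent_frames, latent_height, latent_width).

    `channels` and `temporal_factor` are assumed values, not taken from
    this page; `spatial_factor` follows the divisible-by-16 constraint.
    """
    if width % spatial_factor or height % spatial_factor:
        raise ValueError(f"width and height must be divisible by {spatial_factor}")
    if length < 1 or batch_size < 1:
        raise ValueError("length and batch_size must be at least 1")
    # The first frame maps to one latent frame; each further group of
    # `temporal_factor` frames adds one more.
    latent_frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


# Node defaults: 848x480, 33 frames, batch size 1.
print(empty_latent_shape(848, 480, 33))  # (1, 32, 9, 30, 53)
```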