- Purpose: Prepares the conditioning data that the Wan 2.1 Fun Control model needs for video generation.
Inputs
| Parameter Name | Required | Data Type | Description | Default Value |
|---|---|---|---|---|
| positive | Yes | CONDITIONING | Standard ComfyUI positive conditioning data, typically from a “CLIP Text Encode” node. The positive prompt describes the content, subject matter, and artistic style that the user envisions for the generated video. | N/A |
| negative | Yes | CONDITIONING | Standard ComfyUI negative conditioning data, typically generated by a “CLIP Text Encode” node. The negative prompt specifies elements, styles, or artifacts that the user wants to avoid in the generated video. | N/A |
| vae | Yes | VAE | A VAE (Variational Autoencoder) model compatible with the Wan 2.1 Fun model family, used to encode and decode image/video data. | N/A |
| width | Yes | INT | Width of the output video frames in pixels. Minimum 16, maximum nodes.MAX_RESOLUTION, step 16. | 832 |
| height | Yes | INT | Height of the output video frames in pixels. Minimum 16, maximum nodes.MAX_RESOLUTION, step 16. | 480 |
| length | Yes | INT | Total number of frames in the generated video. Minimum 1, maximum nodes.MAX_RESOLUTION, step 4. | 81 |
| batch_size | Yes | INT | Number of videos generated in a single batch. Minimum 1, maximum 4096. | 1 |
| clip_vision_output | No | CLIP_VISION_OUTPUT | Visual features extracted by a CLIP vision model, used to guide visual style and content. | None |
| start_image | No | IMAGE | An initial image that influences the beginning of the generated video. | None |
| control_video | No | IMAGE | A preprocessed ControlNet reference video that guides the motion and structure of the generated video. | None |
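
The numeric limits in the table above can be sketched as a small validation helper. This is a minimal illustration, not the node's code: `snap_to_step` and `normalize_inputs` are hypothetical names, rounding down to the nearest valid step is an assumption about how out-of-step values should be handled, and the nodes.MAX_RESOLUTION upper bound is left out because its value is not stated here.

```python
def snap_to_step(value: int, minimum: int, step: int) -> int:
    """Clamp to the minimum, then round down to the nearest minimum + k * step.

    Hypothetical helper: rounding down (rather than to nearest) is an assumption.
    """
    value = max(value, minimum)
    return minimum + ((value - minimum) // step) * step


def normalize_inputs(width: int = 832, height: int = 480,
                     length: int = 81, batch_size: int = 1) -> dict:
    """Apply the documented minimum/step limits to the numeric inputs."""
    return {
        "width": snap_to_step(width, minimum=16, step=16),
        "height": snap_to_step(height, minimum=16, step=16),
        "length": snap_to_step(length, minimum=1, step=4),
        # batch_size has no step in the table, only a 1..4096 range.
        "batch_size": min(max(batch_size, 1), 4096),
    }


if __name__ == "__main__":
    # The documented defaults already satisfy every constraint.
    print(normalize_inputs())
    # Out-of-step values are snapped down to the nearest valid value.
    print(normalize_inputs(width=850, height=490, length=83))
```

Note that with a minimum of 1 and a step of 4, valid `length` values are 1, 5, 9, and so on, which is why the default is 81 rather than 80.
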
Outputs
| Parameter Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | The positive conditioning, extended with the encoded start_image and control_video (stored as concat_latent_image). |
| negative | CONDITIONING | The negative conditioning, extended with the same concat_latent_image. |
| latent | LATENT | A dictionary whose “samples” key holds an empty latent tensor. |
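
To give a feel for the size of the latent output, the sketch below estimates the shape of the tensor stored under “samples”. The compression factors are assumptions not stated in this document: 16 latent channels, 8x spatial downsampling, and 4x temporal downsampling with the first frame kept separately, as is commonly described for the Wan 2.1 VAE. `empty_latent_shape` is a hypothetical helper, so check the node source before relying on these numbers.

```python
def empty_latent_shape(width: int = 832, height: int = 480, length: int = 81,
                       batch_size: int = 1, channels: int = 16,
                       spatial_factor: int = 8, temporal_factor: int = 4) -> tuple:
    """Estimate the (batch, channels, frames, height, width) latent shape.

    All factor defaults are assumptions about the Wan 2.1 VAE, not values
    taken from this document.
    """
    # Assumed temporal layout: first frame alone, then one latent frame
    # per `temporal_factor` video frames.
    latent_frames = (length - 1) // temporal_factor + 1
    return (batch_size, channels, latent_frames,
            height // spatial_factor, width // spatial_factor)


if __name__ == "__main__":
    print(empty_latent_shape())
```

Under these assumptions, the default 832x480, 81-frame request maps to 21 latent frames, which fits the step size of 4 on the length input.
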