The WanFirstLastFrameToVideo node creates video conditioning by combining start and end frames with text prompts. It generates a latent representation for video generation by encoding the first and last frames, applying masks to guide the generation process, and incorporating CLIP vision features when available. This node prepares both positive and negative conditioning for video models to generate coherent sequences between specified start and end points.

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| `positive` | CONDITIONING | Yes | - | Positive text conditioning for guiding the video generation |
| `negative` | CONDITIONING | Yes | - | Negative text conditioning for guiding the video generation |
| `vae` | VAE | Yes | - | VAE model used for encoding images to latent space |
| `width` | INT | No | 16 to MAX_RESOLUTION | Output video width (default: 832, step: 16) |
| `height` | INT | No | 16 to MAX_RESOLUTION | Output video height (default: 480, step: 16) |
| `length` | INT | No | 1 to MAX_RESOLUTION | Number of frames in the video sequence (default: 81, step: 4) |
| `batch_size` | INT | No | 1 to 4096 | Number of videos to generate simultaneously (default: 1) |
| `clip_vision_start_image` | CLIP_VISION_OUTPUT | No | - | CLIP vision features extracted from the start image |
| `clip_vision_end_image` | CLIP_VISION_OUTPUT | No | - | CLIP vision features extracted from the end image |
| `start_image` | IMAGE | No | - | Starting frame image for the video sequence |
| `end_image` | IMAGE | No | - | Ending frame image for the video sequence |
Note: When both `start_image` and `end_image` are provided, the node creates a video sequence that transitions between these two frames. The `clip_vision_start_image` and `clip_vision_end_image` parameters are optional; when provided, their CLIP vision features are concatenated and applied to both the positive and negative conditioning.
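The masking described above can be pictured as marking which frames are fixed and which the model must fill in. The sketch below is illustrative only: the function name, the 0/1 mask convention, and the per-frame granularity are assumptions for explanation, not code from the node itself.

```python
# Hedged sketch: a per-frame mask for a first/last-frame video node.
# Convention assumed here: 0.0 = known frame (keep the encoded pixels),
# 1.0 = unknown frame (to be generated). Not ComfyUI's actual source.

def build_frame_mask(length, has_start, has_end):
    """Return a list of per-frame mask values for a sequence of `length` frames."""
    mask = [1.0] * length          # every frame starts as "to be generated"
    if has_start:
        mask[0] = 0.0              # first frame is provided, so it is fixed
    if has_end:
        mask[-1] = 0.0             # last frame is provided, so it is fixed
    return mask

mask = build_frame_mask(81, has_start=True, has_end=True)
print(mask[0], mask[1], mask[-1])  # 0.0 1.0 0.0
```

With both endpoint images supplied, only the interior frames remain masked, which is what lets the model generate a transition between the two anchors.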

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| `positive` | CONDITIONING | Positive conditioning with applied video frame encoding and CLIP vision features |
| `negative` | CONDITIONING | Negative conditioning with applied video frame encoding and CLIP vision features |
| `latent` | LATENT | Empty latent tensor with dimensions matching the specified video parameters |
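To see how the `width`, `height`, and `length` inputs relate to the `latent` output's size, here is a rough shape calculation. The 8x spatial and 4x temporal compression factors and the 16-channel latent are typical of Wan-family video VAEs; they are assumptions for illustration, not values read from this node's source.

```python
# Hedged sketch: estimating the empty latent's shape from the node inputs.
# Compression factors (8x spatial, 4x temporal, 16 channels) are assumed
# defaults for a Wan-style video VAE, not verified against this node.

def latent_shape(batch_size, length, height, width,
                 spatial_factor=8, temporal_factor=4, channels=16):
    t = (length - 1) // temporal_factor + 1  # temporal downsampling of frames
    return (batch_size, channels, t,
            height // spatial_factor, width // spatial_factor)

# Default node inputs: batch_size=1, length=81, height=480, width=832.
print(latent_shape(1, 81, 480, 832))  # (1, 16, 21, 60, 104)
```

This also explains the `step: 4` constraint on `length`: frame counts of the form `4n + 1` map cleanly onto the temporally compressed latent grid.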