This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The WanAnimateToVideo node generates video content by combining multiple conditioning inputs including pose references, facial expressions, and background elements. It processes various video inputs to create coherent animated sequences while maintaining temporal consistency across frames. The node handles latent space operations and can extend existing videos by continuing motion patterns.
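To make the latent-space framing concrete, here is a minimal sketch of the latent dimensions implied by the default settings. It assumes the commonly cited 4x temporal and 8x spatial compression of the Wan VAE; the helper name `latent_shape` is hypothetical and not part of the node's API.

```python
# Hypothetical helper: estimate latent dimensions for given output settings,
# ASSUMING 4x temporal and 8x spatial compression (not confirmed by this page).
def latent_shape(width=832, height=480, length=77):
    # Temporal axis: (frames - 1) / 4 + 1, consistent with the step-of-4
    # frame counts used by the length parameter (default 77 = 4 * 19 + 1).
    t = (length - 1) // 4 + 1
    return (t, height // 8, width // 8)

print(latent_shape())  # (20, 60, 104) for the 832x480x77 defaults
```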

## Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| positive | CONDITIONING | Yes | - | Positive conditioning that guides the generation toward desired content |
| negative | CONDITIONING | Yes | - | Negative conditioning that steers the generation away from unwanted content |
| vae | VAE | Yes | - | VAE model used for encoding and decoding image data |
| width | INT | No | 16 to MAX_RESOLUTION | Output video width in pixels (default: 832, step: 16) |
| height | INT | No | 16 to MAX_RESOLUTION | Output video height in pixels (default: 480, step: 16) |
| length | INT | No | 1 to MAX_RESOLUTION | Number of frames to generate (default: 77, step: 4) |
| batch_size | INT | No | 1 to 4096 | Number of videos to generate simultaneously (default: 1) |
| clip_vision_output | CLIP_VISION_OUTPUT | No | - | Optional CLIP vision model output for additional conditioning |
| reference_image | IMAGE | No | - | Reference image used as the starting point for generation |
| face_video | IMAGE | No | - | Video input providing facial expression guidance |
| pose_video | IMAGE | No | - | Video input providing pose and motion guidance |
| continue_motion_max_frames | INT | No | 1 to MAX_RESOLUTION | Maximum number of frames to continue from previous motion (default: 5, step: 4) |
| background_video | IMAGE | No | - | Background video to composite with the generated content |
| character_mask | MASK | No | - | Mask defining character regions for selective processing |
| continue_motion | IMAGE | No | - | Previous motion sequence to continue from, for temporal consistency |
| video_frame_offset | INT | No | 0 to MAX_RESOLUTION | Number of frames to skip in all input videos; used for generating longer videos in chunks. Connect to the video_frame_offset output of the previous node when extending a video. (default: 0, step: 1) |
Parameter Constraints:
  • When pose_video is provided and trim_to_pose_video logic is active, the output length will be adjusted to match the pose video duration
  • face_video is automatically resized to 512x512 resolution when processed
  • continue_motion frames are limited by continue_motion_max_frames parameter
  • Input videos (face_video, pose_video, background_video, character_mask) are offset by video_frame_offset before processing
  • If character_mask contains only one frame, it will be repeated across all frames
  • When clip_vision_output is provided, it’s applied to both positive and negative conditioning
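The input-handling rules above can be sketched as a small preprocessing function. This is an illustrative approximation, not the node's actual implementation: the function and variable names are hypothetical, frames are modeled as NumPy arrays of shape `(frames, height, width, channels)`, and keeping the *last* continue_motion frames is an assumption.

```python
import numpy as np

def preprocess_inputs(face_video, character_mask, continue_motion,
                      video_frame_offset=0, continue_motion_max_frames=5):
    """Hypothetical sketch of the constraint rules listed above."""
    # All input videos are offset by video_frame_offset before processing.
    face_video = face_video[video_frame_offset:]

    # A single-frame character_mask is repeated across all frames.
    if character_mask.shape[0] == 1:
        character_mask = np.repeat(character_mask, face_video.shape[0], axis=0)

    # continue_motion frames are limited by continue_motion_max_frames
    # (assumed here to keep the most recent frames).
    continue_motion = continue_motion[-continue_motion_max_frames:]
    return face_video, character_mask, continue_motion
```

(The automatic 512x512 resize of face_video and the pose-video length trimming are omitted for brevity.)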

## Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Modified positive conditioning with additional video context |
| negative | CONDITIONING | Modified negative conditioning with additional video context |
| latent | LATENT | Generated video content in latent space format |
| trim_latent | INT | Latent-space trimming information for downstream processing |
| trim_image | INT | Image-space trimming information for reference motion frames |
| video_frame_offset | INT | Updated frame offset for continuing video generation in chunks |
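The video_frame_offset output exists to chain chunked generations: each run's output offset becomes the next run's input. A minimal driver-loop sketch, where `run_chunk` is a hypothetical stand-in for executing this node plus the downstream sampling and decoding (the exact offset arithmetic is whatever the node returns, so the loop simply trusts it):

```python
def generate_in_chunks(run_chunk, total_frames, chunk_length=77):
    """Generate a long video chunk-by-chunk by chaining frame offsets.

    run_chunk(length, video_frame_offset) is assumed to return
    (latent, new_video_frame_offset), mirroring the node's outputs.
    """
    offset = 0
    chunks = []
    while offset < total_frames:
        latent, offset = run_chunk(length=chunk_length,
                                   video_frame_offset=offset)
        chunks.append(latent)  # decode/concatenate downstream
    return chunks
```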