The SVD_img2vid_Conditioning node prepares conditioning data for video generation with Stable Video Diffusion. It encodes an initial image through a CLIP vision model and a VAE to produce a positive/negative conditioning pair, along with an empty latent tensor sized for the requested video. The node also embeds the parameters that control motion strength, frame rate, and noise augmentation in the generated video.
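Conceptually, the positive conditioning pairs the image embedding with the video parameters, while the negative conditioning carries the same parameters with a zeroed embedding. The sketch below illustrates that structure in plain Python; the field names and layout are assumptions for illustration, not ComfyUI's internal format:

```python
# Illustrative sketch of the positive/negative conditioning pair.
# Field names are assumptions, not ComfyUI's internal data layout.

def make_conditioning_pair(image_embed, motion_bucket_id, fps, augmentation_level):
    video_params = {
        "motion_bucket_id": motion_bucket_id,
        "fps": fps,
        "augmentation_level": augmentation_level,
    }
    positive = {"embedding": list(image_embed), **video_params}
    # Negative conditioning keeps the same video parameters but
    # replaces the image embedding with zeros.
    negative = {"embedding": [0.0] * len(image_embed), **video_params}
    return positive, negative

pos, neg = make_conditioning_pair([0.1, -0.3, 0.7],
                                  motion_bucket_id=127, fps=6,
                                  augmentation_level=0.0)
print(neg["embedding"])  # → [0.0, 0.0, 0.0]
```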

## Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| `clip_vision` | `CLIP_VISION` | Yes | - | CLIP vision model for encoding the input image |
| `init_image` | `IMAGE` | Yes | - | Initial image to use as the starting point for video generation |
| `vae` | `VAE` | Yes | - | VAE model for encoding the image into latent space |
| `width` | `INT` | Yes | 16 to MAX_RESOLUTION | Output video width (default: 1024, step: 8) |
| `height` | `INT` | Yes | 16 to MAX_RESOLUTION | Output video height (default: 576, step: 8) |
| `video_frames` | `INT` | Yes | 1 to 4096 | Number of frames to generate in the video (default: 14) |
| `motion_bucket_id` | `INT` | Yes | 1 to 1023 | Controls the amount of motion in the generated video (default: 127) |
| `fps` | `INT` | Yes | 1 to 1024 | Frames per second for the generated video (default: 6) |
| `augmentation_level` | `FLOAT` | Yes | 0.0 to 10.0 | Level of noise augmentation to apply to the input image (default: 0.0, step: 0.01) |
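The `augmentation_level` parameter adds Gaussian noise to the input image before it is encoded by the VAE; higher values loosen the generated video's fidelity to the initial frame. A minimal sketch of that step, assuming the noise is simply scaled by the level (a simplification of the actual encoder path):

```python
import random

def augment_pixels(pixels, augmentation_level, seed=0):
    """Add Gaussian noise scaled by augmentation_level.

    Illustrative sketch on a flat list of pixel values; the real node
    operates on an image tensor before VAE encoding.
    """
    rng = random.Random(seed)
    return [p + rng.gauss(0.0, 1.0) * augmentation_level for p in pixels]

pixels = [0.2, 0.5, 0.8]
# At the default level of 0.0 the image passes through unchanged.
print(augment_pixels(pixels, 0.0))  # → [0.2, 0.5, 0.8]
```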

## Outputs

| Output Name | Data Type | Description |
|---|---|---|
| `positive` | `CONDITIONING` | Positive conditioning data containing image embeddings and video parameters |
| `negative` | `CONDITIONING` | Negative conditioning data with zeroed embeddings and video parameters |
| `latent` | `LATENT` | Empty latent space tensor ready for video generation |
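The empty latent holds one entry per video frame. Assuming the 8× spatial downsampling and 4 latent channels typical of SD-family VAEs, its shape can be computed from the node's inputs as follows:

```python
def empty_svd_latent_shape(width, height, video_frames):
    # Assumption: the VAE downsamples each spatial dimension by 8
    # and produces 4 latent channels, as in SD-family models.
    return (video_frames, 4, height // 8, width // 8)

# With the node's defaults (1024x576, 14 frames):
print(empty_svd_latent_shape(1024, 576, 14))  # → (14, 4, 72, 128)
```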