This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The SV3D_Conditioning node prepares conditioning data for 3D video generation with the SV3D model. It takes an initial image and passes it through the CLIP vision and VAE encoders to produce positive and negative conditioning, along with a latent representation. It also generates camera elevation and azimuth sequences, with one entry per frame, based on the specified number of video frames.
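
As an illustration of the per-frame camera sequences described above, here is a minimal sketch. It assumes the elevation is held constant and the azimuth sweeps one full orbit (0 to 360 degrees) in equal steps; this page does not confirm that spacing. The helper name `camera_sequences` is hypothetical and not part of the node's API.

```python
def camera_sequences(video_frames: int, elevation: float = 0.0):
    """Return per-frame (elevations, azimuths) lists for a single orbit.

    Assumption: azimuth runs from 0 to 360 degrees inclusive in equal steps,
    and every frame shares the same elevation.
    """
    # max(..., 2) avoids a division by zero when only one frame is requested.
    step = 360.0 / (max(video_frames, 2) - 1)
    elevations = [elevation] * video_frames
    azimuths = [i * step for i in range(video_frames)]
    return elevations, azimuths


elevations, azimuths = camera_sequences(video_frames=21, elevation=10.0)
print(len(azimuths), azimuths[0], azimuths[1], azimuths[-1])  # 21 0.0 18.0 360.0
```

With the default of 21 frames, this spacing gives an 18-degree step between consecutive views.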

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| clip_vision | CLIP_VISION | Yes | - | The CLIP vision model used for encoding the input image |
| init_image | IMAGE | Yes | - | The initial image that serves as the starting point for 3D video generation |
| vae | VAE | Yes | - | The VAE model used for encoding the image into latent space |
| width | INT | No | 16 to MAX_RESOLUTION | The output width for the generated video frames (default: 576, must be divisible by 8) |
| height | INT | No | 16 to MAX_RESOLUTION | The output height for the generated video frames (default: 576, must be divisible by 8) |
| video_frames | INT | No | 1 to 4096 | The number of frames to generate for the video sequence (default: 21) |
| elevation | FLOAT | No | -90.0 to 90.0 | The camera elevation angle in degrees for the 3D view (default: 0.0) |

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | The positive conditioning data containing image embeddings and camera parameters for generation |
| negative | CONDITIONING | The negative conditioning data with zeroed embeddings for contrastive generation |
| latent | LATENT | An empty latent tensor with dimensions matching the specified video frames and resolution |
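
To make the latent dimensions concrete, the sketch below computes the shape such a tensor might have. It assumes a 4-channel latent and an 8x spatial downscale, which the "must be divisible by 8" constraint on width and height suggests but this page does not state. The helper name `latent_shape` and its `channels` and `downscale` parameters are hypothetical.

```python
def latent_shape(video_frames: int = 21, width: int = 576, height: int = 576,
                 channels: int = 4, downscale: int = 8):
    """Return the assumed (frames, channels, height, width) shape of the empty latent."""
    # Width and height must divide evenly by the (assumed) VAE downscale factor.
    if width % downscale or height % downscale:
        raise ValueError(f"width and height must be divisible by {downscale}")
    return (video_frames, channels, height // downscale, width // downscale)


print(latent_shape())  # (21, 4, 72, 72)
```

Under these assumptions, the default 576 by 576 resolution with 21 frames corresponds to 21 latent frames of 72 by 72 each.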