StableZero123_Conditioning

The StableZero123_Conditioning node processes an input image and camera angles to generate conditioning data and latent representations for 3D model generation. It encodes the image with a CLIP vision model, combines the resulting features with a camera embedding derived from the elevation and azimuth angles, and produces positive and negative conditioning along with a latent representation for downstream 3D generation tasks.
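The camera embedding mentioned above is typically built directly from the two angles. A minimal sketch in plain Python, assuming a Zero123-style layout (elevation in radians plus the sine and cosine of the azimuth so the angle wraps smoothly); the exact vector layout in the node's implementation may differ:

```python
import math

def camera_embedding(elevation_deg: float, azimuth_deg: float) -> list[float]:
    """Sketch of a Zero123-style camera embedding (illustrative layout only):
    the elevation in radians, plus sin/cos of the azimuth so that 180 and
    -180 degrees map to the same point."""
    return [
        math.radians(elevation_deg),
        math.sin(math.radians(azimuth_deg)),
        math.cos(math.radians(azimuth_deg)),
    ]

print(camera_embedding(0.0, 90.0))
```

This vector would then be concatenated with the CLIP vision features to form the positive conditioning.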

Inputs

| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| clip_vision | CLIP_VISION | Yes | - | The CLIP vision model used to encode image features |
| init_image | IMAGE | Yes | - | The input image to be processed and encoded |
| vae | VAE | Yes | - | The VAE model used for encoding pixels to latent space |
| width | INT | No | 16 to MAX_RESOLUTION | Output width for the latent representation (default: 256; must be divisible by 8) |
| height | INT | No | 16 to MAX_RESOLUTION | Output height for the latent representation (default: 256; must be divisible by 8) |
| batch_size | INT | No | 1 to 4096 | Number of samples to generate in the batch (default: 1) |
| elevation | FLOAT | No | -180.0 to 180.0 | Camera elevation angle in degrees (default: 0.0) |
| azimuth | FLOAT | No | -180.0 to 180.0 | Camera azimuth angle in degrees (default: 0.0) |
Note: The width and height parameters must be divisible by 8, since the node divides them by 8 to compute the latent representation dimensions.
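The size check and latent-dimension arithmetic can be sketched as follows; the `MAX_RESOLUTION` value and function name here are assumptions for illustration, not the node's actual identifiers:

```python
MAX_RESOLUTION = 16384  # assumed value; ComfyUI defines its own constant

def latent_dims(width: int, height: int) -> tuple[int, int]:
    """Validate the requested output size and return the latent-space
    width and height (each 1/8 of the pixel dimension)."""
    for name, value in (("width", width), ("height", height)):
        if not (16 <= value <= MAX_RESOLUTION):
            raise ValueError(f"{name} must be in [16, {MAX_RESOLUTION}], got {value}")
        if value % 8 != 0:
            raise ValueError(f"{name} must be divisible by 8, got {value}")
    return width // 8, height // 8

print(latent_dims(256, 256))  # (32, 32)
```

With the defaults (256x256), the latent is 32x32 spatially.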

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| positive | CONDITIONING | Positive conditioning data combining image features and camera embeddings |
| negative | CONDITIONING | Negative conditioning data with zero-initialized features |
| latent | LATENT | Latent representation with dimensions [batch_size, 4, height // 8, width // 8] |
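Putting the latent dimensions together with the batch size, the shape of the zero-initialized latent output can be sketched as below (the function name is chosen for illustration):

```python
def empty_latent_shape(batch_size: int, width: int, height: int) -> tuple[int, int, int, int]:
    """Shape of the node's latent output: batch_size samples of a
    4-channel latent at 1/8 the requested spatial resolution."""
    return (batch_size, 4, height // 8, width // 8)

print(empty_latent_shape(1, 256, 256))  # (1, 4, 32, 32)
```

A downstream sampler consumes this latent together with the positive and negative conditioning.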