This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The TextEncodeZImageOmni node is an advanced conditioning node that encodes a text prompt along with optional reference images into a conditioning format suitable for image generation models. It can process up to three images, optionally encoding them with a vision encoder and/or a VAE to produce reference latents, and integrates these visual references with the text prompt using a specific template structure.

Inputs

| Parameter | Data Type | Required | Description |
|---|---|---|---|
| clip | CLIP | Yes | The CLIP model used for tokenizing and encoding the text prompt. |
| image_encoder | CLIPVision | No | An optional vision encoder model. If provided, it is used to encode the input images, and the resulting embeddings are added to the conditioning. |
| prompt | STRING | Yes | The text prompt to be encoded. This field supports multiline input and dynamic prompts. |
| auto_resize_images | BOOLEAN | No | When enabled (default: True), input images are automatically resized based on their pixel area before being passed to the VAE for encoding. |
| vae | VAE | No | An optional VAE model. If provided, it is used to encode the input images into latent representations, which are added to the conditioning as reference latents. |
| image1 | IMAGE | No | The first optional reference image. |
| image2 | IMAGE | No | The second optional reference image. |
| image3 | IMAGE | No | The third optional reference image. |
Note: The node accepts a maximum of three images (image1, image2, image3). The image_encoder and vae inputs are only used if at least one image is provided. When auto_resize_images is True and a vae is connected, each image is resized so that its total pixel area is close to 1024×1024 (about one megapixel) before encoding.
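The auto-resize behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the target area of 1024×1024 comes from the note, while preserving aspect ratio and snapping dimensions to a multiple of 8 (a common requirement for VAE encoders) are assumptions.

```python
def auto_resize_dims(width: int, height: int,
                     target_area: int = 1024 * 1024,
                     multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the total pixel area is close to
    target_area, preserving aspect ratio.

    Snapping to a multiple of 8 is an assumption, not documented behavior.
    """
    scale = (target_area / (width * height)) ** 0.5
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h

# A 2048x2048 image is scaled down to roughly one megapixel:
print(auto_resize_dims(2048, 2048))  # -> (1024, 1024)
```

Keeping the reference images near a fixed pixel budget keeps the number of reference latents (and therefore memory use) roughly constant regardless of the input resolution.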

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| CONDITIONING | CONDITIONING | The final conditioning output, which contains the encoded text prompt and may include encoded image embeddings and/or reference latents if images were provided. |
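To show how the inputs above fit together, here is a hypothetical node entry in ComfyUI's API (prompt JSON) format. The node name and input names come from this page; the node IDs ("1", "2", "3") and the upstream nodes they would point to are placeholders for illustration only.

```python
# A single node entry as it would appear in a ComfyUI API-format workflow.
# Each ["<node_id>", <output_index>] pair references an upstream node's output;
# the IDs used here are hypothetical.
text_encode_node = {
    "class_type": "TextEncodeZImageOmni",
    "inputs": {
        "clip": ["1", 0],              # output 0 of a CLIP loader node
        "vae": ["2", 0],               # optional: enables reference latents
        "image1": ["3", 0],            # optional: first reference image
        "prompt": "a watercolor painting of a lighthouse at dusk",
        "auto_resize_images": True,
    },
}

print(text_encode_node["class_type"])  # -> TextEncodeZImageOmni
```

The resulting CONDITIONING output would then be wired into a sampler node's conditioning input, as with any other text-encoding node.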