The TextEncodeZImageOmni node is an advanced conditioning node that encodes a text prompt, along with up to three optional reference images, into a conditioning format suitable for image generation models. The images can be encoded with a vision encoder and/or a VAE to produce reference latents, and these visual references are integrated with the text prompt using a specific template structure.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| clip | CLIP | Yes | - | The CLIP model used for tokenizing and encoding the text prompt. |
| image_encoder | CLIPVision | No | - | An optional vision encoder model. If provided, it encodes the input images, and the resulting embeddings are added to the conditioning. |
| prompt | STRING | Yes | - | The text prompt to encode. Supports multiline input and dynamic prompts. |
| auto_resize_images | BOOLEAN | No | True/False | When enabled (default: True), input images are automatically resized based on their pixel area before being passed to the VAE for encoding. |
| vae | VAE | No | - | An optional VAE model. If provided, it encodes the input images into latent representations, which are added to the conditioning as reference latents. |
| image1 | IMAGE | No | - | The first optional reference image. |
| image2 | IMAGE | No | - | The second optional reference image. |
| image3 | IMAGE | No | - | The third optional reference image. |
Up to three reference images can be provided (image1, image2, image3). The image_encoder and vae inputs are only used if at least one image is provided. When auto_resize_images is True and a vae is connected, each image is resized so that its total pixel area is close to 1024x1024 before encoding.
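The auto-resize behavior described above can be sketched as follows. This is a hypothetical helper, not the node's actual implementation: it assumes each image is scaled so its pixel area approximates 1024×1024 while preserving aspect ratio, and the rounding to multiples of 8 (a common VAE stride) is an assumption.

```python
import math

TARGET_AREA = 1024 * 1024  # assumed target pixel area (1024x1024)

def auto_resize_dims(width: int, height: int) -> tuple[int, int]:
    """Hypothetical sketch of auto_resize_images: scale (width, height)
    so the total area is close to TARGET_AREA, preserving aspect ratio.
    Rounding to multiples of 8 is an assumption, not confirmed behavior."""
    scale = math.sqrt(TARGET_AREA / (width * height))
    new_w = max(8, round(width * scale / 8) * 8)
    new_h = max(8, round(height * scale / 8) * 8)
    return new_w, new_h
```

For example, a 2048×2048 input would be scaled down by a factor of 0.5, and a wide 2048×512 input would be scaled up toward the same total area, keeping its 4:1 aspect ratio.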
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| CONDITIONING | CONDITIONING | The final conditioning output, which contains the encoded text prompt and may include encoded image embeddings and/or reference latents if images were provided. |
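For downstream consumers, the conditioning can be thought of as a list of (embedding, metadata) pairs. The sketch below is an illustrative assumption of that structure using plain-list placeholders in place of real tensors; the key name `reference_latents` is an assumption based on the description above, not a confirmed API.

```python
# Hypothetical sketch of the conditioning structure this node is assumed
# to produce: a list of [embedding, metadata] pairs. Placeholder lists
# stand in for tensors; the "reference_latents" key is illustrative.
text_embedding = [[0.0] * 768 for _ in range(77)]   # placeholder for the encoded prompt
ref_latent = [[[0.0] * 64 for _ in range(64)]]      # placeholder for one VAE latent

conditioning = [[
    text_embedding,
    {"reference_latents": [ref_latent]},  # present only when a VAE and images are given
]]

embedding, meta = conditioning[0]
num_refs = len(meta.get("reference_latents", []))  # number of encoded reference images
```

If no images (or no VAE) were supplied, the metadata dict would simply lack the reference entries and the output would carry only the text encoding.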