This documentation was AI-generated. If you find any errors or have suggestions for improvement, please feel free to contribute!
The TextEncodeZImageOmni node is an advanced conditioning node that encodes a text prompt along with optional reference images into a conditioning format suitable for image generation models. It can process up to three images, optionally encoding them with a vision encoder and/or a VAE to produce reference latents, and integrates these visual references with the text prompt using a specific template structure.

Inputs

| Parameter | Data Type | Required | Description |
|---|---|---|---|
| clip | CLIP | Yes | The CLIP model used for tokenizing and encoding the text prompt. |
| image_encoder | CLIPVision | No | An optional vision encoder model. If provided, it is used to encode the input images, and the resulting embeddings are added to the conditioning. |
| prompt | STRING | Yes | The text prompt to be encoded. This field supports multiline input and dynamic prompts. |
| auto_resize_images | BOOLEAN | No | When enabled (default: True), input images are automatically resized based on their pixel area before being passed to the VAE for encoding. |
| vae | VAE | No | An optional VAE model. If provided, it is used to encode the input images into latent representations, which are added to the conditioning as reference latents. |
| image1 | IMAGE | No | The first optional reference image. |
| image2 | IMAGE | No | The second optional reference image. |
| image3 | IMAGE | No | The third optional reference image. |
Note: The node accepts a maximum of three images (image1, image2, image3). The image_encoder and vae inputs are only used if at least one image is provided. When auto_resize_images is True and a vae is connected, each image is resized so that its total pixel area is close to 1024×1024 (about one megapixel) before encoding.
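The auto-resize behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the target area of 1024×1024 comes from the note, while preserving aspect ratio and snapping dimensions to a multiple of 8 (a common requirement for VAE encoders) are assumptions.

```python
def auto_resize_dims(width: int, height: int,
                     target_area: int = 1024 * 1024,
                     multiple: int = 8) -> tuple[int, int]:
    """Scale (width, height) so the total pixel area is close to
    target_area, preserving aspect ratio.

    Snapping to a multiple of 8 is an assumption, not documented behavior.
    """
    scale = (target_area / (width * height)) ** 0.5
    new_w = max(multiple, round(width * scale / multiple) * multiple)
    new_h = max(multiple, round(height * scale / multiple) * multiple)
    return new_w, new_h

# A 2048x2048 image is scaled down to roughly one megapixel:
print(auto_resize_dims(2048, 2048))  # -> (1024, 1024)
```

Keeping the reference images near a fixed pixel budget keeps the number of reference latents (and therefore memory use) roughly constant regardless of the input resolution.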

Outputs

| Output Name | Data Type | Description |
|---|---|---|
| CONDITIONING | CONDITIONING | The final conditioning output, which contains the encoded text prompt and may include encoded image embeddings and/or reference latents if images were provided. |
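To show how the inputs above fit together, here is a hypothetical node entry in ComfyUI's API (prompt JSON) format. The node name and input names come from this page; the node IDs ("1", "2", "3") and the upstream nodes they would point to are placeholders for illustration only.

```python
# A single node entry as it would appear in a ComfyUI API-format workflow.
# Each ["<node_id>", <output_index>] pair references an upstream node's output;
# the IDs used here are hypothetical.
text_encode_node = {
    "class_type": "TextEncodeZImageOmni",
    "inputs": {
        "clip": ["1", 0],              # output 0 of a CLIP loader node
        "vae": ["2", 0],               # optional: enables reference latents
        "image1": ["3", 0],            # optional: first reference image
        "prompt": "a watercolor painting of a lighthouse at dusk",
        "auto_resize_images": True,
    },
}

print(text_encode_node["class_type"])  # -> TextEncodeZImageOmni
```

The resulting CONDITIONING output would then be wired into a sampler node's conditioning input, as with any other text-encoding node.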