This node prepares data for training by encoding images and text. It takes a list of images and a corresponding list of text captions, then uses a VAE model to convert the images into latent representations and a CLIP model to convert the captions into conditioning data. The resulting paired latents and conditioning are output as lists, ready for use in training workflows.
Inputs
| Parameter | Data Type | Required | Range | Description |
|---|---|---|---|---|
| `images` | IMAGE | Yes | N/A | List of images to encode. |
| `vae` | VAE | Yes | N/A | VAE model for encoding images into latents. |
| `clip` | CLIP | Yes | N/A | CLIP model for encoding text into conditioning. |
| `texts` | STRING | No | N/A | List of text captions. Can have length n (matching the images), 1 (repeated for all images), or 0 (an empty string is used for all images). |
- The number of items in the `texts` list must be 0, 1, or exactly match the number of items in the `images` list. If it is 0, an empty string is used for all images. If it is 1, that single text is repeated for all images.
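The caption-broadcasting rule above can be sketched as a small helper. This is an illustrative sketch of the documented behavior, not the node's actual implementation; the function name `broadcast_texts` is an assumption.

```python
def broadcast_texts(texts, num_images):
    """Expand a caption list to match the number of images.

    Mirrors the documented rule: 0 texts -> empty strings for all images,
    1 text -> repeated for all images, n texts -> must equal num_images.
    (Illustrative sketch only, not the node's real code.)
    """
    if not texts:
        return [""] * num_images
    if len(texts) == 1:
        return texts * num_images
    if len(texts) != num_images:
        raise ValueError(
            f"texts must have 0, 1, or {num_images} items, got {len(texts)}"
        )
    return texts

print(broadcast_texts([], 3))         # three empty strings
print(broadcast_texts(["a cat"], 3))  # the single caption repeated three times
```

Any other list length raises an error rather than silently truncating or recycling captions.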
Outputs
| Output Name | Data Type | Description |
|---|---|---|
| `latents` | LATENT | List of latent dicts, one per input image. |
| `conditioning` | CONDITIONING | List of conditioning lists, one per input image. |
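To make the output pairing concrete, the sketch below builds one latent dict and one conditioning entry per image using toy stand-in encoders. The function name `encode_pairs` and the encoder callables are hypothetical; only the overall shape (latent dicts keyed by `samples`, paired element-wise with conditioning) follows the node's documented outputs.

```python
def encode_pairs(images, captions, encode_image, encode_text):
    """Pair each image's latent with its caption's conditioning.

    Illustrative sketch: encode_image / encode_text stand in for the
    VAE and CLIP encode calls, which are not shown here.
    """
    latents = [{"samples": encode_image(img)} for img in images]  # one latent dict per image
    conditioning = [encode_text(txt) for txt in captions]         # one conditioning entry per caption
    return latents, conditioning

# Toy encoders standing in for the real models:
lat, cond = encode_pairs(
    ["img0", "img1"],
    ["a cat", "a dog"],
    encode_image=lambda im: f"latent({im})",
    encode_text=lambda tx: [[f"emb({tx})", {}]],
)
print(lat)   # two latent dicts, in input order
print(cond)  # two conditioning entries, in the same order
```

Because both outputs are lists of the same length in the same order, downstream training nodes can consume them element-wise.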