This Python script uses the Stable Diffusion XL model from the Hugging Face Diffusers library to generate images based on text prompts. Additionally, it utilizes the Google Generative AI Gemini model to generate relevant keywords for each prompt, which are then added as IPTC metadata to the generated images.
Before running the script, ensure that you have the following:
- Python 3.7 or later installed on your system.
- A Google Cloud Platform account with the Gemini API enabled and an API key generated. Replace
YOUR_GEMINI_API_KEY
in the code with your actual Gemini API key. - Google Colab or a local environment with CUDA support for GPU acceleration (recommended for faster image generation).
- Install the required Python packages by running the following command: !pip install diffusers torch accelerate clip pyexiv2 Copy code
- If running on Google Colab, mount your Google Drive:
from google.colab import drive
drive.mount('/content/drive')
Prepare a text file containing the prompts for image generation, with one prompt per line. Upload the text file to the script when prompted. The script will generate num_images (default is 5) for each prompt and save them in the given public folder on your Google Drive. The generated images will have relevant keywords added as IPTC metadata, which can be useful for image search and organization.
You can customize the following parameters in the script: num_images: The number of images to generate for each prompt (default is 5). PUBLIC_FOLDER_ID: The path to the folder where the generated images will be saved ( Note: the folder should be made public)
The script assumes that you have a CUDA-enabled GPU available. If you're running on a CPU, the image generation process will be significantly slower. The script requires a stable internet connection to generate images and fetch keywords from the Gemini model. Generating high-resolution images may consume significant computational resources and time.