Nazgul Salikhova
Email: [email protected]
Group: B22-AAI-02
The goal of this project is to fine-tune a diffusion model on an emoji dataset so that new emojis can be generated from text prompts.
The dataset consists of emojis from the Kaggle Emoji Image Dataset. A subset of 2,000 images (mainly Windows-style emojis) was used for training.
- Fine-tuning a Stable Diffusion model for emoji generation.
- Emoji dataset preprocessing, including cropping and augmentation.
- Text-to-image generation using custom prompts.
- Diffusion Models: Stable Diffusion (`CompVis/stable-diffusion-v1-4`)
- Frameworks: PyTorch, HuggingFace Diffusers
- Libraries: `torchvision`, `datasets`, `bitsandbytes`
- Optimizer: 8-bit Adam optimizer for memory efficiency
- Scheduler: Cosine learning rate scheduler with warm-up steps
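The optimizer and scheduler can be set up as in the minimal sketch below. The warm-up length and total step count are illustrative assumptions, not values taken from the notebook.

```python
# Minimal sketch of the optimizer/scheduler setup; step counts are assumed.
import bitsandbytes as bnb
from diffusers import UNet2DConditionModel
from diffusers.optimization import get_cosine_schedule_with_warmup

# Only the UNet is fine-tuned, so only its parameters go into the optimizer.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

# 8-bit Adam from bitsandbytes keeps optimizer state in 8 bits to save GPU memory.
optimizer = bnb.optim.Adam8bit(unet.parameters(), lr=1e-4)

# Cosine learning-rate decay with a linear warm-up phase.
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,        # assumed warm-up length
    num_training_steps=10_000,   # assumed total number of optimizer steps
)
```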
- Dataset Preparation:
  - Download emoji images and annotations.
  - Preprocess them with a custom PyTorch Dataset class (see the sketch right after this list).
- Model Setup:
  - Load the pretrained Stable Diffusion components.
  - Fine-tune the UNet on the emoji images.
- Training:
  - Train the model for 20 epochs with batched gradient accumulation (a condensed loop is sketched after the hyperparameters below).
- Inference:
  - Generate images from custom text prompts (see the inference sketch after the example prompts).
  - Display the results and save them to a specified folder.
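The custom Dataset class mentioned in the Dataset Preparation step could look like the sketch below. The folder layout and the `file`/`caption` column names in the annotations CSV are assumptions for illustration; the actual preprocessing in the notebook may differ in detail.

```python
# Sketch of an emoji Dataset: crop/resize, light augmentation, caption lookup.
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class EmojiDataset(Dataset):
    def __init__(self, image_dir, annotations_csv, size=256):
        self.image_dir = image_dir
        self.annotations = pd.read_csv(annotations_csv)  # assumed columns: file, caption
        # Crop/resize to the training resolution and map pixels to [-1, 1],
        # the range the Stable Diffusion VAE expects.
        self.transform = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.RandomHorizontalFlip(),   # simple augmentation
            transforms.ToTensor(),
            transforms.Normalize([0.5], [0.5]),
        ])

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, idx):
        row = self.annotations.iloc[idx]
        image = Image.open(f"{self.image_dir}/{row['file']}").convert("RGB")
        return {"pixel_values": self.transform(image), "caption": row["caption"]}
```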
- Image Size: 256x256
- Batch Size: 4
- Epochs: 20
- Learning Rate: 1e-4
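A condensed sketch of the training loop under these hyperparameters is shown below. It follows the standard Diffusers text-to-image fine-tuning recipe (freeze the VAE and text encoder, train only the UNet on the noise-prediction loss) and reuses the `unet`, `optimizer`, `lr_scheduler`, and `EmojiDataset` objects from the earlier sketches. The gradient-accumulation factor and file paths are assumptions, not values from the notebook.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from diffusers import AutoencoderKL, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"

# Frozen components: VAE for the latent space, CLIP for the text conditioning.
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder").to(device)
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device)
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
unet.to(device)  # the UNet created in the optimizer sketch above

# Paths are placeholders; the batch size matches the hyperparameters above.
train_dataloader = DataLoader(
    EmojiDataset("emoji_images/", "emoji_annotations.csv"),
    batch_size=4, shuffle=True,
)

accumulation_steps = 4  # assumed gradient-accumulation factor

for epoch in range(20):
    for step, batch in enumerate(train_dataloader):
        # Encode images into VAE latents (0.18215 is the SD scaling factor).
        latents = vae.encode(batch["pixel_values"].to(device)).latent_dist.sample() * 0.18215

        # Standard DDPM objective: add noise at a random timestep ...
        noise = torch.randn_like(latents)
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps,
            (latents.shape[0],), device=device,
        )
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

        # ... condition on the tokenized emoji captions ...
        tokens = tokenizer(
            batch["caption"], padding="max_length", truncation=True,
            max_length=tokenizer.model_max_length, return_tensors="pt",
        ).to(device)
        encoder_hidden_states = text_encoder(tokens.input_ids)[0]

        # ... and train the UNet to predict that noise.
        noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
        loss = F.mse_loss(noise_pred, noise) / accumulation_steps
        loss.backward()

        # Accumulate gradients over several small batches before stepping.
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
```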
Example Prompts:
- "Angry Unicorn"
- "Dancing Lemon with Sunglasses"
- "Cat with a Top Hat"
- "Jealous Face"
Generated emoji images were evaluated visually for their alignment with the prompts.
The rest of the results can be found in the accompanying .ipynb file.