This is the PyTorch implementation of JoJoGAN: One Shot Face Stylization.
Abstract:
While there have been recent advances in few-shot image stylization, these methods fail to capture stylistic details that are obvious to humans. Details such as the shape of the eyes and the boldness of the lines are especially difficult for a model to learn, particularly in a limited-data setting. In this work, we aim to perform one-shot image stylization that gets the details right. Given a reference style image, we approximate paired real data using GAN inversion and finetune a pretrained StyleGAN using that approximate paired data. We then encourage the StyleGAN to generalize so that the learned style can be applied to all other images.
Follow this YouTube tutorial for an easier walkthrough of the installation process, and if you have any questions, feel free to join my Discord and ask there. The code is mostly taken from the official Google Colab notebook and modified for local use.
Step 0: Download Anaconda and download this repository.
Step 1:
conda create -n jojo python=3.7
conda activate jojo
cd <your codes file directory here>
Step 2 option 1: 30 series NVIDIA GPU
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Step 2 option 2: non-30-series NVIDIA GPU
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
Step 2 option 3: CPU only (no NVIDIA GPU)
conda install pytorch torchvision torchaudio cpuonly -c pytorch
Step 3:
pip install -r requirements.txt
pip install cmake
pip install dlib==19.20
conda install -c conda-forge ffmpeg
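Before moving on, a quick sanity check can confirm that the packages from Steps 2 and 3 are importable and suggest which value to pass as --device later. This is a minimal sketch: the module names (torch, torchvision, dlib) are assumed from the install commands above, and check_install and suggest_device are hypothetical helpers, not part of this repository.

```python
import importlib.util


def check_install(modules=("torch", "torchvision", "dlib")):
    """Map each module name to whether it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def suggest_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet, so cpu is the only safe suggestion.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    for name, found in check_install().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
    print("suggested --device:", suggest_device())
```

If torch reports cpu on a machine with an NVIDIA GPU, the cudatoolkit version chosen in Step 2 is the first thing to double-check.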
Checkpoints (required):
- stylegan2-ffhq-config-f.pt
- e4e_ffhq_encode.pt
- restyle_psp_ffhq_encode.pt
- dlibshape_predictor_68_face_landmarks.dat
Pretrained style models (optional):
- arcane_caitlyn.pt
- arcane_caitlyn_preserve_color.pt
- arcane_jinx_preserve_color.pt
- arcane_jinx.pt
- arcane_multi_preserve_color.pt
- arcane_multi.pt
- sketch_multi.pt
- disney.pt
- disney_preserve_color.pt
- jojo.pt
- jojo_preserve_color.pt
- jojo_yasuho.pt
- jojo_yasuho_preserve_color.pt
- art.pt
Folder structure:
📂JoJoGAN/ # this is root
├── 📂models/
│ ├── 📜stylegan2-ffhq-config-f.pt
│ ├── 📜e4e_ffhq_encode.pt
│ ├── 📜restyle_psp_ffhq_encode.pt
│ ├── 📜dlibshape_predictor_68_face_landmarks.dat
│ ├── 📜<any pretrained style models>
│ │...
│...
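To confirm your models folder matches the diagram before running anything, a small script can list which expected files are missing. This is a sketch under two assumptions: it is run from the repository root, and a style model passed as --model_name is stored as models/<model_name>.pt (inferred from the file names listed above, not confirmed from the scripts). missing_files is a hypothetical helper.

```python
from pathlib import Path

# Required checkpoints, copied from the list above.
REQUIRED = [
    "stylegan2-ffhq-config-f.pt",
    "e4e_ffhq_encode.pt",
    "restyle_psp_ffhq_encode.pt",
    "dlibshape_predictor_68_face_landmarks.dat",
]


def missing_files(models_dir="models", style_models=()):
    """Return the expected files that are absent from models_dir.

    style_models holds names as passed to --model_name (e.g. "jojo");
    the ".pt" suffix is an assumption about how they are stored.
    """
    root = Path(models_dir)
    expected = REQUIRED + [f"{name}.pt" for name in style_models]
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(style_models=["jojo"])
    print("all files present" if not missing else f"missing: {missing}")
```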
Download the pretrained style models and put them in the models folder, as shown in the diagram above. Put the input image in the test_input folder. For image_name below, you don't need to provide the file path, just the file name.
python evaluate.py --input <image_name> --model_name <model_name> --seed <random_seed> --device <cuda/cpu>
e.g.:
python evaluate.py --device cuda --input iu.jpeg --model_name jojo --seed 3000
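To stylize every image in test_input with the same model, you can call evaluate.py once per file. The sketch below only builds the command lines from the flags documented above; the accepted image extensions and the build_command and batch_commands helpers are assumptions for illustration.

```python
import sys
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed input formats


def build_command(image_name, model_name, seed=3000, device="cuda"):
    """Build one evaluate.py invocation using only the documented flags."""
    return [
        sys.executable, "evaluate.py",
        "--input", image_name,        # file name only, no path
        "--model_name", model_name,
        "--seed", str(seed),
        "--device", device,
    ]


def batch_commands(input_dir="test_input", **kwargs):
    """Return one command per image file in input_dir, sorted by file name."""
    files = sorted(p for p in Path(input_dir).iterdir()
                   if p.suffix.lower() in IMAGE_EXTS)
    return [build_command(p.name, **kwargs) for p in files]
```

From the repository root, each command can then be run with subprocess.run(cmd, check=True). Using sys.executable keeps the calls inside the active jojo conda environment.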
Put the input video in the test_input folder. For video_name below, you don't need to provide the file path, just the file name.
python evaluate_video.py --input <video_name> --model_name <model_name> --seed <random_seed> --device <cuda/cpu>
e.g.:
python evaluate_video.py --device cuda --input elon.mp4 --model_name jojo --seed 3000
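Because video stylization can take a while, it may help to check up front that the input file exists and that ffmpeg (installed in Step 3) is on your PATH. This is a sketch; the assumption that evaluate_video.py relies on the ffmpeg binary is inferred from Step 3, not stated in this README, and preflight is a hypothetical helper.

```python
import shutil
from pathlib import Path


def preflight(video_name, input_dir="test_input"):
    """Return a list of problems found before running evaluate_video.py."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (see Step 3)")
    if not (Path(input_dir) / video_name).is_file():
        problems.append(f"{video_name} not found in {input_dir}/")
    return problems


if __name__ == "__main__":
    issues = preflight("elon.mp4")
    print("ready" if not issues else "\n".join(issues))
```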
Add images with the same style to the style_images folder. See the examples inside that folder.
python train_custom_style.py --model_name <new_name> --alpha <alpha_value> --preserve_color <True/False> --num_iter <number_of_iterations> --device <cuda/cpu>
- model_name: a name for your new model, e.g. based on the style images.
- alpha: determines the strength of the style; a float between 0 and 1, where 0 = strongest and 1 = weakest.
- preserve_color: whether to keep the colors of the input image instead of transferring the colors of the style images. This should be a boolean, True or False.
- num_iter: number of training iterations. 300 to 500 iterations is usually fine.
- device: if you don't have an NVIDIA GPU with CUDA, use cpu. Otherwise, use cuda (basically the default, so you don't need to set it).
e.g.:
python train_custom_style.py --model_name custom --alpha 0.0 --preserve_color False --num_iter 300 --device cuda
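One way to picture what alpha and preserve_color control is the style-mixing step used during finetuning in the upstream JoJoGAN notebook: on a subset of StyleGAN layers, the inverted latent of the style image is blended with a randomly sampled latent. The sketch below reproduces that idea with plain Python lists standing in for per-layer latent vectors. The layer indices and the 1 - alpha flip are recalled from the upstream notebook and are assumptions here; check them against train_custom_style.py. swap_layers and mix_latents are hypothetical helpers.

```python
N_LATENT = 18  # number of W+ layers for a 1024x1024 StyleGAN2 generator


def swap_layers(preserve_color, n_latent=N_LATENT):
    """Layers whose latents get mixed with a random latent (assumed values).

    Layers 0-6 always keep the style latent; with preserve_color set,
    a smaller subset of layers is mixed instead.
    """
    if preserve_color:
        return [9, 11, 15, 16, 17]
    return list(range(7, n_latent))


def mix_latents(style_w, random_w, alpha, preserve_color):
    """Blend per-layer latents; alpha=0 keeps the style latent unchanged."""
    keep = 1.0 - alpha  # user-facing alpha is flipped, as in the notebook
    mixed = [list(layer) for layer in style_w]  # copy, one vector per layer
    for i in swap_layers(preserve_color, len(style_w)):
        mixed[i] = [keep * s + (1.0 - keep) * r
                    for s, r in zip(style_w[i], random_w[i])]
    return mixed
```

Under this reading, alpha = 0 finetunes the generator on the exact inverted style latents, which fits the reference most closely; a larger alpha feeds more random variation into the mixed layers, so the learned style comes out weaker.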
To evaluate the model, follow the same steps as above, changing model_name to the one you just created. For example:
python evaluate.py --device cuda --input iu.jpeg --model_name custom --seed 3000
If the face in your style image cannot be detected, you can try force_train.py. This is how I trained the colossal model. Save this image, open it in Photoshop or Photopea, and align the style image you want with the features of this Colossal Titan: eyes to eyes, nose to nose, ears to ears, and jaw to jaw if possible. The more accurate, the better. Put the result in the style_images_aligned folder and run:
python force_train.py --model_name <insert_name_here> --force_name <insert_style_image_here> --num_iter 300 --device cuda
e.g.:
python force_train.py --model_name colossal --force_name colossal --num_iter 300 --device cuda
After training, you can evaluate the model like any other model.
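If you would rather compute the alignment than drag the image around by hand, the "eyes to eyes, nose to nose" matching can be expressed as a 2D similarity transform (rotation, uniform scale, translation) fitted by least squares to a few point pairs that you mark on both images. This is an alternative technique, not what force_train.py does, and fit_similarity and apply_similarity are hypothetical helpers. Points are written as complex numbers x + yj, so the transform is simply w = a*z + b.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src points onto dst.

    src, dst: equal-length lists of complex numbers (x + y*1j), e.g. eye
    corners and nose tip marked on the style image and on the reference.
    Returns (a, b) such that dst ~= a * src + b; abs(a) is the scale and
    the angle of a is the rotation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    # Closed-form least squares on centered points.
    num = sum((s - src_mean).conjugate() * (d - dst_mean)
              for s, d in zip(src, dst))
    den = sum(abs(s - src_mean) ** 2 for s in src)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b


def apply_similarity(a, b, point):
    """Map a single (x, y) point with the fitted transform."""
    z = a * complex(*point) + b
    return z.real, z.imag
```

The fitted scale abs(a) and rotation math.degrees(cmath.phase(a)) can then be typed into the editor's transform dialog before placing the translated image.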
My fork's edits end here.
- 2022-02-03: Updated the paper. Improved stylization quality using discriminator perceptual loss. Added sketch model.
- 2021-12-26: Added wandb logging. Fixed a finetuning bug that began finetuning from the previously loaded checkpoint instead of the base face model. Added art model.
- 2021-12-25: Added arcane_multi model, which is trained on 4 Arcane faces instead of 1 (if anyone has more clean data, let me know!). Better preserves features.
- 2021-12-23: Paper uploaded to arXiv.
- 2021-12-22: Integrated into Hugging Face Spaces 🤗 using Gradio. Try it out!
- 2021-12-22: Added pydrive authentication to avoid download limits from gdrive! Fixed running on CPU in Colab.
Everything to get started is in the colab notebook.
If you use this code or ideas from our paper, please cite our paper:
@article{chong2021jojogan,
title={JoJoGAN: One Shot Face Stylization},
author={Chong, Min Jin and Forsyth, David},
journal={arXiv preprint arXiv:2112.11641},
year={2021}
}
This code borrows from StyleGAN2 by rosinality and from e4e. Some snippets of the Colab code come from StyleGAN-NADA.