We propose a novel approach to generate 3D scenes (both animated and static) from text using a Transformer-based NLP architecture and a non-differentiable renderer.
- Blender (version 2.78c)
- PyTorch (==1.2.0)
- Transformers (==3.0.2)
- NumPy
- pickle
- OpenCV
A sample environment .yml file has been included for reference.
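For orientation, such an environment file might look roughly like the sketch below. This is an illustration assembled from the dependency list above, not the shipped file: the environment name, Python version, and conda channel layout are assumptions, and pickle is omitted because it ships with Python. The included .yml is authoritative.

```yaml
# Hypothetical sketch; defer to the .yml shipped with the repository.
name: text2scene
dependencies:
  - python=3.6        # assumed; check the shipped file
  - numpy
  - opencv
  - pip
  - pip:
      - torch==1.2.0
      - transformers==3.0.2
```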
All other examples are under the `Output` folder.
To run prediction on the model Mstatic:

```shell
cd scripts
python runner.py --type "image" --target "image" --pred_count 15
```
Here, `pred_count` specifies the number of predictions to run. For evaluation, 64 sample test files have been attached.
To run prediction on the model Manimated:

```shell
cd scripts
python runner.py --type "video" --target "video" --pred_count 15
```
To run prediction on the model Mfull:

```shell
cd scripts
python runner.py --type "combined" --target "image" --pred_count 15
```
Replace `--target` with `"video"` to generate videos instead. An image takes around 3-4 seconds to render; a video takes around 2-4 minutes.
All generated images (static scenes) and videos (animated scenes) are saved in the `Output` folder.
To run evaluation on the model Mstatic:

```shell
cd scripts
python runner.py --type "image" --sector "evaluate"
```
To run evaluation on the model Manimated:

```shell
cd scripts
python runner.py --type "video" --sector "evaluate"
```
To run evaluation on the model Mfull:

```shell
cd scripts
python runner.py --type "combined" --sector "evaluate"
```
To run prediction on a single description with the model Mstatic:

```shell
cd scripts
python runner.py --type "image" --sector "predict_single" --description <YOUR_DESCRIPTION>
```
To run prediction on a single description with the model Manimated:

```shell
cd scripts
python runner.py --type "video" --sector "predict_single" --description <YOUR_DESCRIPTION>
```
To run prediction on a single description with the model Mfull:

```shell
cd scripts
python runner.py --type "combined" --sector "predict_single" --description <YOUR_DESCRIPTION>
```
Our dataset is built on top of the CLEVR dataset, which includes JSON files describing all the scenes it uses. We take these JSON files and generate 13 kinds of scene descriptions for each scene. Follow these steps to generate the dataset:
- Download the CLEVR dataset
- Set the JSON file path on line 5 of `generate_description_numpy.py`
- To generate image descriptions, import the templates from `description_template_numpy.py`; for video descriptions, import the templates from `description_template_video_numpy.py` (line 1)
- Run the script:

```shell
python generate_description_numpy.py
```
- Pre-calculate the pickle files of TransformerXL outputs for training and testing the model using the provided notebook.
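The description-generation step above can be sketched roughly as follows. This is a minimal illustration assuming the standard CLEVR scenes JSON layout (scenes whose objects carry `size`, `color`, `material`, and `shape` fields); `describe_scene` is a hypothetical stand-in for the 13 templates that actually live in `description_template_numpy.py`.

```python
def describe_scene(scene):
    # Build one clause per object from its CLEVR attributes.
    parts = ["a {} {} {} {}".format(o["size"], o["color"], o["material"], o["shape"])
             for o in scene["objects"]]
    return "There is " + " and ".join(parts) + "."

# A tiny hand-written scene in the CLEVR object format.
scene = {"objects": [
    {"size": "large", "color": "red", "material": "metal", "shape": "cube"},
    {"size": "small", "color": "blue", "material": "rubber", "shape": "sphere"},
]}
print(describe_scene(scene))
# With the real dataset: scenes = json.load(open(path))["scenes"]
```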
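For the pre-calculation step, the cached features are plain pickle files. Below is a minimal sketch of the caching pattern; the random arrays are stand-ins for the real TransformerXL hidden states, and the 1024-dimensional hidden size and per-description dict layout are assumptions, not the notebook's exact format.

```python
import os
import pickle
import tempfile

import numpy as np

def precompute_embeddings(descriptions, dim=1024):
    # Stand-in for running TransformerXL: one (num_tokens, dim) float32
    # array per description. Replace with real model outputs in practice.
    return {d: np.random.rand(len(d.split()), dim).astype(np.float32)
            for d in descriptions}

descriptions = [
    "There is a large red metal cube.",
    "There is a small blue rubber sphere.",
]
cache = precompute_embeddings(descriptions)

# Dump the cache once, then load it during training/testing.
path = os.path.join(tempfile.mkdtemp(), "transformerxl_features.pkl")
with open(path, "wb") as f:
    pickle.dump(cache, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)
```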