This file is the deliverable for the first exercise of the course 194.077 Applied Deep Learning at TU Wien: https://tiss.tuwien.ac.at/course/courseDetails.xhtml?dswid=4761&dsrid=925&courseNr=194077&semester=2019W
My project aims to build a three-dimensional generative adversarial network (GAN) for generating voxel models, using a three-dimensional capsule network. CNNs are quite good at detecting features, but they do not take part-to-whole relations into account. The following picture illustrates this:
Source: https://towardsdatascience.com/capsule-networks-the-new-deep-learning-network-bd917e6818e8
This picture looks somewhat like a face, but more like an abstract painting than an actual face. All of the features required for a face are there, but their alignment is odd, and so are their relative sizes. In two-dimensional settings this is not optimal, but the results of CNNs are still good enough that it is not a problem in practice. In three-dimensional settings this is different: the extra dimension makes spatial arrangement more important, especially when generating models.
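The point about features being present but badly arranged can be sketched with a toy example. The snippet below is only an illustration under strong assumptions: the four "part detector" channels, the `place` helper, and the use of global max pooling are all hypothetical simplifications, and real CNNs pool locally and keep part of the spatial information.

```python
import numpy as np


def global_max_pool(feature_maps: np.ndarray) -> np.ndarray:
    """Reduce (channels, H, W) feature maps to one max activation per channel."""
    return feature_maps.reshape(feature_maps.shape[0], -1).max(axis=1)


def place(positions, size=8):
    """Hypothetical helper: one channel per face part, active at one position."""
    maps = np.zeros((len(positions), size, size))
    for channel, (row, col) in enumerate(positions):
        maps[channel, row, col] = 1.0
    return maps


# Toy channels: left eye, right eye, nose, mouth.
face = place([(2, 2), (2, 5), (4, 3), (6, 3)])       # plausible arrangement
scrambled = place([(6, 1), (0, 7), (1, 1), (3, 6)])  # same parts, odd layout

# After pooling, both inputs yield the same descriptor:
# the parts are detected, but their arrangement is lost.
print(global_max_pool(face))
print(global_max_pool(scrambled))
```

In this extreme case the pooled descriptor cannot distinguish the two layouts, which is the kind of information loss that capsule networks try to avoid.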
Capsule networks were introduced by Geoffrey Hinton, who is not pleased with the pooling operations in CNNs:
> The pooling operation used in convolutional neural networks is a big mistake and the fact that it works so well is a disaster. (Source)
Capsule networks are a relatively new concept, but they have already been used for GANs and for 3D data. To the best of my knowledge, this is the first attempt to use a GAN with a capsule network to generate voxel models.
Machine learning with 3D objects or scenes is a relatively new area, so there is not as much annotated data available as for image recognition. Luckily, this has started to change in recent years; here you can find a good overview of available datasets.
The dataset of my choice is ModelNet40, which consists of 12,311 models from 40 categories. The models are polygon meshes, so I have to convert them into voxel models first. I'll do this using PyMesh; alternatively, I could use binvox together with Gmsh. I plan to set the voxel grid to 64x64x64 as a compromise between computational effort and quality of the results, but this might be subject to change.
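To make the conversion target concrete, here is a minimal sketch of what a 64x64x64 occupancy grid looks like. It deliberately uses only NumPy and does not reproduce the PyMesh or binvox workflow: the function `voxelize_points` is a hypothetical helper, it marks only voxels hit by sampled surface points (no solid fill), and the sphere is stand-in data rather than a ModelNet40 model.

```python
import numpy as np


def voxelize_points(points: np.ndarray, resolution: int = 64) -> np.ndarray:
    """Map an (N, 3) array of surface points to a boolean occupancy grid."""
    points = np.asarray(points, dtype=np.float64)
    mins = points.min(axis=0)
    # Scale uniformly by the longest side so the aspect ratio is preserved.
    extent = (points.max(axis=0) - mins).max()
    if extent == 0:
        extent = 1.0  # degenerate input: all points coincide
    scaled = (points - mins) / extent * (resolution - 1)
    idx = np.clip(np.rint(scaled).astype(np.int64), 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid


# Stand-in for points sampled from a mesh surface: random points on a unit sphere.
rng = np.random.default_rng(0)
pts = rng.normal(size=(20000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

grid = voxelize_points(pts, resolution=64)
print(grid.shape, grid.size, int(grid.sum()))
```

A 64x64x64 grid has 262,144 cells per model, eight times as many as a 32x32x32 grid, which is where the trade-off between computational effort and result quality comes from.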
Bring your own method.
Task | Hours |
---|---|
Getting familiar with the data / used libraries | 10 |
In-depth reading of related publications | 10 |
Coding of solution | 25 |
Creating presentation of results | 10 |
- Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." Advances in neural information processing systems. 2017.
- Jaiswal, Ayush, et al. "Capsulegan: Generative adversarial capsule network." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
- Cheraghian, Ali, and Lars Petersson. "3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds." 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019.
- Wu, Jiajun, et al. "Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling." Advances in neural information processing systems. 2016.
- Brock, Andrew, et al. "Generative and discriminative voxel modeling with convolutional neural networks." arXiv preprint arXiv:1608.04236 (2016).
- PyTorch implementation of capsule networks by @gram-ai
- TensorFlow implementation of a GAN using capsule networks by @gusgad
- PyTorch implementation of 3D Point Capsule Networks by @yongheng1991
- Theano implementation of Voxel-Based Variational Autoencoders by @ajbrock
- General resources about 3D machine learning by @timzhang642
I am aware that I might be biting off more than I can chew, but whatever the final result will be, the journey is its own reward 😃