This is the project repository for CMU 10-615 Art and Machine Learning Project 3. In this project, we created a musique concrète using NSynth, a neural audio synthesis model, and Wav2CLIP, a model that embeds audio in the CLIP space and enables audio-driven image generation. Through our work, we seek to explore the themes of fusion and transmutation.
Models: We applied the NSynth model developed by Magenta, following this Colab Notebook, and Ho-Hsiang Wu's implementation of Wav2CLIP.
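The snippet below is a minimal sketch of how these two models are typically driven, not our exact pipeline: NSynth's `fastgen` module encodes two source sounds, averages their latent encodings to fuse them, and decodes the blend, while the pip-installable `wav2clip` package embeds the result into the CLIP space. The file names and checkpoint path are placeholders.

```python
# Minimal sketch: blending two sounds with NSynth and embedding the result
# with Wav2CLIP. File names and the checkpoint path are placeholders.
from magenta.models.nsynth import utils
from magenta.models.nsynth.wavenet import fastgen
import wav2clip

SR = 16000              # NSynth operates on 16 kHz audio
SAMPLE_LENGTH = 64000   # 4 seconds at 16 kHz
CKPT = "wavenet-ckpt/model.ckpt-200000"  # placeholder checkpoint path

# Load two source sounds and encode them into NSynth's latent space.
audio_a = utils.load_audio("sound_a.wav", sample_length=SAMPLE_LENGTH, sr=SR)
audio_b = utils.load_audio("sound_b.wav", sample_length=SAMPLE_LENGTH, sr=SR)
enc_a = fastgen.encode(audio_a, CKPT, SAMPLE_LENGTH)
enc_b = fastgen.encode(audio_b, CKPT, SAMPLE_LENGTH)

# Fuse the two sounds by averaging their encodings, then decode the blend.
enc_mix = (enc_a + enc_b) / 2.0
fastgen.synthesize(enc_mix, save_paths=["blend.wav"],
                   checkpoint_path=CKPT, samples_per_save=SAMPLE_LENGTH)

# Embed the blended audio into the CLIP space with Wav2CLIP; the resulting
# embedding can then be fed to a CLIP-guided image generator.
model = wav2clip.get_model()
audio_mix = utils.load_audio("blend.wav", sample_length=SAMPLE_LENGTH, sr=SR)
embedding = wav2clip.embed_audio(audio_mix, model)
print(embedding.shape)
```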
Datasets: The sounds we used in this project were manually selected from the ESC-50 Dataset and from FreeSound.org.
Team members:
- Zhouyao Xie: School of Computer Science, Language Technologies Institute, Master of Computational Data Science
- Nikhil Yadala: School of Computer Science, Language Technologies Institute, Master of Computational Data Science
- Yifan He: College of Fine Arts, School of Music, Music and Technology
- Guannan Tang: College of Engineering, Department of Materials Science and Engineering
Our report is included in this repository and is also available via this link.
Our presentation slides can be found here.
A video of our final work, a musique concrète accompanied by Wav2CLIP-generated images, can be found here.