An attempt at designing a network which generates Movie Scripts (how cool is that!)
I have shortlisted some candidates for accomplishing this task:
- Seq2seq LSTM with encoders. The idea behind using this is to get multiple levels of abstraction out of a seq2seq model. We might use LSTMs with around 700 memory cells, a number that comes from experience.
- Variational Autoencoders. Unlike plain autoencoders, variational autoencoders have the special property of producing continuous RVs. I want to use this model analogously to how 'thought vectors' are used. With continuous RVs instead of discrete ones, we can get scripts with accurate and enriched meaning. Again, meaning at what level of abstraction? What kind of meaning? These are questions I don't have answers to yet, and I hope to find out soon.
- A GAN where the Generator produces dialogues and the Cop (discriminator) tells whether they come close to the storyline (this will be cool!). We'll need a different loss function here, penalising the network based on the cosine similarity with subsequent story segments for each character.
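The seq2seq bullet above rests on an LSTM encoder; here is a minimal NumPy sketch of one LSTM step and an encoder pass. Everything except the ~700 memory cells is my assumption (input size, sequence length, initialization), and a real model would stack encoder/decoder layers with learned weights:

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: x is the input, (h, c) the previous hidden/cell state.
    W, U, b pack the input, forget, output, and candidate gates (4*H rows)."""
    H = h.shape[0]
    z = W @ x + U @ h + b                      # all four gate pre-activations
    i = 1 / (1 + np.exp(-z[0:H]))              # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))            # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))          # output gate
    g = np.tanh(z[3*H:4*H])                    # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy encoder pass: 700 memory cells (from the note above), 300-dim inputs
# and a 5-token sequence are illustrative guesses.
rng = np.random.default_rng(0)
H, D, T = 700, 300, 5
W = rng.normal(0, 0.01, (4 * H, D))
U = rng.normal(0, 0.01, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape)  # final encoder state, which a decoder would condition on
```

The final `h` is the summary the decoder would start generating from; stacking several such layers is one way to get the multiple abstraction levels mentioned above.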
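The continuous latent in the VAE bullet comes from the reparameterization trick plus a KL regularizer; a small sketch of just those two pieces (the 64-dim latent size is a placeholder, not something from the plan above):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, sigma^2) via z = mu + sigma * eps, which keeps
    the sampling step differentiable with respect to mu and logvar."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)): the VAE term that keeps the
    latent space continuous and smooth instead of a scattered codebook."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar))

rng = np.random.default_rng(0)
mu = np.zeros(64)        # a 64-dim 'thought vector' (size is a guess)
logvar = np.zeros(64)    # log-variance of 0 means unit variance
z = reparameterize(mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
print(z.shape, kl)       # KL is exactly 0 when mu=0, logvar=0
```

Because `z` is a continuous random variable, nearby latents decode to nearby meanings, which is what makes the 'thought vector' analogy plausible.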
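For the GAN bullet, the extra cosine-similarity penalty might look like the sketch below. The embeddings standing in for the generated dialogue and the subsequent story segments, and the simple mean over segments, are assumptions on my part:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def storyline_penalty(dialogue_emb, next_segment_embs):
    """Hypothetical extra generator loss: penalize dialogue whose embedding
    drifts from a character's subsequent story segments (1 - cosine each)."""
    sims = [cosine_similarity(dialogue_emb, s) for s in next_segment_embs]
    return float(np.mean([1.0 - s for s in sims]))

v = np.array([1.0, 0.0, 0.0])
aligned = storyline_penalty(v, [v, v])                       # same direction
orthogonal = storyline_penalty(v, [np.array([0.0, 1.0, 0.0])])
print(aligned, orthogonal)  # 0.0 for aligned, 1.0 for orthogonal
```

This term would be added to the usual adversarial loss for the generator, so the Cop still judges realism while the penalty pulls dialogue toward each character's storyline.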
For the dataset, two obvious candidates:
- Cornell Movie-Dialogs Corpus
- I love Elementary. It's a modern take on /Sherlock Holmes/. Drama, mystery, perfect. So I am going to try to curate a dataset from Elementary Season 1 or 5 subtitles. Unfortunately, I haven't been able to find Subtitles for the Deaf and Hard-of-Hearing (SDH), which have character-labeled dialogue instead of just the dialogue text. Still looking; if you guys find any, let me know.
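A starting point for curating the subtitle dataset: a minimal parser for plain SubRip (.srt) files that strips cue indices, timestamps, and basic tags, leaving just the dialogue lines. The sample text is made up, and note that without SDH subtitles these lines come out unlabeled, exactly the limitation mentioned above:

```python
import re

def parse_srt(text):
    """Extract dialogue lines from SubRip (.srt) subtitle text.
    Drops cue indices, timestamp lines, and simple HTML tags like <i>."""
    lines = []
    for block in re.split(r"\n\s*\n", text.strip()):
        for line in block.splitlines():
            line = line.strip()
            if not line or line.isdigit() or "-->" in line:
                continue                    # skip index and timing rows
            lines.append(re.sub(r"</?\w+>", "", line))
    return lines

# Tiny fabricated example in SRT layout.
sample = """1
00:00:01,000 --> 00:00:03,000
<i>Watson, look at this.</i>

2
00:00:03,500 --> 00:00:05,000
It's a three-pipe problem."""

dialogue = parse_srt(sample)
print(dialogue)  # ["Watson, look at this.", "It's a three-pipe problem."]
```

From here, consecutive lines could be paired into (prompt, reply) examples for the seq2seq model, though speaker attribution would still need SDH subtitles or manual labeling.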