Taskboard

Taskboard for VisualWord2Vec!

Port the relationship modeling code to GitHub (5 June)
Read and understand both the modeling and training (MATLAB) code (12 June)
Clean up and document the training code to replicate results in the ICCV paper (15 June)
Install and set up the Caffe framework (already set up) (12 June)
Remove irrelevant parts for the current work (19 June)
Generate the AP curve based on amount of training data (deferred)
Re-train word2vec on the MS COCO dataset to account for tokenization/lemmatization/case issues (23 June)
Get results using the new word2vec model after fixing the best thresholds from validation **(10 July)**
Try with/without \n while training word2vec, get numbers after validation **(10 July)**
~~Understand Jiasen's word2vec_image code to know how to tweak word2vec~~
Search for and set up tools for refining the neural network (22 June)(23 Aug)
Cluster the visual vectors, refine trained CNN to classify into one of the clusters, get new word2vec features (26 June)(23 Aug)
t-SNE embedding for the relation words to see the difference (20 Aug)
Get the common sense task accuracies before and after training using cluster ids (28 Aug)
Set up clustering (k-means, for now) in C (28 Aug)
Set up the common sense task (text features only) in C (28 Aug)
Vary number of clusters to get different accuracies on the common sense task
Fine-tune from both MS COCO and Wiki datasets
Different learning rates for inner and outer vectors

How to incorporate visual features in word2vec?

Cluster the abstract scene vectors into N clusters, associating each vector with a cluster id
Train word2vec on a visual text corpus (for example, MS COCO)
Refine the network by replacing the last layer with N outputs, predicting the cluster id and back-propagating the error
Get the new word2vec representations of the words and evaluate on the common sense task
Also visualize the words associated with clip art scenes through t-SNE (or any other method) before and after refining
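
The clustering and refinement steps above can be sketched roughly as follows. This is a minimal, stdlib-only Python illustration, not the project's C or MATLAB code: the names `kmeans` and `refine_step`, the toy vectors, and the learning rate are all hypothetical. It also simplifies the network to a single word vector feeding a linear softmax layer with N outputs, which is an assumption about how the "last layer" would look.

```python
import math
import random

def kmeans(vectors, n_clusters, n_iters=20, seed=0):
    """Plain k-means; returns one cluster id per input vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n_clusters)]
    ids = [0] * len(vectors)
    for _ in range(n_iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, v in enumerate(vectors):
            ids[i] = min(
                range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, k in zip(vectors, ids) if k == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return ids

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def refine_step(word_vec, out_weights, cluster_id, lr=0.1):
    """One SGD step predicting cluster_id from word_vec.

    Updates word_vec and out_weights in place and returns the
    cross-entropy loss measured before the update.
    """
    scores = [sum(w * x for w, x in zip(row, word_vec)) for row in out_weights]
    probs = softmax(scores)
    loss = -math.log(probs[cluster_id])
    # Gradient of the loss w.r.t. the scores: probs - one_hot(cluster_id).
    grad_scores = [p - (1.0 if k == cluster_id else 0.0)
                   for k, p in enumerate(probs)]
    # Back-propagate to the word vector before the weights change.
    grad_vec = [sum(g * row[j] for g, row in zip(grad_scores, out_weights))
                for j in range(len(word_vec))]
    for g, row in zip(grad_scores, out_weights):
        for j in range(len(row)):
            row[j] -= lr * g * word_vec[j]
    for j in range(len(word_vec)):
        word_vec[j] -= lr * grad_vec[j]
    return loss

# Toy usage: four made-up "scene vectors" in two obvious groups.
scene_vectors = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
cluster_ids = kmeans(scene_vectors, n_clusters=2)

# A made-up word vector paired with the first scene's cluster id.
word_vec = [0.1, -0.2]
out_weights = [[0.01, 0.0], [0.0, 0.01]]  # N = 2 output rows
losses = [refine_step(word_vec, out_weights, cluster_ids[0]) for _ in range(50)]
```

In this sketch, `word_vec` after the loop plays the role of the "new word2vec representation"; evaluating it on the common sense task and plotting it with t-SNE would happen outside this snippet.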