Taskboard
Satwik Kottur edited this page Sep 1, 2015 · 17 revisions
- Port the relationship modeling code to GitHub (5 June)
- Read and understand both the modeling and training (MATLAB) code (12 June)
- Clean up and document the training code to replicate the results in the ICCV paper (15 June)
- Install and set up the Caffe framework (already set up) (12 June)
- Remove parts irrelevant to the current work (19 June)
- Generate the AP curve as a function of the amount of training data (deferred)
- Re-train word2vec on the MS COCO dataset to account for tokenization/lemmatization/case issues (June 23)
- Get results using the new word2vec model after fixing the best thresholds from validation **(July 10)**
- Try with/without `\n` while training word2vec; get numbers after validation **(July 10)**
- Understand Jiasen's word2vec_image code to know how to tweak word2vec
- Search for and set up tools for refining the neural network (22 June) (23 Aug)
- Cluster the visual vectors, refine the trained CNN to classify into one of the clusters, and get new word2vec features (26 June) (23 Aug)
- t-SNE embedding of the relation words to see the difference (20 Aug)
- Get the common sense task accuracies before and after training using cluster ids (28 Aug)
  - Set up clustering (k-means, for now) in C (28 Aug)
  - Set up the common sense task (text features only) in C (28 Aug)
  - Vary the number of clusters to get different accuracies on the common sense task
  - Fine-tune from both the MS COCO and Wiki datasets
  - Use different learning rates for the inner and outer vectors
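A minimal sketch of the clustering step, written in Python rather than C for brevity: plain Lloyd's k-means that returns one cluster id per visual vector. The farthest-point initialisation and the names `kmeans` and `squared_distance` are illustrative assumptions, not the project's code.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=100):
    """Lloyd's k-means. Returns (centroids, cluster_ids), one id per vector."""
    # Farthest-point initialisation (an assumption for this sketch): start from
    # the first vector, then keep adding the vector farthest from all centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(far))
    ids = None
    for _ in range(iters):
        # Assignment step: each vector takes the id of its nearest centroid.
        new_ids = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                   for v in vectors]
        if new_ids == ids:
            break  # assignments stopped changing, so the clustering has converged
        ids = new_ids
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [v for v, i in zip(vectors, ids) if i == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, ids


if __name__ == "__main__":
    # Toy 2-D "scene vectors"; real visual vectors would be much longer.
    scenes = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
    for k in (1, 2, 3):
        print(k, kmeans(scenes, k)[1])
```

Rerunning with different values of `k` gives the different cluster counts whose common sense task accuracies are to be compared.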
How to incorporate visual features in word2vec?
- Predicting the cluster id. Algorithm:
  - Cluster the abstract scene vectors into *N* clusters, associating each vector with a cluster id
  - Train word2vec on visual text (for example, MS COCO)
  - Refine the network by replacing the last layer with *N* outputs, predicting the cluster id and back-propagating the error
  - Get the new word2vec representations of the words and evaluate them on the common sense task
- Also visualize the words associated with clip art scenes through t-SNE (or another embedding) before and after refining
- L1-regularized sparse SVM to learn the model (?)
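The page does not specify how the refined network is wired. One possible sketch of the "predict the cluster id" step, assuming the words tied to a clip art scene are averaged into a single vector and fed to a new *N*-way softmax layer whose target is that scene's k-means cluster id; the cross-entropy gradient is back-propagated into both the new layer and the word vectors. `refine_step`, its arguments, and the averaging are hypothetical and not taken from Jiasen's word2vec_image code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_step(word_vecs, words, cluster_id, W, lr=0.1):
    """One SGD step of a hypothetical cluster-id objective (updates in place).

    word_vecs:  dict word -> list[float], the word2vec vectors being refined
    words:      words describing one clip art scene
    cluster_id: k-means id (0..N-1) of that scene's visual vector
    W:          N x d weights of the new last layer (list of rows)
    Returns the cross-entropy loss before the update.
    """
    n, d = len(W), len(W[0])
    # Forward pass: average the word vectors, then linear layer + softmax.
    h = [sum(word_vecs[w][j] for w in words) / len(words) for j in range(d)]
    probs = softmax([sum(W[c][j] * h[j] for j in range(d)) for c in range(n)])
    loss = -math.log(probs[cluster_id])
    # Backward pass: d(loss)/d(score_c) = p_c - [c == cluster_id].
    g = [p - (1.0 if c == cluster_id else 0.0) for c, p in enumerate(probs)]
    # Gradient w.r.t. h, computed before W is modified.
    grad_h = [sum(g[c] * W[c][j] for c in range(n)) for j in range(d)]
    for c in range(n):
        for j in range(d):
            W[c][j] -= lr * g[c] * h[j]
    # Each occurrence of a word receives 1/len(words) of the gradient on h.
    for w in set(words):
        share = words.count(w) / len(words)
        for j in range(d):
            word_vecs[w][j] -= lr * grad_h[j] * share
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    d, n_clusters = 4, 3  # toy sizes; N would come from the k-means step
    vecs = {w: [rng.uniform(-0.5, 0.5) for _ in range(d)] for w in ["man", "kicks", "ball"]}
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_clusters)]
    for epoch in range(3):
        print(epoch, refine_step(vecs, ["man", "kicks", "ball"], 1, W))
```

Updating the word vectors as well as `W` is what makes the refined word2vec representations differ from the originals; using separate `lr` values for the two updates would be one way to try the "different learning rates" item.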