Satwik Kottur edited this page Sep 1, 2015 · 17 revisions

Taskboard for VisualWord2Vec!

Code

  • Port the relationship modeling code to Github (5 June)
  • Read and understand both the modeling and training (MATLAB) code (12 June)
  • Clean up and document the training code to replicate results in the ICCV paper (15 June)
  • Install and set up the Caffe framework (already set up) (12 June)
  • Remove irrelevant parts for the current work (19 June)
  • Generate the AP curve based on amount of training data (deferred)
  • Re-train word2vec on MS COCO dataset to account for tokenization/lemmatization/case issues (June 23)
  • Get results using the new word2vec model after fixing the best thresholds from validation (July 10)
  • Try training word2vec with and without \n; get numbers after validation (July 10)
  • Understand Jiasen's word2vec_image code to know how to tweak word2vec
  • Search for and set up tools for refining the neural network (22 June) (23 Aug)
  • Cluster the visual vectors, refine the trained CNN to classify into one of the clusters, get new word2vec features (26 June) (23 Aug)
  • t-SNE embedding of the relation words to see the difference (20 Aug)
  • Get the common sense task accuracies before and after training using cluster ids (28 Aug)
  • Set up clustering (k-means, for now) in C (28 Aug)
  • Set up the common sense task (text features only) in C (28 Aug)
  • Vary number of clusters to get different accuracies on the common sense task
  • Fine-tune from both MS COCO and Wiki datasets
  • Different learning rates for inner and outer vectors

Ideas

How to incorporate visual features in word2vec?

  1. Predicting cluster id: Algorithm:
  • Cluster the abstract scene vectors into N clusters, associating each vector with a cluster id
  • Train word2vec on a visual text (for example, MS COCO)
  • Refine the network: replace the last layer with one that has N outputs, predict the cluster id, and back-propagate the error
  • Get the new word2vec representations of the words and evaluate on the common sense task
  • Also visualize the words associated with clip art scenes through t-SNE (or any other embedding method) before and after refining
  2. L1-regularized sparse SVM to learn the model (?)
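
A rough sketch of the "predicting cluster id" idea, to make the steps concrete. This is not the project's MATLAB/C code: it is a NumPy toy, and everything in it is an assumption for illustration. The helper names `kmeans` and `refine`, the random "visual" features, the random stand-in for pretrained word2vec vectors, and the rule tying words to clusters are all hypothetical. The "network" is reduced to the mean of a phrase's word vectors followed by a softmax layer with N outputs; the error is back-propagated into the word vectors.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        ids = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(ids == k):
                centers[k] = X[ids == k].mean(axis=0)
    return ids

def refine(W_in, word_ids, cluster_ids, n_clusters, lr=0.1, epochs=50, seed=0):
    """Fine-tune word vectors so the mean vector of a phrase predicts its cluster id."""
    rng = np.random.default_rng(seed)
    W_in = W_in.copy()
    # New last layer with N outputs (one per cluster), small random init.
    W_out = rng.normal(scale=0.01, size=(W_in.shape[1], n_clusters))
    losses = []
    for _ in range(epochs):
        total = 0.0
        for words, c in zip(word_ids, cluster_ids):
            h = W_in[words].mean(axis=0)       # hidden layer: mean word vector
            z = h @ W_out                      # logits over cluster ids
            p = np.exp(z - z.max())
            p /= p.sum()                       # softmax probabilities
            total += -np.log(p[c])             # cross-entropy loss
            grad_z = p.copy()
            grad_z[c] -= 1.0                   # d loss / d logits
            grad_h = W_out @ grad_z            # computed before W_out changes
            W_out -= lr * np.outer(h, grad_z)
            # Each word in the phrase receives an equal share of the gradient.
            np.subtract.at(W_in, words, lr * grad_h / len(words))
        losses.append(total / len(word_ids))
    return W_in, W_out, losses

# Toy data (all hypothetical): 200 "scenes", 4 clusters, 40-word vocabulary.
rng = np.random.default_rng(0)
n_clusters, vocab_size, dim = 4, 40, 16
visual = rng.normal(size=(200, 10))            # stand-in for abstract scene vectors
cluster_ids = kmeans(visual, n_clusters)
W_pretrained = rng.normal(scale=0.1, size=(vocab_size, dim))  # stand-in for word2vec
# Assumption: scenes in cluster c are described with words 10*c .. 10*c + 9.
word_ids = [rng.integers(10 * c, 10 * c + 10, size=3) for c in cluster_ids]

W_refined, W_out, losses = refine(W_pretrained, word_ids, cluster_ids, n_clusters)
print("mean loss, first vs. last epoch:", losses[0], losses[-1])
```

`W_refined` plays the role of the "new word2vec representations"; evaluating it on the common sense task and the before/after t-SNE plots are left out of this sketch. Changing `n_clusters` is the knob the taskboard item about varying the number of clusters refers to.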

Resources
