Distributional and Word Embedding Models
Word2Vec provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research.
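As a minimal sketch of how such vectors are trained in practice, the following uses gensim's Word2Vec implementation on a toy placeholder corpus (any real use would substitute a large tokenized corpus):

```python
# Minimal sketch: training skip-gram vectors with gensim on a toy corpus.
from gensim.models import Word2Vec

# Placeholder corpus; in practice, a large collection of tokenized sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["words", "are", "represented", "as", "vectors"],
]

# sg=1 selects the skip-gram architecture; sg=0 selects continuous bag-of-words.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

vector = model.wv["king"]                         # 100-dimensional vector
similar = model.wv.most_similar("king", topn=3)   # nearest neighbors by cosine
```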
GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
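One way to see those linear substructures is the word-analogy test on pretrained GloVe vectors. This sketch assumes the optional gensim-data downloader and network access to fetch the model:

```python
# Sketch: probing linear substructure in pretrained GloVe vectors.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")  # 100-dimensional GloVe vectors

# vec(king) - vec(man) + vec(woman) should land near vec(queen).
result = glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to rank "queen" first
```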
Explicit Semantic Analysis (ESA) is a vectorial representation of text (individual words or entire documents) that uses a document corpus as a knowledge base. Specifically, in ESA, a word is represented as a column vector in the tf–idf matrix of the text corpus and a document (string of words) is represented as the centroid of the vectors representing its words.
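A small sketch of that construction, using scikit-learn's tf-idf on a toy corpus that stands in for the knowledge base (the helper names here are illustrative):

```python
# Sketch of the ESA construction: words are tf-idf columns, documents are centroids.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Each corpus document plays the role of a concept in the knowledge base.
corpus = [
    "cats are small domesticated animals",
    "dogs are loyal domesticated animals",
    "stock markets fluctuate with the economy",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)   # shape: (n_documents, n_words)
vocab = vectorizer.vocabulary_

def word_vector(word):
    # A word is a column of the tf-idf matrix: its weight in every document.
    return tfidf[:, vocab[word]].toarray().ravel()

def document_vector(text):
    # A document is the centroid of the vectors of its in-vocabulary words.
    words = [w for w in text.lower().split() if w in vocab]
    return np.mean([word_vector(w) for w in words], axis=0)

print(document_vector("domesticated cats and dogs"))
```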
Dependency-Based Word Embeddings are word embeddings in which the context elements are the syntactic contexts of the target word, rather than the words in a window around it.
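The sketch below shows a simplified version of that context extraction in the style of Levy & Goldberg (2014), using spaCy for dependency parsing; it assumes the en_core_web_sm model is installed, and the resulting (word, context) pairs would replace window-based pairs during training:

```python
# Sketch: extracting (word, syntactic-context) pairs from a dependency parse.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Australian scientist discovers star with telescope")

pairs = []
for token in doc:
    for child in token.children:
        # Context is the dependent word plus the relation label,
        # e.g. ("discovers", "scientist/nsubj").
        pairs.append((token.text, f"{child.text}/{child.dep_}"))
        # The inverse relation is recorded for the dependent word.
        pairs.append((child.text, f"{token.text}/{child.dep_}-1"))
print(pairs)
```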
Latent Semantic Analysis (LSA) is an algorithm that uses a collection of documents to construct a semantic space. The algorithm builds a word-by-document matrix in which each row corresponds to a unique word in the corpus, each column corresponds to a document, and each entry is the number of times the row's word occurs in the column's document. The Singular Value Decomposition of this matrix is then computed, producing three matrices (UΣVᵀ): U, the word space; Σ, the singular values; and Vᵀ, the document space. The columns of U are then truncated to a small number of dimensions (typically 300), which produces the final semantic vectors.
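A compact sketch of that pipeline on a toy corpus, with k kept tiny only because the corpus is tiny:

```python
# Sketch: LSA via truncated SVD of a word-by-document count matrix.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "relation of user perceived response time to error measurement",
]

counts = CountVectorizer().fit_transform(corpus)   # (documents x words)
X = counts.T.toarray().astype(float)               # word-by-document matrix

# X = U @ diag(S) @ Vt: U spans the word space, Vt the document space.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                      # typically ~300; tiny here for the toy corpus
word_vectors = U[:, :k]    # truncated columns of U are the semantic vectors
```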
Positive Pointwise Mutual Information (PPMI), which works well for measuring semantic similarity in the Term-Sentence Matrix (TSM), is used in our method to assign a weight to each entry in the TSM. The Sentence-Rank Matrix generated from this weighted TSM is then used to extract a summary from the document.
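The PPMI weighting itself is straightforward; this sketch applies it to a raw term-sentence count matrix (the matrix values are illustrative):

```python
# Sketch: PPMI weighting of a term-sentence count matrix.
import numpy as np

def ppmi(counts):
    """counts: (terms x sentences) matrix of raw co-occurrence counts."""
    total = counts.sum()
    p_ts = counts / total                      # joint probability P(t, s)
    p_t = p_ts.sum(axis=1, keepdims=True)      # marginal P(t)
    p_s = p_ts.sum(axis=0, keepdims=True)      # marginal P(s)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(p_ts / (p_t * p_s))
    pmi[~np.isfinite(pmi)] = 0.0               # zero counts get PMI of 0
    return np.maximum(pmi, 0.0)                # keep only positive PMI

tsm = np.array([[2, 0, 1],
                [0, 3, 0],
                [1, 1, 1]], dtype=float)
print(ppmi(tsm))
```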
ConceptNet Numberbatch consists of state-of-the-art semantic vectors (also known as word embeddings) that can be used directly as a representation of word meanings or as a starting point for further machine learning. It is built using an ensemble that combines models generated with PPMI, word2vec, and GloVe.
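The published Numberbatch files are distributed in the word2vec text format, so gensim can load them directly; the file name below is an assumption and should be checked against the current release:

```python
# Sketch: loading ConceptNet Numberbatch vectors with gensim.
# The file name is an assumption; verify it against the Numberbatch releases.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "numberbatch-en-19.08.txt.gz", binary=False
)
print(vectors.most_similar("coffee", topn=3))
```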