Vector Embeddings: 101

What is a vector embedding?

Vector embedding is the process of placing items in a multi-dimensional vector space so that items with similar attributes end up close to one another.

Here is a visual representation of the concept using word vectors:

All the gardening-related words are co-located, as are the words related to property. "Sun" is related to both gardening and solar panels, so it sits in close proximity to both groups.

The embeddings are created using neural network models and are remarkably good at capturing intricate relationships between items. Vector embedding can be performed on many kinds of entities: words, phrases, documents, audio, images, videos, user profiles, and so on.

In a vector embedding of words, each word is converted to a numerical representation. For the sake of simplicity, suppose we represent words in a 6-dimensional vector space; the numerical vector representations would then look like this:
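
The values below are purely illustrative and not taken from any trained model; this is a minimal sketch, assuming made-up 6-dimensional vectors and NumPy, of how "closeness" in the vector space is measured.

```python
import numpy as np

# Hypothetical 6-dimensional vectors (illustrative values only, not from a real model)
word_vectors = {
    "garden": np.array([0.91, 0.84, 0.12, 0.05, 0.31, 0.07]),
    "flower": np.array([0.88, 0.79, 0.15, 0.09, 0.28, 0.11]),
    "house":  np.array([0.10, 0.08, 0.93, 0.87, 0.22, 0.14]),
}

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means very similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(word_vectors["garden"], word_vectors["flower"]))  # high: close in space
print(cosine_similarity(word_vectors["garden"], word_vectors["house"]))   # lower: further apart
```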

The different dimensions represent the various features or properties of the entities.

Here are some commonly used embedding models:

Word2Vec: As the name suggests, this model embeds each word present in the input. It uses one of two architectures (see the sketch after this list):

  • Continuous Bag of Words (CBOW): Under this architecture, the model is given the surrounding context words and predicts the most likely target word.
  • Skip-gram: In the Skip-gram architecture, the model is given a target word and predicts the words in its surrounding context.
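
As a rough sketch of how this looks in practice, assuming the gensim library and a tiny toy corpus (the sentences and parameter values are only illustrative):

```python
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [
    ["the", "garden", "needs", "water"],
    ["plant", "flowers", "in", "the", "garden"],
    ["the", "house", "has", "solar", "panels"],
]

# sg=0 selects the CBOW architecture, sg=1 selects Skip-gram
cbow_model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

# Each word now maps to a 50-dimensional vector
print(cbow_model.wv["garden"].shape)                  # (50,)
print(skipgram_model.wv.most_similar("garden", topn=3))
```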

GloVe: GloVe stands for Global Vectors. The model is trained on global word co-occurrence statistics from a corpus, combined with local context-window information. GloVe tends to perform well when the training data is large and diverse.
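
GloVe vectors are usually consumed as pretrained files rather than trained from scratch. A minimal sketch, assuming gensim's downloader and the publicly available "glove-wiki-gigaword-100" vectors (downloaded once on first use):

```python
import gensim.downloader as api

# Loads pretrained 100-dimensional GloVe vectors (one-time download)
glove = api.load("glove-wiki-gigaword-100")

print(glove["garden"].shape)                 # (100,)
print(glove.most_similar("garden", topn=3))  # nearby words in the embedding space
print(glove.similarity("garden", "flower"))
```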

FastText: FastText works by breaking each word into sub-word units (character n-grams), which lets it capture the morphology of words. This makes FastText useful for embedding unknown or unseen words. It is also known for its processing efficiency, as it uses a hierarchical softmax classifier.
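
A minimal sketch of the out-of-vocabulary behavior, assuming gensim's FastText implementation and the same kind of toy corpus as above:

```python
from gensim.models import FastText

sentences = [
    ["the", "garden", "needs", "water"],
    ["plant", "flowers", "in", "the", "garden"],
]

# min_n / max_n control the length of the character n-grams (sub-words)
model = FastText(sentences, vector_size=50, window=3, min_count=1, min_n=3, max_n=6)

# "gardening" never appears in the corpus, but FastText can still embed it
# by composing the vectors of its character n-grams ("gar", "ard", "rde", ...)
print(model.wv["gardening"].shape)              # (50,)
print(model.wv.similarity("garden", "gardening"))
```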

BERT: BERT stands for Bidirectional Encoder Representations from Transformers. Context is central to BERT: it creates contextual embeddings, producing a different vector for the same word when it is used in different contexts.
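
A minimal sketch of contextual embeddings, assuming the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; the sentences and the word "bank" are only illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence, word):
    """Return the BERT hidden state for the first occurrence of `word` in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

# The same word "bank" gets a different vector in each context
river = embed_word("he sat on the river bank", "bank")
money = embed_word("she deposited cash at the bank", "bank")
print(torch.cosine_similarity(river, money, dim=0))  # noticeably below 1.0
```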

Published by: Urvashi Poonia