Articles → NLP → Word Embeddings In NLP
Word Embeddings In NLP
What Are N-Dimensional Vectors?
What Is a Dense Vector?
What Are Word Embeddings?
Example
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import defaultdict
import numpy as np
# Download required resources
nltk.download('punkt')
nltk.download('stopwords')

# Sample text
text = "The cat sat on the mat. The dog barked at the cat. The cat ran away."

# Tokenize, lowercase, and drop punctuation/stopwords.
# Build the stopword set ONCE: calling stopwords.words('english') inside the
# comprehension re-reads the word list for every token, and membership tests
# against a list are O(m) each -- a set gives O(1) lookups.
stop_words = set(stopwords.words('english'))
tokens = word_tokenize(text.lower())
tokens = [word for word in tokens if word.isalpha() and word not in stop_words]

# Build vocabulary. sorted() makes the word -> index mapping deterministic
# across runs (set iteration order is not guaranteed to be stable).
vocab = sorted(set(tokens))
vocab_index = {word: i for i, word in enumerate(vocab)}

# Create co-occurrence matrix: count how often each pair of words appears
# within `window_size` positions of each other in the token stream.
window_size = 2
co_matrix = np.zeros((len(vocab), len(vocab)))
for i, word in enumerate(tokens):
    word_idx = vocab_index[word]
    # Symmetric window [i - window_size, i + window_size], clipped to the
    # bounds of the token list; skip the center token itself (i == j).
    for j in range(max(0, i - window_size), min(len(tokens), i + window_size + 1)):
        if i != j:
            neighbor_idx = vocab_index[tokens[j]]
            co_matrix[word_idx, neighbor_idx] += 1

# Display matrix
print("Vocabulary:", vocab)
print("Co-occurrence Matrix:\n", co_matrix)
Output
| Posted By - | Karan Gupta |
| |
| Posted On - | Friday, November 21, 2025 |