TF-IDF In NLP
Purpose

TF-IDF (Term Frequency–Inverse Document Frequency) scores how important a word is to a document within a collection. Words that occur often in one document but rarely across the collection receive high weights; words that appear in almost every document receive low weights.
Term Frequency (TF)

Term frequency measures how often a term t appears in a document d, normalized by the document's length:

TF(t, d) = (number of times t appears in d) / (total number of terms in d)
Inverse Document Frequency (IDF)

IDF measures how rare a term is across a collection of N documents:

IDF(t) = log(N / df(t))

where df(t) is the number of documents containing t. The example below uses base-10 logarithms.
TF-IDF Weight

The TF-IDF weight of a term in a document is the product of the two quantities:

TF-IDF(t, d) = TF(t, d) × IDF(t)
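As a sketch, these formulas can be implemented directly in plain Python (the helper names below are my own, not from the article; base-10 log matches the worked example that follows):

```python
import math

def tf(term, doc_tokens):
    # Term frequency: occurrences of term / total tokens in the document
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, docs_tokens):
    # Inverse document frequency: log10(N / number of docs containing term)
    n_containing = sum(1 for d in docs_tokens if term in d)
    return math.log10(len(docs_tokens) / n_containing)

def tf_idf(term, doc_tokens, docs_tokens):
    # TF-IDF weight = TF x IDF
    return tf(term, doc_tokens) * idf(term, docs_tokens)

docs = [
    "NLP is fun and exciting",
    "NLP is a branch of AI",
    "AI and ML are related",
]
tokens = [d.lower().split() for d in docs]

# "exciting" appears once in the 5-token first document and in 1 of 3 documents:
print(round(tf_idf("exciting", tokens[0], tokens), 3))  # 0.095
```

So a rare word like "exciting" gets a noticeably higher weight than a word like "nlp" that appears in two of the three documents.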
Example
D1: "NLP is fun and exciting"
D2: "NLP is a branch of AI"
D3: "AI and ML are related"
| Word | Document Frequency (documents containing the term) | IDF (log base 10) | Comment |
|---|---|---|---|
| NLP | 2 | log(3/2) = 0.18 | lower |
| AI | 2 | log(3/2) = 0.18 | lower |
| ML | 1 | log(3/1) = 0.48 | higher |
| exciting | 1 | log(3/1) = 0.48 | higher |
| fun | 1 | log(3/1) = 0.48 | higher |
| branch | 1 | log(3/1) = 0.48 | higher |
| related | 1 | log(3/1) = 0.48 | higher |
| is, and | 2 | log(3/2) = 0.18 | lower |
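The IDF column above can be verified with a short script (a quick check, using the same base-10 log as the table):

```python
import math

docs = [
    "NLP is fun and exciting",
    "NLP is a branch of AI",
    "AI and ML are related",
]
# Sets of lowercase tokens, one per document, for membership tests
tokenized = [set(d.lower().split()) for d in docs]
n = len(docs)

for word in ["nlp", "ai", "ml", "exciting"]:
    # df = number of documents containing the word
    df = sum(1 for d in tokenized if word in d)
    print(word, df, round(math.log10(n / df), 2))
# nlp 2 0.18
# ai 2 0.18
# ml 1 0.48
# exciting 1 0.48
```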
Example (scikit-learn)
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample documents
docs = [
    "NLP is fun and exciting",
    "NLP is a branch of AI",
    "AI and ML are related",
]

# Create TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# Fit and transform the documents into a TF-IDF matrix
X = vectorizer.fit_transform(docs)

# Get feature names (vocabulary)
features = vectorizer.get_feature_names_out()

# Convert the sparse matrix to a dense array for readability
tfidf_matrix = X.toarray()

# Print results as a DataFrame
df = pd.DataFrame(tfidf_matrix, columns=features)
print(df)
```
Output

The script prints a DataFrame with one row per document and one column per vocabulary word (the default tokenizer drops single-character tokens such as "a"), where each cell is that word's TF-IDF score in that document. Note that scikit-learn's default IDF uses natural log with smoothing, so the exact values differ from the hand-computed table above, but the ranking is the same: words unique to one document, such as "ml" and "exciting", score higher than shared words such as "nlp" and "ai".
Posted By: Karan Gupta
Posted On: Tuesday, September 16, 2025