
TF-IDF In NLP






Purpose

TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how important a word is to a document within a collection of documents (a corpus). It combines two signals: how often a term appears in a document, and how rare that term is across the corpus. Common words such as "is" and "and" therefore receive low weights, while distinctive words receive high weights, which makes TF-IDF useful for search ranking, keyword extraction, and representing text as feature vectors.

Term Frequency (TF)

Term frequency measures how often a term appears in a single document, normalized by the length of that document:

TF(t, d) = (number of times term t appears in document d) / (total number of terms in document d)
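
As a quick illustration, here is a minimal sketch of the TF formula in plain Python (compute_tf is an illustrative name, not part of the original article's code):

def compute_tf(term, document):
    # Occurrences of the term divided by the total number of terms
    tokens = document.lower().split()
    return tokens.count(term.lower()) / len(tokens)

# "NLP" appears once among the 5 tokens of D1 -> TF = 1/5 = 0.2
print(compute_tf("NLP", "NLP is fun and exciting"))  # 0.2
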
Inverse Document Frequency (IDF)

Inverse document frequency measures how rare a term is across the whole corpus. A term that appears in many documents gets a low IDF; a term confined to a few documents gets a high IDF:

IDF(t) = log(N / df(t))

where N is the total number of documents and df(t) is the number of documents containing term t (the example below uses log base 10).
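
A matching sketch of the IDF formula, using a base-10 logarithm so the values line up with the table in the example below (compute_idf is again an illustrative name):

import math

def compute_idf(term, documents):
    # Number of documents that contain the term
    df = sum(1 for doc in documents if term.lower() in doc.lower().split())
    return math.log10(len(documents) / df)

docs = ["NLP is fun and exciting", "NLP is a branch of AI", "AI and ML are related"]
print(round(compute_idf("NLP", docs), 2))  # log10(3/2) = 0.18
print(round(compute_idf("ML", docs), 2))   # log10(3/1) = 0.48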

TF-IDF Weight

The TF-IDF weight of a term in a document is the product of the two quantities above:

TF-IDF(t, d) = TF(t, d) × IDF(t)

A term therefore receives a high weight when it occurs frequently in one document but rarely in the rest of the corpus.
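
Putting the two together, a minimal sketch of the final weight, reusing compute_tf, compute_idf, and docs from the sketches above:

def compute_tfidf(term, document, documents):
    # TF of the term in one document, scaled by its corpus-wide IDF
    return compute_tf(term, document) * compute_idf(term, documents)

# "exciting" in D1: TF = 1/5 = 0.2, IDF = log10(3/1) ~ 0.48 -> weight ~ 0.095
print(round(compute_tfidf("exciting", docs[0], docs), 3))  # 0.095
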
Example

Consider a small corpus of three documents:
D1: "NLP is fun and exciting"
D2: "NLP is a branch of AI"
D3: "AI and ML are related"




Word     | Documents Containing The Term | IDF             | Comment
NLP      | 2                             | log(3/2) = 0.18 | lower
AI       | 2                             | log(3/2) = 0.18 | lower
ML       | 1                             | log(3/1) = 0.48 | higher
exciting | 1                             | log(3/1) = 0.48 | higher
fun      | 1                             | log(3/1) = 0.48 | higher
branch   | 1                             | log(3/1) = 0.48 | higher
related  | 1                             | log(3/1) = 0.48 | higher
is, and  | 2                             | log(3/2) = 0.18 | lower



Python Example

The same corpus can be vectorized with scikit-learn's TfidfVectorizer:

from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# Sample documents
docs = [
    "NLP is fun and exciting",
    "NLP is a branch of AI",
    "AI and ML are related"
]

# Create TF-IDF Vectorizer
vectorizer = TfidfVectorizer()

# Fit and transform
X = vectorizer.fit_transform(docs)

# Get feature names (vocabulary)
features = vectorizer.get_feature_names_out()

# Convert to array for readability
tfidf_matrix = X.toarray()

# Print results as a DataFrame
df = pd.DataFrame(tfidf_matrix, columns=features)
print(df)



Output

The script prints a pandas DataFrame with one row per document and one column per vocabulary term (ai, and, are, branch, exciting, fun, is, ml, nlp, of, related; the single-character word "a" is dropped by the default tokenizer). Within each row, terms unique to that document, such as "exciting" or "related", receive the highest weights.

Note that the printed values will not match the hand-calculated table above: scikit-learn's TfidfVectorizer defaults to a smoothed, natural-log IDF and L2-normalizes each document vector.
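
To see where the difference comes from, you can inspect the IDF values the vectorizer actually learned via its idf_ attribute (continuing from the code above):

# scikit-learn's learned IDF values: by default these use the smoothed
# natural-log formula ln((1 + N) / (1 + df)) + 1, not log10(N / df)
for word, idf in zip(features, vectorizer.idf_):
    print(word, round(idf, 3))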





Posted By - Karan Gupta
Posted On - Tuesday, September 16, 2025
