One-Hot Encoding Using NLTK
Code
```python
from sklearn.feature_extraction.text import CountVectorizer

# Example documents
documents = [
    "I like NLP",
    "I like machine learning",
    "NLP is fun",
]

# Initialize CountVectorizer with binary=True
vectorizer = CountVectorizer(
    lowercase=True,
    stop_words='english',
    binary=True,  # Store 1/0 for word presence/absence instead of counts (one-hot style)
)

# Convert documents to one-hot matrix (one row per document, one column per word)
X = vectorizer.fit_transform(documents)

# Vocabulary
print("Vocabulary:", vectorizer.get_feature_names_out())

# One-hot encoded matrix
print("\nOne-Hot Encoding Matrix:")
print(X.toarray())
```
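Strictly speaking, a one-hot vector marks a single word and contains exactly one 1, while each row produced with binary=True can contain several 1s, one for every distinct word in the document. The sketch below shows how the two relate in plain Python. The vocabulary list and the helpers `one_hot` and `document_vector` are hypothetical names introduced for illustration, and the vocabulary is assumed to be the set of words kept from the example documents.

```python
# Assumed vocabulary: the words kept from the example documents, sorted.
vocabulary = ["fun", "learning", "like", "machine", "nlp"]
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    # Vector of zeros with a single 1 at the word's vocabulary index.
    vec = [0] * len(vocabulary)
    vec[index[word]] = 1
    return vec

def document_vector(words):
    # Element-wise OR of the one-hot vectors of the document's words.
    vec = [0] * len(vocabulary)
    for word in words:
        vec = [a | b for a, b in zip(vec, one_hot(word))]
    return vec

print(one_hot("nlp"))                    # [0, 0, 0, 0, 1]
print(document_vector(["like", "nlp"]))  # [0, 0, 1, 0, 1]
```

Combining the per-word vectors with OR rather than addition is what keeps every entry at 0 or 1, which mirrors the effect of binary=True.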
Output
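As a rough cross-check of what the matrix should contain, the same presence/absence encoding can be reproduced without any library. This is a minimal sketch under stated assumptions, not scikit-learn's implementation: `STOP_WORDS` is a tiny illustrative stop list (the built-in English list is much longer), and `tokenize` is a hypothetical helper that lowercases the text and keeps tokens of two or more characters. A tokenizer such as NLTK's `word_tokenize` could be swapped in for the regular expression.

```python
import re

documents = [
    "I like NLP",
    "I like machine learning",
    "NLP is fun",
]

# Assumption: a tiny illustrative stop list, enough for these documents.
STOP_WORDS = {"i", "is"}

def tokenize(text):
    # Lowercase, keep alphanumeric tokens of two or more characters,
    # then drop stop words.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Sorted vocabulary over all documents.
vocabulary = sorted({tok for doc in documents for tok in tokenize(doc)})

# One row per document: 1 if the word occurs in it, otherwise 0.
matrix = [
    [1 if word in set(tokenize(doc)) else 0 for word in vocabulary]
    for doc in documents
]

print("Vocabulary:", vocabulary)
for row in matrix:
    print(row)
# Vocabulary: ['fun', 'learning', 'like', 'machine', 'nlp']
# [0, 0, 1, 0, 1]
# [0, 1, 1, 1, 0]
# [1, 0, 0, 0, 1]
```

Each column corresponds to one vocabulary word in alphabetical order, so the first row reads as "contains like and nlp".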
| Posted By - | Karan Gupta |
| Posted On - | Tuesday, March 3, 2026 |