Tokenization In NLP

Posted By - Karan Gupta
Posted On - Tuesday, August 19, 2025
Updated On - Saturday, December 27, 2025

This article describes tokenization in NLP.

What Is Tokenization?

Tokenization is the process of breaking text down into smaller units called tokens. The tokens can be words, characters, or sub-words.

Tokenization Types

The following tokenization types are commonly used in NLP.

| Tokenization Type | Purpose |
|---|---|
| Word tokenizer | Splits a piece of text into individual words. |
| Sentence tokenizer | Splits the text into sentences. |
| Blank line tokenizer | Splits text into paragraphs based on blank lines. |
| Regexp tokenizer | Splits text into tokens using regular expression (regex) patterns. |
| Sub-word tokenizer | Splits text into units that are smaller than words but larger than characters. |
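The tokenization types listed above can be sketched with Python's standard `re` module. This is a minimal illustration, not the implementation of any particular NLP library: the function names, the regex patterns, and the small sub-word vocabulary are all assumptions chosen for the example. Production tokenizers handle abbreviations, contractions, and learned sub-word vocabularies far more carefully.

```python
import re


def word_tokenize(text):
    """Word tokenizer: words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def sentence_tokenize(text):
    """Sentence tokenizer (naive): split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def blank_line_tokenize(text):
    """Blank line tokenizer: split into paragraphs on one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def regexp_tokenize(text, pattern):
    """Regexp tokenizer: return every non-overlapping match of the pattern."""
    return re.findall(pattern, text)


def subword_tokenize(word, vocab):
    """Sub-word tokenizer (toy): greedy longest-match against a given vocabulary.

    The vocabulary is hypothetical and hand-written here; real sub-word
    tokenizers learn theirs from a corpus.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is a known vocabulary entry.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # Nothing matched: fall back to a single character.
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces


print(word_tokenize("Hello, world!"))            # ['Hello', ',', 'world', '!']
print(sentence_tokenize("It works. Does it?"))   # ['It works.', 'Does it?']
print(blank_line_tokenize("First.\n\nSecond."))  # ['First.', 'Second.']
print(regexp_tokenize("Order 66 shipped in 2 days", r"\d+"))  # ['66', '2']
print(subword_tokenize("tokenization", {"token", "ization"}))  # ['token', 'ization']
```

Each function maps to one row of the table. The regexp tokenizer is the most general of the four rule-based ones, since the word, sentence, and blank line tokenizers here are all special cases of splitting or matching on a fixed pattern.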