Tokenizer

See how text is broken into tokens, the atomic units that LLMs process. Each token maps to an integer ID in the model's vocabulary.
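As a rough illustration of the token-to-ID mapping, here is a minimal sketch of a greedy longest-match tokenizer over a tiny, made-up vocabulary. The vocabulary entries, the IDs, and the function names `tokenize` and `encode` are all assumptions for this example. Real LLM tokenizers typically learn their vocabularies with algorithms such as BPE and are considerably more involved.

```python
# Toy vocabulary: every entry and ID here is hypothetical, chosen only for illustration.
VOCAB = {
    "token": 0, "izer": 1, "s": 2, " ": 3,
    "!": 4, "un": 5, "believ": 6, "able": 7,
}

def tokenize(text, vocab=VOCAB):
    """Split text into vocabulary pieces using greedy longest-match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece is in the vocabulary.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No piece starting at position i is known to this toy vocabulary.
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return tokens

def encode(text, vocab=VOCAB):
    """Map each token to its integer ID in the vocabulary."""
    return [vocab[t] for t in tokenize(text, vocab)]

print(tokenize("tokenizers!"))  # ['token', 'izer', 's', '!']
print(encode("tokenizers!"))    # [0, 1, 2, 4]
```

The model never sees the strings themselves, only the list of integer IDs.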

Input Text

Token Stats

Legend

Word start
Continuation
Punctuation
Whitespace
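One way the four legend categories could be assigned is sketched below. The rules are assumptions, not necessarily how this page colors its tokens: a token counts as a word start if it is the first token, begins with a space (as in GPT-style vocabularies), or follows a whitespace or punctuation token. The function name `label_tokens` is hypothetical.

```python
import string

def label_tokens(tokens):
    """Assign each token one of the four legend categories (heuristic rules)."""
    labels = []
    for i, tok in enumerate(tokens):
        core = tok.strip()
        if core == "":
            # Token made up entirely of spaces, tabs, or newlines.
            labels.append("Whitespace")
        elif all(ch in string.punctuation for ch in core):
            labels.append("Punctuation")
        elif i == 0 or tok[0].isspace() or labels[-1] in ("Whitespace", "Punctuation"):
            # First token, leading-space token, or token right after a separator.
            labels.append("Word start")
        else:
            # Glued directly onto the preceding word piece.
            labels.append("Continuation")
    return labels

print(label_tokens(["Token", "izers", " are", " fun", "!"]))
# ['Word start', 'Continuation', 'Word start', 'Word start', 'Punctuation']
```

This heuristic only handles ASCII punctuation and will mislabel some cases, such as the piece after an apostrophe inside a contraction.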

Tokenized Text
