Tokenizer

See how text is broken into tokens — the atomic units that LLMs process. Each token maps to an integer ID in the model's vocabulary.
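
To make the token-to-ID mapping concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `TOY_VOCAB` and the function `encode` are hypothetical names invented for this illustration; real models use learned vocabularies with tens of thousands of entries and may split text differently.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {"token": 0, "izer": 1, "s": 2, " ": 3, "!": 4, "<unk>": 5}


def encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization: returns a list of (token, id) pairs."""
    max_len = max(len(entry) for entry in vocab)
    pairs = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one is in the vocabulary.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                pairs.append((piece, vocab[piece]))
                i += length
                break
        else:
            # Nothing matched: map this single character to the unknown-token ID.
            pairs.append((text[i], vocab["<unk>"]))
            i += 1
    return pairs


if __name__ == "__main__":
    for token, token_id in encode("tokenizers !"):
        print(repr(token), token_id)
```

With this toy vocabulary, the word "tokenizers" is split into three pieces, each with its own integer ID. Characters that are not in the vocabulary fall back to the `<unk>` entry.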

Input Text

Token Stats

Selected Token

Click a token to inspect it.

Legend

Word start
Continuation
Punctuation
Whitespace
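
The page does not say how tokens are assigned to the four legend categories. The sketch below shows one plausible heuristic, assuming tokens are plain strings and that a word-start token either carries a leading space, follows a whitespace token, or is the first token. The names `classify` and `label_tokens` are hypothetical.

```python
import string


def classify(token, prev=None):
    """Assign one of the four legend categories to a token (heuristic sketch)."""
    core = token.strip()
    if core == "":
        # The token consists only of spaces, tabs, or newlines.
        return "whitespace"
    if all(ch in string.punctuation for ch in core):
        return "punctuation"
    # Assumption: a word starts at the beginning of the text, at a leading
    # space, or right after a token that ends in whitespace.
    if prev is None or token[:1].isspace() or prev[-1:].isspace():
        return "word start"
    return "continuation"


def label_tokens(tokens):
    """Classify each token using the token before it as context."""
    labels = []
    prev = None
    for token in tokens:
        labels.append(classify(token, prev))
        prev = token
    return labels


if __name__ == "__main__":
    tokens = [" Token", "izer", "s", "!", " ", "ok"]
    for token, label in zip(tokens, label_tokens(tokens)):
        print(repr(token), "->", label)
```

This heuristic treats a word that directly follows punctuation, with no space in between, as a continuation. A real visualizer may draw that boundary differently.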

Tokenized Text
