Tokenization

How text is split into tokens for language models, including tokenizer design, vocabulary construction, byte-pair encoding, and the downstream effects of tokenization choices on model behavior and output.

Reading List