It uses a small language model to estimate the entropy of its next-byte prediction at each position in a sequence, and then starts a new patch when that entropy increases; essentially, the small model is predicting the end of a ...
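
To make the idea concrete, here is a minimal sketch of entropy-based patching. It is not the original implementation: the bigram count table is a hypothetical stand-in for the small language model, and the names `train_bigram`, `next_byte_entropy`, `entropy_patches`, and the `min_increase` threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus: bytes):
    """Toy stand-in for the small byte-level LM: next-byte counts per previous byte."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts


def next_byte_entropy(counts, prev: int) -> float:
    """Shannon entropy (in bits) of the predicted next-byte distribution after `prev`."""
    dist = counts.get(prev)
    if not dist:
        return 8.0  # assumption: unseen context is treated as uniform over 256 byte values
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def entropy_patches(data: bytes, counts, min_increase: float = 0.5):
    """Split `data` into patches, opening a new patch whenever the entropy
    rises by more than `min_increase` bits relative to the previous position."""
    if not data:
        return []
    patches, start = [], 0
    prev_entropy = None
    for i in range(1, len(data)):
        h = next_byte_entropy(counts, data[i - 1])  # uncertainty about data[i]
        if prev_entropy is not None and h - prev_entropy > min_increase:
            patches.append(data[start:i])  # close the current patch before byte i
            start = i
        prev_entropy = h
    patches.append(data[start:])  # final patch
    return patches
```

Calling `entropy_patches(text, train_bigram(corpus))` returns a list of byte strings that concatenate back to `text`. Stretches where the next byte is easy to predict stay inside one patch, and a boundary appears where the model's uncertainty jumps. A real system would replace the bigram table with a trained model and tune the threshold.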