Multi-Token Prediction in llama.cpp

24 May, 2026

The MTP PR got merged into llama.cpp a few days back. When I wrote it, I did not expect it to blow up like it did.

It is now the most "liked" PR in the llama.cpp repo. And what bigger compliment can you expect than Georgi publicly praising your work:

Screenshot from 2026-05-24 23-52-06

I agree with Georgi's sentiment: Qwen3.6-27B is truly a pivotal moment for local AI. It's a model that "just works". The Qwen team really cooked with this one.

Funnily enough, at some point I was running Qwen3.6-27B with the pi agent and dogfooding my own PR: Qwen3.6-27B was suggesting changes to its own code. For the first time, I was able to offload a task to a local model with sufficient confidence in its abilities. In a future blog post I will write about my local AI coding stack, built around the pi extension I cooked up.