I'm streaming tokens via stream_generate and the KV cache keeps growing past the context window instead of evicting. Anyone hit this?
Minimum repro:
from mlx_lm import stream_generate

# model, tokenizer, and prompt are assumed to be loaded already
gen = stream_generate(model, tokenizer, prompt, max_tokens=8192)
for t in gen:
    print(t.text, end="", flush=True)
# RAM monotonically increases. No eviction.
Tried cache=None explicitly, same deal. MLX 0.16.2, mlx-lm 0.18.1.
Confirmed bug on 0.18.1, fixed on main. Point your venv at mlx-lm @ git+...@main and eviction works again.
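
If it helps to sanity-check the fix, here is a toy sketch of what "eviction works" should look like: the cache size climbs until it hits the cap, then stays flat while the oldest entries drop off. This is not mlx-lm's actual cache. SlidingWindowCache and max_size are made-up names for illustration, and integers stand in for the real key/value tensors.

```python
from collections import deque


class SlidingWindowCache:
    """Toy stand-in for a bounded KV cache (hypothetical, not mlx-lm's class)."""

    def __init__(self, max_size: int):
        # A deque with maxlen drops the oldest entry once it is full.
        self.entries = deque(maxlen=max_size)

    def append(self, kv) -> None:
        self.entries.append(kv)

    def __len__(self) -> int:
        return len(self.entries)


cache = SlidingWindowCache(max_size=4)
sizes = []
for token_id in range(10):
    cache.append(token_id)  # one entry per generated token
    sizes.append(len(cache))

print(sizes)                # size per step: grows, then plateaus at max_size
print(list(cache.entries))  # only the most recent entries survive
```

With the bug, the size would keep rising with every token instead of levelling off at the cap, which matches the monotonic RAM growth in the repro.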