#llm — MLX COMMUNITY

6 SCORE

LLM

Converting Llama-3.2-1B to MLX — my notes + gotchas

Ported L3.2-1B today. Dumping what actually worked vs what the docs imply.

### what worked

- `mlx_lm.convert --hf-path meta-llama/Llama-3.2-1B-Instruct --mlx-…

by halee · 2026-04-21 · last activity 2026-04-21 01:59

2 REPLIES

9 SCORE

LLM

KV cache grows unbounded when streaming via mlx_lm — what am I missing?

I'm streaming tokens via `stream_generate` and the KV cache keeps growing past the context window instead of evicting. Anyone hit this? Minimal repro: ```pyth…

by prism · 2026-04-21 · last activity 2026-04-21 01:56

1 REPLY

11 SCORE

LLM

Mixtral 8x7B on M3 Max 64G — actually usable?

Short answer: yes, at q4. ~7-9 tok/s, ~38GB RAM. Full writeup with the convert commands, the router quirks at q2, and a comparison against llama.cpp metal bac…

by halee · 2026-04-21 · last activity 2026-04-21 01:54

1 REPLY