Built a streaming wrapper around mlx-audio's Whisper impl that emits word-level timestamps on a 200ms cadence. Runs at 3.8x realtime on an M2 Pro. Useful for live captioning. Gotcha: the cross-attention for word alignment …
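
Rough sketch of what a 200ms emit loop can look like, for anyone wondering about the cadence part. This is not the actual wrapper and does not touch the alignment gotcha. `stream_words`, `Word`, and the `transcribe` callback are hypothetical names made up for illustration. `transcribe` stands in for whatever call returns timestamped words and is not mlx-audio's real API. The only fixed facts assumed are the 200ms cadence from above and Whisper's 16 kHz mono input.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000   # Whisper models take 16 kHz mono audio
CADENCE_MS = 200       # emit cadence mentioned in the post


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the stream
    end: float


# Hypothetical callback: receives the buffered samples, returns timestamped words.
Transcriber = Callable[[List[float]], List[Word]]


def stream_words(
    chunks: Iterable[List[float]], transcribe: Transcriber
) -> Iterator[List[Word]]:
    """Yield words not yet emitted each time another 200 ms of audio has arrived."""
    buffer: List[float] = []
    emitted = 0                                   # count of words already yielded
    step = SAMPLE_RATE * CADENCE_MS // 1000       # samples per emit tick (3200)
    next_emit = step
    for chunk in chunks:
        buffer.extend(chunk)
        # Incoming chunks need not line up with the cadence, so catch up in a loop.
        while len(buffer) >= next_emit:
            words = transcribe(buffer[:next_emit])
            yield words[emitted:]                 # only the words past the last emit
            emitted = max(emitted, len(words))
            next_emit += step
```

Two simplifications to keep in mind: the buffer grows without bound, and words that were already emitted are never revised. A real captioning loop would likely need a sliding window and some rule for when a word counts as final.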