2026-05-30

May 30: Xiaomi published details of end-to-end inference optimizations for its MiMo-V2.5 model series. The company said it paired the models' hybrid architecture, which combines sliding-window attention (SWA), mixture-of-experts (MoE) and multimodal support, with a systematic rebuild of the inference stack covering KVCache management, tiered and prefix caching, scheduling, and the prefill/decode pipeline. According to Xiaomi, KVCache storage is compressed to roughly one-seventh that of comparable solutions, which materially lowers inference cost in long-sequence scenarios and provides the technical basis for its recent price cut. On May 27, Xiaomi permanently cut MiMo-V2.5 API prices by up to 99%, with the same rates applying regardless of input length.
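
To illustrate why a hybrid sliding-window design can shrink the KV cache in long-sequence scenarios, here is a minimal back-of-the-envelope sketch in Python. It is not Xiaomi's method and does not attempt to reproduce the one-seventh figure: the layer count, the split between SWA and full-attention layers, and the window size are all hypothetical values chosen for illustration. The sketch assumes that a full-attention layer caches every token position, while an SWA layer caches at most `window` positions.

```python
def cached_positions(seq_len: int, n_layers: int, swa_layers: int, window: int) -> int:
    """Total token positions held in the KV cache, summed over all layers.

    Assumption: full-attention layers keep all seq_len positions, while
    sliding-window layers keep at most `window` positions.
    """
    full_layers = n_layers - swa_layers
    return full_layers * seq_len + swa_layers * min(seq_len, window)


def cache_ratio(seq_len: int, n_layers: int, swa_layers: int, window: int) -> float:
    """Hybrid cache size relative to an all-full-attention model of equal depth."""
    baseline = n_layers * seq_len
    return cached_positions(seq_len, n_layers, swa_layers, window) / baseline


if __name__ == "__main__":
    # Hypothetical configuration: 48 layers, 40 of them SWA, window of 128 tokens.
    for seq_len in (1_024, 32_768, 131_072):
        ratio = cache_ratio(seq_len, n_layers=48, swa_layers=40, window=128)
        print(f"{seq_len:>7} tokens: hybrid cache is {ratio:.3f}x the full-attention cache")
```

Under these assumed numbers the ratio falls toward the share of full-attention layers (8 of 48, about one-sixth) as the sequence grows, which is the qualitative effect the announcement describes. Any further savings from the tiered caching, prefix caching, or storage formats mentioned above are not modeled here.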