Qwen 3.5 “Medium” Series Launches MoE‑Enabled Model with 35B Total / ~3B Active Params and a 1M‑Token Context
Alibaba’s Qwen team launched the Qwen 3.5 “Medium” series, including a 35‑billion‑parameter MoE model (Qwen3.5‑35B‑A3B) that activates roughly 3 billion parameters per token and offers a 1‑million‑token context, according to early reports.
Quick Summary
- Alibaba’s Qwen team launched the Qwen 3.5 “Medium” series, including a 35‑billion‑parameter MoE model (Qwen3.5‑35B‑A3B) that activates roughly 3 billion parameters per token and offers a 1‑million‑token context.
- Key company: Qwen
- Also mentioned: Alibaba
Alibaba’s Qwen team is pushing the frontier of mixture‑of‑experts (MoE) models with the new “Medium” series, a line headlined by the flagship Qwen3.5‑35B‑A3B. The “A3B” suffix, as explained in a technical brief, signals that roughly three billion parameters are activated per token: per‑token compute is closer to that of a 3‑billion‑parameter dense model than to a dense 35‑billion‑parameter run, even though all 35 billion parameters must still be held in memory. The same brief notes that the 35B‑A3B variant already outperforms Alibaba’s earlier 235‑billion‑parameter MoE flagship (Qwen3‑235B‑A22B) on several key evaluation suites, a claim the company attributes to improvements in architecture, data curation, and reinforcement‑learning pipelines.
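To make that arithmetic concrete, here is a minimal, illustrative sketch of top‑k expert routing in PyTorch. The layer sizes, expert count, and k value below are invented for demonstration and are not Qwen3.5’s actual configuration; the point is only that each token passes through a handful of experts, so the active parameter count is a small slice of the total.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer with top-k routing.

    Illustrative only: the sizes here are made up and far smaller than any
    production model. Each token is routed to k of num_experts expert MLPs,
    so only a fraction of the layer's parameters run per token.
    """

    def __init__(self, d_model=64, d_ff=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.router(x)                      # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # choose k experts per token
        weights = F.softmax(weights, dim=-1)         # normalise routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoELayer()
total = sum(p.numel() for p in layer.experts.parameters())
active = total * layer.k // len(layer.experts)       # only k of num_experts experts fire
print(f"expert params: total={total:,}, active per token = {active:,}")
```

At the reported A3B ratio, that active slice is roughly 3 billion of 35 billion parameters, which is why per‑token FLOPs track the active count while memory requirements track the total.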
The series also rolls out a 122‑billion‑parameter MoE (Qwen3.5‑122B‑A10B), a 27‑billion dense model, and a production‑grade “Flash” variant that mirrors the 35B‑A3B’s capabilities while adding a one‑million‑token context window. CNBC reported that the Flash model is positioned as the answer to “long‑horizon agents,” bundling built‑in tool‑calling APIs with the massive context length—features that developers need for tasks such as codebase indexing, giant‑document summarisation, or multimodal memory retrieval.
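As an illustration of what built‑in tool calling typically looks like in practice, the hedged sketch below sends a tool schema to an OpenAI‑compatible chat endpoint. The base URL, model id, and search_codebase function are placeholders for this example, not confirmed Qwen3.5 identifiers.

```python
# Hedged sketch: a tool-calling request against a hypothetical
# OpenAI-compatible endpoint serving a long-context model.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-endpoint/v1",  # placeholder endpoint URL
    api_key="YOUR_API_KEY",
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_codebase",  # hypothetical tool for codebase indexing
        "description": "Find symbols or files in a large repository.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5-flash",  # placeholder model id, not a confirmed name
    messages=[{"role": "user", "content": "Where is the request router defined?"}],
    tools=tools,
)
print(resp.choices[0].message)
```

With a one‑million‑token window, the same request pattern could carry an entire repository or document set in the prompt, which is the “long‑horizon agent” use case the Flash variant is pitched at.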
Open weights are rolling out alongside the launch. A tweet from Alibaba’s AI team confirmed that FP8 weights for the Medium series are now publicly available, with 4‑bit quantised weights slated for release within days. The rapid rollout of low‑precision checkpoints is intended to ease deployment on commodity hardware, a practical concern highlighted by early adopters who have struggled with the VRAM overhead of MoE routing compared with dense models.
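For readers planning a local deployment, here is a hedged sketch of loading a 4‑bit checkpoint with Hugging Face transformers and bitsandbytes. The repo id is a placeholder, and Alibaba’s actual FP8 and 4‑bit releases may ship pre‑quantised and load without an explicit quantisation config.

```python
# Hedged sketch: on-the-fly 4-bit loading via bitsandbytes. The repo id
# below is a placeholder, not a confirmed Hugging Face path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3.5-35B-A3B"  # hypothetical repo id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store weights in 4-bit
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spread expert layers across available GPUs/CPU
)

inputs = tokenizer("Summarise this repository:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

Note that quantisation shrinks weight storage but not the KV cache, so very long contexts remain memory‑hungry even on 4‑bit deployments.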
Industry observers see the 1 M‑token context as a strategic differentiator in the burgeoning Chinese chatbot race, where the focus is shifting from pure conversational fluency to autonomous agents capable of sustained reasoning. The CNBC piece underscores that Alibaba’s announcement arrives at a moment when rivals such as Baidu and Tencent are also courting the agent market, making the combination of MoE efficiency and ultra‑long context a potential competitive edge.
If the Qwen3.5‑35B‑A3B lives up to its benchmarks, it could redefine the cost‑performance calculus for midsize language models. By activating only a fraction of its parameters per token, the model promises dense‑level quality at MoE‑level compute, while the Flash variant offers the throughput needed for real‑time tool use. As the community begins to benchmark the new series against strong dense 30B–40B models, the results will reveal whether Alibaba’s claim that smarter architecture and data can trump sheer scale holds true in practice.
Sources
No primary source found (coverage-based)
- Reddit - r/LocalLLaMA
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.