Qwen launches FP8-weighted 3.5 Medium models, adding 27B and 122B variants on HuggingFace.
27 billion parameters. That is the size of the smallest of the new Qwen 3.5 Medium models, now published with FP8 weights on HuggingFace alongside a 122 billion‑parameter sibling, according to the model listings.
Quick Summary
- The smallest of the new Qwen 3.5 Medium checkpoints weighs in at 27 billion parameters and ships with FP8 weights on HuggingFace, alongside a 122 billion‑parameter sibling.
- Key company: Qwen
- Also mentioned: HuggingFace
The FP8‑weighted variants of Qwen 3.5 Medium expand Alibaba’s open‑source LLM portfolio with two new checkpoints, one at 27 billion parameters and a larger 122 billion‑parameter model, both now hosted on HuggingFace (HuggingFace model listings, Qwen author). The 27 B model, identified as Qwen/Qwen3.5-27B-FP8, and the 122 B model, Qwen/Qwen3.5-122B-A10B-FP8, are tagged for “image‑text‑to‑text” and “conversational” pipelines, indicating they support multimodal inputs as well as pure dialogue use cases. Their release follows the recent rollout of a 35 B mixture‑of‑experts (MoE) checkpoint, Qwen/Qwen3.5-35B-A3B-FP8, also on HuggingFace, underscoring a systematic push to make FP8 quantization available across the medium‑size tier.
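Readers who want to check the pipeline tags for themselves can query the repo metadata directly; the snippet below is a minimal sketch using the huggingface_hub client, with repo IDs taken from the article and the assumption that the listings are public and unchanged.

```python
# Illustrative sketch: querying the new repos' metadata with huggingface_hub.
# Repo IDs are taken from the article; attribute availability depends on what
# the Qwen team fills in on each model card.
from huggingface_hub import model_info

for repo_id in (
    "Qwen/Qwen3.5-27B-FP8",
    "Qwen/Qwen3.5-122B-A10B-FP8",
    "Qwen/Qwen3.5-35B-A3B-FP8",
):
    info = model_info(repo_id)
    # pipeline_tag should read "image-text-to-text" per the listings cited above
    print(repo_id, info.pipeline_tag, info.downloads, info.tags)
```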
FP8 quantization, which stores model weights in an 8‑bit floating‑point format, promises a substantial reduction in memory footprint and inference latency without the accuracy loss typically associated with lower‑precision integer formats. In the accompanying HuggingFace posts, the Qwen team notes that “4‑bit weights are coming in the next couple of days,” suggesting a staged roadmap that will further compress the models for edge or on‑prem deployments (Qwen tweet, 2023‑10‑23). By publishing the models under an Apache‑2.0 license and marking them “endpoints_compatible,” Alibaba signals that developers can integrate the checkpoints directly into existing transformer pipelines, accelerating adoption in both research and production environments.
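As an illustration of what that drop‑in integration could look like, here is a minimal text‑chat loading sketch. It assumes the FP8 checkpoints declare their quantization in the model config and follow the same chat‑style interface as earlier Qwen releases; given the image‑text‑to‑text tag, fully multimodal use would likely require a processor class instead, so treat this as a sketch rather than the official recipe.

```python
# Minimal text-chat loading sketch (an assumption, not the official recipe):
# it presumes the FP8 checkpoint carries its own quantization_config so a
# recent transformers release can load it without extra arguments.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.5-27B-FP8"  # repo ID from the HuggingFace listing

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the dtypes declared in the checkpoint
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Explain FP8 quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```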
The timing of the release aligns with broader industry momentum around efficient LLM inference. VentureBeat’s coverage of Alibaba’s earlier Qwen3‑235B‑A22B‑2507 model highlighted how the company’s open‑source offerings can “beat Kimi‑2” and deliver “low‑compute versions” for cost‑constrained users (VentureBeat, Carl Franzen). The new 27 B FP8 model, by contrast, targets a different segment: local‑machine experimentation and small‑scale commercial workloads that cannot afford the GPU memory demands of 100‑plus‑billion‑parameter models. The 122 B checkpoint, meanwhile, provides a high‑performance alternative for cloud providers seeking to balance scale with reduced hardware cost, leveraging FP8’s efficiency gains.
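To put the memory argument in rough numbers, the back‑of‑the‑envelope calculation below compares weight‑only footprints at 16‑bit, 8‑bit, and the promised 4‑bit precision. It ignores activations, KV cache, and any layers left in higher precision, so real‑world requirements will run somewhat higher.

```python
# Rough weight-only memory estimate at different precisions. Ignores KV cache,
# activations, and any layers kept in higher precision, so treat the numbers
# as lower bounds rather than measured requirements.
def weight_gib(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * 1e9 * bytes_per_param / 2**30

for name, params in [("Qwen3.5-27B", 27), ("Qwen3.5-122B", 122)]:
    print(
        f"{name}: ~{weight_gib(params, 2):.0f} GiB at bf16, "
        f"~{weight_gib(params, 1):.0f} GiB at fp8, "
        f"~{weight_gib(params, 0.5):.0f} GiB at 4-bit"
    )
```

On these rough figures, the 27 B FP8 checkpoint drops from roughly 50 GiB of weights at bf16 to about 25 GiB at FP8, putting it in workstation territory, while the 122 B variant still calls for multi‑GPU serving, consistent with the segmentation described above.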
From a strategic standpoint, Alibaba’s decision to open‑source these medium‑size models reinforces its ambition to position Qwen as a viable competitor to Western offerings such as Meta’s Llama 3 and OpenAI’s GPT‑4. The company’s prior success with the Tongyi Qianwen chatbot (launched April 2023) demonstrated market appetite for Chinese‑language LLMs, and the current FP8 rollout extends that momentum into multimodal and enterprise‑grade scenarios. By making the models readily downloadable—though downloads remain at zero at the time of writing, reflecting their nascent status—the firm invites community benchmarking that could validate FP8’s claimed performance parity with higher‑precision variants.
Analysts will likely watch early adoption metrics, such as download counts and integration reports, to gauge whether the FP8‑quantized Qwen 3.5 Medium models achieve the promised trade‑off between efficiency and accuracy. If the forthcoming 4‑bit versions deliver comparable results, Alibaba could set a new baseline for ultra‑compact LLMs, potentially reshaping the economics of AI deployment for startups and large enterprises alike. For now, the 27 B and 122 B FP8 checkpoints represent the most concrete evidence of Alibaba’s commitment to democratizing high‑performance generative AI through open‑source channels.
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.