Alibaba's Qwen team launches three Qwen 3.5 models on Hugging Face—27B FP8, 35B A3B FP8, and 122B A10B FP8.
27 billion. That’s the parameter count of the smallest model in the newly released Qwen 3.5 family, launched on Hugging Face together with 35‑billion‑ and 122‑billion‑parameter variants, all using FP8 precision.
Quick Summary
- The Qwen 3.5 family arrives on Hugging Face in three FP8‑precision sizes: 27B, 35B (A3B), and 122B (A10B) parameters.
- Key company: Qwen
- Also mentioned: Hugging Face
The release of the Qwen 3.5 family on Hugging Face marks the first public rollout of large‑scale FP8‑precision models on the platform, a move that underscores the growing emphasis on efficiency as model sizes climb. The three variants—Qwen 3.5‑27B‑FP8, Qwen 3.5‑35B‑A3B‑FP8 and Qwen 3.5‑122B‑A10B‑FP8—were all uploaded to the Hugging Face Model Hub on the same day, each tagged with “fp8” and “image‑text‑to‑text” pipelines (Hugging Face model pages, Qwen). The 27‑billion‑parameter model is the dense baseline, while the 35‑billion and 122‑billion versions employ mixture‑of‑experts (MoE) architectures—A3B and A10B respectively—allowing them to scale parameter counts without a proportional increase in compute cost. All three are released under the Apache‑2.0 license, making them immediately usable for commercial and research applications.
The choice of FP8 (8‑bit floating point) is notable because it pushes low‑precision inference beyond the more common FP16 half‑precision and INT8 quantization schemes. According to the model metadata, the FP8 format preserves the dynamic range needed for both language and multimodal tasks while cutting memory footprints by roughly half compared to FP16. This reduction translates into lower inference latency and cheaper deployment on commodity GPUs, a benefit that aligns with Alibaba’s broader strategy of democratizing AI agents in China. CNBC reported that Alibaba’s unveiling of Qwen 3.5 signals a shift in the Chinese chatbot market from pure conversational bots toward more capable AI agents that can handle image‑text inputs (CNBC). By open‑sourcing the models, Alibaba can leverage the global developer ecosystem that congregates on Hugging Face, accelerating adoption and feedback loops.
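The memory claim is easy to sanity‑check from first principles: weight storage is just parameter count times bits per parameter. The sketch below uses the parameter counts from the article and deliberately ignores everything else (activations, KV cache, FP8 scaling metadata), so the figures are lower bounds, not full deployment footprints.

```python
# Approximate weight storage for the three Qwen 3.5 variants at FP16 vs FP8.
# Parameter counts come from the article; activations, KV cache, and
# per-tensor FP8 scaling factors are ignored, so these are lower bounds.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Weight storage in gigabytes, using 1 GB = 1e9 bytes."""
    return num_params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    for name, params in [("27B", 27e9), ("35B-A3B", 35e9), ("122B-A10B", 122e9)]:
        print(f"{name}: FP16 ≈ {weight_memory_gb(params, 16):.0f} GB, "
              f"FP8 ≈ {weight_memory_gb(params, 8):.0f} GB")
```

For the 27B model this works out to roughly 54 GB of weights at FP16 versus 27 GB at FP8, which is the "roughly half" reduction the metadata describes.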
From a technical standpoint, the MoE variants introduce expert routing layers that activate only a subset of the model’s parameters per token. Following Qwen’s naming convention, the “A3B” and “A10B” suffixes indicate the number of active parameters per token: roughly 3 billion for the 35B model and 10 billion for the 122B model. This design lets the larger model keep per‑token inference cost close to its activated‑parameter count rather than its full 122‑billion‑parameter scale. The “safetensors” format used for all three releases further streamlines loading and reduces the risk of corrupted weights, a practical consideration for large‑scale deployments (Hugging Face model pages, Qwen).
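The routing idea can be illustrated with a toy top‑k gate (a pure‑Python sketch, not Qwen’s actual router; the logits, expert count, and choice of k below are made up for illustration):

```python
# Toy top-k mixture-of-experts gate: for one token, score every expert,
# keep only the k highest-scoring experts, and renormalize their weights.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# Only the chosen experts' parameters are touched for this token, which is
# why active parameters can sit far below the total parameter count.
```

For example, `route([0.1, 2.0, -1.0, 1.5], 2)` selects experts 1 and 3 with renormalized weights; the parameters of the unselected experts are never read for that token.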
Hugging Face’s platform metrics show that each model has garnered a single “like” and zero downloads so far, reflecting the early stage of community uptake. However, the presence of the “endpoints_compatible” tag suggests that the models are ready for deployment via Hugging Face Inference Endpoints, allowing enterprises to spin up API services without managing the underlying infrastructure. This aligns with the broader industry trend of offering AI as a service, where cost‑effective inference is a decisive factor. By providing FP8‑optimized models, Hugging Face positions itself as a conduit for the next wave of high‑performance, low‑cost AI services.
The strategic implications extend beyond Alibaba’s domestic market. The open‑source release invites competition from other AI leaders—OpenAI, Anthropic, and Google DeepMind—all of which are racing to deliver larger, more capable models while keeping operational expenses in check. The Qwen 3.5 family demonstrates that Chinese AI research can match the scale of Western counterparts and, through quantization, potentially outpace them on efficiency metrics. As enterprises evaluate large language models for production workloads, the combination of massive parameter counts, MoE scaling, and FP8 precision could become a decisive differentiator, especially for multimodal applications that require both text and image understanding.
Sources
- Hugging Face Model Hub: Qwen 3.5 model pages (Qwen)
- CNBC: coverage of Alibaba’s Qwen 3.5 unveiling
This article was created using AI technology and reviewed by the SectorHQ editorial team for accuracy and quality.