On September 12, 2025, Alibaba Cloud introduced Qwen3-Next: not just a new model, but an entire architecture aimed at maximum computational efficiency. The flagship model, Qwen3-Next-80B-A3B, has 80 billion parameters but activates only 3 billion during inference. This is achieved through two key innovations: an ultra-sparse Mixture-of-Experts (MoE) structure, in which only 10 of 512 "experts" are selected to process each token, and a hybrid attention mechanism. As a result, the model outperforms the previous generation's dense 32-billion-parameter model while costing less than 10% as much to train, and its inference throughput on long contexts is more than 10 times higher. The release highlights a shift in the AI race from simply scaling parameter counts to clever architectural solutions that make advanced models faster, cheaper, and more accessible.
Alibaba Unveils Qwen3-Next — A New Efficient Architecture for LLMs

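The ultra-sparse MoE idea described above can be illustrated with a toy sketch: a router scores all 512 experts for a token, keeps only the top 10, and combines just those experts' outputs, so roughly 10/512 of the expert compute runs per token. The expert counts (512 total, 10 active) come from the article; the hidden size, weight shapes, and function names here are illustrative assumptions, not Qwen3-Next's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 64     # toy hidden size (the real model's is far larger)
N_EXPERTS = 512  # total experts, per the article
TOP_K = 10       # experts activated per token, per the article

# Random stand-ins for trained router and expert weights
router_w = rng.normal(size=(D_MODEL, N_EXPERTS))
expert_w = rng.normal(size=(N_EXPERTS, D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through only TOP_K of N_EXPERTS experts."""
    logits = x @ router_w                # router score per expert, shape (N_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]    # indices of the 10 highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()             # softmax over the selected experts only
    # Only the selected experts are evaluated; the other 502 do no work
    return sum(w * (x @ expert_w[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
out = moe_forward(token)
print(out.shape)
```

Even though all 512 experts' parameters exist in memory, each token touches only 10 of them, which is how total parameter count (80B) and active parameter count (3B) can diverge so sharply.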