[New model] Qwen3-next support #2917

wangxiyuan · 2025-09-14T06:45:27Z

What this PR does / why we need it?

Add Qwen3-next support.

Does this PR introduce any user-facing change?

Yes, users can now run Qwen3-Next.
Related doc: #2916 (the tutorial will be published there).

How was this patch tested?

Doc CI passed

Related: #2884

vLLM version: v0.10.2
vLLM main: vllm-project/vllm@ff68035

github-actions · 2025-09-14T06:45:36Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request adds support for the new Qwen3-next model, which is a hybrid attention model. The changes are extensive, touching attention mechanisms, model runner logic, and adding new custom operators. While the implementation is comprehensive, I've identified several critical issues related to code duplication, performance bottlenecks in the new model's prefill implementation, and potential correctness issues due to hardcoded values. There are also some high-severity issues regarding dead code and missed performance optimizations. Addressing these points will significantly improve the robustness and performance of the new model support.

vllm_ascend/models/qwen3_next.py

vllm_ascend/patch/worker/patch_common/patch_block_table.py

vllm_ascend/models/qwen3_next.py

Angazenn · 2025-09-14T09:34:42Z

vllm_ascend/models/qwen3_next.py

+        #     print(f"self.layer_idx: {self.layer_idx}, 111 mixed_qkv_non_spec: {mixed_qkv_non_spec}")
+
+        # 2.1: process the mutli-query part
+        # if spec_sequence_masks is not None:


The condition spec_sequence_masks is not None is used to choose the non-MTP branch. Since MTP for Qwen3-Next is not supported on NPU yet, maybe we can check it here.
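One way such a check could look, as a minimal sketch rather than the code in this PR: the helper name ensure_no_spec_decode is hypothetical, and the mask is typed loosely so the example runs without torch. It only illustrates failing fast when speculative (MTP) masks are passed in.

```python
from typing import Optional, Sequence


def ensure_no_spec_decode(spec_sequence_masks: Optional[Sequence[bool]]) -> None:
    """Hypothetical guard: reject speculative (MTP) inputs up front.

    In the real model the mask would be a tensor; a plain sequence is
    used here only to keep the sketch self-contained.
    """
    if spec_sequence_masks is not None:
        raise NotImplementedError(
            "MTP (speculative decoding) for Qwen3-Next is not supported on NPU yet"
        )


# Non-speculative path: no masks, so the guard passes silently.
ensure_no_spec_decode(None)
```

Calling the guard at the top of the forward path would turn a silently skipped branch into an explicit error; whether that is preferable to a warning is a design choice for the PR authors.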

vllm_ascend/worker/npu_input_batch.py

github-actions bot added module:ops module:core labels Sep 14, 2025

gemini-code-assist bot reviewed Sep 14, 2025

vllm_ascend/models/qwen3_next.py (resolved)

vllm_ascend/patch/worker/patch_common/patch_block_table.py (resolved)

vllm_ascend/models/qwen3_next.py (outdated, resolved)

wangxiyuan force-pushed the qwen3-next branch from 271794a to 73a8e05 on September 14, 2025 08:02

MengqingCao and others added 17 commits September 14, 2025 16:14

[hybrid kv] init support of hybrid kv

2a21f14

Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

update

90dc42f

Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

init kv cache pass

5b81e48

Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

update

c8b02fc

Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

GDN support

82c472e

Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Update qwen3 moe

4fe8cf9

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix registry bugs && remove print

648a3fc

Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

repeat qk

ea89e5d

Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix recurrent_gated_delta_rule

9832095

Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix

47ef6e6

Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix seq_lens

fd3e0dd

Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

local

0a2beb4

Signed-off-by: Your Name <you@example.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

local 2

f62dda3

Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

local 3

98a3dbb

Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

add vllm patch

4c9d94f

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix may_reinitialize_input_batch bug

c0b8af7

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

revert unnecessary

c78e773

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

wangxiyuan force-pushed the qwen3-next branch from 626d6b8 to c78e773 on September 14, 2025 08:28

wangxiyuan added the ready and ready-for-test labels Sep 14, 2025

wangxiyuan added 2 commits September 14, 2025 17:38

fix lint

8214f81

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

fix lint

f57e114

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

vllm-project deleted a comment from gemini-code-assist bot Sep 14, 2025

Angazenn reviewed Sep 14, 2025

Yikun reviewed Sep 14, 2025

vllm_ascend/worker/npu_input_batch.py (resolved)

fix attention get_supported_block_size error

b25d549

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

Yikun added and then removed the ready-for-test label Sep 14, 2025

fix kv cache blcok shape

384f36c

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>