[New model] Qwen3-next support #2917
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for the new Qwen3-next model, which is a hybrid attention model. The changes are extensive, touching attention mechanisms, model runner logic, and adding new custom operators. While the implementation is comprehensive, I've identified several critical issues related to code duplication, performance bottlenecks in the new model's prefill implementation, and potential correctness issues due to hardcoded values. There are also some high-severity issues regarding dead code and missed performance optimizations. Addressing these points will significantly improve the robustness and performance of the new model support.
271794a to 73a8e05
Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: MengqingCao <cmq0113@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Angazenn <supperccell@163.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
626d6b8 to c78e773
# print(f"self.layer_idx: {self.layer_idx}, 111 mixed_qkv_non_spec: {mixed_qkv_non_spec}")

# 2.1: process the mutli-query part
# if spec_sequence_masks is not None:
`spec_sequence_masks is not None` is used to choose the non-MTP branch. Since MTP for Qwen3-Next is not yet supported on NPU, maybe we can check for it here.
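A minimal sketch of the kind of check suggested in this comment. The helper name `ensure_no_spec_decode` is hypothetical and not part of vLLM Ascend. It also assumes that a non-None `spec_sequence_masks` means MTP (speculative) sequences are present, and it uses a plain sequence in place of the tensor the real code would pass.

```python
from typing import Optional, Sequence


def ensure_no_spec_decode(
        spec_sequence_masks: Optional[Sequence[bool]]) -> None:
    """Fail fast if MTP/speculative sequences reach the NPU code path.

    Assumption: a non-None mask signals that MTP is active. The real
    model code would pass a tensor; a plain sequence keeps this sketch
    self-contained.
    """
    if spec_sequence_masks is not None:
        raise NotImplementedError(
            "MTP for Qwen3-Next is not supported on NPU yet; "
            "expected spec_sequence_masks to be None.")


# Normal (non-MTP) path: the guard is a no-op.
ensure_no_spec_decode(None)
```

Raising early would make the unsupported configuration visible, rather than letting it fall through into the commented-out branch.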
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
b46d843 to b4a3566
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
35e027f to 706348d
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
3f6e838 to 81bb245
What this PR does / why we need it?
Add Qwen3-next support.
Does this PR introduce any user-facing change?
Yes, users can now use Qwen3-Next.
Related doc: #2916 (the tutorial will be available there).
How was this patch tested?
Doc CI passed
Related: #2884