# Release note

## v0.10.2rc1 - 2025.09.15

This is the 1st release candidate of v0.10.2 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.

### Highlights

- Add support for Qwen3 Next. Please note that the expert parallel and MTP features don't work with this release, and the server may still crash in some cases; we'll make it stable soon. Follow the [official guide](https://vllm-ascend.readthedocs.io/en/latest/tutorials/multi_npu_qwen3_next.html) to get started; an illustrative launch command is sketched after this list. [#2917](https://github.com/vllm-project/vllm-ascend/pull/2917)
- Add quantization support for aclgraph. [#2841](https://github.com/vllm-project/vllm-ascend/pull/2841)
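
A minimal sketch of launching Qwen3 Next on this release candidate. The model checkpoint and parallelism settings below are illustrative assumptions, not the documented configuration; follow the official guide linked above for the supported setup.

```bash
# Illustrative only: the checkpoint name and --tensor-parallel-size are assumptions,
# see the multi-NPU Qwen3 Next tutorial for the recommended values.
vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
    --tensor-parallel-size 4 \
    --max-model-len 32768
```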

### Core

- Aclgraph now works with the Ray backend. [#2589](https://github.com/vllm-project/vllm-ascend/pull/2589)
- MTP now works with more than one speculative token. [#2708](https://github.com/vllm-project/vllm-ascend/pull/2708)
- Qwen2.5 VL now works with quantization. [#2778](https://github.com/vllm-project/vllm-ascend/pull/2778)
- Improved performance when the async scheduler is enabled. [#2783](https://github.com/vllm-project/vllm-ascend/pull/2783)
- Fixed the performance regression for non-MLA models when using the default scheduler. [#2894](https://github.com/vllm-project/vllm-ascend/pull/2894)

### Other

- The performance of w8a8 quantization is improved. [#2275](https://github.com/vllm-project/vllm-ascend/pull/2275)
- The performance of MoE models is improved. [#2689](https://github.com/vllm-project/vllm-ascend/pull/2689) [#2842](https://github.com/vllm-project/vllm-ascend/pull/2842)
- Fixed a resource limit error when speculative decoding is used together with aclgraph. [#2472](https://github.com/vllm-project/vllm-ascend/pull/2472)
- Fixed the git config error in docker images. [#2746](https://github.com/vllm-project/vllm-ascend/pull/2746)
- Fixed the sliding window attention bug with prefill. [#2758](https://github.com/vllm-project/vllm-ascend/pull/2758)
- The official doc for Prefill Decode Disaggregation with Qwen3 is added. [#2751](https://github.com/vllm-project/vllm-ascend/pull/2751)
- The `VLLM_ENABLE_FUSED_EXPERTS_ALLGATHER_EP` environment variable works again. [#2740](https://github.com/vllm-project/vllm-ascend/pull/2740)
- A new optimization for oproj in DeepSeek is added. Set `oproj_tensor_parallel_size` to enable this feature. [#2167](https://github.com/vllm-project/vllm-ascend/pull/2167)
- Fixed a bug where DeepSeek with torchair doesn't work as expected when `graph_batch_sizes` is set. [#2760](https://github.com/vllm-project/vllm-ascend/pull/2760)
- Avoid duplicate generation of sin_cos_cache in rope when kv_seqlen > 4k. [#2744](https://github.com/vllm-project/vllm-ascend/pull/2744)
- The performance of Qwen3 dense models is improved with flashcomm_v1. Set `VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE=1` and `VLLM_ASCEND_ENABLE_FLASHCOMM=1` to enable it; see the sketch after this list. [#2779](https://github.com/vllm-project/vllm-ascend/pull/2779)
- The performance of Qwen3 dense models is improved with the prefetch feature. Set `VLLM_ASCEND_ENABLE_PREFETCH_MLP=1` to enable it; see the sketch after this list. [#2816](https://github.com/vllm-project/vllm-ascend/pull/2816)
- The performance of Qwen3 MoE models is improved with a rope ops update. [#2571](https://github.com/vllm-project/vllm-ascend/pull/2571)
- Fixed the weight load error in the RLHF case. [#2756](https://github.com/vllm-project/vllm-ascend/pull/2756)
- Added a warm_up_atb step to speed up inference. [#2823](https://github.com/vllm-project/vllm-ascend/pull/2823)
- Fixed the aclgraph stream error for MoE models. [#2827](https://github.com/vllm-project/vllm-ascend/pull/2827)
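
A minimal sketch of enabling the Qwen3 dense-model optimizations mentioned above (flashcomm_v1 and MLP prefetch). Only the `VLLM_ASCEND_*` environment variable names come from the items above; the model name and serve options are placeholders for illustration.

```bash
# Hypothetical launch: the VLLM_ASCEND_* variables are taken from the notes above,
# the model and parallel size are placeholders.
export VLLM_ASCEND_ENABLE_DENSE_OPTIMIZE=1
export VLLM_ASCEND_ENABLE_FLASHCOMM=1
export VLLM_ASCEND_ENABLE_PREFETCH_MLP=1
vllm serve Qwen/Qwen3-8B --tensor-parallel-size 2
```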

## v0.10.1rc1 - 2025.09.04

This is the 1st release candidate of v0.10.1 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.