[BugFix] Fix glm 4.5 moe accuracy bug. #2898
base: main
Conversation
Code Review
This pull request aims to fix an accuracy bug in the GLM-4.5 MoE model. The changes refactor the MoE dispatcher selection, register a new custom GLM-4 MoE model for Ascend, and modify the MoE forward-pass logic. While the intent is to fix an accuracy issue, I found a critical bug in the new CustomGlm4MoE forward pass: incorrect tensor-parallel reductions that will likely cause accuracy problems of their own. Additionally, the new dispatcher selection logic can crash with a KeyError. My review includes suggestions to address these critical issues.
dispatcher_name = _moe_method_to_dispatcher[moe_comm_method]
dispatcher = get_token_dispatcher(dispatcher_name)
The direct dictionary access _moe_method_to_dispatcher[moe_comm_method] is not robust. If moe_comm_method holds a value that is not a key in _moe_method_to_dispatcher, this will raise a KeyError and crash the program. It is safer to use .get() and handle an invalid method by raising a more informative ValueError.
Suggested change:
- dispatcher_name = _moe_method_to_dispatcher[moe_comm_method]
- dispatcher = get_token_dispatcher(dispatcher_name)
+ dispatcher_name = _moe_method_to_dispatcher.get(moe_comm_method)
+ if dispatcher_name is None:
+     raise ValueError(
+         f"Unknown MoE communication method: {moe_comm_method}. "
+         f"Available methods: {list(_moe_method_to_dispatcher.keys())}")
+ dispatcher = get_token_dispatcher(dispatcher_name)
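As a standalone illustration of the suggested pattern (the mapping values below are hypothetical placeholders, not the actual vllm-ascend dispatcher names), a .get()-based lookup turns an opaque KeyError into an actionable ValueError that lists the valid options:

```python
# Hypothetical mapping for illustration only; real names live in vllm-ascend.
_moe_method_to_dispatcher = {
    "alltoallcommimpl": "TokenDispatcherWithAllToAll",
    "mc2commimpl": "TokenDispatcherWithMC2",
}

def select_dispatcher(moe_comm_method: str) -> str:
    """Resolve a comm method to a dispatcher name, failing with a clear message."""
    dispatcher_name = _moe_method_to_dispatcher.get(moe_comm_method)
    if dispatcher_name is None:
        raise ValueError(
            f"Unknown MoE communication method: {moe_comm_method}. "
            f"Available methods: {list(_moe_method_to_dispatcher.keys())}")
    return dispatcher_name

print(select_dispatcher("mc2commimpl"))  # TokenDispatcherWithMC2
```

An unknown method now produces "Unknown MoE communication method: ..." with the list of valid keys, instead of a bare KeyError traceback.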
if moe_comm_method_name in {"alltoallcommimpl", "mc2commimpl"}:
    shared_output = tensor_model_parallel_all_reduce(shared_output)
final_hidden_states = final_hidden_states + shared_output
There appears to be a tensor distribution mismatch that will likely lead to incorrect results. shared_output is a tensor partitioned by tokens across tensor-parallel ranks. The tensor_model_parallel_all_reduce(shared_output) call incorrectly turns it into a replicated tensor. This replicated tensor is then added to final_hidden_states (the output of self.experts), which is also a partitioned tensor, so the addition broadcasts instead of adding corresponding token partitions element-wise, causing an accuracy issue. The all_reduce should be performed on the combined result, not on shared_output alone; the subsequent call to maybe_all_reduce_tensor_model_parallel should handle the necessary reduction.
final_hidden_states = final_hidden_states + shared_output
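To see why the order of reduction matters, here is a minimal NumPy simulation (not vllm code). It models tensor_model_parallel_all_reduce as a plain sum across two simulated ranks and, for illustration, assumes each rank holds a partial sum of the same logical tensor. Reducing shared_output early and then reducing the combined result downstream counts the shared contribution once per rank; adding the partials first and reducing once at the end gives the correct answer:

```python
import numpy as np

WORLD_SIZE = 2

def all_reduce(partials):
    """Simulated all-reduce: every rank ends up with the sum of all partials."""
    total = sum(partials)
    return [total.copy() for _ in range(len(partials))]

# Per-rank partial tensors (illustrative values, one entry per simulated rank).
shared_partials = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
expert_partials = [np.array([10.0, 10.0]), np.array([20.0, 20.0])]

# Ground truth: each contribution summed exactly once.
expected = sum(shared_partials) + sum(expert_partials)  # [34. 36.]

# Buggy order: reduce shared_output first, add, then reduce the combined result.
shared_reduced = all_reduce(shared_partials)
combined = [s + e for s, e in zip(shared_reduced, expert_partials)]
buggy = all_reduce(combined)[0]  # shared contribution counted WORLD_SIZE times

# Suggested order: add the partial tensors first, reduce once at the end.
correct = all_reduce(
    [s + e for s, e in zip(shared_partials, expert_partials)])[0]

print(buggy)    # [38. 42.] -- off by one extra copy of the shared sum [4. 6.]
print(correct)  # [34. 36.]
```

The gap between the two results is exactly (WORLD_SIZE - 1) extra copies of the shared-expert sum, which is the kind of systematic drift that shows up as an accuracy regression.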
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed from 11dea05 to dc20609
approved
Signed-off-by: whx-sjtu <2952154980@qq.com>
Force-pushed from dc20609 to 0ce5bb5
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Fix glm 4.5 moe accuracy bug.
This is only a temporary PR used for a quick fix. Later I will make a PR to vllm to solve this problem formally.