Merge MuZero and Stochastic Muzero policies #381

Firerozes · 2025-07-07T18:40:22Z

The only theoretical difference between MuZero and Stochastic MuZero lies in the afterstates and their estimation. Therefore, 99% of the code should be identical for the two.
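To make that difference concrete, here is a minimal Python sketch of one model unroll step in each variant. The function names, the integer "latent states", and the arithmetic are all hypothetical placeholders, not LightZero code. Only the shape of the data flow (state and action, then afterstate, then chance outcome, then next state) is meant to reflect the Stochastic MuZero formulation.

```python
import random

# Toy stand-ins for learned networks; states are plain ints for illustration only.

def dynamics(state, action):
    """MuZero: (state, action) -> (next_state, reward)."""
    next_state = state * 2 + action
    reward = float(action)
    return next_state, reward

def afterstate_dynamics(state, action):
    """Stochastic MuZero: (state, action) -> afterstate, before chance resolves."""
    return state * 2 + action

def chance_distribution(afterstate, num_chances=2):
    """Stochastic MuZero: afterstate -> probabilities over chance outcomes."""
    return [1.0 / num_chances] * num_chances

def stochastic_dynamics(afterstate, chance):
    """Stochastic MuZero: (afterstate, chance outcome) -> (next_state, reward)."""
    next_state = afterstate * 10 + chance
    reward = float(chance)
    return next_state, reward

def muzero_step(state, action):
    # One deterministic transition.
    return dynamics(state, action)

def stochastic_muzero_step(state, action, rng):
    # Same entry point, but the transition passes through an afterstate
    # and a sampled chance outcome.
    afterstate = afterstate_dynamics(state, action)
    probs = chance_distribution(afterstate)
    chance = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return stochastic_dynamics(afterstate, chance)
```

Everything outside these transition functions (data collection, target computation, the training loop) is where the two variants would be expected to overlap.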

However, in their implementation in LightZero's policy folder, muzero.py and stochastic_muzero.py have drifted apart, mostly because MuZero has been updated regularly while the Stochastic version has not. The Stochastic version was designed using inheritance, but most of the code it overrides is still identical to what MuZero's code was at the time Stochastic MuZero was created.
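The drift described here can be reproduced in a few lines. This is a hypothetical sketch, not LightZero's actual classes: `BasePolicy`, `CopiedOverridePolicy`, and the clipping "fix" are invented purely to show how an override that copies the parent's body stops receiving the parent's later updates.

```python
class BasePolicy:
    """Stands in for the parent (MuZero-like) policy."""

    def learn(self, prediction, target):
        # Later upstream improvement: the loss is now clipped to at most 1.0.
        return min(abs(prediction - target), 1.0)


class CopiedOverridePolicy(BasePolicy):
    """Overrides `learn` with a copy of the parent's older body."""

    def learn(self, prediction, target):
        # Copied before clipping was added upstream, so the update never
        # reaches this subclass even though it inherits from BasePolicy.
        return abs(prediction - target)


base_loss = BasePolicy().learn(5, 0)
copied_loss = CopiedOverridePolicy().learn(5, 0)
print(base_loss, copied_loss)
```

Inheritance only shares code that is not overridden, so a fully redefined method behaves like an independent copy.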

To both solve this issue and prevent it from happening again, wouldn't it be better to unify classical and Stochastic MuZero in a single muzero.py file, with some kind of stochastic_variant=True option that activates the Stochastic variant when the user wants it?
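As a rough illustration of what such a switch could look like, here is a minimal sketch. `PolicyConfig`, `UnifiedPolicy`, `loss_terms`, and the loss-term labels are hypothetical names chosen for this example; they are not LightZero's configuration API, and the real change would touch far more code.

```python
from dataclasses import dataclass


@dataclass
class PolicyConfig:
    # Option name taken from the proposal; everything else is an assumption.
    stochastic_variant: bool = False
    num_unroll_steps: int = 5


class UnifiedPolicy:
    """A single policy class covering both variants behind one flag."""

    def __init__(self, cfg):
        self.cfg = cfg

    def loss_terms(self):
        # Terms shared by both variants.
        terms = ["value", "reward", "policy"]
        if self.cfg.stochastic_variant:
            # Extra terms needed only when afterstates are modelled.
            terms += ["afterstate_value", "chance"]
        return terms


classic = UnifiedPolicy(PolicyConfig())
stochastic = UnifiedPolicy(PolicyConfig(stochastic_variant=True))
print(classic.loss_terms())
print(stochastic.loss_terms())
```

The shared terms live in one place, and the flag only adds what the Stochastic variant needs on top of them.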

If this convinces you, I can try to take care of the merge. It would mean merging stochastic_muzero.py into muzero.py, stochastic_muzero_model.py into muzero_model.py, and game_buffer_stochastic_muzero.py into game_buffer_muzero.py.
I don't want to start working on this before having your blessing, since:

I could be mistaken in my understanding of the algorithm.
This is quite a heavy change and I don't want to work for nothing.

I look forward to hearing your thoughts on the matter! If any details or clarifications are needed, feel free to reach out to me.

puyuan1996 (Maintainer) · 2025-08-14T08:17:58Z

Hello,

Thank you very much for your insightful observations and valuable suggestions.

Your analysis is spot on. Yes, in theory, the MuZero and Stochastic MuZero algorithms share a great deal of common code. From a code-reuse perspective, using a configuration option (like stochastic_variant=True) to switch between variants within a single set of files is a very logical approach.

However, our current decision to keep them separate is primarily driven by considerations for code readability and ease of use. If we were to merge them, we would need to introduce numerous conditional statements throughout several critical parts of the codebase (e.g., in the policy, the model, the search tree, buffer, etc.). This would significantly increase the complexity and reduce the readability of the code. For a user who wants to study or use one specific algorithm, the cost of understanding and debugging the code would become higher.

Therefore, to maintain the clarity, independence, and maintainability of each algorithm's implementation, our current recommendation is to keep them in separate files for the time being.

Thank you again for your deep thinking and proactive willingness to contribute! Your suggestions are incredibly valuable for the improvement of the project. We look forward to more discussions with you in the future.
