vladmandic
Contributor

@vladmandic vladmandic commented Sep 18, 2025

the wan transformer block creates scale_shift_table on cpu and then adds it to temb without checking which device the temb tensor actually resides on.
when temb is on cuda, this causes the typical cpu-vs-cuda device mismatch:

  473     shift_msa, scale_msa, gate_msa, c_shift_msa, c_scale_msa, c_gate_msa = (
❱ 474         self.scale_shift_table + temb.float()
  475     ).chunk(6, dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
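
for illustration, a minimal sketch of one way to avoid the mismatch: move the table to the device of temb right before the add. `TinyBlock` is a hypothetical stand-in, not the actual diffusers class, and the shapes are made up; this is not necessarily the exact change made in this pr.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical stand-in for the transformer block (not the diffusers class)."""

    def __init__(self, dim: int = 8):
        super().__init__()
        # Same pattern as in the traceback: a learned (1, 6, dim) modulation table.
        self.scale_shift_table = nn.Parameter(torch.randn(1, 6, dim) / dim**0.5)

    def forward(self, temb: torch.Tensor):
        # Align the table with temb's device before adding.
        # .to() returns the same tensor when the device already matches.
        table = self.scale_shift_table.to(temb.device)
        return (table + temb.float()).chunk(6, dim=1)


block = TinyBlock()  # parameters are left on cpu on purpose
device = "cuda" if torch.cuda.is_available() else "cpu"
temb = torch.randn(2, 6, 8, device=device)  # assumed shape: (batch, 6, dim)

chunks = block(temb)
print(len(chunks), chunks[0].shape, chunks[0].device)
```

the same idea applies wherever a parameter is combined with an activation that may live on another device, for example when offloading is enabled.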

cc @sayakpaul @yiyixuxu
