Open
Description
I discovered this afternoon that if you set a nonzero policy training weight on data where the policy target doesn't add up to 1, the reg term goes absolutely berserk (I've seen reg losses of 5000). I think this happens because the net is trying to reach an impossible policy distribution. Would it be a significant slowdown to either renormalize the policy target or emit a warning if the policy target's sum isn't approximately 1?
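To make the ask concrete, here is a minimal sketch of what I have in mind, in plain Python rather than the actual training code. The function name `normalize_policy_target`, the `tol` parameter, and its default value are all made up for illustration, and it assumes the target is a flat sequence of non-negative numbers.

```python
import math
import warnings


def normalize_policy_target(target, tol=1e-3):
    """Return a copy of `target` rescaled to sum to 1.

    Hypothetical helper: warns when the original sum differs from 1
    by more than `tol`. Assumes every entry is non-negative.
    """
    if any(p < 0.0 for p in target):
        raise ValueError("policy target contains negative entries")

    total = math.fsum(target)
    if total <= 0.0:
        raise ValueError("policy target has no positive mass to normalize")

    if abs(total - 1.0) > tol:
        warnings.warn(f"policy target sums to {total:.6f}; renormalizing")

    # Divide by the sum so the result is a valid probability distribution.
    return [p / total for p in target]


if __name__ == "__main__":
    fixed = normalize_policy_target([0.2, 0.2, 0.1])
    print(fixed, math.fsum(fixed))
```

This is one sum and one division per position, so I would guess the overhead is small next to the rest of a training step, but I haven't measured it.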
Agusanso7