Softmax Policy Target

I discovered this afternoon that if you use a nonzero policy training weight on data whose policy target doesn't sum to 1, the reg term goes absolutely berserk (I've seen reg losses of 5000). I think this happens because the net is trying to reach an impossible policy distribution. Would it be a significant slowdown to either renormalize the policy target or emit a warning when the policy target's sum isn't approximately 1?
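To make the request concrete, here is a minimal sketch of what such a check could look like. It is not taken from this project's code: the function name `check_policy_target`, the `tol` and `renormalize` parameters, and the plain-list representation of the target are all assumptions for illustration.

```python
import math
import warnings


def check_policy_target(target, tol=1e-3, renormalize=True):
    """Hypothetical helper: warn if a policy target does not sum to ~1.

    Returns a new list. If `renormalize` is True and the sum is off,
    every entry is divided by the sum so the result adds up to 1.
    """
    total = sum(target)

    # Targets that are already (approximately) normalized pass through.
    if math.isclose(total, 1.0, abs_tol=tol):
        return list(target)

    warnings.warn(f"policy target sums to {total:.6f}, expected 1")

    if not renormalize:
        return list(target)

    # A zero or negative sum cannot be rescaled into a distribution.
    if total <= 0:
        raise ValueError("cannot renormalize a policy target with non-positive sum")

    return [p / total for p in target]
```

For example, `check_policy_target([0.2, 0.2, 0.1])` would warn and return values close to `[0.4, 0.4, 0.2]`. The check is one sum and at most one division per entry, so I'd expect the cost to be small next to a forward pass, though I haven't measured it.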

Softmax Policy Target #128
