Thank you for the clear and well-executed implementation.
Following up on this issue: #11
May I ask why you chose to expand the token-mixing MLP while bottlenecking the channel-mixing MLP? Is there a particular motivation behind this design, or was it chosen empirically because this setup gives the best performance?
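To make the question concrete, here is a rough sketch of the pattern I mean. It only counts the hidden widths and parameters of the two MLPs in one mixer-style block. The function names, the sequence length, the channel width, and both ratios are hypothetical values I picked for illustration; they are not taken from this repository.

```python
def mlp_params(dim: int, hidden: int) -> int:
    """Weights plus biases of a two-layer MLP mapping dim -> hidden -> dim."""
    return dim * hidden + hidden + hidden * dim + dim


def mixer_block_stats(num_tokens: int, num_channels: int,
                      token_ratio: float, channel_ratio: float) -> dict:
    """Hidden widths and parameter counts for one mixer-style block.

    token_ratio > 1 expands the token-mixing MLP (which acts across tokens);
    channel_ratio < 1 bottlenecks the channel-mixing MLP (which acts across channels).
    All values are illustrative assumptions, not the repository's settings.
    """
    token_hidden = int(num_tokens * token_ratio)
    channel_hidden = int(num_channels * channel_ratio)
    return {
        "token_hidden": token_hidden,
        "channel_hidden": channel_hidden,
        "token_mlp_params": mlp_params(num_tokens, token_hidden),
        "channel_mlp_params": mlp_params(num_channels, channel_hidden),
    }


# Hypothetical configuration: 64 tokens, 512 channels.
stats = mixer_block_stats(num_tokens=64, num_channels=512,
                          token_ratio=2.0, channel_ratio=0.5)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With numbers like these, the channel-mixing MLP still holds most of the parameters even after bottlenecking, so I am wondering whether the choice is mainly about parameter and compute budget or about something else.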