add fa4 to sgl-kernel and support using fa4 on deepseek on blackwell #9928
base: main
Conversation
force-pushed from 0742a4c to 49b9d15
[Collapsed benchmark comment: Launch Command, Benchmark Command, Time to First Token (TTFT), unit test]
/gemini

It looks like you tried to invoke a command with `/gemini`. If you intended to ask a question or provide a comment, you can simply type your message without a command.
/gemini review
Code Review
This pull request introduces support for FlashAttention v4, a significant enhancement for models running on newer GPU architectures. The implementation is well-structured, integrating FA4 as a versioned backend alongside FA3. The changes are guarded to ensure that only supported features are used, which is a good practice for incremental development. The addition of a comprehensive test suite for the FA4 interface is particularly commendable as it provides strong evidence of correctness. My review found a minor code duplication issue in a test file that could be addressed to improve maintainability.
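For illustration, here is a minimal sketch of what such a version-guarded, shared backend could look like. The names `FlashAttentionBackend`, `create_backend`, `use_sliding_window`, and `fa_impl_ver` are hypothetical stand-ins, not the PR's actual identifiers, and the sliding-window restriction is an assumed example of an FA4-unsupported feature:

```python
from dataclasses import dataclass


@dataclass
class FlashAttentionBackend:
    """Shared backend; `fa_impl_ver` selects the FA3 or FA4 kernels."""
    fa_impl_ver: int = 3


def create_backend(use_sliding_window: bool, fa_impl_ver: int = 3) -> FlashAttentionBackend:
    # Guard FA4-only constraints up front: FA4 currently exposes a subset of
    # FA3's features, so unsupported options fail fast with a clear error
    # instead of failing inside the kernel.
    if fa_impl_ver == 4 and use_sliding_window:
        # Assumed restriction for illustration only.
        raise ValueError("FA4 does not yet support sliding-window attention")
    return FlashAttentionBackend(fa_impl_ver=fa_impl_ver)
```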
force-pushed from 3421118 to 94ff293
This is an initial PR for the FA4 backend. FA4 is selected via the `ver` parameter. Based on the current interface compatibility of Flash Attention 4, I chose to share the implementation with the FA3 attention backend.
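A minimal sketch of what sharing one entry point between FA3 and FA4 via a `ver` parameter could look like. The entry-point name and `ver` argument follow the PR description; the kernel bindings `_fa3_varlen_fwd` and `_fa4_varlen_fwd` are hypothetical placeholders (backed here by PyTorch's SDPA so the sketch runs), not sgl-kernel's real symbols:

```python
import torch
import torch.nn.functional as F


def _fa3_varlen_fwd(q, k, v, **kwargs):
    # Placeholder standing in for the FA3 CUDA kernel binding.
    return F.scaled_dot_product_attention(q, k, v)


def _fa4_varlen_fwd(q, k, v, **kwargs):
    # Placeholder standing in for the FA4 (Blackwell-targeted) kernel binding.
    return F.scaled_dot_product_attention(q, k, v)


def flash_attn_varlen_func(q, k, v, *, ver: int = 3, **kwargs):
    """Single Python-facing entry point shared by FA3 and FA4.

    `ver` routes the call to the requested kernel version, so callers
    (e.g. the shared attention backend) need no separate code paths.
    """
    if ver == 4:
        return _fa4_varlen_fwd(q, k, v, **kwargs)
    return _fa3_varlen_fwd(q, k, v, **kwargs)


# Usage: same call site, different kernel version.
q = k = v = torch.randn(1, 8, 16, 64)
out_fa3 = flash_attn_varlen_func(q, k, v, ver=3)
out_fa4 = flash_attn_varlen_func(q, k, v, ver=4)
```

Keeping a single entry point means the FA3 backend code can adopt FA4 by threading one extra argument through, which matches the incremental, feature-guarded approach described above.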