cmhhelgeson
Contributor

Description

Creates an add-on that encapsulates the bitonic sort functionality present in the webgpu_compute_sort_bitonic example. It currently handles only scalar inputs because I'm uncertain whether TSL emulates the boolean vector functionality of GLSL. I've also removed the timestamps from the bitonic sort example, since they aren't really informative when presented at such high speed, and one can already see that the new encapsulated bitonic sort takes fewer dispatches than the previous local sort and the global-sort-only example.

@cmhhelgeson cmhhelgeson marked this pull request as draft September 8, 2025 05:32
Contributor Author

cmhhelgeson commented Sep 8, 2025

The sort itself is done and works as expected for multiple data types and counts, but I'm leaving this in draft until it's more thoroughly reviewed. As it's the first class of its type (an encapsulated GPGPU sort/operation), I would like to ensure that the maintainers agree on the class structure and documentation before it goes in, as it could inform how contributors implement and structure future encapsulated GPGPU operations. Obviously this is WebGPURenderer only, and the documentation should likely change to make that more transparent to users (class BitonicSort should also be renamed to class BitonicSortGPU).

There are also improvements that could be made to the class, which may or may not be considered blocking, such as:

  • Ping-Pong Data Buffers: We can ping-pong the dataBuffer and the tempBuffer (input and output buffers) between global sort steps. This could improve performance by cutting back on compute dispatches. An in-place sort would then need at most one alignment dispatch, and only if the final global op moved data from the dataBuffer to the tempBuffer.
  • Multiple Comparison Options: The user should be able to specify a reverse sort (i.e., a swap executes on a greaterThan rather than a lessThan).
  • Side Effects: Though the sort may primarily operate on a single buffer, the result of the sort might drive how multiple other buffers are reassigned. For instance, a sort that takes a series of linearized indices of a particle's location within a 3-dimensional spatial hash grid may need to both sort those indices and reassign particles within the particle buffer based on the sorted indices. The user could manage this themselves, or the bitonicSortModule could internally manage this as a sideEffect of the sort, possibly by swapping the particles alongside the index swap or by implementing a different strategy.
  • Subgroup Optimizations: I'm investigating potential optimizations using the recently implemented subgroup functionality. This article describes some potential optimizations using subgroups: https://winwang.blog/posts/bitonic-sort/ but I am still investigating other valid optimizations.
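
As a rough CPU-side illustration of the comparison-option and side-effect ideas above, here is a minimal TypeScript sketch of a bitonic sorting network that takes a swap predicate and moves a payload array alongside the keys. This is not the add-on's TSL code: bitonicSortWithPayload, ascending, and descending are hypothetical names chosen for the example, and it assumes a power-of-two element count. On the GPU, each (k, j) pair in the loops would roughly correspond to one sort step.

```typescript
// CPU reference sketch only; hypothetical names, not the add-on's API.
// A Compare returns true when the pair (a, b) is out of order and must be swapped.
type Compare = (a: number, b: number) => boolean;

const ascending: Compare = (a, b) => a > b;
const descending: Compare = (a, b) => a < b; // reverse sort: swap on the opposite test

function bitonicSortWithPayload<T>(
  keys: number[],
  payload: T[],
  shouldSwap: Compare = ascending,
): void {
  const n = keys.length;
  if (n !== payload.length) throw new Error("keys and payload must have equal length");
  if (n === 0 || (n & (n - 1)) !== 0) throw new Error("length must be a power of two");

  // k: size of the bitonic sequences being merged; j: distance between compared elements.
  for (let k = 2; k <= n; k *= 2) {
    for (let j = k / 2; j >= 1; j /= 2) {
      // Every i is independent within a (k, j) step, which is what makes it parallelizable.
      for (let i = 0; i < n; i++) {
        const partner = i ^ j;
        if (partner <= i) continue; // handle each pair once
        // Blocks with (i & k) === 0 sort in the requested direction, the rest in the opposite one.
        const forward = (i & k) === 0;
        const swap = forward
          ? shouldSwap(keys[i], keys[partner])
          : shouldSwap(keys[partner], keys[i]);
        if (swap) {
          [keys[i], keys[partner]] = [keys[partner], keys[i]];
          // Side effect: the payload follows its key.
          [payload[i], payload[partner]] = [payload[partner], payload[i]];
        }
      }
    }
  }
}

// Usage: sort hash-grid cell indices and carry particle ids along with them.
const cellIndices = [3, 0, 2, 1];
const particleIds = ["p0", "p1", "p2", "p3"];
bitonicSortWithPayload(cellIndices, particleIds);
// cellIndices is now [0, 1, 2, 3] and particleIds is ["p1", "p3", "p2", "p0"]
```

Swapping the payload in the same step as the key is only one possible strategy. Sorting the keys together with their original indices and gathering the other buffers afterwards would be another.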

@cmhhelgeson cmhhelgeson marked this pull request as ready for review September 9, 2025 20:34
@cmhhelgeson
Contributor Author

Is there anything blocking this PR? If possible, I would like to use it for performance optimizations within the compute bird sample, but I would like the class structure to be reviewed first, for the reasons stated above.
