[RFC] Could you please provide the latest sample code for using the latest parallel_nsa function? #2

Open · Kyfafyd opened this issue Feb 24, 2025 · 4 comments
Labels: enhancement (New feature or request)

Comments

Kyfafyd commented Feb 24, 2025

Proposal

As the title says.

Rationale

No response

Kyfafyd added the enhancement (New feature or request) label on Feb 24, 2025
Kyfafyd changed the title from "[RFC] Could you please provide the latest code for using parallel_nsa function?" to "[RFC] Could you please provide the latest sample code for using the latest parallel_nsa function?" on Feb 24, 2025
Kyfafyd (Author) commented Feb 24, 2025

Also, what about the usage for naive_nsa_with_compression and parallel_nsa_with_compression?

Hanyuezhuohua (Collaborator) commented:

Thank you for your interest in our project. The latest sample code for the parallel_nsa function has been provided, covering both selected and sliding attention. You can try it directly by following our instructions.

For the nsa_with_compression function, we will further integrate the compressed branch and online top-k selection into our kernel; however, the parallel version of this function is still under development.
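
For reference, here is a minimal, hypothetical sketch of such a call. The import path, argument order, and tensor layouts below are assumptions pieced together from this thread (queries/keys/values plus the g_slc, g_swa, block_indices, and block_counts inputs discussed later), so the repository's README remains the authoritative example.

```python
# Hypothetical sketch of calling parallel_nsa -- import path, argument names/order,
# and tensor layouts are assumptions, not the project's confirmed API.
import torch

from native_sparse_attention.ops import parallel_nsa  # assumed import path

B, T, HQ, H, D = 2, 256, 16, 1, 64      # batch, seq len, query heads, KV heads, head dim
S, block_size, window_size = 4, 32, 64  # selected blocks per token, block size, sliding window

q = torch.randn(B, T, HQ, D, dtype=torch.bfloat16, device='cuda')
k = torch.randn(B, T, H, D, dtype=torch.bfloat16, device='cuda')
v = torch.randn(B, T, H, D, dtype=torch.bfloat16, device='cuda')

# Gates for the selected and sliding-window branches, one value per token and query head.
g_slc = torch.rand(B, T, HQ, dtype=torch.bfloat16, device='cuda')
g_swa = torch.rand(B, T, HQ, dtype=torch.bfloat16, device='cuda')

# Indices of the key/value blocks each token attends to, stored per KV head,
# and the number of valid selected blocks per token (here: all S of them).
block_indices = torch.randint(0, T // block_size, (B, T, H, S), device='cuda')
block_counts = torch.full((B, T, H), S, dtype=torch.long, device='cuda')

o = parallel_nsa(q, k, v, g_slc, g_swa, block_indices, block_counts,
                 block_size=block_size, window_size=window_size)
print(o.shape)  # expected: (B, T, HQ, D)
```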

Kyfafyd (Author) commented Feb 25, 2025

Thanks for your response!
I would like to know whether the NSA kernel can be applied to attention with 48 heads, which is not a power of 2.
Additionally, how should the inputs for g_slc, g_swa, block_indices, and block_counts be provided in the attention computation of a vision transformer?

Hanyuezhuohua (Collaborator) commented:

The NSA kernel can be applied to attention with 48 query heads and 3 key-value heads. Since the query heads are grouped by key-value head, each group consists of 16 query heads.
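
A tiny arithmetic check of that grouping (nothing here depends on the library itself):

```python
# GQA grouping: query heads are split evenly across key-value heads.
HQ, H = 48, 3          # query heads, key-value heads
assert HQ % H == 0     # query heads must divide evenly across KV heads
group_size = HQ // H   # 16 query heads share each KV head
print(group_size)      # 16 -- a power of 2, even though 48 itself is not
```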

In addition, our new nsa_with_compression function allows you to obtain block indices. Specifically, g_slc and g_swa are computed by passing the input through an MLP module with a sigmoid activation. The block_counts parameter is a user-defined constant that controls the sparsity ratio.
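
A minimal sketch of how those inputs could be produced inside a vision-transformer block, assuming per-token, per-query-head gates; the module and projection names are hypothetical, but the recipe (a linear projection followed by a sigmoid, plus a constant block_counts) is the one described above.

```python
# Hypothetical gate module: project the token features and squash with a sigmoid
# to obtain per-token, per-query-head gates in [0, 1].
import torch
import torch.nn as nn

class NSAGates(nn.Module):  # name and structure are illustrative, not from the repo
    def __init__(self, hidden_size: int, num_q_heads: int):
        super().__init__()
        self.slc_proj = nn.Linear(hidden_size, num_q_heads)  # gate for selected attention
        self.swa_proj = nn.Linear(hidden_size, num_q_heads)  # gate for sliding-window attention

    def forward(self, x: torch.Tensor):
        # x: [B, T, hidden_size] -> each gate: [B, T, num_q_heads]
        return torch.sigmoid(self.slc_proj(x)), torch.sigmoid(self.swa_proj(x))

gates = NSAGates(hidden_size=768, num_q_heads=48)
x = torch.randn(2, 196, 768)   # e.g. a ViT with 14x14 = 196 patch tokens
g_slc, g_swa = gates(x)        # each gate has shape [2, 196, 48]

# block_counts as a user-defined constant: e.g. keep 16 selected blocks per token,
# which directly controls the sparsity ratio.
block_counts = 16
```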
