Hi, there
In sm90_mma_tma_gmma_ss_warpspecialized_fp8.hpp, mma_promotion_interval=4 means that the summed result of every 4 MMAs is added to the final result. My question: how does this non-default behavior improve FP8 accuracy? Could you also share some best practices for using this specialized implementation, such as how to set mma_promotion_interval according to the activation input range?
Thanks!
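The behavior described in the question can be sketched in plain C++ as below. This is only a host-side illustration of that reading of the promotion scheme, not the actual CUTLASS mainloop: the function name `accumulate_with_promotion` and its parameters are hypothetical, and each `float` in `block_results` stands in for the contribution of one MMA.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration, not CUTLASS code.
// Each element of block_results stands in for the contribution of one MMA.
// Contributions are first summed in a temporary accumulator; every
// promotion_interval MMAs the temporary sum is added ("promoted") into the
// main accumulator and the temporary accumulator is reset to zero.
inline float accumulate_with_promotion(const std::vector<float>& block_results,
                                       int promotion_interval) {
    assert(promotion_interval > 0);

    float main_accum = 0.0f;  // final result
    float temp_accum = 0.0f;  // short-lived partial sum
    int mma_count = 0;        // MMAs accumulated since the last promotion

    for (float r : block_results) {
        temp_accum += r;
        ++mma_count;
        if (mma_count == promotion_interval) {
            main_accum += temp_accum;  // promote the partial sum
            temp_accum = 0.0f;
            mma_count = 0;
        }
    }

    // Promote whatever is left if the number of MMAs is not a multiple
    // of promotion_interval.
    if (mma_count > 0) {
        main_accum += temp_accum;
    }
    return main_accum;
}
```

With `promotion_interval = 4` and ten contributions, this sketch promotes after the 4th and 8th MMA and once more at the end for the remaining two. Whether and by how much this improves accuracy depends on the precision of the temporary accumulator on the actual hardware, which the sketch does not model.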
By the way, all the collective mainloop specializations under include/cutlass/gemm/collective/ have the "mma_promotion_interval" member in Arguments. I understand that this makes a uniform mainloop argument structure possible on the host side. So is the absence of mma_promotion_interval in the file below unexpected?
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.