Define block size for sm_120 (#6250)
CUDA 12.8 introduces sm_120, which requires a reduced maximum number of threads per SM (1024 instead of 2048).

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6250
robertmaynard authored Jan 24, 2025
1 parent b1ac238 commit 2d45fa4
Showing 1 changed file with 2 additions and 2 deletions.
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2023-2024, NVIDIA CORPORATION.
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -104,7 +104,7 @@ auto constexpr static const WARP_SIZE = index_type{32};
auto constexpr static const MAX_THREADS_PER_BLOCK = index_type{256};
#ifdef __CUDACC__
#if __CUDA_ARCH__ == 720 || __CUDA_ARCH__ == 750 || __CUDA_ARCH__ == 860 || \
- __CUDA_ARCH__ == 870 || __CUDA_ARCH__ == 890
+ __CUDA_ARCH__ == 870 || __CUDA_ARCH__ == 890 || __CUDA_ARCH__ == 1200
auto constexpr static const MAX_THREADS_PER_SM = index_type{1024};
#else
auto constexpr static const MAX_THREADS_PER_SM = index_type{2048};
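
For reviewers who want to sanity-check the effect of the new condition without a CUDA toolchain, the selection can be sketched as plain host C++. This is only an illustration under stated assumptions: `index_type` is aliased to `std::int64_t` here because the real alias is not visible in this hunk, and `max_threads_per_sm` and `max_blocks_per_sm` are hypothetical helper names that do not exist in the patched header.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: stand-in for the header's index_type, which this hunk does not show.
using index_type = std::int64_t;

constexpr index_type MAX_THREADS_PER_BLOCK = 256;

// Hypothetical helper: mirrors the preprocessor condition from the diff as a
// function of the __CUDA_ARCH__ value (for example, 1200 for sm_120).
constexpr index_type max_threads_per_sm(int cuda_arch)
{
  return (cuda_arch == 720 || cuda_arch == 750 || cuda_arch == 860 ||
          cuda_arch == 870 || cuda_arch == 890 || cuda_arch == 1200)
           ? index_type{1024}
           : index_type{2048};
}

// Hypothetical helper: upper bound on resident blocks per SM implied by the
// thread limit alone (ignores register and shared-memory limits).
inline index_type max_blocks_per_sm(int cuda_arch,
                                    index_type threads_per_block = MAX_THREADS_PER_BLOCK)
{
  assert(threads_per_block > 0);
  return max_threads_per_sm(cuda_arch) / threads_per_block;
}

// With this change, sm_120 takes the 1024-thread branch.
static_assert(max_threads_per_sm(1200) == 1024, "sm_120 uses the reduced limit");
// Architectures not listed in the condition keep the 2048-thread default.
static_assert(max_threads_per_sm(800) == 2048, "unlisted arch uses the default");
```

Under these assumptions, a 256-thread block yields at most 4 resident blocks per SM on sm_120, compared with 8 on architectures that fall through to the `#else` branch.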