See README.md on how to build the hipCUB documentation using Doxygen.
- Benchmarks for `BlockShuffle`, `BlockLoad`, and `BlockStore`.
- CUB backend references CUB and Thrust version 1.17.2.
- Improved benchmark coverage of `BlockScan` by adding `ExclusiveScan`, benchmark coverage of `BlockRadixSort` by adding `SortBlockedToStriped`, and benchmark coverage of `WarpScan` by adding `Broadcast`.
- Updated `docs` directory structure to match the standard of rocm-docs-core.
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
- CMake functionality to improve build parallelism of the test suite that splits compilation units by function or by parameters.
- New overload for `BlockAdjacentDifference::SubtractLeftPartialTile` that takes a predecessor item.
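
To illustrate what a predecessor item means for a partial tile, here is a minimal host-side reference sketch in plain C++. It is not hipCUB code and does not call the hipCUB API; the function name `subtract_left_partial_tile_reference` and its parameter order are hypothetical. It assumes the usual adjacent-difference semantics: the first item is differenced against the predecessor, the remaining valid items against their left neighbour, and items past `valid_items` are copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference sketch (hypothetical helper, not part of hipCUB).
// Models the assumed result of a left adjacent difference over a partial tile
// when the item preceding the tile is supplied by the caller.
template <typename T, typename DifferenceOp>
std::vector<T> subtract_left_partial_tile_reference(const std::vector<T>& input,
                                                    DifferenceOp difference_op,
                                                    std::size_t valid_items,
                                                    T predecessor)
{
    assert(valid_items <= input.size());

    // Start from a copy so that items at or beyond valid_items pass through.
    std::vector<T> output(input);
    for (std::size_t i = 0; i < valid_items; ++i)
    {
        // The first item has no left neighbour inside the tile, so the
        // caller-supplied predecessor takes that role.
        const T left = (i == 0) ? predecessor : input[i - 1];
        output[i] = difference_op(input[i], left);
    }
    return output;
}

// Example: a tile {10, 13, 17, 20} that continues a sequence whose previous
// item was 8, with only the first three items valid.
inline std::vector<int> example_partial_tile()
{
    const std::vector<int> tile{10, 13, 17, 20};
    auto minus = [](int current, int left) { return current - left; };
    return subtract_left_partial_tile_reference(tile, minus, 3, 8);
}
```

In this sketch the first output is `10 - 8`, so a sequence split across several tiles can produce the same differences as a single pass over the whole sequence.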
- Improved build parallelism of the test suite by splitting up large compilation units for `DeviceRadixSort`, `DeviceSegmentedRadixSort` and `DeviceSegmentedSort`.
- CUB backend references CUB and Thrust version 1.17.1.
- `BlockRadixRankMatch` is currently broken under the rocPRIM backend.
- `BlockRadixRankMatch` with a warp size that does not exactly divide the block size is broken under the CUB backend.
- UniqueByKey device algorithm
- SubtractLeft, SubtractLeftPartialTile, SubtractRight, SubtractRightPartialTile overloads in BlockAdjacentDifference.
- The old overloads (FlagHeads, FlagTails, FlagHeadsAndTails) are deprecated.
- DeviceAdjacentDifference algorithm.
- Extended benchmark suite of `DeviceHistogram`, `DeviceScan`, `DevicePartition`, `DeviceReduce`, `DeviceSegmentedReduce`, `DeviceSegmentedRadixSort`, `DeviceRadixSort`, `DeviceSpmv`, `DeviceMergeSort`, `DeviceSegmentedSort`
- Obsoleted type traits defined in util_type.hpp. Use the standard library equivalents instead.
- CUB backend references CUB and Thrust version 1.16.0.
- DeviceRadixSort's num_items parameter's type is now templated instead of being an int.
- If an integral type with a size of at most 4 bytes is passed (e.g. an int), the former logic applies.
- Otherwise the algorithm uses a larger indexing type that makes it possible to sort input data over 2**32 elements.
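
The size-based rule for `num_items` described above can be sketched with standard type traits. This is only an illustration of the selection rule, not hipCUB's implementation: the alias `offset_type_for` and the helper `offset_bytes` are hypothetical, and the choice of a 64-bit signed integer as the "larger indexing type" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical alias (not part of hipCUB): picks an indexing type from the
// width of the caller's num_items type.
template <typename NumItemsT>
using offset_type_for = typename std::conditional<
    (sizeof(NumItemsT) <= 4), // e.g. int or unsigned int
    std::int32_t,             // narrow counts keep the former 32-bit logic
    std::int64_t              // assumed wider type for larger counts
>::type;

// Size in bytes of the indexing type chosen for a given num_items argument.
template <typename NumItemsT>
constexpr std::size_t offset_bytes(NumItemsT /*num_items*/)
{
    static_assert(std::is_integral<NumItemsT>::value,
                  "num_items must be an integral type");
    return sizeof(offset_type_for<NumItemsT>);
}

// Usage: an int count selects the narrow type, a 64-bit count the wide one.
inline void example_offset_selection()
{
    const int small_count = 1000;
    const std::int64_t large_count = std::int64_t{1} << 33; // more than 2**32 items
    assert(offset_bytes(small_count) == 4);
    assert(offset_bytes(large_count) == 8);
}
```

Selecting the type at compile time means callers that pass an `int` pay no cost for the wider indexing path.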
- Packages for test and benchmark executables on all supported OSes using CPack.
- Device segmented sort
- Warp merge sort, WarpMask and thread sort from CUB 1.15.0 supported in hipCUB
- Device three way partition
- Device_scan and device_segmented_scan: inclusive_scan now uses the input-type as accumulator-type, exclusive_scan uses initial-value-type.
- This particularly changes behaviour of small-size input types with large-size output types (e.g. short input, int output).
- It also changes the behaviour of low-precision input types with high-precision output types (e.g. float input, double output).
- Block merge sort no longer supports non-power-of-two block sizes
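
The effect of the accumulator-type change for the device scans noted above can be seen in a small host-side sketch. This is plain sequential C++, not hipCUB code, and the function name `inclusive_sum_with_accumulator` is hypothetical. It uses an unsigned 16-bit input type rather than `short` so that the wraparound is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side sketch (hypothetical helper, not part of hipCUB): a sequential
// inclusive sum whose accumulator type is an explicit template parameter.
template <typename AccumT, typename OutputT, typename InputT>
std::vector<OutputT> inclusive_sum_with_accumulator(const std::vector<InputT>& input)
{
    std::vector<OutputT> output;
    output.reserve(input.size());
    AccumT running{};
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        // The running total is narrowed back to AccumT after every step.
        running = static_cast<AccumT>(running + input[i]);
        output.push_back(static_cast<OutputT>(running));
    }
    return output;
}

// Usage: 16-bit input, int output.
inline void example_accumulator_choice()
{
    const std::vector<std::uint16_t> input{40000, 40000};

    // Accumulating in the input type: the total wraps modulo 65536
    // before it is widened to the output type.
    const auto narrow = inclusive_sum_with_accumulator<std::uint16_t, int>(input);
    assert(narrow[1] == 14464);

    // Accumulating in the output type keeps the full total.
    const auto wide = inclusive_sum_with_accumulator<int, int>(input);
    assert(wide[1] == 80000);
}
```

Callers that rely on a wide output type to avoid overflow may therefore want to convert the input to the wider type before scanning, for example through a transforming input iterator.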
- grid unit test hanging on HIP on Windows
- Added missing includes to hipcub.hpp
- Bfloat16 support to test cases (device_reduce & device_radix_sort)
- Device merge sort
- Block merge sort
- API update to CUB 1.14.0
- The SetupNVCC.cmake automatic target selector selects all of the capabilities of all available cards for the NVIDIA backend.
- Initial HIP on Windows support. See README for instructions on how to build and install.
- Packaging changed to a development package (called hipcub-dev for `.deb` packages, and hipcub-devel for `.rpm` packages). As hipCUB is a header-only library, there is no runtime package. To aid in the transition, the development package sets the "provides" field to provide the package hipcub, so that existing packages depending on hipcub can continue to work. This provides feature is introduced as a deprecated feature and will be removed in a future ROCm release.
- gfx1030 support added.
- Address Sanitizer build option
- BlockRadixRank unit test failure fixed.
- DiscardOutputIterator to backend header
- Support for TexObjInputIterator and TexRefInputIterator
- Support for DevicePartition
- Minimum cmake version required is now 3.10.2
- CUB backend has been updated to 1.11.0
- Benchmark build fixed
- nvcc build fixed
- Support for DiscardOutputIterator
- No new features
- No new features
- No new features
- No new features
- No new features
- No new features
- Improved tests with fixed and random seeds for test data
- Switched to hip-clang as default compiler
- CMake searches for rocPRIM locally first; downloads from GitHub if local search fails
- HCC build deprecated
- The following unit test failures have been observed. These are due to issues in rocclr runtime.
- BlockDiscontinuity
- BlockExchange
- BlockHistogram
- BlockRadixSort
- BlockReduce
- BlockScan