v2.6.0
Release Artifacts
- Docker container: tags `v2.6.0-dgpu` and `v2.6.0-igpu`
- Python wheel: `pip install holoscan==2.6.0`
- Debian packages: `2.6.0.1-1`
- Documentation

See supported platforms for compatibility.
Release Notes
New Features and Improvements
- Support for Python 3.12 has been added and support for Python 3.8 has been dropped. Python wheels are now provided for Python 3.9, 3.10, 3.11, and 3.12.
- Data Flow Tracking is now available for multi-fragment distributed applications. It is currently supported on a single machine; later releases will add support for multi-machine scenarios.
Core
- Dependencies are updated to align with the IGX OS 1.1 compute stack and DLFW 24.08 containers. Refer to the project Dockerfile and to the DEVELOP.md documentation for a complete list of package versions.
- Distributed application fragments not using GPU data can now run on nodes that do not have a GPU. Any CPU-only nodes used by the application must have CUDA installed.
- Distributed applications can be run in a legacy, synchronous data transmit mode by setting `HOLOSCAN_UCX_ASYNCHRONOUS=False` (this matches the distributed application behavior of Holoscan releases v0.6 and v1.x). The synchronous mode blocks on sending a message from an output port until the send is complete. The asynchronous mode, which has been the default since v2.0, performs non-blocking sends and may show a performance benefit, but it can make it harder to predict the required minimum memory buffer space when using an allocator such as `BlockMemoryPool` or `RMMAllocator`, since the transmitter may call `compute` again before the prior message has been received. The user can now control which mode is preferred (a sketch of opting into the synchronous mode follows below).
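As a minimal sketch of opting into the legacy mode (assuming the variable is set in every process's environment before the application starts):

```python
import os

# Opt into the legacy synchronous transmit mode (v0.6/v1.x behavior).
# For multi-node runs, this must be set in the environment of every
# process (driver and workers), e.g. via `export` in each shell.
os.environ["HOLOSCAN_UCX_ASYNCHRONOUS"] = "False"

# ... then compose and run the distributed application as usual:
# app = MyDistributedApp()  # hypothetical Application subclass
# app.run()
```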
Operators/Resources
- The GPU performance of the Holoinfer module and the Inference operator has been improved by:
  - eliminating CPU-to-GPU synchronizations
  - avoiding copies from CPU to GPU buffers and back
  - using CUDA events to synchronize between the CUDA streams used for input and output data transfer, and also preparing data for multi-GPU usage (see the stream-synchronization sketch after this list)
  - using CUDA graphs in the TensorRT backend
  - providing a GPU-accelerated version of the `max_per_channel_scaled` post-processing operation
- A new `RMMAllocator` class has been added that uses RMM (RAPIDS Memory Manager) to support simultaneous host and device memory pools. This is an improvement over the existing `BlockMemoryPool` allocator, which could only have a memory pool on a single device. `RMMAllocator` can be used with operators such as `VideoStreamReplayerOp` that need to allocate both host and device memory buffers (see the usage sketch after this list).
- A new `StreamOrderedAllocator` has been added that supports asynchronous memory allocation from a device memory pool associated with a given CUDA stream. It is based on CUDA's stream-ordered allocator APIs (`cudaMallocFromPoolAsync`, `cudaFreeAsync`). This allocator is not yet used in Holoscan v2.6, but is part of enabling improved CUDA stream handling support for Holoscan in the near future (a construction sketch follows this list).
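The CUDA-event synchronization mentioned for Holoinfer is internal to the SDK, but the underlying pattern can be illustrated independently. The sketch below uses CuPy rather than Holoscan APIs; pinned host memory would be needed for the upload to be truly asynchronous:

```python
import cupy as cp
import numpy as np

copy_stream = cp.cuda.Stream(non_blocking=True)     # input data transfer
compute_stream = cp.cuda.Stream(non_blocking=True)  # processing / output
upload_done = cp.cuda.Event()

host_data = np.random.rand(1024).astype(np.float32)

with copy_stream:
    device_data = cp.asarray(host_data)  # H2D copy issued on copy_stream
upload_done.record(copy_stream)          # mark when the copy finishes

# Let compute_stream wait on the event instead of blocking the CPU.
compute_stream.wait_event(upload_done)
with compute_stream:
    result = device_data * 2.0           # runs only after the upload completes
```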
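Below is a minimal usage sketch for `RMMAllocator`, assuming the v2.6 Python bindings expose it in `holoscan.resources` with the memory-size parameters shown, and that `VideoStreamReplayerOp` accepts the `allocator` parameter referenced above; the sizes and data path are illustrative. Per known issue 4903377 below, memory sizes should use the "KB"/"MB"/"GB"/"TB" suffixes rather than "B":

```python
from holoscan.core import Application
from holoscan.operators import HolovizOp, VideoStreamReplayerOp
from holoscan.resources import RMMAllocator

class ReplayerApp(Application):
    def compose(self):
        # One allocator instance backing both a host and a device pool
        # (parameter names and sizes are illustrative; see the SDK docs).
        pool = RMMAllocator(
            self,
            name="rmm_pool",
            device_memory_initial_size="32MB",
            device_memory_max_size="64MB",
            host_memory_initial_size="32MB",
            host_memory_max_size="64MB",
        )
        replayer = VideoStreamReplayerOp(
            self,
            name="replayer",
            directory="/workspace/data/video",  # hypothetical dataset path
            basename="video",
            allocator=pool,  # assumes the allocator parameter noted above
        )
        visualizer = HolovizOp(self, name="holoviz")
        self.add_flow(replayer, visualizer, {("output", "receivers")})

if __name__ == "__main__":
    ReplayerApp().run()
```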
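Since Holoscan v2.6 itself does not consume `StreamOrderedAllocator` yet, only construction is sketched here; the parameter names are assumptions based on the allocator's description and may differ from the actual API:

```python
from holoscan.core import Application
from holoscan.resources import StreamOrderedAllocator

class PoolApp(Application):
    def compose(self):
        # Device memory pool backed by cudaMallocFromPoolAsync/cudaFreeAsync.
        # Parameter names and sizes are illustrative assumptions.
        stream_pool = StreamOrderedAllocator(
            self,
            name="stream_ordered_pool",
            device_memory_initial_size="16MB",
            device_memory_max_size="32MB",
        )
        # ... pass `stream_pool` to an operator that accepts an allocator ...
```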
Breaking Changes
- Support for Python 3.8 has been dropped. To continue using Python 3.8, please use Holoscan SDK <= 2.5.
- Holoscan SDK 2.6 requires an upgraded underlying compute stack with CUDA Toolkit 12.6 and TensorRT 10.3. We recommend that customers using Holoscan SDK on the IGX 1.0 platform install Holoscan SDK 2.6 via the NGC container. Customers using Holoscan SDK via Debian or Python wheel packages on IGX 1.0 should use Holoscan SDK <= 2.5.
Bug fixes
Issue | Description |
---|---|
4792457 | Fixed various memory issues, including memory leaks and memory corruption. An issue with the validation of IPv6 addresses when resolving IP addresses (hostnames) from the CLI has been fixed. An issue with heap-use-after-free when traversing the graph has been fixed. |
Known Issues
This section supplies details about issues discovered during development and QA but not resolved in this release.
Issue | Description |
---|---|
4062979 | When Operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed to be adhered to. |
4267272 | AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to a missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA. |
4384768 | No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing nv-p2p kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively. |
4190019 | Holoviz segfaults on multi-GPU setups when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead. |
4210082 | V4L2 applications segfault at exit or crash at start with '_PyGILState_NoteThreadState: Couldn't create autoTSSkey mapping'. |
4339399 | High CPU usage observed with the video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g. 1.5) can help reduce CPU usage, but may result in increased latency for the application until the MultiThreadScheduler is replaced by an event-based multithreaded scheduler. |
4318442 | UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable. |
4325468 | The V4L2VideoCapture operator only supports YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888. |
4325585 | Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms , particularly if check_recession_period_ms is greater than zero. |
4301203 | HDMI IN fails in v4l2_camera on the IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA. |
4384348 | UCX termination (via ctrl+c, pressing 'Esc', or clicking the close button) is not smooth and can show multiple error messages. |
4481171 | Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads. |
4458192 | In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, there's a possibility of encountering "Address already in use" errors. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777 ), for example, by using export HOLOSCAN_HEALTH_CHECK_PORT=8780 . |
4782662 | Installing the Holoscan wheel 2.0.0 or later as root causes an error. |
4768945 | Distributed applications crash when the engine file is unavailable or while the engine file is being generated. |
4753994 | Debugging a Python application may lead to a segfault when expanding an operator variable. |
 | Wayland: holoscan::viz::Init() with an existing GLFW window fails. |
4394306 | When Python bindings are created for a C++ Operator, it is not always guaranteed that the destructor will be called prior to termination of the Python application. As a workaround to this issue, it is recommended that any resource cleanup should happen in an operator's stop() method rather than in the destructor. |
4824619 | iGPU: Rendering YUV images with HolovizOp fails on first run |
4902749 | V4L2 applications segfault at start if using underlying NVV4L2 |
4909073 | V4L2 and AJA applications in x86 containers report a Wayland 'XDG_RUNTIME_DIR not set' error. |
4909088 | The C++ video_replayer_distributed example throws UCX errors and segfaults on close. |
4911129 | HoloHub Endoscopy Tool Tracking application latency exceeds 50ms on Jetson devices |
4903377 | The newly introduced RMMAllocator and StreamOrderedAllocator incorrectly parse the Bytes suffix ("B") as Megabytes ("MB") for all parameters related to memory size. Please specify any memory size parameters using the "KB", "MB", "GB" or "TB" suffixes instead. |