
v2.6.0

@tbirdso tbirdso released this 30 Oct 17:48
· 4 commits to main since this release

Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

  • Support for Python 3.12 has been added and support for Python 3.8 was dropped. Python wheels are now provided for Python 3.9, 3.10, 3.11 and 3.12.

  • Data Flow Tracking is now available for multi-fragment distributed applications. It is
    currently supported on a single machine; later releases will support multi-machine scenarios.

Core
  • Dependencies are updated to align with the IGX OS 1.1 compute stack and DLFW 24.08 containers.
    Refer to the project Dockerfile and to DEVELOP.md documentation for a complete list of
    package versions.
  • Distributed application fragments not using GPU data can now run on nodes that do not have a GPU. Any CPU-only nodes used by the application must have CUDA installed.
  • Distributed applications can be run in a legacy, synchronous data transmit mode by setting HOLOSCAN_UCX_ASYNCHRONOUS=False, which matches the distributed application behavior of Holoscan releases v0.6 through v1.x. In synchronous mode, sending a message from an output port blocks until the send is complete. The asynchronous mode, which has been the default since v2.0, performs non-blocking sends and may therefore show a performance benefit, but it can make the minimum required memory buffer space harder to predict when using an allocator such as BlockMemoryPool or RMMAllocator, because the transmitter may call compute again before the prior message has been received. Users can now control which mode is preferred.
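As a hedged illustration (the application class and launch steps below are placeholders, not from the release notes), the legacy synchronous mode can be selected from Python by setting the environment variable before the application starts:

```python
import os

# Opt back into the legacy synchronous transmit mode (the v0.6/v1.x behavior).
# This must be set before the distributed application is launched.
os.environ["HOLOSCAN_UCX_ASYNCHRONOUS"] = "False"

# ... then build and run the distributed application as usual, e.g.:
# app = MyDistributedApp()  # hypothetical application class
# app.run()

print(os.environ["HOLOSCAN_UCX_ASYNCHRONOUS"])
```

The same effect can be achieved from the shell with `export HOLOSCAN_UCX_ASYNCHRONOUS=False` before launching the application.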
Operators/Resources
  • The GPU performance of the Holoinfer module and the Inference operator has been improved by:

    • eliminating CPU-to-GPU synchronizations
    • avoiding copies from CPU to GPU buffers and back
    • using CUDA events to synchronize between the CUDA streams used for input and output data transfer, which also prepares the data path for multi-GPU usage
    • using CUDA graphs in the TensorRT backend
    • providing a GPU-accelerated version of the max_per_channel_scaled post-processing operation
  • A new RMMAllocator class has been added that uses RMM (RAPIDS Memory Manager) to support simultaneous host and device memory pools. This is an improvement over the existing BlockMemoryPool allocator, which can only provide a memory pool on a single device. RMMAllocator can be used with operators such as VideoStreamReplayerOp that need to allocate both host and device memory buffers.

  • A new StreamOrderedAllocator has been added that supports asynchronous memory allocation from a device memory pool associated with a given CUDA stream. It is based on CUDA's stream-ordered allocator APIs (cudaMallocFromPoolAsync, cudaFreeAsync). This allocator is not yet used in Holoscan v2.6, but it lays the groundwork for improved CUDA stream handling support in upcoming Holoscan releases.

Holoviz module
Utils
HoloHub
Documentation

Breaking Changes

  • Support for Python 3.8 has been dropped. To continue using Python 3.8, please use Holoscan SDK <= 2.5.
  • Holoscan SDK 2.6 requires an upgraded underlying compute stack with CUDA Toolkit 12.6 and TensorRT 10.3. We recommend that customers using Holoscan SDK on the IGX 1.0 platform install Holoscan SDK 2.6 via the NGC container. Customers using Holoscan SDK via Debian or Python wheel packages on IGX 1.0 should use Holoscan SDK <= 2.5.

Bug fixes

Issue Description
4792457 Fixed various memory issues, including memory leaks and memory corruption. Fixed an issue with the validation of IPv6 addresses when resolving IP addresses (hostnames) from the CLI. Fixed a heap-use-after-free when traversing the graph.

Known Issues

This section supplies details about issues discovered during development and QA but not resolved in this release.

Issue Description
4062979 When Operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed.
4267272 AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA.
4384768 No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing nv-p2p kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively.
4190019 Holoviz segfaults on multi-GPU setups when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead.
4210082 V4L2 applications segfault at exit or crash at start with '_PyGILState_NoteThreadState: Couldn't create autoTSSkey maping'.
4339399 High CPU usage observed with the video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g. 1.5) can help reduce CPU usage. However, this may result in increased latency for the application until the MultiThreadScheduler is switched to an event-based multithreaded scheduler.
4318442 UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable.
4325468 The V4L2VideoCapture operator only supports YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888.
4325585 Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms, particularly if check_recession_period_ms is greater than zero.
4301203 HDMI IN fails in v4l2_camera on IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA.
4384348 UCX termination (by pressing ctrl+c or 'Esc', or by clicking the close button) is not smooth and can show multiple error messages.
4481171 Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads.
4458192 In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, "Address already in use" errors may occur. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777), for example by using export HOLOSCAN_HEALTH_CHECK_PORT=8780.
4782662 Installing Holoscan wheel 2.0.0 or later as root causes an error.
4768945 Distributed applications crash when the engine file is unavailable or while it is being generated.
4753994 Debugging a Python application may lead to a segfault when expanding an operator variable.
Wayland: holoscan::viz::Init() with existing GLFW window fails.
4394306 When Python bindings are created for a C++ Operator, it is not always guaranteed that the destructor will be called prior to termination of the Python application. As a workaround to this issue, it is recommended that any resource cleanup should happen in an operator's stop() method rather than in the destructor.
4824619 iGPU: Rendering YUV images with HolovizOp fails on first run.
4902749 V4L2 applications segfault at start if using underlying NVV4L2.
4909073 V4L2 and AJA applications in x86 containers report a Wayland XDG_RUNTIME_DIR not set error.
4909088 The CPP video_replayer_distributed example throws UCX errors and segfaults on close.
4911129 HoloHub Endoscopy Tool Tracking application latency exceeds 50ms on Jetson devices.
4903377 The newly introduced RMMAllocator and StreamOrderedAllocator incorrectly parse the Bytes suffix ("B") as Megabytes ("MB") for all parameters related to memory size. Please specify any memory size parameters using the "KB", "MB", "GB" or "TB" suffixes instead.
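Until this parsing issue is fixed, a small guard like the following (a hypothetical helper, not part of the SDK) can reject ambiguous size strings before they reach RMMAllocator or StreamOrderedAllocator memory-size parameters:

```python
# Hypothetical helper (not part of the Holoscan SDK): reject memory-size
# strings that end in a bare "B", which Holoscan v2.6 misparses as "MB".
SAFE_SUFFIXES = ("KB", "MB", "GB", "TB")

def safe_memory_size(value: str) -> str:
    """Return value unchanged if it uses an unambiguous size suffix."""
    if not value.upper().endswith(SAFE_SUFFIXES):
        raise ValueError(
            f"use a {'/'.join(SAFE_SUFFIXES)} suffix; "
            f"bare 'B' is misparsed as 'MB' in v2.6: {value!r}"
        )
    return value

print(safe_memory_size("16MB"))  # passes through unchanged
```

Values such as "16MB" or "1gb" pass through unchanged, while "1024B" raises a ValueError so the misparse cannot occur silently.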