
v2.6.0

@tbirdso tbirdso released this 30 Oct 17:48
· 4 commits to main since this release

Release Artifacts

See supported platforms for compatibility.

Release Notes

New Features and Improvements

  • Support for Python 3.12 has been added and support for Python 3.8 was dropped. Python wheels are now provided for Python 3.9, 3.10, 3.11 and 3.12.

  • Data Flow Tracking is now available for multi-fragment distributed applications. It is
    currently supported on a single machine; later releases will support multi-machine scenarios.

Core
  • Dependencies are updated to align with the IGX OS 1.1 compute stack and DLFW 24.08 containers.
    Refer to the project Dockerfile and to DEVELOP.md documentation for a complete list of
    package versions.
  • Distributed application fragments not using GPU data can now run on nodes that do not have a GPU. Any CPU-only nodes used by the application must have CUDA installed.
  • Distributed applications can be run in a legacy, synchronous data transmit mode by setting HOLOSCAN_UCX_ASYNCHRONOUS=False, which matches the distributed application behavior of Holoscan releases v0.6 through v1.x. In synchronous mode, sending a message from an output port blocks until the send is complete. The asynchronous mode, which has been the default since v2.0, performs non-blocking sends and may therefore show a performance benefit, but it can make the minimum required memory buffer space harder to predict when using an allocator such as BlockMemoryPool or RMMAllocator, because the transmitter may call compute again before the prior message has been received. Users can now control which mode is preferred.
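As a hedged illustration (the application class and launch steps below are placeholders, not from the release notes), the legacy synchronous mode can be selected from Python by setting the environment variable before the application starts:

```python
import os

# Opt back into the legacy synchronous transmit mode (the v0.6/v1.x behavior).
# This must be set before the distributed application is launched.
os.environ["HOLOSCAN_UCX_ASYNCHRONOUS"] = "False"

# ... then build and run the distributed application as usual, e.g.:
# app = MyDistributedApp()  # hypothetical application class
# app.run()

print(os.environ["HOLOSCAN_UCX_ASYNCHRONOUS"])
```

The same effect can be achieved from the shell with `export HOLOSCAN_UCX_ASYNCHRONOUS=False` before launching the application.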
Operators/Resources
  • The GPU performance of the Holoinfer module and the Inference operator has been improved by:

    • eliminating CPU-to-GPU synchronizations
    • avoiding copies from CPU to GPU buffers and back
    • using CUDA events to synchronize between the CUDA streams used for input and output data transfer, which also prepares the data path for multi-GPU usage
    • using CUDA graphs in the TensorRT backend
    • providing a GPU-accelerated version of the max_per_channel_scaled post-processing operation
  • A new RMMAllocator class has been added that uses RMM (RAPIDS Memory Manager) to support simultaneous host and device memory pools. This is an improvement over the existing BlockMemoryPool allocator, which can only provide a memory pool on a single device. RMMAllocator can be used with operators such as VideoStreamReplayerOp that need to allocate both host and device memory buffers.

  • A new StreamOrderedAllocator has been added that supports asynchronous memory allocation from a device memory pool associated with a given CUDA stream. It is based on CUDA's stream-ordered allocator APIs (cudaMallocFromPoolAsync, cudaFreeAsync). This allocator is not yet used in Holoscan v2.6, but it lays the groundwork for improved CUDA stream handling support in upcoming Holoscan releases.

Holoviz module
Utils
HoloHub
Documentation

Breaking Changes

  • Support for Python 3.8 has been dropped. To continue using Python 3.8, please use Holoscan SDK <= 2.5.
  • Holoscan SDK 2.6 requires an upgraded underlying compute stack with CUDA Toolkit 12.6 and TensorRT 10.3. We recommend that customers using Holoscan SDK on the IGX 1.0 platform install Holoscan SDK 2.6 via the NGC container. Customers using Holoscan SDK via Debian or Python wheel packages on IGX 1.0 should use Holoscan SDK <= 2.5.

Bug fixes

Issue Description
4792457 Fixed various memory issues, including memory leaks and memory corruption. Fixed an issue with the validation of IPv6 addresses when resolving IP addresses (hostnames) from the CLI. Fixed a heap-use-after-free when traversing the graph.

Known Issues

This section supplies details about issues discovered during development and QA but not resolved in this release.

Issue Description
4062979 When Operators connected in a Directed Acyclic Graph (DAG) are executed by a multithreaded scheduler, their execution order in the graph is not guaranteed.
4267272 AJA drivers cannot be built with RDMA on IGX SW 1.0 DP iGPU due to missing nv-p2p.h. Expected to be addressed in IGX SW 1.0 GA.
4384768 No RDMA support on JetPack 6.0 DP and IGX SW 1.0 DP iGPU due to missing nv-p2p kernel module. Expected to be addressed in JP 6.0 GA and IGX SW 1.0 GA respectively.
4190019 Holoviz segfaults on multi-GPU setups when specifying the device using the --gpus flag with docker run. The current workaround is to use CUDA_VISIBLE_DEVICES in the container instead.
4210082 V4L2 applications segfault at exit or crash at start with '_PyGILState_NoteThreadState: Couldn't create autoTSSkey maping'.
4339399 High CPU usage observed with the video_replayer_distributed application. While the high CPU usage associated with the GXF UCX extension has been fixed since v1.0, distributed applications using the MultiThreadScheduler (with the check_recession_period_ms parameter set to 0 by default) may still experience high CPU usage. Setting the HOLOSCAN_CHECK_RECESSION_PERIOD_MS environment variable to a value greater than 0 (e.g. 1.5) can help reduce CPU usage. However, this may result in increased latency for the application until the MultiThreadScheduler is switched to an event-based multithreaded scheduler.
4318442 UCX cuda_ipc protocol doesn't work in Docker containers on x86_64. As a workaround, we are currently disabling the UCX cuda_ipc protocol on all platforms via the UCX_TLS environment variable.
4325468 The V4L2VideoCapture operator only supports YUYV and AB24 source pixel formats, and only outputs the RGBA GXF video format. Other source pixel formats compatible with V4L2 can be manually defined by the user, but they're assumed to be equivalent to RGBA8888.
4325585 Applications using MultiThreadScheduler may exit early due to timeouts. This occurs when the stop_on_deadlock_timeout parameter is improperly set to a value equal to or less than check_recession_period_ms, particularly if check_recession_period_ms is greater than zero.
4301203 HDMI IN fails in v4l2_camera on IGX Orin Devkit for some resolutions or formats. Try the latest firmware as a partial fix. Driver-level fixes are expected in IGX SW 1.0 GA.
4384348 UCX termination (by pressing ctrl+c or 'Esc', or by clicking the close button) is not smooth and can show multiple error messages.
4481171 Running the driver for a distributed application on IGX Orin devkits fails when connected to other systems through eth1. A workaround is to use the eth0 port to connect to other systems for distributed workloads.
4458192 In scenarios where distributed applications have both the driver and workers running on the same host, either within a Docker container or directly on the host, "Address already in use" errors may occur. A potential solution is to assign a different port number to the HOLOSCAN_HEALTH_CHECK_PORT environment variable (default: 8777), for example by using export HOLOSCAN_HEALTH_CHECK_PORT=8780.
4782662 Installing Holoscan wheel 2.0.0 or later as root causes an error.
4768945 Distributed applications crash when the engine file is unavailable or while it is being generated.
4753994 Debugging a Python application may lead to a segfault when expanding an operator variable.
Wayland: holoscan::viz::Init() with existing GLFW window fails.
4394306 When Python bindings are created for a C++ Operator, it is not always guaranteed that the destructor will be called prior to termination of the Python application. As a workaround to this issue, it is recommended that any resource cleanup should happen in an operator's stop() method rather than in the destructor.
4824619 iGPU: Rendering YUV images with HolovizOp fails on first run.
4902749 V4L2 applications segfault at start if using underlying NVV4L2.
4909073 V4L2 and AJA applications in x86 containers report a Wayland XDG_RUNTIME_DIR not set error.
4909088 The CPP video_replayer_distributed example throws UCX errors and segfaults on close.
4911129 HoloHub Endoscopy Tool Tracking application latency exceeds 50ms on Jetson devices.
4903377 The newly introduced RMMAllocator and StreamOrderedAllocator incorrectly parse the Bytes suffix ("B") as Megabytes ("MB") for all parameters related to memory size. Please specify any memory size parameters using the "KB", "MB", "GB" or "TB" suffixes instead.
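Until this parsing issue is fixed, a small guard like the following (a hypothetical helper, not part of the SDK) can reject ambiguous size strings before they reach RMMAllocator or StreamOrderedAllocator memory-size parameters:

```python
# Hypothetical helper (not part of the Holoscan SDK): reject memory-size
# strings that end in a bare "B", which Holoscan v2.6 misparses as "MB".
SAFE_SUFFIXES = ("KB", "MB", "GB", "TB")

def safe_memory_size(value: str) -> str:
    """Return value unchanged if it uses an unambiguous size suffix."""
    if not value.upper().endswith(SAFE_SUFFIXES):
        raise ValueError(
            f"use a {'/'.join(SAFE_SUFFIXES)} suffix; "
            f"bare 'B' is misparsed as 'MB' in v2.6: {value!r}"
        )
    return value

print(safe_memory_size("16MB"))  # passes through unchanged
```

Values such as "16MB" or "1gb" pass through unchanged, while "1024B" raises a ValueError so the misparse cannot occur silently.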