From 5ff766bfa957de51f784a054fa553e47e2c5c9c0 Mon Sep 17 00:00:00 2001
From: Dustin Franklin
Date: Fri, 9 Aug 2024 12:47:05 -0400
Subject: [PATCH] added ROS nodes

---
 docs/ros.md            | 335 +++++++++++++++++++++++++++++++++++++++++
 docs/tutorial-intro.md |  30 ++--
 mkdocs.yml             |  14 +-
 3 files changed, 356 insertions(+), 23 deletions(-)
 create mode 100644 docs/ros.md

diff --git a/docs/ros.md b/docs/ros.md
new file mode 100644
index 00000000..6532b46f
--- /dev/null
+++ b/docs/ros.md
@@ -0,0 +1,335 @@
# ROS2 Nodes

The [`ros2_nanollm`](https://github.com/NVIDIA-AI-IOT/ros2_nanollm) package provides ROS2 nodes for running optimized LLMs and VLMs locally inside a container. These are built on [NanoLLM](tutorial_nano-llm.md) and ROS2 Humble for deploying generative AI models onboard your robot with Jetson.

![](https://github.com/NVIDIA-AI-IOT/ros2_nanollm/raw/main/assets/sample.gif)


!!! abstract "What you need"

    1. One of the following Jetson devices:

        Jetson AGX Orin (64GB)
        Jetson AGX Orin (32GB)
        Jetson Orin NX (16GB)
        Jetson Orin Nano (8GB)⚠️

    2. Running one of the following versions of [JetPack](https://developer.nvidia.com/embedded/jetpack):

        JetPack 5 (L4T r35.x)
        JetPack 6 (L4T r36.x)

    3. Sufficient storage space (preferably with NVMe SSD).

        - `22GB` for `nano_llm:humble` container image
        - Space for models (`>10GB`)

    4. Clone and set up [`jetson-containers`](https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md){:target="_blank"}:

        ```bash
        git clone https://github.com/dusty-nv/jetson-containers
        bash jetson-containers/install.sh
        ```

## Running the Live Demo

!!! abstract "Recommended"

    Before you start, please review the [NanoVLM](https://www.jetson-ai-lab.com/tutorial_nano-vlm.html) and [Live LLaVA](https://www.jetson-ai-lab.com/tutorial_live-llava.html) demos. For primary documentation, view [ROS2 NanoLLM](https://github.com/NVIDIA-AI-IOT/ros2_nanollm).

1. Ensure you have a camera device connected:

    ```
    ls /dev/video*
    ```

2. Use the `jetson-containers run` and `autotag` commands to automatically pull or build a compatible container image:

    ```
    jetson-containers run $(autotag nano_llm:humble) \
      ros2 launch ros2_nanollm camera_input_example.launch.py
    ```

    This command starts the container and runs the `camera_input_example.launch.py` launch file inside it.

By default, this will load the [`Efficient-Large-Model/Llama-3-VILA1.5-8B`](https://huggingface.co/Efficient-Large-Model/Llama-3-VILA1.5-8b) VLM and publish the image captions and overlay to topics that can be subscribed to by your other nodes, or visualized with RViz or Foxglove. Refer to the [`ros2_nanollm`](https://github.com/NVIDIA-AI-IOT/ros2_nanollm) repo for documentation on the input/output topics that are exposed.

## Build your own ROS Nodes

To build your own ROS2 node using an LLM or VLM, first create a ROS 2 workspace and package in a directory mounted to the container (following the [ROS 2 Humble Documentation](https://docs.ros.org/en/humble/Tutorials/Beginner-Client-Libraries/Creating-A-Workspace/Creating-A-Workspace.html)).
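
If you are starting from scratch, the package skeleton can be generated with the standard `ros2 pkg create` tool. The sketch below is only an example: it assumes an `ament_python` build type, and `ros2_workspace` / `your_package_name` are placeholder names to substitute with your own.

```bash
# run these in a directory that is mounted into the container
mkdir -p ros2_workspace/src
cd ros2_workspace/src

# generate an ament_python package skeleton
# (ROS package names use lowercase letters, digits, and underscores)
ros2 pkg create --build-type ament_python your_package_name
```

This generates the `package.xml`, `setup.cfg`, `setup.py`, `resource/`, and `test/` entries referenced in the layout below.
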
Your src folder should then look like this:

```
└── src
    └── your-package-name
        ├── launch
        │   └── camera_input.launch.py
        ├── resource
        │   └── your-package-name
        ├── your-package-name
        │   ├── __init__.py
        │   └── your-node-name_py.py
        ├── test
        │   ├── test_copyright.py
        │   ├── test_flake8.py
        │   └── test_pep257.py
        ├── package.xml
        ├── setup.cfg
        ├── setup.py
        └── README.md
```

We will create the `launch` folder, as well as the `camera_input.launch.py` and `your-node-name_py.py` files, in later steps.


### **Editing the setup.py file**

Let's begin by editing the `setup.py` file.

At the top of the file, add:

```
from glob import glob
```

In the `setup()` call, find the `data_files=[]` line and make sure it looks like this:

```
data_files=[
    ('share/ament_index/resource_index/packages',
        ['resource/' + package_name]),
    ('share/' + package_name, ['package.xml']),
    ('share/' + package_name, glob('launch/*.launch.py')),
],
```

Edit the `maintainer` line with your name, the `maintainer_email` line with your email, and the `description` line to describe your package:

```
maintainer='kshaltiel',
maintainer_email='kshaltiel@nvidia.com',
description='YOUR DESCRIPTION',
```

Find the `console_scripts` list in the `entry_points` argument and edit the entry to be:

```
'your-node-name_py = your-package-name.your-node-name_py:main'
```

For example:

```
entry_points={
    'console_scripts': [
        'nano_llm_py = ros2_nanollm.nano_llm_py:main'
    ],
},
```

*All done for this file!*


### **Creating and writing the your-node-name_py.py file**

Inside your package, under the folder that shares your package's name and contains the `__init__.py` file, create a file named after your node. For NanoLLM, this file would be called `nano_llm_py.py`.

Paste the following code into the empty file:

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from PIL import Image as im
from MODEL_NAME import NECESSARY_MODULES  # e.g. from nano_llm import NanoLLM, ChatHistory

class Your_Model_Subscriber(Node):

    def __init__(self):
        super().__init__('your_model_subscriber')

        #EDIT PARAMETERS HERE
        self.declare_parameter('param1', "param1_value")
        self.declare_parameter('param2', "param2_value")

        # Subscriber for input query
        self.query_subscription = self.create_subscription(
            String,
            'input_query',
            self.query_listener_callback,
            10)
        self.query_subscription  # prevent unused variable warning

        # Subscriber for input image
        self.image_subscription = self.create_subscription(
            Image,
            'input_image',
            self.image_listener_callback,
            10)
        self.image_subscription  # prevent unused variable warning

        # To convert ROS image message to OpenCV image
        self.cv_br = CvBridge()

        #LOAD THE MODEL
        self.model = INSERT_MODEL.from_pretrained("PATH-TO-MODEL")

        # chat history
        self.chat_history = ChatHistory(self.model)

        ## PUBLISHER
        self.output_publisher = self.create_publisher(String, 'output', 10)
        self.query = "Describe the image."
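        # NOTE: "Describe the image." is just the starting prompt -- each message
        # received on the 'input_query' topic below replaces self.query.  Parameters
        # declared under #EDIT PARAMETERS HERE can be read back here if needed
        # (for example, self.get_parameter('param1').value) and passed to the model loader above.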

    def query_listener_callback(self, msg):
        self.query = msg.data

    def image_listener_callback(self, data):
        input_query = self.query

        # call model with input_query and input_image
        cv_img = self.cv_br.imgmsg_to_cv2(data, 'rgb8')
        PIL_img = im.fromarray(cv_img)

        # Parsing input text prompt
        prompt = input_query.strip("][()")
        text = prompt.split(',')
        self.get_logger().info('Your query: %s' % text)  # prints the query

        # chat history
        self.chat_history.append('user', image=PIL_img)
        self.chat_history.append('user', prompt, use_cache=True)
        embedding, _ = self.chat_history.embed_chat()

        #GENERATE OUTPUT
        output = self.model.generate(
            inputs=embedding,
            kv_cache=self.chat_history.kv_cache,
            min_new_tokens=10,
            streaming=False,
            do_sample=True,
        )

        output_msg = String()
        output_msg.data = output
        self.output_publisher.publish(output_msg)
        self.get_logger().info(f"Published output: {output}")

def main(args=None):
    rclpy.init(args=args)

    your_model_subscriber = Your_Model_Subscriber()

    rclpy.spin(your_model_subscriber)

    # Destroy the node explicitly
    # (optional - otherwise it will be done automatically
    # when the garbage collector destroys the node object)
    your_model_subscriber.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```

Edit the import statement at the top of the file to import the necessary modules for your model.

Next, edit the class name and the node name inside the `__init__()` function to reflect the model that will be used.

Find the comment that reads `#EDIT PARAMETERS HERE`. Declare all parameters except for the model name, following the format in the file. Under the `#LOAD THE MODEL` comment, include the path to the model.

Lastly, edit the `generate()` call under the `#GENERATE OUTPUT` comment to include any additional parameters.

*All done for this file!*


### **Creating and writing the camera_input.launch.py file**

Inside your package, create the `launch` folder and create your launch file inside of it:

```
mkdir launch
cd launch
touch camera_input.launch.py
```

You can edit this file externally, and it will update within the container. Paste the following code into the empty file:

```
from launch import LaunchDescription
from launch_ros.actions import Node
from launch.substitutions import LaunchConfiguration
from launch.actions import DeclareLaunchArgument

def generate_launch_description():
    launch_args = [
        DeclareLaunchArgument(
            'param1',
            default_value='param1_default',
            description='Description of param1'),
        DeclareLaunchArgument(
            'param2',
            default_value='param2_default',
            description='Description of param2'),
    ]

    #Your model parameters
    param1 = LaunchConfiguration('param1')
    param2 = LaunchConfiguration('param2')

    #camera node for camera input
    cam2image_node = Node(
        package='image_tools',
        executable='cam2image',
        remappings=[('image', 'input_image')],
    )

    #model node
    model_node = Node(
        package='your-package-name',  # make sure your package is named this
        executable='your-node-name_py',
        parameters=[{
            'param1': param1,
            'param2': param2,
        }]
    )

    final_launch_description = launch_args + [cam2image_node] + [model_node]

    return LaunchDescription(final_launch_description)
```

Find the required parameters for your model. You can view these by looking at the [Model API](https://dusty-nv.github.io/NanoLLM/models.html#model-api) for your specific model and taking note of how the model is called.
For example, NanoLLM retrieves models through the following:

```
model = NanoLLM.from_pretrained(
   "meta-llama/Llama-3-8b-hf",  # HuggingFace repo/model name, or path to HF model checkpoint
   api='mlc',                   # supported APIs are: mlc, awq, hf
   quantization='q4f16_ft'      # q4f16_ft, q4f16_1, q8f16_0 for MLC, or path to AWQ weights
)
```

The parameters for NanoLLM would be the model name, `api`, and `quantization`.

In the `generate_launch_description` function, edit the `DeclareLaunchArgument` entries to accommodate all parameters except the model name. For NanoLLM, this would look like:

```
def generate_launch_description():
    launch_args = [
        DeclareLaunchArgument(
            'api',
            default_value='mlc',
            description='The model backend to use'),
        DeclareLaunchArgument(
            'quantization',
            default_value='q4f16_ft',
            description='The quantization method to use'),
    ]
```

Then edit the lines under the `#Your model parameters` comment to match the parameters of your model, again excluding the model name. Lastly, fill in the code under the `#model node` comment with your package name, the name of your node file, and all of your parameters, this time including the model name.

*All done for this file!*

diff --git a/docs/tutorial-intro.md b/docs/tutorial-intro.md
index eb7f09fe..631c98b5 100644
--- a/docs/tutorial-intro.md
+++ b/docs/tutorial-intro.md
@@ -26,14 +26,7 @@ Give your locally running LLM an access to vision!
 | **[Live LLaVA](./tutorial_live-llava.md)** | Run multimodal models interactively on live video streams over a repeating set of prompts. |
 | **[NanoVLM](./tutorial_nano-vlm.md)** | Use mini vision/language models and the optimized multimodal pipeline for live streaming. |
 
-### Image Generation
-
-| | |
-| :---------- | :----------------------------------- |
-| **[Stable Diffusion](./tutorial_stable-diffusion.md)** | Run AUTOMATIC1111's [`stable-diffusion-webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to generate images from prompts |
-| **[Stable Diffusion XL](./tutorial_stable-diffusion-xl.md)** | A newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities.|
-
-### Vision Transformers & CV
+### Vision Transformers
 
 | | |
 | :---------- | :----------------------------------- |
@@ -42,7 +35,6 @@ Give your locally running LLM an access to vision!
 | **[NanoSAM](./vit/tutorial_nanosam.md)** | [NanoSAM](https://github.com/NVIDIA-AI-IOT/nanosam), SAM model variant capable of running in real-time on Jetson |
 | **[SAM](./vit/tutorial_sam.md)** | Meta's [SAM](https://github.com/facebookresearch/segment-anything), Segment Anything model |
 | **[TAM](./vit/tutorial_tam.md)** | [TAM](https://github.com/gaomingqi/Track-Anything), Track-Anything model, is an interactive tool for video object tracking and segmentation |
-| **[Ultralytics YOLOv8](./tutorial_ultralytics.md)** | Run [Ultralytics](https://www.ultralytics.com) YOLOv8 on Jetson with NVIDIA TensorRT. |
 
 ### RAG & Vector Database
 
@@ -52,6 +44,16 @@
| **[LlamaIndex](./tutorial_llamaindex.md)** | Realize RAG (Retrieval Augmented Generation) so that an LLM can work with your documents | | **[LlamaIndex](./tutorial_jetson-copilot.md)** | Reference application for building your own local AI assistants using LLM, RAG, and VectorDB | + +### API Integrations +| | | +| :---------- | :----------------------------------- | +| **[ROS2 Nodes](./ros.md)** | Optimized LLM and VLM provided as [ROS2 nodes](./ros.md) for robotics | +| **[Holoscan SDK](./tutorial_holoscan.md)** | Use the [Holoscan-SDK](https://github.com/nvidia-holoscan/holoscan-sdk) to run high-throughput, low-latency edge AI pipelines | +| **[Gapi Workflows](./tutorial_gapi_workflows.md)** | Integrating generative AI into real world environments | +| **[Gapi Micro Services](./tutorial_gapi_microservices.md)** | Wrapping models and code to participate in systems | +| **[Ultralytics YOLOv8](./tutorial_ultralytics.md)** | Run [Ultralytics](https://www.ultralytics.com) YOLOv8 on Jetson with NVIDIA TensorRT. | + ### Audio | | | @@ -60,16 +62,12 @@ Give your locally running LLM an access to vision! | **[AudioCraft](./tutorial_audiocraft.md)** | Meta's [AudioCraft](https://github.com/facebookresearch/audiocraft), to produce high-quality audio and music | | **[Voicecraft](./tutorial_voicecraft.md)** | Interactive speech editing and zero shot TTS | -### NVIDIA SDKs -| | | -| :---------- | :----------------------------------- | -| **[Holoscan SDK](./tutorial_holoscan.md)** | Use the [Holoscan-SDK](https://github.com/nvidia-holoscan/holoscan-sdk) to run high-throughput, low-latency edge AI pipelines | +### Image Generation -### Gapi / API Integrations | | | | :---------- | :----------------------------------- | -| **[Workflows](./tutorial_gapi_workflows.md)** | Integrating generative AI into real world environments | -| **[Micro Services](./tutorial_gapi_microservices.md)** | Wrapping models and code to participate in systems | +| **[Stable Diffusion](./tutorial_stable-diffusion.md)** | Run AUTOMATIC1111's [`stable-diffusion-webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui) to generate images from prompts | +| **[Stable Diffusion XL](./tutorial_stable-diffusion-xl.md)** | A newer ensemble pipeline consisting of a base model and refiner that results in significantly enhanced and detailed image generation capabilities.| ## About NVIDIA Jetson diff --git a/mkdocs.yml b/mkdocs.yml index 9fab5a42..5d21190d 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -92,18 +92,23 @@ nav: - LLaVA: tutorial_llava.md - Live LLaVA: tutorial_live-llava.md - NanoVLM: tutorial_nano-vlm.md - - Vision Transformers & CV: + - Vision Transformers (ViT): - vit/index.md - EfficientViT: vit/tutorial_efficientvit.md - NanoOWL: vit/tutorial_nanoowl.md - NanoSAM: vit/tutorial_nanosam.md - SAM: vit/tutorial_sam.md - TAM: vit/tutorial_tam.md - - Ultralytics YOLOv8 🆕: tutorial_ultralytics.md - RAG & Vector Database: - NanoDB: tutorial_nanodb.md - LlamaIndex: tutorial_llamaindex.md - Jetson Copilot: tutorial_jetson-copilot.md + - API Integrations 🆕: + - ROS2 Nodes: ros.md + - Holoscan SDK: tutorial_holoscan.md + - Gapi Workflows: tutorial_gapi_workflows.md + - Gapi Micro Services: tutorial_gapi_microservices.md + - Ultralytics YOLOv8: tutorial_ultralytics.md - Audio: - Whisper: tutorial_whisper.md - AudioCraft: tutorial_audiocraft.md @@ -111,11 +116,6 @@ nav: - Image Generation: - Stable Diffusion: tutorial_stable-diffusion.md - Stable Diffusion XL: tutorial_stable-diffusion-xl.md - - NVIDIA SDKs: - - Holoscan SDK 🆕: 
tutorial_holoscan.md - - Gapi / API Integrations: - - Workflows 🆕: tutorial_gapi_workflows.md - - Micro Services 🆕: tutorial_gapi_microservices.md #- Metropolis Microservices: # - First Steps: tutorial_mmj.md # - Tools: