This demo shows how to deploy NVIDIA NIM (NVIDIA Inference Microservices) and build an AI application on top of it: a multi-agent banking bot! We'll deploy NVIDIA's NIM microservices and show how to take them to production with monitoring, scaling, and MLOps best practices. MLRun handles all the complexity!
The demo contains a single notebook covering two main stages:
- Model Serving & Monitoring - Deploy a NIM, add MLRun's LLM Gateway for modularity and monitoring capabilities
- Application Pipeline - Build a multi-agent banking chatbot using MLRun's GenAI Factory components
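To make the "multi-agent" idea concrete, here is a minimal sketch of the routing pattern such a chatbot uses: a router inspects the user message and dispatches it to a specialist agent. The agent names and keyword logic below are illustrative placeholders, not taken from the demo's GenAI Factory components (which a real deployment would use, together with an LLM-based classifier for routing).

```python
# Minimal sketch of multi-agent routing. Agent names and the keyword-based
# router are hypothetical; the demo itself builds agents with MLRun's
# GenAI Factory and LangChain.
from typing import Callable

def loan_agent(msg: str) -> str:
    # Specialist agent for loan-related questions (placeholder logic).
    return "Loan desk: let's review your application."

def support_agent(msg: str) -> str:
    # Fallback agent for general account support (placeholder logic).
    return "Support: how can I help with your account?"

AGENTS: dict[str, Callable[[str], str]] = {
    "loan": loan_agent,
    "support": support_agent,
}

def route(msg: str) -> str:
    """Pick an agent by keyword; a production bot would use an LLM classifier."""
    agent = AGENTS["loan"] if "loan" in msg.lower() else AGENTS["support"]
    return agent(msg)
```

In the demo, the keyword check is replaced by an LLM deciding which agent should handle the conversation turn.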
We will use:
- NVIDIA NIM - for GPU-accelerated model serving
- MLRun - as the orchestrator to operationalize it all
- LangChain - as the main framework for building the AI logic
This demo was showcased in Iguazio's webinar, *Deploying GenAI in Production with NVIDIA NIM and MLRun*.
This project can run in different development environments:
- Local computer (using PyCharm, VSCode, Jupyter, etc.)
- Inside GitHub Codespaces
- Other managed Jupyter environments
Install the required packages:

```bash
pip install mlrun langchain_nvidia_ai_endpoints langchain-openai
```

You will also need:
- NVIDIA NGC API Key
- OpenAI API Key (for monitoring)
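One way to provide these keys is through environment variables before starting the notebook; the variable names below are the conventional ones (placeholders shown, substitute your own values):

```bash
# Hypothetical setup: export the keys before running the notebook.
export NGC_API_KEY="<your-ngc-api-key>"        # used to pull and run NIM containers
export OPENAI_API_KEY="<your-openai-api-key>"  # used by the monitoring flow
```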