This project focuses on analyzing video content to detect and annotate objects using Google Gemini's generative AI capabilities, combined with OpenCV for video processing and annotation. The result is an annotated video that highlights all identified objects and features, along with their properties and appearance times.
The main goal is to:
- Analyze a video and identify all objects and features present throughout its duration.
- Generate bounding boxes and metadata for each detected object, including its condition, time of appearance, and disappearance.
- Annotate the video with bounding boxes and labels for all detected objects.
The output is an annotated video that visually displays the identified objects, providing valuable insights into the video's content.
Key Features:
- Object Detection: Identifies all visible objects in the video, along with their properties (e.g., name, condition, bounding boxes, and timestamps).
- AI-Powered Analysis: Utilizes Google Gemini's generative AI model for object detection and metadata generation.
- Dynamic Video Annotation: Applies bounding boxes and labels to objects in the video based on their appearance.
- Error Handling: Includes robust error-handling mechanisms for incomplete data and invalid bounding box formats.
Video Upload:
- The video is uploaded and processed using Google Gemini's file management API.
- The system waits for Gemini to finish processing the video before proceeding.
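The wait-for-processing step above can be sketched as a small polling helper. The helper itself is generic (it polls any zero-argument callable), so it can be shown without network access; the `genai.upload_file` / `genai.get_file` usage in the comment reflects the google-generativeai SDK, but the exact wiring in the notebook may differ:

```python
import time

def wait_until_active(get_state, timeout_s=300, poll_s=5):
    """Poll get_state() until it returns "ACTIVE"; raise on failure or timeout.

    get_state is any zero-argument callable returning the file's state name,
    e.g. lambda: genai.get_file(video.name).state.name in the real pipeline.
    """
    waited = 0
    while waited < timeout_s:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("Gemini failed to process the video")
        time.sleep(poll_s)
        waited += poll_s
    raise TimeoutError("video processing did not finish in time")

# With the google-generativeai SDK this is used roughly as:
#   video = genai.upload_file(path="input.mp4")
#   wait_until_active(lambda: genai.get_file(video.name).state.name)
```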
Video Analysis:
- The video's length is calculated using OpenCV to ensure annotations cover the entire duration.
- A detailed prompt is sent to Gemini for object detection and metadata extraction.
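The video-length calculation reduces to frame count divided by frame rate. A minimal sketch, with the OpenCV property reads shown in comments so the arithmetic itself stays self-contained:

```python
def video_duration_seconds(frame_count, fps):
    """Duration of a clip given its total frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_count / fps

# With OpenCV, the two inputs come from the capture's properties:
#   cap = cv2.VideoCapture("input.mp4")
#   n = cap.get(cv2.CAP_PROP_FRAME_COUNT)
#   fps = cap.get(cv2.CAP_PROP_FPS)
#   duration = video_duration_seconds(n, fps)
```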
Metadata Parsing:
- The JSON response from Gemini is parsed to extract object information, including:
- Object name
- Condition
- Bounding box coordinates
- Appearance and disappearance timestamps
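The parsing step can be sketched as follows. Gemini sometimes wraps its JSON reply in a markdown code fence, so the sketch strips that first; the record keys (`name`, `condition`, `box`, `start`, `end`) are this example's assumption and must match whatever schema the prompt requests:

```python
import json

def parse_objects(response_text):
    """Parse Gemini's JSON reply into a list of object records.

    Strips a markdown code fence the model may wrap around the payload,
    then keeps only records that carry every expected key.
    """
    text = response_text.strip()
    if text.startswith("```"):
        # Drop the opening fence (with its optional "json" tag) and the closing fence.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    records = json.loads(text)
    required = {"name", "condition", "box", "start", "end"}
    return [r for r in records if required <= r.keys()]
```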
Video Annotation:
- OpenCV processes the video frame by frame.
- Objects are annotated with bounding boxes and labels if they are visible in the current frame.
- Unique colors are used for each object for clear visual distinction.
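The per-frame visibility test and the per-object color assignment can be sketched as two small helpers; hashing the object's name gives a stable, unique-per-name BGR color without keeping extra state. The OpenCV drawing calls in the comment show how the frame loop would use them (the `box`/`start`/`end` keys are assumed from the parsed metadata):

```python
import hashlib

def is_visible(obj, t):
    """True if the object's [start, end] window covers time t (seconds)."""
    return obj["start"] <= t <= obj["end"]

def color_for(name):
    """Deterministic per-object BGR color derived from the object's name."""
    digest = hashlib.md5(name.encode()).digest()
    return (digest[0], digest[1], digest[2])

# In the frame loop, OpenCV would apply these roughly as:
#   t = frame_index / fps
#   for obj in objects:
#       if is_visible(obj, t):
#           x1, y1, x2, y2 = obj["box"]
#           cv2.rectangle(frame, (x1, y1), (x2, y2), color_for(obj["name"]), 2)
#           cv2.putText(frame, obj["name"], (x1, y1 - 5),
#                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, color_for(obj["name"]), 1)
```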
Output Generation:
- The annotated video is saved in MP4 format, displaying all detected objects with their respective metadata.
To run this project, ensure you have the following dependencies installed:
- Python 3.7 or above
- Libraries:
  - google-generativeai
  - opencv-python
  - opencv-python-headless
  - numpy
  - json (Python standard library)
  - logging (Python standard library)
- The project code is saved in a Google Colab notebook for ease of execution and reproducibility.
- The output is saved as `annotated_video.mp4` in the working directory.
Setup Dependencies:
- Install the required libraries using `pip`: `pip install opencv-python opencv-python-headless google-generativeai numpy`
Run the Notebook:
- Open the provided Colab notebook.
- Replace the `video_file_path` variable with the path to your input video.
- Execute the cells in order.
Output:
- The annotated video will be saved as `annotated_video.mp4` in the current working directory.
Prompt Engineering:
- A custom prompt is designed to extract comprehensive metadata for all objects in the video, covering the entire duration.
- Dynamic video length integration ensures accurate start and end times for object annotations.
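Integrating the measured duration into the prompt can be sketched as below. The wording and the requested JSON schema here are illustrative, not the notebook's exact prompt:

```python
def build_prompt(duration_s):
    """Compose the detection prompt, pinning all timestamps to the clip length."""
    return (
        f"The video is {duration_s:.1f} seconds long. "
        "List every object visible at any point, as a JSON array. "
        "For each object give: name, condition, box ([x1, y1, x2, y2] in "
        "pixels), start, and end (seconds, both between 0 and "
        f"{duration_s:.1f}). Return only the JSON."
    )
```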
AI and Vision Integration:
- Google Gemini handles high-level object detection and metadata generation.
- OpenCV applies bounding boxes and renders annotations directly on the video frames.
Error Handling:
- Missing or invalid metadata (e.g., malformed bounding boxes) is logged and skipped without interrupting the process.
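The log-and-skip behavior for malformed bounding boxes can be sketched as a validator that never raises on bad input (the function name and box layout `[x1, y1, x2, y2]` are this sketch's assumptions):

```python
import logging

logger = logging.getLogger("annotator")

def valid_box(box, width, height):
    """Accept only [x1, y1, x2, y2] boxes that fit inside a width x height frame.

    Malformed or out-of-range boxes are logged and rejected rather than
    raising, so one bad record cannot stop the annotation pass.
    """
    try:
        x1, y1, x2, y2 = (int(v) for v in box)
    except (TypeError, ValueError):
        logger.warning("skipping malformed box: %r", box)
        return False
    if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
        logger.warning("skipping out-of-range box: %r", box)
        return False
    return True
```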
Future Enhancements:
- Extend support for real-time object tracking across frames.
- Improve bounding box accuracy using advanced object detection models (e.g., YOLO or SSD).
- Add interactivity, such as allowing users to filter objects by type or condition in the annotated video.
Acknowledgments:
- This project leverages Google Gemini for object analysis and OpenCV for video processing and visualization.