CLIPMatch is a Gradio-based application for finding specific visual content in a video by describing it in words. It uses OpenAI's CLIP model to score the similarity between video frames and a textual description. Users upload a video file and enter a text query; the app then generates a similarity-over-time graph that pinpoints the segments where the described content is most likely to appear.
- Video Upload: Upload a video file to be analyzed.
- Text Query: Enter a text description to search for specific visual content within the video.
- Similarity Graph: Generate a graph of video-text similarity over time, identifying the moments where the described content is visually present.
- Closest Match Identification: The graph highlights the point of highest similarity, making it easy to locate the described content in the video.
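Under the hood, the idea is to sample frames from the video, embed each frame and the text query with CLIP, and compare them with cosine similarity. The sketch below illustrates this; the function name `similarity_over_time` and the once-per-second sampling are illustrative assumptions, not necessarily the app's actual code.

```python
# A minimal sketch of the core computation, assuming frames are sampled with
# OpenCV and scored against the query with OpenAI's CLIP. Names and the
# sampling rate are illustrative, not taken from app.py.
import cv2
import clip
import torch
import numpy as np
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def similarity_over_time(video_path: str, query: str, sample_rate: float = 1.0):
    """Return (timestamps, similarities) for frames sampled every `sample_rate` seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    step = max(1, int(fps * sample_rate))
    text = clip.tokenize([query]).to(device)

    timestamps, sims = [], []
    frame_idx = 0
    with torch.no_grad():
        text_features = model.encode_text(text)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if frame_idx % step == 0:
                # OpenCV yields BGR arrays; CLIP's preprocess expects an RGB PIL image.
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                image = preprocess(Image.fromarray(rgb)).unsqueeze(0).to(device)
                image_features = model.encode_image(image)
                image_features /= image_features.norm(dim=-1, keepdim=True)
                # Cosine similarity of normalized embeddings is a dot product.
                sims.append((image_features @ text_features.T).item())
                timestamps.append(frame_idx / fps)
            frame_idx += 1
    cap.release()
    return np.array(timestamps), np.array(sims)
```

The closest match can then be read off as `timestamps[np.argmax(sims)]`, which is what the highlighted point on the graph corresponds to.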
Ensure the required libraries are installed:

```bash
pip install -r requirements.txt
```
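Judging from the dependency list below, `requirements.txt` presumably contains entries along these lines; note that OpenAI's CLIP is typically installed from its GitHub repository rather than from PyPI:

```text
gradio
matplotlib
numpy
opencv-python
torch
torchvision
Pillow
tqdm
git+https://github.com/openai/CLIP.git
```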
- Clone the repository to your local machine.
- Navigate to the project directory in the terminal.
- Run the following command to launch the Gradio interface (a sketch of how `app.py` might wire everything together follows this list):

  ```bash
  python app.py
  ```
- Open the Gradio interface in your web browser (the URL will be displayed in the terminal).
- Upload a video file and enter a text query to search for specific visual content within the video.
- View the similarity graph and analyze the results to find the closest match for your query.
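For reference, here is a minimal sketch of how `app.py` might wire these pieces together with Gradio. The component choices, the `find_match` function, and the reuse of the hypothetical `similarity_over_time` helper from the earlier sketch are assumptions rather than the app's actual code.

```python
import gradio as gr
import matplotlib
matplotlib.use("Agg")  # render figures off-screen for a web server context
import matplotlib.pyplot as plt
import numpy as np

def find_match(video_path, query):
    # similarity_over_time() is the hypothetical helper sketched earlier.
    timestamps, sims = similarity_over_time(video_path, query)
    best = int(np.argmax(sims))
    fig, ax = plt.subplots()
    ax.plot(timestamps, sims)
    ax.axvline(timestamps[best], linestyle="--",
               label=f"closest match @ {timestamps[best]:.1f}s")
    ax.set_xlabel("Time (s)")
    ax.set_ylabel("CLIP similarity")
    ax.legend()
    return fig

demo = gr.Interface(
    fn=find_match,
    inputs=[gr.Video(label="Video"), gr.Textbox(label="Text query")],
    outputs=gr.Plot(label="Similarity over time"),
    title="CLIPMatch",
)

if __name__ == "__main__":
    demo.launch()  # prints the local URL to the terminal
```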
- Gradio
- Matplotlib
- NumPy
- OpenCV
- PyTorch (torch)
- Torchvision
- CLIP (OpenAI)
- Pillow (PIL)
- tqdm