Contexi lets you interact with your entire codebase as a code review co-pilot, using an LLM locally.
Contexi uses:
- Multi-Prompt Contextually Guided Retrieval-Augmented Generation
- Self-Critique & Self-Correction using Chain-of-Thought
- Document Re-Ranking

These highly optimized techniques provide the most relevant context-aware responses to questions about your code and data.
✅ Analyzes and understands your entire codebase and data, not just isolated code snippets.
✅ Answers questions about potential security vulnerabilities anywhere in the code.
✅ Imports code from a git URL for analysis.
✅ Learns from follow-up questions and continuously answers based on chat-history context.
✅ Runs entirely on your local machine for free; no internet connection is required.
- Ollama (preferred model: qwen2.5, for more precise results)
- Recommended 16 GB RAM and plenty of free disk space
- Python 3.7+
- Various Python dependencies (see `requirements.txt`)
- Tested on a Java codebase (you can configure `config.yml` to load other code/file formats)
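The file-format setting might look something like the sketch below. The key names here are illustrative assumptions, not the project's actual schema; check the `config.yml` shipped in the repository for the real options.

```yaml
# Illustrative only -- these key names are assumed, not Contexi's actual schema.
# See the config.yml in the repository for the real option names.
code_extensions:
  - ".java"   # default tested format
  - ".py"     # add other extensions to analyze more file types
  - ".js"
```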
We recommend installing the app in a Python virtual environment.
1. Clone this repository:

   ```shell
   git clone https://github.com/AI-Security-Research-Group/Contexi.git
   cd Contexi
   ```

2. Install the required Python packages:

   ```shell
   pip install -r requirements.txt
   ```

3. Edit `config.yml` parameters based on your requirements.

4. Run:

   ```shell
   python3 main.py
   ```
Upon running `main.py`, select one of the options below:
```
(venv) coder@system ~/home/Contexi $
Welcome to Contexi!
Please select a mode to run:
1. Interactive session
2. UI
3. API
Enter your choice (1, 2, or 3):
```
You are ready to use the magic stick. 🪄
Send POST requests to `http://localhost:8000/ask` with your questions.
Example using curl:
```shell
curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{"question": "What is the purpose of the Login class?"}'
```
Response format:

```json
{
  "answer": "The Login class is responsible for handling user authentication..."
}
```
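The same request can be scripted from Python. This is a minimal stdlib-only sketch, assuming the API mode from option 3 is running on the default port; the helper names (`build_request`, `ask`) are ours, not part of Contexi.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask"  # default endpoint from this README

def build_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build the JSON POST request the /ask endpoint expects."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

def ask(question: str, url: str = API_URL) -> str:
    """Send the question and return the 'answer' field of the JSON response."""
    with urllib.request.urlopen(build_request(question, url)) as resp:
        return json.loads(resp.read())["answer"]
```

For example, `ask("What is the purpose of the Login class?")` returns the answer string from the response shown above.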
Open an Issue if you're having problems running or installing this script. (The script has been tested in a macOS environment.)
You can customize various aspects of the script:
- Adjust the `chunk_size` and `chunk_overlap` in the `split_documents_into_chunks` function to change how documents are split.
- Modify the `PROMPT_TEMPLATE` to alter how the LLM interprets queries and generates responses.
- Change the `max_iterations` in `perform_crag` to adjust how many times the system will attempt to refine an answer.
- Modify the `num_ctx` in `initialize_llm` to adjust the LLM context window for better results.
- Adjust the `n_ideas` parameter to define the depth of accuracy and completeness you need in the answers.
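To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a generic sliding-window chunker. This is an illustration only, not the project's actual `split_documents_into_chunks` implementation: each chunk is `chunk_size` characters long and repeats the last `chunk_overlap` characters of the previous chunk, so context near a boundary appears in both chunks.

```python
from typing import List

def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of chunk_size chars, overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With chunk_size=4 and chunk_overlap=2, chunks start every 2 characters:
# chunk_text("abcdefghij", 4, 2) -> ["abcd", "cdef", "efgh", "ghij", "ij"]
```

Larger overlap preserves more cross-boundary context at the cost of indexing more (partly duplicated) chunks, which is why reducing `chunk_size` also reduces memory pressure.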
- If you encounter memory issues, try reducing the `chunk_size` and `num_ctx`, or the number of documents processed at once.
- Ensure that Ollama is running and the correct model name is set in the `config.yml` file.
- Codebase Analysis: Understand and explore large code repositories by asking natural language questions.
- Security Auditing: Identify potential security vulnerabilities by querying specific endpoints or functions.
- Educational Tools: Help new developers understand codebases by providing detailed answers to their questions.
- Documentation Generation: Generate explanations or documentation for code segments, and more.
- Make the important parameters configurable using a YAML file ✅
- Drag and drop a folder in the UI for analysis
- Scan the source folder and suggest file extensions to be analyzed
- Make config.yml configurable in the UI
- Session-based chat to switch context on each new session
- Persistent chat UI interface upon page refresh
- Add only the most recent response to the history context
- Implement the tree-of-thoughts concept
- Create web interface ✅
- Integrate the repository import feature, which automatically imports the repo locally to perform analysis ✅
- Use Semgrep to identify potential vulnerabilities based on patterns.
- Pass the identified snippets to a data flow analysis tool to determine if the input is user-controlled.
- Provide the LLM with the code snippet, data flow information, and any relevant AST representations.
- Ask the LLM to assess the risk based on this enriched context.
- Use the LLM's output to prioritize vulnerabilities, focusing on those where user input reaches dangerous functions.
- Optionally, perform dynamic analysis or manual code review on high-risk findings to confirm exploitability.
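The enrichment step above can be sketched roughly as follows. This is a hypothetical helper, not code from this repository, and the `finding` field names are assumptions; Semgrep's actual JSON output nests the rule id, path, and matched lines differently.

```python
def build_enriched_prompt(finding: dict, dataflow_note: str) -> str:
    """Combine a static-analysis finding with data-flow context into an LLM prompt.

    `finding` is assumed (for illustration) to carry a flat rule_id, path, and
    snippet; adapt the keys to the real Semgrep JSON schema.
    """
    return (
        "You are reviewing a potential vulnerability.\n"
        f"Rule: {finding['rule_id']}\n"
        f"File: {finding['path']}\n"
        "Code snippet:\n"
        f"{finding['snippet']}\n"
        f"Data-flow context: {dataflow_note}\n"
        "Question: Is this exploitable if the input is user-controlled? "
        "Rate the risk as high, medium, or low and explain why."
    )
```

The LLM's risk rating can then be used to sort findings so that cases where user input reaches a dangerous sink are reviewed first.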
Contributions to Contexi are welcome! Please submit pull requests or open issues on the GitHub repository.