Merge pull request #14 from sachahu1/feat/docs
feat: Improve installation instructions and update project info
sachahu1 authored Nov 26, 2024
2 parents a56c840 + a47b410 commit 0b3a412
Showing 9 changed files with 198 additions and 37 deletions.
33 changes: 29 additions & 4 deletions README.md
@@ -8,17 +8,42 @@

## Getting Started

To get started simply install the package:
### Installation

#### Installing the package (from PyPI)
You can install this package from PyPI using:
```shell
pip install Github-Search-Engine[cpu]
```
You can also choose to install extras:
```shell
pip install Github-Search-Engine[gpu,api]
```

#### Installing the package (from source)
You can install this package from source using:
```shell
pip install git+https://github.com/sachahu1/Github-Search-Engine.git
```
You can then start using the tool with the CLI:

#### Installing the package (Manual)
You can also install the package yourself by cloning the repo:
```shell
github_search_engine
git clone https://github.com/sachahu1/Github-Search-Engine.git
```

And installing the package with poetry:
```shell
poetry install
```

### Using as a CLI tool
You can use this package as a CLI tool. To do that, start by indexing your favourite GitHub repository:
You can use this package as a CLI tool. Start with:
```shell
github_search_engine -h
```

Once you're more familiar with the CLI, you can index your favourite GitHub repository:
```shell
github_search_engine index <owner> <repository_name> --db_path=./local-store --github_access_token=<Your GitHub Personal Access Token>
```
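Note on the README hunk: for scripted use, the `index` invocation shown above can be assembled in Python. This is only a minimal sketch. `build_index_command` is a hypothetical helper that is not part of the package, and reading the token from a `GITHUB_TOKEN` environment variable is an assumption made for illustration. The sub-command and flag names are copied from the README text in this diff.

```python
import os


def build_index_command(
    owner: str, repository_name: str, db_path: str, token: str
) -> list[str]:
    """Hypothetical helper: assemble the argv for the `index` sub-command."""
    return [
        "github_search_engine",
        "index",
        owner,
        repository_name,
        f"--db_path={db_path}",
        f"--github_access_token={token}",
    ]


# GITHUB_TOKEN is an assumed variable name, not something the package reads.
command = build_index_command(
    "sachahu1",
    "Github-Search-Engine",
    "./local-store",
    os.environ.get("GITHUB_TOKEN", ""),
)
# Once the package is installed, the list can be passed to
# subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting issues when the owner or query contains spaces.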
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -22,7 +22,7 @@
# -- Project information -----------------------------------------------------

project = "Github-Search-Engine"
copyright = "{% now 'utc', '%Y' %}, "
copyright = "2024, Sacha Hu"
author = "Sacha hu"

# The full version, including alpha/beta/rc tags
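Note on the `conf.py` hunk: the removed `copyright` value looks like an unrendered Jinja-style template tag, and the commit replaces it with a hard-coded year. If a self-updating year were preferred, one possible alternative (not what this commit does) is to compute it when Sphinx loads `conf.py`. The sketch below assumes the author string "Sacha Hu" from the new `copyright` line.

```python
from datetime import datetime, timezone

author = "Sacha Hu"
# Sphinx reads `copyright` as a plain string, so the year can be
# computed each time the documentation is built.
copyright = f"{datetime.now(timezone.utc).year}, {author}"
```

The trade-off is that rebuilt documentation always shows the build year rather than the year of first publication.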
83 changes: 68 additions & 15 deletions docs/source/custom/pre-modules/introduction.md
@@ -1,30 +1,83 @@
# Introduction
# Github-Search-Engine

![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/sachahu1/Github-Search-Engine/run-tests.yaml?branch=main&label=Tests)

## Installation
### Installing Poetry
This tool uses poetry. If you already have poetry installed,
please skip to the next section. Otherwise, let's first set up poetry.
![GitHub Release](https://img.shields.io/github/v/release/sachahu1/Github-Search-Engine)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/github_search_engine)
![GitHub Repo stars](https://img.shields.io/github/stars/sachahu1/Github-Search-Engine)

To install poetry, simply run this command:
## Getting Started

### Installation

#### Installing the package (from PyPI)
You can install this package from PyPI using:
```shell
pip install Github-Search-Engine[cpu]
```
You can also choose to install extras:
```shell
curl -sSL https://install.python-poetry.org | python3 -
pip install Github-Search-Engine[gpu,api]
```
You can find out more about poetry installation [here](https://python-poetry.org/docs/master/#installation).

That's it, poetry is set up.
#### Installing the package (from source)
You can install this package from source using:
```shell
pip install git+https://github.com/sachahu1/Github-Search-Engine.git
```

### Installing the package
Thanks to poetry, installing this package is very simple and can be done in a single command. Simply run:
#### Installing the package (Manual)
You can also install the package yourself by cloning the repo:
```shell
git clone https://github.com/sachahu1/Github-Search-Engine.git
```

And installing the package with poetry:
```shell
poetry install
```
That's it, the package is installed. Move to the next section to learn how to use this package.

## Getting Started
< Add instructions on how to use project here >
### Using as a CLI tool
You can use this package as a CLI tool. Start with:
```shell
github_search_engine -h
```

Once you're more familiar with the CLI, you can index your favourite GitHub repository:
```shell
github_search_engine index <owner> <repository_name> --db_path=./local-store --github_access_token=<Your GitHub Personal Access Token>
```
Then, search through any issue using:
```shell
github_search_engine search <owner> <repository_name> "<Your query>" --db_path=./local-store --github_access_token=<Your GitHub Personal Access Token>
```

### Launching an API server
You can use this package as an API. To do that, simply run:
```shell
github_search_engine api --github_access_token=<Your GitHub Personal Access Token>
```

## Building the documentation
To build the documentation, you can use the Docker image. To do so, run:
### Using Docker
The easiest way to access the documentation locally is to use the Docker image. To do so, run:
```shell
docker build . -f Dockerfile --target documentation -t github_search_engine-docs
docker run -p 80:80 -it github_search_engine-docs
```
Then navigate to [http://localhost](http://localhost)

### Manually
Alternatively you can build the documentation yourself.
First, make sure you have the dependencies installed:
```shell
poetry install --with=documentation
```
Then build the documentation:
```shell
poetry run sphinx-build -M html docs/source/ docs/build
```
Then open the documentation in your browser:
```shell
open docs/build/html/index.html
```
3 changes: 1 addition & 2 deletions github_search_engine/_api/api.py
@@ -1,14 +1,13 @@
import os
from contextlib import asynccontextmanager
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

from github_search_engine.github_search_engine import GithubSearchEngine


search_engine: Optional[GithubSearchEngine] = None
search_engine: GithubSearchEngine | None = None


class Repository(BaseModel):
12 changes: 11 additions & 1 deletion github_search_engine/cli.py
@@ -1,9 +1,19 @@
"""The simplest way to use this tool is via the CLI.

To get started, run the CLI tool with:

.. code-block:: bash

    $ github_search_engine -h
"""

from cleo.application import Application

from github_search_engine import _cli as cli


def run():
def _run():
application = Application()
application.add(cli.ApiCommand())
application.add(cli.IndexCommand())
8 changes: 3 additions & 5 deletions github_search_engine/clients/github_client_manager.py
@@ -1,5 +1,3 @@
from typing import List

from githubkit import GitHub
from githubkit.versions.v2022_11_28.models import Issue
from githubkit.versions.v2022_11_28.models import IssueComment
@@ -21,7 +19,7 @@ def __init__(self, access_token: str):

async def get_repository_issues(
self, owner: str, repository_name: str
) -> List[Issue]:
) -> list[Issue]:
"""Retrieve issues from a specified GitHub repository.
This function connects to a given GitHub repository and retrieves a list of
@@ -51,7 +49,7 @@ async def get_repository_issues(

def get_issue_comments(
self, owner: str, repository_name: str, issue_number: int
) -> List[IssueComment]:
) -> list[IssueComment]:
"""Retrieve comments for a specific issue in a repository.
Args:
@@ -73,7 +71,7 @@ def get_issue_references(
owner: str,
repository_name: str,
issue_number: int,
) -> List[TimelineCrossReferencedEvent]:
) -> list[TimelineCrossReferencedEvent]:
"""Fetch mentions of a given issue.
Fetches the timeline events for a specific issue and filters out the
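Note on the `github_client_manager.py` hunks: replacing `typing.List[...]` with the built-in `list[...]` removes an import without changing what the annotation means. A minimal sketch, assuming Python 3.9 or later; `strip_titles` is a hypothetical function used only to show the annotation in place.

```python
from typing import List, get_args, get_origin


def strip_titles(titles: list[str]) -> list[str]:
    """Hypothetical example of a function annotated with built-in generics."""
    return [title.strip() for title in titles]


# Both spellings resolve to the same runtime origin and type arguments.
old_origin = get_origin(List[str])
new_origin = get_origin(list[str])
new_args = get_args(list[str])
```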
20 changes: 19 additions & 1 deletion github_search_engine/clients/ollama_client_manager.py
@@ -1,3 +1,5 @@
from typing import Sequence

import ollama
from ollama import GenerateResponse
from ollama import Options
@@ -9,7 +11,15 @@ def __init__(self):
self._model = "llama3.1:8b"
self._context_size = 128_000

def embed(self, content: str):
def embed(self, content: str) -> Sequence[Sequence[float]]:
"""Returns the LLM's embedding for the given input text.
Args:
content: The input textual content to embed.
Returns:
The embedding vectors, one for each segment of the input content.
"""
response = self.client.embed(
model=self._model,
input=content,
@@ -18,6 +28,14 @@ def embed(self, content: str):
return response.embeddings

def chat(self, prompt: str) -> str:
"""Generates a response to the input prompt.
Args:
prompt: The input text to generate a response for.
Returns:
The generated response.
"""
response: GenerateResponse = self.client.generate(
model=self._model,
prompt=prompt,
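Note on the `ollama_client_manager.py` hunks: the new return annotation on `embed`, `Sequence[Sequence[float]]`, indicates a sequence of vectors rather than a single vector. The sketch below shows one way a caller might unpack that shape. It does not call Ollama: `first_vector` is a hypothetical helper and `fake_embeddings` is made-up stand-in data.

```python
from typing import Sequence


def first_vector(embeddings: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical helper: return the first embedding vector as a list."""
    if not embeddings:
        raise ValueError("no embeddings returned")
    return list(embeddings[0])


# Stand-in for the value returned by `embed`; real vectors are much longer.
fake_embeddings = [[0.1, 0.2, 0.3]]
vector = first_vector(fake_embeddings)
```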
72 changes: 65 additions & 7 deletions github_search_engine/github_search_engine.py
@@ -1,8 +1,6 @@
import inspect
import logging
import sys
from typing import List
from typing import Optional

import chevron
import onnxruntime
@@ -22,9 +20,19 @@ class GithubSearchEngine:
def __init__(
self,
github_access_token: str,
qdrant_location: Optional[str] = None,
qdrant_path: Optional[str] = None,
qdrant_location: str | None = None,
qdrant_path: str | None = None,
):
"""A GithubSearchEngine to search GitHub repositories.
Initializes a client manager for GitHub, Qdrant, and Ollama services, setting up
logging and database model configuration.
Args:
github_access_token: The GitHub access token for accessing GitHub API.
qdrant_location: The location of the Qdrant server. Default is None.
qdrant_path: The path to the Qdrant database. Default is None.
"""
logging.basicConfig(level=logging.WARNING)

self._github_client = GithubClientManager(access_token=github_access_token)
@@ -41,6 +49,17 @@ def __init__(

@staticmethod
def summarise_issue(issue: Issue) -> str:
"""Construct a summary string from an Issue object.
Summarizes the given issue by combining its title and body. This function takes an Issue object and returns a formatted
string containing the issue's title and body, separated by two newline characters.
Args:
issue: The issue to be summarised. The issue must have 'title' and 'body' attributes.
Returns:
A formatted string containing the issue's title and body.
"""
issue_summary = f"""
{issue.title}
@@ -50,11 +69,26 @@ def summarise_issue(issue: Issue) -> str:

def summarise_results(
self,
results: List[QueryResponse],
results: list[QueryResponse],
owner: str,
repository_name: str,
query: str,
):
) -> str:
"""Summarizes the content and discussion of GitHub issues.
Summarizes the content and discussion of given GitHub issues, presenting how they relate to a
specified query. The summary is concise, devoid of headings or titles, and presented in a
single 2-3 sentence paragraph.
Args:
results: A list of QueryResponse objects containing GitHub issues to summarize.
owner: The owner of the GitHub repository.
repository_name: The name of the GitHub repository.
query: The original query to relate the issues to.
Returns:
A single string containing the summarized content and discussions of all provided GitHub issues.
"""
prompt_template = """
Please briefly summarise the content and discussion of the following github issues.
Keep it short, concise and to the point and explain how it relates to '{{originalQuery}}'
@@ -100,6 +134,15 @@ def summarise_results(
return final_summary

async def index_repository(self, owner: str, repository_name: str):
"""Index a GitHub repository.
Retrieves all issues from the specified repository and indexes them into a
vector database for further processing or querying.
Args:
owner: The owner of the GitHub repository.
repository_name: The name of the GitHub repository.
"""
logging.info(f"Fetching Issues from {owner}/{repository_name}")
issues = await self._github_client.get_repository_issues(
owner, repository_name
@@ -115,7 +158,22 @@ async def index_repository(self, owner: str, repository_name: str):

def search(
self, owner: str, repository_name: str, text: str
) -> List[QueryResponse]:
) -> list[QueryResponse]:
"""Searches for issues in the specified repository that match the given text.
This method searches the database for issues within the given repository that
match the specified text query. If the repository's collection does not exist
in the database, an error is logged and the program exits. The search results
are filtered to exclude issues with empty bodies.
Args:
owner: The owner of the repository.
repository_name: The name of the repository.
text: A natural language query to search for within the repository's issues.
Returns:
A list of query responses that match the search criteria.
"""
if not self._database_client.collection_exists(
collection_name=f"{owner}/{repository_name}",
):
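Note on the `search` docstring: the behaviour it describes (look up the `owner/repository_name` collection, exit if it is missing, drop results with empty bodies) can be sketched without a vector database. Everything below is a stand-in: `FakeResult`, `search_sketch`, the dictionary used as a store, and the exit code are assumptions for illustration, not the project's implementation.

```python
import logging
import sys
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in for a query response; only the field used below."""

    body: str


def search_sketch(
    store: dict[str, list[FakeResult]], owner: str, repository_name: str
) -> list[FakeResult]:
    """Return the stored results for a repository, skipping empty bodies."""
    collection_name = f"{owner}/{repository_name}"
    if collection_name not in store:
        # Mirrors the documented behaviour: log an error, then exit.
        logging.error("Repository %s has not been indexed", collection_name)
        sys.exit(1)  # exit code is an assumption
    return [result for result in store[collection_name] if result.body]
```

Exiting from inside a library method is convenient for the CLI but awkward for API callers, who may prefer an exception they can handle.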
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -71,7 +71,7 @@ testpaths = ["tests"]
source = ["github_search_engine"]

[tool.poetry.scripts]
github_search_engine = "github_search_engine.cli:run"
github_search_engine = "github_search_engine.cli:_run"

[tool.ruff]
line-length = 79
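Note on the `pyproject.toml` hunk: a `[tool.poetry.scripts]` entry maps a console command to a `module:function` reference, so renaming `run` to `_run` in `cli.py` requires the matching change here. The sketch below is a simplified approximation of how such a reference resolves. Real installers generate a wrapper script instead, and `resolve_entry_point` is a hypothetical helper demonstrated on the standard library's `json:dumps` rather than on this package.

```python
import importlib


def resolve_entry_point(spec: str):
    """Hypothetical helper: turn a 'module:attribute' string into an object."""
    module_name, _, attribute = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Demonstrated on a standard-library function so the sketch runs anywhere.
dumps = resolve_entry_point("json:dumps")
```

If the reference still pointed at `run` after the rename, the lookup would fail with an `AttributeError` when the command is launched.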
