Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completed Sheetseeker Assignment #3

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AmanKishore
Copy link

Finance Sheet Analysis with GPT-4-turbo

Summary

This PR leverages GPT-4-turbo to determine what information the user is querying for and returns a new xlsx sheet with the relevant data highlighted.
I created the SheetAnalyzer class to call GPT-4-turbo, which returns a list of financial terms relevant to a given query. These terms are then used to highlight corresponding values in the spreadsheet.

Query Processing with GPT-4-turbo

The get_relevant_rows function in utils.py is the cornerstone of our query processing. It constructs a conversation with the GPT-4-turbo model, providing it with a list of financial terms from the first column of the spreadsheet and the user's query. The conversation is formatted to prompt GPT-4-turbo to respond with a list of relevant financial terms.

  • System Message: Sets the context for GPT-4-turbo, describing its role as a financial assistant.
  • User Message: Contains the actual instruction, formatted using the INSTRUCTION template from constants.py, which includes the list of financial terms and the user's query.

The GPT-4-turbo model's response is expected to be a list of strings. If the response is not in the correct format, a retry mechanism is triggered, which attempts to correct the format using JSON mode.

Highlighting Relevant Rows

Once the relevant rows are identified, the SheetAnalyzer class's find_and_highlight method iterates over each sheet's DataFrame. It checks if the first column's value of each row is in the set of relevant rows returned by GPT-4-turbo. If a match is found, it highlights all numeric values in that row using the specified PatternFill.

Saving Highlighted Sheets

After highlighting, the workbook is saved in a new directory named highlighted_data. The file name is constructed by sanitizing the query string to ensure it is a valid file name.

Installation and Usage Guide for SheetAnalyzer

Prerequisites

Before proceeding, ensure that Python 3 is installed on your system by running the following command in your terminal:
python3 --version

Installation Steps

Step 1: Clone the Repository

If applicable, clone the repository to your local machine using the following commands:
git clone https://github.com/your-username/sheetseeker.git
cd sheetseeker

Step 2: Install Dependencies

Install the necessary Python packages listed in requirements.txt:
pip3 install -r requirements.txt

Step 3: Set OpenAI API Key

You need to set the OPENAI_API_KEY environment variable with your OpenAI API key. Replace your_api_key_here with your actual API key:
export OPENAI_API_KEY='your_api_key_here'

Running the Script

Once the setup is complete, you can run the solution.py script from the root directory of the project:
python3 src/solution.py

The script will process the predefined queries and highlight the relevant rows in the Excel spreadsheet. The highlighted spreadsheets will be saved in the highlighted_data directory within the project.

Conclusion

By integrating GPT-4-turbo's powerful natural language processing capabilities, SheetAnalyzer enables the user to highlight relevant information from financial documents for any query.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant