Finance Sheet Analysis with GPT-4-turbo
Summary
This PR leverages GPT-4-turbo to determine what information the user is querying for and returns a new xlsx sheet with the relevant data highlighted.
I created the SheetAnalyzer class to call GPT-4-turbo, which returns a list of financial terms relevant to a given query. These terms are then used to highlight corresponding values in the spreadsheet.
Query Processing with GPT-4-turbo
The get_relevant_rows function in utils.py is the cornerstone of our query processing. It constructs a conversation with the GPT-4-turbo model, providing it with a list of financial terms from the first column of the spreadsheet and the user's query. The conversation is formatted to prompt GPT-4-turbo to respond with a list of relevant financial terms. It uses the INSTRUCTION template from constants.py, which includes the list of financial terms and the user's query.
The GPT-4-turbo model's response is expected to be a list of strings. If the response is not in the correct format, a retry mechanism is triggered, which attempts to correct the format using JSON mode.
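The parse-and-retry flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in utils.py: the INSTRUCTION text here is a hypothetical stand-in for the template in constants.py, and ask_model is a hypothetical callable that stands in for the actual GPT-4-turbo request so the sketch runs without network access.

```python
import json
from typing import Callable, List, Optional

# Hypothetical prompt text; the real INSTRUCTION template lives in constants.py.
INSTRUCTION = (
    "Financial terms: {terms}\n"
    "Query: {query}\n"
    "Reply with a JSON list of the terms relevant to the query."
)


def parse_term_list(reply: str) -> Optional[List[str]]:
    """Return the reply as a list of strings, or None if it is malformed."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    # JSON mode produces an object, so also accept {"terms": [...]} (assumed key).
    if isinstance(data, dict):
        data = data.get("terms")
    if isinstance(data, list) and all(isinstance(item, str) for item in data):
        return data
    return None


def get_relevant_rows(
    terms: List[str],
    query: str,
    ask_model: Callable[[str, bool], str],
    max_retries: int = 1,
) -> List[str]:
    """Ask the model for relevant terms, retrying in JSON mode on a bad reply.

    ask_model(prompt, json_mode) is a hypothetical wrapper around the model call.
    """
    prompt = INSTRUCTION.format(terms=json.dumps(terms), query=query)
    parsed = parse_term_list(ask_model(prompt, False))
    retries = 0
    while parsed is None and retries < max_retries:
        parsed = parse_term_list(ask_model(prompt, True))
        retries += 1
    if parsed is None:
        raise ValueError("model did not return a list of strings")
    # Keep only terms that actually appear in the spreadsheet's first column.
    known = set(terms)
    return [term for term in parsed if term in known]
```

In a real setup, ask_model would wrap the OpenAI chat completions request, where JSON mode is requested with response_format={"type": "json_object"}. Filtering the reply against the known terms is an extra safeguard added in this sketch; the PR does not say whether the original code does this.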
Highlighting Relevant Rows
Once the relevant rows are identified, the SheetAnalyzer class's find_and_highlight method iterates over each sheet's DataFrame. It checks if the first column's value of each row is in the set of relevant rows returned by GPT-4-turbo. If a match is found, it highlights all numeric values in that row using the specified PatternFill.
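A minimal sketch of the matching and highlighting logic is shown here. It is not the actual find_and_highlight method: it walks worksheet rows directly instead of a DataFrame, and the function name highlight_relevant_rows and its parameters are hypothetical. The worksheet only needs an iter_rows() method that yields rows of cells with value and fill attributes, which is the shape an openpyxl worksheet has.

```python
from numbers import Number


def highlight_relevant_rows(worksheet, relevant_terms, fill):
    """Apply `fill` to numeric cells in rows whose first cell is a relevant term.

    Returns the number of cells that were highlighted.
    """
    relevant = {term.strip() for term in relevant_terms}
    highlighted = 0
    for row in worksheet.iter_rows():
        if not row:
            continue
        label = row[0].value
        # Skip rows whose first-column label is missing or not requested.
        if not isinstance(label, str) or label.strip() not in relevant:
            continue
        for cell in row:
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(cell.value, Number) and not isinstance(cell.value, bool):
                cell.fill = fill
                highlighted += 1
    return highlighted
```

With openpyxl, the worksheet would come from load_workbook(path)[sheet_name], and fill could be something like PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid"); the color is an arbitrary choice for illustration.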
Saving Highlighted Sheets
After highlighting, the workbook is saved in a new directory named highlighted_data. The file name is constructed by sanitizing the query string to ensure it is a valid file name.
Installation and Usage Guide for SheetAnalyzer
Prerequisites
Before proceeding, ensure that Python 3 is installed on your system by running the following command in your terminal:
python3 --version
Installation Steps
Step 1: Clone the Repository
If applicable, clone the repository to your local machine using the following commands:
git clone https://github.com/your-username/sheetseeker.git
cd sheetseeker
Step 2: Install Dependencies
Install the necessary Python packages listed in requirements.txt:
pip3 install -r requirements.txt
Step 3: Set OpenAI API Key
You need to set the OPENAI_API_KEY environment variable with your OpenAI API key. Replace your_api_key_here with your actual API key:
export OPENAI_API_KEY='your_api_key_here'
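To fail fast when the variable is missing, a script could check for it before making any request. The helper below is a hypothetical sketch and is not part of this PR; the name require_api_key is an assumption.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise with a hint on how to set it."""
    key = env.get(name, "").strip()
    # Treat the unedited placeholder from the instructions as "not set".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            f"{name} is not set; run: export {name}='<your key>' and try again"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dict.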
Running the Script
Once the setup is complete, you can run the solution.py script from the root directory of the project:
python3 src/solution.py
The script will process the predefined queries and highlight the relevant rows in the Excel spreadsheet. The highlighted spreadsheets will be saved in the highlighted_data directory within the project.
Conclusion
By using GPT-4-turbo to map a natural-language query to the financial terms in a spreadsheet's first column, SheetAnalyzer lets the user highlight the rows of a financial document that are relevant to that query.