Final reorganizing and adding some of our previous work
Co-authored By: Daniel [email protected]
Co-authored By: Brandon [email protected]
Co-authored By: Jinanshi [email protected]
Co-authored By: Enrico [email protected]
daniel Foley committed Dec 16, 2024
1 parent 4664331 commit 2a7ca6c
Showing 15 changed files with 6,338 additions and 0 deletions.
2,124 changes: 2,124 additions & 0 deletions EDA/LibRAG_EDA.ipynb

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions EDA/README.md
@@ -0,0 +1,16 @@
# EDA Notebook Details

The EDA notebook contains some preliminary analysis that we conducted on our data from Boston Public Library's database.

We analyzed metadata retrieved from the Digital Commonwealth API as well as full-text data.

Some of our findings, such as how much of the image/video data has text annotations, will help in determining how we should conduct RAG in the coming phase.

## Running the Code
To run the notebook, make sure your environment matches ours by installing the pinned dependencies with the following command:

pip install -r requirements.txt

You will also need access to the data, which is stored on the SCC in /projectnb/sparkgrp/ml-bpl-rag-data/

Full text can be found in /projectnb/sparkgrp/ml-bpl-rag-data/text
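
As an optional sanity check before opening the notebook, the sketch below compares the installed package versions against the pins in EDA/requirements.txt. It is only an illustration and not part of the notebook: the helper names `parse_requirements` and `find_mismatches` are hypothetical, and the pins are copied into a string so the snippet runs on its own.

```python
from importlib import metadata

# Pins copied from EDA/requirements.txt (assumption: the file is unchanged).
REQUIREMENTS = """\
numpy==1.26.4
seaborn==0.13.2
pandas==2.2.2
textblob==0.18.0.post0
matplotlib==3.9.2
"""


def parse_requirements(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if sep:  # ignore lines that are not exact pins
            pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(parse_requirements(REQUIREMENTS))
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("Environment matches the pinned requirements.")
```

If any package is reported, re-running the pip command above inside a fresh virtual environment is usually the simplest fix.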
5 changes: 5 additions & 0 deletions EDA/requirements.txt
@@ -0,0 +1,5 @@
numpy==1.26.4
seaborn==0.13.2
pandas==2.2.2
textblob==0.18.0.post0
matplotlib==3.9.2
1,689 changes: 1,689 additions & 0 deletions PoC/.ipynb_checkpoints/POC-checkpoint.ipynb

Large diffs are not rendered by default.
