This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Commit

handle bytestring
PhilippeMoussalli committed Jan 3, 2024
1 parent d45de6c commit fc9c95f
Showing 2 changed files with 10 additions and 3 deletions.
2 changes: 1 addition & 1 deletion llama_hub/file/pdf/README.md
@@ -4,7 +4,7 @@ This loader extracts the text from a local PDF file using the `PyPDF2` Python pa

 ## Usage

-To use this loader, you need to pass in a `Path` to a local file.
+To use this loader, you need to pass in a `Path` to a local file or a PDF byte stream.

 ```python
 from pathlib import Path
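The README change above says the loader accepts either a `Path` or a PDF byte stream. As a rough illustration of why one `with` statement can serve both, here is a minimal standard-library sketch. `read_magic` is a hypothetical helper and not part of the loader; it only mirrors the open-or-reuse pattern visible in the `base.py` diff, and the sample bytes are a stand-in rather than a valid PDF.

```python
import io
import tempfile
from pathlib import Path
from typing import IO, Union


def read_magic(file: Union[IO[bytes], str, Path]) -> bytes:
    """Hypothetical helper: return the first 5 bytes of a path or a stream."""
    # Normalise strings to Path, mirroring the `file = Path(file)` line in the diff.
    if isinstance(file, str):
        file = Path(file)
    # Open paths; use an already-open binary stream as it is.
    context = open(file, "rb") if isinstance(file, Path) else file
    with context as fp:
        return fp.read(5)


# Stand-in bytes; a real PDF begins with the same "%PDF-" marker.
data = b"%PDF-1.7\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.pdf"
    path.write_bytes(data)
    from_path = read_magic(path)

from_stream = read_magic(io.BytesIO(data))
```

Both calls read the same leading bytes. One consequence of this pattern is that the `with` block closes whatever it was given, so a stream passed in by the caller is closed once the function returns.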
11 changes: 9 additions & 2 deletions llama_hub/file/pdf/base.py
@@ -21,7 +21,14 @@ def load_data(
             file = Path(file)

         # Open the file if it's not already open, else use it as it is
-        context = open(file, "rb") if isinstance(file, Path) else file
+        if isinstance(file, Path):
+            context = open(file, "rb")
+            if extra_info:
+                extra_info.update({"file_name": file.name})
+            else:
+                extra_info = {"file_name": file.name}
+        else:
+            context = file

         with context as fp:
             # Create a PDF object
@@ -36,7 +43,7 @@ def load_data(
                 # Extract the text from the page
                 page_text = pdf.pages[page].extract_text()
                 page_label = pdf.page_labels[page]
-                metadata = {"page_label": page_label, "file_name": file.name}
+                metadata = {"page_label": page_label}

                 if extra_info is not None:
                     metadata.update(extra_info)
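
Two side effects of the change in `base.py` are easy to miss. First, `extra_info.update(...)` modifies the dictionary the caller passed in. Second, a byte stream input ends up with no `file_name` in its metadata. The sketch below shows one possible way to build the same per-page metadata without touching the caller's dictionary. `build_metadata` is a hypothetical helper, not part of the commit or of the library, and it assumes the precedence seen in the diff: the path's file name is always set, and keys in `extra_info` can override `page_label`.

```python
import io
from pathlib import Path
from typing import IO, Dict, Optional, Union


def build_metadata(
    file: Union[IO[bytes], Path],
    page_label: str,
    extra_info: Optional[Dict] = None,
) -> Dict:
    """Hypothetical helper: merge page metadata without mutating extra_info."""
    # Copy so the caller's dictionary is left untouched.
    merged = dict(extra_info or {})
    # Only a Path carries a file name here; streams are left without one.
    if isinstance(file, Path):
        merged["file_name"] = file.name
    # As in the diff, keys from extra_info take precedence over page_label.
    return {"page_label": page_label, **merged}


caller_info = {"source": "upload"}
path_meta = build_metadata(Path("docs/report.pdf"), "1", caller_info)
stream_meta = build_metadata(io.BytesIO(b"%PDF-"), "1", caller_info)
```

After both calls `caller_info` still holds only the `source` key, `path_meta` carries `file_name`, and `stream_meta` does not.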
