Parsing PDF Files Using Python: A Guide with Tesseract OCR

Parse text from PDF files using Python Functions! This guide covers how to convert PDF pages into images, preprocess those images to correct distortions such as skew, and extract text with Tesseract OCR.
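The skew-correction step can be sketched as follows. This is a minimal sketch under stated assumptions, not the packaged code: it swaps in a projection-profile search (try candidate angles and keep the one whose histogram of foreground pixels, measured perpendicular to that angle, is most sharply peaked) and assumes the true skew is within ±10°. The names `estimate_skew_angle` and `deskew` are hypothetical, and `deskew` expects an RGB or grayscale `uint8` NumPy array such as `np.asarray(pil_image)`.

```python
import numpy as np


def estimate_skew_angle(binary: np.ndarray, limit: float = 10.0, step: float = 0.1) -> float:
    """Estimate text-line skew in degrees (hypothetical helper, projection-profile method).

    Positive angles mean the lines slope down to the right (row index grows
    with column index). Assumes the true skew lies within +/- `limit` degrees.
    """
    rows, cols = np.nonzero(binary)  # coordinates of foreground (text) pixels
    if rows.size == 0:
        return 0.0  # blank page: nothing to correct
    x, y = cols.astype(float), rows.astype(float)
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-limit, limit + step / 2, step):
        theta = np.deg2rad(angle)
        # Signed distance of each pixel from a line through the origin at `angle`;
        # pixels on the same text line share (almost) the same offset.
        offsets = np.round(y * np.cos(theta) - x * np.sin(theta)).astype(np.int64)
        counts = np.bincount(offsets - offsets.min())
        # Sum of squared bin counts is largest when the profile is sharply peaked.
        score = float(np.sum(counts.astype(np.float64) ** 2))
        if score > best_score:
            best_angle, best_score = round(float(angle), 3), score
    return best_angle


def deskew(image: np.ndarray) -> np.ndarray:
    """Rotate an RGB or grayscale uint8 page image so its text lines are horizontal."""
    import cv2  # imported here so estimate_skew_angle works without OpenCV

    gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) if image.ndim == 3 else image
    # Otsu threshold, inverted so dark text becomes nonzero foreground.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    angle = estimate_skew_angle(binary)
    h, w = gray.shape[:2]
    # Positive angles rotate counter-clockwise, lifting lines that slope down-right.
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
```

The search visits every foreground pixel once per candidate angle, so downscaling large page images before estimating the angle keeps it fast; the rotation itself can still be applied at full resolution.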

The packaged code repository uses several libraries, including cv2 (OpenCV), pytesseract, and pdf2image, to extract and process text from PDF attachments.
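Put together, the three libraries form a short render, preprocess, OCR pipeline. The sketch below is an illustration under assumptions rather than the packaged code: the function name `ocr_pdf_bytes` is hypothetical, it takes the PDF as raw bytes (as an attachment's contents might arrive), and it uses Otsu binarization as a stand-in preprocessing step; a skew-correction step could be inserted before OCR. At runtime, pdf2image needs Poppler and pytesseract needs the Tesseract binary.

```python
import numpy as np


def ocr_pdf_bytes(pdf_bytes: bytes, dpi: int = 300, lang: str = "eng") -> list[str]:
    """OCR every page of a PDF and return one text string per page (hypothetical helper)."""
    # Third-party imports kept local: pdf2image calls Poppler and
    # pytesseract calls the Tesseract binary, so both must be installed.
    import cv2
    import pytesseract
    from pdf2image import convert_from_bytes

    page_texts = []
    # 1. Render each PDF page to a PIL image at the requested resolution.
    for page in convert_from_bytes(pdf_bytes, dpi=dpi):
        rgb = np.asarray(page.convert("RGB"))
        # 2. Preprocess: grayscale, then Otsu binarization (dark text on white).
        gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        # 3. Extract text with Tesseract.
        page_texts.append(pytesseract.image_to_string(binary, lang=lang))
    return page_texts
```

Rendering at 300 DPI follows Tesseract's general guidance for scanned text; lower values run faster but tend to hurt accuracy on small print.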

Upload Package to Your Enrollment

The first step is uploading your package to the Foundry Marketplace:

1. Download the project's .zip file (`Parsing PDF's using Tesseract OCR.zip`) from this repository.
2. Open your enrollment's Marketplace at:
   ```
   {enrollment-url}/workspace/marketplace
   ```
3. In the Marketplace interface, start the upload:
   - Select or create a store in your preferred project folder.
   - Click the "Upload to Store" button.
   - Select the downloaded .zip file.
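If you open the Marketplace from a script or a bookmark generator, the `{enrollment-url}/workspace/marketplace` pattern can be filled in programmatically. This is a hypothetical helper, not part of the package, and it assumes the enrollment is served over HTTPS when no scheme is given.

```python
from urllib.parse import urlsplit


def marketplace_url(enrollment_url: str) -> str:
    """Return {enrollment-url}/workspace/marketplace for a base URL (hypothetical helper)."""
    base = enrollment_url.strip().rstrip("/")
    if not urlsplit(base).scheme:
        base = "https://" + base  # assumption: enrollments are served over HTTPS
    return f"{base}/workspace/marketplace"


# Placeholder hostname, not a real enrollment.
print(marketplace_url("https://your-enrollment.example.com/"))
# → https://your-enrollment.example.com/workspace/marketplace
```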

Install the Package

After upload, you'll need to install the package in your environment. For detailed instructions, see the official Palantir documentation.

The installation process has four main stages:

1. General Setup
   - Configure the package name.
   - Select the installation location.
2. Input Configuration
   - Configure any required inputs. If no inputs are needed, proceed to the next step.
   - Check the project documentation for specific input requirements.
3. Content Review
   - Review the resources to be installed, such as Developer Console, the Ontology, and Functions.
4. Validation
   - The system checks for configuration errors.
   - Resolve any flagged issues.
   - Start the installation.
