The Padhana framework lets you work with PDFs and other document types in a structured way. It combines a simple document format based on a node hierarchy with a set of parsers and document analysis tools: documents are parsed, and their content is then structured and annotated to enable rich interactions.
Documentation can be found here: https://hohonu.github.io/padhana-docs/
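To make the node-hierarchy idea concrete, here is a minimal, self-contained sketch of a document held as a tree of nodes that can be traversed and annotated. It is an illustration only and does not use the Padhana API: the `Node` class, its fields (`node_type`, `content`, `children`, `tags`) and its methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical content node for illustration; not Padhana's actual class."""

    node_type: str
    content: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)

    def add_child(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Build a tiny hierarchy: document -> page -> lines.
document = Node("document")
page = document.add_child(Node("page"))
page.add_child(Node("line", content="Invoice 1234"))
page.add_child(Node("line", content="Total: 42.00"))

# Annotate: tag every node whose text starts with "Total".
for node in document.walk():
    if node.content and node.content.startswith("Total"):
        node.tags.add("total")

# Query the annotations afterwards.
tagged = [n.content for n in document.walk() if "total" in n.tags]
print(tagged)
```

Keeping annotations as tags on the nodes, rather than in a separate structure, means later processing steps can find content by walking the same tree.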
Ensure you have Anaconda 3 or greater installed, then run:
conda env create -f conda.yml --force
Activate the padhana Conda environment with the command:
conda activate padhana
If you want to use the Tesseract parser, you will also need to install Tesseract.
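One way to confirm that Tesseract is available is to check whether its executable is on your `PATH`. The sketch below uses only the Python standard library. It assumes the binary is named `tesseract`, which is the usual name but may differ on your system; `find_tesseract` is a helper written for this example, not part of Padhana.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path to the executable, or None if it is not on PATH.

    The default name "tesseract" is an assumption; pass another name if
    your installation uses one.
    """
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("Tesseract not found on PATH; install it before using the Tesseract parser.")
else:
    print(f"Tesseract found at {path}")
```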