About

This script extracts annotations (highlights, comments, etc.) from a PDF file and formats them as plain text.

The script uses colormath to identify the colors of highlights; see the wiki for details. The default template uses these colors to determine hierarchy and meaning.
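To illustrate the idea of mapping a highlight color to a name, here is a minimal sketch. It does not use colormath: it substitutes plain Euclidean distance in RGB for whatever comparison the script actually performs. The palette, its values, and the function name `nearest_color_name` are assumptions made for this example, not part of the script.

```python
import math

# Hypothetical reference palette with RGB components in the 0..1 range.
# These names and values are assumptions for illustration only.
PALETTE = {
    "yellow": (1.0, 1.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
    "red": (1.0, 0.0, 0.0),
}


def nearest_color_name(rgb):
    """Return the palette name closest to rgb by Euclidean RGB distance."""
    return min(PALETTE, key=lambda name: math.dist(PALETTE[name], rgb))


# A slightly washed-out yellow still maps to "yellow".
print(nearest_color_name((0.98, 0.93, 0.2)))
```

A template could then treat, say, yellow highlights as main points and green ones as examples; which color means what is up to the template.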

At present, the following annotations are supported:

Highlights without an attached comment are output first, as "highlights", with only the highlighted text included.
Highlights with an attached comment and text annotations (those not attached to any particular text or highlight) are output next, as "detailed comments".
Underline, strikeout, and squiggly underline annotations are output last, as "nits", with or without an attached comment. The intent is to separate formatting or grammatical corrections from more substantial comments about the content of the document.
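The three output groups described above can be sketched as a simple classification step. This is an illustration under stated assumptions, not the script's own code: the `Annotation` class, its fields, and `group_annotations` are hypothetical names, and the subtype strings follow the annotation subtype names used in the PDF format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    # Hypothetical container; the script's real data model may differ.
    subtype: str                    # e.g. "Highlight", "Text", "Underline"
    page: int
    text: Optional[str] = None      # highlighted or underlined text, if any
    comment: Optional[str] = None   # attached comment, if any


NIT_SUBTYPES = {"Underline", "StrikeOut", "Squiggly"}


def group_annotations(annotations):
    """Sort annotations into the three groups in output order."""
    groups = {"highlights": [], "comments": [], "nits": []}
    for a in annotations:
        if a.subtype in NIT_SUBTYPES:
            groups["nits"].append(a)          # with or without a comment
        elif a.subtype == "Highlight" and not a.comment:
            groups["highlights"].append(a)    # bare highlight
        elif a.subtype in {"Highlight", "Text"}:
            groups["comments"].append(a)      # commented highlight or note
    return groups


sample = [
    Annotation("Highlight", 1, text="key claim"),
    Annotation("Highlight", 2, text="unclear step", comment="Why?"),
    Annotation("Text", 2, comment="Missing citation"),
    Annotation("Squiggly", 3, text="teh"),
]
for name, items in group_annotations(sample).items():
    print(name, len(items))
```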

For each annotation, the page number is given, along with the associated (highlighted/underlined) text, if any. Additionally, if the document includes outlines (also known as bookmarks), such as those generated by the hyperref package, they are used to identify the section of the document to which the annotation refers.
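One way to picture the outline lookup is as a search for the last outline entry that starts at or before the annotation's page. The sketch below assumes outlines are available as `(start_page, title)` pairs sorted by page; that representation and the name `section_for_page` are assumptions, and the script itself may resolve sections more precisely (for example, by position within a page).

```python
from bisect import bisect_right


def section_for_page(outlines, page):
    """Return the title of the last outline entry starting at or before page.

    outlines: list of (start_page, title) tuples sorted by start_page.
    Returns None if the page precedes the first outline entry.
    """
    start_pages = [start for start, _ in outlines]
    index = bisect_right(start_pages, page)
    return outlines[index - 1][1] if index else None


# Hypothetical outline of a short paper.
outlines = [(1, "Introduction"), (4, "Method"), (9, "Results")]
print(section_for_page(outlines, 5))
```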

See the wiki for more information.

Installation

 pip install pdfminer.six chardet six colormath Jinja2 pathlib
 python setup.py install

Usage

pdf-highlights.py FILE.PDF [> OUTPUT]

Dependencies

My own setup:

Python 3.6
chardet (3.0.4)
colormath (3.0.0)
Jinja2 (2.10)
pathlib (1.0.1)
pdfminer.six (20170720)
six (1.11.0)

Output formatting

There's a Jinja2 template you can adapt as you like. The script exposes the following data to the template:

highlights annotations
comments annotations
editing annotations
Author
Title

See the wiki for more information.
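As a rough idea of what adapting the template involves, the following sketch renders a tiny Jinja2 template from Python. The variable names (`title`, `author`, `highlights`) and the fields `page` and `text` are assumptions chosen for this example; check the bundled template or the wiki for the names the script actually passes.

```python
from jinja2 import Template

# Variable and field names below are hypothetical, for illustration only.
TEMPLATE = Template(
    "{{ title }} ({{ author }})\n"
    "{% for h in highlights %}"
    "p. {{ h.page }}: {{ h.text }}\n"
    "{% endfor %}"
)

output = TEMPLATE.render(
    title="Sample paper",
    author="A. Reader",
    highlights=[{"page": 3, "text": "a key claim"}],
)
print(output)
```

The same pattern would extend to the comments and editing annotations by adding further loops to the template.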

Author

Original author is Andrew Baumann. Thank you, Andrew!
This fork is maintained by Sascha A. Carlin.
