sudharsh/python-tika

python-tika - Python bindings for Apache Tika

Requirements

  • Java >= 1.5
  • JCC
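To confirm the Java requirement before building, you can check the version reported by `java -version`. The sketch below assumes the usual output format, for example `java version "1.8.0_292"` or `openjdk version "11.0.2"`. `java_feature_version` and `java_meets_requirement` are hypothetical helpers and are not part of python-tika:

```python
import re
import subprocess

def java_feature_version(version_output):
    """Extract the Java feature version from `java -version` output.

    Assumes the common format, e.g. 'java version "1.5.0_22"' or
    'openjdk version "11.0.2" 2019-01-15'.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("unrecognised java -version output: %r" % version_output)
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    # Releases up to Java 8 are numbered 1.x, so "1.5" means feature version 5.
    return minor if major == 1 else major

def java_meets_requirement(minimum=5):
    """Run `java -version` and compare against the minimum (5 == Java 1.5)."""
    # `java -version` prints to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_feature_version(result.stderr) >= minimum
```

`java_meets_requirement()` raises `FileNotFoundError` if no `java` executable is on the `PATH`. That error is itself a sign that the requirement is not met.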

Installation

$ python setup.py build
$ python setup.py install

Or, install directly from GitHub with pip:

$ pip install git+https://github.com/sudharsh/python-tika.git
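After installing, you can quickly confirm that JCC and the bindings are importable by looking them up with `importlib`. This is a sketch. `missing_modules` is a hypothetical helper, and it assumes the top-level module names `jcc` and `tika` (the latter is the name used under Usage):

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Once installation succeeds, `missing_modules(["jcc", "tika"])` should return an empty list. The lookup does not start the JVM, so a clean result does not guarantee that `tika.initVM()` will succeed.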

Usage

To parse content with Tika's AutoDetectParser:

import tika
tika.initVM()

from tika import parser

print(parser.from_buffer("<html><body>Hello World</body></html>"))
# Or parse a file directly:
# print(parser.from_file("/tmp/foo.doc"))
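To extract text from several files in one go, you can wrap the `from_file` call in a loop. In this minimal sketch, `parse_files` is a hypothetical helper that takes the parse function as an argument. You can pass it `parser.from_file` after `tika.initVM()` has run, or a stub in tests. It assumes each result is a dict with a `'content'` key, as in the returned dict shown in this section. A missing or `None` content value becomes an empty string. The file paths in the usage comment are placeholders:

```python
def parse_files(paths, from_file):
    """Map each path to the stripped text extracted by `from_file`.

    `from_file` should behave like parser.from_file: take a path and
    return a dict with a 'content' entry.
    """
    texts = {}
    for path in paths:
        parsed = from_file(path)
        # Guard against a missing or None content value.
        texts[path] = (parsed.get("content") or "").strip()
    return texts

# Example usage (requires python-tika and a JVM):
#   import tika
#   tika.initVM()
#   from tika import parser
#   texts = parse_files(["/tmp/foo.doc", "/tmp/bar.pdf"], parser.from_file)
```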

This returns a dict such as:

{'content': u'Hello World',
 'metadata': {u'Content-Encoding': u'ISO-8859-1',
              u'Content-Type': u'text/html'}
}
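The returned dict can be used like any other Python mapping. As a sketch, the hypothetical `describe` helper below reads the detected `Content-Type` from `metadata` and measures the extracted text. The key names come from the example output in this section:

```python
def describe(parsed):
    """Summarise a dict shaped like the from_buffer/from_file result."""
    metadata = parsed.get("metadata", {})
    content_type = metadata.get("Content-Type", "unknown")
    text = (parsed.get("content") or "").strip()
    return "%s, %d characters" % (content_type, len(text))

example = {'content': u'Hello World',
           'metadata': {u'Content-Encoding': u'ISO-8859-1',
                        u'Content-Type': u'text/html'}}
print(describe(example))  # text/html, 11 characters
```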

Thanks

The setup.py script is derived from aptivate/python-tika.
