PyPLN

PyPLN is a distributed pipeline for natural language processing, written in Python. It is built on NLTK and Celery. The goal of the project is to provide an easy way to use NLTK to process large corpora, with a Web interface.

PyPLN is sponsored by Fundação Getulio Vargas.

License

PyPLN is free software, released under the GPLv3 <https://gnu.org/licenses/gpl-3.0.html>.

Documentation

Our documentation is hosted using GitHub Pages:

PyPLN Documentation (created using Sphinx)
Code reference (created using epydoc)

Requirements

You will need some Python packages, libmagic, poppler-utils, the development headers for libfreetype <http://www.freetype.org/>, and aspell dictionaries for English and Portuguese.

To install dependencies (on a Debian-like GNU/Linux distribution):

sudo apt-get install python-setuptools libmagic-dev poppler-utils libfreetype6-dev fonts-dejavu aspell-en aspell-pt
pip install virtualenv virtualenvwrapper
mkvirtualenv pypln.backend
# we need to install Cython first because of the way pip handles C extensions
pip install Cython
pip install -r requirements/production.txt

You will also need to download some NLTK data packages. You can do so by running:

python -m nltk.downloader genesis maxent_treebank_pos_tagger punkt stopwords averaged_perceptron_tagger

Developing

To run tests:

workon pypln.backend
pip install -r requirements/development.txt
make test

See our code guidelines.

Creating a new Task

All analyses in PyPLN are performed by our workers. Every worker is a Celery task that can be included in the canvas that will run when a document is received in pypln.web.

New workers are very easy to create. All you need to do is write a subclass of PyPLNTask <https://github.com/NAMD/pypln.backend/blob/develop/pypln/backend/celery_task.py#L36> that implements a "process" method. This method will receive the document as a dictionary, and should return a dictionary that will be used to update the existing document. As an example:

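from pypln.backend.celery_task import PyPLNTask

class FreqDist(PyPLNTask):
    def process(self, document):
        # "value" must already have been added by a previous worker
        value = document['value']
        square = value ** 2
        # the returned dictionary is used to update the existing document
        return {'squared_value': square}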

This worker assumes that a previous worker has already included "value" in the document and uses it to create a new key, called "squared_value".
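
To make the update contract concrete, here is a minimal, self-contained sketch that does not import PyPLN. The names StandInTask, SquareValue and apply_task are hypothetical and were invented for this illustration. They only mimic the behaviour described above (the dictionary returned by "process" is merged into the document) and are not how PyPLNTask is actually implemented, which also involves Celery and document storage.

```python
class StandInTask:
    """Hypothetical stand-in for PyPLNTask, used only for illustration."""

    def process(self, document):
        raise NotImplementedError


class SquareValue(StandInTask):
    def process(self, document):
        # Assumes a previous worker already stored "value" in the document.
        return {'squared_value': document['value'] ** 2}


def apply_task(task, document):
    """Merge the dictionary returned by process() into a copy of the document.

    Assumption: this imitates the "update the existing document" step
    described in the text; the real backend may do this differently.
    """
    updated = dict(document)
    updated.update(task.process(document))
    return updated


document = {'value': 4}
result = apply_task(SquareValue(), document)
print(result)  # {'value': 4, 'squared_value': 16}
```

A helper of this kind can also be handy for exercising the "process" method of a new worker in isolation, since it only needs a plain dictionary as input.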

