dataframehq/dato

dato is an open-source library that provides a rapid, declarative ecosystem for reproducible data science in Python. dato accomplishes this by (1) enabling piping with >> and (2) unifying common data science libraries under a common syntax.

df >> GroupBy('country') >> Sum >> Hist('revenue', col='age')

Dato has four major components:

dato.base.Pipeable: Decorator that enables piping with >>.
dato.process: Sub-module with pipe-compatible pandas operations.
dato.plot: Sub-module with pipe-compatible plotting operations, following a consistent pandas-inspired syntax with seaborn-esque extended functionality.
dato.ml (in development): Simplifies and standardizes syntax across popular ML libraries.
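
To make the piping idea concrete, here is a minimal sketch of how a decorator could enable >> in Python by defining __rrshift__. This is not dato's implementation: SketchPipeable, group_by, and sum_field are hypothetical names invented for this illustration, and plain lists of dicts stand in for DataFrames so the example has no dependencies.

```python
from collections import defaultdict


class SketchPipeable:
    """Hypothetical stand-in (not dato.base.Pipeable): wraps a function so
    that `data >> stage` calls the function with `data` as first argument."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # Binding arguments returns a new stage; nothing runs yet.
        return SketchPipeable(self.func, *args, **kwargs)

    def __rrshift__(self, data):
        # Python falls back to this when the left operand does not
        # implement `>>` itself (true for list and dict).
        return self.func(data, *self.args, **self.kwargs)


@SketchPipeable
def group_by(rows, key):
    # Collect rows into a dict keyed by the value of `key`.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


@SketchPipeable
def sum_field(groups, field):
    # Sum one numeric field within each group.
    return {name: sum(r[field] for r in rows) for name, rows in groups.items()}


rows = [
    {"country": "US", "revenue": 10},
    {"country": "FR", "revenue": 5},
    {"country": "US", "revenue": 7},
]

# Each stage receives the result of the previous one, left to right.
totals = rows >> group_by("country") >> sum_field("revenue")
```

Because each stage only sees the value handed to it from the left, the whole computation is written as one expression with a fixed order. A stage that takes no extra arguments can also be used without parentheses in this sketch, since the decorated object itself defines __rrshift__.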

Installation

pip install dato

Why pipe?

Although piping has some downsides as a general programming paradigm (particularly in obscuring code errors and being naturally difficult to debug), we argue that these downsides are outweighed by the concision and maintainability it lends to data workflows. When working with data in development environments that contain hidden state (such as Jupyter or R Markdown), reproducibility can be difficult to achieve consistently. Piping mitigates this danger by (1) enforcing a consistent order of operations and (2) disallowing hidden state. Consequently, code written in the piping paradigm is naturally reproducible, production-ready, and stable as soon as it is written, properties that are of paramount importance in data work.
