Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It's recommended for resource-constrained embedded systems and critical applications where performance matters most.
Algorithm | Programming language | | | | | |
Classifier | Java * | JS | C | Go | PHP | Ruby |
svm.SVC | ✓, ✓ ᴵ | ✓ | ✓ | ✓ | ✓ | |
svm.NuSVC | ✓, ✓ ᴵ | ✓ | ✓ | ✓ | ✓ | |
svm.LinearSVC | ✓, ✓ ᴵ | ✓ | ✓ | ✓ | ✓ | ✓ |
tree.DecisionTreeClassifier | ✓, ✓ ᴱ, ✓ ᴵ | ✓, ✓ ᴱ | ✓, ✓ ᴱ | ✓, ✓ ᴱ | ✓, ✓ ᴱ | ✓, ✓ ᴱ |
ensemble.RandomForestClassifier | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ |
ensemble.ExtraTreesClassifier | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | ✓ ᴱ | |
ensemble.AdaBoostClassifier | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ, ✓ ᴵ | ✓ ᴱ | | | |
neighbors.KNeighborsClassifier | ✓, ✓ ᴵ | ✓, ✓ ᴵ | | | | |
naive_bayes.GaussianNB | ✓, ✓ ᴵ | ✓ | | | | |
naive_bayes.BernoulliNB | ✓, ✓ ᴵ | ✓ | | | | |
neural_network.MLPClassifier | ✓, ✓ ᴵ | ✓, ✓ ᴵ | | | | |
Regressor | | | | | | |
neural_network.MLPRegressor | ✓ |
✓ = is full-featured, ᴱ = with embedded model data, ᴵ = with imported model data, * = default language
$ pip install sklearn-porter
If you want the latest changes, you can install the module from the master branch:
$ pip uninstall -y sklearn-porter
$ pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master
The minimum requirements to use the module are defined in the requirements.txt:
- numpy>=1.8.2
- scipy>=0.14.0
- scikit-learn>=0.14.1
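If you want to compare an environment against these minimums, a small check could look like the sketch below. It is only an illustration and not part of sklearn-porter: the helper names `parse_version` and `meets_minimum` are made up here, and the parsing is deliberately naive (it keeps the leading digits of each dot-separated part).

```python
import re

# Minimum versions copied from the list above.
MINIMUM_VERSIONS = {
    "numpy": "1.8.2",
    "scipy": "0.14.0",
    "scikit-learn": "0.14.1",
}


def parse_version(version):
    """Turn '0.14.1' into (0, 14, 1), keeping the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part without leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.8' and '1.8.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Hypothetical installed version, compared with each minimum.
for name, minimum in sorted(MINIMUM_VERSIONS.items()):
    print(name, meets_minimum("1.0.0", minimum))
```

The installed version string can usually be read from the package itself, for example `numpy.__version__`.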
The following example demonstrates how you can transpile a decision tree estimator to Java:
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn_porter import Porter
# load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)
# export:
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)
The exported result matches the official human-readable version of the decision tree.
Run the prediction(s) in the target programming language directly:
# ...
porter = Porter(clf, language='java')
# prediction(s):
Y_java = porter.predict(X)
y_java = porter.predict(X[0])
y_java = porter.predict([1., 2., 3., 4.])
Always compute and check the integrity between the original and the transpiled estimator:
# ...
porter = Porter(clf, language='java')
# accuracy:
integrity = porter.integrity_score(X)
print(integrity) # 1.0
Please note that the integrity check isn't supported on Windows operating systems.
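To make the idea behind the check concrete, here is a minimal sketch. It assumes that the score is simply the fraction of samples for which the original and the transpiled estimator return the same label; the function name `agreement_score` is hypothetical, and the sketch does not reproduce the internals of `integrity_score`.

```python
def agreement_score(y_original, y_transpiled):
    """Fraction of samples on which two prediction sequences agree."""
    y_original = list(y_original)
    y_transpiled = list(y_transpiled)
    if len(y_original) != len(y_transpiled):
        raise ValueError("Both prediction sequences need the same length.")
    if not y_original:
        raise ValueError("At least one prediction is required.")
    matches = sum(1 for a, b in zip(y_original, y_transpiled) if a == b)
    return float(matches) / len(y_original)


# Hypothetical predictions: only the last sample differs.
print(agreement_score([0, 1, 2, 2], [0, 1, 2, 1]))  # 0.75
```

With the objects from the examples above, the two inputs would be `clf.predict(X)` and `porter.predict(X)`. Any value below 1.0 means that at least one prediction differs.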
First, take a quick look at the available arguments:
$ python -m sklearn_porter [-h] --input <PICKLE_FILE> [--output <DEST_DIR>] \
[--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>] \
[--c] [--java] [--js] [--go] [--php] [--ruby] \
[--export] [--checksum] [--data] [--pipe]
The following example shows how you can save a trained estimator to the pickle format:
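# ...
import joblib  # standalone package; older scikit-learn releases bundle it as sklearn.externals.joblib
# save the estimator:
joblib.dump(clf, 'estimator.pkl', compress=0)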
After that the estimator can be transpiled to JavaScript by using the following command:
$ python -m sklearn_porter -i estimator.pkl --js
The target programming language is changeable on the fly:
$ python -m sklearn_porter -i estimator.pkl --c
$ python -m sklearn_porter -i estimator.pkl --java
$ python -m sklearn_porter -i estimator.pkl --php
$ python -m sklearn_porter -i estimator.pkl --go
$ python -m sklearn_porter -i estimator.pkl --ruby
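If you drive the command line from a Python script, the commands above can be assembled programmatically. The following is a rough sketch: the helper `build_porter_command` is a hypothetical name, not part of sklearn-porter, and it only uses the flags listed in the usage synopsis.

```python
import sys

# Language flags as listed in the usage synopsis above.
LANGUAGE_FLAGS = ("c", "java", "js", "go", "php", "ruby")


def build_porter_command(pickle_file, language, pipe=False):
    """Return the argument list for one command-line transpilation."""
    if language not in LANGUAGE_FLAGS:
        raise ValueError("Unsupported language flag: {}".format(language))
    command = [sys.executable, "-m", "sklearn_porter",
               "--input", pickle_file, "--" + language]
    if pipe:
        command.append("--pipe")
    return command


# Print one command per target language.
for language in LANGUAGE_FLAGS:
    print(" ".join(build_porter_command("estimator.pkl", language)))
```

Each returned list can be passed to `subprocess.call`. Keep in mind that not every estimator supports every language (see the table at the top).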
For further processing, the argument --pipe can be used to pass the result on:
$ python -m sklearn_porter -i estimator.pkl --js --pipe > estimator.js
For instance the result can be minified by using UglifyJS:
$ python -m sklearn_porter -i estimator.pkl --js --pipe | uglifyjs --compress -o estimator.min.js
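The same redirection can be done from Python instead of the shell. The sketch below is one possible approach using only the standard library; `run_and_save` is a hypothetical helper name, and the commented call at the end assumes that `estimator.pkl` exists and that sklearn-porter is installed in the active environment.

```python
import subprocess
import sys


def run_and_save(command, destination):
    """Run a command and write its standard output to a file.

    Returns the number of bytes written. A non-zero exit status
    raises subprocess.CalledProcessError.
    """
    output = subprocess.check_output(command)
    with open(destination, "wb") as handle:
        handle.write(output)
    return len(output)


# Equivalent of the first shell example above (left commented out on purpose):
# run_and_save([sys.executable, "-m", "sklearn_porter",
#               "-i", "estimator.pkl", "--js", "--pipe"], "estimator.js")
```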
Further information will be shown by using the --help argument:
$ python -m sklearn_porter --help
$ python -m sklearn_porter -h
Tip: You can install a handy function to use the porter directly:
$ cat scripts/alias.sh >> ~/.bash_profile && source ~/.bash_profile
$ porter [-h] --input <PICKLE_FILE> [--output <DEST_DIR>] \
[--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>] \
[--c] [--java] [--js] [--go] [--php] [--ruby] \
[--export] [--checksum] [--data] [--pipe]
But don't forget to activate the environment in which the porter has been installed.
Either install only the minimum requirements (see requirements.txt) for testing:
$ conda create -n sklearn-porter python=2 # or python=3
$ source activate sklearn-porter
$ pip install -U pip
$ pip install -r requirements.txt
Or install all recommended packages (see environment.yml) for broader development:
$ conda env create -n sklearn-porter -c conda-forge python=2 -f environment.yml # for macOS users
$ # conda create -n sklearn-porter -c conda-forge python=2 scikit-learn pylint jupyter nb_conda twine
$ source activate sklearn-porter
Independently, the following compilers and interpreters are required to cover all tests:
Name | Version | Command |
GCC | >=4.2 | gcc --version |
Java | >=1.6 | java -version |
PHP | >=7 | php --version |
Ruby | >=2.4.1 | ruby --version |
Go | >=1.7.4 | go version |
Node.js | >=6 | node --version |
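Before running the tests, it can help to see which of these tools are missing. The sketch below only checks whether each command from the table is on the PATH; it does not verify the versions. The helper name `find_missing_tools` is hypothetical, and `shutil.which` requires Python 3.3 or newer.

```python
import shutil

# Tool names and commands taken from the table above.
REQUIRED_COMMANDS = {
    "GCC": "gcc",
    "Java": "java",
    "PHP": "php",
    "Ruby": "ruby",
    "Go": "go",
    "Node.js": "node",
}


def find_missing_tools(commands=None):
    """Return the sorted names of tools whose command is not on the PATH."""
    if commands is None:
        commands = REQUIRED_COMMANDS
    return sorted(name for name, command in commands.items()
                  if shutil.which(command) is None)


missing = find_missing_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```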
The tests cover module functions as well as matching predictions of transpiled estimators. Run all tests:
$ bash scripts/test.sh
#!/usr/bin/env bash
# activate the relevant environment:
source activate sklearn-porter
# start local server which is required for the JavaScript tests:
if [[ $(python -c "import sys; print(sys.version_info[:1][0]);") == "2" ]]; then
python -m SimpleHTTPServer 8080 &>/dev/null & serve_pid=$!
else
python -m http.server 8080 &>/dev/null & serve_pid=$!
fi
# run all tests:
python -m unittest discover -vp '*Test.py'
# close the previous started server:
kill $serve_pid
# deactivate the previous activated environment:
source deactivate &>/dev/null
The test files follow a specific naming pattern, '[Algorithm][Language]Test.py', so you can run a subset of the tests:
$ python -m unittest discover -vp 'RandomForest*Test.py'
$ python -m unittest discover -vp '*JavaTest.py'
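The discovery of unittest matches file names with shell-style patterns, so the selection can be previewed with `fnmatch` from the standard library. The file names in this sketch are hypothetical examples that merely follow the naming pattern; they are not a listing of the actual test directory.

```python
from fnmatch import fnmatch

# Hypothetical file names following '[Algorithm][Language]Test.py'.
TEST_FILES = [
    "RandomForestClassifierJavaTest.py",
    "RandomForestClassifierJSTest.py",
    "DecisionTreeClassifierJavaTest.py",
    "SVCCTest.py",
]


def select_tests(pattern, files=TEST_FILES):
    """Return the file names that match a shell-style pattern."""
    return [name for name in files if fnmatch(name, pattern)]


print(select_tests("RandomForest*Test.py"))  # the two RandomForest files
print(select_tests("*JavaTest.py"))          # the two Java files
```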
While you are developing new features or fixes, you can reduce the test duration by changing the number of tests:
$ N_RANDOM_FEATURE_SETS=15 N_EXISTING_FEATURE_SETS=30 python -m unittest discover -vp '*Test.py'
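How a test could pick up these variables is sketched below. This is an assumption about the mechanism, not a copy of the project's test code: the helper `read_test_count` is a hypothetical name, and the default values are placeholders rather than the real defaults.

```python
import os


def read_test_count(name, default):
    """Read a positive integer from the environment, else use the default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default  # ignore values that are not integers
    return value if value > 0 else default


# The defaults here are placeholders, not the project's real values.
n_random = read_test_count("N_RANDOM_FEATURE_SETS", 30)
n_existing = read_test_count("N_EXISTING_FEATURE_SETS", 60)
print(n_random, n_existing)
```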
Ensuring the code quality is highly recommended, and I use Pylint for that. Run the linter:
$ bash scripts/lint.sh
#!/usr/bin/env bash
find sklearn_porter -name '*.py' -exec pylint {} \;
If you use this implementation in your work, please add a reference/citation to the paper. You can use the following BibTeX entry:
@unpublished{skpodamo,
author = {Darius Morawiec},
title = {sklearn-porter},
note = {Transpile trained scikit-learn estimators to C, Java, JavaScript and others},
url = {https://github.com/nok/sklearn-porter}
}
The module is Open Source Software released under the MIT license.
Don't be shy and feel free to contact me on Twitter or Gitter.