Streaming and Machine Learning Example

The Dockerfile provided here can be used to build a Docker image that contains all the code needed to follow the examples in the Machine Learning in SciDB post. To build the image locally, run:

$ docker build --tag rvernica/scidb-examples:stream-machine-learning .

Alternatively, an already built Docker image can be downloaded from Docker Hub using:

$ docker pull rvernica/scidb-examples:stream-machine-learning

Once the Docker image is available, start a Docker container with:

$ docker run --tty --name scidb-example \
      rvernica/scidb-examples:stream-machine-learning

At start-up, the container launches a SciDB cluster with two instances. The container includes all the code for the examples but none of the data. Download train.csv and test.csv from the Kaggle competition page and copy them into the container:

$ docker exec scidb-example mkdir /kaggle
$ docker cp train.csv scidb-example:/kaggle/train.csv
$ docker cp test.csv scidb-example:/kaggle/test.csv

The code from the Machine Learning in SciDB post is available in /usr/local/src/stream-python/py_pkg/examples/4-machine-learning.py and can be run using:

$ docker exec scidb-example python \
      /usr/local/src/stream-python/py_pkg/examples/4-machine-learning.py
UserWarning: 2 type(s) promoted for null support. Precision loss may occur

The predictions are written to /results.csv inside the container, and all intermediate arrays remain stored in SciDB. For more details on working with Docker containers, refer to the Docker Get Started tutorial.
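After the script finishes, the predictions can be copied out of the container and the intermediate arrays inspected. The helper functions below are a minimal sketch: the function names are hypothetical, and list_arrays assumes that SciDB's iquery client is on the PATH inside the container. Otherwise, only the standard docker cp, docker exec, docker stop and docker rm commands are used.

```shell
# Hypothetical helper functions; source this file once the
# scidb-example container started above is running.
CONTAINER=scidb-example

# Copy the predictions file out of the container into the current directory.
fetch_results() {
    docker cp "$CONTAINER:/results.csv" ./results.csv
}

# List the arrays stored in SciDB, including the intermediate ones.
# Assumes the iquery client is on the PATH inside the container.
list_arrays() {
    docker exec "$CONTAINER" iquery -aq "list('arrays')"
}

# Stop and remove the container when finished.
cleanup() {
    docker stop "$CONTAINER"
    docker rm "$CONTAINER"
}
```

For example, running fetch_results and then head results.csv shows the first lines of the predictions file on the host. Call cleanup only after everything needed has been copied out, since it removes the container.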