GetStarted_Standalone
We illustrate how to use TensorFlowOnSpark on a Spark Standalone cluster running on a single machine. While this is not a true distributed cluster, it is useful for small-scale development and testing of distributed Spark applications. Once your application works in this environment, it should run on a true distributed Spark cluster with minimal changes. Note that a Spark Standalone cluster running on multiple machines requires a distributed file system that is accessible from all of the executors/workers.
Install Apache Spark per instructions. Make sure that you can successfully run some of the basic examples. Also make sure you set the following environment variables:
export SPARK_HOME=<path to Spark>
export PATH=${SPARK_HOME}/bin:${PATH}
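As a quick sanity check of the two exports above, the following is a minimal sketch in Python. The helper name `check_spark_env` is hypothetical (it is not part of Spark or TensorFlowOnSpark), and it assumes that a working setup means `SPARK_HOME` points at an existing directory and `spark-submit` is reachable through `PATH`.

```python
import os
import shutil


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the basics look fine.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    problems = []

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append(f"SPARK_HOME is not a directory: {spark_home}")

    # shutil.which searches the given PATH string for an executable file.
    if shutil.which("spark-submit", path=env.get("PATH", "")) is None:
        problems.append("spark-submit was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print("WARNING:", problem)
```

Running the script prints nothing when both variables look correct, and one warning per problem otherwise.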
Install TensorFlow per instructions. For example, using the pip install method, you should be able to install TensorFlow and TensorFlowOnSpark as follows:
pip install tensorflow
pip install tensorflowonspark
View the installed packages:
pip list
Launch a Spark Standalone cluster on the local machine with a master and two single-core workers:
export MASTER=spark://$(hostname):7077
export SPARK_WORKER_INSTANCES=2
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}
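The relationship between the exported variables can be sketched in Python as follows. This is only an illustration of the arithmetic and of the master URL format. The function name `standalone_settings` is hypothetical, and it assumes that `socket.gethostname()` returns the same name as the `hostname` command on your machine.

```python
import socket


def standalone_settings(worker_instances=2, cores_per_worker=1, port=7077):
    """Mirror the shell exports: master URL plus the derived total core count.

    Hypothetical helper for illustration only.
    """
    return {
        # Same shape as spark://$(hostname):7077 (hostname lookup is an assumption).
        "MASTER": f"spark://{socket.gethostname()}:{port}",
        "SPARK_WORKER_INSTANCES": worker_instances,
        "CORES_PER_WORKER": cores_per_worker,
        # TOTAL_CORES = CORES_PER_WORKER * SPARK_WORKER_INSTANCES
        "TOTAL_CORES": cores_per_worker * worker_instances,
    }


if __name__ == "__main__":
    for name, value in standalone_settings().items():
        print(f"{name}={value}")
```

With the defaults used in this guide (two workers with one core each), `TOTAL_CORES` comes out as 2.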
You can browse to the Spark Web UI to view your Spark cluster along with your application logs. In particular, each of the TensorFlow nodes in a TensorFlowOnSpark cluster will be "running" on a Spark executor/worker, so its logs will be available in the stderr logs of its associated executor/worker.
Start a pyspark shell and import tensorflow and tensorflowonspark. If everything is set up correctly, you shouldn't see any errors.
pyspark --master $MASTER
>>> import tensorflow as tf
>>> import tensorflowonspark as tfos
>>> from tensorflowonspark import TFCluster
>>> tf.__version__
>>> tfos.__version__
>>> exit()
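The same check can be run non-interactively. The sketch below is a hypothetical helper, not part of either package. It assumes that each module exposes a `__version__` attribute, as the interactive session above suggests, and it only verifies the Python environment where it runs, not the environment of the Spark executors.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return {name: (ok, detail)}.

    On success, detail is the module's __version__ (or "unknown");
    on failure, it is the error raised during import.
    """
    results = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # any failure during import counts as a problem
            results[name] = (False, f"{type(exc).__name__}: {exc}")
        else:
            results[name] = (True, str(getattr(module, "__version__", "unknown")))
    return results


if __name__ == "__main__":
    for name, (ok, detail) in check_imports(["tensorflow", "tensorflowonspark"]).items():
        status = "OK, version" if ok else "FAILED,"
        print(f"{name}: {status} {detail}")
```

A `FAILED` line for either package usually means the corresponding pip install did not complete in the Python environment that runs the script.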
Once your Spark Standalone cluster is set up, you should be able to run the MNIST examples. Note: if you are using TensorFlow 1.x, please use the examples from the v1.4.4 tag.
When you're done with the local Spark Standalone cluster, shut it down as follows:
${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh