This repository makes it easy to run distributed, multiple server (multiple Zookeeper, multiple Storm Supervisor) Storm topologies within Docker.
Read our storm-docker blog post on the Viki Engineering blog
More documentation on Github pages: http://dev.viki.com/storm-docker
If we do a search of the word storm
on the
Docker Registry, we see several pages of
results. What makes our storm-docker special compared to similar offerings?
- Supports multiple server Storm setups, in particular Zookeeper and Storm Supervisor.
- High level of configurability through documented sample configuration files
- Same configuration files used for all machines in the Storm cluster
- Instructions on this
README
showing you how to setup the repository - Important parts of codebase is extensively documented to aid hacking. No need to guess what the original author is thinking.
- Github pages with more documentation; a slight redesign and walkthrough is in the works
- Tested and used in production.
Point 1 is probably the single biggest reason why you should use our storm-docker, even if you're running everything on a single machine initially (which storm-docker does support, btw). When you need to add new machines to your Storm cluster, you'll find the transition smooth and leverage on the work we did in figuring out how to run multiple Zookeeper and Storm Supervisor in Docker.
At this point in time (23 June 2014), our storm-docker repository is probably the first and only open source Docker setup that supports distributed, multi-server Storm clusters out of the box.
The storm-docker project is tested using Amazon EC2 instances; we make use of
the curl
command to obtain the public and private IP addresses of the
machines. This may be a problem for machines which are not Amazon EC2 instances
(even though we have not faced any similar issues at Viki), as seen by this
Github issue:
As such, please use one of the following 2 setups for the machines in your Storm cluster if you're using the storm-docker project:
- None of the machines in your Storm cluster are Amazon EC2 instances
- ALL the machines in your Storm cluster are Amazon EC2 instances in the same security group
Refer to the documentation for the all_machines_are_ec2_instances
key in the
config/storm-setup.yaml.sample
file for more information.
The following software is required for running the storm-docker
repository.
In other words, for machines which are going to form your Storm cluster:
- GNU Make
- Docker
- python 2.7.x
- virtualenv
NOTE: The steps here must be carried out for all machines in your Storm cluster.
We will be making use of virtualenv for some of the Python scripts in this repository. We also make use of the PyYAML library, and that requires some Python header files.
On a Ubuntu-like system:
sudo apt-get install python-virtualenv
sudo apt-get install python-dev
Use the following command from the top of the script at http://get.docker.io to install Docker
wget -qO- https://get.docker.io/ | bash
Verify your Docker installation:
docker info
Clone this repository, preferrably to $HOME/workspace/storm-docker
.
At $HOME/workspace
:
git clone https://github.com/viki-org/storm-docker.git
The next few commands will be run from the storm-docker
repository. Let us
go there:
cd storm-docker
NOTE: The steps here must be carried out for all machines of your Storm cluster unless otherwise stated.
NOTE: This step is critical to the correct functioning of the Storm topology.
NOTE: storm-docker assumes that all machines in the Storm cluster make use of the same configuration files. As such, you can perform this step of editing the configuration files once (on any machine) and copy the files to all the machines of your Storm cluster.
Copy the sample configuration files to concrete configuration files (the
copy-sample-config.sh
script does not overwrite any existing concrete
configuration files):
./copy-sample-config.sh
Carry on by editing the following concrete configuration files:
config/storm-setup.yaml
config/cluster.xml
config/zoo.cfg
Documentation is available in the copied concrete configuration files, except
the config/cluster.xml
file used for logback configuration.
A default set of configuration has been set; for more fine-grained
configuration, we highly recommend reading
The logback manual.
Once done with your edits, we can continue with building the Docker images.
NOTE: This step is necessary after making changes to any of the configuration files in the above subsection.
Run the GNU make
command. The default goal builds the Docker images:
make
If this is the first time the Docker images are being built, this script will take some time to complete.
Before actually running the various Docker containers, you might want to verify
that all servers used in various sections of the config/storm-setup.yaml
file
are listed under the servers
dictionary (located in the same file) by running
the scripts/verify_storm_setup_yaml.py
script:
. venv/bin/activate
python scripts/verify_storm_setup_yaml.py
Warnings will be printed to stderr should some servers be missing from the
servers
dictionary.
As at 26 December 2014, it is highly recommended to run the scripts/remote.py
file to automatically spin up the various Docker containers on your Storm
cluster (instead of manually running the start-storm.sh
script on each
server), like this:
. venv/bin/activate
python scripts/remote.py --all
Fabric is used to run the start-storm.sh
script on the
various servers and spin up the correct Docker containers, depending on your
configuration in the config/storm-setup.yaml
file.
To stop all running Docker containers:
./destroy-storm.sh
To stop individual containers, supply them as arguments to the
destroy-storm.sh
script, for instance to stop the ui
and zookeeper
containers:
./destroy-storm.sh ui zookeeper
This project was started to address the need to increase the scalability and fault tolerance of Viki's Storm cluster. A lot of effort was spent figuring out how to run multiple Storm Supervisor and Zookeeper in Docker.
This repository should be viewed more as a foundation on which you can build on for running your Storm cluster in Docker, rather than as a defacto standard for running Storm in Docker.
To better aid someone new to the codebase to understanding and subsequently modifying the code, much of the core Python code contains rather extensive inline documentation.
This repository was originally based on wurstmeister/storm-docker; big thanks to wurstmeister for making his project open source.