This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Support for installation on Windows (#535)
Mihai Budiu authored Sep 4, 2019
1 parent 2969fdc commit 9d0ede1
Showing 29 changed files with 544 additions and 151 deletions.
85 changes: 61 additions & 24 deletions README.md
Documentation for the [internal APIs](docs/hillview-apis.pdf).

# Installing Hillview on a local machine

## Ubuntu or MacOS machines

* Install Java 8. At this point newer versions of Java will *not* work.
* Clone this GitHub repository
* run the script `bin/install-dependencies.sh`
* Download the Hillview release [zip
file](https://github.com/vmware/hillview/releases/download/v0.7-alpha/hillview-bin.zip).
Save it in the top directory of Hillview.
* Unzip the release `unzip hillview-bin.zip`

## Windows machines

* Download and install Java 8.
* Choose a directory for installing Hillview
* Enable execution of PowerShell scripts; this can be done, for example, by
  running the following command in PowerShell as an administrator: `Set-ExecutionPolicy unrestricted`
* Download the script `bin/install-hillview.ps1` into the chosen directory
* Run the installation script using Windows PowerShell

# Running Hillview locally

## Windows machines

All Windows scripts are in the `bin` folder:

```
$: cd bin
```

* Start Hillview processes:

```
$: hillview-start.bat
```

* If needed, give the application permission to communicate through the Windows firewall
* To stop Hillview:

```
$: hillview-stop.bat
```

## Ubuntu or MacOS machines

All the following scripts are in the `bin` folder.

```
$: cd bin
```

* Start the back-end service which performs all the data processing:

```
$: ./backend-start.sh &
```

* Start the web server
(the default port of the web server is 8080; if you want to change it, change the setting
in `apache-tomcat-9.0.4/conf/server.xml`).

```
$: ./frontend-start.sh
```

* Start a web browser and open http://localhost:8080
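
The web server port mentioned above is set in Tomcat's `Connector` element in
`apache-tomcat-9.0.4/conf/server.xml`; a typical entry looks like this (the
surrounding attributes may differ in your copy):

```
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
```

Change `port="8080"` to the desired value and restart the web server.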
machine.
the Java SDK) download and prepare the sample data:

```
$: ./rebuild.sh -a
$: ./demo-data-cleaner.sh
```

# Deploying the Hillview service on a cluster
two sample files are `bin/config.json` and `bin/config-local.json`.

## Deployment scripts

All deployment scripts are written in Python, and are in the `bin` folder.

```
$: cd bin
Query the status of the services:
$: ./status config.json
```

## Data management

We provide some crude data management scripts and tools for clusters.
They are described [here](bin/README.md).

# Developing Hillview

## Software Dependencies
to>/jdk/jdk1.8.0_101). To set your JAVA_HOME environment variable, add
the following to your ~/.bashrc or ~/.zshrc.

```
$: export JAVA_HOME="<path-to-jdk-folder>"
```

(For MacOS you do not need to set up JAVA_HOME.)
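
For example, on Linux the lines might look like this (the JDK path is
hypothetical; substitute the location where you unpacked the JDK):

```shell
# Hypothetical JDK location; adjust to your actual install path.
export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_101"
# Putting the JDK's bin directory first ensures this JDK's tools are used.
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"
```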
for building and testing.

On MacOS you first need to install [Homebrew](https://brew.sh/). One way to do that is to run
```
$: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```

To install all other dependencies you can run:

```
$: cd bin
$: ./install-dependencies.sh
```

For old versions of Ubuntu this may fail, so you may have to install the required
platform/pom.xml.
* Build the software:

```
$: cd bin
$: ./rebuild.sh -a
```

### Build details
JAR file `platform/target/hillview-jar-with-dependencies.jar`. This
part can be built with:

```
$: cd platform
$: mvn clean install
$: cd ..
```

* web: the web server, web client and web services; this project links
to the result produced by the `platform` project. This produces a WAR
be built with:

```
$: cd web
$: mvn package
$: cd ..
```

## Contributing code
standard.
## Setup IntelliJ IDEA

Download and install Intellij IDEA: https://www.jetbrains.com/idea/.
You can just untar the Linux binary in a place of your choice and run
the shell script `ideaXXX/bin/idea.sh`. The web project uses
capabilities only available in the paid version of IntelliJ IDEA.

One solution is to load only the module that you want to contribute to: move to the
corresponding folder: `cd platform` or `cd web` and start
IntelliJ there.

Alternatively, if you have IntelliJ Ultimate you can create an empty project
in the hillview folder, and then import three modules (from File/Project structure/Modules,
153 changes: 132 additions & 21 deletions bin/README.md
# This folder contains various scripts and configuration files for managing Hillview clusters

## Linux/MacOS shell scripts for building and testing

* `backend-start.sh`: start the Hillview back-end service on the local machine
* `demo-data-cleaner.sh`: downloads a small test dataset and preprocesses it
* `force-gc.sh`: asks a Java process to execute GC
* `forever.sh`: runs another command in a loop forever
* `frontend-start.sh`: start the Hillview front-end service on the local machine
* `install-dependencies.sh`: install all dependencies needed to build Hillview
* `lib.sh`: a small library of useful shell functions used by other scripts
* `package-binaries.sh`: used to build an archive with executables and scripts which
is used for the code distribution
* `rebuild.sh`: build the Hillview front-end and back-end
* `redeploy.sh`: Performs four consecutive actions on a remote
Hillview installation: stops the services, rebuilds the software,
deploys it, and restarts the service
* `upload-file.sh`: given a CSV file, guesses a schema for it, chops the
  file into small pieces, and uploads the pieces to a remote cluster.

The following are templates used to generate actual shell scripts
on a remote cluster when Hillview is installed:

* `hillview-aggregator-manager-template.sh`: used to generate a file
  called `hillview-aggregator-manager.sh` which can be used to start,
  stop, and query a Hillview aggregation service. The generated file is
  installed on each aggregator machine.

* `hillview-webserver-manager-template.sh`: used to generate a file
called `hillview-webserver-manager.sh` which can be used to start,
stop, query a Hillview web server. The generated file is installed
on the remote Hillview web server machine.

* `hillview-worker-manager-template.sh`: used to generate a file
called `hillview-worker-manager.sh` which can be used to start,
stop, query a Hillview worker. The generated file is installed on
each remote worker machine.

## Windows scripts

* `install-hillview.ps1`: a PowerShell script used to download and
install Hillview on a Windows machine.
* `detect-java.bat`: a Windows batch library which detects where Java
  is installed
* `hillview-start.bat`: a Windows batch file which starts Hillview on the local machine
* `hillview-stop.bat`: a Windows batch file which stops Hillview on the local machine

## Python scripts for deploying Hillview on a cluster and managing data

* `delete-data.py`: delete a folder from all machines in a Hillview cluster
* `deploy.py`: copy the Hillview binaries to all machines in a Hillview cluster
* `download-data.py`: downloads the specified files from all machines in a cluster
* `hillviewCommon.py`: common library used by other Python programs
* `run-on-all.py`: run a command on all machines in a Hillview cluster
* `start.py`: start the Hillview service on a remote cluster
* `status.py`: check the Hillview service on a remote cluster
* `stop.py`: stop the Hillview service on a remote cluster
* `upload-data.py`: upload a set of files to all machines in a Hillview cluster in a
round-robin fashion
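
The round-robin placement used by `upload-data.py` can be sketched as follows;
the machine names are hypothetical, and a real run copies each file over the
network instead of printing the assignment:

```shell
# Sketch of round-robin file placement across three hypothetical machines.
machines="m1 m2 m3"
set -- piece1 piece2 piece3 piece4 piece5   # the files to distribute
i=0
for f in "$@"; do
  n=$(( i % 3 + 1 ))                        # cycle through the machine list
  m=$(echo $machines | cut -d' ' -f $n)
  echo "$f -> $m"                           # a real script would scp here
  i=$((i + 1))
done > placement.txt
cat placement.txt
```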

## Configuration files

* `config.json`: skeleton configuration file for a Hillview cluster
* `config-local.json`: description of a Hillview cluster that consists
of just the local machine (used both as a web server and as a
worker)

# Additional documentation

## Managing a Hillview cluster

* Copy the file `config.json` and modify it to describe your cluster. Let's say you
saved it as `myconfig.json`.
* To run Hillview on the local machine just use `config-local.json`
* You can install Hillview on your cluster by running `deploy.py myconfig.json`
* You can start the Hillview service on the cluster by running `start.py myconfig.json`
* You can stop the Hillview service on the cluster by running `stop.py myconfig.json`
* You can check the status of the Hillview service on the cluster by running `status.py myconfig.json`

## Managing files on a Hillview cluster

Several scripts can be used to manage data distributed as raw files on
a Hillview cluster. The convention is that a dataset is stored in one
directory; the same directory is used on all machines, and each
machine holds a fragment of the entire dataset.

Let's say we have a very large file `x.csv` that we want to upload to a
cluster; we will chop it into pieces and install the pieces in the
directory `data/x` on each machine (below the Hillview working
directory). This is done with:

```
$: ./upload-file.sh -c myconfig.json -d data/x -h -f x.csv -o
```

The various flags have the following significance:
* `-c myconfig.json`: specifies the cluster where the data is uploaded
* `-d data/x`: specifies the directory where the data is placed on each machine
* `-h`: indicates that the file `x.csv` has a header row
* `-f x.csv`: specifies the input file
* `-o`: specifies that the output should be saved as ORC files (a fast columnar format)
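
The chopping step can be illustrated with standard tools; this is a sketch of
the idea, not what `upload-file.sh` actually runs (all file names here are
hypothetical):

```shell
# Create a tiny CSV with a header row and four data rows.
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n' > x.csv
head -n 1 x.csv > header.csv             # set the header row aside
tail -n +2 x.csv | split -l 2 - piece-   # two data rows per piece
for p in piece-*; do
  cat header.csv "$p" > "x-$p.csv"       # re-attach the header to each piece
done
ls x-piece-*.csv
```

Each resulting piece is a valid CSV with the same header, which is what allows
the pieces to be loaded independently on different machines.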

After uploading the file in this way it can be loaded by selecting
`Load / ORC files` and specifying:
* File name pattern: `data/x/x*.orc`
* Schema file: `schema`

Alternatively, you can split the file locally and upload the pieces
afterwards; the following splits the file into pieces in the `tmp`
directory and then uploads these pieces to the cluster using the
`upload-data.py` program:

```
$: ./upload-file.sh -d tmp -h -f x.csv -o
$: ./upload-data.py -d data/x -s schema mycluster.json tmp/*.orc
```

To list the files on the cluster you can use the `run-on-all.py` script, e.g.:

```
$: ./run-on-all.py mycluster.json "ls -l data/x"
```

You can delete a directory from all machines of a cluster:

```
$: ./delete-data.py mycluster.json data/x
```

Finally, you can download back data you have uploaded to the cluster:

```
$: ./download-data.py mycluster.json data/x
```

When downloading, this utility creates a folder for each machine in
the cluster.
3 changes: 1 addition & 2 deletions bin/delete-data.py

import os.path
from argparse import ArgumentParser
from hillviewCommon import ClusterConfiguration, get_config
from hillviewConsoleLog import get_logger
from hillviewCommon import ClusterConfiguration, get_config, get_logger

logger = get_logger("delete-data")

Expand Down
3 changes: 1 addition & 2 deletions bin/deploy.py
from argparse import ArgumentParser
import tempfile
import os.path
from hillviewCommon import ClusterConfiguration, get_config
from hillviewConsoleLog import get_logger
from hillviewCommon import ClusterConfiguration, get_config, get_logger

logger = get_logger("deploy")

Expand Down
