This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Support for installation on Windows (#535)
Mihai Budiu authored Sep 4, 2019
1 parent 2969fdc commit 9d0ede1
Showing 29 changed files with 544 additions and 151 deletions.
85 changes: 61 additions & 24 deletions README.md
Documentation for the [internal APIs](docs/hillview-apis.pdf).

# Installing Hillview on a local machine

## Ubuntu or MacOS machines

* Install Java 8. At this point newer versions of Java will *not* work.
* Clone this GitHub repository
* run the script `bin/install-dependencies.sh`
* Download the Hillview release [zip
file](https://github.com/vmware/hillview/releases/download/v0.7-alpha/hillview-bin.zip).
Save it in the top directory of Hillview.
* Unzip the release `unzip hillview-bin.zip`

## Windows machines

* Download and install Java 8.
* Choose a directory for installing Hillview
* Enable execution of PowerShell scripts; this can be done, for example, by
  running the following command in PowerShell as an administrator: `Set-ExecutionPolicy unrestricted`
* Download the script `bin/install-hillview.ps1` into the chosen directory
* Run the installation script using Windows PowerShell

# Running Hillview locally

## Windows machines

All Windows scripts are in the `bin` folder:

```
$: cd bin
```

* Start Hillview processes:

```
$: hillview-start.bat
```

* If needed, give the application permission to communicate through the Windows firewall
* To stop Hillview:

```
$: hillview-stop.bat
```

## Ubuntu or MacOS machines

All the following scripts are in the `bin` folder.

```
$: cd bin
```

* Start the back-end service which performs all the data processing:

```
$: ./backend-start.sh &
```

* Start the web server
(the default port of the web server is 8080; if you want to change it, change the setting
in `apache-tomcat-9.0.4/conf/server.xml`).

```
$: ./frontend-start.sh
```

* Start a web browser and open http://localhost:8080
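
The web server port mentioned above is set in Tomcat's `Connector` element in
`apache-tomcat-9.0.4/conf/server.xml`; a typical entry looks like this (the
surrounding attributes may differ in your copy):

```
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
```

Change `port="8080"` to the desired value and restart the web server.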
machine.
the Java SDK) download and prepare the sample data:

```
$: ./rebuild.sh -a
$: ./demo-data-cleaner.sh
```

# Deploying the Hillview service on a cluster
two sample files are `bin/config.json` and `bin/config-local.json`.

## Deployment scripts

All deployment scripts are written in Python, and are in the `bin` folder.

```
$: cd bin
Query the status of the services:
$: ./status config.json
```

## Data management

We provide some crude data management scripts and tools for clusters.
They are described [here](bin/README.md).

# Developing Hillview

## Software Dependencies
to>/jdk/jdk1.8.0_101). To set your JAVA_HOME environment variable, add
the following to your ~/.bashrc or ~/.zshrc.

```
$: export JAVA_HOME="<path-to-jdk-folder>"
```

(For MacOS you do not need to set up JAVA_HOME.)
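
For example, on Linux the lines might look like this (the JDK path is
hypothetical; substitute the location where you unpacked the JDK):

```shell
# Hypothetical JDK location; adjust to your actual install path.
export JAVA_HOME="/usr/lib/jvm/jdk1.8.0_101"
# Putting the JDK's bin directory first ensures this JDK's tools are used.
export PATH="$JAVA_HOME/bin:$PATH"
echo "$JAVA_HOME"
```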
for building and testing.

On MacOS you first need to install [Homebrew](https://brew.sh/). One way to do that is to run
```
$: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
```

To install all other dependencies you can run:

```
$: cd bin
$: ./install-dependencies.sh
```

For old versions of Ubuntu this may fail, so you may have to install the required
platform/pom.xml.
* Build the software:

```
$: cd bin
$: ./rebuild.sh -a
```

### Build details
JAR file `platform/target/hillview-jar-with-dependencies.jar`. This
part can be built with:

```
$: cd platform
$: mvn clean install
$: cd ..
```

* web: the web server, web client and web services; this project links
to the result produced by the `platform` project. This produces a WAR
be built with:

```
$: cd web
$: mvn package
$: cd ..
```

## Contributing code
standard.
## Setup IntelliJ IDEA

Download and install Intellij IDEA: https://www.jetbrains.com/idea/.
You can just untar the Linux binary in a place of your choice and run
the shell script `ideaXXX/bin/idea.sh`. The web project uses
capabilities only available in the paid version of IntelliJ IDEA.

One solution is to load only the module that you want to contribute to: move to the
corresponding folder: `cd platform` or `cd web` and start
IntelliJ there.

Alternatively, if you have IntelliJ Ultimate you can create an empty project
in the hillview folder, and then import three modules (from File/Project structure/Modules,
153 changes: 132 additions & 21 deletions bin/README.md
# This folder contains various scripts and configuration files for managing Hillview clusters

## Linux/MacOS shell scripts for building and testing

* `backend-start.sh`: start the Hillview back-end service on the local machine
* `demo-data-cleaner.sh`: downloads a small test dataset and preprocesses it
* `force-gc.sh`: asks a Java process to execute GC
* `forever.sh`: runs another command in a loop forever
* `frontend-start.sh`: start the Hillview front-end service on the local machine
* `install-dependencies.sh`: install all dependencies needed to build Hillview
* `lib.sh`: a small library of useful shell functions used by other scripts
* `package-binaries.sh`: used to build an archive with executables and scripts which
is used for the code distribution
* `rebuild.sh`: build the Hillview front-end and back-end
* `redeploy.sh`: Performs four consecutive actions on a remote
Hillview installation: stops the services, rebuilds the software,
deploys it, and restarts the service
* `upload-file.sh`: given a CSV file, guesses a schema for it, chops the
  file into small pieces, and uploads the pieces to a remote cluster.

The following are templates used to generate actual shell scripts
on a remote cluster when Hillview is installed:

* `hillview-aggregator-manager-template.sh`: used to generate a file
  called `hillview-aggregator-manager.sh` which can be used to start,
  stop, and query a Hillview aggregation service. The generated file is
  installed on each aggregator machine.

* `hillview-webserver-manager-template.sh`: used to generate a file
called `hillview-webserver-manager.sh` which can be used to start,
stop, query a Hillview web server. The generated file is installed
on the remote Hillview web server machine.

* `hillview-worker-manager-template.sh`: used to generate a file
called `hillview-worker-manager.sh` which can be used to start,
stop, query a Hillview worker. The generated file is installed on
each remote worker machine.

## Windows scripts

* `install-hillview.ps1`: a PowerShell script used to download and
install Hillview on a Windows machine.
* `detect-java.bat`: a Windows batch library which detects where Java
  is installed
* `hillview-start.bat`: a Windows batch file which starts Hillview on the local machine
* `hillview-stop.bat`: a Windows batch file which stops Hillview on the local machine

## Python scripts for deploying Hillview on a cluster and managing data

* `delete-data.py`: delete a folder from all machines in a Hillview cluster
* `deploy.py`: copy the Hillview binaries to all machines in a Hillview cluster
* `download-data.py`: downloads the specified files from all machines in a cluster
* `hillviewCommon.py`: common library used by other Python programs
* `run-on-all.py`: run a command on all machines in a Hillview cluster
* `start.py`: start the Hillview service on a remote cluster
* `status.py`: check the Hillview service on a remote cluster
* `stop.py`: stop the Hillview service on a remote cluster
* `upload-data.py`: upload a set of files to all machines in a Hillview cluster in a
round-robin fashion
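
The round-robin placement used by `upload-data.py` can be sketched as follows;
the machine names are hypothetical, and a real run copies each file over the
network instead of printing the assignment:

```shell
# Sketch of round-robin file placement across three hypothetical machines.
machines="m1 m2 m3"
set -- piece1 piece2 piece3 piece4 piece5   # the files to distribute
i=0
for f in "$@"; do
  n=$(( i % 3 + 1 ))                        # cycle through the machine list
  m=$(echo $machines | cut -d' ' -f $n)
  echo "$f -> $m"                           # a real script would scp here
  i=$((i + 1))
done > placement.txt
cat placement.txt
```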

## Configuration files

* `config.json`: skeleton configuration file for a Hillview cluster
* `config-local.json`: description of a Hillview cluster that consists
of just the local machine (used both as a web server and as a
worker)

# Additional documentation

## Managing a Hillview cluster

* Copy the file `config.json` and modify it to describe your cluster. Let's say you
saved it as `myconfig.json`.
* To run Hillview on the local machine just use `config-local.json`
* You can install Hillview on your cluster by running `deploy.py myconfig.json`
* You can start the Hillview service on the cluster by running `start.py myconfig.json`
* You can stop the Hillview service on the cluster by running `stop.py myconfig.json`
* You can check the status of the Hillview service on the cluster by running `status.py myconfig.json`

## Managing files on a Hillview cluster

Several scripts can be used to manage data distributed as raw files on
a Hillview cluster. The convention is that a dataset is stored in one
directory; the same directory is used on all machines, and each
machine holds a fragment of the entire dataset.

Let's say we have a very large file `x.csv` that we want to upload to a
cluster; we will chop it into pieces and install the pieces in the
directory `data/x` on each machine (below the Hillview working
directory). This is done with:

```
$: ./upload-file.sh -c myconfig.json -d data/x -h -f x.csv -o
```

The various flags have the following significance:
* `-c myconfig.json`: specifies the cluster where the data is uploaded
* `-d data/x`: specifies the directory where the data is placed on each machine
* `-h`: indicates that the file `x.csv` has a header row
* `-f x.csv`: specifies the input file
* `-o`: specifies that the output should be saved as ORC files (a fast columnar format)
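
The chopping step can be illustrated with standard tools; this is a sketch of
the idea, not what `upload-file.sh` actually runs (all file names here are
hypothetical):

```shell
# Create a tiny CSV with a header row and four data rows.
printf 'id,name\n1,a\n2,b\n3,c\n4,d\n' > x.csv
head -n 1 x.csv > header.csv             # set the header row aside
tail -n +2 x.csv | split -l 2 - piece-   # two data rows per piece
for p in piece-*; do
  cat header.csv "$p" > "x-$p.csv"       # re-attach the header to each piece
done
ls x-piece-*.csv
```

Each resulting piece is a valid CSV with the same header, which is what allows
the pieces to be loaded independently on different machines.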

After uploading the file in this way it can be loaded by selecting
`Load / ORC files` and specifying:
* File name pattern: `data/x/x*.orc`
* Schema file: `schema`

Alternatively, you can split the file locally and upload the pieces
afterwards; the following splits the file into pieces in the `tmp`
directory and then uploads these pieces to the cluster using the
`upload-data.py` program:

```
$: ./upload-file.sh -d tmp -h -f x.csv -o
$: ./upload-data.py -d data/x -s schema mycluster.json tmp/*.orc
```

To list the files on the cluster you can use the `run-on-all.py` script, e.g.:

```
$: ./run-on-all.py mycluster.json "ls -l data/x"
```

You can delete a directory from all machines of a cluster:

```
$: ./delete-data.py mycluster.json data/x
```

Finally, you can download back data you have uploaded to the cluster:

```
$: ./download-data.py mycluster.json data/x
```

When downloading, this utility creates a folder for each machine in
the cluster.
3 changes: 1 addition & 2 deletions bin/delete-data.py

import os.path
from argparse import ArgumentParser
from hillviewCommon import ClusterConfiguration, get_config
from hillviewConsoleLog import get_logger
from hillviewCommon import ClusterConfiguration, get_config, get_logger

logger = get_logger("delete-data")

Expand Down
3 changes: 1 addition & 2 deletions bin/deploy.py
from argparse import ArgumentParser
import tempfile
import os.path
from hillviewCommon import ClusterConfiguration, get_config
from hillviewConsoleLog import get_logger
from hillviewCommon import ClusterConfiguration, get_config, get_logger

logger = get_logger("deploy")

Expand Down
