Development Notes
In conf/macrobase.yaml:
server:
  applicationConnectors:
    - type: http
      port: 6666
  adminConnectors:
    - type: http
      port: 6667
To view test coverage:
mvn cobertura:cobertura; open target/site/cobertura/index.html
We're also on coveralls.io.
Diagnostics tools live under test in macrobase.diagnostic. Add a new instance of a class that extends ConfiguredCommand<MacroBaseConf> to the bootstrap in DiagnosticApplication, then run bin/diagnostic.sh <your command name>.
To merge a PR, don't just click Merge pull request. Instead, follow these rebase steps and push directly to master:
# Say you're currently on branch wip; get the latest changes from origin
git fetch origin
# Rebase the changes in wip on top of the existing master
# (alternatively: git pull --rebase)
git rebase -i origin/master
# Get back to master
git checkout master
# Merge wip into master (using the "fast-forward only" option)
git merge --ff-only wip
# Now you can push to origin/master
git push origin master
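The steps above can also be scripted. The following is a minimal Python sketch, not part of the repo: the function names merge_commands and run_merge are hypothetical, and it assumes the feature branch is currently checked out and the remote is named origin.

```python
import subprocess


def merge_commands(branch="wip"):
    """Return the git commands from the steps above, in order.

    Assumes `branch` is the currently checked-out feature branch.
    """
    return [
        ["git", "fetch", "origin"],
        ["git", "rebase", "-i", "origin/master"],
        ["git", "checkout", "master"],
        ["git", "merge", "--ff-only", branch],
        ["git", "push", "origin", "master"],
    ]


def run_merge(branch="wip"):
    """Run each step, stopping at the first command that fails."""
    for cmd in merge_commands(branch):
        print("$", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed rebase or a non-fast-forward merge aborts the sequence.
        subprocess.run(cmd, check=True)
```

Calling run_merge("wip") from inside a checkout runs the five commands in order. Because the merge uses --ff-only, it refuses to create a merge commit if the rebase did not leave wip directly ahead of master.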
Oracle recently released their Java Mission Control software, which is available on Mac OS X. To find the appropriate command, run find /Library/Java -name jmc. To run a Java program with their 'Flight Recorder' enabled, append -XX:+UnlockCommercialFeatures -XX:+FlightRecorder to the JVM arguments. See bin/profile/profile_streaming.sh for an example.
YourKit supports open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of YourKit Java Profiler and YourKit .NET Profiler, innovative and intelligent tools for profiling Java and .NET applications.
Clone the wiki repo (git clone https://github.com/stanford-futuredata/macrobase.wiki.git), commit images to the img subdirectory, and link to them using relative links in Markdown (img/myfolder/foo.png).
The MIT ISTC machines don't have Maven, Java 1.8, or the latest version of Postgres. As a possibly temporary workaround, I have installed binaries locally on the istc3 host in /data/pbailis/bin and have set up a Postgres 9.5 instance in /data/pbailis/pgdata running on port 5050. For everything to work correctly, please copy the following to your .bashrc, or just copy mine from ~/pbailis/.bashrc (and make sure to add source ~/.bashrc to .bash_profile):
export PATH="/data/pbailis/bin:$PATH"
export PGDATA=/data/pbailis/pgdata
export PGPORT=5050
export PGHOST=localhost
export JAVA_OPTS="-Dmacrobase.loader.db.url=localhost:$PGPORT -Xms128m -Xmx16G"
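To confirm that a shell has picked up these settings, a quick check along the following lines may help. This is a minimal sketch, not part of the repo: check_environment is a hypothetical helper, and the variable names and path are copied from the export lines above.

```python
import os

# Values copied from the export lines above.
REQUIRED_VARS = ("PGDATA", "PGPORT", "PGHOST", "JAVA_OPTS")
LOCAL_BIN = "/data/pbailis/bin"


def check_environment(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    # Report every required variable that is missing or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # The locally installed binaries must be reachable through PATH.
    path_entries = env.get("PATH", "").split(os.pathsep)
    if LOCAL_BIN not in path_entries:
        problems.append(f"{LOCAL_BIN} is not on PATH")
    return problems
```

Running print(check_environment()) in a fresh login shell lists anything that .bashrc failed to set. The sketch only checks that the variables exist, not that the Postgres instance is actually reachable.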
An rwx copy of the stanford-futuredata/macrobase repo is in /data/pbailis/macrobase.
You should have your own directory in /data/ from which you can also clone the repo (git clone git@github.com:stanford-futuredata/macrobase.git /data/`whoami`) and work privately.
A description of the Postgres tables is in /data/pbailis/dataset-descriptions.txt.
PostgreSQL is slow for sparse column accesses, so we use Spark SQL to convert each table to Parquet, a columnar storage format.
In the UNIX shell, run:
SPARK_CLASSPATH=postgresql-9.4.1207.jre6.jar bin/spark-shell --driver-memory 50G --executor-memory 50G --executor-cores 64
In the Spark shell, run (replacing List() with your list of tables):
for (table <- List("campaign_expenditures", "fed_disbursements", "hubway_trips", "milan_telecom", "sensor_data_demo", "joined_cmt_data", "uk_road_accidents")) {
  println(s"Loading $table")
  // Read the table from Postgres over JDBC
  val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:postgresql://localhost:5050/postgres", "dbtable" -> table)).load()
  // Write the full table out in Parquet format
  jdbcDF.write.format("parquet").save(s"$table.parquet")
}
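When re-running the conversion, note that Spark's default save mode raises an error if the output path already exists. The following minimal Python sketch lists the tables that still need converting. The helper name pending_tables is hypothetical, and it assumes the outputs are written as <table>.parquet in the directory where spark-shell was started, as in the loop above.

```python
import os


def pending_tables(tables, output_dir="."):
    """Return the tables that have no <table>.parquet output in output_dir yet."""
    return [
        table for table in tables
        if not os.path.exists(os.path.join(output_dir, f"{table}.parquet"))
    ]
```

The resulting list can be pasted into the List(...) in the Spark shell loop. The check only looks for the output path, so a partially written directory left by a failed job would still count as done and should be removed by hand first.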
- Running benchmarks is easy: just run python execute_workflows.py within the bench sub-directory. You can pick the workflows to run by specifying your own JSON workflow configuration file. Take a look at bench/conf/workflow_config.json to get a sense of what this configuration file looks like. Please specify whether the workflow is a batch or streaming job, its name, what the target attributes are, what the high and low metrics are, and what the base query is. In addition, execute_workflows.py sweeps over the parameters provided in your own JSON sweeping parameters configuration file; an example can be found at bench/conf/sweeping_parameters_config.json. Provide the name of each parameter, along with the range of values that you want to explore. (The file can be left empty if you do not want to sweep over any parameters.) The names of the JSON configuration files are passed into execute_workflows.py through command-line arguments. Use python execute_workflows.py -h to list all available arguments.
- To produce plots, redirect the output of the execute_workflows.py script to a file (e.g., output.out), then run python produce_plots.py --output-file <output_file> --plot-directory <plot_directory>. Be sure to choose the workloads you're interested in seeing graphs of through the plotting configuration JSON file (bench/conf/plotting_config.json). A graph will be produced for every swept parameter, with each graph containing one line per chosen workload.
Note: All scripts should be run from the bench/ sub-directory within macrobase.
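The benchmark and plotting steps can be chained from a small driver script. The following is a minimal Python sketch, not part of the repo: plot_command and run_and_plot are hypothetical names, and the default values for bench_dir, output_file, and plot_directory are illustrative only. It runs execute_workflows.py without extra arguments, and it does not check whether the plot directory must exist beforehand.

```python
import os
import subprocess


def plot_command(output_file, plot_directory):
    """Build the produce_plots.py invocation described above."""
    return [
        "python", "produce_plots.py",
        "--output-file", output_file,
        "--plot-directory", plot_directory,
    ]


def run_and_plot(bench_dir="bench", output_file="output.out", plot_directory="plots"):
    """Run the benchmarks, save their output, then produce plots.

    bench_dir, output_file, and plot_directory are illustrative defaults.
    """
    # Redirect the benchmark output to a file, as the plotting step expects.
    with open(os.path.join(bench_dir, output_file), "w") as out:
        subprocess.run(["python", "execute_workflows.py"],
                       cwd=bench_dir, stdout=out, check=True)
    # Both scripts run with bench/ as the working directory.
    subprocess.run(plot_command(output_file, plot_directory),
                   cwd=bench_dir, check=True)
```

Calling run_and_plot() from the macrobase root runs both scripts with bench/ as their working directory, so relative paths such as output.out resolve inside bench/.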
A Stanford Future Data Project