FIS Google Cloud Platform Whitepapers

About

Welcome to the source repository for various code references found in several FIS Advanced Technology white papers.

In December 2015, FIS Advanced Technology released the white paper "Transforming Options Market Data with the Dataflow SDK" to provide software engineers insight into Google Cloud Dataflow's programming model and execution environment.

In August 2016, FIS Advanced Technology produced a sequel to the original 2015 Bigtable white paper, detailing the introduction of Google Cloud Dataflow and Google BigQuery into the Market Reconstruction Tool's solution architecture and providing a deeper look at the material covered in our "Analyzing 25 billion stock market events in an hour with NoOps on GCP" talk from Google NEXT 2016.

September of 2016 saw the release of "Market Reconstruction 2.0: Visualization at Scale", illustrating the team's experience designing the user interface for a securities transaction regulatory database expected to grow to 35 petabytes over 6 years.

White papers

Running the example Dataflow options market data transformation project

To build the project:

mvn clean install

Out of the box, the repository is configured to run a standalone Dataflow job on the local workstation, using input data that ships with the repository (input/zvzzt.input.txt).
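For orientation, here is a minimal sketch of how a Dataflow SDK 1.x pipeline is typically wired for this kind of local run. The class name, output path, and the omission of the actual transforms are illustrative assumptions, not the repository's pipeline code:

```java
// Sketch only: runs the job on the local workstation with the DirectPipelineRunner,
// reading the input file that ships with the repository.
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner;

public class LocalRunSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    // The DirectPipelineRunner executes the whole pipeline in-process on this machine.
    options.setRunner(DirectPipelineRunner.class);

    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.Read.from("input/zvzzt.input.txt"))  // local input bundled with the repo
     // ... the whitepaper's parsing/transformation steps would be applied here ...
     .apply(TextIO.Write.to("output/zvzzt"));           // hypothetical local output prefix
    p.run();
  }
}
```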

The example can be run locally either by executing:

cd bin && ./run

or by calling Maven with:

mvn clean install && mvn -Plocal exec:exec

Running the project on Google Cloud Platform / BigQuery

Once you have activated a Google account on Google Cloud Platform, you will need your project ID and at least one GCS bucket (for storing deployment artifacts and input files).

Log your shell into GCP:

gcloud auth login

If you do not already have a Google Cloud Storage bucket, you can create one with the following command:

gsutil mb gs://<pick_a_bucket_name>

Copy input specimen to Google Cloud Storage:

gsutil cp input/zvzzt.input.txt gs://<pick_a_bucket_name>

Ensure that there is a proper destination dataset in your BigQuery account. For example, this command will create a dataset named <dataflow_project> within BigQuery for your account:

bq mk <dataflow_project>

Execute the following, after substituting your own values for PROJECT and BQDEST in bin/run:

cd bin && ./run gs://<pick_a_bucket_name>/zvzzt.input.txt

The Pipeline will automatically create the table if it does not exist, although it cannot create the initial dataset.
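That behaviour corresponds to the Dataflow SDK's BigQueryIO sink with a CREATE_IF_NEEDED create disposition. The sketch below is a hedged illustration of that wiring; the table name, schema field, and parsing step are placeholders, not the project's actual definitions:

```java
// Sketch only: writes transformed records to BigQuery, creating the destination
// table on first run. The dataset (<dataflow_project>) must already exist (bq mk).
import java.util.Arrays;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

public class BigQuerySinkSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Hypothetical one-column schema; the real schema lives in the project source.
    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("raw_record").setType("STRING")));

    p.apply(TextIO.Read.from("gs://<pick_a_bucket_name>/zvzzt.input.txt"))
     .apply(ParDo.of(new DoFn<String, TableRow>() {
        @Override
        public void processElement(ProcessContext c) {
          // Placeholder mapping; the whitepaper's pipeline performs the real field parsing.
          c.output(new TableRow().set("raw_record", c.element()));
        }
      }))
     .apply(BigQueryIO.Write
        .to("<dataflow_project>.options_transformed")  // hypothetical table name
        .withSchema(schema)
        // CREATE_IF_NEEDED creates the table if absent, but cannot create the dataset itself.
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```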

To execute the job on Google Cloud Platform using Maven, edit the associated values for your project ID and account within pom.xml and then run:

mvn clean install && mvn -Pgcp exec:exec

Remember that when running on Google Cloud Platform you cannot use local files; input and output must be stored in GCS (gs://).
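As a rough illustration, the sketch below shows the kind of options a cloud run requires (project, GCS staging location, GCS input/output); it assumes the standard DataflowPipelineOptions setters and reuses the placeholder names from this README rather than the repository's actual configuration:

```java
// Sketch only: submits the job to the Dataflow service instead of running locally.
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

public class GcpRunSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setRunner(DataflowPipelineRunner.class);                  // run on the Dataflow service
    options.setProject("<your_project_id>");                          // the PROJECT value from bin/run / pom.xml
    options.setStagingLocation("gs://<pick_a_bucket_name>/staging");  // deployment artifacts

    Pipeline p = Pipeline.create(options);
    // Both input and output must live in GCS when running on the service.
    p.apply(TextIO.Read.from("gs://<pick_a_bucket_name>/zvzzt.input.txt"))
     .apply(TextIO.Write.to("gs://<pick_a_bucket_name>/output/zvzzt"));
    p.run();
  }
}
```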

Errata

Please open a GitHub issue for any discrepancies or inconsistencies you discover, and we will correct them and publish the fixes here.

See Also

License

MIT. See license text in LICENSE.

Copyrights and Names

Copyright © FIS 2016. Licensed under the MIT license.
