
Creating WST datasets

Ben Klein edited this page May 16, 2021 · 3 revisions

While most of WST's usefulness comes from the size and scope of the data we can analyze, there are often cases where you want a smaller, targeted selection of repositories to compose a specialized dataset.

All of the code we use to build our own datasets is in this repository, so how do you go about using it?

Prepare ArangoDB

First, you'll need a database to store the output. Follow your preferred guide for installing ArangoDB, then create a database and a user with write access to that database.
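If you don't already have a preferred way of doing this, one option is ArangoDB's own arangosh shell. The sketch below assumes a local server on the default port and that you can log in as root; the helper name wst_setup_js and the names wst, wst_user and s3cret are placeholders, not anything WST requires. It prints JavaScript that creates the database, creates the user, and grants that user read/write ("rw") access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print arangosh JavaScript that creates a database and a
# user with read/write ("rw") access to it. Run it against the _system database.

# wst_setup_js DATABASE USER PASSWORD
wst_setup_js() {
  if [[ $# -ne 3 ]]; then
    echo "usage: wst_setup_js DATABASE USER PASSWORD" >&2
    return 2
  fi
  local v
  for v in "$@"; do
    # Values are pasted into JavaScript string literals without escaping,
    # so refuse empty values, double quotes, backslashes and newlines.
    case "$v" in
      "" | *'"'* | *'\'* | *$'\n'*)
        echo "wst_setup_js: empty value or unsupported character" >&2
        return 1
        ;;
    esac
  done
  cat <<EOF
const users = require("@arangodb/users");
db._createDatabase("$1");
users.save("$2", "$3");
users.grantDatabase("$2", "$1", "rw");
EOF
}
```

For example, run arangosh --server.username root --javascript.execute-string "$(wst_setup_js wst wst_user s3cret)". By default arangosh connects to the _system database on localhost and prompts for the root password. Because the generated code creates the database and user unconditionally, it is meant to be run once.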

To set up the collections, relations, and indexes, use the db init subcommand of wsyntree-collector, which initializes the database for you:

export WST_DB_URI="http://DB_USERNAME:DB_PASSWORD@localhost:8529/NAME_OF_DATABASE"
wsyntree-collector -v db init
# if you wish to re-initialize a database by **DELETING ALL DATA** and re-creating:
wsyntree-collector -v db init --delete
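If you script this step, a small wrapper can catch common mistakes before the collector touches the database. The sketch below is not part of wsyntree-collector, and the function names wst_db_uri and confirm_db_reinit are hypothetical. It assumes the URI layout shown in the export line. Because this page does not say whether the collector decodes percent-escapes, it rejects credentials or database names containing ':', '@', '/' or whitespace instead of encoding them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of wsyntree-collector) for the db init step.

# wst_db_uri USER PASSWORD HOST:PORT DATABASE
# Prints http://USER:PASSWORD@HOST:PORT/DATABASE on stdout. Values containing
# ':', '@', '/' or whitespace are rejected rather than percent-encoded, since
# it is not assumed here that the collector decodes escapes in the URI.
wst_db_uri() {
  if [[ $# -ne 4 ]]; then
    echo "usage: wst_db_uri USER PASSWORD HOST:PORT DATABASE" >&2
    return 2
  fi
  local user="$1" pass="$2" hostport="$3" db="$4" part
  local bad='[@/:[:space:]]' hostport_re='^[A-Za-z0-9.-]+:[0-9]+$'
  if [[ -z "$user" || -z "$db" ]]; then
    echo "wst_db_uri: USER and DATABASE must not be empty" >&2
    return 1
  fi
  for part in "$user" "$pass" "$db"; do
    if [[ "$part" =~ $bad ]]; then
      # Deliberately not echoing the value, which may be the password.
      echo "wst_db_uri: values must not contain ':', '@', '/' or whitespace" >&2
      return 1
    fi
  done
  if [[ ! "$hostport" =~ $hostport_re ]]; then
    echo "wst_db_uri: expected HOST:PORT, got '$hostport'" >&2
    return 1
  fi
  printf 'http://%s:%s@%s/%s\n' "$user" "$pass" "$hostport" "$db"
}

# confirm_db_reinit: read one line from stdin and succeed only if it is
# exactly "yes"; meant to guard "wsyntree-collector -v db init --delete".
confirm_db_reinit() {
  local answer=""
  read -r -p "db init --delete DELETES ALL DATA. Type 'yes' to continue: " answer
  [[ "$answer" == "yes" ]]
}
```

For example, export WST_DB_URI="$(wst_db_uri wst_user s3cret localhost:8529 wst)" before running db init, and use confirm_db_reinit && wsyntree-collector -v db init --delete so the destructive re-initialization only runs after an explicit "yes".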

Running the collector

TODO: the collector's design is still changing during development.
