RIALTO Combine Load Procedure

Initial Load

Use these steps when writing to an empty store.

1. cd ~/workspace/rialto-etl
2. Make sure you have the latest ETL code.
3. Get the SPARQL Proxy URL and API key from shared_configs. Put these values into config/settings.local.yml or the corresponding environment variables.
4. Connect to the Stanford VPN using full-tunnel mode.
5. Test the connection by sending a simple count query to the SPARQL Proxy.
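
The connection test in the last setup step could look roughly like the sketch below. It is only an illustration: sending the query as a direct SPARQL 1.1 Protocol POST and passing the key in an `x-api-key` header are assumptions, not taken from the RIALTO configuration. The function names are hypothetical. Use the URL and key from shared_configs.

```python
import json
import urllib.request

# A simple query that counts every triple in the store.
COUNT_QUERY = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"


def build_count_request(proxy_url, api_key):
    """Build (but do not send) a POST request carrying the count query."""
    return urllib.request.Request(
        proxy_url,
        data=COUNT_QUERY.encode("utf-8"),
        method="POST",
        headers={
            # SPARQL 1.1 Protocol: unencoded query sent directly in the POST body.
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
            # Assumption: the proxy expects the key in an x-api-key header.
            "x-api-key": api_key,
        },
    )


def run_count(proxy_url, api_key, timeout=30):
    """Send the request and return the triple count as an integer."""
    request = build_count_request(proxy_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # SPARQL JSON results: one binding row holding the ?count variable.
    return int(body["results"]["bindings"][0]["count"]["value"])
```

If the VPN connection and key are working, `run_count(url, key)` should return an integer rather than raise an error. For a truly empty store the count would presumably be 0.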
Extract, Transform, Load - Organizations from Profiles
1. Ensure you have the CAP/Profiles API key in either config/settings.local.yml or an environment variable. See shared_configs.
2. Run the organization ETL steps
Extract, Transform, Load - Researchers from Profiles
1. Run the researcher ETL steps
Extract, Transform, Load - Grants from SeRA
1. Using the researchers.ndj file from the researchers extract step above, run the grant ETL steps. Note that researchers without SUNet IDs will not have their grants imported.
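
To preview which researchers the grant step would skip, the file can be scanned before the ETL is run. The sketch below assumes researchers.ndj holds one JSON object per line and uses a hypothetical `sunetid` field; the actual key name in the extract may differ.

```python
import json


def split_by_sunet_id(lines, field="sunetid"):
    """Partition NDJSON lines into records with and without a SUNet ID.

    `field` is a hypothetical key name; adjust it to match the real extract.
    """
    with_id, without_id = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # A missing or empty value counts as "no SUNet ID".
        (with_id if record.get(field) else without_id).append(record)
    return with_id, without_id
```

The function accepts any iterable of lines, so an open file handle for researchers.ndj can be passed directly. The length of the second list is the number of researchers whose grants would not be imported.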
Extract, Transform, Load - Publications from Web of Science
1. Using the researchers.ndj file from the researchers extract step above, run the publications ETL steps. This process will create new co-authors, link publications to authors, create new topics, link topics to publications, and link publications to grants.

Use these steps when loading data into a store that already has data.

Querying the data store for people returns everyone who has ever been affiliated, which is more people than we want to update given the load time. Open question: should we re-query Profiles for only "current" people, or mark people as "inactive" in the store?