This repository has been archived by the owner on Apr 1, 2020. It is now read-only.

How to Use and Update the SIA_repo_template

ernestynne edited this page Nov 11, 2017 · 4 revisions

Overviews and instructions on how to use the SIA repo template are found in the README.

This page contains the notes for the "How To" workshops that SIA runs.

Using the SIA Repo Template

Overview

Why do we have a template?

The purpose of this repo is to ensure that any new IDI project that depends on other repositories has all the necessary macros available, in the relevant versions. Dependence on other repositories is managed via Git submodules. This template also has:

  • Standard parts of the readme populated
  • Standard headers for R, SAS and SQL scripts so there are no excuses for not having headers now :)

What does the file structure look like?

Note that the template contains the superset structure. You are expected to delete the folders you do not want.

[Have them look at the folders on GitHub]

docs: This folder contains relevant documentation.

examples: This folder contains an example for training purposes.

include: This folder contains generic formatting scripts - gradually being deprecated.

lib: This folder is used to refer to reusable code that belongs to other repositories.

logs: This folder is used to store the output logs that SAS generates. This is used for the cross agency outcomes that have a lot of code to run.

output: This folder is used to store analytical output such as Excel tables and graphics.

rprogs: This folder contains R scripts (both a main script and functions).

sasautos: This folder contains SAS macros. All scripts here will be loaded into the SAS environment during the running of setup in the main script.

sasprogs: This folder contains SAS programs. The main script that builds the dataset is located here, as well as other SAS scripts that are not macros.

unittests: This folder contains unit tests for those who wish to debug the code.

What files are already in this template repo?

[show the sql template example]

[show that the readme is already pre-populated]

What should I do when I first start a project?

git clone --recursive https://github.com/nz-social-investment-agency/sia_repo_template.git

Then make sure you are working with the latest version of the SIAL, SIDF and so on. To create these as submodules, follow the instructions in the submodule section of the version control page.
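As a rough sketch of that check, the function below fills in any empty submodule folders, moves each submodule to the latest commit on the remote branch Git tracks for it, and prints the commit each one ends up on. The name update_submodules is made up for illustration, and `git submodule update --remote` is offered as an alternative to pulling inside each submodule by hand, not as the team's established method.

```shell
# Sketch only: update_submodules is a hypothetical helper name.
# Run it from the top level of the cloned project.
update_submodules() {
  # Populate any submodule folders that the clone left empty
  git submodule update --init --recursive &&
  # Check out the latest commit on each submodule's tracked remote branch
  git submodule update --remote &&
  # Print one line per submodule: status flag, commit SHA, path
  git submodule status
}
```

Any submodule that moved will then show up as a change in `git status` for the project, ready to be committed.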

What happens if we need to make enhancements to our submodule repos for a particular project?

For example we created a few extra indicators for the data foundation when we did the vulnerable mothers work. We will use this example throughout this section to explain where to make changes.

Where do I make these enhancements?

Currently the IDI does not have a version control tool. This means that we must manually replicate the branching and merging process so that our production copies of our reusable repositories are not accidentally modified. It sounds very convoluted but the team has had a lot of trouble signing things out of the IDI only to find it takes hours to reconcile the differences. Doing it this way will help minimise problems.

Make your changes in the sia_vulnerable_mums_analysis/lib/social_investment_data_foundation folder in the IDI. This is our manual version of branching.

Modify only the submodule and do not modify the social_investment_data_foundation production copy in the top level of our IDI project folder. Other people might be working on projects that use the same reusable code. If you overwrite the master copy it could break the data build in the other project. We experienced this during building of the data foundation and updating the SIAL. We modified the master copies and as a result the social housing code rebuild fell over.

Do not create and modify copies in your personal folders. This is equivalent to using the stash in version control. We recently ran an entire end-to-end data build and found that the requested changes to the data foundation appeared not to have been made. It turned out they had been made, but someone had saved the copy in their personal folder. The sasautos call only picks up macros in sasautos, so it did not pick up the changes.

How do I merge these enhancements back to the master?

Resolving Conflicts

If there are two projects concurrently using our reusable code then you need to confirm with the other project that your changes will not create conflicts with their build. If it is straightforward enough (e.g. you added one new script and did not modify any other scripts) then you might get away with verbal confirmation. For more complicated changes, one way to check there are no conflicts is to point the sasautos call to the other project's submodule and see if the code still runs end to end. For example, imagine we were working on vulnerable mothers and social housing at the same time. If we ran the vulnerable mothers data build with the social housing submodules and could not build the tables, this would confirm there are conflicts.
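Before attempting the full end-to-end run, a quick file-level comparison of the two submodule copies can show which scripts differ. This is only a minimal sketch and not a replacement for running the build. It assumes a shell with diff is available wherever the two copies sit, and the helper name compare_copies and the example paths are made up for illustration.

```shell
# Sketch only: compare_copies is a hypothetical helper name.
# Lists files that differ, or exist in only one copy, between two folders.
# Prints nothing and returns 0 when the copies match.
compare_copies() {
  diff -rq "$1" "$2"
}

# Example call (paths are illustrative):
# compare_copies sia_vulnerable_mums_analysis/lib/social_investment_data_foundation \
#                social_housing/lib/social_investment_data_foundation
```

Any files it lists are the ones worth talking through with the other project before merging.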

Merging back to master

Updating the reusable tools (SIAL, SIDF etc)

Once you have confirmed there are no conflicts, sign the project code out of the IDI. Pull down the latest version of the reusable repos, such as the social_investment_data_foundation, onto your local machine.

cd /c/NotBackedUp/social_investment_data_foundation

git pull origin master

Copy and paste the submodule version of the social_investment_data_foundation into your local copy. Push these changes through to the remote repository.

git add .

git commit -m "added disability indicator"

git push origin master

Creating the project repository - if submodules do not exist

If you have not created submodules because this is your first push of the project code for vulnerable mothers then go to the submodule section on the version control page and follow those instructions.

Updating the project repository - once the submodules are set up

If you are updating the vulnerable mothers project repository code then go to your project which has the changes and pull down the submodule changes. Check the SHA number has been updated.

cd /c/NotBackedUp/sia-vulnerable-mothers-analysis/lib/social_investment_data_foundation

git pull origin master
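One way to check the SHA is to look at `git submodule status` from the top level of the project. Git prefixes a line with + when the checked-out submodule commit differs from the one the project has recorded, with - when the submodule is not initialised, and with U when it has merge conflicts. The sketch below keeps only those flagged lines. The helper name changed_submodules is made up for illustration.

```shell
# Sketch only: changed_submodules is a hypothetical helper name.
# Reads `git submodule status` output on stdin and prints the lines
# that start with '+', '-' or 'U'.
changed_submodules() {
  awk '/^[-+U]/'
}

# Example call from the top level of the project:
# git submodule status | changed_submodules
```

After the pull above, the social_investment_data_foundation line should appear with a + until the new SHA is staged in the project with git add.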

Copy and paste your project files into your local project. Git compares file contents rather than how the files got there, so pasting over submodule files that already match what you just pulled will not flag the submodules as modified. Push the project changes through to GitHub.

cd /c/NotBackedUp/sia-vulnerable-mothers-analysis/

git add .

git commit -m "updated clustering method"

git push origin master

Pulling the master copies into the IDI

Zip up the production copies of the code and email them to [email protected] asking them to drop them into our IDI folder.

Yes, it is very convoluted, but we must be able to track changes to all our repositories, and without a version control tool in the IDI this is the best way we can do it.

Note: If you feel your Git skills are at an advanced level feel free to push the data foundation changes via the submodules.
