How to Use and Update the SIA_repo_template
Overviews and instructions on how to use the SIA repo template are found in the README.
This page contains the notes for the How to Workshops that SIA runs.
Why do we have a template?
The purpose of this repo is to ensure that any new IDI project that depends on other repositories has all the necessary macros available, in the relevant versions. Dependence on other repositories is managed via a Git feature called submodules. This template also has:
- Standard parts of the readme populated
- Standard headers for R, SAS and SQL scripts so there are no excuses for not having headers now :)
Note that the template contains the superset of folders. You are expected to delete the ones you do not need.
[Have them look at the folders on GitHub]
docs: This folder contains relevant documentation.
examples: This folder contains an example for training purposes.
include: This folder contains generic formatting scripts - gradually being deprecated.
lib: This folder is used to refer to reusable code that belongs to other repositories.
logs: This folder is used to store the output logs that SAS generates. This is used for the cross agency outcomes that have a lot of code to run.
output: This folder is used to store analytical output such as Excel tables and graphics.
rprogs: This folder contains R scripts (both a main script and functions).
sasautos: This folder contains SAS macros. All scripts here will be loaded into the SAS environment during the running of setup in the main script.
sasprogs: This folder contains SAS programs. The main script that builds the dataset is located here, as well as other SAS scripts that are not macros.
unittests: This folder contains unit tests for those who wish to debug the code.
[show the sql template example]
[show that the readme is already pre-populated]
git clone --recursive https://github.com/nz-social-investment-agency/sia_repo_template.git
Then make sure you are working with the latest version of the SIAL, SIDF and so on. To create these as submodules, follow the instructions in the submodule section on the version control page.
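After a recursive clone it is worth confirming that the submodule folders were actually populated. The sketch below is self-contained so it can be run anywhere: it builds a throwaway project with one submodule, clones it recursively and prints the submodule status. The names demo_lib, demo_project and demo_clone are hypothetical stand-ins, and the `protocol.file.allow=always` setting is only needed here because the demo uses local paths instead of GitHub URLs.

```shell
#!/bin/sh
# Sketch only: demo_lib, demo_project and demo_clone are hypothetical names.
set -eu

work=$(mktemp -d)
cd "$work"

# Identity used for the throwaway commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for a reusable repository such as the data foundation.
git init -q demo_lib
echo "/* demo macro */" > demo_lib/macro.sas
git -C demo_lib add macro.sas
git -C demo_lib commit -q -m "add demo macro"

# Stand-in for a project created from the template, with the library in lib/.
git init -q demo_project
git -C demo_project -c protocol.file.allow=always \
    submodule --quiet add "$work/demo_lib" lib/demo_lib
git -C demo_project commit -q -m "add demo_lib submodule"

# Recursive clone, as in the command above but from a local path.
git -c protocol.file.allow=always clone -q --recursive \
    "$work/demo_project" demo_clone

# One line per submodule: the commit SHA followed by the path.
# A leading "-" would mean the submodule has not been initialised.
status=$(git -C demo_clone submodule status)
echo "$status"
```

In a real project only the last command is needed: run `git submodule status` from the project folder. If a line starts with `-`, `git submodule update --init --recursive` should populate that folder.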
For example we created a few extra indicators for the data foundation when we did the vulnerable mothers work. We will use this example throughout this section to explain where to make changes.
Currently the IDI does not have a version control tool. This means that we must manually replicate the branching and merging process so that our production copies of our reusable repositories are not accidentally modified. It sounds very convoluted but the team has had a lot of trouble signing things out of the IDI only to find it takes hours to reconcile the differences. Doing it this way will help minimise problems.
Make your changes in the sia_vulnerable_mums_analysis/lib/social_investment_data_foundation folder in the IDI. This is our manual version of branching.
Modify only the submodule and do not modify the social_investment_data_foundation production copy in the top level of our IDI project folder. Other people might be working on projects that use the same reusable code. If you overwrite the production (master) copy, it could break the data build in the other project. We experienced this while building the data foundation and updating the SIAL: we modified the master copies and, as a result, the social housing code rebuild fell over.
Do not create and modify copies in your personal folders. This is equivalent to using the stash in version control. The reason is that we recently ran an entire end-to-end data build and noted that the requested changes to the data foundation had not been made. It turned out they had been made, but someone had saved the copy in their personal folder. The sasautos call only picks up macros in sasautos, so it was not picking up the changes.
If there are two projects concurrently using our reusable code, you need to confirm with the other project that your changes will not create conflicts with their build. If the change is straightforward enough (e.g. you added one new script and did not modify any other scripts), you might get away with verbal confirmation. For more complicated changes, one way to check there are no conflicts is to point the sasautos call to the other project's submodule and see if the code still runs end to end. For example, imagine we were working on vulnerable mothers and social housing at the same time. We try to run the vulnerable mothers data build with the social housing submodules and find we cannot build the tables. This would confirm there are conflicts.
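Before a full end-to-end run, a quick first pass is to list which files differ between the two projects' copies of the same submodule. The sketch below assumes a Unix-style shell with `diff` available (for example Git Bash on a local machine). The folder layout and macro names are hypothetical stand-ins created in a temporary directory. It does not replace the end-to-end run, but it shows where to look.

```shell
#!/bin/sh
# Sketch only: the folders and macro names below are hypothetical stand-ins.
set -eu

work=$(mktemp -d)
mums="$work/vulnerable_mothers/lib/social_investment_data_foundation"
housing="$work/social_housing/lib/social_investment_data_foundation"
mkdir -p "$mums/sasautos" "$housing/sasautos"

# A macro that is identical in both copies.
echo "%macro shared; %mend;" > "$mums/sasautos/shared.sas"
echo "%macro shared; %mend;" > "$housing/sasautos/shared.sas"

# A new macro that exists in only one copy (usually low risk).
echo "%macro disability_ind; %mend;" > "$mums/sasautos/disability_ind.sas"

# A macro that both copies have but with different contents (needs a closer look).
echo "%macro common(v=1); %mend;" > "$mums/sasautos/common.sas"
echo "%macro common(v=2); %mend;" > "$housing/sasautos/common.sas"

# -r recurses into subfolders, -q reports only which files differ.
# diff exits with status 1 when it finds differences, hence the "|| true".
report=$(LC_ALL=C diff -rq "$mums" "$housing" || true)
echo "$report"
```

Files reported as "Only in" one copy are additions. Files reported as "differ" were changed in at least one project and are the ones to discuss with the other team.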
Once you have confirmed there are no conflicts, sign the project code out of the IDI. Pull down the latest version of repos such as social_investment_data_foundation onto your local machine.
cd /c/NotBackedUp/social_investment_data_foundation
git pull origin master
Copy and paste the submodule version of the social_investment_data_foundation into your local copy. Push these changes through to the remote repository.
git add .
git commit -m "added disability indicator"
git push origin master
If you have not created submodules because this is your first push of the project code for vulnerable mothers then go to the submodule section on the version control page and follow those instructions.
If you are updating the vulnerable mothers project repository code then go to your project which has the changes and pull down the submodule changes. Check the SHA number has been updated.
cd /c/NotBackedUp/sia-vulnerable-mothers-analysis/lib/social_investment_data_foundation
git pull origin master
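One way to check that the SHA has been updated is to record `git rev-parse HEAD` before and after the pull and compare the two values. The sketch below demonstrates this with throwaway local repositories, so it runs without network access. The names upstream and local_copy are hypothetical, and the branch name is looked up rather than assumed to be master.

```shell
#!/bin/sh
# Sketch only: "upstream" and "local_copy" are hypothetical throwaway repos.
set -eu

work=$(mktemp -d)
cd "$work"

# Identity used for the throwaway commits only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the remote repository on GitHub.
git init -q upstream
echo "v1" > upstream/indicator.sas
git -C upstream add indicator.sas
git -C upstream commit -q -m "initial indicator"
branch=$(git -C upstream symbolic-ref --short HEAD)

# Stand-in for the copy of the submodule on your local machine.
git clone -q upstream local_copy
before=$(git -C local_copy rev-parse HEAD)

# Someone pushes a change to the remote.
echo "v2" > upstream/indicator.sas
git -C upstream commit -q -am "added disability indicator"

# Pull, then compare the SHA with the one recorded earlier.
git -C local_copy pull -q origin "$branch"
after=$(git -C local_copy rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```

If the two values are identical after a pull, either nothing new was pushed or the pull did not reach the commit you expected.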
Copy and paste your project files into your local project. Git compares file contents rather than tracking copy-and-paste operations, so as long as the pasted submodule files match what you just pulled, your submodules will not be flagged as modified. Push the project changes through to GitHub.
cd /c/NotBackedUp/sia-vulnerable-mothers-analysis/
git add .
git commit -m "updated clustering method"
git push origin master
Zip up the production copies of the code and email them to [email protected] asking them to drop them into our IDI folder.
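One possible way to build the zip, offered as a sketch rather than the team's established method, is `git archive`. It packages exactly what is committed, so the .git folder and any untracked scratch files are left out. The repository below is a throwaway local stand-in and the file names are hypothetical. Note that `git archive` does not include the contents of submodules, so each reusable repository would need its own archive.

```shell
#!/bin/sh
# Sketch only: the repository and its files are throwaway stand-ins.
set -eu

work=$(mktemp -d)

# Identity used for the throwaway commit only.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo="$work/social_investment_data_foundation"
git init -q "$repo"
mkdir -p "$repo/sasautos"
echo "%macro demo; %mend;" > "$repo/sasautos/demo.sas"
git -C "$repo" add sasautos
git -C "$repo" commit -q -m "add demo macro"

# An untracked scratch file: it is not committed, so it will not be archived.
echo "scratch" > "$repo/notes.tmp"

# Write a zip of the committed files at HEAD.
zipfile="$work/social_investment_data_foundation.zip"
git -C "$repo" archive --format=zip --output="$zipfile" HEAD

# List the committed files, which are the ones that went into the zip.
archived=$(git -C "$repo" ls-tree -r --name-only HEAD)
echo "$archived"
```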
Yes, it is very convoluted, but we must be able to track changes to all our repositories, and without a version control tool in the IDI this is the best way we can do it.
Note: If you feel your Git skills are at an advanced level feel free to push the data foundation changes via the submodules.