Commit

Merge pull request #101 from nhs-r-community/redact-git
Added content about Git and GitHub
Lextuga007 authored Mar 4, 2024
2 parents 4d6a00a + b23b121 commit c8eda06
Showing 4 changed files with 161 additions and 22 deletions.
1 change: 1 addition & 0 deletions _quarto.yml
@@ -18,6 +18,7 @@ book:
appendices:
- technical-r.qmd
- technical-python.qmd
- technical-git.qmd
- contribution.qmd
- glossary.qmd
page-footer:
2 changes: 1 addition & 1 deletion contribution.qmd
@@ -39,7 +39,7 @@ Particularly for packages like [{NHSRdatasets}](https://github.com/nhs-r-communi
- Add your dataset in the `data` folder, in `.rda` format.
The best way to do this is with the {usethis} package with "gzip" compression: `usethis::use_data(data, compress="gzip")`

## Contributing guidelines and etiquette
## Contributing guidelines and etiquette {#contrib}

- Preview your Markdown code to make sure the format is not broken.
- Material, including commit messages, should be written in clear and simple English.
105 changes: 84 additions & 21 deletions statement-on-using-tools-git.qmd
@@ -1,13 +1,17 @@
# Statement on using tools - Git and GitHub

\[TODO\] Holding text to be clear this is broader in scope than just R
Using R (or Python) is just the first step towards data science: Git is item two on the ["baseline fundamentals"](https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/#baseline-rap---getting-the-fundamentals-right) level of Reproducible Analytical Pipelines (RAP).
Without Git it is not possible to move to the Silver or Gold levels, and with good reason, because it is integral to producing analysis and code that is:

Git is item two on the "baseline fundamentals" level of RAP, and necessary before moving on much further. https://nhsdigital.github.io/rap-community-of-practice/introduction_to_RAP/levels_of_RAP/#baseline-rap---getting-the-fundamentals-right
- more efficient
- more robust
- more transparent

Much more is written about Git in the [NHS RAP Community of Practice](https://nhsdigital.github.io/rap-community-of-practice/training_resources/git/introduction-to-git/) and so this chapter serves to add only a few things to that work.

## GitHub repositories
## GitHub repositories {#github-examples}

Examples of organisations that are using GitHub:
These examples of organisations using GitHub serve as a reference to the increasing number of public sector and Government organisations and teams that publish code:

### NHS

@@ -23,27 +27,86 @@ Examples of organisations that are using GitHub:
* [RAP Community of Practice](https://github.com/NHSDigital/rap-community-of-practice) - NHS England but under NHS Digital's organisation
* [The Strategy Unit](https://github.com/The-Strategy-Unit)


### Government

* [The Analysis Function](https://github.com/best-practice-and-impact)
* Cabinet Office [central](https://github.com/cabinetoffice) and [Analysis & Insight](https://github.com/co-analysis)
* [Department for Environment, Food and Rural Affairs (Defra) Data Science Centre of Excellence](https://github.com/Defra-Data-Science-Centre-of-Excellence)
* Department for Education
- [Analytical Services](https://github.com/dfe-analytical-services)
- [R Community](https://github.com/DfE-R-Community)
- [DfE Digital](https://github.com/DFE-Digital)
* [Department for Health and Social Care Data Science](https://github.com/DataS-DHSC)
* [Government Digital Service](https://github.com/alphagov)
* Ministry of Justice [central](https://github.com/ministryofjustice) and [Analytical Services](https://github.com/moj-analytical-services)
* [Scottish Government Analysis](https://github.com/ScotGovAnalysis)
* UKHSA [Collaboration](https://github.com/ukhsa-collaboration) and [Internal](https://github.com/UKHSA-Internal)

### Local Government
### England Local Government

* [London Borough of Hackney](https://github.com/LBHackney-IT)
* [Trafford Data Lab](https://github.com/traffordDataLab)
* [Trafford Council](https://github.com/TraffordCouncil)

### Charity

* [The Health Foundation](https://github.com/HFAnalyticsLab)

### Scotland

- [Scottish Government Analysis](https://github.com/ScotGovAnalysis) (Analysis / Statistics / Data Science)
- [Scottish Government](https://github.com/scottishgovernment) (Web Development)
- [Marine Scotland Science](https://github.com/MarineScotlandScience)
- [Transport Scotland](https://github.com/TransportScotland)
- [Public Health Scotland](https://github.com/Public-Health-Scotland)

### Scottish Local authorities

- [Comhairle nan Eilean Siar (Western Isles Council)](https://github.com/cne-siar)
- [City of Edinburgh Council](https://github.com/edinburghcouncil)
- [Dundee City Council - GIS](https://github.com/DundeeCityCouncil)
- [East Lothian Council](https://github.com/ELCwebteam)
- [Falkirk Council](https://github.com/Falkirk-Council)
- [North Ayrshire Council](https://github.com/north-ayrshire-council)
- [North Lanarkshire Council](https://github.com/north-lanarkshire-council)
- [South Ayrshire Council](https://github.com/southayrshire)
- [West Lothian Council](https://github.com/westlothiancouncil)

### UK Government Departments

- [The Analysis Function](https://github.com/best-practice-and-impact)
- Cabinet Office
    - [Central](https://github.com/cabinetoffice)
    - [Analysis & Insight](https://github.com/co-analysis)
- Department for Education
    - [Analytical Services](https://github.com/dfe-analytical-services)
    - [R Community](https://github.com/DfE-R-Community)
    - [DfE Digital](https://github.com/DFE-Digital)
    - [Standard](https://github.com/DFEAGILEDEVOPS)
- Department for Environment, Food and Rural Affairs (Defra)
    - [Data Science Centre of Excellence](https://github.com/Defra-Data-Science-Centre-of-Excellence)
    - [Central](https://github.com/defra)
- [Department for Health and Social Care Data Science](https://github.com/DataS-DHSC)
- [Department for International Trade](https://github.com/uktrade)
- [Department for Levelling Up, Housing and Communities](https://github.com/communitiesuk)
- [Department for Transport](https://github.com/department-for-transport)
- [Foreign, Commonwealth and Development Office](https://github.com/DFID)
- [Government Digital Service](https://github.com/alphagov)
- Ministry of Justice
    - [Analysis](https://github.com/moj-analytical-services)
    - [Central](https://github.com/ministryofjustice)
- UKHSA
    - [Collaboration](https://github.com/ukhsa-collaboration)
    - [Internal](https://github.com/UKHSA-Internal)

#### Office for National Statistics

- [Data Science Campus](https://github.com/datasciencecampus)
- [Government Analysis Function](https://github.com/best-practice-and-impact)
- [ONS Big Data](https://github.com/ONSBigData)
- [ONS Data Visualisation](https://github.com/ONSvisual)
- [ONS Digital](https://github.com/ONSdigital)

### Other

- [British Geological Survey](https://github.com/BritishGeologicalSurvey)
- [Central Statistics Office, Ireland](https://github.com/CSOIreland)


:::{.callout-note collapse=false appearance='default' icon=true}
## Scottish Government Analysis list
Thanks to Scottish Government Analysis, who have also published a comprehensive list of organisational [GitHubs](https://github.com/ScotGovAnalysis/welcome/blob/main/public-sector-github-orgs.md).
:::


This is not a complete list and links may change, so this is a great opportunity to contribute to this book.
Details on how to do that can be found in the chapter [Contributing guidelines and etiquette](#contrib).

75 changes: 75 additions & 0 deletions technical-git.qmd
@@ -0,0 +1,75 @@
# Technical guidance - Git {#tech-git}

## Installing Git

Download the installer from [https://git-scm.com/downloads](https://git-scm.com/downloads).

The NHS RAP Community of Practice has a [Git Quick Start Guide](https://nhsdigital.github.io/rap-community-of-practice/training_resources/git/quick_start_guides/git_quick_start_guide/) written for Terminal commands.

## Set up using R

Follow the course materials developed by [R Forwards](https://forwards.github.io/workshops/package-dev-modules/slides/02-setting-up-system/setting-up-system.html#1) or the [NHS-R Community Introduction to Git and GitHub using R](https://intro-git-github.nhsrcommunity.com/), which is also based on the R Forwards slides.

## Removing sensitive and patient identifiable information

GitHub recommend using [BFG Repo-Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) as a quick and efficient way of deleting files and their history from past commits, and it can be used across all branches.
BFG Repo-Cleaner is also good for removing very large files.

It requires [Java](https://www.java.com/en/download/manual.jsp) to be installed, which may need administrator rights, as well as the [BFG Repo-Cleaner `.jar` file](https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar).

Once downloaded, put `bfg-1.14.0.jar` in the folder containing the file that you wish to delete.

:::{.callout-tip collapse=false appearance='default' icon=true}
## Take a copy! (tip)
The documentation recommends taking a copy of the repository before making any changes. The code in the BFG Repo-Cleaner documentation is for the Terminal; alternatively you can copy the local folder.
:::

Delete the file and commit that so that the latest commit is _clean_ and doesn't contain the undesired data.

Using the Terminal (on Windows), type:

```
java -jar bfg-1.14.0.jar --delete-files my_sensitive_file.rda
```
If the file and `bfg-1.14.0.jar` are in a subfolder, amend only the path to `bfg-1.14.0.jar`:

```
java -jar example\subfolder\bfg-1.14.0.jar --delete-files my_sensitive_file.rda
```

If it works then you should get a whole list of information about the deletion.
However, if it doesn't work then you will get information on the ways you can use the program.

:::{.callout-important collapse=false appearance='default' icon=true}
## Other file types (important)
If the sensitive data is part of something like a Quarto report, website or book then a corresponding `html` file will also have to be deleted.
:::
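
Between running BFG and pushing, the BFG Repo-Cleaner documentation also describes a clean-up step: BFG rewrites the commits, but the old data stays in the local repository until the reflog is expired and Git's garbage collector has run. The following is a minimal sketch, assuming a Bash-compatible shell such as Git Bash; the function name `cleanup_after_bfg` is only illustrative and is not part of BFG or Git.

```shell
# Hypothetical helper (illustrative name, not part of BFG or Git).
# Intended to be run after BFG has finished and before pushing.
cleanup_after_bfg() {
  repo_dir="${1:-.}"   # path to the repository; defaults to the current folder

  # Expire all reflog entries so nothing still points at the old commits,
  # then remove the objects that are no longer reachable.
  git -C "$repo_dir" reflog expire --expire=now --all &&
    git -C "$repo_dir" gc --prune=now --aggressive
}
```

Calling `cleanup_after_bfg .` from the repository folder runs both commands in turn; the two `git` commands can equally be typed directly.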

Once the file has been removed from the Git history, type:

```
git push --force
```
:::{.callout-important collapse=false appearance='default' icon=true}
## Changing history (important)
Using the BFG Repo-Cleaner changes the Git history on main and may also do this for branches.

Anything that has already been cloned, forked or downloaded from GitHub will be unaffected and you may need to contact GitHub directly to ensure this information is removed from GitHub repositories.
:::

## Disaster planning

Publishing sensitive data will require an incident to be raised within your organisation and may be classed as a breach, which can put stress and pressure on the people involved.
To react quickly and efficiently, it's advisable to practise deleting Git histories on repositories that don't contain sensitive information.
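
One way to rehearse is on a throwaway repository. The sketch below assumes a Bash-compatible shell (for example Git Bash) with Git installed; the folder location, the commit identity and the file contents are hypothetical placeholders, and the file holds no real data.

```shell
# Build a scratch repository containing a harmless stand-in for a sensitive file.
practice_dir="$(mktemp -d)/bfg-practice"   # hypothetical location
git init --quiet "$practice_dir"
cd "$practice_dir" || exit 1

# Identity used only inside this practice repository.
git config user.name "Practice User"
git config user.email "practice@example.com"

# Commit the dummy file, then delete it in a second commit so that the
# latest commit is clean before any history rewriting is attempted.
echo "not real data" > my_sensitive_file.rda
git add my_sensitive_file.rda
git commit --quiet -m "Add dummy file"
git rm --quiet my_sensitive_file.rda
git commit --quiet -m "Remove dummy file"

# Count the commits that still touch the file. Running the same command
# again after BFG Repo-Cleaner lets you check whether history was rewritten.
commits_with_file="$(git log --all --oneline -- my_sensitive_file.rda | wc -l)"
echo "Commits touching the file: $commits_with_file"
```

Because the repository is never pushed anywhere, the BFG commands can be tried on it repeatedly without risk.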

Prevention is also better than recovery and many of the teams and organisations listed in the [Statement on Using Tools - git](#github-examples) are working on preventative measures including using a comprehensive `.gitignore`, git hooks and so on.
[TODO] Contributions on preventing accidental sharing of sensitive data for this book will be welcomed.
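
As one illustration of a Git hook, the sketch below refuses a commit when staged files have extensions that often hold data. It is only a sketch under stated assumptions: the function names and the list of extensions are hypothetical, a Bash-compatible shell is assumed, and the code would be saved as `.git/hooks/pre-commit` with a call to `run_hook` as its last line.

```shell
# Print every file name read from standard input whose extension is on a
# block list. The extensions are illustrative; adjust them for your team.
find_blocked_files() {
  grep -i -E '\.(rda|rds|csv|xlsx)$'
}

# Body of the hook: list staged files (added, copied or modified) and
# return a non-zero status if any of them match the block list.
run_hook() {
  blocked="$(git diff --cached --name-only --diff-filter=ACM | find_blocked_files)"
  if [ -n "$blocked" ]; then
    echo "Commit blocked; these staged files may contain data:" >&2
    echo "$blocked" >&2
    return 1
  fi
  return 0
}
```

Git cancels the commit when the `pre-commit` script exits with a non-zero status, which is why `run_hook` needs to be the final command in the hook file. A hook only protects the local clone it is installed in, so it complements a `.gitignore` and does not replace one.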

## GitHub Personal Access Token

The Personal Access Token (PAT) should never be stored in any file that can be committed and pushed to GitHub.
However, if this does occur, GitHub will contact you to say that your PAT has been revoked.
Your code and history are untouched, but you will need to set up a new PAT to reconnect your local Git to GitHub.
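
A quick local check before committing can catch a PAT that has slipped into a file. This is a rough sketch, assuming a Bash-compatible shell and the `ghp_` and `github_pat_` prefixes that GitHub uses for classic and fine-grained tokens at the time of writing; the function name is hypothetical and the pattern is deliberately loose, so it may flag harmless text.

```shell
# Succeeds (exit status 0) if standard input contains text shaped like a
# GitHub token: "ghp_" for classic PATs, "github_pat_" for fine-grained ones.
contains_github_pat() {
  grep -q -E '(ghp_|github_pat_)[A-Za-z0-9_]+'
}
```

For example, `git diff --cached | contains_github_pat && echo "Possible PAT in staged changes"` prints a warning when the staged changes include a token-like string.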
