Merge pull request #124 from nhs-r-community/using-ai
Using ai
Lextuga007 authored Oct 3, 2024
2 parents 96832b9 + 0db13ab commit 37a1020
Showing 6 changed files with 69 additions and 4 deletions.
1 change: 1 addition & 0 deletions _quarto.yml
@@ -21,6 +21,7 @@ book:
- statement-on-using-tools-git.qmd
- statement-on-using-tools-shiny.qmd
- statement-on-using-tools-quarto.qmd
- statement-on-artificial-intelligence.qmd
appendices:
- technical-r.qmd
- technical-python.qmd
51 changes: 51 additions & 0 deletions statement-on-artificial-intelligence.qmd
@@ -0,0 +1,51 @@
# Statement on using artificial intelligence

Artificial intelligence is increasingly part of everyday work, so this chapter highlights some points to consider when using it alongside other data science tools such as R, Python or Git.

Large Language Models (LLMs), through tools such as ChatGPT and Copilot, are increasingly used to generate code, and they can be linked into Python or R through dedicated packages.
Uses include:

- producing code
- generating comments
- explaining code

Access to LLMs also varies across organisations.

## Things to consider

### Breach of sensitive information

Most widely used LLMs are proprietary, closed systems, and anything entered into them should be assumed to be seen by the model provider, retained and reused.

:::{.callout-important collapse=false appearance='default' icon=true}
## Sharing publicly or to a model (important)
Only share code with an LLM if you would be happy to publish it on GitHub!

Assume that an LLM may reuse your code and ideas, because what users enter can be used to train and improve these models.
:::

Sharing code that includes sensitive information is a breach, as highlighted in the chapter on sharing code in [Quarto](#code-tools), where a filter might include something like:

```
filter(NHSNumber == '4564564564')
```

Analysts often write code like this while exploring and investigating data.
It is a legitimate use of code during analysis, but hardcoding identifiers is bad practice, and it becomes a notifiable incident if the code is shared with an LLM, because these models can learn from the prompts they are given as well as from data sources like GitHub.
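Before pasting code into an LLM, it can help to scan it for hardcoded identifiers. The Python sketch below flags any standalone 10-digit number (optionally written in 3-3-4 groups separated by spaces or hyphens) as a possible NHS number. The function name and pattern are illustrative assumptions rather than an established tool; it is a deliberately broad heuristic that will also flag other 10-digit numbers, and it is no substitute for reviewing the code yourself.

```python
import re

# Assumption: an NHS number is 10 digits, sometimes written as 3-3-4 groups
# (e.g. "456 456 4564"). The lookarounds stop longer digit runs from matching.
NHS_NUMBER_PATTERN = re.compile(r"(?<!\d)\d{3}[ -]?\d{3}[ -]?\d{4}(?!\d)")


def find_possible_nhs_numbers(code: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for every NHS-number-like string in `code`."""
    hits = []
    for line_no, line in enumerate(code.splitlines(), start=1):
        for match in NHS_NUMBER_PATTERN.finditer(line):
            hits.append((line_no, match.group()))
    return hits


if __name__ == "__main__":
    snippet = """
library(dplyr)
patients |>
  filter(NHSNumber == '4564564564')
"""
    for line_no, text in find_possible_nhs_numbers(snippet):
        # prints: line 4: possible NHS number '4564564564'; remove before sharing
        print(f"line {line_no}: possible NHS number {text!r}; remove before sharing")
```

An empty result only means no 10-digit pattern was found; names, dates of birth and other identifiers still need a manual check.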

### Code may be out of date

Things move fast in R and Python, and packages are created and updated at a rapid rate.
LLMs like ChatGPT are trained on data up to a cut-off date, so the solutions they give may use out-of-date packages.
Many superseded R packages, such as {httr} (now {httr2}) and {qicharts} (now {qicharts2}), are still available, so the code will still run; it's always worth reviewing the generated code to check whether things have changed.
The difficulty is that the LLM may not be able to give you updated information, even if you are aware of a more up-to-date package.
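A simple review aid is to check LLM-generated R code against a list of packages your team knows to be superseded. The Python sketch below looks for `library()`, `require()` and `pkg::` usages; the mapping holds only the two examples from this chapter, and the helper name and pattern are hypothetical, so treat it as a starting point to extend rather than a complete check.

```python
import re

# Superseded R packages mentioned in this chapter and their successors.
# Illustrative only: extend this mapping with packages your team tracks.
SUPERSEDED = {
    "httr": "httr2",
    "qicharts": "qicharts2",
}

# Matches library(pkg), require(pkg) (quoted or unquoted) and pkg:: usages.
PACKAGE_PATTERN = re.compile(
    r"\b(?:library|require)\(\s*['\"]?([A-Za-z][A-Za-z0-9.]*)['\"]?\s*\)"
    r"|\b([A-Za-z][A-Za-z0-9.]*)::"
)


def find_superseded_packages(r_code: str) -> dict[str, str]:
    """Return {old package: newer package} for superseded packages used in `r_code`."""
    found = {}
    for match in PACKAGE_PATTERN.finditer(r_code):
        package = match.group(1) or match.group(2)
        if package in SUPERSEDED:
            found[package] = SUPERSEDED[package]
    return found


if __name__ == "__main__":
    generated = """
library(httr)
library(qicharts2)
resp <- httr::GET("https://example.org")
"""
    print(find_superseded_packages(generated))  # prints {'httr': 'httr2'}
```

Because it is a plain text search, it will not catch packages loaded indirectly, and a match only suggests where to look rather than proving the code needs changing.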

### Other issues

The points above are technical and analytical, but there are many other considerations when using LLMs like ChatGPT, including environmental impact, bias, hallucination (where the model fills gaps in its information with plausible but invented content) and accusations of labour exploitation against the companies that own these models.

These are all serious issues that are largely hidden from the user, and because use of LLMs does not have to be declared, we don't really know who is using them or for what.

## Useful resources

[Approval and use of Artificial Intelligence Policy](https://www.linkedin.com/posts/andy-mayne_approval-use-of-artificial-intelligence-activity-7246442745060278272-ZOjB?utm_source=share&utm_medium=member_desktop) from Somerset NHS Foundation Trust, shared on LinkedIn.
6 changes: 6 additions & 0 deletions statement-on-using-tools-git.qmd
@@ -121,3 +121,9 @@ Thanks to the Scotland Government Analysis who have also published a comprehensi

This is not a complete list, and links may change, so this is also a great opportunity to contribute to this book.
Details on how to do that can be found in the chapter [Contributing guidelines and etiquette](#contrib).

## GitHub Actions

<iframe width="560" height="315" src="https://www.youtube.com/embed/6UpVES4aGgw?si=0EUD1euA_lOvOqMg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

[GitHub Actions sandbox](https://github.com/nottmhospitals/actions_sandbox)
4 changes: 2 additions & 2 deletions statement-on-using-tools-quarto.qmd
@@ -28,12 +28,12 @@ Whilst static charts show HTML code as an image, more interactive charts like {p

Even with PDFs, there are tools such as the R package [{scrapR}](https://github.com/adamkucharski/scrapR) written to extract point data where it is not shared, so small, suppressible numbers should never be shared, even in a chart.

### Code Tools {#code-tools}

A useful feature of Quarto and R Markdown is the [inclusion of code](https://quarto.org/docs/output-formats/html-code.html#code-tools) in the output, controlled through YAML options such as `code-fold` and `code-tools`, but if the code directly references small numbers or sensitive data, for example:

```
filter(NHSNumber == '4564564564')
```
this would be a breach of sensitive information.
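One way to keep an identifier out of the code that Quarto displays is to read it from an environment variable at render time, so the rendered code shows only the variable name. The Python sketch below assumes a hypothetical variable called `PATIENT_OF_INTEREST`, set in your session or in a local configuration file that is never committed; in R the same lookup can be done with `Sys.getenv()`. This only keeps the value out of the displayed code: any output that prints the matching records still needs the same care.

```python
import os

# Hypothetical variable name: set it outside the document so the identifier
# never appears in the code shown by code-fold or code-tools.
ID_VARIABLE = "PATIENT_OF_INTEREST"


def filter_patient(records: list[dict], id_variable: str = ID_VARIABLE) -> list[dict]:
    """Keep only the records whose NHSNumber matches the value held in the environment."""
    target = os.environ.get(id_variable)
    if target is None:
        # Fail loudly rather than silently returning no rows.
        raise KeyError(f"Set the {id_variable} environment variable before rendering.")
    return [row for row in records if row.get("NHSNumber") == target]
```

Raising an error when the variable is missing makes a misconfigured render obvious instead of producing an empty, misleading result.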

6 changes: 4 additions & 2 deletions statement-on-using-tools-r.qmd
@@ -100,6 +100,8 @@ As we've discussed above, the philosophy of using packages in R is rather differ

> It is also important to highlight that all software has potential vulnerabilities, including the proprietary software that you already have installed. Therefore, good software security practices should be maintained regardless of the software you are using.
## Useful resources

[NHS England Regional Managers of Analytical teams presentation](https://github.com/aporter121/r-stuff/blob/main/Thoughts%20on%20R%20for%20regional%20leads.pptx)

The Digital Technology Assessment Criteria for Health and Social Care [(DTAC) for R](https://www.linkedin.com/feed/update/urn:li:activity:7247179090808438784/?originTrackingId=T%2B3IdsxgS0isgVbrQckHtg%3D%3D) from [South West Analytics & Infrastructure in Healthcare](https://www.linkedin.com/company/swaih/about/) shared on LinkedIn.
5 changes: 5 additions & 0 deletions technical-python.qmd
@@ -24,3 +24,8 @@ The default location for the folder is in `C:/Users/my.name` so a few commands c
- `dir` means directory and will show all the folders and files
:::

## Useful resources

The Digital Technology Assessment Criteria for Health and Social Care [(DTAC) for JupyterHub](https://www.linkedin.com/feed/update/urn:li:activity:7247179393335193602/?originTrackingId=ktTpD0p2Tz6vRt7H6GTogw%3D%3D) from [South West Analytics & Infrastructure in Healthcare](https://www.linkedin.com/company/swaih/about/) shared on LinkedIn.

The Digital Technology Assessment Criteria for Health and Social Care [(DTAC) for Python](https://www.linkedin.com/feed/update/urn:li:activity:7247179090808438784/?originTrackingId=T%2B3IdsxgS0isgVbrQckHtg%3D%3D) from [South West Analytics & Infrastructure in Healthcare](https://www.linkedin.com/company/swaih/about/) shared on LinkedIn.
