diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 54df7a8..939ee06 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -42,12 +42,12 @@ install.packages("lintr") install.packages("styler") ``` -Where possible, we'd recommend following the [Test Driven Development (TDD)](https://testdriven.io/test-driven-development/) approach: +Where possible, we'd recommend following the [Test Driven Development (TDD)](https://testdriven.io/test-driven-development/) approach. However, if you're new to package development, or already have code for a specific function, feel free to start with step 2 by copying the function into the package, and then go back to step 1 afterwards. -1. Write tests for the behaviour you want. Either edit an existing test script, or if adding a new function, create a test script using: +1. Write tests using [testthat](https://r-pkgs.org/testing-basics.html) for the behaviour you want. Either edit an existing test script, or if adding a new function, create a test script using: ``` r -usethis::usetest("name_of_new_function") +usethis::use_test("name_of_new_function") ``` 2. Write just enough code so that the tests pass. Again, either edit an existing function, or add a new R script using: @@ -56,7 +56,21 @@ usethis::usetest("name_of_new_function") usethis::use_r("name_of_new_function") ``` -3. Add documentation for what you've done. Follow the [roxygen2](https://roxygen2.r-lib.org/articles/rd.html) pattern for comments. +3. Add documentation for what you've done. Follow the [roxygen2](https://roxygen2.r-lib.org/articles/rd.html) pattern for comments. Here's an example of what it looks like for a basic `add()` function: + +``` r +#' @description Add together two numbers +#' +#' @param x A number. +#' @param y A number. +#' @return A number. +#' @examples +#' add(1, 1) +#' add(10, 1) +add <- function(x, y) { + x + y +} +``` 4. Continue to improve code while keeping tests passing. 
You can automatically style code using: @@ -64,11 +78,11 @@ usethis::use_r("name_of_new_function") styler::style_pkg() ``` -5. Run a full check of the package. Here's a few ways you can do this: +5. Run a full check of the package using the following functions: ``` r devtools::check() # General package check, can also use Ctrl-Shift-E -lintr::lint_pkg() # Check styling of code +lintr::lint_package() # Check formatting of code spelling::spell_check() # Check for spelling mistakes ``` @@ -76,10 +90,12 @@ spelling::spell_check() # Check for spelling mistakes Keyboard shortcuts for the `devtools` package to use while in RStudio: -* `load_all()` (Ctrl-Shift-L): Load code with dfeR package -* `test()` (Ctrl-Shift-T): Run tests -* `document()` (Ctrl-Shift-D): Rebuild docs and NAMESPACE -* `check()` (Ctrl-Shift-E): Check complete package +``` r +load_all() # (Ctrl-Shift-L): Load code with dfeR package +test() # (Ctrl-Shift-T): Run tests +document() # (Ctrl-Shift-D): Rebuild docs and NAMESPACE +check() # (Ctrl-Shift-E): Check complete package +``` We recommend using the [usethis](https://usethis.r-lib.org/index.html) package where possible for consistency and simplicity. @@ -87,7 +103,7 @@ We recommend using the [usethis](https://usethis.r-lib.org/index.html) package w Add any packages the package users will need with: ``` r -usethis::use_package(pkgname, type = "imports") +usethis::use_package(pkgname) ``` Add any packages that package developers only may need with: @@ -159,7 +175,19 @@ lintr::lint_package() ### Testing -We use [testthat](https://cran.r-project.org/package=testthat) for unit tests, we expect all new functions to have some level of test coverage. +We use [testthat](https://cran.r-project.org/package=testthat) for unit tests, we expect all new functions to have some level of test coverage. + +If you want to see examples of existing tests for inspiration, take a look inside the `tests/testthat/` folder. 
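
As a rough illustration of what such a test script can look like, here is a minimal sketch that reuses the basic `add()` function from the roxygen2 example earlier in this guide. `add()` is only an illustrative function and is not part of dfeR; in a real test script the function would come from the package, so it would not be defined in the test file itself.

``` r
library(testthat)

# Illustrative function only, copied from the roxygen2 example in this guide.
# In a real test script this would live in R/ and not be defined here.
add <- function(x, y) {
  x + y
}

test_that("add() sums two numbers", {
  expect_equal(add(1, 1), 2)
  expect_equal(add(10, 1), 11)
})

test_that("add() rejects non-numeric input", {
  # Adding a character string to a number errors in base R
  expect_error(add("1", 1))
})
```

Keeping each `test_that()` block focused on one behaviour tends to make failures easier to trace back to the code that caused them.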
+ +### Test coverage + +There are GitHub Actions workflows that check the package and link it to [codecov.io](https://app.codecov.io/gh/dfe-analytical-services/), which runs automatic scans to check the percentage of lines in functions that our tests cover. On the [dfeR codecov pages](https://app.codecov.io/gh/dfe-analytical-services/dfeR) you can see how coverage varies by branch and commit, to judge the impact of changes made. + +You will need to create an account or log in using GitHub to see the pages. + +The current coverage percentage is shown as a badge on the package [README on GitHub](https://github.com/dfe-analytical-services/dfeR). + +It is worth noting that 100% coverage does not mean that the tests are perfect. It only means that every line is run by at least one test, so it is more a measure of quantity than quality. It is still useful to look at, and we'd recommend using it to spot any parts of more complicated functions that you may have forgotten to test. ### Spelling @@ -179,6 +207,14 @@ To automatically pick up genuine new words in the package and add to this list, spelling::update_wordlist() ``` +## Adding vignettes + +Vignettes can be found in the `vignettes/` folder as .Rmd files. To start a new one use: + +``` r +usethis::use_vignette("name_of_vignette") +``` + ## Code of Conduct Please note that the dfeR project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project you agree to abide by its terms. 
diff --git a/.gitignore b/.gitignore index 9168bf8..16add95 100644 --- a/.gitignore +++ b/.gitignore @@ -4,3 +4,4 @@ .httr-oauth .DS_Store docs +inst/doc diff --git a/DESCRIPTION b/DESCRIPTION index d0b083b..6b627b0 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,11 +1,14 @@ Type: Package Package: dfeR Title: Common DfE R tasks -Version: 0.1.1 +Version: 0.2.0 Authors@R: c( person("Cam", "Race", , "cameron.race@education.gov.uk", role = c("aut", "cre")), person("Laura", "Selby", , "laura.selby@education.gov.uk", role = "aut"), - person("Adam", "Robinson", role = "aut") + person("Adam", "Robinson", role = "aut"), + person("Jen", "Machin", , "jen.machin@education.gov.uk", role = "ctb"), + person("Rich", "Bielby", , "richard.bielby@education.gov.uk", role = "ctb", + comment = c(ORCID = "0000-0001-9070-9969")) ) Description: This package contains R functions to allow DfE analysts to re-use code for common analytical tasks that are undertaken across the @@ -17,8 +20,12 @@ BugReports: https://github.com/dfe-analytical-services/dfeR/issues Imports: lifecycle Suggests: + knitr, + rmarkdown, spelling, testthat (>= 3.0.0) +VignetteBuilder: + knitr Config/testthat/edition: 3 Encoding: UTF-8 Language: en-GB diff --git a/NAMESPACE b/NAMESPACE index 47667d4..28e6a27 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -4,5 +4,6 @@ export(format_ay) export(format_ay_reverse) export(format_fy) export(format_fy_reverse) +export(get_clean_sql) export(round_five_up) importFrom(lifecycle,deprecated) diff --git a/NEWS.md b/NEWS.md index cb6ad3e..098dd4a 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,18 @@ +# dfeR 0.2.0 + +Add function for formatting financial years: + +- format_fy() + +Add reversing functions for academic and financial years: + +- format_ay_reverse() +- format_fy_reverse() + +Add function for grabbing and cleaning a SQL script, and vignette for connecting to SQL. + +- get_clean_sql() + # dfeR 0.1.1 Add default value to decimal place argument of round_five_up() function. 
diff --git a/R/get_clean_sql.R b/R/get_clean_sql.R new file mode 100644 index 0000000..4512258 --- /dev/null +++ b/R/get_clean_sql.R @@ -0,0 +1,66 @@ +#' Get a cleaned SQL script into R +#' +#' @description +#' This function cleans a SQL script, ready for using within R in the DfE. +#' +#' @param filepath path to a SQL script +#' @param additional_settings TRUE or FALSE boolean for the addition of +#' settings at the start of the SQL script +#' @return Cleaned string containing SQL query +#' @export +#' @examples +#' # This assumes you have already set up a database connection +#' # and that the filepath for the function exists +#' # For more details see the vignette on connecting to SQL +#' +#' # Pull a cleaned version of the SQL file into R +#' if (file.exists("your_script.sql")) { +#' sql_query <- get_clean_sql("your_script.sql") +#' } +get_clean_sql <- function(filepath, additional_settings = FALSE) { + if (!additional_settings %in% c(TRUE, FALSE)) { + stop( + "additional_settings must be either TRUE or FALSE" + ) + } + + # check filepath leads to a SQL file + if (tolower(tools::file_ext(filepath)) != "sql") { + stop("filepath must point to a SQL script, with a .sql extension") + } + + # The file() function will error if the file can't be found + # Open a connection to the file + con <- file(filepath, "r") + sql_string <- "" + + while (TRUE) { + line <- readLines(con, n = 1) + + if (length(line) == 0) { + break + } + + line <- gsub("\\t", " ", line) + line <- gsub("\\n", " ", line) + + if (grepl("--", line) == TRUE) { + line <- paste(sub("--", "/*", line), "*/") + } + + sql_string <- paste(sql_string, line) + } + + # Close connection to the file + close(con) + + if (additional_settings == TRUE) { + # Prefix the query with settings that sometimes help + sql_string <- paste( + "SET ANSI_PADDING OFF SET NOCOUNT ON;", + sql_string + ) + } + + return(sql_string) +} diff --git a/README.Rmd b/README.Rmd index 4c3ad5d..3feb848 100644 --- a/README.Rmd +++ b/README.Rmd @@ -91,3 +91,5 @@ 
This is a basic example showing the `format_ay()` function: library(dfeR) format_ay(202425) ``` + +For more details on all the functions available in this package, and examples of how to use them, please see our [dfeR package reference documentation](https://dfe-analytical-services.github.io/dfeR/reference/index.html). diff --git a/README.md b/README.md index 703c509..5cfc5b4 100644 --- a/README.md +++ b/README.md @@ -104,3 +104,7 @@ library(dfeR) format_ay(202425) #> [1] "2024/25" ``` + +For more details on all the functions available in this package, and +examples of how to use them, please see our [dfeR package reference +documentation](https://dfe-analytical-services.github.io/dfeR/reference/index.html). diff --git a/_pkgdown.yml b/_pkgdown.yml index 8c9b13d..99f7549 100644 --- a/_pkgdown.yml +++ b/_pkgdown.yml @@ -5,3 +5,16 @@ template: bslib: pkgdown-nav-height: 81.4468px code-color: "ffffff" + +reference: +- title: Helper functions + contents: + - round_five_up + +- title: Formatting + contents: + - starts_with("format_") + +- title: Database connections + contents: + - get_clean_sql diff --git a/inst/WORDLIST b/inst/WORDLIST index 226aeef..8ffe17f 100644 --- a/inst/WORDLIST +++ b/inst/WORDLIST @@ -2,11 +2,14 @@ CMD Codecov DfE Lifecycle +ORCID +SSMS ay -ch +dbplyr dfeshiny -ethz +fy lauraselby +odbc pkgdown renv -stat +sql diff --git a/man/dfeR-package.Rd b/man/dfeR-package.Rd index 8ef5953..7bb801b 100644 --- a/man/dfeR-package.Rd +++ b/man/dfeR-package.Rd @@ -28,5 +28,11 @@ Authors: \item Adam Robinson } +Other contributors: +\itemize{ + \item Jen Machin \email{jen.machin@education.gov.uk} [contributor] + \item Rich Bielby \email{richard.bielby@education.gov.uk} (\href{https://orcid.org/0000-0001-9070-9969}{ORCID}) [contributor] +} + } \keyword{internal} diff --git a/man/get_clean_sql.Rd b/man/get_clean_sql.Rd new file mode 100644 index 0000000..49a5c38 --- /dev/null +++ b/man/get_clean_sql.Rd @@ -0,0 +1,30 @@ +% Generated by roxygen2: do not edit by hand 
+% Please edit documentation in R/get_clean_sql.R +\name{get_clean_sql} +\alias{get_clean_sql} +\title{Get a cleaned SQL script into R} +\usage{ +get_clean_sql(filepath, additional_settings = FALSE) +} +\arguments{ +\item{filepath}{path to a SQL script} + +\item{additional_settings}{TRUE or FALSE boolean for the addition of +settings at the start of the SQL script} +} +\value{ +Cleaned string containing SQL query +} +\description{ +This function cleans a SQL script, ready for using within R in the DfE. +} +\examples{ +# This assumes you have already set up a database connection +# and that the filepath for the function exists +# For more details see the vignette on connecting to SQL + +# Pull a cleaned version of the SQL file into R +if (file.exists("your_script.sql")) { + sql_query <- get_clean_sql("your_script.sql") +} +} diff --git a/tests/sql_scripts/simple.sql b/tests/sql_scripts/simple.sql new file mode 100644 index 0000000..8e3865d --- /dev/null +++ b/tests/sql_scripts/simple.sql @@ -0,0 +1,3 @@ +-- Simple SQL script to get all data from my database table + +SELECT * FROM [my_database_table] diff --git a/tests/testthat/test-get_clean_sql.R b/tests/testthat/test-get_clean_sql.R new file mode 100644 index 0000000..7aa4ea2 --- /dev/null +++ b/tests/testthat/test-get_clean_sql.R @@ -0,0 +1,54 @@ +test_that("Can retrieve basic script", { + expect_equal( + get_clean_sql("../sql_scripts/simple.sql"), + paste0( + " /* Simple SQL script to get all data from my database table", + " */ SELECT * FROM [my_database_table]" + ) + ) +}) + +test_that("Adds additional settings", { + # Check that the output starts with the desired lines + expect_true( + grepl( + "^SET ANSI_PADDING OFF SET NOCOUNT ON;", + get_clean_sql("../sql_scripts/simple.sql", additional_settings = TRUE) + ) + ) +}) + +test_that("Doesn't add additional settings", { + # Check that the output doesn't start with the additional lines + expect_false( + grepl( + "^SET ANSI_PADDING OFF SET NOCOUNT ON;", + 
 get_clean_sql("../sql_scripts/simple.sql", additional_settings = FALSE) + ) + ) + # Check the same when additional_settings is left at its default + expect_false( + grepl( + "^SET ANSI_PADDING OFF SET NOCOUNT ON;", + get_clean_sql("../sql_scripts/simple.sql") + ) + ) +}) + +test_that("Rejects non-boolean values for additional_settings", { + expect_error( + get_clean_sql("../sql_scripts/simple.sql", additional_settings = "True"), + "additional_settings must be either TRUE or FALSE" + ) + expect_error( + get_clean_sql("../sql_scripts/simple.sql", additional_settings = ""), + "additional_settings must be either TRUE or FALSE" + ) +}) + +test_that("Rejects files that don't have a SQL extension", { + expect_error( + get_clean_sql("../spelling.R"), + "filepath must point to a SQL script" + ) +}) diff --git a/vignettes/.gitignore b/vignettes/.gitignore new file mode 100644 index 0000000..097b241 --- /dev/null +++ b/vignettes/.gitignore @@ -0,0 +1,2 @@ +*.html +*.R diff --git a/vignettes/connecting_to_sql.Rmd b/vignettes/connecting_to_sql.Rmd new file mode 100644 index 0000000..6605b48 --- /dev/null +++ b/vignettes/connecting_to_sql.Rmd @@ -0,0 +1,95 @@ +--- +title: "Connecting to SQL" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{connecting_to_sql} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +R can be used to execute SQL scripts that extract data from a database, as well as to query the database directly. There are three primary ways to do this: + +1. executing a separate SQL script from R +2. writing strings of SQL code in your R script +3. using [dbplyr](https://dbplyr.tidyverse.org/) to query a database using R code + +Which you use will depend on how comfortable you are with SQL and R, and on whether you already have existing SQL scripts that you want to execute or are writing new database queries. 
This vignette focuses on that first example, using the `get_clean_sql()` function to read in a separate SQL script and execute from R. + +For more information on the other two methods, or on troubleshooting the connection between R and SQL in the Department for Education (DfE), please see the [Interacting with a SQL database section of our Analysts' Guide](https://dfe-analytical-services.github.io/analysts-guide/learning-development/r.html#interacting-with-a-sql-database-from-within-r-scripts). + +## Connecting to a database + +Usually in the DfE we use a combination of [odbc](https://odbc.r-dbi.org/) and [DBI](https://dbi.r-dbi.org/) to connect to our SQL databases. + +How you connect will vary depending on whether you're running R code on your laptop, or as a part of a deployed R Shiny app. For running code on your laptop you can automatically use your Windows login (a trusted connection) to grant you access to the database, as the package can automatically detect your user details. + +```{r local connection, eval=FALSE} +# Library calls ==== + +library(odbc) +library(DBI) + +# Database connection ==== + +con <- DBI::dbConnect(odbc::odbc(), + Driver = "ODBC Driver 17 for SQL Server", + Server = "server_name", + Database = "database_name", + UID = "", + PWD = "", + Trusted_Connection = "Yes" +) +``` + +For advice on finding your database details, or connecting to a SQL database from an R Shiny app that is deployed on a server, please contact the [Statistics Development Team](mailto:statistics.development@education.gov.uk) who will be able to advise on the setup and steps required. + +## Reading a SQL script into R + +There are a number of standard characters found in SQL scripts that can cause issues when reading in a SQL script within R and we have created the `get_clean_sql()` function to assist with this. 
Assuming you have connected to the database and assigned that connection to a `con` variable, you can then use the following line to read a cleaned version of your SQL script into R. + +```{r reading clean sql, eval=FALSE} +sql_query <- dfeR::get_clean_sql("path_to_sql_file.sql") +``` + +## Executing the SQL query + +Now that the SQL query is saved as a variable in the R environment you can pass that into a function to execute against the database. There are a number of ways to do this, though a common one is to use `dbGetQuery()` from the [DBI package](https://dbi.r-dbi.org/), setting the statement as your cleaned SQL query. + +It's important to note that `dbGetQuery()` is intended to work with 'SELECT' style queries only. If you're doing something that isn't a 'SELECT' query, such as writing back into SQL, consider using the `dbExecute()` or `dbSendStatement()` functions from the [DBI package](https://dbi.r-dbi.org/) instead. + +```{r executing sql query, eval=FALSE} +sql_query_result <- DBI::dbGetQuery(con, statement = sql_query) +``` + +As a side note, if your SQL query is short, you could write it directly into the function such as: + +```{r executing sql query inline, eval=FALSE} +sql_query_result <- DBI::dbGetQuery( + con, + statement = "SELECT * FROM [my_database_table]" +) +``` + +### Troubleshooting + +If you hit an error, our first advice would be to check that your SQL script runs in SQL Server Management Studio (SSMS) and is a valid SQL 'SELECT' query that returns a single output. + +Assuming that it runs fine in SSMS, the next thing to try is to set additional settings while cleaning the SQL script. You can do this with the `additional_settings` argument in the `get_clean_sql()` function. 
+ +```{r reading clean sql additional, eval=FALSE} +sql_query <- dfeR::get_clean_sql( + "path_to_sql_file.sql", + additional_settings = TRUE +) +``` + +This will add additional settings to the start of your SQL query that are sometimes necessary for the odbc and DBI connection to correctly execute your query. + +For further troubleshooting tips, please see the [Interacting with a SQL database section of our Analysts' Guide](https://dfe-analytical-services.github.io/analysts-guide/learning-development/r.html#interacting-with-a-sql-database-from-within-r-scripts), or contact the [Statistics Development Team](mailto:statistics.development@education.gov.uk) for support.
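
To give a feel for why the cleaning step matters, the sketch below mimics the main steps of `get_clean_sql()` on a temporary file. It is an illustration only: `clean_sql_sketch()` is a hypothetical stand-in written for this example, not the package function, and the SQL text is made up. The key point is that once a script is collapsed into a single line, a `--` comment would comment out everything after it, so each one is rewritten as a `/* ... */` block comment.

```r
# Hypothetical stand-in that mirrors the main cleaning steps, for illustration
clean_sql_sketch <- function(filepath) {
  lines <- readLines(filepath, warn = FALSE)

  # Replace tabs with spaces
  lines <- gsub("\t", " ", lines, fixed = TRUE)

  # Rewrite "--" line comments as block comments, so they no longer swallow
  # the rest of the query once everything is joined onto one line
  has_comment <- grepl("--", lines, fixed = TRUE)
  lines[has_comment] <- paste(
    sub("--", "/*", lines[has_comment], fixed = TRUE),
    "*/"
  )

  # Join all lines into a single string
  paste(lines, collapse = " ")
}

# Write a small made-up SQL script to a temporary file
tmp <- tempfile(fileext = ".sql")
writeLines(
  c("-- Get everything", "", "SELECT * FROM [my_database_table]"),
  tmp
)

cleaned <- clean_sql_sketch(tmp)
print(cleaned)
```

The resulting single string still contains the original comment text and the `SELECT` statement, so it can be passed on as the `statement` argument in the same way as the output of `get_clean_sql()`.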