-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add geography lookup data and functions #81
Conversation
… squash these commits!)
… squash these commits!)
Merge branch 'create-geog-helpers' of https://github.com/dfe-analytical-services/dfeR into create-geog-helpers # Conflicts: # DESCRIPTION # NAMESPACE # _pkgdown.yml # inst/WORDLIST
Copying in @jen-machin for sight, though expecting @rmbielby will have the most opinions on reviewing as it's pulling across data from the data screener. |
I think the only thing I can comment on really is the splitting of scripts. I'm in favour of your updated method and condensing the scripts. Perhaps make clear a family of functions always have a family prefix e.g., "pretty_"? (I know this touched upon in the contributing so just me being picky). |
For the re-organisation, I'm wary it's just going to get messy the way you've got it. I agree with the lines:
But I'm not massively into how you're using internal_functions and helper_functions. Thoughts are:
|
Agree with you on the proposed reshuffling @rmbielby - I specifically asked about it as I wasn't fully happy with what I had and I'd already been through a couple of variations and found myself tripping up over the code and trying to keep it easy to find as I went. Like your solution the best of anything I've tried so far. Here's the updated contributing rules I've followed in my lat-ees-t commit: Folder and script structure conventionsIn an R package you are not allowed to have any sub directories in the R/ folder. Where possible we should have:
The script should share the name of the function or family. If needed then a _utils.R script should be used for Documentation for all data shipped with the packages is kept in
Every exported function or data set gets its own test script, called |
This may be counter-intuitive, but given you've taken my suggestion and gone with it, I feel like I need to post-justify it with some more informed opinions than just what seemed like a good idea to me at the time... I've done a bit of scanning through blogs and there's some interesting stuff in the following articles that's relevant for anyone reading back through this chain (and for us going forward with other packages). Think it broadly fits in with our thinking.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finishing for the day, so sending comments for the bits I've looked through so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor things
Before you start reading, go make a brew ☕ no PR with this many file changes will be a quick read...
Brief overview of changes
Added the Ward-PCon-LAD-LA hiearchy lookup file into the package (currently at https://github.com/dfe-analytical-services/dfe-published-data-qa/blob/main/data/ward_lad_la_pcon_hierarchy.csv), joined on region and country columns and added some helper functions to pull unique locations from that file for Ward, PCon, LAD, LA, and Region level.
Added a countries dataset too, to replace https://github.com/dfe-analytical-services/dfe-published-data-qa/blob/main/data/country.csv.
Also moved some of the code around in a way that I'm hoping will future proof the package a bit. I had some internal debates on this that I've detailed at the end of the PR, as well as some specific questions I'd like a review to feedback on.
Why are these changes being made?
Want to make this easier for everyone in DfE, moving the lookup file allows us to start moving away from relying on the screener repo, and puts the lookups at the fingertips of R users.
Will be a really nice benefit for us too for using in the screener and also the edustats briefing tool (which is currently running off of this branch).
Detailed description of changes
Added a wrapper function for getting lookup data from the ONS API rather than downloading the files manually. Hoping this will prove useful for future lookups too. It has its quirks...
Exported it in case it's helpful for anyone, though strongly suspect it'll only be us who use it.
This is documented in the
data-raw/
folder, folding the package guidance on https://r-pkgs.org/data.html.Created functions in the internal_functions.R script to help neaten the code a bit so hopefully it's easier to use moving forwards.
I've recreated the hierarchy file from the data screener using the ONS API so we can have a more reproducible method when we next come to update it.
I did various bits of QA of the data added against the existing data in the data screener repo and what was downloadable from the Open Geography Portal to dual run with the API. I'm happy that the data we have matches the screener data.
Have added some unit tests to make sure the expected numbers of locations are returned.
Using
waldo::compare()
the difference I found was (suggests the new lookup I've made here is a slight improvement on what we had before):One location in over 24,000 has a code that fails the usual pattern, I think it's safe to assume this was a typo in the ONS 2017 file where they missed a 0 from the PCon code, have fixed this in a commented section for manual fixes and noted it in the documentation for the data set so users are aware.
The change is to make Glasgow East's PCon code
S14000030
instead ofS1400030
I'm hoping I've made these self-explanatory in the documentation - so rather than make this description any longer I'll use this as the first test of that!
Making new 'fetch' functions from lookups should be pretty speedy moving forwards too.
comma_sep()
andround_five_up()
intohelper_functions.R
(as we already list under helper functions in the pkgdown site)create_project()
I haven't changed anything about this function, though did a quick refactor of the validation code as the way it was written before wasn't being styled properly by
styler
. You'd run styler and then lintr would complain, if you then address lintr's issues styler would undo it next time you reran it.The refactor now plays more nicely with styler / lintr and I think this has accidentally resolved #77 too. All the tests are still passing so don't think this needs much reviewing, though tagging @JT-39 for sight on this as the original author of that function.
Additional information for reviewers
Future plans
We want to add more look up files that are currently in the data screener and associated 'fetch' functions:
We should make an extra table / options for the custom locations we've added like 'Unknown' and 'Outside of England' etc. too.
Questions I'm interested in thoughts on
I've tried to combine our scripts a bit more and lay out a template / standard for how we approach it moving forwards as I'm wary we can't have subdirectories in the R folder for a package, meaning we could end up with lots of scripts. and we could easily lose time to having code in inconsistent places and struggling to find it.
I've added some notes into the contributing file describing this, though interested on thoughts about the approach? At a a high level it is
Then
For the creation of data sets, they get their own scripts in
data-raw/
, I expect these will use some of the helper functions (both exported and internal only) to make them more readable and easier to update.Question is, does that make sense as an approach?
There's some tests on the data sets and the exported functions.
There's nothing on the ONS API wrapper as there didn't seem anything obvious to test.
There is also currently no tests on internal non-exported functions.
Question is, I'm not sure if there's more we should be checking here or we're fine with this?
Regions
Currently only England has regions. Which makes me wonder if:
a) we stick as we are now
a1) and therefore should we allow Scotland / Wales / NI as regions? (We know some want it somtimes)
or
b) we should include an alternative level for Scotland / Wales / NI as regions, and in the big lookup file join with those instead for those countries? (thought we might be able to do counties, but ONS only seem to publish them for England too)
To fetch or not fetch
Currently regions and countries have fetch functions that just pull in their data. Maybe they don't need them? Maybe it's helpful? Maybe we need to expand to have a flag for custom DfE locations? Thoughts?
Issue ticket number/s and link
#77 is the only issue linked to this