-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API character limit warnings #146
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The obvious thing that jumps out is why not have a shared function that does the bulk of the work?
lengths_check <- function(entries, entry_type, length_limit){
lengths_table <- data.frame(
"entries" = entries
"length" = unlist(lapply(entries, nchar), use.names = FALSE)
)
lengths_too_long <- lengths_table[lengths_table$length > length_limit, "entries"]
if (length(lengths_too_long) == 0) {
output <- list(
"message" = paste("All location codes are ", length_limit," characters or fewer."),
"result" = "PASS"
)
} else {
if (length(lengths_too_long) == 1) {
output <- list(
"message" = paste0("The following ", entry_type," is over ", length_limit," characters, this will need shortening before this data can be published through the API: '", paste(lengths_too_long, collapse = "', '"), "'."),
"result" = "ADVISORY"
)
}
}
...
}
Then within each test you have a line like lengths_check(location_codes, "location codes", 30)
@rmbielby you're absolutely right, though... Writing it as it was broadly followed how the rest of the project was written and was super fast as GitHub copilot was spotted the patterns and giving me 90% of it as suggestions, so rewriting it like this will probably take me longer than the initial writing did to start with (not particularly long but still something). That being said, I nearly went to do this just now, though I think it's probably not worth it when we can tackle this pretty soon separately when converting to the package, there's plenty of other code in this repo in the same kind of way, so I'm going to leave it for now and expect we can sweep it up with the other tidy up into a file of internal only functions when moving to a package. |
Brief overview of changes
This adds 5 new tests to check for character limits in data sets according to the standards set for the API - https://dfe-analytical-services.github.io/analysts-guide/statistics-production/api-data-standards.html#character-limits-for-col_names-and-filter-items
Why are these changes being made?
To highlight to publishers where their current data may need to change and to allow us to spot any early issues with the limits.
Detailed description of changes
5 new tests added, including automated tests and test data for them, all as advisory warnings for now. At a later point they can be added to the API suite of tests (once that exists) as a failure.
Also added a .gitattributes file to force the repo to always be CRLF, which also helped me to then be able to run
git add --renormalize .
to undo what I'd done from my other laptop knocking files to just LF. In that I've ignored any binary files so that we don't inadvertently break them.Additional information for reviewers
Nothing extra to add.
Issue ticket number/s and link
No issue on GitHub, but resolves this Trello card - https://trello.com/c/jenmz3hG/1708-add-in-warnings-for-character-limits-into-screener-thinking-ahead-to-api-data