grep(pattern, ...) et al.: Optionally escalate warning on "argument 'pattern' has length > 1 ..." to an error #8
HenrikBengtsson
started this conversation in
Ideas
Replies: 1 comment
-
I've submitted this to BugZilla, cf. PR18577. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Issue
The
grep()
family of functions produces a warning ifpattern
specified more than one regular expression, e.g.I cannot come up with a valid use case where using
length(pattern) != 1
is not a mistake. Because of this, I argue it's a bug. Because warnings might be ignored, or drown in the noise of other warnings, this is almost as problematic as a silent bug. There is a risk that there is code out there that produces incorrect results, without being noticed.I've checked R 4.3.1 and found the following functions with this lax behavior:
In the base package:
agrep()
agrepl()
grep()
grepl()
grepRaw()
regexpr()
gregexpr()
regexec()
gsub()
sub()
There might be more.
Suggestion
For the sprint
As a starter, introduce an "internal" environment variable, e.g.
_R_CHECK_LENGTH_1_PATTERN_
, that can be used to escalate the warning to an error:Long-term
I'd argue that using
length(pattern) > 1
is a bug, similar to howlength(cond) > 1
is a bug inif (cond) { ... }
. For the latter, we started out with_R_CHECK_LENGTH_1_CONDITION_=true
as above in R (>= 3.4.0), but since that was too harsh to enable on CRAN, support for per-package checking was added as:in R (>= 4.0.0). That would only escalate to an error, if the buggy code was part of the package tested, otherwise it remained a warning. This allowed CRAN to have packages fixed one by one without breaking the package ecosystem. Eventually, enough packages were fixed, and they made
_R_CHECK_LENGTH_1_CONDITION_=true
the default. And in R (>= 4.2.0), erroring was the only outcome and_R_CHECK_LENGTH_1_CONDITION_
could be dropped.A similar path forward can be made to fix
length(pattern) > 1
bugs, which means the next step after the above would be to add support for:There is no need to implement the latter during the sprint. Having
_R_CHECK_LENGTH_1_PATTERN_=true
will be a big step forward.Comment: I think it was @kalibera who implemented
_R_CHECK_LENGTH_1_CONDITION_=package:_R_CHECK_PACKAGE_NAME_,verbose
.PS. This proposal was taken from HenrikBengtsson/Wishlist-for-R#122.
Beta Was this translation helpful? Give feedback.
All reactions