[JEP-14a] Clarified error type precedence. #157
Open: springcomp wants to merge 1 commit into main from spec/error-types
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,296 @@ | ||
# String Functions | ||
|
||
||| | ||
|---|--- | ||
| **JEP** | 14a | ||
| **Author** | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson | ||
| **SemVer** | MINOR | ||
| **Status**| draft | ||
| **Created**| 13-October-2022 | ||
| **Obsoletes**| [JEP-14](./jep-014-string-functions.md) | ||
|
||
## Addendum | ||
|
||
|Date|Description | ||
|---|---| | ||
|15-March-2023|Clarified error type precedence. | ||
|
||
## Abstract | ||
|
||
This JEP introduces a core set of useful string manipulation functions. These functions are modeled on functions found in popular programming languages such as JavaScript and Python. | ||
|
||
## Specification | ||
|
||
Some string manipulation functions bring the new concept of _optional arguments_ to JMESPath functions. The specification paragraph on function evaluation must thus be changed accordingly – highlighted in **bold** in the text below: | ||
|
||
_Functions can ~~either~~ have a specific arity, **a range of valid – minimum and maximum – number of arguments** or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific._ | ||
|
||
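As a rough illustration of the amended paragraph, the arity check for a function with optional arguments might look like the following Python sketch. The `check_arity` helper and the `ArityError` class are hypothetical names chosen for this example and are not part of any JMESPath implementation.

```python
class ArityError(Exception):
    """Hypothetical exception standing in for the spec's invalid-arity error."""


def check_arity(name, arg_count, min_args, max_args=None):
    """Raise ArityError unless min_args <= arg_count <= max_args.

    max_args=None models a variadic function that only has a minimum.
    A fixed-arity function is the special case min_args == max_args.
    """
    too_few = arg_count < min_args
    too_many = max_args is not None and arg_count > max_args
    if too_few or too_many:
        raise ArityError(f"invalid-arity: {name}() got {arg_count} argument(s)")


# find_first(subject, sub[, start[, end]]) accepts between 2 and 4 arguments.
check_arity("find_first", 3, min_args=2, max_args=4)  # passes silently
```

Treating the three cases (fixed, ranged, variadic) as one range check keeps the function table declarative: each function only needs a minimum and an optional maximum.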
Some functions accept number arguments which are further constrained to integers or even non-negative integers. This JEP specifies a new error | ||
type `invalid-value` by updating the paragraph on type constraints from the specification like so: | ||
|
||
_Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an `invalid-type` error occurred. **If a function parameter accepts values constrained to a specific subset of a type and those constraints are not met, implementations must report that an `invalid-value` error occurred.**_ | ||
|
||
_The [initial version of this JEP](./jep-014-string-functions.md) had a provision stating that_ “How and when those errors are raised is implementation specific”. _This provision has been removed. Implementations must perform type-checking for all function parameters_ before _attempting to evaluate the set of valid values for a given type._ | ||
|
||
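The precedence rule can be pictured as two validation passes: all type constraints first, value constraints second. The Python sketch below applies this to the arguments of `pad_left()`. The `SpecError` class and the `check_pad_left_args` helper are hypothetical, and reporting a `$pad` of the wrong length as `invalid-value` is an assumption, since the text only says that an error MUST be raised.

```python
class SpecError(Exception):
    """Hypothetical error; `kind` holds the spec's error type name."""

    def __init__(self, kind, message):
        super().__init__(f"{kind}: {message}")
        self.kind = kind


def is_number(value):
    # bool is a subclass of int in Python, but it is not a JSON number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_pad_left_args(subject, width, pad=" "):
    # Pass 1: type constraints for every parameter.
    if not isinstance(subject, str):
        raise SpecError("invalid-type", "$subject must be a string")
    if not is_number(width):
        raise SpecError("invalid-type", "$width must be a number")
    if not isinstance(pad, str):
        raise SpecError("invalid-type", "$pad must be a string")

    # Pass 2: value constraints, reached only when all types are valid.
    is_integer = not isinstance(width, float) or width.is_integer()
    if not is_integer or width < 0:
        raise SpecError("invalid-value", "$width must be a non-negative integer")
    if len(pad) != 1:  # assumption: treated as a value constraint
        raise SpecError("invalid-value", "$pad must have length 1")
```

With this ordering, a call with a negative `$width` and a non-string `$pad` reports `invalid-type`, even though the `$width` problem appears earlier in the argument list.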
|
||
### find_first | ||
|
||
``` | ||
int find_first(string $subject, string $sub[, int $start[, int $end]]) | ||
``` | ||
Given the `$subject` string, `find_first()` returns the zero-based index of the first occurrence where the `$sub` substring appears in `$subject` or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_first()` returns `null`. | ||
|
||
The `$start` and `$end` parameters are optional and allow restricting to the slice `[$start:$end]` the range within `$subject` in which `$sub` must be found. | ||
|
||
- If `$start` is omitted, it defaults to `0` (which is the start of the `$subject` string). | ||
- If `$end` is omitted, it defaults to `length($subject)` (which is past the end of the `$subject` string). | ||
|
||
If not omitted, the `$start` or `$end` arguments are expected to be integers. Otherwise, an error MUST be raised. | ||
|
||
Contrary to similar functions found in most popular programming languages, the `find_first()` function does not return `-1` if no occurrence of the substring can be found. Instead, it returns `null` for consistency with how JMESPath behaves. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"subject string"` | `` find_first(@, 'string') `` | `8` | ||
| `"subject string"` | `` find_first(@, 'string', `0`) `` | `8` | ||
| `"subject string"` | `` find_first(@, 'string', `0`, `14`) `` | `8` | ||
| `"subject string"` | `` find_first(@, 'string', `-99`, `100`) `` | `8` | ||
| `"subject string"` | `` find_first(@, 'string', `-6`) `` | `8` | ||
| `"subject string"` | `` find_first(@, 'string', `0`, `13`) `` | `null` | ||
| `"subject string"` | `` find_first(@, 'string', `8`) `` | `8` | ||
| `"subject string"` | `` find_first(@, 'string', `8`, `11`) `` | `null` | ||
| `"subject string"` | `` find_first(@, 'string', `9`) `` | `null` | ||
| `"subject string"` | `` find_first(@, 's') `` | `0` | ||
| `"subject string"` | `` find_first(@, 's', `1`) `` | `8` | ||
| `"subject string"` | `` find_first(@, '') `` | `null` | ||
|
||
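The behaviour described for `find_first()` can be approximated in Python, whose `str.find` resolves negative and out-of-range bounds the same way as the slice `[$start:$end]`. This is a minimal sketch under the assumption that arguments have already been type-checked; it is not a reference implementation.

```python
def find_first(subject, sub, start=None, end=None):
    """Index of the first occurrence of sub in subject[start:end], or None."""
    # An empty subject or an empty search string always yields null.
    if subject == "" or sub == "":
        return None
    # str.find accepts None for omitted bounds and clamps like slicing does.
    index = subject.find(sub, start, end)
    # JMESPath reports "not found" as null rather than -1.
    return None if index == -1 else index


print(find_first("subject string", "string", -6))     # 8
print(find_first("subject string", "string", 0, 13))  # None
```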
### find_last | ||
|
||
``` | ||
int find_last(string $subject, string $sub[, int $start[, int $end]]) | ||
``` | ||
Given the `$subject` string, `find_last()` returns the zero-based index of the last occurrence where the `$sub` substring appears in `$subject` or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_last()` returns `null`. | ||
|
||
The `$start` and `$end` parameters are optional and allow restricting to the slice `[$start:$end]` the range within `$subject` in which `$sub` must be found. | ||
|
||
- If `$start` is omitted, it defaults to `0` (which is the start of the `$subject` string). | ||
- If `$end` is omitted, it defaults to `length($subject)` (which is past the end of the `$subject` string). | ||
|
||
If not omitted, the `$start` or `$end` arguments are expected to be integers. Otherwise, an error MUST be raised. | ||
|
||
Contrary to similar functions found in most popular programming languages, the `find_last()` function does not return `-1` if no occurrence of the substring can be found. Instead, it returns `null` for consistency with how JMESPath behaves. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"subject string"` | `` find_last(@, 'string') `` | `8` | ||
| `"subject string"` | `` find_last(@, 'string', `8`) `` | `8` | ||
| `"subject string"` | `` find_last(@, 'string', `8`, `9`) `` | `null` | ||
| `"subject string"` | `` find_last(@, 'string', `9`) `` | `null` | ||
| `"subject string"` | `` find_last(@, 's') `` | `8` | ||
| `"subject string"` | `` find_last(@, 's', `1`) `` | `8` | ||
| `"subject string"` | `` find_last(@, 's', `0`, `7`) `` | `0` | ||
| `"subject string"` | `` find_last(@, '') `` | `null` | ||
|
||
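A comparable sketch for `find_last()` relies on Python's `str.rfind`, which searches the slice `[$start:$end]` from the right. As before, it assumes the arguments have already passed type-checking and is only meant to illustrate the slice semantics.

```python
def find_last(subject, sub, start=None, end=None):
    """Index of the last occurrence of sub in subject[start:end], or None."""
    # An empty subject or an empty search string always yields null.
    if subject == "" or sub == "":
        return None
    # str.rfind returns the highest matching index within the slice bounds.
    index = subject.rfind(sub, start, end)
    # JMESPath reports "not found" as null rather than -1.
    return None if index == -1 else index


print(find_last("subject string", "s", 1))     # 8
print(find_last("subject string", "s", 0, 7))  # 0
```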
### lower | ||
|
||
``` | ||
string lower(string $subject) | ||
``` | ||
Returns the lowercase `$subject` string using the Unicode default case conversion specification. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"STRING"` | `` lower(@) `` | `"string"` | ||
|
||
### pad_left | ||
|
||
``` | ||
string pad_left(string $subject, number $width[, string $pad]) | ||
``` | ||
|
||
Given the `$subject` string, `pad_left()` adds characters to the beginning and returns a string of length at least `$width`. | ||
|
||
The `$pad` optional string parameter specifies the padding character. | ||
If omitted, it defaults to an ASCII space (U+0020). | ||
If present, it MUST have length 1, otherwise an error MUST be raised. | ||
|
||
If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified. | ||
|
||
If `$width` is not an integer or is negative, an error MUST be raised. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"string"` | `` pad_left(@, `0`) `` | `"string"` | ||
| `"string"` | `` pad_left(@, `5`) `` | `"string"` | ||
| `"string"` | `` pad_left(@, `10`) `` | `"    string"` | ||
| `"string"` | `` pad_left(@, `10`, '-') `` | `"----string"` | ||
|
||
### pad_right | ||
|
||
``` | ||
string pad_right(string $subject, number $width[, string $pad]) | ||
``` | ||
|
||
Given the `$subject` string, `pad_right()` adds characters to the end and returns a string of length at least `$width`. | ||
|
||
The `$pad` optional string parameter specifies the padding character. | ||
If omitted, it defaults to an ASCII space (U+0020). | ||
If present, it MUST have length 1, otherwise an error MUST be raised. | ||
|
||
If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified. | ||
|
||
If `$width` is not an integer or is negative, an error MUST be raised. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"string"` | `` pad_right(@, `0`) `` | `"string"` | ||
| `"string"` | `` pad_right(@, `5`) `` | `"string"` | ||
| `"string"` | `` pad_right(@, `10`) `` | `"string    "` | ||
| `"string"` | `` pad_right(@, `10`, '-') `` | `"string----"` | ||
|
||
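Both padding functions map closely onto Python's `str.rjust` and `str.ljust`. The sketch below adds the value checks described above; the `_validated_width` helper is a hypothetical name, `ValueError` stands in for whatever error an implementation reports, and accepting a float such as `10.0` as an integer is an assumption.

```python
def _validated_width(width, pad):
    # Assumes $width and $pad already passed type-checking.
    if (isinstance(width, float) and not width.is_integer()) or width < 0:
        raise ValueError("$width must be a non-negative integer")
    if len(pad) != 1:
        raise ValueError("$pad must have length 1")
    return int(width)


def pad_left(subject, width, pad=" "):
    # rjust returns subject unchanged when it is already wide enough.
    return subject.rjust(_validated_width(width, pad), pad)


def pad_right(subject, width, pad=" "):
    # ljust returns subject unchanged when it is already wide enough.
    return subject.ljust(_validated_width(width, pad), pad)


print(repr(pad_left("string", 10)))        # '    string'
print(repr(pad_right("string", 10, "-")))  # 'string----'
```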
### replace | ||
|
||
``` | ||
string replace(string $subject, string $old, string $new[, number $count]) | ||
``` | ||
Given the `$subject` string, `replace()` replaces occurrences of the `$old` substring with the `$new` substring. | ||
|
||
The `$count` optional integer specifies how many occurrences of the `$old` substring in `$subject` are replaced. If this parameter is omitted, all occurrences are replaced. If `$count` is not an integer or is negative, an error MUST be raised. | ||
|
||
The `replace()` function has no effect if `$count` is `0`. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `0`) `` | `"aabaaabaaaab"` | ||
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `1`) `` | `"-baaabaaaab"` | ||
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `2`) `` | `"-b-abaaaab"` | ||
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `3`) `` | `"-b-ab-aab"` | ||
| `"aabaaabaaaab"` | `` replace(@, 'aa', '-') `` | `"-b-ab--b"` | ||
|
||
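For non-empty `$old` strings, the replacement rules above coincide with Python's `str.replace`, which substitutes non-overlapping occurrences from left to right. The following sketch assumes type-checked arguments and uses `ValueError` as a stand-in for the reported error. The behaviour for an empty `$old` is not covered by the text, so the sketch simply inherits Python's behaviour there.

```python
def replace(subject, old, new, count=None):
    """Replace occurrences of old with new, at most count times if given."""
    if count is None:
        # Omitted $count: replace every occurrence.
        return subject.replace(old, new)
    if (isinstance(count, float) and not count.is_integer()) or count < 0:
        raise ValueError("$count must be a non-negative integer")
    # A count of 0 leaves the subject untouched.
    return subject.replace(old, new, int(count))


print(replace("aabaaabaaaab", "aa", "-", 2))  # -b-abaaaab
print(replace("aabaaabaaaab", "aa", "-"))     # -b-ab--b
```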
### split | ||
|
||
``` | ||
array[string] split(string $subject, string $search[, number $count]) | ||
``` | ||
|
||
Given the `$subject` string, `split()` breaks on occurrences of the string `$search` and returns an array. | ||
|
||
The `split()` function returns an array containing each partial string between occurrences of `$search`. If `$subject` contains no occurrences of the `$search` string, an array containing just the original `$subject` string will be returned. | ||
|
||
If the `$search` argument is an empty string, `split()` breaks on every character and returns an array containing each character from the `$subject` string. Thus, if `$subject` is _also_ an empty string, `split()` returns an empty array. | ||
|
||
The `$count` optional integer specifies the maximum number of split points within the `$subject` string. | ||
If this parameter is omitted, all occurrences are split. If `$count` is not an integer or is negative, an error MUST be raised. | ||
|
||
If `$count` is equal to `0`, `split()` returns an array containing a single element, the `$subject` string. | ||
|
||
Otherwise, the `split()` function breaks on occurrences of the `$search` string up to `$count` times. The last string in the resulting array contains the remaining contents of `$subject` unmodified. | ||
|
||
**Note**: The `split()` function was [originally designed by Chris Armstrong](https://github.com/GorillaStack/jmespath.site/blob/master/docs/proposals/string-manipulation.rst). However, its behaviour has been slightly altered for consistency reasons. | ||
|
||
### Examples | ||
|
||
| Expression | Result | ||
|---|--- | ||
| `split('', '')` | `[]` | ||
| `split('all chars', '')` | `[ "a", "l", "l", " ", "c", "h", "a", "r", "s" ]` | ||
| `split('/', '/')` | `[ "", "" ]` | ||
|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|') `` | `[ "average", "min", "max", "mean", "median" ]` | ||
|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `3`) `` | `[ "average", "min", "max", "mean\|-\|median" ]` | ||
|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `2`) `` | `[ "average", "min", "max\|-\|mean\|-\|median" ]` | ||
|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `1`) `` | `[ "average", "min\|-\|max\|-\|mean\|-\|median" ]` | ||
|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `0`) `` | `[ "average\|-\|min\|-\|max\|-\|mean\|-\|median" ]` | ||
| `split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '-')` | `[ "average\|", "\|min\|", "\|max\|", "\|mean\|", "\|median" ]` | ||
|
||
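The rules for `split()` can be sketched in Python on top of `str.split`, with the empty `$search` case handled separately because Python rejects an empty separator. Arguments are assumed to be type-checked already and `ValueError` stands in for the reported error. How `$count` interacts with an empty `$search` is not stated in the text, so this sketch ignores `$count` in that case.

```python
def split(subject, search, count=None):
    """Split subject on search, performing at most count splits if given."""
    if count is not None:
        if (isinstance(count, float) and not count.is_integer()) or count < 0:
            raise ValueError("$count must be a non-negative integer")
    if search == "":
        # Break on every character; an empty subject yields an empty array.
        return list(subject)
    # maxsplit=-1 means "no limit"; 0 returns the subject as a single element.
    maxsplit = -1 if count is None else int(count)
    return subject.split(search, maxsplit)


print(split("average|-|min|-|max", "|-|", 1))  # ['average', 'min|-|max']
print(split("/", "/"))                         # ['', '']
```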
### trim | ||
|
||
``` | ||
string trim(string $subject[, string $chars]) | ||
``` | ||
Given the `$subject` string, `trim()` removes the leading and trailing characters found in `$chars`. | ||
|
||
The `$chars` optional string parameter represents a set of characters to be removed. If this parameter is not specified, or is an empty string, whitespace characters are removed from the `$subject` string. Whitespace characters are defined by the Unicode standard as code points having the `White_Space` property set to `Yes`. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `" subject string "` | `` trim(@) `` | `"subject string"` | ||
| `" subject string "` | `` trim(@, '') `` | `"subject string"` | ||
| `" subject string "` | `` trim(@, ' ') `` | `"subject string"` | ||
| `" subject string "` | `` trim(@, 's') `` | `" subject string "` | ||
| `" subject string "` | `` trim(@, 'su') `` | `" subject string "` | ||
| `" subject string "` | `` trim(@, 'su ') `` | `"bject string"` | ||
| `" subject string "` | `` trim(@, 'gsu ') `` | `"bject strin"` | ||
|
||
### trim_left | ||
|
||
``` | ||
string trim_left(string $subject[, string $chars]) | ||
``` | ||
Given the `$subject` string, `trim_left()` removes the leading characters found in `$chars`. | ||
|
||
As with the `trim()` function, the `$chars` optional string parameter represents a set of characters to be removed. `trim_left()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `" subject string "` | `` trim_left(@) `` | `"subject string "` | ||
| `" subject string "` | `` trim_left(@, 's') `` | `" subject string "` | ||
| `" subject string "` | `` trim_left(@, 'su') `` | `" subject string "` | ||
| `" subject string "` | `` trim_left(@, 'su ') `` | `"bject string "` | ||
| `" subject string "` | `` trim_left(@, 'gsu ') `` | `"bject string "` | ||
|
||
### trim_right | ||
|
||
``` | ||
string trim_right(string $subject[, string $chars]) | ||
``` | ||
Given the `$subject` string, `trim_right()` removes the trailing characters found in `$chars`. | ||
|
||
As with the `trim()` and `trim_left()` functions, the `$chars` optional string parameter represents a set of characters to be removed. `trim_right()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `" subject string "` | `` trim_right(@) `` | `" subject string"` | ||
| `" subject string "` | `` trim_right(@, 's') `` | `" subject string "` | ||
| `" subject string "` | `` trim_right(@, 'su') `` | `" subject string "` | ||
| `" subject string "` | `` trim_right(@, 'su ') `` | `" subject string"` | ||
| `" subject string "` | `` trim_right(@, 'gsu ') `` | `" subject strin"` | ||
|
||
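The three trimming functions correspond to Python's `str.strip`, `str.lstrip` and `str.rstrip`, with two caveats that the sketch below works around. Python treats an empty `chars` argument as "strip nothing", and its default notion of whitespace differs slightly from the Unicode `White_Space` property. The `WHITE_SPACE` constant is therefore an explicit list of the code points carrying that property, and it should be checked against the Unicode version an implementation targets.

```python
# Code points with the Unicode White_Space property.
WHITE_SPACE = (
    "\u0009\u000a\u000b\u000c\u000d\u0020\u0085\u00a0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)


def trim(subject, chars=None):
    # A missing or empty $chars falls back to the whitespace set.
    return subject.strip(chars or WHITE_SPACE)


def trim_left(subject, chars=None):
    return subject.lstrip(chars or WHITE_SPACE)


def trim_right(subject, chars=None):
    return subject.rstrip(chars or WHITE_SPACE)


print(repr(trim(" subject string ", "gsu ")))  # 'bject strin'
print(repr(trim_left(" subject string ")))     # 'subject string '
```

`$chars` is a set of characters rather than a prefix or suffix, which is why `'su'` removes nothing from a string that starts with a space.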
### upper | ||
|
||
``` | ||
string upper(string $subject) | ||
``` | ||
Returns the uppercase `$subject` string using the Unicode default case conversion specification. | ||
|
||
### Examples | ||
|
||
| Given | Expression | Result | ||
|---|---|--- | ||
| `"string"` | `` upper(@) `` | `"STRING"` | ||
|
||
## Compliance | ||
|
||
A new `string_functions.json` file will be added to the compliance tests. | ||
The test suite will introduce the following new error type: | ||
|
||
- invalid-value | ||
|
||
For instance, `split()` raises this error type if its `$count` parameter is negative or not an integer. |
The requirement of type validation before range validation should not be limited to string functions... I think this deserves its own JEP rather than a replacement of JEP-14 (in which case I don't think it will be necessary to touch JEP-14 at all, because "How and when those errors are raised is implementation specific" can be interpreted within the context of that constraint (e.g., "how" might be "by exception" vs. "in a separate return value" and "when" might be "when parsing" [where possible] vs. "when evaluating")).