jmespath-community · springcomp · Mar 15, 2023 · gibson042 · Mar 15, 2023
diff --git a/jep-014-string-functions.md b/jep-014-string-functions.md
@@ -4,9 +4,10 @@
 |---|---
 | **JEP**    |  14
 | **Author** | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson
-| **Created**| 13-October-2022
 | **SemVer** | MINOR
 | **Status**| accepted
+| **Created**| 13-October-2022
+| **Obsoleted by**| [JEP-14a](./jep-014a-string-functions.md)
 
 ## Abstract
 

diff --git a/jep-014a-string-functions.md b/jep-014a-string-functions.md
@@ -0,0 +1,296 @@
+# String Functions
+
+|||
+|---|---
+| **JEP**    | 14a
+| **Author** | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson
+| **SemVer** | MINOR
+| **Status**| draft
+| **Created**| 13-October-2022
+| **Obsoletes**| [JEP-14](./jep-014-string-functions.md)
+
+## Addendum
+
+|Date|Description
+|---|---|
+|15-March-2023|Clarified error type precedence.
+
+## Abstract
+
+This JEP introduces a core set of useful string manipulation functions. Those functions are modeled from functions found in popular programming languages such as JavaScript and Python.
+
+## Specification
+
+Some string manipulation functions bring the new concept of _optional arguments_ to JMESPath functions. The specification paragraph on function evaluation must thus be changed accordingly – highlighted in **bold** in the text below:
+
+_Functions can ~~either~~ have a specific arity, **a range of valid – minimum and maximum – number of arguments** or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific._
+
+Some functions accept number arguments which are further constrained to integers or even non-negative integers. This JEP specifies a new error
+type `invalid-value` by updating the paragraph on type constraints from the specification like so:
+
+_Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an `invalid-type` error occurred. **If a function parameter accepts values constrained to a specific subset of a type and those constraints are not met, implementations must report that an `invalid-value` error occurred.**_
+
+_The [initial version of this JEP](./jep-014-string-functions.md) had a provision stating that_ “How and when those errors are raised is implementation specific”. _This provision has been removed. Implementation must perform type-checking for all function parameters_ before _attempting to evaluate the set of valid values for a given type._
+
+
+### find_first
+
+```
+int find_first(string $subject, string $sub[, int $start[, int $end]])
+```
+Given the `$subject` string, `find_first()` returns the zero-based index of the first occurence where the `$sub` substring appears in `$subject` or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_first()` returns `null`.
+
+The `$start` and `$end` parameters are optional and allow restricting to the slice `[$start:$end]` the range within `$subject` in which `$sub` must be found.
+
+- If `$start` is omitted, it defaults to `0` (which is the start of the `$subject` string).
+- If `$end` is omitted, it defaults to `length(subject)` (which is past the end of the `$subject` string).
+
+If not omitted, the `$start` or `$end` arguments are expected to be integers. Otherwise, an error MUST be raised.
+
+Contrary to similar functions found in most popular programming languages, the `find_first()` function does not return `-1` if no occurrence of the substring can be found. Instead, it returns `null` for consistency reasons with how JMESPath behaves.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"subject string"` | `` find_first(@, 'string') `` |  `8`
+| `"subject string"` | `` find_first(@, 'string', `0`) `` |  `8`
+| `"subject string"` | `` find_first(@, 'string', `0`, `14`) `` |  `8`
+| `"subject string"` | `` find_first(@, 'string', `-99`, `100`) `` |  `8`
+| `"subject string"` | `` find_first(@, 'string', `-6`) `` |  `8`
+| `"subject string"` | `` find_first(@, 'string', `0`, `13`) `` |  `null`
+| `"subject string"` | `` find_first(@, 'string', `8`) `` |  `8`
+| `"subject string"` | `` find_first(@, 'string', `8`, `11`) `` |  `null`
+| `"subject string"` | `` find_first(@, 'string', `9`) `` |  `null`
+| `"subject string"` | `` find_first(@, 's') `` |  `0`
+| `"subject string"` | `` find_first(@, 's', `1`) `` |  `8`
+| `"subject string"` | `` find_first(@, '') `` |  `null`
+
+### find_last
+
+```
+int find_last(string $subject, string $sub[, int $start[, int $end]])
+```
+Given the `$subject` string, `find_last()` returns the zero-based index of the last occurence where the `$sub` substring appears in `$subject` or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_last()` returns `null`.
+
+The `$start` and `$end` parameters are optional and allow restricting to the slice `[$start:$end]` the range within `$subject` in which `$sub` must be found.
+
+- If `$start` is omitted, it defaults to `0` (which is the start of the `$subject` string).
+- If `$end` is omitted, it defaults to `length(subject)` (which is past the end of the `$subject` string).
+
+If not omitted, the `$start` or `$end` arguments are expected to be integers. Otherwise, an error MUST be raised.
+
+Contrary to similar functions found in most popular programming languages, the `find_last()` function does not return `-1` if no occurrence of the substring can be found. Instead, it returns `null` for consistency reasons with how JMESPath behaves.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"subject string"` | `` find_last(@, 'string') `` |  `8`
+| `"subject string"` | `` find_last(@, 'string', `8`) `` |  `8`
+| `"subject string"` | `` find_last(@, 'string', `8`, `9`) `` |  `null`
+| `"subject string"` | `` find_last(@, 'string', `9`) `` |  `null`
+| `"subject string"` | `` find_last(@, 's') `` |  `0`
+| `"subject string"` | `` find_last(@, 's', `1`) `` |  `8`
+| `"subject string"` | `` find_last(@, 's', `0`, `7`) `` |  `0`
+| `"subject string"` | `` find_last(@, '') `` |  `null`
+
+### lower
+
+```
+string lower(string $subject)
+```
+Returns the lowercase `$subject` string using Unicode default casing conversion specification.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"STRING"` | `` lower(@) `` |  `"string"`
+
+### pad_left
+
+```
+string pad_left(string $subject, number $width[, string $pad])
+```
+
+Given the `$subject` string, `pad_left()` adds characters to the beginning and returns a string of length at least `$width`.
+
+The `$pad` optional string parameter specifies the padding character.
+If omitted, it defaults to an ASCII space (U+0020).
+If present, it MUST have length 1, otherwise an error MUST be raised.
+
+If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified.
+
+If `$width` is not an integer or is negative, an error MUST be raised.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"string"` | `` pad_left(@, `0`) `` |  `"string"`
+| `"string"` | `` pad_left(@, `5`) `` |  `"string"`
+| `"string"` | `` pad_left(@, `10`) `` |  `"    string"`
+| `"string"` | `` pad_left(@, `10`, '-') `` |  `"----string"`
+
+### pad_right
+
+```
+string pad_right(string $subject, number $width[, string $pad])
+```
+
+Given the `$subject` string, `pad_right()` adds characters to the end and returns a string of length at least `$width`.
+
+The `$pad` optional string parameter specifies the padding character.
+If omitted, it defaults to an ASCII space (U+0020).
+If present, it MUST have length 1, otherwise an error MUST be raised.
+
+If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified.
+
+If `$width` is not an integer or is negative, an error MUST be raised.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"string"` | `` pad_right(@, `0`) `` |  `"string"`
+| `"string"` | `` pad_right(@, `5`) `` |  `"string"`
+| `"string"` | `` pad_right(@, `10`) `` |  `"string    "`
+| `"string"` | `` pad_right(@, `10`, '-') `` |  `"string----"`
+
+### replace
+
+```
+string replace(string $subject, string $old, string $new[, number $count])
+```
+Given the `$subject` string, `replace()` replaces occurrences of the `$old` substring with the `$new` substring.
+
+The `$count` optional integer specifies how many occurrences of the `$old` substring in `$subject` are replaced. If this parameter is omitted, all occurrences are replaced. If `$count` is not an integer or is negative, an error MUST be raised.
+
+The `replace()` function has no effect if `$count` is `0`.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `0`) `` |  `"aabaaabaaaab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `1`) `` |  `"-baaabaaaab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `2`) `` |  `"-b-abaaaab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `3`) `` |  `"-b-ab-aab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-') `` |  `"-b-ab--b"`
+
+### split
+
+```
+array[string] split(string $subject, string $search[, number $count])
+```
+
+Given the `$subject` string, `split()` breaks on ocurrences of the string `$search` and returns an array.
+
+The `split()` function returns an array containing each partial string between occurrences of `$search`. If  `$subject` contains no occurrences of the `$search` string, an array containing just the original `$subject` string will be returned.
+
+If the `$search` argument is an empty string, `split()` breaks on every character and returns an array containing each character from the `$subject` string. Thus, if `$subject` is _also_ an empty string, `split()` returns an empty array.
+
+The `$count` optional integer specifies the maximum number of split points within the `$search` string.
+If this parameter is omitted, all occurrences are split. If `$count` is not an integer or is negative, an error MUST be raised.
+
+If `$count` is equal to `0`, `split()` returns an array containing a single element, the `$subject` string.
+
+Otherwise, the `split()` function breaks on occurrences of the `$search` string up to `$count` times. The last string in the resulting array containing the remaining contents of `$subject` unmodified.
+
+**Note**: The `split()` function was [originally designed by Chris Armstrong](https://github.com/GorillaStack/jmespath.site/blob/master/docs/proposals/string-manipulation.rst). However, its behaviour has been slightly altered for consistency reasons.
+
+### Examples
+
+| Expression | Result
+|---|---
+| `split('', '')` | `[]`
+| `split('all chars', '')` | `[ "a", "l", "l", " ", "c", "h", "a", "r", "s" ]`
+| `split('/', '/')` | `[ "", "" ]` |
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|') `` | `[ "average", "min", "max", "mean", "median" ]` 
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `3`) `` | `[ "average", "min", "max", "mean\|-\|median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `2`) `` | `[ "average", "min", "max\|-\|mean\|-\|median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `1`) `` | `[ "average", "min\|-\|max\|-\|mean\|-\|median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `0`) `` | `[ "average\|-\|min\|-\|max\|-\|mean\|-\|median" ]`
+| `split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '-')` | `[ "average\|", "\|min\|", "\|max\|", "\|mean\|", "\|median" ]`
+
+## Specification
+
+### trim
+
+```
+string trim(string $subject[, string $chars])
+```
+Given the `$subject` string, `trim()` removes the leading and trailing characters found in `$chars`.
+
+The `$chars` optional string parameter represents a set of characters to be removed. If this parameter is not specified, or is an empty string, whitespace characters are removed from the `$subject` string. Whitespaces are defined by the Unicode standard as codepoints having the `White_Space` property set to `Yes`.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"   subject string   "` | `` trim(@) `` |  `"subject string"`
+| `"   subject string   "` | `` trim(@, '') `` |  `"subject string"`
+| `"   subject string   "` | `` trim(@, ' ') `` |  `"subject string"`
+| `"   subject string   "` | `` trim(@, 's') `` |  `"   subject string   "`
+| `"   subject string   "` | `` trim(@, 'su') `` |  `"   subject string   "`
+| `"   subject string   "` | `` trim(@, 'su ') `` |  `"bject string"`
+| `"   subject string   "` | `` trim(@, 'gsu ') `` |  `"bject strin"`
+
+### trim_left
+
+```
+string trim_left(string $subject[, string $chars])
+```
+Given the `$subject` string, `trim_left()` removes the leading characters found in `$chars`.
+
+Like for the `trim()` function, the `$chars` optional string parameter represents a set of characters to be removed. `trim_left()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"   subject string   "` | `` trim_left(@) `` |  `"subject string   "`
+| `"   subject string   "` | `` trim_left(@, 's') `` |  `"   subject string   "`
+| `"   subject string   "` | `` trim_left(@, 'su') `` |  `"   subject string   "`
+| `"   subject string   "` | `` trim_left(@, 'su ') `` |  `"bject string   "`
+| `"   subject string   "` | `` trim_left(@, 'gsu ') `` |  `"bject string   "`
+
+### trim_right
+
+```
+string trim_right(string $subject[, string $chars])
+```
+Given the `$subject` string, `trim_right()` removes the trailing characters found in `$chars`.
+
+Like for the `trim()` and `trim_left()` functions, the `$chars` optional string parameter represents a set of characters to be removed. `trim_right()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"   subject string   "` | `` trim_right(@) `` |  `"   subject string"`
+| `"   subject string   "` | `` trim_right(@, 's') `` |  `"   subject string   "`
+| `"   subject string   "` | `` trim_right(@, 'su') `` |  `"   subject string   "`
+| `"   subject string   "` | `` trim_right(@, 'su ') `` |  `"   subject string"`
+| `"   subject string   "` | `` trim_right(@, 'gsu ') `` |  `"   subject strin"`
+
+### upper
+
+```
+string upper(string $subject)
+```
+Returns the uppercase `$subject` string using Unicode default casing conversion specification.
+
+| Given | Expression | Result
+|---|---|---
+| `"string"` | `` upper(@) `` |  `"STRING"`
+
+## Compliance
+
+A new `string_functions.json` file will be added to the compliance tests.
+The test suite will introduce the following new error type:
+
+- invalid-value
+
+This error type would be raised by `split()` for instance, if its `$count` parameter is negative or not an integer.