Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(text/unstable): add replaceStart, replaceEnd, replaceBoth functions (#6265) #6286

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lionel-rowe
Copy link
Contributor

@lionel-rowe lionel-rowe commented Dec 20, 2024

Resolves #6265.

Roughly speaking, these replicate the combined functionality of trim_matches, trim_start_matches, trim_end_matches, strip_prefix, and strip_suffix (plus a hypothetical strip_prefix_and_suffix) from the Rust standard library.

@lionel-rowe lionel-rowe requested a review from kt3k as a code owner December 20, 2024 12:56
@github-actions github-actions bot added the text label Dec 20, 2024
@lionel-rowe lionel-rowe changed the title feat(text/unstable): add strip function (#6265) feat(text/unstable): add strip, stripStart, stripEnd functions (#6265) Dec 20, 2024
Copy link

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.34%. Comparing base (4989ba7) to head (ad6098a).
Report is 52 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6286      +/-   ##
==========================================
- Coverage   96.52%   96.34%   -0.19%     
==========================================
  Files         533      548      +15     
  Lines       40871    41708     +837     
  Branches     6120     6316     +196     
==========================================
+ Hits        39451    40182     +731     
- Misses       1378     1483     +105     
- Partials       42       43       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 141 to 147
const count = options?.count ?? Infinity;

for (let i = 0; i < count; ++i) {
str = str.replace(regExp, "");
if (str === prev) break;
prev = str;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Infinity? If the string doesn't include the pattern to strip, it will loop forever, won't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why Infinity? If the string doesn't include the pattern to strip, it will loop forever, won't it?

It'll break on the str === prev condition as soon as there's nothing left to strip.

const aaa = 'a'.repeat(1000)
const aab = aaa + 'b'
const bbb = 'b'.repeat(1000)

stripStart(aaa, 'a') === ''
stripStart(aab, 'a') === 'b'
stripStart(bbb, 'a') === bbb

@timreichen
Copy link
Contributor

I think the count option should not be part of this api. Instead, the user can always call the function repeatedly or in a loop themselves. In that sense, I think stripStart() and stripEnd() should be the only exported functions, and maybe a stripAll() to reflect replaceAll().

@lionel-rowe
Copy link
Contributor Author

lionel-rowe commented Dec 24, 2024

I think the count option should not be part of this api. Instead, the user can always call the function repeatedly or in a loop themselves. In that sense, I think stripStart() and stripEnd() should be the only exported functions, and maybe a stripAll() to reflect replaceAll().

@timreichen Probably worth getting some more use cases, but if I grep my code for functions named *strip*, mostly they simply strip out all occurrences (usually based on replaceAll or similar, no need for a library function), and in a significant minority of cases they remove all prefixes and suffixes, like this:

// strip(str, /[^\p{L}\p{M}]+/gu, { count: Infinity })
function stripNonLetter(str: string) {
    return str.replaceAll(/^[^\p{L}\p{M}]+|[^\p{L}\p{M}]+$/gu, '')
}

// strip(str, /[_\- ]+/g, { count: Infinity })
const strip = (str: string) => str.replaceAll(/^[_\- ]+|[_\- ]+$/g, '')

There are also a few cases where they strip a single instance at the start or end and then stop there, and it sounds like @canac 's use case fits here:

// stripStart(str, '\ufeff', { count: 1 })
function stripBom(str: string) {
   return str.replace(/^\ufeff/, '')
}

I guess the "strip up to n occurrences" case is rare when 1 < n < Infinity, but very common (possibly the most common) when n == Infinity, and also reasonably common where n == 1. So maybe a better API is options?: { all?: boolean } or options?: { single?: boolean }, depending on what the chosen default is (no strong opinion either way on the default).

I'm not keen on the idea of doubling the API surface area from strip/stripStart/stripEnd to strip/stripStart/stripEnd/stripAll/stripAllStart/stripAllEnd, even though that's essentially what Rust std library does. Seems like overkill in JS/TS.

Another way of supporting single/all, but without a specific option, would be reading the g flag if the supplied pattern is a regex. I'm also not keen on this option as it doesn't support string patterns, and it's also not really the correct semantics for the g flag. Currently I just ignore g and y flags, but maybe throwing would be better if g doesn't match all-ness (and always throwing if y supplied).

@timreichen
Copy link
Contributor

timreichen commented Dec 24, 2024

@timreichen Probably worth getting some more use cases, but if I grep my code for functions named *strip*, mostly they simply strip out all occurrences (usually based on replaceAll or similar, no need for a library function), and in a significant minority of cases they remove all prefixes and suffixes, like this:

// strip(str, /[^\p{L}\p{M}]+/gu, { count: Infinity })
function stripNonLetter(str: string) {
    return str.replaceAll(/^[^\p{L}\p{M}]+|[^\p{L}\p{M}]+$/gu, '')
}

// strip(str, /[_\- ]+/g, { count: Infinity })
const strip = (str: string) => str.replaceAll(/^[_\- ]+|[_\- ]+$/g, '')

There are also a few cases where they strip a single instance at the start or end and then stop there, and it sounds like @canac 's use case fits here:

// stripStart(str, '\ufeff', { count: 1 })
function stripBom(str: string) {
   return str.replace(/^\ufeff/, '')
}

I guess the "strip up to n occurrences" case is rare when 1 < n < Infinity, but very common (possibly the most common) when n == Infinity, and also reasonably common where n == 1. So maybe a better API is options?: { all?: boolean } or options?: { single?: boolean }, depending on what the chosen default is (no strong opinion either way on the default).

I agree, strip as a name is confusing as it doesn't indicate single or all occurrences.

I'm not keen on the idea of doubling the API surface area from strip/stripStart/stripEnd to strip/stripStart/stripEnd/stripAll/stripAllStart/stripAllEnd, even though that's essentially what Rust std library does. Seems like overkill in JS/TS.

I wouldn't like that either. I think the problem here is that the functions are explicit in name but might work differently depending on the passed regexp.

Another way of supporting single/all, but without a specific option, would be reading the g flag if the supplied pattern is a regex. I'm also not keen on this option as it doesn't support string patterns, and it's also not really the correct semantics for the g flag. Currently I just ignore g and y flags, but maybe throwing would be better if g doesn't match all-ness (and always throwing if y supplied).

Come to think of it, this api is tries to do too much combining all these rust functions into one imo. There is String.prototype.replace, String.prototype.replaceAll, which already cover a lot of use cases.

This api should be reduced to replaceStart() and replaceEnd(), which makes them actually more flexible, not only stripping but allow for any replacement and work more in a js way:

strip_prefix => replaceStart("foo:bar", "foo:", "") // "bar"
strip_suffix => replaceEnd("bar:foo", ":foo", "") // "bar"
trim_start_matches => replaceStart("11foo1bar", /1+/, "") // "foo1bar"
trim_end_matches => replaceEnd( "foo1bar11", /1+/, "") // "foo1bar"
trim_matches => "11foo1bar11".replace(/^1+|1+$/, "") // "foo1bar" or call both replaceStart() and replaceEnd()

N replacements can be handled by the regexp, using a prepared string or calling the function repeatedly:
replaceEnd("barfoofoofoo", /(foo){2}/, "") // "barfoo" or
replaceEnd("barfoofoofoo", "foo".repeat(2), "") // "barfoo" or
for (let i = 0; i < 2; i++) replaceEnd("barfoofoofoo", "foo", "") // "barfoo"

@lionel-rowe
Copy link
Contributor Author

lionel-rowe commented Dec 24, 2024

N replacements can be handled by the regexp, using a prepared string or calling the function repeatedly:
replaceEnd("barfoofoofoo", /(foo){2}/, "") // "barfoo" or
replaceEnd("barfoofoofoo", "foo".repeat(2), "") // "barfoo" or
for (let i = 0; i < 2; i++) replaceEnd("barfoofoofoo", "foo", "") // "barfoo"

@timreichen None of these are really satisfactory for the case where you want to dynamically strip arbitrarily many occurrences of an arbitrary string, but none of the use cases I've found involve doing that anyway — use cases with unlimited occurrences typically seem to be more suited to regex, where you can trivially use a non-capturing group along with * or +.

As for replace vs strip:

replaceEnd(url.pathname, /\/*/, '/') // pathname always ends with slash
stripEnd(url.pathname, /\/*/) + '/' // same result

But it could be useful for cases like this:

// only prepend 'https:' when replacing existing 'http:' protocol
replaceStart(url.href, 'http:', 'https:')

// doesn't work with `strip` version as the protocol might be something else
// e.g. you might end up with `https:file:///...`
'https:' + stripStart(url.href, 'http:')

Using the name replace* along with allowing regexes weakly implies that the string-replacement syntax will be supported, which would add a moderate amount of implementation complexity — something like this. EDIT: actually I guess this doesn't require special handling as the regexes are cloned internally, with capture groups etc. staying the same.

'abcxyzabcxyz'.replace(/[abc]+/, '<$&>') // '<abc>xyz<abc>xyz'

// naive result would be `<$&>xyzabcxyz`
// but seems reasonable to expect result to be `<abc>xyzabcxyz`
replaceStart('abcxyzabcxyz', /[abc]+/, '<$&>')

Conversely, supporting callback replacements would be trivial, as you can just conditionalize the replacement based on whether the index lies at the start/end.

replaceStart('abcxyzabcxyz', /[abc]+/, (m) => `<${m}>`)

Also, the name replace for replace-both-ends would be confusing due to String#replace. Maybe replaceEnds or something. Requiring calling 2 functions just to strip/replace both ends is suboptimal as it requires either declaring an extra variable or repeating code.

@lionel-rowe lionel-rowe changed the title feat(text/unstable): add strip, stripStart, stripEnd functions (#6265) feat(text/unstable): add replace, replaceStart, replaceEnd functions (#6265) Dec 26, 2024
@lionel-rowe lionel-rowe changed the title feat(text/unstable): add replace, replaceStart, replaceEnd functions (#6265) feat(text/unstable): add replaceStart, replaceEnd, replaceBoth functions (#6265) Dec 26, 2024
@timreichen
Copy link
Contributor

How about just replaceStart() and replaceEnd() just supporting strings without regexp support?
I feel if one wants to do a complex strip/replacement with a regexp, prepending ^ or appending $ and using the built in replace() function is more suited.
I also don't think we should provide a replaceBoth() function. The name is kinda weird and it is just a small convenience function which can be done by the user with one line.
If there are requests for more functionality by multiple users, we still can add to that.

@kt3k
Copy link
Member

kt3k commented Jan 1, 2025

I'm not in favor of adding replaceStart, replaceEnd, or replaceBoth. They look overlapping too much with String.prototype.replace and String.prototype.replaceAll. What problems do they solve?

Also why did we switch from strip* APIs to replace* APIs in this PR? I don't think this PR resolve #6265 anymore

@lionel-rowe
Copy link
Contributor Author

I'm not in favor of adding replaceStart, replaceEnd, or replaceBoth. They look overlapping too much with String.prototype.replace and String.prototype.replaceAll. What problems do they solve?

I've tried to stick to practical @examples in the JSDoc documentation, which show realistic cases where each function would be useful.

Also why did we switch from strip* APIs to replace* APIs in this PR? I don't think this PR resolve #6265 anymore

Seems like a minimal increase in complexity/surface area that has some handy use cases, and replace*(str, x, '') is easily understandable as strip*(str, x) for anyone familiar with the common str.replace(x, '') paradigm.

Possibly the problem space is too poorly defined at the moment, as I think canac, timreichen, you, and I all have slightly different ideas about what's useful here. That's partly why I've tried to stick to practical examples in the documentation, so we can discuss concretely about which of them make sense. E.g. I think replaceStart(url.href, 'http:', 'https:') is a decently compelling case for replace > strip, and replaceBoth(str, /[^\p{L}\p{M}\p{N}]+/u, "") is a strongly compelling case for *Both and regex support. But maybe there's no compelling case for the combination of *Both, regex support, and replacement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion: stripPrefix/stripSuffix functions
4 participants