Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re schema doesn't properly validate values with line breaks #862

Open
lasiltan opened this issue Mar 6, 2023 · 6 comments
Open

re schema doesn't properly validate values with line breaks #862

lasiltan opened this issue Mar 6, 2023 · 6 comments
Labels
help wanted Help most welcome question Further information is requested

Comments

@lasiltan
Copy link

lasiltan commented Mar 6, 2023

(malli.core/validate [:re #"^thing$"] "thing\n")
=> true

Compare to

(re-matches #"^thing$" "thing\n")
=> nil
@ilmoraunio ilmoraunio added bug Something isn't working help wanted Help most welcome labels Mar 7, 2023
@lasiltan
Copy link
Author

lasiltan commented Mar 7, 2023

This ultimately boils down to the difference in java.util.regex.Matcher methods find and matches (clojure.core/re-findvs. clojure.core/re-matches).

(let [pattern (Pattern/compile "^thing$")
      matcher (.matcher pattern "thing\n")]
  (.matches matcher))
=> false

(let [pattern (Pattern/compile "^thing$")
      matcher (.matcher pattern "thing\n")]
  (.find matcher))
=> true

The javadoc on java.util.regex.Matcher/find states:

     * @return  {@code true} if, and only if, a subsequence of the input
     *          sequence matches this matcher's pattern

whereas the javadoc for java.util.regex.Matcher/matches states:

Attempts to match the entire region against the pattern.

To me, at least, the functionality of java.util.regex.Matcher/matches, i.e clojure.core/re-matches, is what I'm expecting.

@ikitommi
Copy link
Member

ikitommi commented Mar 9, 2023

Interesting. Tested with few alternatives:

Plumatic

(require '[schema.core :as schema])
(schema/validate #"thing" "thing\n")
; => "thing\n" (match)

JSON Schema Spec

{"type": "string", "pattern": "^thing$"}
// => null (match)

Javascript

RegExp('^thing$').exec('thing\n')
// => null (no match)

So, Malli works currently like Plumatic and JSON Schema do, but not like default JavaScript.

@ikitommi
Copy link
Member

ikitommi commented Mar 9, 2023

Do you have more external references (on languages / libraries etc) on what is the correct way to handle this? Options:

  1. keep the current way as others (JSON Schema + Plumatic) are doing it like this
  2. change the default as it's obviously less correct, need to handle JSON Schema mapping
  3. new option to m/-re-schema to override this, e.g. users can configure this

@ikitommi ikitommi added question Further information is requested and removed bug Something isn't working labels Mar 9, 2023
@camsaul
Copy link

camsaul commented Apr 14, 2023

I think the behavior for re-find is definitely a little sneaky with it allowing a newline at the end, but I think using re-find behavior is more intuitive than re-matches behavior.

For example if I have the schema

[:re #"\dcans"]

I would expect that to match any string that contained <num>cans anywhere in the string, e.g. one can, 2cans, 3cans or something like that. If I wanted my string to start with <num>cans, then I'd use ^. re-matches behavior is basically forcing ^ and $ semantics when sometimes that's not what you want. You can always opt in to that behavior by using ^ and $.

@camsaul
Copy link

camsaul commented Apr 14, 2023

I should also mention that you can always use \z instead of $ if you actually want the end of the input, $ is technically The end of a line and \z is the end of the input.

(re-find "^string\z" "string\n") => nil

See https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html

@lasiltan
Copy link
Author

I didn't even know about \z. I think that's good enough for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Help most welcome question Further information is requested
Projects
Status: No status
Development

No branches or pull requests

4 participants