Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIP-67 - Improve strictEquality feature for better compatibility with existing code bases #97

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

mberndt123
Copy link

Hi there,

I'd like to use the strictEquality feature for the improved type safety it provides, but currently find it too inconvenient to use due to an unfortunate interaction with pattern matching. This SIP is my attempt to fix that.
There have been no comments in the Pre-SIP thread for the past two weeks, and it's a very small (though impactful) change to the language, so I felt it was time to submit it.

Best regards
Matthias

@kyouko-taiga kyouko-taiga changed the title improve strictEquality feature for better compatibility with existing code bases SIP-67 - Improve strictEquality feature for better compatibility with existing code bases Nov 15, 2024
@soronpo
Copy link
Contributor

soronpo commented Nov 25, 2024

Thank you for the proposal. Here is the feedback from the SIP Committee:

  • We would like to see a different proposal that special-cases enumerations in both pattern matching and equality operations so that enumerations are considered to have a CanEqual derivation and the compiler would pick up the default equality if no explicit derives CanEqual is given. This will take care of old libraries that do not have CanEqual derivation.
  • In general, we do not think its good to have a different behavior between == and pattern match equality. Do you have such a use-case?

@mberndt123
Copy link
Author

mberndt123 commented Nov 25, 2024

Hey @soronpo,

I'm sorry, I'm afraid I can't do that.

Please consider the following enum:

enum Foo:
  case Bar
  case Baz(f: Int => Int)

I'm going to contend that it shouldn't ever be allowed to compare two Foo values with == because that would require us to determine whether two functions are the same, which isn't possible in a useful way.
Scala agrees with me on this already, because this type is isomorphic to Option[Int => Int], and that type can't be compared with ==, despite the fact that a CanEqual instance for Option is available by default:

scala> import scala.language.strictEquality

scala> Option.empty[Int => Int] == Option.empty[Int => Int]
-- [E172] Type Error: ----------------------------------------------------------
1 |Option.empty[Int => Int] == Option.empty[Int => Int]
  |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |Values of types Option[Int => Int] and Option[Int => Int] cannot be compared with == or !=.

This is not a bug, this is a feature.

Now you might say, OK, then maybe we can make the CanEqual instance available only when all the fields in the enum have a CanEqual instance? I. e. make CanEqual[Foo, Foo] available only when CanEqual[Int => Int, Int => Int] is available? We can simulate this:

scala> enum Foo:
     |   case Bar
     |   case Baz(f: Int => Int)
     | object Foo:
     |   given (using CanEqual[Int => Int, Int => Int]): CanEqual[Foo, Foo] =
     |     CanEqual.derived
     | 
// defined class Foo
// defined object Foo

Alas, while this does prevent nonsensical comparisons with ==, it also fails to achieve the whole point of this proposal, which is to allow pattern matching:

scala> def bla(f: Foo) match
     |   case Foo.Bar => 0
     |   case Foo.Baz(f) => f(0)
     | 
-- [E172] Type Error: ----------------------------------------------------------
2 |  case Foo.Bar => 0
  |       ^^^^^^^
  |Values of types Foo and Foo cannot be compared with == or !=.

(insert sad trombone sound here)

Now you might ask: How come it works just right for Option? You cannot compare two Option[Int => Int] objects with ==, but it is possible to perform pattern matching on an Option[Int => Int]:

scala> Option.empty[Int => Int] match
     |   case None => 0
     |   case Some(f) => f(0)
     | 
val res0: Int = 0

That seems like the sweet spot! The reason this works is that None is of type Option[Nothing], hence this pattern match doesn't require a CanEqual[Option[Int => Int], Option[Int => Int]] but a CanEqual[Option[Int => Int], Option[Nothing]], which is available.
However this doesn't carry over to enum Foo: it doesn't have any type parameters, hence it's not possible to differentiate the empty Bar case from the non-empty Baz one on the type level.

The conclusion from all of this is that it's impossible to make this work correctly by providing a magic CanEqual instance. enum Foo is the proof for this: you either provide the instance, which makes pattern matching work but allows for == comparisons that don't make sense, or you don't, in which case you can't do pattern matching. Whatever the solution to this problem is, this is not it.

@soronpo
Copy link
Contributor

soronpo commented Nov 25, 2024

Please consider the following enum:

enum Foo:
  case Bar
  case Baz(f: Int => Int)

I would not consider this to be a valid use of enum. Have you seen this type of code in the wild?

@jducoeur
Copy link

Have you seen this type of code in the wild?

Not sure offhand, but it seems like it would be unsurprising when constructing a DSL interpreter. (Where the enum is a kind of expression, and the leaf is an instantiation of one expression type.)

The interpreter for my own QL language is still on Scala 2, but I could see myself trying to build it along those lines if it was Scala 3, so I don't think that's just a strawman.

@soronpo
Copy link
Contributor

soronpo commented Nov 25, 2024

Not sure offhand, but it seems like it would be unsurprising when constructing a DSL interpreter.

Still I'm not sure enum is the correct way to do it in Scala 3. It's not meant as a replacement for all kinds of case class hierarchies.

@eugengarkusha
Copy link

eugengarkusha commented Nov 25, 2024

I would not consider this to be a valid use of enum. Have you seen this type of code in the wild?

I have seen ADTs with functions inside
(Hope its ok to consider enum as a nicer syntax for ADT)

@KristianAN
Copy link

I would not consider this to be a valid use of enum. Have you seen this type of code in the wild?

I have seen ADTs with functions inside (Hope its ok to consider enum as a nicer syntax for ADT)

From the Scala 3 book.

Algebraic Data Types (ADTs) can be created with the enum construct, so we’ll briefly review enumerations before looking at ADTs.

Personally I have almost exclusively written Scala 3 and would use enums for this kind of ADT (with a function). If that is not the correct way to do it, then the correct way must be elusive.

@jducoeur
Copy link

Yeah, agreed. My understanding of the enum feature was that it was more or less exactly to reify the pattern that the community has settled on for ADTs over the years. (With enumerations per se being sort of the degenerate case, but far from the whole story.)

Saying that only some ADTs count, and other reasonably well-formed ones don't seems kind of un-Scala to me. A key element of Scala is that functions are values; intuitively, I would expect them to work here like any other value type.

@SystemFw
Copy link

SystemFw commented Nov 26, 2024

I have to agree with @jducoeur , what's the actual argument against functions in enum?
enum already goes quite some way into trying to capture some of the more complex ADT patterns that have little to do with enums, for example allowing extends to encode GADTs, having a value in there that happens to be a function seems like a lot more "normal" in comparison.

The GADT mention is not accidental, the main use case for that is encoding commands:

enum Cmd[A] { 
  case Read() extends Cmd[String]
  case Write extends Cmd[Unit]
}

and command-like GADTs will often have functions in them to allow sequencing:

enum Cmd[A] { 
  case Read() extends Cmd[String]
  case Write extends Cmd[Unit]
  case FlatMap[O, A](fa: Cmd[O], f: O => Cmd[A]) extends Cmd[A]
}

I feel like the argument that enum isn't meant for this would be a lot more stronger if extends wasn't allowed at all

@soronpo
Copy link
Contributor

soronpo commented Nov 26, 2024

Let me clarify. If you want function arguments that should not be part of the pattern match, then these should come as a second argument block of the case, IMO.

enum Cmd[A]:
  case Read() extends Cmd[String]
  case Write extends Cmd[Unit]
  case FlatMap[O, A](fa: Cmd[O])(val f: O => Cmd[A]) extends Cmd[A]

If you want them to be part of the pattern match, you put them in the first block where they have same equality treatment like all the other arguments.

@JD557
Copy link

JD557 commented Nov 26, 2024

I admit that I never played with strict equality, so this might not make much sense, but:

Now you might say, OK, then maybe we can make the CanEqual instance available only when all the fields in the enum have a CanEqual instance? I. e. make CanEqual[Foo, Foo] available only when CanEqual[Int => Int, Int => Int] is available? We can simulate this:

Do we need a CanEqual[Foo, Foo] for the pattern match? Wouldn't a CanEqual[Foo, Foo.Bar.type] suffice?

Now, from what I can tell based on some quick tests, the compiler really wants a CanEqual[Foo, Foo] for pattern matching, but maybe this could be changed?

@mberndt123
Copy link
Author

mberndt123 commented Nov 26, 2024

Hi @JD557,

Do we need a CanEqual[Foo, Foo] for the pattern match? Wouldn't a CanEqual[Foo, Foo.Bar.type] suffice?

Now, from what I can tell based on some quick tests, the compiler really wants a CanEqual[Foo, Foo] for pattern matching, but maybe this could be changed?

Yes, I had this in mind and I was going to propose it – you beat me to it. It can't be done with just a magic CanEqual instance, but if we slightly tweak the behaviour pattern matching under strictEquality I think it should work.

I. e.

def bla(foo: Foo) =
  foo match
    case Foo.Bar => 0 // requires CanEqual[Foo.Bar.type, Foo]
    case Foo.Baz(f) => f(0) // uses unapply

Then all we would need is a magic CanEqual instance that spawns the required CanEqual. That seems like a workable approach.

@mberndt123
Copy link
Author

mberndt123 commented Nov 26, 2024

Let me clarify. If you want function arguments that should not be part of the pattern match, then these should come as a second argument block of the case, IMO.

[…]

If you want them to be part of the pattern match, you put them in the first block where they have same equality treatment like all the other arguments.

First of all, I'd like to address the question of "in the wild" examples of functions within ADTs. This is definitely something that people do, e. g.
https://github.com/typelevel/cats/blob/1cc04eca9f2bc934c869a7c5054b15f6702866fb/free/src/main/scala/cats/free/Free.scala#L219
https://github.com/typelevel/cats-effect/blob/eb918fa59f85543278eae3506fda84ccea68ad7c/core/shared/src/main/scala/cats/effect/IO.scala#L2235

I think this is perfectly good code. It should be possible to use enum for these types, and it is today – strictEquality shouldn't break that. I also see no reason to compel people to rewrite this to a style where the function goes in the second parameter list. In fact, I think it is worse that way, because == is supposed to tell you whether two things are the same – but if you just ignore the function, you're simply going to get a broken equals method that says that two things are the same even though they aren't. This is something that we shouldn't encourage. The better option is to allow the subset of equality comparisons that people are likely to need and that we know we can perform correctly – i. e. comparisons to the singleton cases – and prevent all other equality tests at compile time.

@mberndt123
Copy link
Author

I've pushed a new revision that is based on a minor change to enum pattern matching and a magic CanEqual instance. @soronpo please let me know what you think

@soronpo
Copy link
Contributor

soronpo commented Dec 20, 2024

@mberndt123 Hey Mathias, thank you again for the amendments. We discussed the SIP today and we strive for a more general approach: A case class or enum case will be considered to have CanEqual derivation when all arguments (recursively) have CanEqual derivation (singletons therefore have CanEqual). What do you think?

@mberndt123
Copy link
Author

mberndt123 commented Jan 3, 2025

Hi Oron,

I have considered the committee's proposal and would like to present my conclusions.

  1. When applied to the motivating Nat example in the SIP, the committee's proposal would make CanEqual[Nat.Zero.type, Nat.Zero.type] available (it would not make CanEqual[Nat.Succ, Nat.Succ] available since its only field is of type Nat and a CanEqual[Nat, Nat] is not available). However, def + from that example requires a CanEqual[Nat, Nat] instance to compile. Therefore, the committee's proposal does not solve the problem which this SIP is intended to solve. If Nat had been specified as a sealed trait hierarchy, the pattern match in def + would require a CanEqual[Nat.Zero.type, Nat], which is also not available, so it doesn't work for this case either.
  2. The committee's proposal also does not make the == operator work for enum types, because comparing these with == again requires a CanEqual instance for the enum type, not for the individual cases. The only thing it does make work for enums is contrived code like Some[Nat.Zero.type](Nat.Zero) == Some[Nat.Zero.type](Nat.Zero).
  3. A modified version of the committee's proposal that makes CanEqual available for enum types iff all of their cases' fields' types have a CanEqual will also not fix pattern matching as it cannot work with the enum Foo example from above.

What the committee's proposal does do is make == work for case classes. However, I have some observations about this too:

  1. For deeply nested data structures, the recursive search for CanEqual instances would lead to an increase in compile times. This can not happen in my own proposal as it only provides a magic CanEqual instance for case objects and singleton enum cases – the fields never enter the picture. That is not a bug, it is a feature.
  2. It is not immediately obvious if and how it would work for recursive definitions such as case class Nat(n: Option[Nat])
  3. I am also concerned about maintainability. Imagine you use a library that defines some case class. You use == all over the place to compare it, and it works due to the magic CanEqual instance for case classes. One day the library author decides to change the case class. Maybe he changes one of the fields' types to an opaque type for better type safety. Now there is no longer a CanEqual available for that field and your code no longer compiles. And if this happens in some complex, deeply nested data structure, figuring out where that breakage is coming from is going to be a nightmare.
  4. Making the "magic" CanEqual instance available only when CanEqual is available for the fields' types makes this a safer option than simply adding derives CanEqual to the case class, because it won't make CanEqual available for types that shouldn't have it, whereas derives CanEqual will. This encourages developers to never use derives CanEqual, making all of the above problems even worse.

For all of these reasons, I think "fully automatic" typeclass derivation (i. e. typeclass derivation that works without an explicit derives Something clause) is a bad idea in general, and the language/standard library shouldn't set a bad example in that regard.

To sum up: the committee's proposal doesn't solve the problem that the SIP is designed to solve. It doesn't make the == operator work for enum types either. It does make == work for case classes, but with many downsides that should be avoided. I therefore do not consider the committee's proposal viable. I had already pointed out in my first comment that “it's impossible to make this work correctly by providing a magic CanEqual instance”. This continues to be true and is the reason why my revised proposal also includes language regarding the behaviour of pattern matching, which is required to enable a solution based on a CanEqual instance.

Thinking about this issue once more made me realize that my own revised proposal has a safety problem. It mentions a "union or intersection of one or more sealed or enum types". This was meant to allow pattern matching against unions of different sealed or enum types without nesting matches, but it can also be used to perform equality tests that don't make sense and should be prevented. Since the revised proposal does not have any other upsides over the original proposal and also failed at its other goal of being acceptable to the committee, I've decided to stop pretending and reverted the SIP proposal to its original state.

ADT pattern matching and equality tests are separate things, and the fact that Scala has conflated them is a historical mistake. The best way to move forward now is to untangle them as far as possible: simply don't require a CanEqual instance for those cases that correspond to traditional ADT pattern matching, and require it for those cases where it makes sense, like matching a Vector against Nil or an Int against a named constant.

Regarding CanEqual for case classes and enums, I think that improvements are desirable. derives CanEqual can be added to any type, and there are no guardrails to prevent it from compiling when one of the fields doesn't have a CanEqual instance. Safer alternatives should be provided. However they are neither required nor useful to fix pattern matching, which is the goal of this SIP, and hence they should be explored in a separate SIP, which I will be happy to author once this one is accepted.

Best Regards
Matthias Berndt

@soronpo
Copy link
Contributor

soronpo commented Jan 24, 2025

We had a SIP discussion, and we'll vote on it next time, but I was just thinking about it and the specification needs to account for any singleton that is a sub-type of the matched type, which unreachability already does check for.
I think this should compile (but currently under your spec it does not):

import scala.language.strictEquality
def foo(x: Any): Unit =
  x match
    case 5 =>
    case "hello" =>

I don't know what to think about the function type example now.

@mberndt123
Copy link
Author

Hey @soronpo

Yes, you're right, that code does not compile under the rules I've proposed: Any cannot be equality-compared to an Int or a String.
I think that's fine. The whole point of -language:strictEquality is to increase type safety, and Any is the antithesis of that: it completely throws any type information out of the window. Therefore I don't think that this feature needs to support that use case, and nor is this the kind of code that the language should encourage or be optimized for.
The vast majority of reasonable use cases should be covered by the CanEqual instances that are available by default like CanEqual[String, String] or even CanEqual[Long, Int].

I also don't think that special-casing literals is a good solution here because it should always be possible to use a named constant instead of a literal with a minimum of fuss. IOW, if it works for the pattern "hello", it should also work for the pattern H where val H = "hello". At that point, Strings and Ints are just values like any other and should be treated according to the same rules.

It should also be pointed out that strings, unlike case objects and singleton enum cases, are not singletons.

val a = "a" * 10
val b = "aa" * 5
a eq b // false
a == b // true

So the philosophical reasoning described in the SIP does not apply to Strings.

@bjornregnell
Copy link

bjornregnell commented Jan 31, 2025

I agree with @mberndt123 that Any-comparisons are dubious and the whole point of strictEquality is making risky business not compile. However, I think it is very important to have good error messages so that a learner can understand why such a match don't compile.

I'd like to see a future where structEquality is the default, but to get there we need to make it really ergonomic and error messages, docs and other help to the user/learner is critical to making that happen. After all, safety is next to scalability a unique selling-point of Scala and we should always prioritize sound and ergonomic language features that improve safety, IMHO.

@soronpo
Copy link
Contributor

soronpo commented Jan 31, 2025

The whole point of -language:strictEquality is to increase type safety

I think the point is for equality. Pattern matching does not fall under this category beyond reachability concerns, IMO.

@bjornregnell
Copy link

But equals is called in pattern matching with constant patterns, so it's an equality test?

@soronpo
Copy link
Contributor

soronpo commented Jan 31, 2025

But equals is called in pattern matching with constant patterns, so it's an equality test?

For the record, strict equality is applied to == and not equals. I consider pattern matching to be using equals.

@bjornregnell
Copy link

bjornregnell commented Jan 31, 2025

Aha, thanks @soronpo for the clarification. Yes, nobody knows what equals does as it can be overridden with whatever strange behavior...

@bjornregnell
Copy link

@mberndt123 Perhaps update the proposal to upfront clarify the scope of being == and not equals ? When re-reading the intro etc I'm not sure this is absolutely clear.

@sjrd
Copy link
Member

sjrd commented Jan 31, 2025

Pattern matching uses ==. It doesn't use equals. Proof:
https://scastie.scala-lang.org/A0eckzOaRxaygBFoCMiyWA

locally {
  val x: Any = 5L
  println(x == 5)      // true
  println(x.equals(5)) // false
  println(x match {    // yes
    case 5 => "yes"
    case _ => "nope"
  })
}

@mberndt123
Copy link
Author

@mberndt123 Perhaps update the proposal to upfront clarify the scope of being == and not equals ? When re-reading the intro etc I'm not sure this is absolutely clear.

Hi Björn,

What @soronpo wrote there is not a clarification, it is his opinion, and it is in fact exactly where the controversy is. Currently, pattern matching against a constant (whether it's a literal or a case object or just a val) requires a CanEqual instance. @soronpo believes that this should be dropped entirely. For reasons that I've laid out in in the SIP, I don't think that this is sufficiently safe and I would rather drop the CanEqual requirement only for cases that correspond to traditional ADT pattern matching in statically typed languages like Haskell, OCaml or Rust.

@bjornregnell
Copy link

AHA!! Thanks also to @mberndt123 and @sjrd for yet more clarifications. Maybe it's just me who didn't get the actual controversy, but perhaps the intro to the SIP can be improved by referring to == versus equals and later state what is at stake in terms of compiling or not for a busy reader...

I think I'm leaning towards the stricter end of the design space of the future language - it is after all called strict equality.

@soronpo
Copy link
Contributor

soronpo commented Jan 31, 2025

Pattern matching uses ==. It doesn't use equals. Proof: https://scastie.scala-lang.org/A0eckzOaRxaygBFoCMiyWA

locally {
  val x: Any = 5L
  println(x == 5)      // true
  println(x.equals(5)) // false
  println(x match {    // yes
    case 5 => "yes"
    case _ => "nope"
  })
}

What I meant is that I think it should be equals and therefore not affected by strict equality

@sjrd
Copy link
Member

sjrd commented Jan 31, 2025

It has had the semantics of == for as long as I've been involved in Scala (around 2.9.0). Probably as far back as the introduction of pattern matching in the first place. I don't see why that should change. If it should, it would be a separate SIP with far greater compatibility issues.

IMO we should judge the present SIP with respect to the established semantics of pattern matching.

@soronpo
Copy link
Contributor

soronpo commented Jan 31, 2025

It has had the semantics of == for as long as I've been involved in Scala (around 2.9.0). Probably as far back as the introduction of pattern matching in the first place. I don't see why that should change. If it should, it would be a separate SIP with far greater compatibility issues.

IMO we should judge the present SIP with respect to the established semantics of pattern matching.

I'm ok with keeping == and just inferring CanEqual under pattern matching, as I have already mentioned.

@bjornregnell
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.