lang: add the `union` type #45

zerbina · 2024-09-24T19:36:46Z

Summary

add the union type, which is a concrete anonymous sum type
add the union type constructor
clarify that Return also allows subtypes (covariant return)
union values can currently only be constructed via Return

Details

add the tkUnion type kind
add the cmp procedure to types, used for sorting types
add the isSubtypeOf procedure to types
implement the various type handling for tkUnion (type rendering,
value rendering, etc.)
implement implicit union construction for Return
refactor Return analysis and translation to be more linear

In memory union values are currently represented as an 8-byte integer
tag + a union of all allowed types, but this is subject to change.

Notes For Reviewers

They're not part of the core set of language features for version 0, but union types are useful, and also stress-test the lower-level passes a bit, so I figured I'd implement them right away.

There's no pattern matching facility yet, due to the current lack of local variables. Those are better implemented separately, which will then unblock pattern matching.

Also clarify that `Return` also allows expression with a type that is a subtype of the return type.

The new `fitExpr` procedure handles type mismatch errors as well as performing the necessary conversions. In addition, `Return` translation is refactored to better handle error propagation: the expression having an error no longer prevents the `Return` from being typed as `void`.

`typeToIL` already handles `tkError`, but `genProcType` unnecessarily aborted on encountering it. This is now fixed.

saem · 2024-09-24T21:04:16Z

Edited the description to note that subtype allowance in Return is covariance, this is just to help keyword/searching PRs in the future.

saem

One possible issue with not collapsing unions, but otherwise looks good, hopefully I'm not misunderstanding something.

passes/source2il.nim

saem · 2024-09-24T21:30:30Z

passes/source2il.nim

+        if typ.kind != tkError:
+          c.error("union type operands must be unique")


Perhaps I'm wrong, just looking at this but I think union(union(float, int), union(int, string)) should result in union(float, int, string).

This points to two things:

maybe we shouldn't get a duplication error?

we should flatten unions, I don't believe evalType alone does that

I haven't looked at the whole PR, so this, union inside a union, might not be possibility.

Perhaps I'm wrong, just looking at this but I think union(union(float, int), union(int, string)) should result in union(float, int, string).

I did think about this when writing the specification. My main motivation was to allow int or float or string producing the union(int, float, string) type instead of union(union(int, float), string) without any special cases or rules.

I ultimately discarded the "flatten" idea, mostly for practical usage reasons, but also because it means that type formation, term elimination, and term introduction rules become more complex for union.

There are two concrete situations where I think flattening is a problem.

Case 1: ergonomics and efficiency

Consider the following (using hypothetical syntax):

proc floatOrStr(): union(float, string) = ... proc p(): union(int, union(float, string)) = return floatOrStr()

With the rules as they currently are, this would work as one would expect (in my opinion). If the union type is flattened, then there's a problem, because union(A, B) is currently not considered a subtype of union(A, B, C), meaning that the code above becomes illegal.

Of course, union(A, B) could be made a subtype of union(A, B, C), and I also think it should. The subtype relationship would - in my opinion - need to be coercive, however, given that the necessary run-time conversion is not trivial (the input union needs to be unpacked and then repacked).

In the example showed above, requiring an explicit coercion would hurt ergonomics and also be somewhat counterintuitive.

Case 2: composition in generics

Consider the following:

proc f[A, B](a: A, b: B) = let x: union(A, B) = select(..., a, b) case x of A as val: ... of B as val: ...

My expectation is that a call such as a: int, b: union(float, string) |- f(a, b) should be perfectly valid, but when union types are flattened, it wouldn't be, since the second of branch would specify a type that's not part of the union.

Again, there are some possible remedies:

the notion of strong and weak union types is introduced; strong unions cannot be "destructured", whereas weak ones can be. Union types bound to type variables would then always become strong ones

in of T as val, T is allowed to be a union type covering a subset of the types in the matched-against union

Number 1 introduces additional complexity to the type system. Number 2 would only work for non-lvalue conversion (because the repacking of unions cannot preserve lvalue-ness), thus having composition issues again.

Summing it up, I think that flattening unions (by default) is not a good idea. There could be some type operator, like flatten, which constructs a "flat" union type from a nested one, though I think that's something for later.

I think there's also a consistency argument to be made (though maybe not a very strong one) with regards to tuple, which doesn't collapse nested tuples, for reasons similar to those I stated above.

maybe we shouldn't get a duplication error?

This is also something I though about. I decided to go with reporting an error because I think it'll work better with (future) generic routines.

Consider:

proc f[A, B](a: A, b: B) = let x: union(A, B) = ...

When making duplicating types in a union type constructor an error, it's possible to derive an A != B constraint for the generic parameters when typing the generic body, making it possible to fail analysis as soon as the concrete types are known, which could have some beneficial interactions with overload resolution (should we decide to employ type-based overload resolution, that is).

However, I do have a less strong opinion on an error for duplication than I have on flattening.

Thank you for the detailed response. One thing I didn't consider was strong versus weak unions, that's a pretty cool idea; but you're correct I don't see the benefit to cost at present..

saem · 2024-09-25T01:15:54Z

phy/types.nim

+  ## Computes whether `a` is a subtype of `b`.
+  if b.kind == tkError:
+    true # the error type acts as a top type
+  elif a.kind == tkVoid:


Making void a bottom means we're treating it as an all, a sort of antonym to any (top) which is also occupied by error, pretty sure that works for us long term.

Making void a bottom means we're treating it as an all

I somewhat disagree, but maybe it's only because of the name all. When I first read all, my first thought was that it's the sum type of all possible types, which would make it a top-like type. Even when interpreting it as a bottom type, my first thought was that it is an inhabited type, which void is not.

For completeness' sake (though I believe you know this already), from the start, my intention with void was for it to be the uninhabited bottom type.

Making void a bottom means we're treating it as an all

I somewhat disagree, but maybe it's only because of the name all. When I first read all, my first thought was that it's the sum type of all possible types, which would make it a top-like type. Even when interpreting it as a bottom type, my first thought was that it is an inhabited type, which void is not.

For completeness' sake (though I believe you know this already), from the start, my intention with void was for it to be the uninhabited bottom type.

Yup, I know it's meant to be uninhabited. all, or we could call it every, is something that is all types all once, which can be thought of as either being inhabited by an impossible value (an opposite to unit), or an uninhabitable type (void), the latter is how I tend to think about it. For now all this is fine, until we hit certain cases the distinctions are typically not helpful.

Co-authored-by: Saem Ghani <[email protected]>

zerbina added 7 commits September 24, 2024 18:45

spec: add the union type

1622bef

source2il: implement union types

cc67279

vmexec: implement union value rendering

5704dc3

spec: clarify union subtype behaviour

6e055a4

Also clarify that `Return` also allows expression with a type that is a subtype of the return type.

tests: add tests

c97ff7d

source2il: allow error types in return type positions

be28320

`typeToIL` already handles `tkError`, but `genProcType` unnecessarily aborted on encountering it. This is now fixed.

zerbina added the enhancement New feature or request label Sep 24, 2024

zerbina mentioned this pull request Sep 24, 2024

The Road To Version 0 #33

Open

22 tasks

Merge branch 'main' into source-lang-union-types

3ec1d3d

saem approved these changes Sep 25, 2024

View reviewed changes

docs: ordered -> sorted

7b9ae23

Co-authored-by: Saem Ghani <[email protected]>

saem merged commit c3caa9f into nim-works:main Sep 25, 2024
5 checks passed

zerbina deleted the source-lang-union-types branch October 7, 2024 15:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lang: add the `union` type #45

lang: add the `union` type #45

zerbina commented Sep 24, 2024 •

edited by saem

Loading

saem commented Sep 24, 2024

saem left a comment

saem Sep 24, 2024

zerbina Sep 25, 2024

saem Sep 25, 2024

saem Sep 25, 2024

zerbina Sep 25, 2024

saem Sep 25, 2024

		if typ.kind != tkError:
		c.error("union type operands must be unique")

lang: add the union type #45

lang: add the union type #45

Conversation

zerbina commented Sep 24, 2024 • edited by saem Loading

Summary

Details

Notes For Reviewers

saem commented Sep 24, 2024

saem left a comment

Choose a reason for hiding this comment

saem Sep 24, 2024

Choose a reason for hiding this comment

zerbina Sep 25, 2024

Choose a reason for hiding this comment

Case 1: ergonomics and efficiency

Case 2: composition in generics

saem Sep 25, 2024

Choose a reason for hiding this comment

saem Sep 25, 2024

Choose a reason for hiding this comment

zerbina Sep 25, 2024

Choose a reason for hiding this comment

saem Sep 25, 2024

Choose a reason for hiding this comment

lang: add the `union` type #45

lang: add the `union` type #45

zerbina commented Sep 24, 2024 •

edited by saem

Loading