Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lang: add the union type #45

Merged
merged 9 commits into from
Sep 25, 2024
Merged

Conversation

zerbina
Copy link
Collaborator

@zerbina zerbina commented Sep 24, 2024

Summary

  • add the union type, which is a concrete anonymous sum type
  • add the union type constructor
  • clarify that Return also allows subtypes (covariant return)
  • union values can currently only be constructed via Return

Details

  • add the tkUnion type kind
  • add the cmp procedure to types, used for sorting types
  • add the isSubtypeOf procedure to types
  • implement the various type handling for tkUnion (type rendering,
    value rendering, etc.)
  • implement implicit union construction for Return
  • refactor Return analysis and translation to be more linear

In memory union values are currently represented as an 8-byte integer
tag + a union of all allowed types, but this is subject to change.


Notes For Reviewers

They're not part of the core set of language features for version 0, but union types are useful, and also stress-test the lower-level passes a bit, so I figured I'd implement them right away.

There's no pattern matching facility yet, due to the current lack of local variables. Those are better implemented separately, which will then unblock pattern matching.

Also clarify that `Return` also allows expression with a type that is
a subtype of the return type.
The new `fitExpr` procedure handles type mismatch errors as well as
performing the necessary conversions.

In addition, `Return` translation is refactored to better handle error
propagation: the expression having an error no longer prevents the
`Return` from being typed as `void`.
`typeToIL` already handles `tkError`, but `genProcType` unnecessarily
aborted on encountering it. This is now fixed.
@zerbina zerbina added the enhancement New feature or request label Sep 24, 2024
@zerbina zerbina mentioned this pull request Sep 24, 2024
22 tasks
@saem
Copy link
Contributor

saem commented Sep 24, 2024

Edited the description to note that subtype allowance in Return is covariance, this is just to help keyword/searching PRs in the future.

Copy link
Contributor

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible issue with not collapsing unions, but otherwise looks good, hopefully I'm not misunderstanding something.

passes/source2il.nim Outdated Show resolved Hide resolved
Comment on lines +121 to +122
if typ.kind != tkError:
c.error("union type operands must be unique")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm wrong, just looking at this but I think union(union(float, int), union(int, string)) should result in union(float, int, string).

This points to two things:

  • maybe we shouldn't get a duplication error?
  • we should flatten unions, I don't believe evalType alone does that

I haven't looked at the whole PR, so this, union inside a union, might not be possibility.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I'm wrong, just looking at this but I think union(union(float, int), union(int, string)) should result in union(float, int, string).

I did think about this when writing the specification. My main motivation was to allow int or float or string producing the union(int, float, string) type instead of union(union(int, float), string) without any special cases or rules.

I ultimately discarded the "flatten" idea, mostly for practical usage reasons, but also because it means that type formation, term elimination, and term introduction rules become more complex for union.

There are two concrete situations where I think flattening is a problem.

Case 1: ergonomics and efficiency

Consider the following (using hypothetical syntax):

proc floatOrStr(): union(float, string) =
  ...

proc p(): union(int, union(float, string)) =
  return floatOrStr()

With the rules as they currently are, this would work as one would expect (in my opinion). If the union type is flattened, then there's a problem, because union(A, B) is currently not considered a subtype of union(A, B, C), meaning that the code above becomes illegal.

Of course, union(A, B) could be made a subtype of union(A, B, C), and I also think it should. The subtype relationship would - in my opinion - need to be coercive, however, given that the necessary run-time conversion is not trivial (the input union needs to be unpacked and then repacked).

In the example showed above, requiring an explicit coercion would hurt ergonomics and also be somewhat counterintuitive.

Case 2: composition in generics

Consider the following:

proc f[A, B](a: A, b: B) =
  let x: union(A, B) = select(..., a, b)
  case x
  of A as val: ...
  of B as val: ...

My expectation is that a call such as a: int, b: union(float, string) |- f(a, b) should be perfectly valid, but when union types are flattened, it wouldn't be, since the second of branch would specify a type that's not part of the union.

Again, there are some possible remedies:

  1. the notion of strong and weak union types is introduced; strong unions cannot be "destructured", whereas weak ones can be. Union types bound to type variables would then always become strong ones
  2. in of T as val, T is allowed to be a union type covering a subset of the types in the matched-against union

Number 1 introduces additional complexity to the type system. Number 2 would only work for non-lvalue conversion (because the repacking of unions cannot preserve lvalue-ness), thus having composition issues again.


Summing it up, I think that flattening unions (by default) is not a good idea. There could be some type operator, like flatten, which constructs a "flat" union type from a nested one, though I think that's something for later.

I think there's also a consistency argument to be made (though maybe not a very strong one) with regards to tuple, which doesn't collapse nested tuples, for reasons similar to those I stated above.

maybe we shouldn't get a duplication error?

This is also something I though about. I decided to go with reporting an error because I think it'll work better with (future) generic routines.

Consider:

proc f[A, B](a: A, b: B) =
  let x: union(A, B) = ...

When making duplicating types in a union type constructor an error, it's possible to derive an A != B constraint for the generic parameters when typing the generic body, making it possible to fail analysis as soon as the concrete types are known, which could have some beneficial interactions with overload resolution (should we decide to employ type-based overload resolution, that is).

However, I do have a less strong opinion on an error for duplication than I have on flattening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed response. One thing I didn't consider was strong versus weak unions, that's a pretty cool idea; but you're correct I don't see the benefit to cost at present..

## Computes whether `a` is a subtype of `b`.
if b.kind == tkError:
true # the error type acts as a top type
elif a.kind == tkVoid:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making void a bottom means we're treating it as an all, a sort of antonym to any (top) which is also occupied by error, pretty sure that works for us long term.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making void a bottom means we're treating it as an all

I somewhat disagree, but maybe it's only because of the name all. When I first read all, my first thought was that it's the sum type of all possible types, which would make it a top-like type. Even when interpreting it as a bottom type, my first thought was that it is an inhabited type, which void is not.

For completeness' sake (though I believe you know this already), from the start, my intention with void was for it to be the uninhabited bottom type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making void a bottom means we're treating it as an all

I somewhat disagree, but maybe it's only because of the name all. When I first read all, my first thought was that it's the sum type of all possible types, which would make it a top-like type. Even when interpreting it as a bottom type, my first thought was that it is an inhabited type, which void is not.

For completeness' sake (though I believe you know this already), from the start, my intention with void was for it to be the uninhabited bottom type.

Yup, I know it's meant to be uninhabited. all, or we could call it every, is something that is all types all once, which can be thought of as either being inhabited by an impossible value (an opposite to unit), or an uninhabitable type (void), the latter is how I tend to think about it. For now all this is fine, until we hit certain cases the distinctions are typically not helpful.

Co-authored-by: Saem Ghani <[email protected]>
@saem saem merged commit c3caa9f into nim-works:main Sep 25, 2024
5 checks passed
@zerbina zerbina deleted the source-lang-union-types branch October 7, 2024 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants