Add new unsigned proposal #341

certik · 2024-10-25T14:59:11Z

The new proposal is in the unsigned.txt. I also committed the J3/24-116 proposal for reference.

Go ahead and suggest improvements.

klausler · 2024-10-25T15:10:34Z

proposals/unsigned/unsigned.txt

+  * `bits`: bit operations are defined, arithmetic operations do not
+    wraparound or are not defined
+  * `modular`: arithmetic operations wraparound using modular 2^n
+    arithmetic


There was also discussing of clamped or saturating semantics. Which would be a bad default, but if you're going to list all of the choices, you missed that one.

Good point. Btw, I added the appropriate intrinsics in 0e6544d.

klausler · 2024-10-25T15:15:09Z

proposals/unsigned/unsigned.txt

+
+An UNSIGNED with n bits has a value range between 0 and 2^n-1.
+(Note that Fortran model integers have values between -2^(n-1)+1 and
+2^(n-1)-1).


And in practice, the range is $-2^{(n-1)}$ to $2^{(n-1)-1}$ with two's-complement interpretation. Fortran's model permits ones'-complement.

klausler · 2024-10-25T15:18:25Z

proposals/unsigned/unsigned.txt

+(Note that Fortran model integers have values between -2^(n-1)+1 and
+2^(n-1)-1).
+
+## 2.2 Arithmetic overflow is undefined


This section is contrary to actual hardware behavior (truncation), 50+ years of expectations from C/C++, and current implementations of UNSIGNED (Sun, gfortran 15, LLVM Flang, and XLF's unsigned vector elements).

This proposal still leaves the door open to later add default wraparound on overflow, if the committee/community wishes to do that.

(Note that you literally just implemented a prototype yesterday into LLVM Flang as a Draft PR: llvm/llvm-project#113504, so I don't know how much this qualifies as "current implementation".)

By the time your glacial process gets around to making any kind of progress, it will have been available for years.

Weren't you the guy who wanted the committee to validate concepts with prototypes?

By the time your glacial process gets around to making any kind of progress, it will have been available for years.

It's not my process and I don't agree with it. I tried to change things, including this very repository or running for committee's leadership or starting a compiler. I don't know what else you want me to do.

Weren't you the guy who wanted the committee to validate concepts with prototypes?

Yes, I've been pushing for that for years. Every proposal needs to have a prototype.

But you should implement the right prototype, and try to reach a consensus with it. So far you implemented a design that you like, and are hoping that if LLVM Flang is successful it will become the de-facto standard. That is no different to how the committee has been operating, not always, but sometimes the decision is effectively done by one person without input from others on the committee or wider community. You rightfully object to that, and I agree. But if your solution to this is problem is to simply push things to LLVM Flang because you have been one of the main authors, without feedback from others, then this is no better than what the committee does, and I object to that too.

But you should implement the right prototype, and try to reach a consensus with it. So far you implemented a design that you like, and are hoping that if LLVM Flang is successful it will become the de-facto standard.

Note that LLVM is not alone in implementing this, this is coordinated (and compatible) with what gfortran does.

you do not arbitrarily add modulo operation (wraparound) in arithmetic, so that you can safely use size - 1 and other common operations. If you make a mistake, the compiler will tell you, instead of being required to wraparound and not tell you.

I think that you're assuming two things here: that compilers will always catch over/underflow in constant expression if the language doesn't define the behavior to be modular, and that they can't support options to emit a warning in those cases if the language does define the behavior to be modular.

that compilers will always catch over/underflow in constant expression if the language doesn't define the behavior to be modular

I am assuming the compiler can always catch it, and indeed that seems to be possible to implement, if you play with such expressions in Rust, you will always get either compile-time or runtime errors. However in "Release" mode users would typically disable such checks (unlike in Rust where I think it's not possibly to disable it). Also some compilers can decide not to implement such checks and it would not break anything.

that they can't support options to emit a warning in those cases if the language does define the behavior to be modular.

The underflow size-1 check is a fundamentally runtime check. If we standardize that size-1 must wraparound, how can the compiler insert runtime checks for this? The KISS example would stop working then. If the wraparound is standardized, I think it is not possible for the compiler to tell if you want modular arithmetic or not for expression like size-1. Please correct me if I am wrong.

The underflow size-1 check is a fundamentally runtime check. If we standardize that size-1 must wraparound, how can the compiler insert runtime checks for this?

Introduce a derived type like the cubelow.

module checked_unsigned implicit none private public :: uc, operator(+), assignment(=) type uc unsigned :: u end type uc interface operator(+) procedure checked_add_uc procedure checked_add_cu procedure checked_add_cc end interface operator(+) interface assignment(=) procedure set_uc procedure set_cu end interface assignment(=) contains recursive elemental function checked_add_uc(a,b) result(c) unsigned, intent(in) :: a type(uc), intent(in) :: b type(uc) :: c c%u = a + b%u if (c%u < a) error stop end function checked_add_uc recursive elemental function checked_add_cu(a,b) result(c) type(uc), intent(in) :: a unsigned, intent(in) :: b type(uc) :: c c%u = a%u + b if (c%u < a%u) error stop end function checked_add_cu recursive elemental function checked_add_cc(a,b) result(c) type(uc), intent(in) :: a, b type(uc) :: c c%u = a%u + b%u if (c%u < a%u) error stop end function checked_add_cc recursive elemental subroutine set_uc(a,b) type(uc), intent(in) :: b unsigned, intent(out) :: a a = b%u end subroutine set_uc recursive elemental subroutine set_cu(a,b) unsigned, intent(in) :: b type(uc), intent(out) :: a a%u = b end subroutine set_cu end module checked_unsigned program main use checked_unsigned implicit none type(uc) :: a a = huge(a%u) print *,a + 1u end program main

That took me a about 10 minutes to type up, because Fortran is a verbose language. The above could also easily be extended as a PDT, which I didn't fell like doing tonight.

To be really effective for checking, this could use two intrinsics: One more easily check for overflow (that would also greatly help integers) and something like gcc_unreachable, let's say UNREACHABLE(). Depending on the settings, the compiler could ignore it completely, make optimizations on the assumption that it will never be reached, or treat it as an ERROR STOP. This could even be an intrinsic module for people who are concerned about users getting it wrong.

And my point of laying traps for people who are used to C still stands. Yes, we have deviations from what C does, but these are caught at compile-time. A compiler doing time-travel optimization based on the assumption that overflow cannot happen will bite people, and believe me - that will happen.

You keep bringing up Rust. Rust is a bad example - Fortran is not rust, its behavior of trapping on debug and prescrbing wrapping on non-debug builds is silly, and Fortran should never be allowed to use unsigned integers for array indices, like Rust does. Everybody has to convert to an INTEGER anyway before using an UNSIGNED as an array index or for a character substring, so the danger of users falling into that particular trap is not something that I am concerned about at all. Everything which counts things in intrinsics etc is left well alone, for that reason.

You keep bringing up Rust. Rust is a bad example - Fortran is not rust,

And you keep bringing up C ("Fortran unsigned should be modular becasue C unsigned is"), but Fortran is not C. Whether the Rust approach isgood or not, at least they didn't try just copying C.

Introduce a derived type like the cubelow [...] That took me a about 10 minutes to type up,

It took you 10mn because you just wrote the addition part. One also need the subraction, the multiplication, the division, and the DT versions of all the intrinsics that work with unsigned. Granted, once written it's OK (*). But if we want to avoid everyone reinventing the wheel by writing their own version (and incompatible with each other version) it really should be an intrinsic module iso_checked_unsigned that provide type(uc) and all the needed routines. Then at this point:

This is now very close to natively providing two intrinsic types modular unsigned and unchecked unsigned, that was the option 1 of @certik. Why having only one of them as intrinsic and the other one as derived type ?

Why not the opposite, the intrinsic type being the (un)checked one, and the derived type being for the wrapped behavior?

About the performances: what you wrote above is a derived type that always check the overflows, which is not what people want most of time. One need something as fast as possible in "release mode", without any check, and possibly enabled checks by the compiler in debug mode. For that you are proposing new intrinsics that would be dedicated to overflow checking (and that should be standardized first, which is definitely not granted), and that would be active or not depending on some compiler switches... Now it gets extremely confusing IMO, especially if it's buried inside an intrinsic module: one has to choose between a type with checked overflows or unchecked overflows, it cannot be "processor dependent".

(*) which doesn't mean it's as convenient as an intrinsic type anyway... e.g. it's not possible to directly write a literal constant with a derived type.

@PierUgit It seems you did not read the entirety of my post. Please re-read it in its entirety, and you will find that two possible solutions to the "always checking" part are already contained in there.

proposals/unsigned/unsigned.txt

Arguments must still be added.

tkoenig1 · 2024-10-25T16:26:28Z

@certik : You are not addressing the point that making this undefined does, in fact, prohibit use of modular arithmetic for portable code. In the guise of allowing both solutions, you are in fact only allowing one.

Do you fail to understand this point?

certik · 2024-10-25T16:52:47Z

You are not addressing the point that making this undefined does, in fact, prohibit use of modular arithmetic for portable code.

@tkoenig1 It's addressed here: https://github.com/j3-fortran/fortran_proposals/pull/341/files#diff-93e619e2d42ff60c769ace351e37bb16d99221ea09ee27f907d88cc71920cfd8R205:

- Add the following new intrinsics (names to be decided):
    - ADD_WRAPPING(UNSIGNED, UNSIGNED) (or ADD_MODULAR) for modular
      aritmetics; the same for SUB, MUL, DIV

I should probably add examples of the random number generator and other examples to make it clear that this proposal in fact does allow you to use the modular arithmetic in a portable way.

Here is an example of code that I wrote in the past:

subroutine lcg(x)
! Linear congruential generator https://en.wikipedia.org/wiki/Linear_congruential_generator
use iso_fortran_env, only: int32
integer(int32), intent(out) :: x
integer(int32), save :: s = 26250493
s = modulo(s * 48271, 2147483647)
x = s
end subroutine

I think this is not portable, since it depends on s * 48271 wrapping around correctly (I think the modulo doesn't quite take care of it).

Using unsigned it would look like this:

subroutine lcg(x)
! Linear congruential generator https://en.wikipedia.org/wiki/Linear_congruential_generator
use iso_fortran_env, only: uint32
integer(uint32), intent(out) :: x
integer(uint32), save :: s = 26250493u
s = modulo(mul_wrapping(s, 48271u), 2147483647u)
x = s
end subroutine

This should be portable. I think one can write lcg to be as simple as s = mul_wrapping(s, 48271u), the wraparound takes care of the modulo for the right modulus.

klausler · 2024-10-25T16:55:08Z

proposals/unsigned/unsigned.txt

+
+- Add the following new intrinsics (names to be decided):
+    - ADD_WRAPPING(UNSIGNED, UNSIGNED) (or ADD_MODULAR) for modular
+      arithmetics; the same for SUB, MUL, DIV


why DIV? unsigned division, unlike signed, can't "overflow".

klausler · 2024-10-25T16:56:33Z

proposals/unsigned/unsigned.txt

+
+- We can consider also adding the following intrinsics:
+    - Possibly add ADD_SATURATING: saturating at the numeric bounds
+      instead of overflowing; the same for SUB, MUL, DIV


unsigned division never results in a value greater than the numerator

klausler · 2024-10-25T16:57:23Z

proposals/unsigned/unsigned.txt

+      instead of overflowing; the same for SUB, MUL, DIV
+    - Possibly add ADD_OVERFLOWING (or ADD_CHECKED): indicates with a
+      flag that overflow occurred, wrapped value is returned; the same
+      for SUB, MUL, DIV


would this intrinsic handle the case of division by zero? if not, it's redundant

kargl · 2024-10-25T17:53:15Z

@certik, your lcg example is fairly sparse code. Go to issue #2 where I posted Marsaglia's KISS generator written with add_wrapping and mul_wrapping. It quickly becomes difficult to read and parse. Why open a second issue? Now, we have 2 places to discuss unsigned.

certik · 2024-10-25T18:41:41Z

@kargl thank you again for the code, this is exactly the kind of discussion that moves us forward.

This thread is a PR with the proposal itself that I plan to submit to the committee, after incorporating all your feedback.

tkoenig1 · 2024-10-27T13:55:54Z

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

PierUgit · 2024-10-27T18:54:18Z

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

This would actually be wise to use the name modular or unsigned modular instead of unsigned in the gfortran and f18 prototypes. Not only because this is what the prototype is really about, but also because it would prevent such incompatibility in the case unsigned is finally standardized with a different behavior from these prototypes.

kargl · 2024-10-27T19:59:36Z

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

This would actually be wise to use the name modular or unsigned modular instead of unsigned in the gfortran and f18 prototypes. Not only because this is what the prototype is really about, but also because it would prevent such incompatibility in the case unsigned is finally standardized with a different behavior from these prototypes.

Completely disagree. gfortran and flang implemented the feature as proposed in 24-116.txt. These are prototypes.

Should @certik change his prototype implementation for templates to use PATTERN and NEEDED (or other synonyms) for TEMPLATE and REQUIREMENT?

tkoenig1 · 2024-10-27T20:09:32Z

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

This would actually be wise to use the name modular or unsigned modular instead of unsigned in the gfortran and f18 prototypes. Not only because this is what the prototype is really about,

Nope, this is about much more.

Consider

module tst
  use iso_c_binding, only : c_uint32_t
  type, bind(c) :: mytype
     unsigned(c_uint32_t) :: a,b
  end type mytype
end module tst

Run it through the development version of gfortran, and you will get
(snipped some preprocessor stuff)

typedef struct mytype {
    unsigned int a;
    unsigned int b;
} mytype;

(note that unsigned int is equivalent to uint32_t in the companion processor, i.e. gcc).

PierUgit · 2024-10-28T22:59:07Z

Completely disagree. gfortran and flang implemented the feature as proposed in 24-116.txt. These are prototypes.

Should @certik change his prototype implementation for templates to use PATTERN and NEEDED (or other synonyms) for TEMPLATE and REQUIREMENT?

The situation is different: 24-116.txt has been rejected at the WG5 level, so there's a chance that, should an unsigned type be eventually adopted in F202Y (or F203X), it be different from the proposal.

And after all, if it's "just" a prototype, why warning the committee about a possible incompatibility?

Nope, this is about much more.

OK, it's also about C interoperability. But since the unsigned types in C are modular, using the name modular or modular integer in Fortran still makes sense. modular integer (kind=c_uint32_t) :: looks ok.

kargl · 2024-10-29T03:26:30Z

Completely disagree. gfortran and flang implemented the feature as proposed in 24-116.txt. These are prototypes.
Should @certik change his prototype implementation for templates to use PATTERN and NEEDED (or other synonyms) for TEMPLATE and REQUIREMENT?

The situation is different: 24-116.txt has been rejected at the WG5 level, so there's a chance that, should an unsigned type be eventually adopted in F202Y (or F203X), it be different from the proposal.

And after all, if it's "just" a prototype, why warning the committee about a possible incompatibility?

flang and gfortran are two different compilers. So, instead of implementing 24-116.txt as written
and passed by J3, compiler vendors and developers are expected to contact each other to settle on
a new different syntax. What if one chose modular and the other chose nonnegative? There are
now incompatible prototypes.

With regards to TEMPLATE, I'll note J3 just passed 24-159r4.txt, Syntax for Instantiation of Standalone
Template Procedure, today. There are an additional 5 papers with changes to templates pending. Again,
should @certik choose synonyms for the keywords to avoid conflicts with his prototype and the final spec?

certik · 2024-10-29T21:40:49Z

@kargl your example with templates is a good one: the very paper 24-159r4.txt you quoted actually standardized a different syntax than LFortran implemented, as we implemented a previous proposal. It's not a big deal to change or support both. I was planning to create some poll, and maybe I'll still do it, but at least there is still the door open to later do the syntax that I think is better, if we want to. No codes will break, it would be a smooth "upgrade" like the (/ /) vs [] for arrays.

With this unsigned type, I am trying to achieve something similar, so that we can have a smooth upgrade later (if we want to), and yet at the same time we get the job done today to actually have an unsigned type in Fortran.

Couple points that came above:

In C, "unsigned int" means "modular integer". Once you start seeing every such declaration with this "prism", you notice that sometimes it is the perfect fit, like the KISS example (it's precisely the type you want to use there). But in most other cases (judging from my own C and C++ codes) it's not the right type.
The use case in Add new unsigned proposal #341 (comment) is most likely not "modular integer", it's "unsigned" (as defined in this PR, no wraparound on "+"). It's actually impossible to tell, one would have to see how you use it. But if you are just interfacing C, and then using it, say you extract "size" from some vector like data structure from C++, and then you operate on it from Fortran, like querying item_number > size - 1, then you do not want "modular integer", just "unsigned", so that the expression size-1 behaves correctly (you would get a runtime error "underflow" if size=0, instead of a silent wraparound, thus a bug).
I am happy that both GFortran and Flang prototyped this. As long as it is seen as a prototype and both authors / teams are willing to change it, then there is no problem. I think you don't need to discuss with anybody to create a prototype, you should just do it. So I applaud both @tkoenig1 and @klausler for creating a prototype, which for example uncovered all the intrinsics that need to be updated and all the details. I think this is great. Thank you.
If the compiler prototype is now to be used to make others adapt exactly the same details, as well as encourage users to use it so that a critical mass is created, and this will then force the committee to standardize it, then I also agree, but under one condition: it must be the "right prototype". We must get the details right. Because if we don't, it will be impossible to undo later if it becomes clear we made a mistake, like C++ did.
From a simple operational perspective, the proposal in this PR moves us in the right direction:
- we can finally interface with C easily
- we can do the KISS example (but unfortunately it is more verbose than it could be with a dedicated "modular integer" type)
- I think we can do any other use case as well
- it should be possible to get support at the committee for it as well
- it doesn't close the door for further "upgrades"
- it goes as far as it can, while we all still agree
we can still consider either adding the dedicated "modular integer" type, or "wraparound by default" to simplify the KISS example, as a separate proposal (or decide to do nothing). We can even submit the base proposal, and then two optional proposals on top. That's actually not a bad idea, that way GFortran/Flang prototyped a combo of "base + wraparound" exactly. One downside: the add_wrapping functions would not be needed, unless we make them working for signed integers also (not sure if it is a good idea or not). LFortran could implement the "base + nothing" or "base + modular integer".
There is one more issue: casting signed to unsigned. I believe in Rust if you cast "-1" to unsigned, you get an error. In this PR one solution is to leave it undefined, another solution is to also add a dedicated function cast_wrapping, or something like that.

certik added 2 commits October 25, 2024 07:58

Add original 24-116 proposal

3b67982

Add new unsigned proposal

c475bf7

klausler reviewed Oct 25, 2024

View reviewed changes

Add new intrinsics

0e6544d

klausler reviewed Oct 25, 2024

View reviewed changes

proposals/unsigned/unsigned.txt Show resolved Hide resolved

klausler reviewed Oct 25, 2024

View reviewed changes

proposals/unsigned/unsigned.txt Show resolved Hide resolved

certik added 2 commits October 25, 2024 09:35

Add OUT_OF_RANGE

8bae519

Add more array intrinsics

af6dd67

Arguments must still be added.

certik mentioned this pull request Oct 25, 2024

Unsigned integers #2

Open

klausler reviewed Oct 25, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new unsigned proposal #341

Add new unsigned proposal #341

certik commented Oct 25, 2024 •

edited

Loading

klausler Oct 25, 2024

certik Oct 25, 2024

klausler Oct 25, 2024 •

edited by certik

Loading

klausler Oct 25, 2024

certik Oct 25, 2024 •

edited

Loading

klausler Oct 25, 2024

certik Oct 25, 2024

tkoenig1 Oct 25, 2024

klausler Oct 26, 2024

certik Oct 29, 2024 •

edited

Loading

tkoenig1 Oct 29, 2024

PierUgit Oct 30, 2024 •

edited

Loading

tkoenig1 Oct 30, 2024

tkoenig1 commented Oct 25, 2024

certik commented Oct 25, 2024 •

edited

Loading

klausler Oct 25, 2024

klausler Oct 25, 2024

klausler Oct 25, 2024

kargl commented Oct 25, 2024

certik commented Oct 25, 2024

tkoenig1 commented Oct 27, 2024

PierUgit commented Oct 27, 2024

kargl commented Oct 27, 2024 •

edited

Loading

tkoenig1 commented Oct 27, 2024

PierUgit commented Oct 28, 2024 •

edited

Loading

kargl commented Oct 29, 2024 •

edited by certik

Loading

certik commented Oct 29, 2024 •

edited

Loading

Add new unsigned proposal #341

Are you sure you want to change the base?

Add new unsigned proposal #341

Conversation

certik commented Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

klausler Oct 25, 2024 • edited by certik Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

certik Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

certik Oct 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PierUgit Oct 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tkoenig1 commented Oct 25, 2024

certik commented Oct 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kargl commented Oct 25, 2024

certik commented Oct 25, 2024

tkoenig1 commented Oct 27, 2024

PierUgit commented Oct 27, 2024

kargl commented Oct 27, 2024 • edited Loading

tkoenig1 commented Oct 27, 2024

PierUgit commented Oct 28, 2024 • edited Loading

kargl commented Oct 29, 2024 • edited by certik Loading

certik commented Oct 29, 2024 • edited Loading

certik commented Oct 25, 2024 •

edited

Loading

klausler Oct 25, 2024 •

edited by certik

Loading

certik Oct 25, 2024 •

edited

Loading

certik Oct 29, 2024 •

edited

Loading

PierUgit Oct 30, 2024 •

edited

Loading

certik commented Oct 25, 2024 •

edited

Loading

kargl commented Oct 27, 2024 •

edited

Loading

PierUgit commented Oct 28, 2024 •

edited

Loading

kargl commented Oct 29, 2024 •

edited by certik

Loading

certik commented Oct 29, 2024 •

edited

Loading