Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new unsigned proposal #341

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

certik
Copy link
Member

@certik certik commented Oct 25, 2024

The new proposal is in the unsigned.txt. I also committed the J3/24-116 proposal for reference.

Go ahead and suggest improvements.

* `bits`: bit operations are defined, arithmetic operations do not
wraparound or are not defined
* `modular`: arithmetic operations wraparound using modular 2^n
arithmetic

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was also discussing of clamped or saturating semantics. Which would be a bad default, but if you're going to list all of the choices, you missed that one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Btw, I added the appropriate intrinsics in 0e6544d.


An UNSIGNED with n bits has a value range between 0 and 2^n-1.
(Note that Fortran model integers have values between -2^(n-1)+1 and
2^(n-1)-1).
Copy link

@klausler klausler Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in practice, the range is $-2^{(n-1)}$ to $2^{(n-1)-1}$ with two's-complement interpretation. Fortran's model permits ones'-complement.

(Note that Fortran model integers have values between -2^(n-1)+1 and
2^(n-1)-1).

## 2.2 Arithmetic overflow is undefined

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is contrary to actual hardware behavior (truncation), 50+ years of expectations from C/C++, and current implementations of UNSIGNED (Sun, gfortran 15, LLVM Flang, and XLF's unsigned vector elements).

Copy link
Member Author

@certik certik Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal still leaves the door open to later add default wraparound on overflow, if the committee/community wishes to do that.

(Note that you literally just implemented a prototype yesterday into LLVM Flang as a Draft PR: llvm/llvm-project#113504, so I don't know how much this qualifies as "current implementation".)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the time your glacial process gets around to making any kind of progress, it will have been available for years.

Weren't you the guy who wanted the committee to validate concepts with prototypes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the time your glacial process gets around to making any kind of progress, it will have been available for years.

It's not my process and I don't agree with it. I tried to change things, including this very repository or running for committee's leadership or starting a compiler. I don't know what else you want me to do.

Weren't you the guy who wanted the committee to validate concepts with prototypes?

Yes, I've been pushing for that for years. Every proposal needs to have a prototype.

But you should implement the right prototype, and try to reach a consensus with it. So far you implemented a design that you like, and are hoping that if LLVM Flang is successful it will become the de-facto standard. That is no different to how the committee has been operating, not always, but sometimes the decision is effectively done by one person without input from others on the committee or wider community. You rightfully object to that, and I agree. But if your solution to this is problem is to simply push things to LLVM Flang because you have been one of the main authors, without feedback from others, then this is no better than what the committee does, and I object to that too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you should implement the right prototype, and try to reach a consensus with it. So far you implemented a design that you like, and are hoping that if LLVM Flang is successful it will become the de-facto standard.

Note that LLVM is not alone in implementing this, this is coordinated (and compatible) with what gfortran does.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • you do not arbitrarily add modulo operation (wraparound) in arithmetic, so that you can safely use size - 1 and other common operations. If you make a mistake, the compiler will tell you, instead of being required to wraparound and not tell you.

I think that you're assuming two things here: that compilers will always catch over/underflow in constant expression if the language doesn't define the behavior to be modular, and that they can't support options to emit a warning in those cases if the language does define the behavior to be modular.

Copy link
Member Author

@certik certik Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that compilers will always catch over/underflow in constant expression if the language doesn't define the behavior to be modular

I am assuming the compiler can always catch it, and indeed that seems to be possible to implement, if you play with such expressions in Rust, you will always get either compile-time or runtime errors. However in "Release" mode users would typically disable such checks (unlike in Rust where I think it's not possibly to disable it). Also some compilers can decide not to implement such checks and it would not break anything.

that they can't support options to emit a warning in those cases if the language does define the behavior to be modular.

The underflow size-1 check is a fundamentally runtime check. If we standardize that size-1 must wraparound, how can the compiler insert runtime checks for this? The KISS example would stop working then. If the wraparound is standardized, I think it is not possible for the compiler to tell if you want modular arithmetic or not for expression like size-1. Please correct me if I am wrong.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The underflow size-1 check is a fundamentally runtime check. If we standardize that size-1 must wraparound, how can the compiler insert runtime checks for this?

Introduce a derived type like the cubelow.

module checked_unsigned
  implicit none
  private
  public :: uc, operator(+), assignment(=)
  type uc
     unsigned :: u
  end type uc
  interface operator(+)
     procedure checked_add_uc
     procedure checked_add_cu
     procedure checked_add_cc
  end interface operator(+)
  interface assignment(=)
     procedure set_uc
     procedure set_cu
  end interface assignment(=)
contains
  recursive elemental function checked_add_uc(a,b) result(c)
    unsigned, intent(in) :: a
    type(uc), intent(in) :: b
    type(uc) :: c
    c%u = a + b%u
    if (c%u < a) error stop
  end function checked_add_uc
  recursive elemental function checked_add_cu(a,b) result(c)
    type(uc), intent(in) :: a
    unsigned, intent(in) :: b
    type(uc) :: c
    c%u = a%u + b
    if (c%u < a%u) error stop
  end function checked_add_cu
  recursive elemental function checked_add_cc(a,b) result(c)
    type(uc), intent(in) :: a, b
    type(uc) :: c
    c%u = a%u + b%u
    if (c%u < a%u) error stop
  end function checked_add_cc
  recursive elemental subroutine set_uc(a,b)
    type(uc), intent(in) :: b
    unsigned, intent(out) :: a
    a = b%u
  end subroutine set_uc
  recursive elemental subroutine set_cu(a,b)
    unsigned, intent(in) :: b
    type(uc), intent(out) :: a
    a%u = b
  end subroutine set_cu
end module checked_unsigned

program main
  use checked_unsigned
  implicit none
  type(uc) :: a
  a = huge(a%u)
  print *,a + 1u
end program main

That took me a about 10 minutes to type up, because Fortran is a verbose language. The above could also easily be extended as a PDT, which I didn't fell like doing tonight.

To be really effective for checking, this could use two intrinsics: One more easily check for overflow (that would also greatly help integers) and something like gcc_unreachable, let's say UNREACHABLE(). Depending on the settings, the compiler could ignore it completely, make optimizations on the assumption that it will never be reached, or treat it as an ERROR STOP. This could even be an intrinsic module for people who are concerned about users getting it wrong.

And my point of laying traps for people who are used to C still stands. Yes, we have deviations from what C does, but these are caught at compile-time. A compiler doing time-travel optimization based on the assumption that overflow cannot happen will bite people, and believe me - that will happen.

You keep bringing up Rust. Rust is a bad example - Fortran is not rust, its behavior of trapping on debug and prescrbing wrapping on non-debug builds is silly, and Fortran should never be allowed to use unsigned integers for array indices, like Rust does. Everybody has to convert to an INTEGER anyway before using an UNSIGNED as an array index or for a character substring, so the danger of users falling into that particular trap is not something that I am concerned about at all. Everything which counts things in intrinsics etc is left well alone, for that reason.

Copy link
Contributor

@PierUgit PierUgit Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You keep bringing up Rust. Rust is a bad example - Fortran is not rust,

And you keep bringing up C ("Fortran unsigned should be modular becasue C unsigned is"), but Fortran is not C. Whether the Rust approach isgood or not, at least they didn't try just copying C.

Introduce a derived type like the cubelow [...] That took me a about 10 minutes to type up,

It took you 10mn because you just wrote the addition part. One also need the subraction, the multiplication, the division, and the DT versions of all the intrinsics that work with unsigned. Granted, once written it's OK (*). But if we want to avoid everyone reinventing the wheel by writing their own version (and incompatible with each other version) it really should be an intrinsic module iso_checked_unsigned that provide type(uc) and all the needed routines. Then at this point:

  • This is now very close to natively providing two intrinsic types modular unsigned and unchecked unsigned, that was the option 1 of @certik. Why having only one of them as intrinsic and the other one as derived type ?
  • Why not the opposite, the intrinsic type being the (un)checked one, and the derived type being for the wrapped behavior?

About the performances: what you wrote above is a derived type that always check the overflows, which is not what people want most of time. One need something as fast as possible in "release mode", without any check, and possibly enabled checks by the compiler in debug mode. For that you are proposing new intrinsics that would be dedicated to overflow checking (and that should be standardized first, which is definitely not granted), and that would be active or not depending on some compiler switches... Now it gets extremely confusing IMO, especially if it's buried inside an intrinsic module: one has to choose between a type with checked overflows or unchecked overflows, it cannot be "processor dependent".

(*) which doesn't mean it's as convenient as an intrinsic type anyway... e.g. it's not possible to directly write a literal constant with a derived type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PierUgit It seems you did not read the entirety of my post. Please re-read it in its entirety, and you will find that two possible solutions to the "always checking" part are already contained in there.

@certik certik mentioned this pull request Oct 25, 2024
@tkoenig1
Copy link

@certik : You are not addressing the point that making this undefined does, in fact, prohibit use of modular arithmetic for portable code. In the guise of allowing both solutions, you are in fact only allowing one.

Do you fail to understand this point?

@certik
Copy link
Member Author

certik commented Oct 25, 2024

You are not addressing the point that making this undefined does, in fact, prohibit use of modular arithmetic for portable code.

@tkoenig1 It's addressed here: https://github.com/j3-fortran/fortran_proposals/pull/341/files#diff-93e619e2d42ff60c769ace351e37bb16d99221ea09ee27f907d88cc71920cfd8R205:

- Add the following new intrinsics (names to be decided):
    - ADD_WRAPPING(UNSIGNED, UNSIGNED) (or ADD_MODULAR) for modular
      aritmetics; the same for SUB, MUL, DIV

I should probably add examples of the random number generator and other examples to make it clear that this proposal in fact does allow you to use the modular arithmetic in a portable way.

Here is an example of code that I wrote in the past:

subroutine lcg(x)
! Linear congruential generator https://en.wikipedia.org/wiki/Linear_congruential_generator
use iso_fortran_env, only: int32
integer(int32), intent(out) :: x
integer(int32), save :: s = 26250493
s = modulo(s * 48271, 2147483647)
x = s
end subroutine

I think this is not portable, since it depends on s * 48271 wrapping around correctly (I think the modulo doesn't quite take care of it).

Using unsigned it would look like this:

subroutine lcg(x)
! Linear congruential generator https://en.wikipedia.org/wiki/Linear_congruential_generator
use iso_fortran_env, only: uint32
integer(uint32), intent(out) :: x
integer(uint32), save :: s = 26250493u
s = modulo(mul_wrapping(s, 48271u), 2147483647u)
x = s
end subroutine

This should be portable. I think one can write lcg to be as simple as s = mul_wrapping(s, 48271u), the wraparound takes care of the modulo for the right modulus.


- Add the following new intrinsics (names to be decided):
- ADD_WRAPPING(UNSIGNED, UNSIGNED) (or ADD_MODULAR) for modular
arithmetics; the same for SUB, MUL, DIV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why DIV? unsigned division, unlike signed, can't "overflow".


- We can consider also adding the following intrinsics:
- Possibly add ADD_SATURATING: saturating at the numeric bounds
instead of overflowing; the same for SUB, MUL, DIV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsigned division never results in a value greater than the numerator

instead of overflowing; the same for SUB, MUL, DIV
- Possibly add ADD_OVERFLOWING (or ADD_CHECKED): indicates with a
flag that overflow occurred, wrapped value is returned; the same
for SUB, MUL, DIV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this intrinsic handle the case of division by zero? if not, it's redundant

@kargl
Copy link

kargl commented Oct 25, 2024

@certik, your lcg example is fairly sparse code. Go to issue #2 where I posted Marsaglia's KISS generator written with add_wrapping and mul_wrapping. It quickly becomes difficult to read and parse. Why open a second issue? Now, we have 2 places to discuss unsigned.

@certik
Copy link
Member Author

certik commented Oct 25, 2024

@kargl thank you again for the code, this is exactly the kind of discussion that moves us forward.

This thread is a PR with the proposal itself that I plan to submit to the committee, after incorporating all your feedback.

@tkoenig1
Copy link

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

@PierUgit
Copy link
Contributor

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

This would actually be wise to use the name modular or unsigned modular instead of unsigned in the gfortran and f18 prototypes. Not only because this is what the prototype is really about, but also because it would prevent such incompatibility in the case unsigned is finally standardized with a different behavior from these prototypes.

@kargl
Copy link

kargl commented Oct 27, 2024

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

This would actually be wise to use the name modular or unsigned modular instead of unsigned in the gfortran and f18 prototypes. Not only because this is what the prototype is really about, but also because it would prevent such incompatibility in the case unsigned is finally standardized with a different behavior from these prototypes.

Completely disagree. gfortran and flang implemented the feature as proposed in 24-116.txt. These are prototypes.

Should @certik change his prototype implementation for templates to use PATTERN and NEEDED (or other synonyms) for TEMPLATE and REQUIREMENT?

@tkoenig1
Copy link

@certik : Please also mention that gfortran and f18 have already implemented (are about to implement) J3/24-116.txt, and that your proposal will create incompatibility if the name UNSIGNED is used. You can cite Unsigned integers for this (also for a list of useful intrinsics to implement).

This would actually be wise to use the name modular or unsigned modular instead of unsigned in the gfortran and f18 prototypes. Not only because this is what the prototype is really about,

Nope, this is about much more.

Consider

module tst
  use iso_c_binding, only : c_uint32_t
  type, bind(c) :: mytype
     unsigned(c_uint32_t) :: a,b
  end type mytype
end module tst

Run it through the development version of gfortran, and you will get
(snipped some preprocessor stuff)

typedef struct mytype {
    unsigned int a;
    unsigned int b;
} mytype;

(note that unsigned int is equivalent to uint32_t in the companion processor, i.e. gcc).

@PierUgit
Copy link
Contributor

PierUgit commented Oct 28, 2024

Completely disagree. gfortran and flang implemented the feature as proposed in 24-116.txt. These are prototypes.

Should @certik change his prototype implementation for templates to use PATTERN and NEEDED (or other synonyms) for TEMPLATE and REQUIREMENT?

The situation is different: 24-116.txt has been rejected at the WG5 level, so there's a chance that, should an unsigned type be eventually adopted in F202Y (or F203X), it be different from the proposal.

And after all, if it's "just" a prototype, why warning the committee about a possible incompatibility?

Nope, this is about much more.

OK, it's also about C interoperability. But since the unsigned types in C are modular, using the name modular or modular integer in Fortran still makes sense. modular integer (kind=c_uint32_t) :: looks ok.

@kargl
Copy link

kargl commented Oct 29, 2024

Completely disagree. gfortran and flang implemented the feature as proposed in 24-116.txt. These are prototypes.
Should @certik change his prototype implementation for templates to use PATTERN and NEEDED (or other synonyms) for TEMPLATE and REQUIREMENT?

The situation is different: 24-116.txt has been rejected at the WG5 level, so there's a chance that, should an unsigned type be eventually adopted in F202Y (or F203X), it be different from the proposal.

And after all, if it's "just" a prototype, why warning the committee about a possible incompatibility?

flang and gfortran are two different compilers. So, instead of implementing 24-116.txt as written
and passed by J3, compiler vendors and developers are expected to contact each other to settle on
a new different syntax. What if one chose modular and the other chose nonnegative? There are
now incompatible prototypes.

With regards to TEMPLATE, I'll note J3 just passed 24-159r4.txt, Syntax for Instantiation of Standalone
Template Procedure
, today. There are an additional 5 papers with changes to templates pending. Again,
should @certik choose synonyms for the keywords to avoid conflicts with his prototype and the final spec?

@certik
Copy link
Member Author

certik commented Oct 29, 2024

@kargl your example with templates is a good one: the very paper 24-159r4.txt you quoted actually standardized a different syntax than LFortran implemented, as we implemented a previous proposal. It's not a big deal to change or support both. I was planning to create some poll, and maybe I'll still do it, but at least there is still the door open to later do the syntax that I think is better, if we want to. No codes will break, it would be a smooth "upgrade" like the (/ /) vs [] for arrays.

With this unsigned type, I am trying to achieve something similar, so that we can have a smooth upgrade later (if we want to), and yet at the same time we get the job done today to actually have an unsigned type in Fortran.

Couple points that came above:

  • In C, "unsigned int" means "modular integer". Once you start seeing every such declaration with this "prism", you notice that sometimes it is the perfect fit, like the KISS example (it's precisely the type you want to use there). But in most other cases (judging from my own C and C++ codes) it's not the right type.

  • The use case in Add new unsigned proposal #341 (comment) is most likely not "modular integer", it's "unsigned" (as defined in this PR, no wraparound on "+"). It's actually impossible to tell, one would have to see how you use it. But if you are just interfacing C, and then using it, say you extract "size" from some vector like data structure from C++, and then you operate on it from Fortran, like querying item_number > size - 1, then you do not want "modular integer", just "unsigned", so that the expression size-1 behaves correctly (you would get a runtime error "underflow" if size=0, instead of a silent wraparound, thus a bug).

  • I am happy that both GFortran and Flang prototyped this. As long as it is seen as a prototype and both authors / teams are willing to change it, then there is no problem. I think you don't need to discuss with anybody to create a prototype, you should just do it. So I applaud both @tkoenig1 and @klausler for creating a prototype, which for example uncovered all the intrinsics that need to be updated and all the details. I think this is great. Thank you.

  • If the compiler prototype is now to be used to make others adapt exactly the same details, as well as encourage users to use it so that a critical mass is created, and this will then force the committee to standardize it, then I also agree, but under one condition: it must be the "right prototype". We must get the details right. Because if we don't, it will be impossible to undo later if it becomes clear we made a mistake, like C++ did.

  • From a simple operational perspective, the proposal in this PR moves us in the right direction:

    • we can finally interface with C easily
    • we can do the KISS example (but unfortunately it is more verbose than it could be with a dedicated "modular integer" type)
    • I think we can do any other use case as well
    • it should be possible to get support at the committee for it as well
    • it doesn't close the door for further "upgrades"
    • it goes as far as it can, while we all still agree
  • we can still consider either adding the dedicated "modular integer" type, or "wraparound by default" to simplify the KISS example, as a separate proposal (or decide to do nothing). We can even submit the base proposal, and then two optional proposals on top. That's actually not a bad idea, that way GFortran/Flang prototyped a combo of "base + wraparound" exactly. One downside: the add_wrapping functions would not be needed, unless we make them working for signed integers also (not sure if it is a good idea or not). LFortran could implement the "base + nothing" or "base + modular integer".

  • There is one more issue: casting signed to unsigned. I believe in Rust if you cast "-1" to unsigned, you get an error. In this PR one solution is to leave it undefined, another solution is to also add a dedicated function cast_wrapping, or something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants