discuss calling conventions #4

Open
jeffhammond opened this issue Nov 18, 2022 · 13 comments

Comments

@jeffhammond
Member

jeffhammond commented Nov 18, 2022

Problem

ABI includes calling conventions, which are not standardized.

Proposal

We should not go too deep on this. We should merely state that the MPI library must support the calling convention of the system C compiler on the platform. This is going to be trivial in most cases, and is already widely accepted, since it is necessary in many contexts.
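As a rough illustration of what "supporting the calling convention of the system C compiler" means at the header level, here is a minimal sketch. `MYLIB_CALL`, `mylib_add`, `mylib_apply`, and `mylib_binary_fn` are hypothetical names invented for this example and are not part of MPI. The assumption is that only 32-bit Windows needs an explicit annotation, because several conventions coexist there; elsewhere the macro expands to nothing and the platform default applies.

```c
#include <assert.h>

/* MYLIB_CALL is a hypothetical macro, not part of MPI or any real library.
 * On 32-bit Windows several conventions coexist (__cdecl, __stdcall,
 * __fastcall), so a header may pin one explicitly. Elsewhere the platform
 * C ABI defines the convention and the macro expands to nothing. */
#if defined(_WIN32) && !defined(_WIN64)
#define MYLIB_CALL __stdcall
#else
#define MYLIB_CALL
#endif

/* The convention is part of the function's type, so it appears in the
 * declaration that callers compile against. */
int MYLIB_CALL mylib_add(int a, int b);

/* It must also appear in function-pointer types used for callbacks. */
typedef int (MYLIB_CALL *mylib_binary_fn)(int, int);

int MYLIB_CALL mylib_add(int a, int b)
{
    return a + b;
}

/* Calls through a pointer whose type carries the same convention. */
int mylib_apply(mylib_binary_fn fn, int a, int b)
{
    return fn(a, b);
}

/* Small self-check: caller and callee agree on the convention. */
int mylib_self_check(void)
{
    mylib_binary_fn fn = mylib_add;
    assert(mylib_apply(fn, 2, 3) == 5);
    return mylib_apply(fn, 2, 3);
}
```

If caller and callee were compiled with different conventions for the same prototype, the call would still compile but could corrupt the stack or read arguments from the wrong place, which is why the convention belongs in the shared header.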

Changes to the Text

Describe the issue, and mention that at least one major architecture (x86) has multiple conventions:

We can cite the ARM calling conventions that are standardized:

Impact on Implementations

None. There is no change to existing practice here.

Impact on Users

None, unless they are using C toolchains that are not compatible with the one used to build MPI.

References and Pull Requests

https://en.wikipedia.org/wiki/X86_calling_conventions

@jeffhammond changed the title from "discussing calling conventions" to "discuss calling conventions" on Nov 18, 2022
@besnardjb

For reference, this reminds me of one place where the standard discusses what I would call a "calling" convention (it seems to call it a linkage convention). We may want to look at it too:

19.1.17 Problems with Code Movement and Register Optimization
Nonblocking Operations
If a variable is local to a Fortran subroutine (i.e., not in a module or a COMMON block), the compiler will assume that it cannot be modified by a called subroutine unless it is an actual argument of the call. In the most common linkage convention, the subroutine is expected to save and restore certain registers. Thus, the optimizer will assume that a register which held a valid copy of such a variable before the call will still hold a valid copy on return.

@besnardjb

This same section also explains why all the _begin and _end routines in MPI-IO take a buffer argument. I do not think this is still necessary today -- Fortran is amazing. I feel it is something that should have gone away with Fortran 77; it looks like a leftover to me. What is interesting is that, to my understanding, these functions were created as a mitigation for the calling convention:

This register optimization/code movement problem for nonblocking operations does not occur with MPI parallel file I/O split collective operations, because in the MPI_XXX_BEGIN and MPI_XXX_END calls, the same buffer has to be provided as an actual argument.
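The data flow described in the two quoted passages can be sketched in C. This is only an analogue under stated assumptions: the `toy_*` names are hypothetical and are not MPI functions, and C itself does not have the Fortran problem, because once the buffer's address has been passed to an external function a C compiler must assume later calls may write through it. The sketch shows which call actually modifies the buffer in each pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a nonblocking receive and its completion.
 * These are NOT MPI functions; they only mimic the data flow. */
typedef struct {
    int *target;   /* buffer remembered by the "library" */
    int  value;    /* data that will arrive later */
} toy_request;

/* Start: the buffer is an actual argument, but nothing is written yet. */
void toy_irecv(int *buf, int incoming, toy_request *req)
{
    req->target = buf;
    req->value  = incoming;
}

/* Complete: the buffer is NOT an argument, yet this call modifies it.
 * This is the case the quoted Fortran text warns about. */
void toy_wait(toy_request *req)
{
    if (req->target != NULL) {
        *req->target = req->value;
        req->target = NULL;
    }
}

/* Split-collective style: begin does not need to remember the buffer... */
void toy_read_begin(int *buf, int incoming, toy_request *req)
{
    (void)buf;
    req->target = NULL;
    req->value  = incoming;
}

/* ...because the same buffer is passed again at completion, so the write
 * happens through an actual argument of this call. */
void toy_read_end(int *buf, toy_request *req)
{
    *buf = req->value;
}

int toy_demo_nonblocking(void)
{
    int buf = 0;
    toy_request req;
    toy_irecv(&buf, 42, &req);
    toy_wait(&req);            /* buf changes here without being named */
    assert(buf == 42);
    return buf;
}

int toy_demo_split(void)
{
    int buf = 0;
    toy_request req;
    toy_read_begin(&buf, 7, &req);
    toy_read_end(&buf, &req);  /* buf is named at the call that writes it */
    assert(buf == 7);
    return buf;
}
```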

@jeffhammond
Member Author

Fortran ASYNCHRONOUS solves this, but it only does what we need in Fortran 2018 (although I know of no compiler that doesn't do what we need in practice as of Fortran 2008).

We are working on deprecating mpif.h now. Sadly, I don't think we will ever get rid of mpi.mod.

@jeffhammond
Member Author

My goal is to leverage the C ABI to write a very nice set of new Fortran bindings that are free from the burden of standardization and can fix all sorts of things like this.

https://github.com/jeffhammond/standalone_mpi_f08_module

@besnardjb

This is excellent 🔥. Yes, standardizing for two languages (in fact, looking at Fortran itself, it is more than one) is a great source of complexity...

@jeffhammond
Member Author

One of the goals in the F08 bindings was to make it possible to write them in a (mostly) implementation-agnostic way on top of the C bindings, but that didn't happen, although the situation is a lot better than with F90.

The standalone F08 experiment has been quite useful in identifying ABI issues...

@wrwilliams

Do we want to consider symbol names/visibility as part of calling conventions or break it into its own separate issue?

@jeffhammond
Member Author

Please create a separate issue so we know to address it. However, we might be able to solve both at the same time.

@gonzalobg

We should merely state that the MPI library must support the calling convention of the system C compiler on the platform.

I think that we should specify the ABI as a C header, say that "it is C", and that's it.

Every platform already has a C ABI specification that specifies how to interface with C on that platform. This covers way more than just the calling convention.

It's not the job of the MPI spec to specify how C (or Fortran) programs interface with each other on particular platforms. That is the platform's job, because there are many C and Fortran programs that are not MPI programs and need to interface with each other anyway.
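To make "specify the ABI as a C header" concrete, here is a minimal sketch of what such a header fragment could look like. Every name in it (`toy_comm`, `toy_count`, `toy_status`, `toy_get_count`) is hypothetical and not taken from any MPI header; the point is only that once types and layouts are written down in C, the platform's C ABI determines sizes, alignment, and argument passing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they are not from any MPI header. */

/* Opaque handle: callers only hold a pointer to an incomplete type, so the
 * underlying layout never becomes part of the ABI. */
typedef struct toy_comm_s *toy_comm;

/* Fixed-width count, so its size does not depend on the data model
 * (for example LP64 versus LLP64). */
typedef int64_t toy_count;

/* A struct whose layout is visible to callers and is therefore ABI. */
typedef struct {
    int32_t   source;
    int32_t   tag;
    toy_count count;
} toy_status;

/* Compile-time checks that the layout is what the header promises. */
_Static_assert(sizeof(toy_count) == 8, "toy_count must be 64 bits");
_Static_assert(offsetof(toy_status, count) == 8, "unexpected padding");
_Static_assert(sizeof(toy_status) == 16, "unexpected struct size");

/* Accessor that relies only on the declared layout. */
toy_count toy_get_count(const toy_status *status)
{
    assert(status != NULL);
    return status->count;
}
```

The static assertions are a design choice for the sketch: they turn layout assumptions into build failures instead of silent mismatches between an application and a library.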

@jeffhammond
Member Author

We should merely state that the MPI library must support the calling convention of the system C compiler on the platform.

I think that we should specify the ABI as a C header, say that "it is C", and that's it.

Yes, this is my intent, although I want to add "...as if compiled with the system default C compiler and runtime library," since that addresses the situation with glibc not being the only C RTL on Linux. Alpine uses MUSL.

I don't know to what extent https://wiki.musl-libc.org/functional-differences-from-glibc.html will impact MPI implementations, but we need to be cautious about assuming too much.

Every platform already has a C ABI specification that specifies how to interface with C on that platform. This covers way more than just the calling convention.

Windows apparently does not have a default calling convention. MPI does not assume any operating system and has been designed through its history to support a wide range of operating systems, including ones that are quite strange.

@gonzalobg

Yes, this is my intent, although I want to add "...as if compiled with the system default C compiler and runtime library," since that addresses the situation with glibc not being the only C RTL on Linux. Alpine uses MUSL.

Both MUSL and glibc follow the same platform ABI (e.g. x86_64 psABI on x86_64 Linux) and one can have a binary that uses MUSL and calls into a library that uses glibc, for example, passing values like an int back and forth without issues: both agree on the layout and calling convention of an int.

However, since these are two different C standard libraries, one cannot, for example, allocate memory with malloc on the MUSL side and free it with free on the glibc side, since these are two separate allocators. Likewise, while both MUSL and glibc provide a pthreads mutex, these mutexes have different ABIs because the platform does not specify an ABI for them, so one cannot share a mutex across both parts of a binary, etc.

Is there an MPI API for which the ABI specified by the platform does not suffice? For example, an MPI API where the MPI user passes the library a pointer to a mutex that the application obtained through a non-MPI API, or an API where the application passes the MPI API a pointer to memory that the application allocated but MPI is expected to free, or vice versa?

If not, and the application and the MPI API only pass values specified by the platform ABIs, then saying more than "it is C" would probably not be necessary.
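The ownership rule implied above (whoever allocates also frees) can be sketched as an API shape. The `toy_*` functions are hypothetical and exist only for this example. MPI's own `MPI_Alloc_mem` and `MPI_Free_mem` follow the same pairing, and `MPI_Get_processor_name` follows the caller-provided-buffer pattern shown in `toy_get_name`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library API. Memory handed out by the library is released
 * by the library, so malloc and free always come from the same C runtime. */
void *toy_alloc_mem(size_t nbytes)
{
    return malloc(nbytes);
}

void toy_free_mem(void *ptr)
{
    free(ptr);
}

/* Caller-provided output buffer: the library only writes into memory the
 * caller owns, so no allocator crosses the boundary at all.
 * Returns 0 on success, -1 if the buffer is missing or too small. */
int toy_get_name(char *out, size_t out_len)
{
    static const char name[] = "toy";
    if (out == NULL || out_len < sizeof name) {
        return -1;
    }
    memcpy(out, name, sizeof name);
    return 0;
}

/* Usage: every allocation is released through the matching library call. */
int toy_demo_ownership(void)
{
    char name[8];
    unsigned char *data = toy_alloc_mem(16);
    if (data == NULL) {
        return -1;
    }
    data[0] = 1;
    toy_free_mem(data);   /* not free(data) from the caller's runtime */
    assert(toy_get_name(name, sizeof name) == 0);
    return strcmp(name, "toy");
}
```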

@gonzalobg

Windows apparently does not have a default calling convention.

Windows has multiple default calling conventions for C, depending on the Windows target used (e.g. x86_64-pc-windows-msvc vs x86_64-pc-windows-mingw). These targets do not interoperate with each other and are therefore in practice treated as different platforms, but each of these targets has a stable C ABI that allows all C software on that target to interoperate.
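A C translation unit can tell which of these targets it is being built for from compiler-predefined macros. The sketch below is an assumption-laden illustration: `toy_target_env` and the strings it returns are invented for this example, and the macro checks cover only a few common toolchains.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: reports the build target using macros that the
 * compilers themselves predefine. The returned labels are made up. */
const char *toy_target_env(void)
{
#if defined(_WIN32) && defined(_MSC_VER)
    return "windows-msvc";    /* MSVC, or a compiler targeting its ABI */
#elif defined(_WIN32) && defined(__MINGW32__)
    return "windows-mingw";   /* MinGW toolchains, 32-bit or 64-bit */
#elif defined(_WIN32)
    return "windows-other";
#elif defined(__linux__)
    return "linux";
#elif defined(__APPLE__)
    return "apple";
#else
    return "unknown";
#endif
}

/* Returns 1 when the build targets any Windows environment. */
int toy_is_windows_target(void)
{
    const char *env = toy_target_env();
    assert(env != NULL);
    return strncmp(env, "windows", 7) == 0;
}
```

A library built for one of these environments would record the result at build time, since a mismatch between the application's target and the library's target is exactly the non-interoperable case described above.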

@jeffhammond
Member Author

Yes, this is my intent, although I want to add "...as if compiled with the system default C compiler and runtime library," since that addresses the situation with glibc not being the only C RTL on Linux. Alpine uses MUSL.

Both MUSL and glibc follow the same platform ABI (e.g. x86_64 psABI on x86_64 Linux) and one can have a binary that uses MUSL and calls into a library that uses glibc, for example, passing values like an int back and forth without issues: both agree on the layout and calling convention of an int.

However, since these are two different C standard libraries, one cannot, for example, allocate memory with malloc on the MUSL side and free it with free on the glibc side, since these are two separate allocators. Likewise, while both MUSL and glibc provide a pthreads mutex, these mutexes have different ABIs because the platform does not specify an ABI for them, so one cannot share a mutex across both parts of a binary, etc.

Is there an MPI API for which the ABI specified by the platform does not suffice? For example, an MPI API where the MPI user passes the library a pointer to a mutex that the application obtained through a non-MPI API, or an API where the application passes the MPI API a pointer to memory that the application allocated but MPI is expected to free, or vice versa?

I don't think there is. MPI is very conservative about what it assumes from the system. Technically, we don't assume all processes can do language-standard I/O (see §9.1.2), and, if nothing else, our Fortran support does not assume C memory management exists.

If not, and the application and the MPI API only pass values specified by the platform ABIs, then saying more than "it is C" would probably not be necessary.

You are probably right, but we should discuss this in detail. I am hoping that our friends from Red Hat and Canonical can provide some expert guidance on platform ABI assumptions.
