Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switched to static linking of llvm #100

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

timostrunk
Copy link

@timostrunk timostrunk commented Feb 13, 2025

I switched to static linking of libllvm here and added it to the ignore_run_exports on unix. I did not change the build behaviour on Windows as it seems to already build with default settings there and I have no means to test it.

This fixes #99 and #84.

Reasons against this PR:

  • Obviously this will increase binary size, because libllvm is now statically linked. Im explicitly tagging @xhochy here, because he was against static linking this.
  • If the symbol version issue would be identified and resolved it would be cleaner

Reasons to merge this PR:

  • It resolves a hard to debug issue, which occurs using libraries, where debugging skill is required to find out, why a segmentation fault actually happened
  • It can easily be reverted, once the symbol isolation is gone.

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@timostrunk
Copy link
Author

@conda-forge-admin, please rerender

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@timostrunk
Copy link
Author

ppc64le still broken, will look into it

@timostrunk
Copy link
Author

ppc64le still broken, will look into it

build works locally on cross-compile and goes further than action runner. Test fail (as expected) due to invalid binary format.

@timostrunk timostrunk marked this pull request as draft February 14, 2025 10:57
@timostrunk
Copy link
Author

I will try to dynamic link on ppc64le still. I think the native build has issues. The local crossbuild on my machine works though, but I don't want to sacrifice the tests, which I would need to do, if I let the github runner build on linux64.

Reverted the PR to draft, because there will be some noise.

@timostrunk
Copy link
Author

@conda-forge-admin, please rerender

@conda-forge-admin
Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13329229741. Examine the logs at this URL for more detail.

@timostrunk timostrunk marked this pull request as ready for review February 14, 2025 13:09
@timostrunk
Copy link
Author

@jakirkham : This is now ready for review. ppc64le still builds shared, because I was unable to static link on a native ppc64le platform.

Copy link
Member

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already discussed at length previously ( #73 ), will not be accepting a change to static linking

Think we need to find a better way to address the issue users are encountering. One avenue may be changing the linker flags (stripping, symbol visibility, etc.)

@timostrunk
Copy link
Author

My analysis showed that the cause of the issue might not be in llvmlite, so changing the flags here won't change much. The issue is that libllvm15 links into libllvm19, so I presume the right place for a fix is there. If my analysis is right libllvm15 should be in conflict with libllvm19, as it is not safe to have both in the same environment.

Until this fix is found it is desastrous for the conda-forge ecosystem not to provide a workaround. Currently not merging this means that people using pytorch and numba in the same conda environment either

  1. face segmentation faults
  2. need to downpin pytorch and keep the pin until llvmlite gets libllvm19 compatibility at which point this starts working again 'by chance' and will most probably break again with the next libllvm release.

numba and pytorch is a very common combination. Therefore: Please reconsider merging this. It can be reverted once the issue is solved.

@timostrunk
Copy link
Author

Also in #73 you write:
"Users have been working with Numba on CUDA for some time without issues and this uses the dynamic linking solution we currently have

Not following the argument for static linking"

The reason this is working is only because numba and pytorch-cuda required the same libllvm version at this precise time. When libllvm14 was used by llvmlite and libllvm15 was used by pytorch-cuda, it broke. Now the same thing happens with libllvm15 and 19.

@h-vetinari
Copy link
Member

As already discussed at length previously ( #73 ), will not be accepting a change to static linking

Pity you did not respond on the thread I opened in the core channel. I will bring this to a vote within core.

@jakirkham
Copy link
Member

If this is indeed an LLVM issue, let's file on that feedstock. If there was already an issue filed and it was closed prematurely, am happy to reopen (provided a link)

@timostrunk
Copy link
Author

This is the issue I opened on the llvm-dev feedstock: conda-forge/llvmdev-feedstock#312

@beckermr
Copy link
Member

I am happy to take on the maintenance burden of pushing updates to the llvm versions for static linking if that helps unblock this.

@beckermr
Copy link
Member

beckermr commented Feb 25, 2025

OK friends. I have another idea on how to ease the maintenance burden of static linking so we can unblock this PR.

I have added a new feature to the bot in this PR: regro/cf-scripts#3755.

It allows the bot to update static libs according to an abstract spec as follows.

In the extra section you add this bit of yaml

extra:
  static_linking_host_requirements:
    - llvmdev 15.*
    - llvm 15.*

This specifies the abstract requirements for the static library you want in host.

Then in your recipe, you list the exact packages you care about like this

requirements:
  host:
    - python
    - setuptools
    - llvmdev 15.0.7 h2621b3d_4  # [osx and arm64]
    - llvm 15.0.7 h4a7a88c_4     # [osx and arm64]
    - llvmdev 15.0.7 hbedff68_4  # [osx and x86]
    - llvm 15.0.7 hed0f868_4     # [osx and x86]

The bot then does the following computation:

  1. For each abstract static lib host requirement in the extra section
  • Get the latest version for all platforms the feedstock builds
  • If the latest versions+build numbers across all platforms differ, bail on whole update.
  1. Extract the specs of packages in host in the current recipe that
  • match the abstract host requirements in the recipe per platform
  • are exact specs
  1. If any versions and/or build numbers at equal version have increased when comparing 1 and 2
  • update the static libs in the recipe
  • issue a PR

The result of this is an update to the host section like this:

requirements:
  host:
    - python
    - setuptools
    - llvmdev 15.0.7 h4429f82_5  # [osx and arm64]
    - llvm 15.0.7 h0cf516b_5     # [osx and arm64]
    - llvmdev 15.0.7 hc29ff6c_5  # [osx and x86]
    - llvm 15.0.7 hb21d583_5     # [osx and x86]
    - zlib
    - vs2015_runtime  # [win]

The bot stores the new static lib versions it used to update the feedstock as part of the PR info it has. This should prevent it from issuing duplicate PRs.

Further, by bailing if not all of the versions+build numbers of the new static libs match, we should prevent PRs being issued in the middle of a build of the static libs on the backend.

Finally, by restricting the search for updates to the static libs to the abstract specs in extra, the bot will only issue an update for increases in minor+patch versions and/or build numbers in the say 15.* series for llvm, etc.

If this is of interest to you all, let me know and I can finish up the bot PR, make a PR to this feedstock, and we can start trying it out.

cc @isuruf @h-vetinari @jakirkham @conda-forge/llvmlite

@h-vetinari
Copy link
Member

Thank you for trying to find a solution on this @beckermr! I'm fine with whatever setup that lets us get rid of the segfaults. To me the static_linking_host_requirements: machinery sounds a bit like overkill, but if it addresses the maintenance concerns here, then I'm happy to go in that direction.

@isuruf
Copy link
Member

isuruf commented Feb 26, 2025

Let's move it to bot section of conda-forge.yml instead of keeping it in meta.yaml.

Can we also use something like

    - llvmdev 15.0.7 *_5
    - llvm 15.0.7 *_5

instead to make it human updateable too?

@beckermr
Copy link
Member

I'm not in favor of moving it to the bot section because then the recipe is not self-contained.

Happy to make the changes on the hash.

@isuruf
Copy link
Member

isuruf commented Feb 26, 2025

I'm not in favor of moving it to the bot section because then the recipe is not self-contained.

What do you mean by this?

@beckermr
Copy link
Member

I mean when people look at the recipe, they won't see any hints/info on what exactly is going on in host if the abstract requirements are in a entirely separate file. At least by having the abstract requirements in the extra section of the same file, there is a clue that something different is going on. Also, that section gets imbedded in the final artifact which has additional advantages.

@isuruf
Copy link
Member

isuruf commented Feb 26, 2025

You'd already have

host:
    - llvmdev 15.0.7 *_5
    - llvm 15.0.7 *_5

which tells anyone looking at the recipe what is going on.
The bot would come and bump the value from *_5 to *_6.

@beckermr
Copy link
Member

beckermr commented Feb 26, 2025

No it does not. There is missing information as follows.

The spec llvmdev 15.0.7 *_5 could be the result of many other specs one might list in static_linking_host_requirements, eg

  • llvdev <17
  • llvmdev >12,<18
  • llvmdev 15.0.*

The contents in static_linking_host_requirements are what would be in the recipe if we were to not exactly pin the static host requirement and instead let conda solve for it. It thus belongs in the recipe.

@beckermr
Copy link
Member

Globs in the hash part of the build string are now working. The bot will use a glob if the recipe previously had one. Otherwise it will use the exact pin.

@isuruf
Copy link
Member

isuruf commented Feb 26, 2025

We don't even have the pinnings from conda-forge-pinning in the recipe, so I don't see why we need to have it in the recipe explicitly here.

@isuruf
Copy link
Member

isuruf commented Feb 26, 2025

We could also have

host:
- llvmdev {{ llvm_15_static }}

and have a migrator. This should make it work just like other migrators.

@beckermr
Copy link
Member

We don't even have the pinnings from conda-forge-pinning in the recipe, so I don't see why we need to have it in the recipe explicitly here.

It goes beyond this.

With only the semi-exact pin (e.g., llvm 15.0.7 *_6), we don't know which updates are allowed and which ones are not. Can we update llvm 15.0.7 *_6 to llvm 19.0.0 <build>? Can we update llvm 15.0.7 *_6 to llvm 15.1.0 <build>?

Assuming only the major version matters works for this feedstock, but won't work in general.

have a migrator

The migrator idea suffers from the same issue. The pinnings would need to list the exact package spec (eg llvm 15.0.7 *_6) on a given platform. In order for the bot to issue a PR to the pinnings repo, it needs to again know what updates are allowed to this exact pin.

For ABI migrations, the bot checks if the latest version of the package is outside the version range of the current pin. That won't work here since we want updates within a specific version series.

So TL;DR; without something specifying that we want the latest package within llvm 15.*, we'd have to make an assumption (i.e., only update within the major version) or allow any updates (e.g., move to the latest always). That last thing would not work for this feedstock since llvmlite needs llvm 15, llvmdev 15 etc.

@beckermr
Copy link
Member

Here is another idea for the API. We put a key in the conda-forge.yml like this

bot:
  update_static_libs: true

and then in host we list both the "abstract" and "pinned" requirement

requirements:
  host:
    - llvm 15.0.*               # "abstract" requirement since no build string
    - llvm 15.0.* blah_*        # "abstract" requirement since no build number
    - llvm 15.0.7 blah_*_5      # current pin with no hash
    - llvm 15.0.7 blah_h4541_5  # or a current pin w/ an exact hash

The bot would know the difference between the "abstract" versus "pinned" requirement by the presence or absence of a full build string and/or a build number at the end of the build string.

The trade-off here is between being explicit versus laundering information in the extra section. Neither is perfect, but we might prefer one over the other.

@isuruf
Copy link
Member

isuruf commented Feb 26, 2025

Here is another idea for the API

I like this idea.

@beckermr
Copy link
Member

Ok. The bot is ready, pending review and merge of the PR.

Shall we proceed?

@beckermr
Copy link
Member

@conda-forge-admin rerender

@beckermr
Copy link
Member

@conda-forge-admin rerender

@conda-forge-admin
Copy link
Contributor

conda-forge-admin commented Feb 27, 2025

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • ❌ In conda-forge.yml: $.bot = {'update_static_libs': True, 'abi_migration_branches': ['rc']}.

    {'update_static_libs': True, 'abi_migration_branches': ['rc']} is not valid under any of the given schemas

    Schema
    {
      "anyOf": [
        {
          "$ref": "#/$defs/BotConfig"
        },
        {
          "type": "null"
        }
      ]
    }

This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13569596802. Examine the logs at this URL for more detail.

@beckermr
Copy link
Member

@isuruf ppc64le is not statically linked in these builds due to some error @timostrunk had when they tried above.

@beckermr
Copy link
Member

PR to fix the linter: conda-forge/conda-smithy#2253

@beckermr
Copy link
Member

OK. This one is all green or will be soon. I do not want to merge when another maintainer has requested changes.

@jakirkham Can you look at what we've done here and reconsider your review possibly? We've enabled fully automatic updates via the bot and I've added myself to the feedstock to manage things around that as I expect we'll encounter a few bugs along the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linking libllvm-19 before using libllvmlite (currently requiring libllvm-15) leads to segmentation faults.
6 participants