-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential issue with Flux 0.58 or the 0.57 python bindings installed with pip #9
Comments
I think for 0.58.0 flux you'd need 0.58.0 bindings right? Would you like me to build them? |
Also if I remember, the latest that maestro has is pretty old (0.49.0) so that would be something to update too (I'm working on mini-mummi that uses maestro under the hood, although not flux anymore)! https://github.com/LLNL/maestrowf/tree/develop/maestrowf/interfaces/script/_flux |
okay give 0.58.0 a shot. https://pypi.org/project/flux-python/ You can sanity check what flux sees with |
hmph, still get the same issue with the 0.58 bindings. And to be clear, I'm not doing this with maestro, rather just installing the binding and dropping into a python shell and attempting to import flux and getting that error. On maestro's adapter though, that's not really tightly coupled to flux's version. Mostly just tracking breaking changes with it like api tweaks and status codes/abbreviations that leak through into the python bits. Will see if that needs updating after I can verify the binding install on it's own. |
Can you please run flux version and see what it shows? I suspect your setup is wonky. |
Apologies, the vnc systems just booted me off for some updates, so will have to login in later this morning to get the flux version and env outputs for you. But, this isn't so much a particular environment issue as a system issue I think. This is using the system installs on the EAS machines that are flux native and getting python environments to talk to it, not bootstrapping flux brokers myself inside some other scheduler. The issue occurs on both rzvernal and tioga. I've been using python 3.10, but can retry with 3.9 in case that makes any difference. EDIT: forgot one potentially important detail. These python environments are all spawned off of the system python installs available via the module system (which has 3.9 and 3.10 available). |
The way to debug this is to absolutely check the .so that is being paired with the python bindings, and be absolutely sure they are in sync. That error message is the python cffi libraries saying "I don't know what that function is called" and my suspicion is that you still have a version mismatch. The next thing to debug would be Python version and cffi version, which could lead to something similar I think. |
Ok, here's the info: Flux version:
flux env
for the actual .so, how can i find that? (really, it looks like these python bindings are just failing to find that? -> exporting the above PYTHONPATH manually gets everything working again, though that definitely bypasses what the flux-python package is supposed to be doing so maybe a red-herring there..) |
You said you were using a Python 3.8, but in the above I see 3.6 is first on |
No, I've been using 3.10 the whole time. That Note, that python path is only known by the flux install. Unless I manually add it to get around this import bug that env var doesn't exist in my environment. Checked one of my old 3.9 env's with flux 0.51 bindings installed and that one doesn't export any pythonpath into my env either (looks like that's expected with virtualenv simply prepending things to $PATH instead). |
Would it be helpful to know that
Yeah, since Flux is a system package, we have to use the system Python by default, which on RHEL 8 for better or worse is Python 3.6. We do provide RPM subpackages for 3.8 and 3.9 since those exist as alternatives in RHEL 8, but there is no |
@jwhite242 that's why I think you have a mismatch of versions here. The error is telling us that it's expecting a function to exist that does not. If you had the correctly matched .so and flux-python you would not see that. |
Is that not the solution? flux-python can't control how your environment is setup, including PYTHONPATH and flux versions. Is there something specifically you are opening this issue to ask for flux-python to do? |
@grondo I would say it would be helpful to know that that was added as i often don't notic. But for an additional data point, these bindings have actually been working just fine for multiple versions of these standalone bindings up to the 0.58 that looks to have been installed last week, which is interesting. Forget when I switched to this package, maybe 0.51? This bug does happen on a fresh install in a newly created virtual environment, so the bindings shouldn't be suffering from just being old i think; just redid all of that from scratch this morning as well, on both 3.9 and 3.10 envs. @vsoch so, we used to do it that way, but these lovely pip installable bindings had removed the need for that up until last week. (seriously, have been loving being able to just pip install with these and having them just work, so thanks for making this standalone package) So, poking around in the flux-python setup.py, is the lib vs lib64 pathing an issue here (honestly can't remember if flux has been in usr/lib64 this whole time or not..)? The find_flux func in the pip installed bindings suggest it's returning /usr/ from the flux installed in /usr/bin/flux. then the custom_search is adding lib, not lib64 when it's looking for things. Also, when this package is installing is it actually compilng things under the covers and pip is just hiding all that work? and if so, is there maybe some default compilers in the way and messing things up? |
Well I do notice that the flux-python copy of Lines 1 to 10 in 3d413b9
I don't think this explains why things didn't break until now though, but maybe if we update these copies from flux-core things will work again? |
This issue probably needs to be transferred to flux-core - any changes that are needed can be made there and trickle down here. |
I meant to say the local copy of |
One other data point. Tried out one of the other native machines: corona, with flux 0.57 fails the same way. So definitely not isolated to 0.58 and that was likely just timing coincidence that i ran into the issue this past week on that version. |
This is a flux-python issue. The copy of |
Flux python builds directly from the code there, the only thing kept here are the wrappers around that. |
Thanks! Where is the stuff in src coming from? |
And I admit I'm confused, the |
I missed this comment from a while back: #9 (comment) Is that file explicitly in flux-python here, and we need to update it? Is there a way to programmatically determine what should be there? |
I'm not sure as I have no idea how flux-python works. The files here: https://github.com/flux-framework/flux-python/tree/main/src Seem to have been copied from flux-core at some point (definitely before v0.53.0). Whenever they are changed in flux-core, I would think they would need to be updated here. Some of the files are generated during the build, but |
Can you point me to callbacks.h? We likely just need to copy it from flux-core upstream (and I didn't know that). |
Ah it's just part of the python dist: https://github.com/flux-framework/flux-core/blob/0a8e91a532566ff8b3d6e6c53570e6cf86e1a338/src/bindings/python/_flux/callbacks.h#L4 |
I think what needs to happen is that if anything changes in that directory, someone pings me because it warrants an update here. The assumption for the builds here is that the contents of that directory don't change. |
okay I opened a PR to update it - what I can do after that is a release candidate / patch version for the current flux python for someone to test. #10 |
@jwhite242 when that gets merged and I build you bindings to test, do you have preference for a version? |
Awesome, thanks for the fix! For the versions, could a couple be built to blow away what's on pypi? I've had users reporting this on machines with 0.58 and 0.59 installed (0.58 being the toss4_cray variants for the flux native systems, and 0.59 on all the other slurm managed machines). Looks like corona got updated too, so 0.57 seems a non-issue now? at least i'm not aware of any other instances of it. |
Yes but please first test https://pypi.org/project/flux-python/0.59.0rc0/. |
sounds good. will test that out shortly |
alright, looks like that works! import succeeded, and was able to use the bindings to run some jobs |
okay I think what I can do is rebuild a final release for 0.59.0, and then for 58 just do a 0.58.1 |
sounds great. thanks so much! |
yep! Just kicked off the builds. |
okay these should be good to go - https://pypi.org/project/flux-python/#history. Let me know if there are any other issues or if we are good to close here. |
Ran into some import issues with the latest flux 0.58 install (the public systems that are flux native at llnl), and the 0.57 python bindings from pypi: it seems the python bindings install isn't quite finding the right things? Importing flux fails with a missing function in the c-python bindings (stack trace below, but doesn't seem like that function's a particular problme, just the first to get hit):
Manually pulling the PYTHONPATH from
flux env
and setting it in the current shell enables things to work again, including submitting/querying jobs from python.edit: fix code snippet..
The text was updated successfully, but these errors were encountered: