-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --json flag to print download information #5400
Conversation
json schema is of course entirely up for argument. I just went with what seemed reasonable. |
The joys of different dep lists between Python 2 and Python 3 🐍 |
fd30e28
to
9c6eb9e
Compare
9c6eb9e
to
ee6b47f
Compare
Ping. Would love to get feedback about this so I can assume or not if it will be accepted as I want to use it internally @ Facebook. So does @philipjameson :) 👏 |
@@ -222,13 +232,39 @@ def run(self, options, args): | |||
resolver.resolve(requirement_set) | |||
|
|||
downloaded = ' '.join([ | |||
req.name for req in requirement_set.successfully_downloaded | |||
req.name | |||
for req in requirement_set.successfully_downloaded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change? I'm not too worried, but it's easier to review if you don't include unneeded cosmetic changes.
if options.json: | ||
details = self._get_download_details( | ||
resolver, requirement_set, options.download_dir) | ||
print(json.dumps(details)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about this - we never use raw "print" statements anywhere else in pip to generate output (see pip list
for a concrete example, which uses logger.info
). This PR should use the same mechanisms as the rest of pip does.
action='store_true', | ||
default=False, | ||
help=("Output information about affected packages " | ||
"in an easily parseable json format"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"affected"? Surely "downloaded", as this is only for pip download
? Also, describing JSON as "easily parseable" is unnecessary...
I've added some review comments. I'm not keen on this PR as a whole - the point you make about "acknowledging that it may not be 100% correct because pip" concerns me, as it's not clear whether the average user would appreciate the limitations. At a minimum, there should probably be some documentation explaining where the data might be wrong and why, so that people know when they can rely on it. I won't block this PR if one of the other pip committers feels it's worth merging, but I probably won't do anything further on it myself. |
Thanks for the comments @pfmoore, and I agree with most of them. I'll leave the cleanup to @philipjameson.
Documentation is a good point. I will work on getting some documentation up once this lands/is accepted. I will describe:
|
I'm perfectly OK with writing to |
As far as the printing to stdout, the reason that I wasn't using logger.info (and indeed added a flag to force info level to go to stderr instead) is because there are a number of other places that we write human readable text at info level (e.g. "Downloaded X (from Y)") that you don't want to end up on the same stream as data you're trying to parse. I'll see if I can find a decent way to setup the logger module to handle that split. |
ee6b47f
to
11703c5
Compare
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
I am a little short on time and I have some concerns here:
I understand that some of these are probably needed here but I don't really have the bandwidth to hash out the details here. |
@pradyunsg - I know you say you do not have time, but it's hard to action your comments to make the PR more acceptable. Can you or someone offer some suggestions. My questions/statements are:
Also, happy to meet in IRC tomorrow to chat this out to see if we can meet everyones concerns! I'll keep it open from 9am pacific time. |
11703c5
to
3aae39b
Compare
@dstufft Any chance you could take a look at this conceptually? Is there someone else that would be best to review this? (sorry, I'm not really familiar w/ pypa's structure). I'd rather not have to maintain a fork of pip if I can avoid it, but the current information that is returned is just not sufficient for the project I'm working on, nor is it for @cooperlees afaik. I'm more than open to getting things fixed up, throwing up whatever red flags / warnings / documentation you all want, etc, but I'm not sure there's anything super actionable here so far, and I don't even have an inkling of whether it'd be likely to be accepted. |
To clarify the organisational position, you need approval from at least one of the @pypa/pip-committers to get a PR merged. Both @pradyunsg and myself are pip committers and we've both expressed reservations. @dstufft is the only other commiter who's active at the moment (we're all volunteers, so other commitments affect everyone's availability). So while pinging @dstufft is the best option for getting another opinion, you're basically hoping he'll be happy enough with this to override the reservations of the other committers... I'll do my best to summarise the concerns raised by @pradyunsg and myself, and try to make them more actionable for you. But that doesn't constitute a commitment to merge is these are addressed - I remain unclear as to the actual real-world situation this is intended to address and why this is the best option (or even why this needs to be in pip and can't be handled by a 3rd party program).
So, for some concrete actions:
I appreciate that's a lot of work, and I'm perfectly OK if you feel that it's more than you're willing to do. There's probably a certain level of compromise that might be reached - if the use case is compelling enough and the risks of this blocking the new resolver work are low enough, we could consider getting something in now and refining it later, for example. But the starting point has to be an attempt to address at least some of the above concerns. |
Thanks @pfmoore for elaborating my comment correctly. :) Additionally, about my first and third points:
|
Thanks both of you for responding. I'll dig through that reply in a bit, but for clarification on the use case I'm working with: I'm looking at representing what pip resolves in a different polyglot build system that we use FB (http://buckbuild.com/). I'd like it so that a person can say "Oh, I want to generate build files for the equivalent of " If there's already a clean interface to do this, I can wrap this up some other way, but I guess it wasn't super clear that any of this resolution information was exposed anywhere. |
At the risk of sounding glib, it seems to me that you're trying to use write something that wants to resolve dependencies the same way as pip does, but rather than writing it yourself you want to reuse pip's logic. While I can see the reason you'd want to do that, it very definitely is something we don't want to support in pip - we don't expose a Python-accessible API for pip precisely for this reason. Some possible alternatives for you:
|
Yes, I absolutely want to reuse pips work because it's a bit mercurial, and I'd have to make sure I mirrored every change to an internal resolver in my code. What's the (rough) timeline on this new resolver? As far as just running the pip commands you mentioned, I don't see any way to determine where the source came from other than trying to parse out 'https' and hoping that's correct. I suppose I could do that and just fail if pip ever decides to change its output. This is necessary because just running 'pip wheel' is /suuuupppper/ non deterministic since it's reaching out to the internet. With the commands you gave, I don't think there's any way to actually replace pip install without rewriting parsing logic to find out actual dependencies. End of the day, I can do a bunch of hacks if I have to. This seems like a cumbersome way to have to use pip, but sure it can be done. |
Yep, and that's why we don't want to expose the internal workings of the current resolver, because then we wouldn't be able to change them.
No idea really. We'd like to get it in in one of the next couple of releases, but volunteer resources make that at best a wild guess. One thing that would help me better understand what you're trying to achieve here, is if you could demonstrate (in pretty-printed form!) the output you expected to see for a project reasonably complex dependencies - something like Jupyter, maybe. I could probably reverse engineer that myself, or apply your PR and look, but I'm not sure when I'll have time to do that. Maybe if I was clearer on what you want to see, it would alleviate some of my concerns. |
I'm a little confused on the resolver comment. If the exposed data to users is limited, I'm not quite sure where the difficulty in changing the backend would be unless the thought is that at some point the resolver wouldn't resolve dependency information or where it got it from? I think I'm missing a piece here. https://gist.github.com/philipjameson/1900c6b4c5b89c5b729ebc648c0bfa51 is an example for jupyter (it'd work the same for --no-binary, but jupyter was being a bit persnickety with one of its deps not working with setuptools properly o_O) |
OK, so:
You may have to duplicate some of the work pip did (querying PyPI, extracting metadata) but I don't see anything there that you can't get by introspecting the directory you downloaded the files to (assuming that directory was empty beforehand). Which doesn't mean that there's no possibility that it could be added to pip, just that the bar is higher if it's stuff you can get yourself already. And I'm still not clear what you'll be doing with the downloaded files once you get that information. You're using the JSON to feed into a system that seems to re-download everything, so are you just throwing away those downloads? If so, then making this an option of the doanload command seems pretty inappropriate... |
Nope, but it just seemed kind of handy if we're downloading anyways. Not super necessary, but it also reduces the need for globbing and making sure that I'm rewriting filenames correctly (e.g. jupyter-core gets rewritten to jupyter_core-.). I think @cooperlees also mentioned that he had run into some cases where the filenames weren't as expected.
No, but it reduces extra lookups and indirection. I hadn't seen the json API, so maybe this isn't quite as necessary.
Is this guaranteed to always be there for sdists? I could have sworn I had run into a case where it wasn't right, but I may be misremembering or running into the naming issue above.
This is kind of the big one for me. Other than the fact that it seems like there are several different ways to find the dependencies (looks like setup.py, setup.cfg as well as requrements files? I'll have to go dig in the pip code some more), it also means that I have to go write a dependency resolver when a perfectly good one already exists (and a better one will exist at some non-determined state in the future :) ). As far as downloading the files, part of it was to verify hashes and potentially do some introspection on them (e.g. finding binaries and exposing them as binary build rules within the build system), but it was mostly to force the dependency resolver into action. Honestly a |
Just setup.py egg_info and directly from wheel metadata.
If I'm remembering correctly, pip doesn't really try to guess the version of an sdist from the file name - it runs setup.py egg_info to get it. There's an open issue for this somewhere in this tracker that suggests it should. |
Paul makes a good point - pypi's json api should be a good way to get almost all this information. The only things that'll be missing is the dependency information for sdists, which can be computed by executing setup.py egg_info. |
While some of this stuff, yes, can be determined from the json api, the important part, the dependency resolver, is not available anywhere afaict. Parsing out files and writing this myself, rather than just having the thing that ran the resolver tell me what the right answer is seems like extra work that shouldn't really have multiple implementations, unless there's some other file that's not in the .egg-info directory that I'm missing. So right now the only way to do this, so far as I can tell, is (and I really hope I'm missing something):
Is that roughly the suggested pattern? |
Cross Referencing: #53. |
Stop after "Pull metadata for urls from the pypi json api", and use the content of the directory, which will be one wheel (or sdist, if you use You can then build the dependency tree by using the dependency data from the projects to decide what projects depend on each other, and the versions you have extracted to determine which precise versions pip decided to use. Note: I believe there have also been requests for a |
I think that's the issue I linked to above -- it doesn't directly mention |
@pradyunsg Thanks, I'd seen the discussions on that issue had moved on to |
Yep -- I could find #1631 (comment) and #924 (comment). |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
Closing this since it's bitrotten. Please file a new PR if there's still interest in this. |
This adds a --json flag to pip download. This prints out some basic information about packages resolved by pip download (acknowledging that it may not be 100% correct because pip) such as name, version, url and dependencies.
Also added a --log-stderr to force info messages to go to stderr so that machine readable output is not clobbered by human readable text.
Added and updated unit tests to make sure it worked. Sample run through jq here: https://gist.github.com/philipjameson/88580962e0fed1ad9ec75f1733c0a625
Fixes #5398