support for [optionally] using hf_transfer to download model #151
Replies: 6 comments 1 reply
-
We currently use `snapshot_download` to grab the files off HF. As long as hf_transfer supports the same call, i.e. `snapshot_download(model_name_or_path, allow_patterns="*.json", cache_dir=cache_dir, tqdm_class=Disabledtqdm)`, with the allowed/disallowed patterns, the cache directory, and the tqdm behaviour all configurable, it'd be a drop-in replacement. I'll add it to the backlog. Ideally we'd use aria2, similar to KoboldAI. I also want to use their lazy tensor loader, so maybe we can pair the two in one implementation.
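For reference, here's roughly what that call looks like in context. This is a minimal sketch, assuming `Disabledtqdm` is a local `tqdm` subclass that suppresses the progress bar; the model id and cache dir below are placeholders:

```python
from huggingface_hub import snapshot_download
from tqdm import tqdm


class Disabledtqdm(tqdm):
    """A tqdm subclass that never renders a progress bar."""

    def __init__(self, *args, **kwargs):
        kwargs["disable"] = True
        super().__init__(*args, **kwargs)


folder = snapshot_download(
    "facebook/opt-125m",        # placeholder model id
    allow_patterns="*.json",    # only fetch the JSON config/tokenizer files
    cache_dir="/tmp/hf-cache",  # placeholder cache directory
    tqdm_class=Disabledtqdm,
)
```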
-
Yeah, that lazy loader is sweet. I used KoboldAI exclusively for so long that using anything else annoys me in that respect, heh. What is it actually doing differently, though? My guess is that "lazy" refers to something like "start the next thing before the previous thing has finished, but keep track of everything so nothing breaks", e.g. a "lazy unmount" is "unmount now and deal with the vfs consequences afterwards without breaking anything". With KoboldAI, there's this hook in KAI's `patches.py`: "Patch the Transformers loader to use aria2 and our shard tracking."

If I set `HF_HUB_ENABLE_HF_TRANSFER=1`, I surmise that this hook is never called because hf_transfer is used instead. I noticed this when I had that variable set, went to download a model with KoboldAI, and, lo and behold, it was using hf_transfer. No progress bar, of course, but, whatever, it downloaded fine. I'm going to drop henk717 a note letting him know this works, as it might not have occurred to him. Plus, if people have hf_transfer enabled, they might not figure out that a download is actually still in progress, since it presents no feedback, heh.

hf_transfer is really fast, but I'm not sure it's actually faster than aria2c (it seemed like it might be?). That tool has been around for a zillion years and is my go-to (well, that or lftp) for downloading anything. Undoubtedly aria2c is more configurable, not to mention that it has the RPC interface and all that. If you use a KAI-style patch, you could support both, of course; something like the sketch below. In fact, I think that's probably the way to go, because hf_transfer has no provisions for e.g. proxy support, which you may want.

I'm now a big fan of this project. When I first skimmed the vllm documentation, I wondered why no one had done something like this with it, haha, so I looked and found it.
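As a rough illustration of the "support both" idea, here's a sketch. This is not KoboldAI's actual patch, and the truthy-value check is simplified relative to huggingface_hub's own env var parsing:

```python
import os
import shutil


def pick_download_backend() -> str:
    """Choose a download backend: hf_transfer if enabled, else aria2c,
    else huggingface_hub's default HTTP path."""
    if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER", "").upper() in ("1", "TRUE", "YES", "ON"):
        return "hf_transfer"  # huggingface_hub routes transfers through it
    if shutil.which("aria2c") is not None:
        return "aria2c"       # patch the loader to shell out, KoboldAI-style
    return "default"          # plain requests-based download
```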
-
If hf_transfer supports revisions (i.e. branches), then I'd be more than happy to switch. We can probably emulate a download progress bar if hf_transfer doesn't provide one.
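Something like this could fake the feedback: a sketch that polls the partially written file's on-disk size to drive a tqdm bar while an opaque downloader runs. It assumes the total size is known up front (e.g. from a HEAD request's Content-Length):

```python
import os
import threading
import time

from tqdm import tqdm


def watch_progress(path: str, total_bytes: int, done: threading.Event) -> None:
    """Drive a progress bar by polling the file's size on disk."""
    with tqdm(total=total_bytes, unit="B", unit_scale=True) as bar:
        while not done.is_set():
            bar.n = os.path.getsize(path) if os.path.exists(path) else 0
            bar.refresh()
            time.sleep(0.5)
        bar.n = total_bytes
        bar.refresh()


# usage: spawn the watcher, run the download, then signal completion
# done = threading.Event()
# t = threading.Thread(target=watch_progress, args=(dest, size, done))
# t.start(); run_download(); done.set(); t.join()
```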
-
It does do that, because the revision handling lives in huggingface_hub, not in hf_transfer itself. Near as I can tell, the only major caveat is that it doesn't support resuming interrupted downloads. There is a discussion of the advantages and pitfalls here: Overall, I don't think there's really a downside for this use case. I'm not sure it's any better than using aria2c, and aria2c is certainly more featureful; it's just a much lighter dependency and doesn't require patching anything, etc.
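To be concrete: the revision is resolved by huggingface_hub before any bytes move, so this works the same with or without hf_transfer enabled (the repo id and branch name below are made up):

```python
from huggingface_hub import snapshot_download

folder = snapshot_download(
    "someorg/some-model",    # placeholder repo id
    revision="4bit-branch",  # branch name, tag, or commit hash
)
```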
-
@BlairSadewitz I finally got around to trying this out, and it seems like we support it out of the box. Just install hf_transfer and set `export HF_HUB_ENABLE_HF_TRANSFER=1`, and the downloads will use hf_transfer. However, it breaks a lot of functionality, like filtering the download formats (e.g. it downloads both the safetensors and the pytorch bins).
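One gotcha worth noting: huggingface_hub reads the variable into its constants at import time, so if you want to flip it from Python rather than the shell, set it before the import. A minimal sketch:

```python
import os

# must be set before huggingface_hub is imported, since the flag is
# read into huggingface_hub's constants at import time
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

snapshot_download("facebook/opt-125m")  # placeholder model id
```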
-
Oh, yeah, I figured that out a couple of days ago. I simply never thought to actually try it, heh. It works with KoboldAI, too (if I disable the aria2c hook). I didn't know you couldn't filter downloads with it, but I hadn't tried. Also, it doesn't seem to handle lower-speed downloads particularly well, though that could just be my impression based on incidental circumstances. Maybe it would be a good idea, then, to add the aria2c hook like KoboldAI does?
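For what it's worth, the aria2c invocation itself is simple enough. A hypothetical wrapper (the flags and paths here are illustrative, not KoboldAI's exact ones):

```python
import subprocess


def aria2c_download(url: str, out_dir: str, filename: str) -> None:
    """Fetch one file with aria2c using multiple connections/segments."""
    subprocess.run(
        [
            "aria2c",
            "-x", "16",   # up to 16 connections per server
            "-s", "16",   # split the download into 16 segments
            "--dir", out_dir,
            "--out", filename,
            url,
        ],
        check=True,
    )
```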
-
This is way faster than anything else I've used on machines with enough bandwidth (except perhaps aria2c with the appropriate options); so much faster that I'm kinda at a loss to explain it, actually, but I'm positive it is. My impression (which could be mistaken) is that it only makes a huge difference when there's at least, oh, ~500 MB/s of bandwidth (a very rough guesstimate), but that's becoming increasingly common, I think.
Now, I don't really know Python, but I took a glance at the code and can tell that you're doing more than just pulling files, so I don't know if this would be worth it for you. It would be convenient, though, not to have to use huggingface-cli separately. Thanks.