A fast and reliable downloader for Hugging Face models and datasets with intelligent optimization features that adapt to your system capabilities, network conditions, and specific needs.
HFDL incorporates several intelligent systems that work together to optimize your download experience:
- What it does: Automatically determines the optimal number of threads based on your CPU cores
- How it works:
  - 1-2 CPU cores: Allocates 2 threads
  - 3-8 cores: Uses a number of threads equal to the core count
  - More than 8 cores: Caps at 8 threads to prevent overloading
- Why it's smart: Balances performance and resource usage without manual tuning
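The tiers above can be sketched in a few lines. This is an illustration of the rule as described, not HFDL's actual code; the function name `pick_thread_count` and the two constants are hypothetical.

```python
import os

MIN_THREADS = 2  # floor used for 1-2 core machines (from the tiers above)
MAX_THREADS = 8  # cap used for machines with more than 8 cores


def pick_thread_count(cpu_cores=None):
    """Return a thread count following the three tiers described above."""
    if cpu_cores is None:
        # os.cpu_count() may return None when the count cannot be determined
        cpu_cores = os.cpu_count() or 1
    if cpu_cores <= 2:
        return MIN_THREADS
    return min(cpu_cores, MAX_THREADS)


if __name__ == "__main__":
    for cores in (1, 4, 16):
        print(cores, "cores ->", pick_thread_count(cores), "threads")
```

Passing `--threads NUM` on the command line overrides this automatic choice.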
- What it does: Classifies files as "small" or "big" based on a configurable threshold (default: 100 MB)
- How it works:
  - Small files: Downloaded quickly, often in parallel
  - Big files: Handled with bandwidth control for efficient resource allocation
- Why it's smart: Optimizes download strategy based on file characteristics
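A minimal sketch of the size-based split, assuming each file is known by path and size in bytes. The `RemoteFile` type and `split_by_size` function are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption; HFDL may count megabytes differently.

```python
from dataclasses import dataclass


@dataclass
class RemoteFile:
    """Hypothetical record for a file listed in a repository."""
    path: str
    size_bytes: int


def split_by_size(files, size_threshold_mb=100):
    """Split files into (small, big); files larger than the threshold are big."""
    threshold = size_threshold_mb * 1024 * 1024  # assumption: binary megabytes
    small = [f for f in files if f.size_bytes <= threshold]
    big = [f for f in files if f.size_bytes > threshold]
    return small, big


if __name__ == "__main__":
    listing = [
        RemoteFile("config.json", 2_000),
        RemoteFile("model.gguf", 4_000_000_000),
    ]
    small, big = split_by_size(listing)
    print([f.path for f in small], [f.path for f in big])
```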
- What it does: Measures your download speed and limits usage to a percentage (default: 95%)
- How it works:
  - Measures initial speed with a sample file
  - Allocates bandwidth across threads for large files
  - Introduces micro-delays to maintain speed limits when needed
- Why it's smart: Prevents network saturation while maximizing throughput
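One way to picture the allocation and micro-delay steps is the sketch below. It assumes an even split of the allowed bandwidth across threads and a simple "sleep when ahead of schedule" throttle. Both are illustrative choices, and the names `per_thread_limit` and `Throttle` are hypothetical rather than part of HFDL.

```python
import time


def per_thread_limit(measured_bps, bandwidth_percentage=95, threads=4):
    """Bytes per second each thread may use, assuming an even split."""
    return measured_bps * (bandwidth_percentage / 100) / threads


class Throttle:
    """Keeps one stream at or below limit_bps by sleeping between chunks."""

    def __init__(self, limit_bps, clock=time.monotonic, sleep=time.sleep):
        self.limit_bps = limit_bps
        self._clock = clock
        self._sleep = sleep
        self._start = clock()
        self._sent = 0

    def consume(self, nbytes):
        """Record nbytes transferred; sleep if ahead of the limit. Returns the delay."""
        self._sent += nbytes
        expected = self._sent / self.limit_bps  # time the transfer should have taken
        elapsed = self._clock() - self._start
        delay = max(0.0, expected - elapsed)
        if delay > 0:
            self._sleep(delay)
        return delay
```

A download loop would call `consume(len(chunk))` after writing each chunk. The injectable `clock` and `sleep` arguments exist only to make the sketch easy to test.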
- What it does: Ensures downloads can be safely interrupted without corrupted files
- How it works:
  - Uses a dedicated thread for interrupt signals on multi-core systems
  - Implements clean shutdown procedures for all resources
- Why it's smart: Provides reliability and responsiveness during long downloads
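The sketch below shows one common pattern for interruption-safe writes. It is a simplification and does not reproduce the dedicated signal thread mentioned above: a signal handler sets a shared `threading.Event`, workers check it between chunks, and data goes to a temporary `.part` file that is renamed only on completion. The `.part` suffix and all function names are assumptions for illustration.

```python
import os
import signal
import threading

stop_requested = threading.Event()


def install_interrupt_handler():
    """Set the stop flag on Ctrl+C. Must be called from the main thread."""
    signal.signal(signal.SIGINT, lambda signum, frame: stop_requested.set())


def write_chunks(chunks, dest_path):
    """Write chunks to dest_path + '.part'; rename to dest_path only when done.

    Returns True on completion, False if interrupted. dest_path itself is
    never left half-written.
    """
    part_path = dest_path + ".part"
    with open(part_path, "wb") as fh:
        for chunk in chunks:
            if stop_requested.is_set():
                return False  # the partial data stays in the .part file
            fh.write(chunk)
    os.replace(part_path, dest_path)  # replaces the target in a single step
    return True
```

Because the final name appears only after `os.replace`, a file at `dest_path` can be treated as complete.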
- What it does: Anticipates and manages a wide range of potential errors
- How it works:
  - Implements custom exception hierarchy for precise error handling
  - Provides fallback mechanisms and recovery strategies
- Why it's smart: Maintains operation even under adverse conditions
- What it does: Monitors and displays download progress at both file and overall levels
- How it works:
  - Tracks bytes downloaded for each file
  - Aggregates progress across all files for overall completion percentage
- Why it's smart: Provides real-time feedback with thread-safe accuracy
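As an illustration of thread-safe tracking at both levels, the sketch below guards per-file byte counters with a single lock. `ProgressTracker` is a hypothetical class, not HFDL's internal tracker.

```python
import threading


class ProgressTracker:
    """Tracks bytes per file and overall completion behind one lock."""

    def __init__(self, file_sizes):
        self._sizes = dict(file_sizes)              # name -> total bytes
        self._done = {name: 0 for name in self._sizes}
        self._lock = threading.Lock()

    def add(self, name, nbytes):
        """Record nbytes downloaded for one file (safe to call from any thread)."""
        with self._lock:
            self._done[name] = min(self._done[name] + nbytes, self._sizes[name])

    def file_percent(self, name):
        with self._lock:
            size = self._sizes[name]
            return 100.0 if size == 0 else 100.0 * self._done[name] / size

    def overall_percent(self):
        with self._lock:
            total = sum(self._sizes.values())
            return 100.0 if total == 0 else 100.0 * sum(self._done.values()) / total
```

Each download thread calls `add()` after writing a chunk, while a display loop polls `overall_percent()`.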
pip install hfdl
Or install from source:
git clone https://github.com/MubarakHAlketbi/hfdl.git
cd hfdl
pip install -e .
from hfdl import HFDownloader
# Basic usage
downloader = HFDownloader("MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF")
downloader.download()
# Enhanced mode with custom settings
downloader = HFDownloader(
    "Anthropic/hh-rlhf",
    repo_type="dataset",
    enhanced_mode=True,
    size_threshold_mb=100,
    bandwidth_percentage=95,
)
downloader.download()
The CLI has been reorganized for better usability, with options grouped into Basic, Advanced, and Output categories.
If you run hfdl without arguments, it will enter interactive mode and guide you through the process:
# Start interactive mode
hfdl
# Basic usage
hfdl MaziyarPanahi/Qwen2.5-7B-Instruct-GGUF
# Advanced mode with optimized downloading
hfdl Anthropic/hh-rlhf --optimize-download
# Custom threads and directory
hfdl Anthropic/hh-rlhf --threads 4 --directory ./models
# Test what would be downloaded without downloading
hfdl Anthropic/hh-rlhf --dry-run
Basic Options:
  -d, --directory DIR     Directory where files will be saved
  -r, --repo-type TYPE    Type of repository (model/dataset/space)
  --verify                Verify integrity of downloaded files
  --force                 Force fresh download, overwriting existing files
  --no-resume             Disable download resuming

Advanced Options:
  --optimize-download     Enable optimized downloading with size-based
                          categorization and bandwidth control
  -t, --threads NUM       Number of download threads (auto: optimal based on
                          CPU cores, or specify a positive number)
  --size-threshold MB     Files larger than this size will use bandwidth control
  --bandwidth PERCENT     Percentage of measured bandwidth to use
  --measure-time SECS     Duration to measure initial download speed

Output Options:
  --quiet                 Suppress all output except errors
  --verbose               Show detailed progress and debug information
  --dry-run               Show what would be downloaded without downloading
HFDL is designed to work seamlessly across different operating systems:
- Windows, macOS, and Linux support
- Path sanitization to handle OS-specific filename restrictions
- Adaptive file handling that respects platform limitations
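To make the path sanitization point concrete, here is a sketch that rewrites a single filename component so it is valid on Windows as well as macOS and Linux. The rules shown (the characters Windows forbids, trailing dots and spaces, reserved device names) are the well-known Windows restrictions; the function name and the choice of `_` as a replacement are assumptions, and HFDL's own rules may differ.

```python
import re

# Characters Windows does not allow in filenames, plus ASCII control characters
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Device names Windows reserves regardless of extension
_RESERVED = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def sanitize_component(name, replacement="_"):
    """Return a version of one path component that is safe on all three platforms."""
    cleaned = _INVALID.sub(replacement, name)
    cleaned = cleaned.rstrip(" .")  # Windows rejects trailing dots and spaces
    if not cleaned:
        cleaned = replacement
    if cleaned.split(".")[0].upper() in _RESERVED:
        cleaned = replacement + cleaned
    return cleaned


if __name__ == "__main__":
    print(sanitize_component("model:v1?.bin"))
```

The function handles one component at a time, so a repository path would be split on `/` first and each part sanitized separately.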
HFDL provides comprehensive error handling and recovery mechanisms:
- Download Errors:
  - HFDownloadError: Base exception for all errors
  - ThreadManagerError: Thread-related errors
  - FileManagerError: File operation errors
  - SpeedManagerError: Speed control errors
- Specific Errors:
  - FileSizeError: File size calculation issues
  - FileTrackingError: Progress tracking issues
  - SpeedMeasurementError: Speed measurement issues
  - SpeedAllocationError: Speed allocation issues
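The sketch below shows how a hierarchy like this lets callers handle errors at different levels of detail. The classes are local stand-ins defined for illustration, not imports from HFDL. Only HFDownloadError being the base class comes from the list above; which manager error each specific error derives from is an assumption based on the names.

```python
class HFDownloadError(Exception):
    """Base exception for all errors."""


class ThreadManagerError(HFDownloadError):
    """Thread-related errors."""


class FileManagerError(HFDownloadError):
    """File operation errors."""


class SpeedManagerError(HFDownloadError):
    """Speed control errors."""


# Assumed parentage: file-related specifics under FileManagerError,
# speed-related specifics under SpeedManagerError.
class FileSizeError(FileManagerError): ...
class FileTrackingError(FileManagerError): ...
class SpeedMeasurementError(SpeedManagerError): ...
class SpeedAllocationError(SpeedManagerError): ...


def describe(exc):
    """Map an exception to a coarse category, most general check last."""
    if isinstance(exc, SpeedManagerError):
        return "speed"
    if isinstance(exc, FileManagerError):
        return "file"
    if isinstance(exc, ThreadManagerError):
        return "thread"
    if isinstance(exc, HFDownloadError):
        return "download"
    return "unexpected"
```

With this layout, `except HFDownloadError` catches everything the downloader raises, while a narrower clause such as `except SpeedManagerError` handles only the speed-control failures.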
HFDL implements automatic error recovery:
- Network retry mechanisms for transient failures
- Resource cleanup to prevent leaks
- State recovery to resume interrupted operations
- Fallback to legacy mode when enhanced features encounter issues
- OS-specific path handling to prevent filename-related errors
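Two of the recovery strategies above, retrying transient network failures and falling back to legacy mode, can be sketched as follows. The exponential backoff schedule, the retried exception types, and both function names are assumptions chosen for illustration; HFDL's actual retry policy is not specified here.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a transient error wait and retry, doubling the delay."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))


def download_with_fallback(enhanced, legacy):
    """Try the enhanced path first; if it fails for any reason, use the legacy one."""
    try:
        return enhanced()
    except Exception:  # broad on purpose in this sketch
        return legacy()
```

A caller might combine the two, for example `download_with_fallback(lambda: with_retries(fetch_enhanced), fetch_legacy)`, where `fetch_enhanced` and `fetch_legacy` are placeholders for the two download paths.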
All operations are thread-safe:
- Resource protection with proper locking mechanisms
- State consistency across concurrent operations
- Safe cleanup even during interruptions
- Error propagation to the appropriate handlers
- Thread-aware progress tracking
HFDL includes a test suite covering unit, integration, error-handling, and thread-safety cases:
# Install test dependencies
pip install pytest pytest-mock
# Run all tests
pytest hfdl/tests/
# Run specific test categories
pytest -v -k "error" hfdl/tests/ # Error handling tests
pytest -v -k "thread_safety" hfdl/tests/ # Thread safety tests
pytest -v hfdl/tests/test_downloader.py # Downloader tests
- Unit Tests:
  - Component functionality
  - Error handling
  - Input validation
  - State management
- Integration Tests:
  - Component interaction
  - Error propagation
  - Resource management
  - System behavior
- Error Tests:
  - Error scenarios
  - Recovery mechanisms
  - Resource cleanup
  - State consistency
- Thread Safety Tests:
  - Concurrent operations
  - Resource contention
  - State consistency
  - Error handling
- Fork the repository
- Create your feature branch
- Add tests for your changes
- Ensure all tests pass
- Submit a pull request
- Clone the repository:
git clone https://github.com/yourusername/hfdl.git
cd hfdl
- Create virtual environment:
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
- Install dependencies:
pip install -e ".[dev]"
- Run tests:
pytest hfdl/tests/
This project is licensed under the MIT License - see the LICENSE file for details.