DataInertia is a lightweight framework for preparing tabular datasets for machine learning workflows. It streamlines data preprocessing, cleaning, feature engineering, and reporting, so you can build consistent, reusable preprocessing pipelines.
- Preprocessing:
  - Normalize and scale numeric features.
  - Encode categorical variables (one-hot or label encoding).
  - Impute missing values.
- Cleaning:
  - Identify and remove duplicate rows.
  - Detect and handle outliers (IQR or Z-score methods).
- Feature Engineering:
  - Generate polynomial features and interaction terms.
  - Scale features using Min-Max or Standard scaling.
- Pipelines:
  - Build preprocessing pipelines.
  - Integrate them with machine learning models.
- Reporting:
  - Generate PDF summary reports.
  - Visualize missing data with heatmaps.
  - Create diagnostic files with dataset insights.
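As a rough illustration of what the preprocessing steps compute, here is a minimal, dependency-free Python sketch. The function names (`impute_mean`, `min_max_scale`, `standard_scale`, `one_hot`, `label_encode`) are hypothetical and are not DataInertia's API; mean imputation is assumed as the imputation strategy, and plain lists stand in for dataset columns.

```python
from statistics import mean, pstdev

# Hypothetical helpers illustrating the techniques; not DataInertia's API.

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Linearly rescale values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale(values):
    """Center to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def one_hot(categories):
    """Return the sorted category levels and one 0/1 indicator row per input."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

def label_encode(categories):
    """Map each category to an integer index, assigned in sorted order."""
    index = {level: i for i, level in enumerate(sorted(set(categories)))}
    return [index[c] for c in categories]

ages = impute_mean([20, None, 40])
print(ages)                                  # [20, 30, 40]
print(min_max_scale(ages))                   # [0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))       # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print(label_encode(["red", "blue", "red"]))  # [1, 0, 1]
```

One-hot encoding avoids implying an order between categories, while label encoding is more compact but assigns arbitrary integer ranks, which some models will treat as meaningful.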
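The cleaning steps (duplicate removal and the two outlier methods) can be sketched in plain Python as follows. This is an illustrative sketch, not DataInertia's implementation: the helper names, the IQR multiplier `k=1.5`, and the default Z-score `threshold=3.0` are common conventions assumed here.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical helpers illustrating the techniques; not DataInertia's API.

def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]

def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds `threshold`."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]

print(drop_duplicates([[1, "a"], [2, "b"], [1, "a"]]))  # [[1, 'a'], [2, 'b']]
values = [10, 12, 11, 13, 12, 95]
print(iqr_outliers(values))                    # [False, False, False, False, False, True]
print(zscore_outliers(values, threshold=2.0))  # [False, False, False, False, False, True]
```

With only six values, no population Z-score can exceed √5 ≈ 2.24 (Samuelson's inequality), so the default threshold of 3.0 could never flag anything in this example; that is why it is lowered to 2.0 here.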
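Degree-2 polynomial features consist of the original features, their squares, and their pairwise products (the interaction terms). A minimal sketch, using hypothetical function names `polynomial_features` and `interaction_terms` that are not part of DataInertia:

```python
from itertools import combinations, combinations_with_replacement
from math import prod

# Hypothetical helpers illustrating the technique; not DataInertia's API.

def polynomial_features(row, degree=2):
    """Return every monomial of the input features with total degree 1..degree.

    For inputs [a, b] and degree 2 the output is [a, b, a*a, a*b, b*b].
    """
    features = []
    for d in range(1, degree + 1):
        # Each index multiset of size d, e.g. (0, 1) -> a*b, is one monomial.
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features

def interaction_terms(row):
    """Return only the pairwise products x_i * x_j with i < j (no squares)."""
    return [row[i] * row[j] for i, j in combinations(range(len(row)), 2)]

print(polynomial_features([2, 3]))   # [2, 3, 4, 6, 9]
print(interaction_terms([2, 3, 5]))  # [6, 10, 15]
```

The output width grows combinatorially: n input features at degree d produce C(n + d, d) - 1 terms (excluding the constant), so high degrees on wide datasets quickly become expensive.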
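A preprocessing pipeline typically learns its parameters (fill values, minimum and maximum) on training data and then reuses them unchanged on new data. The sketch below assumes a scikit-learn-style `fit`/`transform` convention; the classes `MeanImputeStep`, `MinMaxStep`, and `Pipeline` are hypothetical illustrations, not DataInertia's classes.

```python
# Hypothetical classes following a scikit-learn-style fit/transform convention;
# they are not DataInertia's classes.

class MeanImputeStep:
    """Learn the mean of observed values; fill missing (None) entries with it."""
    def fit(self, values):
        observed = [v for v in values if v is not None]
        self.fill = sum(observed) / len(observed)
        return self

    def transform(self, values):
        return [self.fill if v is None else v for v in values]

class MinMaxStep:
    """Learn min and max on the fitting data; rescale later inputs with them."""
    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # guard against a constant column
        return [(v - self.lo) / span for v in values]

class Pipeline:
    """Apply steps in order; each step is fitted on the previous step's output."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values

pipe = Pipeline([MeanImputeStep(), MinMaxStep()]).fit([0, None, 10])
print(pipe.transform([None, 20]))  # [0.5, 2.0]
```

The unseen value 20 maps to 2.0, outside [0, 1], because the scaling statistics come from the fitting data only. Keeping fit and transform separate in this way prevents information from test data leaking into preprocessing.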
Install Dependencies:

```bash
pip install -r requirements.txt
```
Explore Examples: Run any example script to see the framework in action:

```bash
python examples/<example_file>.py
```

To run the test suite instead:

```bash
python -m unittest discover -s tests
```