Feature/data generation automl #764

Open
wants to merge 17 commits into main

YGMaerz commented Jan 19, 2025

Merge branch 'feature/data-generation-automl' into main

Introduce Data Generation and Transformation Components for Darcy AutoML Active Learning

This merge brings the data generation and transformation functionality developed on the
'feature/data-generation-automl' branch into main. The key contributions are:

  • Data Descriptors and Data Generator: Established structured data descriptors and a generator
    that creates datasets tailored to the different machine learning models in the Darcy AutoML
    active learning framework.

  • OntologyEngine: Developed an engine that analyzes model-specific requirements and determines
    which data transformations each model needs, keeping data compatible and consistent across pipeline stages.

  • OntologyTransformationEngine: Implemented an engine that executes the suggested data
    transformations: copying files, managing subfolders, and preparing data for the preprocessing
    and feature preparation stages.

  • Notebook Cells for Data Generation and Processing: Created Jupyter notebook cells that
    generate transformation plans and then run the data through the pipeline stages, making
    experiments easier to reproduce and iterate on.

  • ModelRegistry with Model Descriptors: Established a registry that maintains model descriptors
    for the supported machine learning models (e.g., FNO, AFNO), so that models can be integrated into
    and selected within the AutoML pipeline.
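To illustrate how a model registry, model descriptors and transformation planning could fit together, here is a minimal Python sketch. It is not the code in this PR: `DataDescriptor`, `ModelDescriptor`, `ModelRegistry` and `plan_transformations` are hypothetical names chosen for illustration (the PR's own `ModelRegistry` may look quite different), and the FNO/AFNO input requirements (file format, grid resolution, channel layout) are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical descriptor of a dataset's layout; field names are illustrative,
# not taken from this PR.
@dataclass(frozen=True)
class DataDescriptor:
    file_format: str             # e.g. "npz" or "h5"
    resolution: Tuple[int, int]  # spatial grid size (H, W)
    channels_first: bool         # True for (C, H, W), False for (H, W, C)


# Hypothetical model descriptor: a model name plus the data layout it expects.
@dataclass(frozen=True)
class ModelDescriptor:
    name: str
    required: DataDescriptor


class ModelRegistry:
    """Minimal registry mapping model names to their descriptors."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelDescriptor] = {}

    def register(self, descriptor: ModelDescriptor) -> None:
        if descriptor.name in self._models:
            raise ValueError(f"model {descriptor.name!r} is already registered")
        self._models[descriptor.name] = descriptor

    def get(self, name: str) -> ModelDescriptor:
        return self._models[name]

    def names(self) -> List[str]:
        return sorted(self._models)


def plan_transformations(source: DataDescriptor, model: ModelDescriptor) -> List[str]:
    """Compare generated data against a model's requirements and return the
    ordered steps needed to bridge the gap (a stand-in for the kind of plan
    an OntologyEngine might suggest)."""
    target = model.required
    steps: List[str] = []
    if source.file_format != target.file_format:
        steps.append(f"convert_format:{source.file_format}->{target.file_format}")
    if source.resolution != target.resolution:
        steps.append(f"resample:{source.resolution}->{target.resolution}")
    if source.channels_first != target.channels_first:
        steps.append("transpose_channels")
    return steps


if __name__ == "__main__":
    # Placeholder requirements; the real FNO/AFNO descriptors may differ.
    registry = ModelRegistry()
    registry.register(ModelDescriptor("FNO", DataDescriptor("npz", (64, 64), True)))
    registry.register(ModelDescriptor("AFNO", DataDescriptor("npz", (64, 64), False)))

    generated = DataDescriptor("npz", (128, 128), True)
    for name in registry.names():
        print(name, plan_transformations(generated, registry.get(name)))
```

Returning the plan as a plain list of steps keeps deciding what to do separate from doing it, which mirrors the split between the OntologyEngine (planning) and the OntologyTransformationEngine (execution) described above.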

Together, these changes streamline the data preprocessing workflow, support multiple model types,
and lay the groundwork for scalable, efficient active learning in the Darcy AutoML project.

YGMaerz closed this Jan 19, 2025
YGMaerz reopened this Jan 19, 2025