Hi everyone,

I’d like to open a discussion on whether and how to integrate active learning into NVIDIA Modulus as a more “standard” component. Currently, Modulus offers powerful PDE solvers and neural operator frameworks (like the FNO implementations), but does not include a built-in mechanism for iterative data acquisition informed by model uncertainty (a.k.a. active learning).
Background
Use Case: In many real-world PDE settings (e.g., Darcy Flow or other high-dimensional systems), running full-fidelity simulations can be costly. Active learning (AL) helps by identifying where the model is most uncertain and focusing additional simulation efforts on those inputs, reducing overall cost.
Lack of Built-In Support: Although users can manually script an AL loop (train → estimate uncertainty → pick new samples → retrain), there’s no off-the-shelf feature or integrated example in Modulus that demonstrates this workflow.
Community Interest: Given the push towards more data-efficient PDE surrogate modeling, I suspect other users might benefit from having a streamlined active learning example or an optional interface in Modulus.
Possible Approaches
Add a Single Example
For instance, an active learning variant of the Darcy Flow FNO tutorial.
Could demonstrate how to measure uncertainty (e.g., MC-Dropout, ensembles) and choose which points to simulate next.
General Toolkit
A more general design to handle selection criteria (uncertainty-based or diversity-based), batch simulation scheduling, iterative retraining, etc.
Possibly a new module or library that interacts seamlessly with Modulus’s PDE data pipelines.
Lightweight Integration
Provide an “active learning loop” script/class that wraps around existing Modulus workflows.
Keep it minimal—just the logic of:
1. Evaluate model
2. Sort by uncertainty
3. Simulate top-K
4. Update training set
5. Retrain
Example (Pseudo-Code)
(Note: Purely illustrative and not tested.)
```python
for round_idx in range(num_rounds):
    # 1. Generate or sample candidate inputs
    candidate_inputs = sample_candidate_permeability_fields()

    # 2. Estimate uncertainty (MC-Dropout or ensembles)
    uncertainties = []
    for cand in candidate_inputs:
        mean_pred, var_pred = estimate_uncertainty(model, cand)
        uncertainties.append(var_pred.mean().item())

    # 3. Pick top uncertain points
    selected = pick_top_k(candidate_inputs, uncertainties, k=samples_per_round)

    # 4. Run PDE solver for new data
    new_data = run_pde_solutions(selected)

    # 5. Add to dataset, retrain
    train_data += new_data
    train_model(model, train_data)
```
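For concreteness, the acquisition step (`pick_top_k` in the pseudo-code) could be as simple as the following pure-Python sketch. The name and signature are my own placeholders, not an existing Modulus API:

```python
def pick_top_k(candidates, uncertainties, k):
    """Return the k candidates with the highest uncertainty scores.

    Hypothetical helper matching the pseudo-code above:
    `candidates` and `uncertainties` are parallel sequences.
    """
    # Rank candidate indices by descending uncertainty
    ranked = sorted(range(len(candidates)),
                    key=lambda i: uncertainties[i],
                    reverse=True)
    return [candidates[i] for i in ranked[:k]]
```

In a real loop one might also mix in a diversity criterion here, so that the top-K picks are not all clustered in the same region of input space.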
Extensibility for Different UQ Methods
One idea is to define a generic UncertaintyEstimator interface that can work with any neural operator (FNO, AFNO, etc.) and can be swapped out for different UQ approaches (e.g., MC-Dropout, ensembles, or a Bayesian library). This keeps the AL loop itself (the “Orchestrator”) relatively unchanged:
MC-Dropout: Insert dropout layers, run multiple forward passes in train() mode.
Ensembles: Keep multiple model instances, measure variance across them.
Bayesian: Potentially integrate external libraries like fortuna, as long as they expose a .forward(...) or similar.
By making the AL orchestrator agnostic to how uncertainty is computed, Modulus could offer a flexible path for advanced users to plug in new approaches with minimal friction.
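As a rough sketch of what such an interface could look like (all names here are hypothetical, not existing Modulus APIs), here is a minimal `UncertaintyEstimator` base class with an ensemble-based implementation; models are represented as plain callables to keep the example self-contained:

```python
from abc import ABC, abstractmethod
from statistics import fmean, pvariance


class UncertaintyEstimator(ABC):
    """Hypothetical interface: map one candidate input to (mean, variance)."""

    @abstractmethod
    def estimate(self, x):
        """Return (mean_prediction, predictive_variance) for input x."""


class EnsembleEstimator(UncertaintyEstimator):
    """Variance across independently trained models (here: plain callables)."""

    def __init__(self, models):
        self.models = models

    def estimate(self, x):
        preds = [m(x) for m in self.models]
        return fmean(preds), pvariance(preds)
```

An MC-Dropout estimator would implement the same `estimate` method by running T stochastic forward passes of a single model with dropout active (`model.train()` in PyTorch) and computing the variance across passes; the AL orchestrator would not need to change.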
Discussion Points
Scope: Should AL be showcased as a single example (e.g., Darcy Flow + MC-Dropout) or should it be a more general feature with an extensible interface?
UQ Methods: MC-Dropout is straightforward to add in PyTorch; ensembles might also be feasible if resources permit. Do we want a flexible interface to accommodate more advanced or custom methods?
Integration: Should this live in the main codebase (e.g., a new modulus.active_learning submodule) or start as a separate examples/active_learning folder?
Community Interest: Are PDE simulation costs a primary concern for most Modulus users, or is the typical usage more about smaller-scale PDEs?
I’d love to hear thoughts from the Modulus team and the community regarding:
Would a built-in or officially supported active learning feature benefit enough users?
If so, which approach (lightweight example vs. deeper integration) seems most appropriate?
Are there any known plans or partial implementations already in the works?
Thanks in advance for your insights!
Best,
Y Georg Maerz
(Again, the pseudo-code above is for demonstration only and is not tested. I’m happy to iterate or help contribute if there’s interest in making this more official.)