
Fix a bug of forward-mode AD when multi-output is needed #1925

Merged
merged 215 commits into from
Jan 3, 2025

Conversation

Jerry-Jzy (Contributor)

I found this bug in this example: https://github.com/lululxvi/deepxde/blob/master/examples/operator/stokes_aligned_pideeponet.py

When setting num_output=3 and multi_output_strategy="independent", the output shape will be (batch size, # of coordinates, 3).
Then the slice here is wrong; what we really want is to slice along the last dimension.
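A minimal sketch of the shape issue (illustrative shapes and names only, not deepxde code): with an output of shape (batch size, # of coordinates, 3), selecting one output component must slice the last axis, not the second one.

```python
import numpy as np

batch_size, num_points = 8, 100                 # assumed sizes for illustration
y = np.zeros((batch_size, num_points, 3))       # e.g. three outputs stacked on the last axis

wrong = y[:, 0:1]       # shape (8, 1, 3)  -- slices the coordinate axis
right = y[..., 0:1]     # shape (8, 100, 1) -- slices the output-component axis
print(wrong.shape, right.shape)
```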

lululxvi (Owner) commented Jan 3, 2025

Instead of doing reshape here, another solution is modifying dde.gradients to support the output shape of (batch 1, batch 2, dim). This might be a better solution.

Jerry-Jzy (Contributor, Author)

> Instead of doing reshape here, another solution is modifying dde.gradients to support the output shape of (batch 1, batch 2, dim). This might be a better solution.

I think the reshape in PDEOperatorCartesianProd is also a good solution, since the problem originally comes from the difference in output structure between PI-DeepONet and PINN. After unifying the output structure, dde.gradients remains a general-purpose module for gradient computation.
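A minimal sketch of that idea (illustrative shapes only, not the PR's exact code): flatten the two batch axes of the PI-DeepONet output so it has the same 2D layout that a PINN network produces before gradients are taken.

```python
import numpy as np

num_func, num_points, num_outputs = 4, 100, 3   # assumed sizes for illustration
y_deeponet = np.zeros((num_func, num_points, num_outputs))

# Merge the function batch and the point batch into one axis, giving the
# (total points, num_outputs) layout of a PINN output.
y_pinn_like = y_deeponet.reshape(-1, num_outputs)
print(y_pinn_like.shape)   # (400, 3)
```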

lululxvi (Owner) commented Jan 3, 2025

Can you test another problem? Change this line

to

return dy_t + dy_x + x[:, 0:1]

The reverse autodiff should run, but I am not sure about the current forward code.

@Jerry-Jzy
Copy link
Contributor Author

> Can you test another problem? Change this line
>
> to
>
> return dy_t + dy_x + x[:, 0:1]
>
> The reverse autodiff should run, but I am not sure about the current forward code.

The forward mode is not suitable for this case, and the reason is not the reshape. The forward code is loop-free, so the gradient has the same shape as the output; the reverse mode works here because a loop is used, so the gradient has the same shape as the input. That is exactly the difference between JVP and VJP. If we want this case to work, we need to repeat x num_func times.
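A small standalone illustration of that shape difference (using jax purely for the example; this is independent of deepxde and its backends):

```python
import jax
import jax.numpy as jnp

def f(x):  # toy map R^3 -> R^2
    return jnp.stack([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])

# Forward mode (JVP): push a tangent through f; the result has the OUTPUT shape.
_, jvp_out = jax.jvp(f, (x,), (jnp.ones_like(x),))
print(jvp_out.shape)  # (2,)

# Reverse mode (VJP): pull a cotangent back through f; the result has the INPUT shape.
y, vjp_fn = jax.vjp(f, x)
(vjp_out,) = vjp_fn(jnp.ones_like(y))
print(vjp_out.shape)  # (3,)
```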

lululxvi (Owner) commented Jan 3, 2025

That is why I am considering modifying dde.gradients to handle and keep the 3D shape; in this way we can use broadcasting, so no repeating of x is needed.

Jerry-Jzy (Contributor, Author)

> That is why I am considering modifying dde.gradients to keep the 3D shape; in this way we can use broadcasting, so no repeating of x is needed.

Although the 3D shape is kept, we still cannot use broadcasting, since (num_func, num_points) cannot broadcast with (num_points, 1).

lululxvi (Owner) commented Jan 3, 2025

(num_func, num_points, 1) can broadcast with (num_points, 1)
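A quick numpy check of the two broadcasting claims above (illustrative shapes only):

```python
import numpy as np

num_func, num_points = 4, 100
x_col = np.zeros((num_points, 1))           # e.g. x[:, 0:1]

y3d = np.zeros((num_func, num_points, 1))   # 3D output: broadcasting works
y2d = np.zeros((num_func, num_points))      # 2D output: broadcasting fails

print((y3d + x_col).shape)                  # (4, 100, 1)
try:
    y2d + x_col
except ValueError as e:
    print("2D case fails:", e)
```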

lululxvi (Owner) commented Jan 3, 2025

> return dy_t + dy_x + x[:, 0:1]
>
> The reverse autodiff should run, but I am not sure about the current forward code.

The key question now is whether the current reshape approach can solve this problem without asking users to repeat x. If not, then we need to find another way.

lululxvi (Owner) commented Jan 3, 2025

Is the current code working for single output? If so, let us make the code work for single output and just raise an error for multiple outputs. Then we will return to multi-output in another PR.

Jerry-Jzy (Contributor, Author)

> (num_func, num_points, 1) can broadcast with (num_points, 1)

If we uniformly reshape to 3D rather than 2D, that may solve the problem in this case.

Jerry-Jzy closed this Jan 3, 2025
lululxvi (Owner) commented Jan 3, 2025

> Is the current code working for single output? If so, let us make the code work for single output and just raise an error for multiple outputs. Then we will return to multi-output in another PR.

How about the single-output case?

Jerry-Jzy (Contributor, Author)

> Is the current code working for single output? If so, let us make the code work for single output and just raise an error for multiple outputs. Then we will return to multi-output in another PR.
>
> How about the single-output case?

Single output works fine.

lululxvi (Owner) commented Jan 3, 2025

Then clean up the code for single output only; we will merge the single-output code first. Or is the current code in the main branch already good?

Jerry-Jzy (Contributor, Author)

> Then clean up the code for single output only; we will merge the single-output code first. Or is the current code in the main branch already good?

I have rolled back the code

Jerry-Jzy reopened this Jan 3, 2025
lululxvi merged commit a3c677c into lululxvi:master Jan 3, 2025
14 checks passed