Add a Warning / Error when sigmoid activation functions are used #212

Open
nicogross opened this issue Feb 4, 2025 · 2 comments
Labels
core (Feature/bug concerning core functionality) · enhancement (New feature or request)


@nicogross

The sigmoid activation function 1/(1+e^-x) is missing two properties required by LRP:
f(0) = 0 and sign(f(-x)) = -1 for x > 0 (the counterpart, sign(f(x)) = +1 for x > 0, does hold for the sigmoid).

I tried using the simple LRP-0 rule on a network computing sigmoid(2*(-1) + 1*1) and got unintuitive results.
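A quick numeric check of the two properties (tanh and ReLU shown here only for comparison) illustrates that the sigmoid violates both:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

for name, f in [("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu)]:
    # required by LRP: f(0) = 0 and sign(f(-x)) = -1 for x > 0
    print(f"{name:8s} f(0) = {f(0.0):+.4f}   f(-2) = {f(-2.0):+.4f}")

# sigmoid  f(0) = +0.5000   f(-2) = +0.1192   -> violates both properties
# tanh     f(0) = +0.0000   f(-2) = -0.9640   -> satisfies both
# relu     f(0) = +0.0000   f(-2) = +0.0000   -> f(0) = 0, and f(-x) is never positive
```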

@chr5tphr
Owner

chr5tphr commented Feb 7, 2025

Hey Nico,

Thanks a lot for the issue. I think this makes sense. We could create another type, something like SignSymmetricActivation, to replace [Activation](https://github.com/chr5tphr/zennit/blob/e5699aa7e6fb98bec67505af917d0a17cd81d3b5/src/zennit/types.py#L101) in the basic LRP rules, so that only sign-symmetric activations are skipped rather than all activations. With #147, sigmoid would then use the gradient, but would raise a warning.
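Such a type could look roughly like the sketch below (only a sketch: the concrete subclass list and the reuse of the SubclassMeta / `__subclass__` pattern from zennit/types.py are assumptions, not a final design):

```python
# Sketch only: the subclass list and the SubclassMeta / __subclass__ pattern
# from zennit/types.py are assumptions here, not a final design.
import torch
from zennit.types import SubclassMeta


class SignSymmetricActivation(metaclass=SubclassMeta):
    '''Activations with f(0) = 0 and sign(f(x)) = sign(x), safe to skip in basic LRP rules.'''
    __subclass__ = (
        torch.nn.ReLU,
        torch.nn.LeakyReLU,
        torch.nn.Tanh,
        torch.nn.Hardtanh,
        torch.nn.Softsign,
    )

# Composites would then map their skip/pass rule to SignSymmetricActivation
# instead of Activation, so torch.nn.Sigmoid no longer matches and could raise
# a warning (or fall back to the gradient, as discussed for #147).
```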

Just for reference, could you share your un-intuitive results here?

chr5tphr added the enhancement and core labels on Feb 7, 2025
@nicogross
Author

nicogross commented Feb 7, 2025

Just a simple example:

f(x) = sigmoid(x1*w1 + x2*w2) = sigmoid(z1 + z2)
x1 = 2 and x2 = 1
w1 = -1 and w2 = 1
-> z1 = -2 and z2 = 1
f(x) = sigmoid(-2 + 1) = sigmoid(-1) = 0.2689

x1 (or z1) pushes the activation lower and x2 (or z2) pushes it higher.
x1 should therefore be assigned a negative or small relevance and x2 a larger one, but LRP-0 gives:
R1 = 0.5379 and R2 = -0.2689
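Written out in plain Python (assuming the output relevance is initialized to the model output f(x), as is usual for LRP):

```python
import math

x = [2.0, 1.0]     # x1, x2
w = [-1.0, 1.0]    # w1, w2

z = [xi * wi for xi, wi in zip(x, w)]      # contributions: z1 = -2, z2 = 1
z_sum = sum(z)                             # pre-activation: -1
out = 1.0 / (1.0 + math.exp(-z_sum))       # sigmoid(-1) = 0.2689

# LRP-0: distribute the output relevance R = f(x) proportionally to the z_j
R = [zj / z_sum * out for zj in z]
print(R)   # [0.5378..., -0.2689...]
```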

Maybe the following paper helps to understand why:
WB-LRP (https://www.sciencedirect.com/science/article/abs/pii/S0031320324007076) pointed out that DeepLIFT can be reformulated as LRP where the reference (pre-)activations are all set to 0. This means that for negative pre-activations, the point on the sigmoid y = sigma(x) is compared to the point (0, 0), which does not lie on the sigmoid curve. This leads to a negative slope (y - 0)/(x - 0) for x < 0, y > 0.
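Numerically, that chord slope for the pre-activation z = -1 from the example above is (sigmoid(-1) - 0)/(-1 - 0) = -0.2689, and multiplying the contributions z1, z2 by it reproduces exactly the LRP-0 relevances:

```python
import math

z_sum = -1.0                                   # pre-activation from the example
slope = (1.0 / (1.0 + math.exp(-z_sum)) - 0.0) / (z_sum - 0.0)
print(slope)                                   # -0.2689...: negative whenever z < 0

print([zj * slope for zj in (-2.0, 1.0)])      # [0.5378..., -0.2689...] = R1, R2
```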
