
About selective_scan #19

Open
ZK-Zhou opened this issue Feb 21, 2024 · 4 comments

Comments


ZK-Zhou commented Feb 21, 2024

Hi, great work!
Could you please explain why selective_scan has `x = torch.zeros((b, d_in, n), device=deltaA.device)`?
In addition, I am confused about `u` and `x`.

Thanks.


Ykiiii commented Jun 20, 2024

I have the same question as you!
Why is `x` reset to 0 in the selective_scan function?

Regarding `u` and `x`, here is my understanding. The selective_scan function implements the state-space recurrence

x(t + 1) = A x(t) + B u(t)
y(t) = C x(t) + D u(t)

Here `u` is the incoming input (the `x` that is passed in), and `x` is the hidden state; it can be understood as `h`.
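To make the recurrence concrete, here is a minimal sketch of a selective_scan in the style of the quoted line (the shapes are my assumption, following mamba-minimal-style implementations; this is an illustration, not necessarily this repo's exact code):

```python
import torch

def selective_scan_sketch(u, delta, A, B, C, D):
    # Assumed shapes (an assumption, not taken from this repo):
    #   u:     (b, l, d_in)  incoming sequence -- the "u" in the SSM equations
    #   delta: (b, l, d_in)  per-step discretization step
    #   A:     (d_in, n)     continuous-time state matrix
    #   B, C:  (b, l, n)     input-dependent projections
    #   D:     (d_in,)       skip connection
    b, l, d_in = u.shape
    n = A.shape[1]

    # Discretize: A_bar = exp(delta * A), B_bar * u = delta * B * u
    deltaA = torch.exp(torch.einsum('bld,dn->bldn', delta, A))
    deltaB_u = torch.einsum('bld,bln,bld->bldn', delta, B, u)

    # The line being asked about: the hidden state x (i.e. h) starts at 0
    # for every sequence in the batch, just like an RNN's initial state.
    x = torch.zeros((b, d_in, n), device=deltaA.device)

    ys = []
    for i in range(l):
        x = deltaA[:, i] * x + deltaB_u[:, i]              # x(t+1) = A_bar x(t) + B_bar u(t)
        ys.append(torch.einsum('bdn,bn->bd', x, C[:, i]))  # y(t) = C x(t)
    y = torch.stack(ys, dim=1)                             # (b, l, d_in)
    return y + u * D                                       # + D u(t) skip term
```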

@ZhangXG001

"x = torch.zeros((b, d_in, n), device=deltaA.device)" is out of the loop, I think it is init the hidden state with 0(x = torch.zeros((b, d_in, n), device=deltaA.device)) @Ykiiii @ZK-Zhou


Ykiiii commented Sep 19, 2024

It is outside the scan loop, but not outside the Mamba training loop: the hidden state is re-initialized to 0 for every batch, just like an RNN.
My confusion is: when training on a long time series, why not keep using the previous hidden state? That would be more in line with the idea of the state-space equations, wouldn't it?
@ZhangXG001
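What Ykiiii describes would amount to truncated-BPTT-style training: split the long series into chunks and carry the final state into the next chunk instead of re-zeroing it. A hypothetical sketch of such a stateful variant (the `x0` argument and the `detach` are assumptions, not code from this repo):

```python
import torch

def selective_scan_stateful(u, delta, A, B, C, D, x0=None):
    # Same assumed shapes as the sketch above. x0, if given, is a
    # carried-over hidden state of shape (b, d_in, n).
    b, l, d_in = u.shape
    deltaA = torch.exp(torch.einsum('bld,dn->bldn', delta, A))
    deltaB_u = torch.einsum('bld,bln,bld->bldn', delta, B, u)

    # Start from the previous chunk's state instead of zeros when provided.
    if x0 is None:
        x = torch.zeros((b, d_in, A.shape[1]), device=deltaA.device)
    else:
        x = x0

    ys = []
    for i in range(l):
        x = deltaA[:, i] * x + deltaB_u[:, i]
        ys.append(torch.einsum('bdn,bn->bd', x, C[:, i]))
    y = torch.stack(ys, dim=1) + u * D

    # Detach so gradients are truncated at chunk boundaries (TBPTT-style);
    # the caller feeds the returned state back in as x0 for the next chunk.
    return y, x.detach()
```

One reason zero initialization is the usual default: in ordinary training each sequence in a batch is an independent sample, so there is no earlier state that belongs to it; carrying state only makes sense when consecutive chunks come from the same underlying series.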

@shawnnjupt

I think `x` shouldn't always be 0; for a long time series, the hidden state should be stored and carried over.
