Thank you for your work on MambaVision! I have been trying to reproduce the reported results, specifically for MambaVision-T, and ran into some problems.
Here is the summary:
- I ran 16 training runs, using up to 4 GPUs. To match the global batch size, I used a larger per-GPU batch size.
- I experimented with different seeds, but even with the same seed, I observed fluctuations of ~0.1-0.2 percentage points in accuracy.
- My highest achieved accuracy is 82.21%, whereas the reported result is 82.3%.
- When validating the provided model checkpoint with validate.sh, I get an accuracy of 82.244%, which does not round to 82.3%.
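To make the batch-size matching and the rounding check above concrete, here is a minimal sketch. The global batch size of 1024 is only a placeholder for illustration, not a value taken from the repository, and both helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_gpu_batch_size(global_batch_size: int, num_gpus: int) -> int:
    """Per-GPU batch size that keeps the global batch size fixed."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if global_batch_size % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the GPU count")
    return global_batch_size // num_gpus


def round_accuracy(acc: str, step: str = "0.1") -> Decimal:
    """Round an accuracy (given as a string) half-up to one decimal place."""
    return Decimal(acc).quantize(Decimal(step), rounding=ROUND_HALF_UP)


# Placeholder value for illustration only (assumption, not from the repo).
ASSUMED_GLOBAL_BATCH = 1024

for gpus in (1, 2, 4):
    print(gpus, "GPU(s) ->", per_gpu_batch_size(ASSUMED_GLOBAL_BATCH, gpus), "per GPU")

# Accuracies from this report, rounded to one decimal place.
print(round_accuracy("82.244"))  # 82.2
print(round_accuracy("82.21"))   # 82.2
```

Using `Decimal` with string inputs avoids binary floating-point surprises when rounding values that sit close to a rounding boundary.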
My environment:
- Python 3.10.12
- torch==2.5.1
- timm==1.0.14
- einops==0.8.0
- transformers==4.48.1
- causal-conv1d @ file:///causal-conv1d (using the newest commit as of this post: 82867a9)
- mamba-ssm @ file:///mamba (using the newest commit as of this post: 0cce0fa)
Could you clarify if there are any additional details regarding the training setup or hyperparameters that might explain these discrepancies? Also, was any additional post-processing or averaging applied to obtain the reported accuracy?