[Feature Request] Investigate how robust are parameters and RMSE #64
I investigated my collection. The result for the original collection:
The result for the same collection with the latest 5% of revlogs removed:
RMSE changed by about 30%, but in your case it's very low, so it's possible that even small changes will affect it greatly. In my experience with testing, collections with a low RMSE are more sensitive to changes.
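For reference, the truncation experiment above could be scripted roughly like this. This is a minimal sketch: `optimize` and `evaluate` are hypothetical wrappers around the optimizer's real entry points (the actual fsrs-optimizer API differs), and the `revlog.csv` schema with a `review_time` column is an assumption.

```python
import pandas as pd

# Hypothetical wrappers around the optimizer; the real fsrs-optimizer API
# differs, so treat these two as placeholders.
def optimize(revlog: pd.DataFrame) -> list[float]:
    raise NotImplementedError  # train FSRS parameters on this revlog

def evaluate(params: list[float], revlog: pd.DataFrame) -> float:
    raise NotImplementedError  # return the RMSE of params on this revlog

revlog = pd.read_csv("revlog.csv").sort_values("review_time")  # assumed schema

# Optimize and evaluate on the full collection.
params_full = optimize(revlog)
rmse_full = evaluate(params_full, revlog)

# Repeat with the latest 5% of revlog entries removed.
truncated = revlog.iloc[: int(len(revlog) * 0.95)]
params_95 = optimize(truncated)
rmse_95 = evaluate(params_95, truncated)

print(f"RMSE(all) = {rmse_full:.4f}, RMSE(95%) = {rmse_95:.4f}, "
      f"ratio = {rmse_full / rmse_95:.3f}")
```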
I ran an initial test on the benchmark dataset. Here is the result:
mean(RMSE(all reviews)/RMSE(95%)) = 1.017 (unweighted, because truncation changes the dataset size)
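That aggregate is just the unweighted mean of the per-collection ratio. A sketch of how it could be computed, assuming a hypothetical `run_truncation_test` that returns the two RMSE values from the previous sketch for one collection:

```python
# Assumed list of per-collection revlog exports from the benchmark dataset.
benchmark_collections = ["collection_1.csv", "collection_2.csv"]  # etc.

# run_truncation_test is a hypothetical wrapper returning
# (rmse_full, rmse_95) for one collection, as in the sketch above.
ratios = [
    rmse_full / rmse_95
    for rmse_full, rmse_95 in (run_truncation_test(p) for p in benchmark_collections)
]

# Unweighted, because truncation changes each collection's size.
mean_ratio = sum(ratios) / len(ratios)
print(f"mean(RMSE(all)/RMSE(95%)) = {mean_ratio:.3f}")
```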
That seems quite stable. I wonder why people reported sudden changes in parameters and/or RMSE.
If they are evaluated on the same dataset, I recommend using the old parameters. The user could still evaluate the old parameters with the new data.
No, I meant the case where the new dataset has slightly more reviews but a higher RMSE.
It's incorrect to compare metrics based on different datasets. If the RMSE is lower before re-optimizing (evaluated on the old data) and higher after (on the new data), what about the RMSE of the old parameters on the new data? I guess it's higher than that of the new parameters.
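To make that comparison fair, hold the dataset fixed and vary only the parameters. A minimal sketch, reusing the hypothetical `evaluate` wrapper from above (`old_params`, `new_params`, and `new_revlog` are assumed to be already loaded):

```python
rmse_old = evaluate(old_params, new_revlog)  # old parameters on the new data
rmse_new = evaluate(new_params, new_revlog)  # new parameters on the new data

# If rmse_old > rmse_new, re-optimizing did help on the new data, even if
# the new RMSE looks worse than the old RMSE measured on the old data.
print(f"old params on new data: {rmse_old:.4f}")
print(f"new params on new data: {rmse_new:.4f}")
```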
I know it's incorrect to compare them like that, but users have asked about this, and we need to give them some answer.
I'm afraid that it could be a bug. I need to fix it. However, I need to reproduce it first. Based on the above discussion and experiment, I haven't been able to reproduce it, so I hope the user can share the affected collection.
My RMSE changes a lot even after a few reviews. A few thousand reviews ago, I got parameters that showed an RMSE of 2.8%. In all this time, the quality of these parameters has only changed by a few hundredths of a percent, because the number of new reviews is very small. I often try to re-optimize the parameters, but the resulting RMSE is always higher than that of my current parameters, on average 4-9%. For example, I just ran an experiment: I optimized the parameters 3 times in a row, doing 10 reviews between optimizations, and got RMSE values of 6.82, 7.58, and 4.51. I didn't apply the new parameters, only looked at their RMSE.

Deck file attached. GitHub doesn't allow .apkg uploads, so I added it to a zip. This is the only deck in the profile that I run optimizations for. I have 128,000 reviews in total. Anki version: 23.10.1
@L-M-Sherlock I found a bug in the latest beta. "Evaluate" is bugged: it uses all reviews, not just reviews from the selected preset.
Could you submit a detailed report for that? https://forums.ankiweb.net/ |
I pinged Dae in the thread about 23.12 beta. |
Could you test it in the latest version (https://github.com/ankitects/anki/releases/tag/23.12beta2)? I introduced a better outlier filter in that version.
What difference in RMSE for the same data is considered normal? My current best parameters now show an RMSE of 2.62%. I've tried a few dozen more optimizations, and I'm not quite sure about the results, as I don't know whether they're due to the new version or just because I'm doing more reviews. I got different RMSE values, with the minimum close to my best and the maximum up to 4.93%.
A lot of users have reported that parameters and RMSE can change significantly even after doing just a few dozen more reviews. I think a good way to investigate this is to optimize the parameters on the full review history, then remove the most recent few percent of reviews, re-optimize, and compare the resulting parameters and RMSE (as in the experiment above).
This should be fairly straightforward to implement. The problem is what comes next. If it's true that even a few more (or fewer) reviews can vastly change the RMSE and parameters, the question is: how do we make the optimization more robust? I don't know a good answer. One way that I have proposed is to run the optimizer from different starting points. In other words, instead of running it from just one set of default parameters, we would use 3-5 different sets. This would ensure that we explore the parameter space more thoroughly; a rough sketch of the idea follows.
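A minimal multi-start sketch, assuming a hypothetical `optimize_from` wrapper that accepts an initial parameter set (the starting sets here are illustrative placeholders, not FSRS's actual defaults), and reusing the hypothetical `evaluate` and `revlog` from above:

```python
# Illustrative starting points; not FSRS's actual default parameter sets.
starting_points = [params_set_a, params_set_b, params_set_c]

# Run the optimizer from each starting point and keep the best result,
# judged by RMSE on the same revlog.
candidates = []
for start in starting_points:
    params = optimize_from(revlog, initial_params=start)
    candidates.append((evaluate(params, revlog), params))

best_rmse, best_params = min(candidates, key=lambda c: c[0])
print(f"best RMSE across {len(starting_points)} starts: {best_rmse:.4f}")
```

If the runs from different starting points converge to similar parameters and RMSE, that would suggest the instability users report comes from the data rather than from the optimizer getting stuck in local minima.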