
[Feature Request] Investigate how robust parameters and RMSE are #64

Closed

Expertium opened this issue Dec 7, 2023 · 18 comments
Labels
enhancement New feature or request

Comments

@Expertium
Contributor

A lot of users have reported that parameters and RMSE can change significantly even after doing just a few dozen more reviews. I think a good way to investigate this is to do the following:

  1. Choose a collection and derive two (or more) collections from it: for example, the original collection and a copy with 5% of the reviews removed.
  2. Run FSRS on all of those collections.
  3. See how much parameters and RMSE vary.

This should be fairly straightforward to implement. The problem is what comes next. If it's true that even a few more (or fewer) reviews can vastly change the RMSE and parameters, the question becomes "How do we make them more robust?". I don't know a good answer. One approach I have proposed is to run the optimizer from different starting points: instead of running it from just one set of default parameters, we use 3-5 different sets. This would ensure that we explore the parameter space more thoroughly.
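
For illustration, here is a minimal sketch of both the perturbation experiment and the multi-start idea. `optimize_from` and `evaluate` are hypothetical stand-ins for the optimizer's training and evaluation routines, not names from the actual codebase:

import random

def perturbation_test(revlogs, optimize_from, evaluate, default_w,
                      drop_frac=0.05, seed=42):
    """Steps 1-3 above: optimize on the full collection and on a copy with
    ~5% of the reviews removed, then compare parameters and RMSE."""
    rng = random.Random(seed)
    subset = [r for r in revlogs if rng.random() >= drop_frac]  # drop ~5% at random
    w_full = optimize_from(revlogs, default_w)
    w_sub = optimize_from(subset, default_w)
    ratio = evaluate(revlogs, w_full) / evaluate(subset, w_sub)
    return w_full, w_sub, ratio  # a ratio far from 1.0 means RMSE is not robust

def multi_start_optimize(revlogs, optimize_from, evaluate, starts):
    """The proposed mitigation: run the optimizer from 3-5 different sets of
    default parameters and keep the result with the best (lowest) metric."""
    candidates = [optimize_from(revlogs, w0) for w0 in starts]
    return min(candidates, key=lambda w: evaluate(revlogs, w))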

@L-M-Sherlock added the enhancement label Dec 9, 2023
@L-M-Sherlock
Member

I investigated my collection.

The result for the original collection:

{
    // Generated, Optimized anki deck settings
    "deckName": "collection-2022-09-18@13-21-58",// PLEASE CHANGE THIS TO THE DECKS PROPER NAME
    "w": [1.1142, 1.2747, 5.8932, 10.8999, 5.3952, 1.5761, 1.226, 0.0003, 1.5798, 0.174, 1.0238, 2.7551, 0.0135, 0.3058, 0.3951, 0.0005, 2.9805],
    "requestRetention": 0.8484867970566436,
    "maximumInterval": 36500,
},

Loss before training: 0.3061
Loss after training: 0.3022
R-squared: 0.9699
RMSE: 0.0132
MAE: 0.0045
[0.04236619 0.9547611 ]

The result for the collection with the latest 5% of revlogs removed:

{
    // Generated, Optimized anki deck settings
    "deckName": "collection-2022-09-18@13-21-58",// PLEASE CHANGE THIS TO THE DECKS PROPER NAME
    "w": [1.1113, 1.2653, 5.8881, 10.7407, 5.2162, 1.6995, 1.3913, 0.0008, 1.6479, 0.2336, 1.044, 2.7165, 0.0151, 0.3141, 0.2914, 0.0, 2.9657],
    "requestRetention": 0.83465713201,
    "maximumInterval": 36500,
},

Loss before training: 0.3049
Loss after training: 0.3003
R-squared: 0.9829
RMSE: 0.0103
MAE: 0.0057
[-0.00923505  1.01181382]

@Expertium
Contributor Author

RMSE changed by about 30%, but in your case it's very low, so it's possible that even small changes affect it greatly. In my experience with testing, collections with a low RMSE are more sensitive to changes.
I would suggest the following: do the same for at least a few hundred collections, ideally a few thousand. You'll have to develop an automated tool for that. For each collection, record RMSE(all reviews)/RMSE(95%), the ratio of the two RMSEs. Then report the average ratio, weighted by n_reviews and by ln(n_reviews), as in the sketch below.
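
For concreteness, a minimal sketch of that aggregation step (the function name and the tuple layout of `results` are my assumptions):

import math

def summarize_ratios(results):
    """results: list of (rmse_all, rmse_95, n_reviews) tuples, one per collection."""
    def weighted_mean(xs, ws):
        return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

    ratios = [rmse_all / rmse_95 for rmse_all, rmse_95, _ in results]
    n = [n_reviews for _, _, n_reviews in results]
    return {
        "unweighted": sum(ratios) / len(ratios),
        "weighted_by_n": weighted_mean(ratios, n),
        "weighted_by_ln_n": weighted_mean(ratios, [math.log(x) for x in n]),
    }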

@L-M-Sherlock
Member

L-M-Sherlock commented Dec 9, 2023

I ran an initial test on the benchmark dataset. Here is the result:

Model: FSRS-rs
Total number of users: 104
Total number of reviews: 3013665
metric: LogLoss
FSRS-rs mean: 0.3968
metric: RMSE
FSRS-rs mean: 0.3345
metric: RMSE(bins)
FSRS-rs mean: 0.0926
Total number of users: 104
Total number of reviews: 2897255
metric: LogLoss
remove_5% mean: 0.3958
metric: RMSE
remove_5% mean: 0.3339
metric: RMSE(bins)
remove_5% mean: 0.0918

mean(RMSE(all reviews)/RMSE(95%)) = 1.017 (unweighted, because removing reviews changes the dataset sizes.)
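
(For reference, the RMSE(bins) metric reported above is, roughly, a binned calibration error: reviews are grouped into bins, and the RMSE is taken between the mean predicted and mean observed retention per bin, weighted by bin size. Below is a simplified sketch that bins purely by predicted retention, which is not exactly how the FSRS benchmark bins reviews:)

import math
from collections import defaultdict

def rmse_bins(predictions, labels, n_bins=20):
    """Simplified binned calibration RMSE: bucket reviews by predicted
    retention, then compare mean predicted vs. mean observed recall per
    bucket, weighting each bucket by its size."""
    buckets = defaultdict(list)
    for p, y in zip(predictions, labels):
        buckets[min(int(p * n_bins), n_bins - 1)].append((p, y))
    sq_err = 0.0
    for items in buckets.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        sq_err += len(items) * (mean_p - mean_y) ** 2
    return math.sqrt(sq_err / len(predictions))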

@Expertium
Contributor Author

That seems quite stable. I wonder why people reported sudden changes in parameters and/or RMSE.
Also, a question: if the RMSE is lower before re-optimizing the parameters and higher after, should the user keep the old parameters (lower RMSE, but based on fewer reviews) or the new parameters (higher RMSE, more reviews)?

@L-M-Sherlock
Member

If they are evaluated on the same dataset, I recommend using the old parameters. The user could still evaluate the old parameters against the new data.

@Expertium
Contributor Author

No, I meant the case where the new dataset has slightly more reviews, but a higher RMSE.

@L-M-Sherlock
Member

L-M-Sherlock commented Dec 9, 2023

It's incorrect to compare metrics based on different datasets. If the RMSE is lower before re-optimizing the parameters (i.e. on the old data) and higher after (on the new data), what about the RMSE of the old parameters on the new data? I would guess it's higher than that of the new parameters.
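
Concretely, the apples-to-apples check is to score both parameter sets on the same (new) dataset; a minimal sketch, with `evaluate` again a hypothetical stand-in for the optimizer's evaluation routine:

def pick_parameters(new_revlogs, old_w, new_w, evaluate):
    """Evaluate both parameter sets on the *same* data and keep the better
    one (lower log loss / RMSE(bins))."""
    return old_w if evaluate(new_revlogs, old_w) <= evaluate(new_revlogs, new_w) else new_w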

@Expertium
Contributor Author

Expertium commented Dec 9, 2023

I know it's incorrect to compare them like that, but users have asked about this, and we need to give them some answer.

@L-M-Sherlock
Member

L-M-Sherlock commented Dec 9, 2023

I'm afraid that it could be a bug, and I need to fix it. However, I need to reproduce it first. Based on the above discussion and experiment, I haven't reproduced it successfully, so I hope the affected user could share their collection.

@Androix777

Androix777 commented Dec 9, 2023

My RMSE changes a lot even after a few reviews.

A few thousand reviews ago I got parameters that show an RMSE of 2.8%. In all this time the quality of these parameters has only changed by a few hundredths of a percent, because the number of new reviews is very small.

I often try to optimize the parameters, but I always got an RMSE higher than with my current parameters; on average it was 4-9%. For example, I just ran an experiment, optimizing the parameters 3 times in a row with 10 reviews between optimizations. I got RMSEs of 6.82%, 7.58%, and 4.51%. I didn't apply the new parameters, only looked at their RMSE.

Deck file. GitHub doesn't allow apkg uploads, so I added it to a zip. This is the only deck in the profile that I run optimizations for. I have 128,000 reviews in total.
[Deleted]

Anki version 23.10.1

@Expertium
Contributor Author

@L-M-Sherlock I found a bug in the latest beta. "Evaluate" is bugged - it uses all reviews, not just reviews from the selected preset.

@L-M-Sherlock
Member

I found a bug in the latest beta. "Evaluate" is bugged - it uses all reviews, not just reviews from the selected preset.

Could you submit a detailed report for that? https://forums.ankiweb.net/

@Expertium
Contributor Author

I pinged Dae in the thread about the 23.12 beta.

@L-M-Sherlock
Member

My RMSE changes a lot even after a few reviews.

I also did an experiment, optimizing the parameters 4 times in a row with 10 reviews between optimizations. Here is the result:

[four screenshots of the evaluation results]

RMSE(bins): 3.97% -> 4.18% -> 3.9% -> 3.63%. I wasn't able to reach RMSE = 9%.

@Androix777

Androix777 commented Dec 10, 2023

@L-M-Sherlock

I'm not sure what the issue might be; I think I usually get different results.

Then I can provide a more concrete case. This time it's the whole collection, not just one deck, just in case.
[Deleted]

At the moment, the parameters there are:
0.1000, 0.2397, 0.8897, 10.9000, 5.3593, 0.8727, 0.5075, 0.0219, 1.1405, 0.1622, 0.5107, 2.3686, 0.0700, 0.3547, 0.6942, 0.6125, 6.1129
with which I get an RMSE of 2.8%.
[screenshot]

If I run the optimization right away on the collection that I sent, without doing any reviews, I get the parameters:
0.1000, 0.2397, 0.8900, 10.9000, 5.2926, 0.7769, 0.5278, 0.0867, 1.1901, 0.2413, 0.5531, 2.3063, 0.0350, 0.3970, 0.6637, 0.4895, 5.2721
with an RMSE of 7.24%.
[screenshot]

P.S. I've now tried doing more optimizations, and it looks like the RMSE is now a bit more stable than usual, averaging 3-6%, which is still sometimes 2x worse than with my current settings. I've never once gotten a value higher than yesterday's 7.58, although I've made many more attempts. Apparently the stability of the optimization can also vary.

After a few dozen optimization attempts, I was even able to get parameters slightly better than my current ones, with an RMSE of 2.72%.

@L-M-Sherlock
Member

I'm not sure what the issue might be; I think I usually get different results.

Could you test it in the latest version? https://github.com/ankitects/anki/releases/tag/23.12beta2

I introduced a better outlier filter in that version.

@Androix777

Androix777 commented Dec 10, 2023

@L-M-Sherlock

What difference in RMSE for the same data is considered normal?

My current best parameters now show an RMSE of 2.62%. I've tried a few dozen more optimizations, but I'm not quite sure about the results, as I don't know whether the differences are due to the new version or just because I'm doing more reviews. I got varying RMSEs, with the minimum close to my best and the maximum as high as 4.93%.
