
[Feature Request] Investigate how robust parameters and RMSE are #64

Closed

Expertium opened this issue Dec 7, 2023 · 18 comments
Labels
enhancement New feature or request

Comments

@Expertium
Contributor

A lot of users have reported that parameters and RMSE can change significantly even after doing just a few dozen more reviews. I think a good way to investigate this is to do the following:

  1. Choose a collection and derive two (or more) collections from it: for example, the original collection and a copy with 5% of the reviews removed.
  2. Run FSRS on all of those collections.
  3. See how much parameters and RMSE vary.

This should be fairly straightforward to implement. The problem is what comes next. If it's true that even a few more (or fewer) reviews can vastly change the RMSE and parameters, the question becomes "How do we make them more robust?". I don't know a good answer. One approach I have proposed is to run the optimizer from different starting points: instead of running it from just one set of default parameters, we use 3-5 different sets. This would ensure that we explore the parameter space more thoroughly.
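
For illustration, here is a minimal sketch of both the perturbation experiment and the multi-start idea. `optimize_from` and `evaluate` are hypothetical stand-ins for the optimizer's training and evaluation routines, not names from the actual codebase:

import random

def perturbation_test(revlogs, optimize_from, evaluate, default_w,
                      drop_frac=0.05, seed=42):
    """Steps 1-3 above: optimize on the full collection and on a copy with
    ~5% of the reviews removed, then compare parameters and RMSE."""
    rng = random.Random(seed)
    subset = [r for r in revlogs if rng.random() >= drop_frac]  # drop ~5% at random
    w_full = optimize_from(revlogs, default_w)
    w_sub = optimize_from(subset, default_w)
    ratio = evaluate(revlogs, w_full) / evaluate(subset, w_sub)
    return w_full, w_sub, ratio  # a ratio far from 1.0 means RMSE is not robust

def multi_start_optimize(revlogs, optimize_from, evaluate, starts):
    """The proposed mitigation: run the optimizer from 3-5 different sets of
    default parameters and keep the result with the best (lowest) metric."""
    candidates = [optimize_from(revlogs, w0) for w0 in starts]
    return min(candidates, key=lambda w: evaluate(revlogs, w))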

@L-M-Sherlock added the enhancement label Dec 9, 2023
@L-M-Sherlock
Member

I investigated my collection.

The result for the original collection:

{
    // Generated, Optimized anki deck settings
    "deckName": "collection-2022-09-18@13-21-58",// PLEASE CHANGE THIS TO THE DECKS PROPER NAME
    "w": [1.1142, 1.2747, 5.8932, 10.8999, 5.3952, 1.5761, 1.226, 0.0003, 1.5798, 0.174, 1.0238, 2.7551, 0.0135, 0.3058, 0.3951, 0.0005, 2.9805],
    "requestRetention": 0.8484867970566436,
    "maximumInterval": 36500,
},

Loss before training: 0.3061
Loss after training: 0.3022
R-squared: 0.9699
RMSE: 0.0132
MAE: 0.0045
[0.04236619 0.9547611 ]

The result for the collection with the latest 5% of revlogs removed:

{
    // Generated, Optimized anki deck settings
    "deckName": "collection-2022-09-18@13-21-58",// PLEASE CHANGE THIS TO THE DECKS PROPER NAME
    "w": [1.1113, 1.2653, 5.8881, 10.7407, 5.2162, 1.6995, 1.3913, 0.0008, 1.6479, 0.2336, 1.044, 2.7165, 0.0151, 0.3141, 0.2914, 0.0, 2.9657],
    "requestRetention": 0.83465713201,
    "maximumInterval": 36500,
},

Loss before training: 0.3049
Loss after training: 0.3003
R-squared: 0.9829
RMSE: 0.0103
MAE: 0.0057
[-0.00923505  1.01181382]

@Expertium
Contributor Author

RMSE changed by about 30%, but in your case it's very low, so it's possible that even small changes affect it greatly. In my experience with testing, collections with a low RMSE are more sensitive to changes.
I would suggest the following: do the same for at least a few hundred collections, ideally a few thousand. You'll have to develop an automated tool for that. For each collection, record RMSE(all reviews)/RMSE(95%), the ratio of the two RMSEs. Then report the average ratio, weighted by n_reviews and by ln(n_reviews), as in the sketch below.
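
For concreteness, a minimal sketch of that aggregation step (the function name and the tuple layout of `results` are my assumptions):

import math

def summarize_ratios(results):
    """results: list of (rmse_all, rmse_95, n_reviews) tuples, one per collection."""
    def weighted_mean(xs, ws):
        return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

    ratios = [rmse_all / rmse_95 for rmse_all, rmse_95, _ in results]
    n = [n_reviews for _, _, n_reviews in results]
    return {
        "unweighted": sum(ratios) / len(ratios),
        "weighted_by_n": weighted_mean(ratios, n),
        "weighted_by_ln_n": weighted_mean(ratios, [math.log(x) for x in n]),
    }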

@L-M-Sherlock
Member

L-M-Sherlock commented Dec 9, 2023

I ran an initial test on the benchmark dataset. Here is the result:

Model: FSRS-rs
Total number of users: 104
Total number of reviews: 3013665
metric: LogLoss
FSRS-rs mean: 0.3968
metric: RMSE
FSRS-rs mean: 0.3345
metric: RMSE(bins)
FSRS-rs mean: 0.0926
Total number of users: 104
Total number of reviews: 2897255
metric: LogLoss
remove_5% mean: 0.3958
metric: RMSE
remove_5% mean: 0.3339
metric: RMSE(bins)
remove_5% mean: 0.0918

mean(RMSE(all reviews)/RMSE(95%)) = 1.017 (unweighted, because removing reviews changes the dataset sizes.)
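
(For reference, the RMSE(bins) metric reported above is, roughly, a binned calibration error: reviews are grouped into bins, and the RMSE is taken between the mean predicted and mean observed retention per bin, weighted by bin size. Below is a simplified sketch that bins purely by predicted retention, which is not exactly how the FSRS benchmark bins reviews:)

import math
from collections import defaultdict

def rmse_bins(predictions, labels, n_bins=20):
    """Simplified binned calibration RMSE: bucket reviews by predicted
    retention, then compare mean predicted vs. mean observed recall per
    bucket, weighting each bucket by its size."""
    buckets = defaultdict(list)
    for p, y in zip(predictions, labels):
        buckets[min(int(p * n_bins), n_bins - 1)].append((p, y))
    sq_err = 0.0
    for items in buckets.values():
        mean_p = sum(p for p, _ in items) / len(items)
        mean_y = sum(y for _, y in items) / len(items)
        sq_err += len(items) * (mean_p - mean_y) ** 2
    return math.sqrt(sq_err / len(predictions))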

@Expertium
Contributor Author

That seems quite stable. I wonder why people reported sudden changes in parameters and/or RMSE.
Also, a question: if the RMSE is lower before re-optimizing the parameters and higher after, should the user keep the old parameters (lower RMSE, but based on fewer reviews) or the new parameters (higher RMSE, more reviews)?

@L-M-Sherlock
Member

If they are evaluated on the same dataset, I recommend using the old parameters. The user could still evaluate the old parameters against the new data.

@Expertium
Contributor Author

No, I meant the case where the new dataset has slightly more reviews, but a higher RMSE.

@L-M-Sherlock
Member

L-M-Sherlock commented Dec 9, 2023

It's incorrect to compare metrics based on different datasets. If the RMSE is lower before re-optimizing the parameters (i.e. on the old data) and higher after (on the new data), what about the RMSE of the old parameters on the new data? I would guess it's higher than that of the new parameters.
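
Concretely, the apples-to-apples check is to score both parameter sets on the same (new) dataset; a minimal sketch, with `evaluate` again a hypothetical stand-in for the optimizer's evaluation routine:

def pick_parameters(new_revlogs, old_w, new_w, evaluate):
    """Evaluate both parameter sets on the *same* data and keep the better
    one (lower log loss / RMSE(bins))."""
    return old_w if evaluate(new_revlogs, old_w) <= evaluate(new_revlogs, new_w) else new_w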

@Expertium
Contributor Author

Expertium commented Dec 9, 2023

I know it's incorrect to compare them like that, but users have asked about this, and we need to give them some answer.

@L-M-Sherlock
Member

L-M-Sherlock commented Dec 9, 2023

I'm afraid that it could be a bug, and I need to fix it. However, I need to reproduce it first. Based on the above discussion and experiment, I haven't reproduced it successfully, so I hope the affected user could share their collection.

@Androix777

Androix777 commented Dec 9, 2023

My RMSE changes a lot even after a few reviews.

A few thousand reviews ago I got parameters that show an RMSE of 2.8%. In all this time the quality of these parameters has only changed by a few hundredths of a percent, because the number of new reviews is very small.

I often try to optimize the parameters, but I always got an RMSE higher than with my current parameters; on average it was 4-9%. For example, I just ran an experiment, optimizing the parameters 3 times in a row with 10 reviews between optimizations. I got RMSEs of 6.82%, 7.58%, and 4.51%. I didn't apply the new parameters, only looked at their RMSE.

Deck file. GitHub doesn't allow apkg uploads, so I added it to a zip. This is the only deck in the profile that I run optimizations for. I have 128,000 reviews in total.
[Deleted]

Anki version 23.10.1

@Expertium
Contributor Author

@L-M-Sherlock I found a bug in the latest beta. "Evaluate" is bugged - it uses all reviews, not just reviews from the selected preset.

@L-M-Sherlock
Member

I found a bug in the latest beta. "Evaluate" is bugged - it uses all reviews, not just reviews from the selected preset.

Could you submit a detailed report for that? https://forums.ankiweb.net/

@Expertium
Contributor Author

I pinged Dae in the thread about the 23.12 beta.

@L-M-Sherlock
Member

My RMSE changes a lot even after a few reviews.

I also did an experiment, optimizing the parameters 4 times in a row with 10 reviews between optimizations. Here is the result:

[four screenshots of the evaluation results]

RMSE(bins): 3.97% -> 4.18% -> 3.9% -> 3.63%. I wasn't able to reach RMSE = 9%.

@Androix777

Androix777 commented Dec 10, 2023

@L-M-Sherlock

I'm not sure what the issue might be; I think I usually get different results.

Then I can provide a more concrete case. This time it's the whole collection, not just one deck, just in case.
[Deleted]

At the moment, the parameters there are:
0.1000, 0.2397, 0.8897, 10.9000, 5.3593, 0.8727, 0.5075, 0.0219, 1.1405, 0.1622, 0.5107, 2.3686, 0.0700, 0.3547, 0.6942, 0.6125, 6.1129
with which I get an RMSE of 2.8%.
[screenshot]

If I run the optimization right away on the collection that I sent, without doing any reviews, I get the parameters:
0.1000, 0.2397, 0.8900, 10.9000, 5.2926, 0.7769, 0.5278, 0.0867, 1.1901, 0.2413, 0.5531, 2.3063, 0.0350, 0.3970, 0.6637, 0.4895, 5.2721
with an RMSE of 7.24%.
[screenshot]

P.S. I've now tried doing more optimizations, and it looks like the RMSE is now a bit more stable than usual, averaging 3-6%, which is still sometimes 2x worse than with my current settings. I've never once gotten a value higher than yesterday's 7.58, although I've made many more attempts. Apparently the stability of the optimization can also vary.

After a few dozen optimization attempts, I was even able to get parameters slightly better than my current ones, with an RMSE of 2.72%.

@L-M-Sherlock
Member

I'm not sure what the issue might be; I think I usually get different results.

Could you test it in the latest version? https://github.com/ankitects/anki/releases/tag/23.12beta2

I introduced a better outlier filter in that version.

@Androix777

Androix777 commented Dec 10, 2023

@L-M-Sherlock

What difference in RMSE for the same data is considered normal?

My current best parameters now show an RMSE of 2.62%. I've tried a few dozen more optimizations, but I'm not quite sure about the results, as I don't know whether the differences are due to the new version or just because I'm doing more reviews. I got varying RMSEs, with the minimum close to my best and the maximum as high as 4.93%.
