Use the median instead of the mean for recall costs and learn cost #107
Comments
Additionally, we should remove all times that are exactly equal to 0 before calculating the median. The reason is the same as that given in I am requesting this again because I think that skipping these entries needs to be handled differently for the median calculation than for the mean calculation.
If I understand this function correctly, the recorded answer times would never be greater than this setting. So, do we need to consider it?
I mean, if the values reach this limit, should we include them?
When calculating the median, the exact magnitudes of the lowest and highest entries don't matter. So, I don't think that we need to remove the entries equal to the maximum limit. Rather, removing those entries would cause the median to become unexpectedly small. You might think that this contradicts Expertium's suggestion of excluding answer times > 20 min. But, in that case, we are dealing with a specific situation where the user has set the maximum answer time to an unreasonably high value.
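To illustrate the point about capped entries, here is a minimal sketch with made-up answer times. The cap of 60 seconds and the variable names are assumptions for the example, not values taken from Anki or fsrs-optimizer.

```python
from statistics import median

# Hypothetical cap standing in for the "Maximum answer seconds" setting:
# any slower review is recorded as exactly this value.
CAP_SECONDS = 60

# Made-up answer times in seconds; the last two reviews hit the cap.
times = [4, 6, 7, 9, 12, CAP_SECONDS, CAP_SECONDS]

# Keeping the capped entries: only their rank matters, not their size.
median_with_capped = median(times)

# Dropping the capped entries shifts the middle toward the fast reviews.
median_without_capped = median([t for t in times if t < CAP_SECONDS])

# A much larger cap gives the same median, because the capped entries
# still sit at the top of the sorted list.
median_with_larger_cap = median([4, 6, 7, 9, 12, 600, 600])

print(median_with_capped, median_without_capped, median_with_larger_cap)
```

In this toy data the median is the same whether the slow reviews are recorded as 60 or 600 seconds, while removing them lowers it.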
I agree that all times equal to zero should be removed. As for "Maximum answer seconds", I think it's fine to keep capped values, but only if they don't exceed 20 minutes.
@L-M-Sherlock just a reminder
I checked the code that needs to be updated today and found that it's a little harder than I initially expected. The hardest part is fsrs-optimizer/src/fsrs_optimizer/fsrs_optimizer.py, lines 1210 to 1215 in 17d6aef.
I will re-design this part when I have more available time.
I suggest replacing the mean with the median here, as the median is not sensitive to outliers. Relevant problem: https://forums.ankiweb.net/t/clarify-what-optimal-retention-means/42803/50?u=expertium
This would help in cases where, for example, the user went away to make dinner and, as a result, the review time ended up being orders of magnitude greater than usual, skewing the mean. And don't forget to modify how the learn cost is calculated as well; the median should be used for all time costs.
Additionally, to make the estimate even more robust (granted, the median is already robust), we can remove all times >20 minutes before calculating the median, since obviously nobody spends that much time per card.
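A rough sketch of the proposed calculation, combining the 20-minute cutoff with the removal of zero times discussed in the comments. The function name `robust_time_cost` and the sample data are hypothetical; this is not the actual fsrs-optimizer code, only one way the idea could look.

```python
from statistics import mean, median

# 20-minute cutoff proposed in this issue, expressed in seconds.
MAX_REASONABLE_SECONDS = 20 * 60


def robust_time_cost(times_seconds):
    """Median answer time after dropping zeros and times over 20 minutes.

    Hypothetical helper for illustration. Negative values are dropped too,
    which is an extra assumption. Returns None if no entries remain.
    """
    kept = [t for t in times_seconds if 0 < t <= MAX_REASONABLE_SECONDS]
    return median(kept) if kept else None


# Made-up review times in seconds: one zero entry and one hour-long outlier
# (the "went away to make dinner" case).
times = [0, 5, 6, 8, 10, 3600]

print("mean:", mean(times))                # dominated by the 3600 s outlier
print("robust median:", robust_time_cost(times))
```

With this toy data the mean is pulled above 600 seconds by a single outlier, while the filtered median stays close to the typical review time. The same helper could be applied separately to recall times and learn times.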