
Add percentiles? #156

Open
1DWalker opened this issue Jan 18, 2025 · 6 comments

@1DWalker
Contributor

Well done on the regularization! FSRS-5-recency is now stronger than GRU-P on more collections.
Interestingly, GRU-P still does better when we take an average. Maybe GRU-P mostly wins on users who are not representative of the typical Anki user, but wins by a large margin on them, which pulls its average up.

To check, I added the 0th, 25th, 50th, 75th, and 100th percentiles to evaluate.py.
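The computation is just a quantile call over the per-user metric values. A minimal sketch, assuming the per-user losses are already collected in an array (names and data here are made up, not the actual evaluate.py variables):

```python
import numpy as np

# Hypothetical per-user metric values; in evaluate.py these would be the
# per-collection LogLoss / RMSE(bins) / AUC numbers.
rng = np.random.default_rng(0)
losses = rng.gamma(shape=2.0, scale=0.16, size=9999)

# 0th, 25th, 50th, 75th, 100th percentiles (min, quartiles, max).
quantiles = np.quantile(losses, [0.0, 0.25, 0.5, 0.75, 1.0])
print("LogLoss quantiles:", np.round(quantiles, 4).tolist())
```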

Model: GRU-P
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
GRU-P LogLoss (mean±std): 0.3251±0.1508
GRU-P LogLoss quantiles: [0.0001, 0.2202, 0.331, 0.4321, 1.2325]
GRU-P RMSE(bins) (mean±std): 0.0433±0.0288
GRU-P RMSE(bins) quantiles: [0.0001, 0.0252, 0.0366, 0.0533, 0.6223]
GRU-P AUC (mean±std): 0.6991±0.0812
GRU-P AUC quantiles: [0.0044, 0.6615, 0.7002, 0.7398, 0.9938]

Model: FSRS-5-recency
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-recency LogLoss (mean±std): 0.3256±0.1519
FSRS-5-recency LogLoss quantiles: [0.0007, 0.2192, 0.3305, 0.4325, 1.2199]
FSRS-5-recency RMSE(bins) (mean±std): 0.0493±0.0321
FSRS-5-recency RMSE(bins) quantiles: [0.0012, 0.0289, 0.0421, 0.0606, 0.4159]
FSRS-5-recency AUC (mean±std): 0.7056±0.0755
FSRS-5-recency AUC quantiles: [0.0021, 0.6664, 0.705, 0.7488, 0.9979]

Here's a version that uses 11 equally spaced percentiles instead, weighted by reviews:
GRU-P:
[0.0001, 0.1156, 0.1945, 0.247, 0.2881, 0.331, 0.3673, 0.4099, 0.4591, 0.5227, 1.2325]
FSRS-5-recency:
[0.0007, 0.1151, 0.1932, 0.247, 0.2892, 0.3305, 0.3677, 0.4108, 0.4569, 0.5268, 1.2199]
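That's the same computation as the sketch above, just with a finer, equally spaced grid (again with made-up data):

```python
import numpy as np

losses = np.random.default_rng(0).gamma(2.0, 0.16, size=9999)  # made-up data
# 0th, 10th, ..., 100th percentiles.
print(np.round(np.quantile(losses, np.linspace(0.0, 1.0, 11)), 4).tolist())
```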

Should we add some of this information to the table? At least the median?
And if we want this included in evaluate.py, let me know whether the output should be formatted differently.

@L-M-Sherlock
Member

L-M-Sherlock commented Jan 18, 2025

What about plotting the distribution of the metrics? Something like a box plot or a violin plot.
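For example, something along these lines with matplotlib (a sketch with made-up data, not the benchmark's actual plotting code):

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up per-user LogLoss values for the two models.
rng = np.random.default_rng(0)
gru_p = rng.gamma(2.0, 0.16, size=9999)
fsrs = rng.gamma(2.0, 0.163, size=9999)

fig, ax = plt.subplots()
ax.violinplot([gru_p, fsrs], showmedians=True)
ax.set_xticks([1, 2], labels=["GRU-P", "FSRS-5-recency"])
ax.set_ylabel("LogLoss")
plt.show()
```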

@1DWalker
Contributor Author

Sure, I'll work on that.

@Expertium
Contributor

Yeah, the table has too many numbers already. Distribution plots would be nicer.

@1DWalker
Contributor Author

GRU-P-short (blue) vs FSRS-5-recency on:

Log loss:
[violin plot image]

RMSE (bins):
[violin plot image]

I might look into plotting the per-user difference between the two models so that percentile trends are more apparent.
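Roughly this idea, assuming the metrics are paired by user (sketch with made-up data):

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up paired per-user LogLoss values.
rng = np.random.default_rng(0)
gru_p = rng.gamma(2.0, 0.16, size=9999)
fsrs = gru_p + rng.normal(0.0005, 0.02, size=9999)

# Sort the per-user differences and plot them against percentile rank,
# so it is easy to see where in the distribution each model wins.
diff = np.sort(fsrs - gru_p)
pct = np.linspace(0, 100, diff.size)
plt.plot(pct, diff)
plt.axhline(0, color="gray", linewidth=0.8)
plt.xlabel("percentile of users")
plt.ylabel("FSRS-5-recency LogLoss - GRU-P LogLoss")
plt.show()
```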

@Expertium
Contributor

Expertium commented Jan 18, 2025

Hell nah, not the violin plots. Just make histograms. Also, it's not clear which color represents which algorithm; you said it in the comment, but it should be labeled on the image itself.
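Something like overlaid, labeled histograms (sketch with made-up data):

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up per-user LogLoss values for the two models.
rng = np.random.default_rng(0)
gru_p = rng.gamma(2.0, 0.16, size=9999)
fsrs = rng.gamma(2.0, 0.163, size=9999)

bins = np.linspace(0.0, 1.0, 60)
plt.hist(gru_p, bins=bins, alpha=0.5, label="GRU-P")
plt.hist(fsrs, bins=bins, alpha=0.5, label="FSRS-5-recency")
plt.xlabel("LogLoss")
plt.ylabel("number of users")
plt.legend()  # label the algorithms on the image itself
plt.show()
```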

@1DWalker
Contributor Author

Yeah, don't worry about specifics like labels until a PR comes in.
