Add percentiles? #156
What about plotting the distribution of the metrics? Something like a box plot or a violin plot.
Sure, I'll work on that.
Yeah, the table has too many numbers already. Distribution plots would be nicer.
Hell nah, not the
Yeah, don't worry about specifics like labels until a PR comes in.
Well done on the regularization! FSRS-5-recency now outperforms GRU-P on more collections than the reverse.
Interestingly, GRU-P still does better when we take an average. Maybe GRU-P wins mostly on users who are not representative of the typical Anki user, and on those users it somehow does much better.
I added the 0th, 25th, 50th, 75th, and 100th percentiles to evaluate.py to check.
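A minimal sketch of review-weighted percentiles, for anyone who wants to reproduce numbers like the ones below. It assumes each user contributes one metric value, weighted by that user's review count, and uses the inverse-CDF definition without interpolation; evaluate.py may use a different convention. The names `weighted_quantiles`, `values`, and `weights` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def weighted_quantiles(values, weights, qs):
    """Weighted quantiles via the inverse CDF, without interpolation.

    Hypothetical helper, not the actual evaluate.py code.
    values:  one metric value per user (e.g. LogLoss)
    weights: one weight per user (e.g. that user's number of reviews)
    qs:      quantile levels in [0, 1], e.g. [0, 0.25, 0.5, 0.75, 1]
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and the same length")
    # Sort users by metric value, keeping each user's weight attached.
    pairs = sorted(zip(values, weights))
    sorted_values = [v for v, _ in pairs]
    # Running total of weight: cum[i] is the total weight of the i+1 smallest values.
    cum = list(accumulate(w for _, w in pairs))
    total = cum[-1]
    result = []
    for q in qs:
        # First position whose cumulative weight reaches q * total.
        idx = bisect_left(cum, q * total)
        result.append(sorted_values[min(idx, len(sorted_values) - 1)])
    return result


# Toy data: the user with 8 reviews pulls the middle quantiles toward 0.3.
print(weighted_quantiles([0.5, 0.1, 0.3], [1, 1, 8], [0, 0.25, 0.5, 0.75, 1]))
# prints [0.1, 0.3, 0.3, 0.3, 0.5]
```

With equal weights this reduces to the lower empirical quantile. NumPy's `np.quantile` interpolates linearly by default, so its results can differ slightly from this sketch.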
Model: GRU-P
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
GRU-P LogLoss (mean±std): 0.3251±0.1508
GRU-P LogLoss quantiles: [0.0001, 0.2202, 0.331, 0.4321, 1.2325]
GRU-P RMSE(bins) (mean±std): 0.0433±0.0288
GRU-P RMSE(bins) quantiles: [0.0001, 0.0252, 0.0366, 0.0533, 0.6223]
GRU-P AUC (mean±std): 0.6991±0.0812
GRU-P AUC quantiles: [0.0044, 0.6615, 0.7002, 0.7398, 0.9938]
Model: FSRS-5-recency
Total number of users: 9999
Total number of reviews: 349923850
Weighted average by reviews:
FSRS-5-recency LogLoss (mean±std): 0.3256±0.1519
FSRS-5-recency LogLoss quantiles: [0.0007, 0.2192, 0.3305, 0.4325, 1.2199]
FSRS-5-recency RMSE(bins) (mean±std): 0.0493±0.0321
FSRS-5-recency RMSE(bins) quantiles: [0.0012, 0.0289, 0.0421, 0.0606, 0.4159]
FSRS-5-recency AUC (mean±std): 0.7056±0.0755
FSRS-5-recency AUC quantiles: [0.0021, 0.6664, 0.705, 0.7488, 0.9979]
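The "mean±std" lines above are labelled as weighted by reviews. A sketch of one way to compute them, assuming both the mean and the standard deviation are weighted by each user's review count (the log does not say how the std is weighted); `weighted_mean_std` is a hypothetical name.

```python
from math import sqrt


def weighted_mean_std(values, weights):
    """Review-weighted mean and (population) standard deviation.

    Hypothetical helper; assumes the std uses the same weights as the mean.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(v * w for v, w in zip(values, weights)) / total
    # Weighted average of squared deviations from the weighted mean.
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, sqrt(var)


mean, std = weighted_mean_std([1.0, 3.0], [3, 1])
print(f"{mean:.4f}±{std:.4f}")  # prints 1.5000±0.8660
```

The f-string prints in the same mean±std shape as the log lines, rounded to four decimals.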
Here's a version using 11 equally spaced percentiles of LogLoss instead (0th, 10th, 20th, and so on up to 100th), weighted by reviews:
GRU-P:
[0.0001, 0.1156, 0.1945, 0.247, 0.2881, 0.331, 0.3673, 0.4099, 0.4591, 0.5227, 1.2325]
FSRS-5-recency:
[0.0007, 0.1151, 0.1932, 0.247, 0.2892, 0.3305, 0.3677, 0.4108, 0.4569, 0.5268, 1.2199]
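To see where the two models differ, the two lists can be compared percentile by percentile. A small sketch using the LogLoss numbers above (lower is better), where a negative gap means GRU-P is lower at that percentile; `percentile_gaps` is a hypothetical name.

```python
# Review-weighted LogLoss at the 0th, 10th, ..., 100th percentiles, copied from the thread.
levels = [10 * i for i in range(11)]
gru_p = [0.0001, 0.1156, 0.1945, 0.247, 0.2881, 0.331, 0.3673, 0.4099, 0.4591, 0.5227, 1.2325]
fsrs_5_recency = [0.0007, 0.1151, 0.1932, 0.247, 0.2892, 0.3305, 0.3677, 0.4108, 0.4569, 0.5268, 1.2199]


def percentile_gaps(a, b, ndigits=4):
    """Element-wise a - b, rounded to the precision of the printed numbers."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    return [round(x - y, ndigits) for x, y in zip(a, b)]


for level, gap in zip(levels, percentile_gaps(gru_p, fsrs_5_recency)):
    print(f"{level:>3}th percentile: GRU-P - FSRS-5-recency = {gap:+.4f}")
```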
Should we add some of this information to the table? At least the median?
And if we want this included in evaluate.py, let me know whether the output should be formatted differently.