This is more of a design consideration for future development. Our current pair generator tries to minimize the difference between scores when generating pairs. This actually enhances ranking reliability when we have expert judges, as they can finely tell the difference between answers. But with untrained judges, it hurts ranking reliability, as it's harder for them to tell the answers apart.
For untrained judges, having a gap between the scores of two different answers makes it easier for them to tell the answers apart and also give 'incorrectly' judged answers more of a chance to climb back up. We should consider implementing this gap for our pair generator.
There are two additional factors to consider due to the nature of ComPAIR as a learning tool rather than an assessment tool:
There might be more pedagogical benefit in having students try to distinguish between two answers of very similar quality.
Even with this score gap, it's recommended that we have around 12-15 rounds of comparisons for a reliable ranking. This is far more comparisons than ComPAIR's default of 3 rounds.
So perhaps the size of the gap could be made configurable.
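To make the proposal concrete, here is a minimal sketch of what a gap-aware pair selector could look like. This is not ComPAIR's actual pair-generation code; the function name, the score representation (a dict of answer id to score), and the `min_gap` parameter are all assumptions for illustration. With `min_gap=0` it reduces to the current minimize-the-difference behaviour; a positive, configurable `min_gap` enforces the proposed spacing for untrained judges.

```python
import itertools


def generate_pair(scores, min_gap=0.0):
    """Pick the pair of answers whose score difference is smallest
    while still being at least `min_gap` apart.

    scores: dict mapping answer id -> current comparative score.
    Returns a pair of answer ids, or None if no pair satisfies
    the gap (callers would need a fallback strategy for that case).
    """
    candidates = [
        (abs(scores[a] - scores[b]), a, b)
        for a, b in itertools.combinations(scores, 2)
        if abs(scores[a] - scores[b]) >= min_gap
    ]
    if not candidates:
        return None  # no pair is spaced widely enough
    # Among the sufficiently spaced pairs, still prefer the closest one.
    _, a, b = min(candidates)
    return a, b
```

For example, with scores `{'x': 0.1, 'y': 0.15, 'z': 0.6}` and `min_gap=0.3`, the close pair `('x', 'y')` is skipped and a more clearly separated pair is chosen instead, whereas the default `min_gap=0.0` would select `('x', 'y')`.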
Thanks to Peter Thwaites (UCLouvain) for bringing this up and providing the papers below:
Paper 1 provides recommendations for the score gap size. Papers 2 & 3 detail the issue with 'highly adaptive' pair generators like ComPAIR's.