Generating pairs that are too close in score harms accurate final rankings #1040

Open
ionparticle opened this issue Oct 4, 2022 · 0 comments

ionparticle commented Oct 4, 2022

This is more of a design consideration for future development. Our current pair generator tries to minimize the difference between scores when generating pairs. With expert judges this actually improves ranking reliability, because they can discriminate finely between answers. With untrained judges it hurts ranking reliability, because closely scored answers are harder for them to tell apart.

For untrained judges, having a gap between the scores of the two answers in a pair makes it easier to tell the answers apart and also gives 'incorrectly' judged answers more of a chance to climb back up. We should consider implementing this gap in our pair generator.

There are two additional factors to consider, given that ComPAIR is a learning tool rather than an assessment tool:

  1. There might be more pedagogical benefit in having students try to distinguish between two answers of very similar quality.
  2. Even with this score gap, it's recommended that we have around 12-15 rounds of comparisons for a reliable ranking. This is far more than ComPAIR's default of 3 rounds.

So perhaps the size of the gap could be made configurable.
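
To make the idea concrete, here is a minimal sketch of a pair picker with a configurable gap. It is not ComPAIR's actual pair generator: `pick_pair`, `min_gap`, and `seen` are hypothetical names, it assumes each answer has a single numeric score, and the gap of 0.05 used in the example is arbitrary (Paper 1 below is the place to look for recommended sizes). With `min_gap=0` it behaves like the current "closest scores" approach.

```python
from itertools import combinations


def pick_pair(scores, min_gap=0.0, seen=frozenset()):
    """Pick the next pair of answers to compare.

    scores  -- dict mapping answer id -> current score (assumed numeric)
    min_gap -- hypothetical setting: smallest allowed score difference
    seen    -- set of frozenset({a, b}) pairs that were already compared

    Returns the unseen pair whose score difference is smallest while still
    being >= min_gap. If no unseen pair reaches the gap, falls back to the
    unseen pair with the largest difference. Returns None if no unseen
    pair exists.
    """
    best, best_diff = None, None
    fallback, fallback_diff = None, None

    for a, b in combinations(sorted(scores), 2):
        if frozenset((a, b)) in seen:
            continue
        diff = abs(scores[a] - scores[b])
        if diff >= min_gap:
            # Candidate that respects the gap: prefer the closest one.
            if best_diff is None or diff < best_diff:
                best, best_diff = (a, b), diff
        elif fallback_diff is None or diff > fallback_diff:
            # Too close: remember the widest of these in case nothing qualifies.
            fallback, fallback_diff = (a, b), diff

    return best if best is not None else fallback


scores = {"a": 0.50, "b": 0.52, "c": 0.60, "d": 0.90}
closest = pick_pair(scores)                # no gap: the two closest answers
gapped = pick_pair(scores, min_gap=0.05)   # skips pairs closer than 0.05
```

Falling back to the widest available pair when nothing meets the gap is one possible choice; an instructor-facing setting could just as well relax the gap step by step instead.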

Thanks to Peter Thwaites (UCLouvain) for bringing this up and providing the papers below:

  1. Rangel-Smith and Lynch - 2018 - Addressing the issue of bias in the measurement of.pdf
  2. Bramley - 2015 - Investigating the reliability of adaptive comparat.pdf
  3. Bramley and Vitello - 2019 - The effect of adaptivity on the reliability coeffi.pdf

Paper 1 provides recommendations for the score gap size. Papers 2 and 3 detail the issue with 'highly adaptive' pair generators like ComPAIR's.

ionparticle added this to the Future Versions milestone Oct 4, 2022