
Analyze Results

DharmendraVaghela edited this page Jul 6, 2016 · 5 revisions

The correctness of judgments along with the question frequencies can then be used to plot precision and ROC curves.
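A minimal Python sketch of how judged correctness and question frequency could combine into curve points. This is an illustration only and not themis's own implementation. It assumes each judged answer is reduced to a hypothetical `Judged` record holding the system's confidence, a correct/incorrect judgment, and a positive question frequency. Answers are swept from highest to lowest confidence, and every count is weighted by frequency, so frequently asked questions count more. Each row is treated as its own threshold, so tied confidences are not merged.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Judged:
    """Hypothetical record: one system answer after judging."""
    confidence: float  # system's confidence in its answer
    correct: bool      # annotator judgment of the answer
    frequency: int     # how often the question was asked (assumed > 0)


def curves(rows: List[Judged]) -> Tuple[List[Point], List[Point]]:
    """Return (precision_curve, roc_curve), both frequency-weighted.

    precision_curve points are (fraction of questions attempted, precision).
    roc_curve points are (false positive rate, true positive rate).
    """
    ordered = sorted(rows, key=lambda r: r.confidence, reverse=True)
    total = sum(r.frequency for r in rows)
    total_pos = sum(r.frequency for r in rows if r.correct)
    total_neg = total - total_pos

    tp = fp = 0  # weighted correct / incorrect answers above the threshold
    precision_curve: List[Point] = []
    roc_curve: List[Point] = [(0.0, 0.0)]
    for r in ordered:  # lower the confidence threshold one answer at a time
        if r.correct:
            tp += r.frequency
        else:
            fp += r.frequency
        attempted = tp + fp
        precision_curve.append((attempted / total, tp / attempted))
        roc_curve.append((fp / total_neg if total_neg else 0.0,
                          tp / total_pos if total_pos else 0.0))
    return precision_curve, roc_curve
```

For example, `curves([Judged(0.9, True, 3), Judged(0.6, False, 1), Judged(0.4, True, 1)])` yields the precision points `(0.6, 1.0)`, `(0.8, 0.75)`, `(1.0, 0.8)`. These can be handed to any plotting library.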

Generate a collated file from all systems:

themis analyze collate <qa-pairs.csv> <answers.system1.csv> <answers.system2.csv> <answers.system3.csv> --judgments <judgments.csv> > <collate_agree.csv> 

Here, qa-pairs.csv is the question frequency file generated by the 'question extract' command. answers.system1.csv, answers.system2.csv and answers.system3.csv are the answer files generated by querying the respective systems (one or more answer files, from one or more systems, may be given). The optional --judgments argument, judgments.csv, holds the Q&A pair judgments generated by the 'judge interpret' command. The command writes its output to standard output, which is redirected here to collate_agree.csv.

This command collates system answer confidences and annotator judgments by question-answer pair. If annotation is incomplete, whether for the whole question list or for a subset of it, judgments.csv may not be a subset of qa-pairs.csv. If multiple systems are being judged, the judgments may also contain Q/A pairs that do not appear in the system answers.
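The collation step described above is essentially a join keyed on (question, answer). The following is a minimal in-memory Python sketch under stated assumptions, not the actual themis code. CSV parsing is omitted. The function `collate` and its argument shapes are hypothetical. Three policies are also assumptions: unjudged pairs get `correct=None`, judged pairs that no system produced are dropped, and questions missing from the frequency table get frequency 0.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (question, answer)


def collate(
    frequencies: Dict[str, int],              # question -> frequency (qa-pairs.csv)
    systems: Dict[str, Dict[Pair, float]],    # system name -> pair -> confidence
    judgments: Optional[Dict[Pair, bool]] = None,  # pair -> correct (judgments.csv)
) -> List[dict]:
    """Hypothetical sketch: one row per distinct (question, answer) pair
    produced by any system, with each system's confidence, the question
    frequency, and the judgment if one exists."""
    judgments = judgments or {}
    # Only pairs some system actually answered are kept; extra judged pairs are ignored.
    pairs = sorted({pair for answers in systems.values() for pair in answers})
    rows = []
    for question, answer in pairs:
        row = {
            "question": question,
            "answer": answer,
            "frequency": frequencies.get(question, 0),   # assumed default
            "correct": judgments.get((question, answer)),  # None if unjudged
        }
        for name, answers in systems.items():
            # None when this system did not return this answer.
            row[name] = answers.get((question, answer))
        rows.append(row)
    return rows
```

Rows in this shape carry the confidence, correctness and frequency needed for precision and ROC curves. Rows whose `correct` is `None` would have to be judged or excluded first.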
