Analyze Results

The correctness of judgments along with the question frequencies can then be used to plot precision and ROC curves.
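The sketch below illustrates one common way judgments and frequencies combine into these curves, assuming each collated Q/A pair reduces to a confidence score, a correct/incorrect judgment, and the frequency of its question. The `weighted_curves` function and the tuple layout are hypothetical illustrations, not the Themis implementation. Each pair is weighted by its question frequency, and pairs with tied confidences are merged into a single curve point.

```python
from typing import List, Tuple

# Hypothetical in-memory row: (confidence, judged_correct, question_frequency).
# This is an illustrative layout, not the Themis file format.
Row = Tuple[float, bool, int]


def weighted_curves(rows: List[Row]) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate, precision)
    points, weighting each Q/A pair by how often its question was asked."""
    total_pos = sum(f for _, ok, f in rows if ok)
    total_neg = sum(f for _, ok, f in rows if not ok)
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)

    points = []
    tp = fp = 0
    i = 0
    while i < len(ordered):
        threshold = ordered[i][0]
        # Consume every row tied at this confidence so ties form one point.
        while i < len(ordered) and ordered[i][0] == threshold:
            _, ok, freq = ordered[i]
            if ok:
                tp += freq
            else:
                fp += freq
            i += 1
        tpr = tp / total_pos if total_pos else 0.0
        fpr = fp / total_neg if total_neg else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        points.append((threshold, fpr, tpr, precision))
    return points
```

Plotting `fpr` against `tpr` gives an ROC curve; plotting `precision` against the threshold (or against the weighted fraction of questions answered) gives a precision curve.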

Generate a collated file from all systems:

themis analyze collate <qa-pairs.csv> <answers.system1.csv> <answers.system2.csv> <answers.system3.csv> --judgments <judgments.csv> > <collate_agree.csv>

Here, qa-pairs.csv is the question frequency file generated by the 'question extract' command. answers.system1.csv, answers.system2.csv, and answers.system3.csv are answer files generated by querying the respective systems (any number of answer files from multiple systems may be given). The optional --judgments argument, judgments.csv, contains the Q&A pair judgments generated by the 'judge interpret' command. The output is written to standard output, which the command above redirects to collate_agree.csv.

This command collates system answer confidences and annotator judgments by question-answer pair. If annotation has not been completed for all or part of the question list, judgments.csv may not be a subset of qa-pairs.csv. Likewise, if multiple systems are being judged, the judgments may contain Q/A pairs that do not appear in the system answers.
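A minimal sketch of collating by question-answer pair, assuming rows have already been parsed into dictionaries. The `collate` and `write_csv` functions and the column names ("Question", "Answer", "Frequency", "Confidence", "Correct") are hypothetical stand-ins, not the actual Themis code or CSV headers. It shows how the mismatches described above can be handled: unjudged pairs and questions missing from the frequency file get empty values, and judgments with no matching system answer drop out.

```python
import csv
from typing import Dict, Iterable, List, Optional, TextIO, Tuple

Key = Tuple[str, str]  # (question, answer)
FIELDS = ["System", "Question", "Answer", "Confidence", "Frequency", "Correct"]


def collate(
    qa_pairs: Iterable[Dict[str, str]],
    system_answers: Dict[str, Iterable[Dict[str, str]]],
    judgments: Optional[Iterable[Dict[str, str]]] = None,
) -> List[Dict[str, object]]:
    """Join each system's answers with question frequencies and optional
    judgments. Column names are hypothetical, not the Themis headers."""
    frequency = {row["Question"]: int(row["Frequency"]) for row in qa_pairs}
    verdicts: Dict[Key, bool] = {}
    for row in judgments or []:
        verdicts[(row["Question"], row["Answer"])] = row["Correct"].lower() == "true"

    collated = []
    for system, rows in system_answers.items():
        for row in rows:
            key = (row["Question"], row["Answer"])
            collated.append({
                "System": system,
                "Question": key[0],
                "Answer": key[1],
                "Confidence": float(row["Confidence"]),
                # Questions missing from the frequency file get None.
                "Frequency": frequency.get(key[0]),
                # Unjudged pairs get None; judgments with no matching system
                # answer are never looked up and so drop out of the result.
                "Correct": verdicts.get(key),
            })
    return collated


def write_csv(rows: List[Dict[str, object]], out: TextIO) -> None:
    """Write collated rows as CSV; DictWriter renders None as an empty cell."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Calling `write_csv(collate(...), sys.stdout)` mirrors the command's behavior of writing to standard output for redirection.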
