Orchestrations
To quantify the theoretical upper limit of a system composed of several question-answering systems, an oracle experiment can be performed. Its purpose is to provide an upper bound on how well a combination of two systems could be made to perform. In this experiment, the answers from both systems are taken and the best one is selected based on correctness and confidence score. If both systems answer incorrectly, the combined system's answer is counted as wrong.
themis analyze collate <qa-pairs.csv> <answers.system1.csv> <answers.system2.csv> <answers.system3.csv> --judgments <judgments.csv> > <collate_agree.csv>
Here qa-pairs.csv is the question frequency file generated by the 'question extract' command. answers.system1.csv, answers.system2.csv, and answers.system3.csv are answer files generated by querying the respective systems (any number of files from any number of systems may be given). The optional argument judgments.csv holds the Q&A pair judgments generated by the 'judge interpret' command. The output is 'collate_agree.csv'.
This command collates system answer confidences and annotator judgments by question-answer pair. If annotation has not been completed for all or part of the question list, the pairs in judgments.csv may not match those in qa-pairs.csv exactly. If multiple systems are being judged, there may be Q/A pairs in the judgments that don't appear in the system answers.
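The collation step can be pictured as a join keyed on the question-answer pair. The sketch below is only an illustration and not the themis implementation: the field names (`question`, `answer`, `confidence`, `correct`) and the `collate` helper are hypothetical. It shows how a pair without a judgment can stay in the output unjudged, while a judged pair that no system returned is simply not used.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shapes: one dict per system answer, and judgments keyed by
# (question, answer). None of these names come from themis itself.
AnswerRow = Dict[str, object]
Judgments = Dict[Tuple[str, str], bool]


def collate(answers: List[AnswerRow], judgments: Judgments) -> List[AnswerRow]:
    """Attach a judgment to each answer row, matching on (question, answer)."""
    collated = []
    for row in answers:
        key = (str(row["question"]), str(row["answer"]))
        # .get() returns None when the pair has not been annotated yet.
        judgment: Optional[bool] = judgments.get(key)
        collated.append({**row, "correct": judgment})
    return collated


answers = [
    {"question": "q1", "answer": "a1", "confidence": 0.8},
    {"question": "q2", "answer": "a2", "confidence": 0.4},
]
# ("q3", "a3") was judged but never returned by this system, so it is ignored.
judgments = {("q1", "a1"): True, ("q3", "a3"): False}
result = collate(answers, judgments)
```

Here the first row picks up its judgment and the second row is kept with `correct` set to `None`.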
themis analyze oracle <collate_agree.csv> <system1> <system2> > <collate_agree_oracle.csv>
Here 'collate_agree.csv' is the combined file with every system's answers for each question. It contains the answer from each system along with its confidence, judgment, purview, frequency, and so on, for all questions in the test set. system1 and system2 are the names of the systems to be combined into a single collated file.
This command writes 'collate_agree_oracle.csv', which contains a single answer for each question in the test set, together with the best answering system for that question.
Note: this command can also combine more than two systems.
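As a rough illustration of the oracle selection described above, the following sketch picks one answer per question. It assumes one reading of "based on correctness and confidence": prefer a correct answer, and use confidence to break ties. The `Answer` class and `oracle_pick` function are hypothetical names, not part of themis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Answer:
    system: str        # name of the answering system
    text: str          # the answer it returned
    confidence: float  # the system's confidence in that answer
    correct: bool      # annotator judgment


def oracle_pick(answers: List[Answer]) -> Answer:
    """Pick the oracle answer for one question.

    Assumption: a correct answer always beats an incorrect one, and
    confidence only breaks ties. If no system is correct, the pick is
    still an incorrect answer, so the combined system scores it as wrong.
    """
    correct = [a for a in answers if a.correct]
    pool = correct if correct else answers
    return max(pool, key=lambda a: a.confidence)


s1 = Answer("system1", "answer from system1", 0.9, False)
s2 = Answer("system2", "answer from system2", 0.4, True)
best = oracle_pick([s1, s2])
```

Because the function takes a list, the same sketch covers more than two systems.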
The 'collate_agree.csv' file from the oracle experiment above serves as the input to three orchestration schemes that determine which system will answer a question posted by a user. These orchestrations assume that only two systems are considered.
themis analyze fallback <collate_agree.csv> <system1> <system2> > <fallback_output.csv>
Here 'collate_agree.csv' is the combined file with every system's answers for each question. system1 and system2 are the names of the two systems. 'fallback_output.csv' is the output of the fallback system.
Fallback first queries system1. If the confidence of its top class is above a threshold t, it returns the answers associated with system1. If the confidence is below t, it returns the answers from system2. The threshold t is chosen to be optimal for each data set.
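The fallback rule, and one way a threshold might be chosen, can be sketched as follows. This is a sketch under stated assumptions, not the themis code: the `Row` fields are hypothetical, a confidence exactly equal to t is sent to system2, and "optimal" is taken to mean the threshold that yields the most correct answers on the judged data.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    s1_confidence: float  # system1 confidence for its top class
    s1_correct: bool      # was system1's answer judged correct?
    s2_correct: bool      # was system2's answer judged correct?


def uses_system1(s1_confidence: float, t: float) -> bool:
    """Fallback rule: keep system1's answer only when its confidence is above t."""
    return s1_confidence > t


def correct_count(rows: List[Row], t: float) -> int:
    """Number of questions the fallback scheme answers correctly at threshold t."""
    return sum(
        r.s1_correct if uses_system1(r.s1_confidence, t) else r.s2_correct
        for r in rows
    )


def best_threshold(rows: List[Row]) -> float:
    """Sweep the observed confidences (plus 0.0) and keep the best-scoring one."""
    candidates = sorted({0.0} | {r.s1_confidence for r in rows})
    return max(candidates, key=lambda t: correct_count(rows, t))


rows = [
    Row(0.9, True, False),
    Row(0.2, False, True),
    Row(0.5, False, True),
]
t = best_threshold(rows)
```

On this toy data the sweep settles on t = 0.5, at which all three questions are answered correctly.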
themis analyze voting-router <collate_agree.csv> <system1> <system2> > <voting_output.csv>
Here 'collate_agree.csv' is the combined file with every system's answers for each question. system1 and system2 are the names of the two systems. 'voting_output.csv' is the output of the voting system.
Voting queries system1 and system2 in parallel. It then picks the answer to use based on two values: the empirically measured precision associated with system1's confidence for its top class, and the empirically measured precision associated with system2's score for its top document.
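A minimal sketch of the voting idea follows. How themis measures precision for a given confidence is not spelled out here, so the sketch assumes one plausible definition: the fraction of judged answers with confidence at or above that value that were correct. The function names and the tie-break in favor of system1 are also assumptions.

```python
from typing import List, Tuple

Judged = Tuple[float, bool]  # (confidence or score, judged correct?)


def precision_at(judged: List[Judged], confidence: float) -> float:
    """Precision among judged answers whose confidence is >= the given value."""
    kept = [correct for c, correct in judged if c >= confidence]
    return sum(kept) / len(kept) if kept else 0.0


def vote(s1_conf: float, s2_score: float,
         s1_judged: List[Judged], s2_judged: List[Judged]) -> str:
    """Return the name of the system whose answer has the higher measured precision."""
    p1 = precision_at(s1_judged, s1_conf)
    p2 = precision_at(s2_judged, s2_score)
    return "system1" if p1 >= p2 else "system2"


s1_judged = [(0.9, True), (0.8, False), (0.3, False)]
s2_judged = [(0.7, True), (0.6, True), (0.1, False)]
choice = vote(0.8, 0.6, s1_judged, s2_judged)
```

The point of going through precision is that raw confidences from two different systems are not directly comparable, whereas precision measured on the same judged data is.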
NLC as a router is a binary routing system with two phases: training and testing.
#### Training:
themis analyze nlc-as-router train <url> <username> <password> <collate_agree_oracle.csv> <path>
Here url, username, and password are the credentials of the NLC instance on Bluemix. 'collate_agree_oracle.csv' is the output file from the oracle experiment. path is the local directory where intermediate results are stored and from which input files are read.
This command first divides collate_agree_oracle.csv into 8 equal training and testing data sets using 8-fold cross-validation. Each fold is then used as input to train an NLC binary classifier. Finally, the command returns the list of classifier ids.
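The 8-fold split can be illustrated with a short sketch. It is not the themis implementation: the `k_folds` helper is hypothetical, and the striding used to assign rows to folds is an assumption (the actual command may shuffle or partition differently). Each of the 8 rounds holds out one fold for testing and trains on the remaining seven.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_folds(rows: Sequence[T], k: int = 8) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; every row lands in exactly one test set."""
    # Row i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(rows[i::k]) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [r for j, fold in enumerate(folds) if j != held_out for r in fold]
        yield train, test


questions = list(range(16))  # stand-ins for rows of the oracle file
splits = list(k_folds(questions))
```

With 16 rows this gives 8 rounds of 14 training rows and 2 testing rows each, which is why one classifier id per fold comes back from training.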
#### Testing:
themis analyze nlc-as-router test <url> <username> <password> <collate_agree.csv> <path> -ids <classifier_id_list> > <router_output.csv>
Here url, username, and password are the credentials of the NLC instance on Bluemix. 'collate_agree.csv' is the combined file with every system's answers for each question. path is the local directory where intermediate results are stored and from which input files are read. classifier_id_list is the list of classifier ids returned by the training command. 'router_output.csv' is the final output of the NLC as a router system.
This command tests each fold against the classifier trained for it and combines the results into a single file. In that file, each question in the testing set is associated with the answer from the answering system chosen by the NLC classifier trained on the data set.