Add all simulation options to Makita #48

Merged
merged 4 commits into from
Nov 30, 2023

jteijema

Adds every simulation option to all three templates. I would prefer this to be available dynamically, but for now this is the best option.

@jteijema jteijema requested a review from J535D165 November 29, 2023 17:21
@jteijema

Basic template

command: basic

The basic template prepares a script for conducting a simulation study with one run using the default model settings, and two randomly chosen priors (one relevant and one irrelevant record).

optional arguments:

  -h, --help                                show this help message and exit
  --job_file JOB_FILE, -f JOB_FILE          The name of the file with jobs.                 Default jobs.bat for Windows, otherwise jobs.sh.
  -s DATA_FOLDER                            Dataset folder
  -o OUTPUT_FOLDER                          Output folder
  --init_seed INIT_SEED                     Seed of the priors.                             Seed is set to 535 by default.
  --model_seed MODEL_SEED                   Seed of the models.                             Seed is set to 165 by default.
  --template TEMPLATE                       Overwrite template with template file path.
  --platform PLATFORM                       Platform to run jobs: Windows, Darwin, Linux.   Default: the system of rendering templates.
  --n_runs N_RUNS                           Number of runs.                                 Default: 1.
  --no_wordclouds                           Disables the generation of wordclouds.
  --classifier CLASSIFIER                   Classifier to use.                              Default: nb.
  --feature_extractor FEATURE_EXTRACTOR     Feature_extractor to use.                       Default: tfidf.
  --query_strategy QUERY_STRATEGY           Query strategy to use.                          Default: max.
  --balance_strategy BALANCE_STRATEGY       Balance strategy to use.                        Default: double.
  --instances_per_query INSTANCES_PER_QUERY Number of instances per query.                  Default: 1.
  --stop_if STOP_IF                         The number of label actions to simulate.        Default 'min' will stop simulating when all relevant records are found.
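
As a rough illustration of how these options combine, the sketch below assembles a command line for the basic template using only flags from the help text above. The `asreview makita template` prefix, the `build_basic_command` helper, and the `data` and `output` folder names are assumptions for illustration; the help text itself only names the command `basic`.

```python
import shlex


def build_basic_command(data_folder="data", output_folder="output", **options):
    """Assemble a command line for the basic template (hypothetical helper).

    The "asreview makita template" prefix is an assumption; the help text
    only names the command "basic".
    """
    args = ["asreview", "makita", "template", "basic",
            "-s", data_folder, "-o", output_folder]
    # Each keyword argument becomes a "--name value" pair, e.g. --classifier nb.
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return shlex.join(args)


# Spell out the documented defaults explicitly.
command = build_basic_command(
    classifier="nb",
    feature_extractor="tfidf",
    query_strategy="max",
    balance_strategy="double",
    instances_per_query=1,
    stop_if="min",
)
print(command)
```

Passing the defaults explicitly like this should produce the same job file as omitting them, provided the prefix assumption holds.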

ARFI template

command: arfi

The ARFI template (All Relevant, Fixed Irrelevant) prepares a script for a simulation study in which one run is executed for every relevant record, each time with the same 10 randomly chosen irrelevant records, which are kept constant across runs. When multiple datasets are available, the template orders the tasks in the job file per dataset.

optional arguments:

  -h, --help                                show this help message and exit
  --job_file JOB_FILE, -f JOB_FILE          The name of the file with jobs.                 Default jobs.bat for Windows, otherwise jobs.sh.
  -s DATA_FOLDER                            Dataset folder
  -o OUTPUT_FOLDER                          Output folder
  --init_seed INIT_SEED                     Seed of the priors.                             Seed is set to 535 by default.
  --model_seed MODEL_SEED                   Seed of the models.                             Seed is set to 165 by default.
  --template TEMPLATE                       Overwrite template with template file path.
  --platform PLATFORM                       Platform to run jobs: Windows, Darwin, Linux.   Default: the system of rendering templates.
  --n_priors N_PRIORS                       Number of priors.                               Default: 10.
  --no_wordclouds                           Disables the generation of wordclouds.
  --classifier CLASSIFIER                   Classifier to use.                              Default: nb.
  --feature_extractor FEATURE_EXTRACTOR     Feature_extractor to use.                       Default: tfidf.
  --query_strategy QUERY_STRATEGY           Query strategy to use.                          Default: max.
  --balance_strategy BALANCE_STRATEGY       Balance strategy to use.                        Default: double.
  --instances_per_query INSTANCES_PER_QUERY Number of instances per query.                  Default: 1.
  --stop_if STOP_IF                         The number of label actions to simulate.        Default 'min' will stop simulating when all relevant records are found.
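
The ARFI prior scheme described above can be sketched as follows. This is a minimal illustration, not Makita's implementation: the `arfi_prior_sets` helper and the toy label list are hypothetical, and using Python's `random` module with the `init_seed` default of 535 is an assumption about how the seed might be applied.

```python
import random


def arfi_prior_sets(labels, n_priors=10, init_seed=535):
    """Return one prior set per relevant record (hypothetical helper).

    Every set holds one relevant record plus the same randomly chosen
    irrelevant records, which stay fixed across all runs.
    """
    relevant = [i for i, label in enumerate(labels) if label == 1]
    irrelevant = [i for i, label in enumerate(labels) if label == 0]

    # Draw the irrelevant priors once so they are identical in every run.
    rng = random.Random(init_seed)
    fixed_irrelevant = sorted(rng.sample(irrelevant, min(n_priors, len(irrelevant))))

    return [[rel] + fixed_irrelevant for rel in relevant]


# Toy dataset: 3 relevant records (label 1) and 12 irrelevant records (label 0).
labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
prior_sets = arfi_prior_sets(labels)

for run, priors in enumerate(prior_sets, start=1):
    print(f"run {run}: relevant prior {priors[0]}, irrelevant priors {priors[1:]}")
```

With three relevant records the sketch yields three runs, which is why the number of jobs in an ARFI study grows with the number of relevant records in each dataset.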

Multiple models template

command: multiple_models

The multiple models template prepares a script for running a simulation study that compares multiple models on one dataset with a fixed set of priors (one relevant and one irrelevant record; identical across models).

optional arguments:

  -h, --help                                show this help message and exit
  --job_file JOB_FILE, -f JOB_FILE          The name of the file with jobs.                 Default jobs.bat for Windows, otherwise jobs.sh.
  -s DATA_FOLDER                            Dataset folder
  -o OUTPUT_FOLDER                          Output folder
  --init_seed INIT_SEED                     Seed of the priors.                             Seed is set to 535 by default.
  --model_seed MODEL_SEED                   Seed of the models.                             Seed is set to 165 by default.
  --template TEMPLATE                       Overwrite template with template file path.
  --platform PLATFORM                       Platform to run jobs: Windows, Darwin, Linux.   Default: the system of rendering templates.
  --n_runs N_RUNS                           Number of runs.                                 Default: 1.
  --no_wordclouds                           Disables the generation of wordclouds.
  --query_strategy QUERY_STRATEGY           Query strategy to use.                          Default: max.
  --balance_strategy BALANCE_STRATEGY       Balance strategy to use.                        Default: double.
  --instances_per_query INSTANCES_PER_QUERY Number of instances per query.                  Default: 1.
  --stop_if STOP_IF                         The number of label actions to simulate.        Default 'min' will stop simulating when all relevant records are found.
  --classifiers CLASSIFIERS                 Classifiers to use                              Default: ['logistic', 'nb', 'rf', 'svm']
  --feature_extractors FEATURE_EXTRACTOR    Feature extractors to use                       Default: ['doc2vec', 'sbert', 'tfidf']
  --impossible_models IMPOSSIBLE_MODELS     Model combinations to exclude                   Default: ['nb,doc2vec', 'nb,sbert']
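
To make the interaction between `--classifiers`, `--feature_extractors`, and `--impossible_models` concrete, here is a minimal sketch of how the model grid could be derived from the defaults listed above. The `model_combinations` helper is hypothetical, and treating each `impossible_models` entry as a `classifier,feature_extractor` pair is an assumption based on the default values shown.

```python
from itertools import product


def model_combinations(classifiers, feature_extractors, impossible_models):
    """List every classifier/feature-extractor pair that is not excluded.

    Hypothetical helper: each entry of impossible_models is assumed to be a
    "classifier,feature_extractor" string, as in the defaults above.
    """
    excluded = {
        tuple(part.strip() for part in pair.split(","))
        for pair in impossible_models
    }
    return [
        (clf, fe)
        for clf, fe in product(classifiers, feature_extractors)
        if (clf, fe) not in excluded
    ]


# Defaults taken from the help text above.
combos = model_combinations(
    classifiers=["logistic", "nb", "rf", "svm"],
    feature_extractors=["doc2vec", "sbert", "tfidf"],
    impossible_models=["nb,doc2vec", "nb,sbert"],
)

for clf, fe in combos:
    print(f"{clf} + {fe}")
```

Under these assumptions the defaults give 4 x 3 = 12 pairs minus the 2 excluded ones, so 10 models would be compared per dataset.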


@J535D165 J535D165 left a comment

Looks all good to me!

@jteijema jteijema merged commit a9c6fac into asreview:main Nov 30, 2023
3 checks passed
@jteijema jteijema deleted the Add-simulation-options branch November 30, 2023 12:51