simAIRR simulates synthetic adaptive immune receptor repertoire (AIRR) datasets for benchmarking machine learning (ML) methods. The simulation is designed to mitigate undesirable access by ML methods to the ground-truth signals in training datasets. Unlike other state-of-the-art approaches, simAIRR constructs antigen-experienced-like baseline repertoires and introduces signals by following the empirical relationship between the generation probability and the sharing pattern of public sequences, as calibrated on real-world experimental datasets.
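The idea of tying signal implantation to generation probability can be illustrated with a toy sketch. This is not simAIRR's implementation: the function names, the logistic form of the curve, its coefficients, and the `enrichment` odds multiplier are all hypothetical choices made for illustration. The sketch only shows the general principle that a sequence with a higher generation probability is shared across more repertoires, and that a signal sequence appears more often in positive-labelled repertoires than that background rate would predict.

```python
import math
import random


def sharing_probability(pgen, intercept=4.0, slope=0.5):
    """Toy stand-in for an empirically calibrated curve (hypothetical).

    Maps a generation probability pgen in (0, 1] to the fraction of
    repertoires expected to contain the sequence, using a logistic
    function of log10(pgen). The coefficients are made up.
    """
    logit = intercept + slope * math.log10(pgen)
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_presence(pgens, n_repertoires, n_positive, enrichment=2.0, seed=0):
    """Decide in which repertoires each signal sequence is implanted.

    Returns (labels, presence), where labels[r] is True for positive
    repertoires and presence[r][s] is True if sequence s is implanted in
    repertoire r. In positive repertoires the odds of presence are
    multiplied by `enrichment` (an assumed parameter, not a simAIRR option).
    """
    rng = random.Random(seed)
    labels = [r < n_positive for r in range(n_repertoires)]
    presence = []
    for is_positive in labels:
        row = []
        for pgen in pgens:
            p = sharing_probability(pgen)  # background sharing rate
            if is_positive:
                odds = enrichment * p / (1.0 - p)
                p = odds / (1.0 + odds)
            row.append(rng.random() < p)
        presence.append(row)
    return labels, presence


if __name__ == "__main__":
    pgens = [1e-12, 1e-9, 1e-7]
    labels, presence = simulate_presence(pgens, n_repertoires=200, n_positive=100)
    for s, pgen in enumerate(pgens):
        pos = sum(row[s] for row, lab in zip(presence, labels) if lab)
        neg = sum(row[s] for row, lab in zip(presence, labels) if not lab)
        print(f"pgen={pgen:.0e}  positive={pos}/100  negative={neg}/100")
```

Because the enrichment acts on top of a pgen-dependent background rate, the occurrence counts of a signal sequence alone do not reveal it; a method has to compare the counts between the two classes.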
To get started:
- For installation instructions and tutorials, see the documentation: https://kanduric.github.io/simAIRR/
- Consult the tutorials for detailed examples of different workflows
- Read the brief overview of simAIRR's simulation approach
- Consult the descriptions of valid parameter configurations
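Install the latest release from PyPI:
$ pip install simAIRR
Or install the development version directly from GitHub:
$ pip install git+https://github.com/KanduriC/simAIRR.git
Alternatively, run simAIRR through Docker. This command mounts the current directory at /wd in the container and prints the command-line help:
$ docker run -it -v "$(pwd)":/wd --name my_container kanduric/simairr:latest sim_airr --help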