Bento Demo Dataset

Partially synthetic demo dataset for the Bento platform. Requires Python 3.10+.

Based partly on data from:

The 1000 Genomes project, © EMBL-EBI
The International Human Epigenome Consortium

Requirements:

Optionally create a virtual environment, e.g.:

virtualenv -p python3 ./env
source env/bin/activate

To install dependencies run:

pip install -r requirements.txt

Usage:

To run:

python generate_dataset.py

This will write phenopackets to synthetic_phenopackets.json and experiments to synthetic_experiments.json.

It also generates transcriptomics files matching the Phenopackets:

counts_matrix_group_{num}.csv
- Raw count matrices
- Sample ID columns
  - Corresponds to biosample IDs in synthetic_phenopackets.json
- Gene ID rows
- Cells represent the raw count for a gene-sample pair
gene_lenghts.csv
- Stores the gene IDs and the gene lengths used for normalization
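
As a rough illustration of how gene lengths can be used to normalize raw counts, the sketch below computes TPM (transcripts per million) for a single sample from in-memory data. The choice of TPM, the gene IDs, and the assumption that lengths are in base pairs are all illustrative; this README does not state which normalization method is intended or the exact column names in the CSV files.

```python
def tpm(counts: dict[str, float], lengths_bp: dict[str, float]) -> dict[str, float]:
    """Compute TPM for one sample from raw counts and gene lengths (assumed to be in bp)."""
    # Reads per kilobase: divide each raw count by the gene length in kb.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        # No reads at all in this sample: avoid dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Scale so that the values for the sample sum to one million.
    return {gene: value / total * 1_000_000 for gene, value in rpk.items()}


# Hypothetical gene IDs and values, standing in for one sample column of a
# counts matrix and the matching rows of the gene lengths file.
sample_counts = {"GENE_A": 100, "GENE_B": 300}
gene_lengths = {"GENE_A": 1000, "GENE_B": 3000}
print(tpm(sample_counts, gene_lengths))
```

GENE_B has three times the raw count of GENE_A but is also three times as long, so both end up with the same normalized value.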

Other useful files are available in the /dataset_files directory:

config.json: a Katsu config file matching the dataset
dats.json: an example DATS file
extra_properties_typing.json: to configure typed extra properties
mock experiment files in .csv, .jpg, .md, .mp4, .pdf, and .xlsx formats

Optional Configuration:

The dataset is a mix of fixed and randomly generated values. The random values are the same across different runs of generate_dataset.py. To change the output, modify any of the values in config/constants.py.
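
Random values that stay the same between runs are typically produced with a fixed seed. The sketch below shows the general idea using Python's standard library. The SEED name and the age example are hypothetical, and whether generate_dataset.py works exactly this way is an assumption; check config/constants.py for the actual values.

```python
import random

SEED = 42  # hypothetical constant, not taken from config/constants.py


def synthetic_ages(seed: int, n: int) -> list[int]:
    """Return n pseudo-random ages drawn from a generator seeded with `seed`."""
    # A dedicated Random instance keeps the sequence independent of any
    # other use of the global random module.
    rng = random.Random(seed)
    return [rng.randint(18, 90) for _ in range(n)]


first = synthetic_ages(SEED, 5)
second = synthetic_ages(SEED, 5)
print(first == second)  # True: same seed, same sequence
```

Changing the seed value would give a different, but again repeatable, sequence.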

The dataset is generated based on the input file config/individuals.json. You can add (or remove) individuals for different output. Individuals with only "id" and "sex" fields get fully synthetic metadata, while any values in the "biosamples", "experiments", or "diseases" fields are copied over unmodified. This allows, for example, generating appropriate metadata for real data files (which may involve, e.g., a particular disease).
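
To make the two cases concrete, here is a minimal sketch of what entries might look like. It assumes config/individuals.json holds a JSON array of objects; the IDs, sex values, and disease term are placeholders, and the exact schema should be checked against the file shipped with the repository.

```python
import json

# Fields whose values are copied over unmodified, per the description above.
PRESERVED_FIELDS = ("biosamples", "experiments", "diseases")

# Hypothetical entries; every value below is a placeholder.
individuals = [
    # Only "id" and "sex": all other metadata would be synthetic.
    {"id": "example-0001", "sex": "FEMALE"},
    # A "diseases" value is supplied, so it would be kept as given.
    {
        "id": "example-0002",
        "sex": "MALE",
        "diseases": [{"term": {"id": "EXAMPLE:0000001", "label": "example disease"}}],
    },
]


def is_fully_synthetic(individual: dict) -> bool:
    """True when none of the preserved fields carries a value."""
    return not any(individual.get(field) for field in PRESERVED_FIELDS)


for individual in individuals:
    print(individual["id"], "fully synthetic:", is_fully_synthetic(individual))

# Serialized form, as it might appear in an input file.
print(json.dumps(individuals, indent=2))
```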

Optional Data Files:

The dataset is meant for use with genomic data from the 1000 Genomes Project, and transcriptomics data from the International Human Epigenome Consortium. See here for more details on data files.
