Configure the parameters for Fast Higashi
ruochiz edited this page Nov 27, 2022
All customizable parameters are stored in a JSON config file. The path to this JSON config file will be needed when running the program. For examples of the configuration JSON file, see the tutorials linked in this wiki.
Higashi and Fast-Higashi share most of their parameters, so the JSON file you prepared for Higashi can be reused directly for Fast-Higashi with a few additional parameters.
If you also plan to run Higashi and have already prepared the config JSON file for it, simply add the following parameters and you are good to go!
params | Type | Required/Optional | description | example |
---|---|---|---|---|
resolution_fh | list | Required | The resolution of contact maps | [500000] |
batch_id | str | Optional | The name of the batch id information stored in label_info.pickle. The corresponding information would be used to remove batch effects | "batch id" |
blacklist | str | Optional | Path of the ENCODE blacklist file (https://github.com/Boyle-Lab/Blacklist). Will be used to filter out contacts | "/home/rzhang/Higashi/hg19-blacklist.v2.bed" |
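
As a minimal sketch of reusing a Higashi config, the Python snippet below adds the three parameters from the table above to an existing config dictionary and serializes the result as JSON. The helper name `add_fast_higashi_params` and the stand-in config values are illustrative assumptions; the helper is not part of Higashi or Fast-Higashi.

```python
import json


def add_fast_higashi_params(config, resolution_fh, batch_id=None, blacklist=None):
    """Return a copy of a Higashi config dict with the Fast-Higashi parameters added.

    Hypothetical helper for illustration; not part of the Fast-Higashi package.
    """
    updated = dict(config)
    # resolution_fh is required and is given as a list of resolutions.
    updated["resolution_fh"] = list(resolution_fh)
    # batch_id and blacklist are optional, so only write them when provided.
    if batch_id is not None:
        updated["batch_id"] = batch_id
    if blacklist is not None:
        updated["blacklist"] = blacklist
    return updated


# Stand-in for a config read with json.load() from an existing Higashi JSON file.
higashi_config = {"data_dir": "/sn-m3C-seq", "resolution": 1000000}

fast_higashi_config = add_fast_higashi_params(
    higashi_config,
    resolution_fh=[500000],
    batch_id="batch id",
)

# Serialize the result; in practice, write this string to the JSON config file
# whose path is passed to the program.
config_json = json.dumps(fast_higashi_config, indent=2)
print(config_json)
```

The helper copies the dictionary instead of modifying it in place, so the original Higashi config stays usable for a plain Higashi run.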
If you plan to only run Fast-Higashi, you will need the following parameters as well.
params | Type | Required/Optional | description | example |
---|---|---|---|---|
data_dir | str | Required | Directory where the data are stored | "/sn-m3C-seq" |
input_format | str | Optional | How the data are stored. Either "higashi_v1" or "higashi_v2". "higashi_v1" stores the scHi-C dataset as one big table named data.txt. "higashi_v2" stores the contact pairs of each cell in a separate table and lists the paths to these files in filelist.txt | "higashi_v1" |
header_included | bool | Required when input_format = "higashi_v2" | Whether each table includes a header row | true |
contact_header | list | Required when input_format = "higashi_v2" and header_included is false | The header of the contact pairs. Must include ["chrom1", "pos1", "chrom2", "pos2"]. When "count" is not included, the program assumes count=1 for all contact pairs | ["chrom1", "pos1", "chrom2", "pos2", "count"] |
structured | bool | Required | Whether the data.txt file is structured (the interaction pairs of each cell are stored on consecutive rows of the table rather than placed randomly). A data.txt file that is already organized this way saves a lot of memory and processing time | true |
temp_dir | str | Required | Directory where the temporary files will be stored. An empty folder will be created if it doesn't exist. | "../Temp/sn-m3C_1Mb" |
genome_reference_path | str | Required | Path of the genome reference file from the UCSC Genome Browser. Will be used to generate bin nodes | "../hg19.chrom.sizes.txt" |
chrom_list | list | Required | List of chromosomes to train the model on. The naming convention should be the same as in data.txt and the genome reference file | ["chr1", "chr2","chr3","chr4","chr5"] |
resolution | int | Required | Resolution for imputation. | 1000000 |
resolution_cell | int | Required | Resolution for generating the attributes of the cell nodes. 1Mb is recommended for data with lower coverage per cell, and 500Kb for data with higher coverage per cell. | 1000000 |
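
The conditional requirements in the tables above (for example, `header_included` is only needed with "higashi_v2") are easy to overlook, so a small pre-flight check can help before running the program. The sketch below is a hypothetical validator written for this page, not Fast-Higashi's own config handling. The function name `check_fast_higashi_config` is an assumption, and the check only looks at which keys are present, not at value types or whether paths exist.

```python
# Parameters marked "Required" in the tables on this page.
REQUIRED_KEYS = [
    "data_dir", "structured", "temp_dir", "genome_reference_path",
    "chrom_list", "resolution", "resolution_cell", "resolution_fh",
]
# Columns that contact_header must contain according to the table.
CONTACT_COLUMNS = ["chrom1", "pos1", "chrom2", "pos2"]


def check_fast_higashi_config(config):
    """Return a list of problems found in a config dict (empty if none).

    Hypothetical helper for illustration; not part of the Fast-Higashi package.
    """
    problems = [
        f"missing required parameter: {key}"
        for key in REQUIRED_KEYS
        if key not in config
    ]

    input_format = config.get("input_format")
    if input_format is not None and input_format not in ("higashi_v1", "higashi_v2"):
        problems.append(f"unknown input_format: {input_format!r}")

    if input_format == "higashi_v2":
        if "header_included" not in config:
            problems.append("header_included is required for higashi_v2")
        elif config["header_included"] is False:
            header = config.get("contact_header")
            if header is None:
                problems.append(
                    "contact_header is required when header_included is false"
                )
            else:
                missing = [col for col in CONTACT_COLUMNS if col not in header]
                if missing:
                    problems.append(f"contact_header is missing columns: {missing}")
    return problems


# Example config assembled from the example column of the tables above.
example_config = {
    "data_dir": "/sn-m3C-seq",
    "input_format": "higashi_v2",
    "header_included": False,
    "contact_header": ["chrom1", "pos1", "chrom2", "pos2", "count"],
    "structured": True,
    "temp_dir": "../Temp/sn-m3C_1Mb",
    "genome_reference_path": "../hg19.chrom.sizes.txt",
    "chrom_list": ["chr1", "chr2", "chr3", "chr4", "chr5"],
    "resolution": 1000000,
    "resolution_cell": 1000000,
    "resolution_fh": [500000],
}

print(check_fast_higashi_config(example_config))  # prints []
```

Returning a list of problems instead of raising on the first one lets all missing parameters be fixed in a single pass over the JSON file.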