This package provides a simple tool for performing Genome-Wide Association Studies (GWAS) on continuous phenotypes using linear regression.
To install the package, clone the GitHub repository:
git clone
It is recommended to install Anaconda and run the package from an Anaconda terminal. To keep dependencies and paths isolated, create a new Anaconda environment first, then continue with the following steps.
cd gwas-project
pip install -r requirements.txt
python install
After installing the package, you can use gwas-tools-cli to perform GWAS on your data.
gwas-tools-cli --vcf <path_to_vcf_file> --pheno <path_to_phenotype_file> --out <output_file_prefix>
Replace <path_to_vcf_file> with the path to your VCF file containing genotype data, <path_to_phenotype_file> with the path to your phenotype file, and <output_file_prefix> with the desired name for the output files.
--maf <maf_threshold>
--h OR --help
Adjust the minor allele frequency (MAF) threshold for filtering SNPs as needed; the default is 0.05. Use --help for a list of valid arguments.
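To illustrate what a MAF filter does, here is a minimal, dependency-free sketch. It is not the package's own code: the function names, the genotype encoding (alternate-allele counts of 0, 1, or 2, with None for a missing call), and the choice to keep SNPs with MAF greater than or equal to the threshold are all assumptions made for the example.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for one biallelic SNP.

    `genotypes` holds alternate-allele counts per sample (0, 1, or 2).
    None marks a missing call and is skipped (an assumed convention).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each diploid sample carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(snps, threshold=0.05):
    """Keep SNPs whose MAF is at least `threshold` (hypothetical helper)."""
    return {
        snp_id: genotypes
        for snp_id, genotypes in snps.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


# Toy data with made-up SNP identifiers.
snps = {
    "rs_common": [0, 1, 2, 1, 0, 1, None, 2],
    "rs_rare": [0] * 39 + [1],
    "rs_flipped": [2] * 8 + [1] * 2,
}
kept = filter_by_maf(snps)
print(sorted(kept))
```

In this toy data, rs_rare has a single alternate allele among 80, so it falls below the 0.05 default and is dropped, while rs_flipped is kept because its reference allele is the minor one.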
- <output_file_prefix>_results.csv: CSV file containing the results of the linear regression analysis.
- <output_file_prefix>_manhattan_plot.png: Manhattan plot visualizing the results of the GWAS analysis.
- <output_file_prefix>_QQ_plot.png: QQ plot visualizing the results of the GWAS analysis.
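As background on how a QQ plot is read, the sketch below computes the points such a plot typically shows: observed -log10(p) values against the values expected if every p-value were drawn from a uniform distribution. The function name and the (rank - 0.5) / n plotting position are assumptions for illustration; the package may compute its plot differently.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Under the null hypothesis, p-values are uniform on (0, 1), so the
    k-th smallest of n is expected near (k - 0.5) / n (assumed convention).
    """
    observed = sorted(p_values)  # smallest p-value first
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Made-up p-values, only to show the shape of the output.
points = qq_points([0.5, 0.01, 0.9, 0.2])
for expected, observed in points:
    print(f"expected={expected:.3f} observed={observed:.3f}")
```

Points that rise well above the diagonal (observed greater than expected) suggest more small p-values than chance alone would produce.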
gwas-tools-cli --vcf subset_lab3_gwas_CHR_18_19_20.vcf.gz --pheno subset_lab3_gwas_CHR_18_19_20.phen --out gwas_results
This command performs GWAS on subsets of the Lab 3 genotype and phenotype data files (i.e., data from chromosomes 18, 19, and 20) and saves the results to three output files prefixed with gwas_results.
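Conceptually, a linear-regression GWAS tests each SNP by regressing the phenotype on the genotype. The sketch below shows that single-SNP test in plain Python, assuming an additive encoding (0, 1, or 2 alternate alleles) and no covariates. The function name and data are hypothetical, and this is not the package's implementation.

```python
import math


def snp_association(genotypes, phenotypes):
    """Fit phenotype = intercept + beta * genotype by ordinary least squares.

    Returns (beta, standard_error, t_statistic). Needs at least three
    samples and a SNP that varies across samples.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 samples")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype does not vary; slope is undefined")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual sum of squares around the fitted line.
    rss = sum((y - intercept - beta * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    # Standard error of the slope, with n - 2 residual degrees of freedom.
    std_err = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit has zero standard error; report an infinite statistic.
    t_stat = beta / std_err if std_err > 0 else math.copysign(math.inf, beta)
    return beta, std_err, t_stat


# Made-up data for six samples.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 1.9, 2.1, 3.1, 2.9]
beta, std_err, t_stat = snp_association(genotypes, phenotypes)
print(f"beta={beta:.3f} se={std_err:.3f} t={t_stat:.2f}")
```

The p-value for each SNP follows from the t-statistic under a t distribution with n - 2 degrees of freedom. A regression library such as statsmodels reports it directly.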
- pandas
- numpy
- pyvcf3
- statsmodels
- matplotlib