This package provides a simple tool for performing Genome-Wide Association Studies (GWAS) on continuous phenotypes using linear regression.
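As a rough illustration of what a per-SNP linear regression test looks like, the sketch below fits phenotype = intercept + beta * genotype by ordinary least squares for a single SNP. It is not the package's actual implementation: the function name `snp_linear_regression` and the 0/1/2 genotype coding are assumptions made for this example. The p-value (from a t distribution with n - 2 degrees of freedom) is left out because the standard library has no t-distribution CDF.

```python
import math

def snp_linear_regression(genotypes, phenotypes):
    """Regress a continuous phenotype on one SNP (hypothetical helper).

    genotypes: alternate-allele counts per sample (assumed coded 0, 1, 2).
    phenotypes: continuous trait values in the same sample order.
    Returns (beta, standard_error, t_statistic).
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n

    # Sum of squares of the genotype and its cross-product with the phenotype.
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype has no variation (monomorphic SNP)")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))

    beta = sxy / sxx                      # effect size per alternate allele
    intercept = mean_y - beta * mean_g

    # Residual variance with n - 2 degrees of freedom gives the standard error.
    rss = sum((y - intercept - beta * g) ** 2
              for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, se, t_stat


beta, se, t_stat = snp_linear_regression(
    [0, 1, 2, 0, 1, 2], [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
)
print(f"beta={beta:.3f} se={se:.3f} t={t_stat:.2f}")
```

A GWAS repeats this test for every SNP that passes filtering and reports one row per SNP.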
To install the package, clone the GitHub repository:
git clone https://github.com/chloekeggen/gwas-project.git
We recommend installing Anaconda and running the package from an Anaconda terminal. To keep dependencies and paths isolated, create a new Anaconda environment first, then continue with the following steps:
cd gwas-project
pip install -r requirements.txt
python setup.py install
After installing the package, you can use gwas-tools-cli.py to perform GWAS on your data.
gwas-tools-cli --vcf <path_to_vcf_file> --pheno <path_to_phenotype_file> --out <output_file_prefix>
Replace <path_to_vcf_file> with the path to your VCF file containing genotype data, <path_to_phenotype_file> with the path to your phenotype file, and <output_file_prefix> with the desired prefix for the output file names.
--maf <maf_threshold>
--h OR --help
Adjust the MAF (minor allele frequency) threshold for filtering SNPs as needed; the default is 0.05. Use --help to list all valid arguments.
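To make the MAF threshold concrete, here is a minimal sketch of how a minor allele frequency filter could work. The helper names, the diploid 0/1/2 genotype coding, the use of None for missing calls, and the choice to keep SNPs with MAF greater than or equal to the threshold are all assumptions for illustration; the package's own filtering logic may differ.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP (hypothetical helper).

    genotypes: alternate-allele counts per diploid sample (0, 1, or 2),
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each diploid sample carries two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


def filter_by_maf(snps, threshold=0.05):
    """Keep SNPs whose MAF is at least `threshold`.

    snps: dict mapping a SNP ID to its list of genotypes.
    """
    return {
        snp_id: genotypes
        for snp_id, genotypes in snps.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


example = {
    "rare_snp": [0] * 19 + [1],        # 1 alt allele out of 40
    "common_snp": [0, 1, 2, 1, 0, None],
}
print(sorted(filter_by_maf(example)))
```

Rare variants are usually filtered out because a regression with very few carriers of the minor allele gives unstable effect estimates.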
- <output_file_prefix>_results.csv: CSV file containing the results of the linear regression analysis.
- <output_file_prefix>_manhattan_plot.png: Manhattan plot visualizing the results of the GWAS analysis.
- <output_file_prefix>_QQ_plot.png: QQ plot visualizing the results of the GWAS analysis.
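For readers unfamiliar with the plots: a Manhattan plot shows -log10(p) for each SNP against its genomic position, while a QQ plot compares the observed -log10(p) values with those expected if no SNP were associated with the phenotype. The sketch below computes the points of a QQ plot from a list of p-values. The function name `qq_points` and the expected-quantile formula rank / (n + 1) are assumptions for this example, not necessarily what the package uses.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot.

    Under the null hypothesis p-values are uniform on (0, 1), so the
    k-th smallest of n p-values is expected to be about k / (n + 1).
    """
    if any(not (0 < p <= 1) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(p_values)
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected_p = rank / (n + 1)
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


for expected, observed in qq_points([0.5, 0.01, 0.1]):
    print(f"expected={expected:.3f} observed={observed:.3f}")
```

Points that rise well above the diagonal at the upper end suggest true associations, while a curve that departs from the diagonal across its whole range often indicates confounding such as population structure.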
gwas-tools-cli --vcf subset_lab3_gwas_CHR_18_19_20.vcf.gz --pheno subset_lab3_gwas_CHR_18_19_20.phen --out gwas_results
This command performs GWAS on subsets of the Lab 3 genotype and phenotype files (i.e., data from chromosomes 18, 19, and 20) and saves the results to three output files prefixed with gwas_results.
- pandas
- numpy
- pyvcf3
- statsmodels
- matplotlib
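If the command fails with an import error, a quick check like the following can show which dependencies are missing from the active environment. This is a sketch, not part of the package: the function name is hypothetical, and the mapping from package names to import names is an assumption (in particular, PyVCF3 is assumed to be imported as `vcf`).

```python
import importlib.util

# Package name -> module name used in "import ..." (assumed mapping).
REQUIRED_MODULES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "pyvcf3": "vcf",
    "statsmodels": "statsmodels",
    "matplotlib": "matplotlib",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package_name: True/False} for whether each module is importable."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


for package, available in check_dependencies().items():
    print(f"{package}: {'ok' if available else 'MISSING'}")
```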