G2P

A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation

Authors:

You Tang and Xiaolei Liu

Contact:

[[email protected]](Xiaolei Liu)

Installation
- Environment Setup
- Windows
- MAC
- Linux
Data Preparation
- ped
- map
- pop
Genotype Simulation
Phenotype Simulation
- Phenotype _ GUI
- Phenotype _ Pipeline
Population Structure
- Population structure _ GUI
- Population structure _ Pipeline
GWAS
- GWAS _ GUI
- GWAS _ Pipeline
Method Evaluation
- Method Evaluation _ GUI
- Method Evaluation _ Pipeline
FAQ and Hints

Installation

back to top

Environment Setup

back to top
JDK1.8 should be installed and environment variables must be configured before using G2P (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)

Windows

back to top
GUI
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/gG2P_win_64 and double click the .jar file
Pipeline
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/kG2P_win_64

Mac

back to top
GUI
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/gG2P_mac and double click the .jar file
Pipeline
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/kG2P_mac
permission setting

$ chmod 777 gemma oldplink plink

Linux

back to top
GUI
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/gG2P_linux_x86_64 and run

$ Java -jar gG2P.jar

Pipeline
Download all files from https://github.com/XiaoleiLiuBio/G2P/tree/master/kG2P_linux_x86_64
permission setting

$ chmod 777 gemma oldplink plink

Data Preparation

All files should be prepared with the same prefix

ped

details see http://zzz.bwh.harvard.edu/plink/data.shtml#ped
back to top

Family ID	Individual ID	Trait	marker 2	marker 3	marker 4	marker 5	marker 6
1	33-16	2	A A	A A	A G	A G	A G
1	38-11	2	A G	A G	A A	A G	A G
1	4226	2	A G	A A	A A	A G	A G
1	4722	2	A G	A G	A A	A G	A G
1	A188	2	A A	A A	A A	A G	A G
1	A214N	2	A G	A A	A G	A A	A G
1	A239	2	A A	A A	A G	A G	A A

Family ID	Individual ID	Trait	marker 2	marker 3	marker 4	marker 5	marker 6
1	33-16	2	1 1	1 1	1 3	1 3	1 3
1	38-11	2	1 3	1 3	1 1	1 3	1 3
1	4226	2	1 3	1 1	1 1	1 3	1 3
1	4722	2	1 3	1 3	1 1	1 3	1 3
1	A188	2	1 1	1 1	1 1	1 3	1 3
1	A214N	2	1 3	1 1	1 3	1 1	1 3
1	A239	2	1 1	1 1	1 3	1 3	1 1

map

details see http://zzz.bwh.harvard.edu/plink/data.shtml#map
back to top

Chromosome ID	Marker ID	Physical Distance
1	PZB00859.1	157104
1	PZA01271.1	1947984
1	PZA03613.2	2914066
1	PZA03613.1	2914171
1	PZA03614.2	2915078
1	PZA03614.1	2915242
1	PZA00258.3	2973508

pop

back to top
new samples will be generated using samples within sub-population

Sample ID	sub-Population ID
33-16	1
38_11	1
4226	1
4722	2
A188	2
A214N	2
A239	2
A272	2
A441-5	2
A554	3
A556	3
A6	3
A619	3

qtn

back to top
each column represents simulated QTNs for each phenotype

Phenotype 1	Phenotype 2	Phenotype 3	Phenotype 4	Phenotype 5
66	67	80	83	90
9	15	52	59	135
90	96	143	147	174
3	3	15	58	89
89	118	185	203	212
69	72	72	84	110
46	59	125	204	207
14	15	19	29	39
9	23	65	111	131
19	52	74	179	194

Genotype Simulation

Single Population _ GUI

back to top

Ped: ped file
Map: map file
Path for output Ped/Map: path for output ped and map file
Block: Yes or No, if "Yes", the whole genome will be divided into blocks and exchange to generate new samples
Number of SNPs in each block: Number of SNPs in each block
Imputation: if TRUE, major allele will be used to impute missing values
Population size: simulated sample size

Single Population _ Pipeline

back to top

Windows

java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --block 4 --impute

Linux/Mac

java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --outgen /root/data/output --rn 100 --block 4 –impute
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --block 4
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100 --impute
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --outgen D:\data\output --rn 100

jar: executive software
ped: ped file
map: map file
outgen: output path
block: number of SNPs in each block
rn: simulated sample size
impute: if 'impute' is added, major allele will be used to impute missing values

Multi Populations _ GUI

back to top

Ped: ped file
Map: map file
Pop: pop file
Path for output Ped/Map: path for output ped and map file
Block: Yes or No, if "Yes", the whole genome will be divided into blocks and exchange to generate new samples
Number of SNPs in each block: Number of SNPs in each block
Imputation: if TRUE, major allele will be used to impute missing values
Sample size of each population: sample size of each new generated population 
Population size: number or vector, simulated sample size

Multi Populations _ Pipeline

back to top

Windows

java -jar kG2P.jar --ped D:\data\AG.ped –map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --block 4 --rn 100

Linux/Mac

java -jar kG2P.jar --ped /root/data/AG.ped --map /root/data/AG.map --pop /root/data/AG.pop --outgen /root/data/output --impute --block 4 --rn 100
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --block 4 --rn 100
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --rn 100
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --impute --rn 100
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --impute --block 4 --rn 100,200,300,400
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --block 4 --rn 100,200,300,400
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --rn 100,200,300,400
java -jar kG2P.jar --ped D:\data\AG.ped --map D:\data\AG.map --pop D:\data\AG.pop --outgen D:\data\output --impute --rn 100,200,300,400

jar: executive software
ped: ped file
map: map file
pop: pop file
outgen: output path
block: number of SNPs in each block
rn: simulated sample size
impute: if 'impute' is added, major allele will be used to impute missing values

Random Simulation _ GUI

back to top

Number of Chromosomes: total number of Chromosomes for each new generated sample
Marker size of each Chromosome: vector, marker size of each Chromosome
Population size: Sample size of new generated population
Physical distance of neighbor markers: Physical distance of neighbor markers
Output ped file path: output path of ped file
Output map file path: output path of map file

Random Simulation _ Pipeline

back to top

Windows

java -jar kG2P.jar --sample 100 --chr 5 --marker 100,200,300,400,500 --d 500 --outgen D:\data\output

Linux/Mac

java -jar kG2P.jar --sample 100 --chr 5 --marker 100,200,300,400,500 --d 500 --outgen /root/data/output

jar: executive software
sample: simulated sample size
chr: Number of Chromosomes
marker: SNP markers for each Chromosome
d: phsical distance (base pairs) between nearby markers
outgen: output path

Phenotype Simulation

back to top

Phenotype _ GUI

Normal distribution

Geometry distribution

Ped file path:
QTN area: the genomic area that used to select QTNs
Range: if 'QTN area' is 'Yes', 'range' can be used to set the 'QTN area'
Distribution of QTN effects: two options, 'Normal' and 'Geometry'
Mean: mean value of the normal distribution, length of 'Mean' should be the same with 'Variance'
Variance: variance of the normal distribution
Formats: phenotype formats of 'GEMMA', 'Plink', and 'FaST-LMM' softwares
Number of simulated phenotypes: number of simulated phenotypes
Number of QTNs: number of QTNs, if it is a vector, effect of different QTN group will follow different distribution; length of nqtn, m, and v should be same
Heritability: heritability
Output file path: output file path

Phenotype _ Pipeline

Windows

java -jar kG2P.jar --ped D:\data\AG.ped --outgen D:\data\output --rep 100 --dis geo 0.99 --h2 0.5 --nqtn 100 --QTNarea 1-500,1000-1500

Linux/Mac

java -jar kG2P.jar --ped /root/data/AG.ped --outgen /root/data/output --rep 100 --dis geo 0.99 --h2 0.5 --nqtn 100 --QTNarea 1-500,1000-1500
java -jar kG2P.jar --ped D:\data\AG.ped --outgen D:\data\Part2out --rep 100 --dis geo 0.99 --h2 0.5 --nqtn 100
java -jar .\kG2P.jar --ped D:\data\AG.ped --outgen D:\data\output --rep 10 --dis geo 0.99,0.88 --h2 0.5 --nqtn 10,20 --QTNarea 1-500,1000-1500
java -jar KG2P.jar --ped D:\data\AG.ped --outgen D:\data\Part2out --rep 100 --dis nor --m 0,0.1 --v 0.99,0.98 --h2 0.5 --nqtn 10,20 --QTNarea 1-500,1000-1500

jar: executive software
rep: number of simulated phenotypes
dis: distribution of QTN effects, two options, 'nor' and 'geo'
m: mean value of the normal distribution
v: variance of the normal distribution
QTNarea: the genomic area that used to select QTNs
h2: heritability
nqtn: number of QTNs, if it is a vector, effect of different QTN group will follow different distribution; length of nqtn, m, and v should be same

Population Structure

back to top

Population structure _ GUI

Genotype (bed/bim/fam, ped/map): select genotype file
PCA: select if you want to calculate principle components
Number of PCs: number of PCs will be calculated and generated
Kinship: select if you want to calculate Kinship matrix

Population structure _ Pipeline

PC _ Windows

java -jar kG2P.jar --pre "plink --bfile D:\data\AG --pca 3 --out D:\data\AG"

PC _ Linux/Mac

java -jar kG2P.jar --pre "./plink --bfile /root/data/AG --pca 3 --out /root/data/AG"

Kinship _ Linux/Mac

java -jar kG2P.jar --pre "./gemma -bfile /root/data/AG -gk -o testgemma"

jar: executive software
pre: pipeline of the software you want to use, attention that the software should be put in the same path as kG2P.jar

GWAS

back to top

GWAS _ GUI

Genotype (bed/bim/fam, ped/map): select genotype file
Phenotype file path: select the first phenotype file, all phenotypes in the same path will be analyzed one by one; name of the phenotype file must include a continuous order number, e.g., 'phenotype1.txt', 'phenotype2.txt', 'phenotype3.txt'
Results output file path: select output file path
Command: command for running gwas of the first phenotype, user-specific covariates files and kinship file can also included in the command line

GWAS _ Pipeline

Plink _ Windows

java -jar kG2P.jar  --GWAS "plink --bfile D:\data\AG --fam D:\data\out\171104010413\Plink\Plink_snps1.fam --assoc --out D:\data\g2ptemp" --sp Plink_snps1.fam

Plink _ Linux/Mac

java -jar kG2P.jar  --GWAS "./plink --bfile /root/data/AG --fam /root/data/output/171104010413/Plink/Plink_snps1.fam --assoc --out /root/data/g2ptemp" --sp Plink_snps1.fam

Gemma _ Linux/Mac

java -jar kG2P.jar  --GWAS "./gemma -bfile /root/data/AG -p /root/data/out/171104030401/GEMMA/GEMMA_phenotype1.txt -k /root/output/testgemma.cXX.txt -lmm 4 -o g2ptemp" --sp GEMMA_phenotype1.txt

jar: executive software
GWAS: command line used for running gwas of the first phenotype
sp: the first phenotype file, the file path is not needed

Method Evaluation

back to top

Method Evaluation _ GUI

Map file: map file
QTN file: qtn file
GWAS result files path: path of gwas results
Column number of P values: column number of P values
Output file path: output path of power/fdr evaluation results

Method Evaluation _ Pipeline

Plink _ Windows

java -jar kG2P.jar --map D:\data\AG.map --qtn D:\data\output\170106093742\qtn\test.qtn --gwas D:\data\output\Plink_snps1.qassoc --pv 9 --out D:\data\output

Plink _ Linux/Mac

java -jar kG2P.jar --map /root/data/AG.map --qtn /root/data/output/171104030401/qtn/test.qtn --gwas /root/data/output/Plink_snps1.qassoc --pv 9 --out /root/data/output

Gemma _ Linux/Mac

java -jar kG2P.jar --map /root/data/AG.map --qtn /root/data/output/171104030401/qtn/test.qtn --gwas /root/output/GEMMA_phenotype1.assoc.txt --pv 9 --out /root/data/output

jar: executive software
map: map file
qtn: qtn file
gwas: the first gwas result file, there is a one-to-one mapping between gwas result files and columns in qtn file
pv: column number of P values
out: output file path

Citations

For G2P: Hope it will be coming soon
For principle components analysis: 
    if you use plink, please cite: Purcell S, et.al. "PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage  Analyses." American Journal of Human Genetics, 81.3(2007):559-575.
For calculating kinship matrix:
    if you use gemma, please cite: Zhou, X., et.al. "Genome-wide Efficient Mixed Model Analysis for Association Studies." Nature Genetics, 44.7(2012):821.
Please cite all soft wares you used for GWAS and evaluation!

Name		Name	Last commit message	Last commit date
Latest commit History 92 Commits
data		data
gG2P_linux_x86_64		gG2P_linux_x86_64
gG2P_mac		gG2P_mac
gG2P_win_64		gG2P_win_64
kG2P_linux_x86_64		kG2P_linux_x86_64
kG2P_mac		kG2P_mac
kG2P_win_64		kG2P_win_64
results		results
README.md		README.md

Phenotype 1	Phenotype 2	Phenotype 3	Phenotype 4	Phenotype 5
66	67	80	83	90
9	15	52	59	135
90	96	143	147	174
3	3	15	58	89
89	118	185	203	212
69	72	72	84	110
46	59	125	204	207
14	15	19	29	39
9	23	65	111	131
19	52	74	179	194

Phenotype 1	Phenotype 2	Phenotype 3	Phenotype 4	Phenotype 5
66	67	80	83	90
9	15	52	59	135
90	96	143	147	174
3	3	15	58	89
89	118	185	203	212
69	72	72	84	110
46	59	125	204	207
14	15	19	29	39
9	23	65	111	131
19	52	74	179	194

gyf9835/G2P

Folders and files

Latest commit

History

Repository files navigation

G2P

A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation

Authors:

Contact:

Contents

Installation

Environment Setup

Windows

Mac

Linux

Data Preparation

ped

map

pop

qtn

Genotype Simulation

Single Population _ GUI

Single Population _ Pipeline

Windows

Linux/Mac

Multi Populations _ GUI

Multi Populations _ Pipeline

Windows

Linux/Mac

Random Simulation _ GUI

Random Simulation _ Pipeline

Windows

Linux/Mac

Phenotype Simulation

Phenotype _ GUI

Normal distribution

Geometry distribution

Phenotype _ Pipeline

Windows

Linux/Mac

Population Structure

Population structure _ GUI

Population structure _ Pipeline

PC _ Windows

PC _ Linux/Mac

Kinship _ Linux/Mac

GWAS

GWAS _ GUI

GWAS _ Pipeline

Plink _ Windows

Plink _ Linux/Mac

Gemma _ Linux/Mac

Method Evaluation

Method Evaluation _ GUI

Method Evaluation _ Pipeline

Plink _ Windows

Plink _ Linux/Mac

Gemma _ Linux/Mac

Citations

FAQ and Hints

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages

Phenotype 1	Phenotype 2	Phenotype 3	Phenotype 4	Phenotype 5
66	67	80	83	90
9	15	52	59	135
90	96	143	147	174
3	3	15	58	89
89	118	185	203	212
69	72	72	84	110
46	59	125	204	207
14	15	19	29	39
9	23	65	111	131
19	52	74	179	194