YEAR: 2019 +COPYRIGHT HOLDER: Gibran Hemani ++ +
Copyright (c) 2019 Gibran Hemani
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+A data frame of the instruments for an exposure is required. Each +line has the information for one variant for one exposure. The minimum +information required for MR analysis is the following:
+SNP
- rs IDbeta
- The effect size. If the trait is binary then
+log(OR) should be usedse
- The standard error of the effect sizeeffect_allele
- The allele of the SNP which has the
+effect marked in beta
+Other information that is useful for MR can also be provided:
+other_allele
- The non-effect alleleeaf
- The effect allele frequencyPhenotype
- The name of the phenotype for which the SNP
+has an effectYou can also provide the following extra information:
+chr
- Physical position of variant (chromosome)position
- Physical position of variant (position)samplesize
- Sample size for estimating the effect
+sizencase
- Number of casesncontrol
- Number of controlspval
- The P-value for the SNP’s association with the
+exposureunits
- The units in which the effects are
+presentedgene
- The gene or other annotation for the the
+SNPThe data can be read in from a text file using the
+read_exposure_data
function. The file must have a header
+with column names corresponding to the columns described above.
An example of a text file with the default column names is provided +as part of the package, the first few rows look like this:
+Phenotype SNP beta se effect_allele other_allele eaf pval units gene samplesize
+BMI rs10767664 0.19 0.0306122448979592 A T 0.78 5e-26 kg/m2 BDNF 225238
+BMI rs13078807 0.1 0.0204081632653061 G A 0.2 4e-11 kg/m2 CADM2 221431
+BMI rs1514175 0.07 0.0204081632653061 A G 0.43 8e-14 kg/m2 TNNI3K 207641
+BMI rs1558902 0.39 0.0204081632653061 A T 0.42 5e-120 kg/m2 FTO 222476
+BMI rs10968576 0.11 0.0204081632653061 G A 0.31 3e-13 kg/m2 LRRN6C 247166
+BMI rs2241423 0.13 0.0204081632653061 G A 0.78 1e-18 kg/m2 LBXCOR1 227886
+The exact path to the file will be different on everyone’s computer, +but it can be located like this:
+
+bmi_file <- system.file("extdata", "bmi.txt", package = "TwoSampleMR")
You can read the data in like this:
+
+bmi_exp_dat <- read_exposure_data(bmi_file)
+head(bmi_exp_dat)
+#> SNP beta.exposure se.exposure effect_allele.exposure
+#> 1 rs10767664 0.19 0.03061224 A
+#> 2 rs13078807 0.10 0.02040816 G
+#> 3 rs1514175 0.07 0.02040816 A
+#> 4 rs1558902 0.39 0.02040816 A
+#> 5 rs10968576 0.11 0.02040816 G
+#> 6 rs2241423 0.13 0.02040816 G
+#> other_allele.exposure eaf.exposure pval.exposure units.exposure gene.exposure
+#> 1 T 0.78 5e-26 kg/m2 BDNF
+#> 2 A 0.20 4e-11 kg/m2 CADM2
+#> 3 G 0.43 8e-14 kg/m2 TNNI3K
+#> 4 T 0.42 5e-120 kg/m2 FTO
+#> 5 A 0.31 3e-13 kg/m2 LRRN6C
+#> 6 A 0.78 1e-18 kg/m2 LBXCOR1
+#> samplesize.exposure exposure mr_keep.exposure pval_origin.exposure
+#> 1 225238 BMI TRUE reported
+#> 2 221431 BMI TRUE reported
+#> 3 207641 BMI TRUE reported
+#> 4 222476 BMI TRUE reported
+#> 5 247166 BMI TRUE reported
+#> 6 227886 BMI TRUE reported
+#> units.exposure_dat id.exposure data_source.exposure
+#> 1 kg/m2 ImbABK textfile
+#> 2 kg/m2 ImbABK textfile
+#> 3 kg/m2 ImbABK textfile
+#> 4 kg/m2 ImbABK textfile
+#> 5 kg/m2 ImbABK textfile
+#> 6 kg/m2 ImbABK textfile
The output from this function is a new data frame with standardised +column names:
+SNP
exposure
beta.exposure
se.exposure
effect_allele.exposure
other_allele.exposure
eaf.exposure
mr_keep.exposure
pval.exposure
pval_origin.exposure
id.exposure
data_source.exposure
units.exposure
gene.exposure
samplesize.exposure
The function attempts to match the columns to the ones it expects. It +also checks that the data type is as expected.
+If the required data for MR to be performed is not present (SNP name,
+effect size, standard error, effect allele) for a particular SNP, then
+the column mr_keep.exposure
will be FALSE
.
If the text file does not have default column names, this can still +be read in as follows. Here are the first few rows of an example:
+rsid,effect,SE,a1,a2,a1_freq,p-value,Units,Gene,n
+rs10767664,0.19,0.030612245,A,T,0.78,5.00E-26,kg/m2,BDNF,225238
+rs13078807,0.1,0.020408163,G,A,0.2,4.00E-11,kg/m2,CADM2,221431
+rs1514175,0.07,0.020408163,A,G,0.43,8.00E-14,kg/m2,TNNI3K,207641
+rs1558902,0.39,0.020408163,A,T,0.42,5.00E-120,kg/m2,FTO,222476
+Note that this is a CSV file, with commas separating fields. The file +is located here:
+
+bmi2_file <- system.file("extdata/bmi.csv", package = "TwoSampleMR")
To read in this data:
+
+bmi_exp_dat <- read_exposure_data(
+ filename = bmi2_file,
+ sep = ",",
+ snp_col = "rsid",
+ beta_col = "effect",
+ se_col = "SE",
+ effect_allele_col = "a1",
+ other_allele_col = "a2",
+ eaf_col = "a1_freq",
+ pval_col = "p-value",
+ units_col = "Units",
+ gene_col = "Gene",
+ samplesize_col = "n"
+)
+#> No phenotype name specified, defaulting to 'exposure'.
+head(bmi_exp_dat)
+#> SNP beta.exposure se.exposure effect_allele.exposure
+#> 1 rs10767664 0.19 0.03061224 A
+#> 2 rs13078807 0.10 0.02040816 G
+#> 3 rs1514175 0.07 0.02040816 A
+#> 4 rs1558902 0.39 0.02040816 A
+#> 5 rs10968576 0.11 0.02040816 G
+#> 6 rs2241423 0.13 0.02040816 G
+#> other_allele.exposure eaf.exposure pval.exposure units.exposure gene.exposure
+#> 1 T 0.78 5e-26 kg/m2 BDNF
+#> 2 A 0.20 4e-11 kg/m2 CADM2
+#> 3 G 0.43 8e-14 kg/m2 TNNI3K
+#> 4 T 0.42 5e-120 kg/m2 FTO
+#> 5 A 0.31 3e-13 kg/m2 LRRN6C
+#> 6 A 0.78 1e-18 kg/m2 LBXCOR1
+#> samplesize.exposure exposure mr_keep.exposure pval_origin.exposure
+#> 1 225238 exposure TRUE reported
+#> 2 221431 exposure TRUE reported
+#> 3 207641 exposure TRUE reported
+#> 4 222476 exposure TRUE reported
+#> 5 247166 exposure TRUE reported
+#> 6 227886 exposure TRUE reported
+#> units.exposure_dat id.exposure data_source.exposure
+#> 1 kg/m2 Ku3B84 textfile
+#> 2 kg/m2 Ku3B84 textfile
+#> 3 kg/m2 Ku3B84 textfile
+#> 4 kg/m2 Ku3B84 textfile
+#> 5 kg/m2 Ku3B84 textfile
+#> 6 kg/m2 Ku3B84 textfile
If the Phenotype
column is not provided (as is the case
+in this example) then it will assume that the phenotype’s name is simply
+“exposure”. This is entered in the exposure
column. It can
+be renamed manually:
+bmi_exp_dat$exposure <- "BMI"
If the data already exists as a data frame in R then it can be
+converted into the correct format using the format_data()
+function. For example, here is some randomly created data:
+random_df <- data.frame(
+ SNP = c("rs1", "rs2"),
+ beta = c(1, 2),
+ se = c(1, 2),
+ effect_allele = c("A", "T")
+)
+random_df
+#> SNP beta se effect_allele
+#> 1 rs1 1 1 A
+#> 2 rs2 2 2 T
This can be formatted like so:
+
+random_exp_dat <- format_data(random_df, type = "exposure")
+#> No phenotype name specified, defaulting to 'exposure'.
+#> Warning in format_data(random_df, type = "exposure"): The following columns are not present but are helpful for harmonisation
+#> other_alleleeaf
+#> Inferring p-values
+random_exp_dat
+#> SNP beta.exposure se.exposure effect_allele.exposure exposure
+#> 1 rs1 1 1 A exposure
+#> 2 rs2 2 2 T exposure
+#> mr_keep.exposure pval.exposure pval_origin.exposure id.exposure
+#> 1 TRUE 0.3173105 inferred 4nec1Z
+#> 2 TRUE 0.3173105 inferred 4nec1Z
+#> other_allele.exposure eaf.exposure
+#> 1 NA NA
+#> 2 NA NA
A number of sources of instruments have already been curated and are
+available for use. They are provided as data objects in the
+MRInstruments
package. To install:
+remotes::install_github("MRCIEU/MRInstruments")
This package contains a number of data.frames, each of which is a +repository of SNP-trait associations. How to access the data frames is +detailed below:
+The NHGRI-EBI GWAS catalog contains a catalog of significant +associations obtained from GWASs. This version of the data is filtered +and harmonised to contain associations that have the required data to +perform MR, to ensure that the units used to report effect sizes from a +particular study are all the same, and other data cleaning +operations.
+To use the GWAS catalog:
+
+library(MRInstruments)
+data(gwas_catalog)
+head(gwas_catalog)
+#> Phenotype_simple
+#> 1 Eosinophil percentage of white cells
+#> 2 Eosinophil counts
+#> 3 Medication use (agents acting on the renin-angiotensin system)
+#> 4 Post bronchodilator FEV1
+#> 5 DNA methylation variation (age effect)
+#> 6 Ankylosing spondylitis
+#> MAPPED_TRAIT_EFO
+#> 1 eosinophil percentage of leukocytes
+#> 2 eosinophil count
+#> 3 Agents acting on the renin-angiotensin system use measurement
+#> 4 forced expiratory volume, response to bronchodilator
+#> 5 DNA methylation
+#> 6 ankylosing spondylitis
+#> MAPPED_TRAIT_EFO_URI
+#> 1 http://www.ebi.ac.uk/efo/EFO_0007991
+#> 2 http://www.ebi.ac.uk/efo/EFO_0004842
+#> 3 http://www.ebi.ac.uk/efo/EFO_0009931
+#> 4 http://www.ebi.ac.uk/efo/EFO_0004314, http://purl.obolibrary.org/obo/GO_0097366
+#> 5 http://purl.obolibrary.org/obo/GO_0006306
+#> 6 http://www.ebi.ac.uk/efo/EFO_0003898
+#> Initial_sample_description
+#> 1 172,378 European ancestry individuals
+#> 2 172,275 European ancestry individuals
+#> 3 62,752 European ancestry cases, 174,778 European ancestry controls
+#> 4 10,094 European ancestry current and former smoker individuals, 3,260 African American current and former smoker individuals, 178 current and former smoker individuals
+#> 5 Up to 954 individuals
+#> 6 921 Turkish ancestry cases, 907 Turkish ancestry controls, 422 Iranian ancestry cases, 754 Iranian ancestry controls
+#> Replication_sample_description STUDY.ACCESSION
+#> 1 <NA> GCST004600
+#> 2 <NA> GCST004606
+#> 3 <NA> GCST007930
+#> 4 <NA> GCST003262
+#> 5 <NA> GCST006660
+#> 6 <NA> GCST007844
+#> Phenotype Phenotype_info
+#> 1 Eosinophil percentage of white cells
+#> 2 Eosinophil counts
+#> 3 Medication use (agents acting on the renin-angiotensin system)
+#> 4 Post bronchodilator FEV1
+#> 5 DNA methylation variation (age effect)
+#> 6 Ankylosing spondylitis
+#> PubmedID Author Year SNP chr bp_ens_GRCh38 Region gene
+#> 1 27863252 Astle WJ 2016 rs1000005 21 33060745 21q22.11 AP000282.2
+#> 2 27863252 Astle WJ 2016 rs1000005 21 33060745 21q22.11 AP000282.2
+#> 3 31015401 Wu Y 2019 rs1000010 3 11562645 3p25.3 VGLL4
+#> 4 26634245 Lutz SM 2015 rs10000225 4 144312789 4q31.21 Intergenic
+#> 5 30348214 Zhang Q 2018 rs10000513 4 160334994 4q32.1 NR
+#> 6 30946743 Li Z 2019 rs10000518 4 11502867 4p15.33 HS3ST1
+#> Gene_ens effect_allele other_allele beta se pval
+#> 1 AP000282.2,LINC00945 C G -0.02652552 0.003826531 2e-13
+#> 2 AP000282.2,LINC00945 C G -0.02481715 0.003571429 7e-12
+#> 3 G A -0.03724189 0.006377551 6e-09
+#> 4 Intergenic A T -0.04400000 0.009420188 3e-06
+#> 5 NR <NA> <NA> NA NA 4e-08
+#> 6 G A 0.73396926 NA 6e-06
+#> units eaf date_added_to_MRBASE
+#> 1 unit decrease 0.589400 2019-08-29
+#> 2 unit decrease 0.589400 2019-08-29
+#> 3 unit decrease 0.351806 2019-08-29
+#> 4 NR unit decrease 0.350000 2019-08-29
+#> 5 <NA> NA 2019-08-29
+#> 6 <NA> NA 2019-08-29
For example, to obtain instruments for body mass index using the +Speliotes et al 2010 study:
+
+bmi_gwas <-
+ subset(gwas_catalog,
+ grepl("Speliotes", Author) &
+ Phenotype == "Body mass index")
+bmi_exp_dat <- format_data(bmi_gwas)
Independent top hits from GWASs on 121 metabolites in whole blood are
+stored in the metab_qtls
data object. Use
+?metab_qtls
to get more information.
+data(metab_qtls)
+head(metab_qtls)
+#> phenotype chromosome position SNP effect_allele other_allele eaf
+#> 1 AcAce 8 9181395 rs2169387 G A 0.870251
+#> 2 AcAce 11 116648917 rs964184 C G 0.857715
+#> 3 Ace 6 12042473 rs6933521 C T 0.120256
+#> 4 Ala 2 27730940 rs1260326 C T 0.638817
+#> 5 Ala 2 65220910 rs2160387 C T 0.403170
+#> 6 Ala 12 47201814 rs4554975 G A 0.644059
+#> beta se pval n_studies n
+#> 1 0.085630 0.015451 3.61e-08 11 19257
+#> 2 -0.096027 0.014624 6.71e-11 11 19261
+#> 3 -0.091667 0.015885 8.10e-09 14 24742
+#> 4 -0.104582 0.009940 7.40e-26 13 22569
+#> 5 -0.071001 0.009603 1.49e-13 14 24793
+#> 6 -0.069135 0.009598 6.12e-13 14 24792
For example, to obtain instruments for Alanine:
+
+ala_exp_dat <- format_metab_qtls(subset(metab_qtls, phenotype == "Ala"))
Independent top hits from GWASs on 47 protein levels in whole blood
+are stored in the proteomic_qtls
data object. Use
+?proteomic_qtls
to get more information.
+data(proteomic_qtls)
+head(proteomic_qtls)
+#> analyte chr position SNP gene location annotation other_allele
+#> 1 CFHR1 1 196698945 rs12144939 CFH cis missense T
+#> 2 IL6r 1 154425456 rs12126142 IL6R cis missense A
+#> 3 ApoA4 11 116677723 rs1263167 APOA4 cis intergenic G
+#> 4 SELE 9 136149399 rs507666 ABO trans intronic A
+#> 5 FetuinA 3 186335941 rs2070633 AHSG cis missense T
+#> 6 ACE 17 61566031 rs4343 ACE cis synonymous A
+#> effect_allele eaf maf pval beta se
+#> 1 G 0.643 0.357 8.99e-143 -1.108 0.04355258
+#> 2 G 0.608 0.392 1.81e-106 0.850 0.03878364
+#> 3 A 0.803 0.197 2.64e-54 -0.919 0.05922332
+#> 4 G 0.809 0.191 1.01e-52 -0.882 0.05771545
+#> 5 C 0.676 0.324 2.88e-44 -0.629 0.04506925
+#> 6 G 0.508 0.492 6.66e-44 0.493 0.03547679
For example, to obtain instruments for the ApoH protein:
+
+apoh_exp_dat <-
+ format_proteomic_qtls(subset(proteomic_qtls, analyte == "ApoH"))
Independent top hits from GWASs on 32432 gene identifiers and in 44
+tissues are available from the GTEX study in gtex_eqtl
. Use
+?gtex_eqtl
to get more information.
+data(gtex_eqtl)
+head(gtex_eqtl)
+#> tissue gene_name gene_start SNP snp_position
+#> 1 Adipose Subcutaneous RP4-669L17.10 1:317720 rs2519065 1:787151
+#> 2 Adipose Subcutaneous RP11-206L10.1 1:661611 rs11804171 1:723819
+#> 3 Adipose Subcutaneous RP11-206L10.3 1:677193 rs149110718 1:759227
+#> 4 Adipose Subcutaneous RP11-206L10.2 1:700306 rs148649543 1:752796
+#> 5 Adipose Subcutaneous RP11-206L10.9 1:714150 rs12184279 1:717485
+#> 6 Adipose Subcutaneous RP11-206L10.8 1:736259 rs10454454 1:754954
+#> effect_allele other_allele beta se pval n
+#> 1 A G 0.551788 0.0747180 2.14627e-12 298
+#> 2 A T -0.917475 0.1150060 4.99967e-14 298
+#> 3 T C 0.807571 0.1776530 8.44694e-06 298
+#> 4 T C 0.745393 0.0958531 1.82660e-13 298
+#> 5 A C 1.927250 0.2247390 9.55098e-16 298
+#> 6 A G 1.000400 0.1787470 5.61079e-08 298
For example, to obtain instruments for the IRAK1BP1 gene expression +levels in subcutaneous adipose tissue:
+
+irak1bp1_exp_dat <-
+ format_gtex_eqtl(subset(
+ gtex_eqtl,
+ gene_name == "IRAK1BP1" & tissue == "Adipose Subcutaneous"
+ ))
+#> Warning in format_data(gtex_eqtl_subset, type = type, phenotype_col = type, : The following columns are not present but are helpful for harmonisation
+#> eaf
+#> Inferring p-values
Independent top hits from GWASs on 0 DNA methylation levels in whole
+blood across 5 time points are available from the ARIES study in
+aries_mqtl
. Use ?aries_mqtl
to get more
+information.
+data(aries_mqtl)
+head(aries_mqtl)
+#> SNP timepoint cpg beta pval se snp_chr
+#> 1 esv2656832 1 cg21826606 0.3459 1.60408e-26 0.03265336 1
+#> 2 esv2658098 1 cg22681495 -0.6263 1.55765e-66 0.03643240 15
+#> 3 esv2660043 1 cg24276624 -0.5772 3.16370e-26 0.05481823 11
+#> 4 esv2660043 1 cg11157765 -0.5423 1.33928e-22 0.05583777 11
+#> 5 esv2660673 1 cg05832925 -0.5919 2.88011e-50 0.03982467 11
+#> 6 esv2660769 1 cg05859533 -0.6224 1.49085e-58 0.03868158 16
+#> snp_pos effect_allele other_allele eaf sex age units
+#> 1 25591901 I R 0.3974 mixed Birth SD units
+#> 2 86057007 D R 0.2076 mixed Birth SD units
+#> 3 69982552 D R 0.1450 mixed Birth SD units
+#> 4 69982552 D R 0.1450 mixed Birth SD units
+#> 5 74024905 D R 0.1671 mixed Birth SD units
+#> 6 57725395 D R 0.2136 mixed Birth SD units
+#> island_location cpg_chr cpg_pos gene gene_location cis_trans
+#> 1 N_Shore 1 25593055 cis
+#> 2 15 86058755 AKAP13 Body cis
+#> 3 11 69982941 ANO1 Body cis
+#> 4 11 69982996 ANO1 Body cis
+#> 5 S_Shelf 11 74026371 cis
+#> 6 16 57727230 CCDC135 TSS1500 cis
For example, to obtain instruments for cg25212131 CpG DNA methylation +levels in at birth:
+
+cg25212131_exp_dat <-
+ format_aries_mqtl(subset(aries_mqtl, cpg == "cg25212131" &
+ age == "Birth"))
The IEU OpenGWAS database contains the entire summary statistics for +thousands of GWASs. You can browse them here: https://gwas.mrcieu.ac.uk/
+You can use this database to define the instruments for a particular +exposure. You can also use this database to obtain the effects for +constructing polygenic risk scores using different p-value +thresholds.
+You can check the status of the API:
+
+ieugwasr::api_status()
To obtain a list and details about the available GWASs do the +following:
+
+ao <- available_outcomes()
+head(ao)
For information about authentication see https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication.
+The available_outcomes()
function returns a table of all
+the available studies in the database. Each study has a unique ID. e.g.,
+You might obtain
+head(subset(ao, select = c(trait, id)))
+#> trait id
+#> 1 Schizophrenia ieu-b-5103
+#> 2 Schizophrenia ieu-b-5102
+#> 3 Schizophrenia ieu-b-5101
+#> 4 Schizophrenia ieu-b-5100
+#> 5 Schizophrenia ieu-b-5099
+#> 6 Schizophrenia ieu-b-5098
To extract instruments for a particular trait using a particular +study, for example to obtain SNPs for body mass index using the Locke et +al. 2015 GIANT study, you specify the study ID as follows:
+
+bmi2014_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')
+str(bmi2014_exp_dat)
+#> 'data.frame': 79 obs. of 15 variables:
+#> $ pval.exposure : num 2.18e-08 4.57e-11 5.06e-14 5.45e-10 1.88e-28 ...
+#> $ samplesize.exposure : num 339152 339065 313621 338768 338123 ...
+#> $ chr.exposure : chr "1" "1" "1" "1" ...
+#> $ se.exposure : num 0.003 0.0031 0.0087 0.0029 0.003 0.0037 0.0031 0.003 0.0038 0.003 ...
+#> $ beta.exposure : num -0.0168 0.0201 0.0659 0.0181 0.0331 0.0497 -0.0227 0.0221 0.0209 0.0175 ...
+#> $ pos.exposure : int 47684677 78048331 110082886 201784287 72837239 177889480 49589847 96924097 164567689 181550962 ...
+#> $ id.exposure : chr "ieu-a-2" "ieu-a-2" "ieu-a-2" "ieu-a-2" ...
+#> $ SNP : chr "rs977747" "rs17381664" "rs7550711" "rs2820292" ...
+#> $ effect_allele.exposure: chr "G" "C" "T" "C" ...
+#> $ other_allele.exposure : chr "T" "T" "C" "A" ...
+#> $ eaf.exposure : num 0.5333 0.425 0.0339 0.5083 0.6083 ...
+#> $ exposure : chr "Body mass index || id:ieu-a-2" "Body mass index || id:ieu-a-2" "Body mass index || id:ieu-a-2" "Body mass index || id:ieu-a-2" ...
+#> $ mr_keep.exposure : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
+#> $ pval_origin.exposure : chr "reported" "reported" "reported" "reported" ...
+#> $ data_source.exposure : chr "igd" "igd" "igd" "igd" ...
This returns a set of LD clumped SNPs that are GWAS significant for +BMI. You can specify various parameters for this function:
+p1
= P-value threshold for keeping a SNPclump
= Whether or not to return independent SNPs only
+(default is TRUE
)r2
= The maximum LD R-square allowed between returned
+SNPskb
= The distance in which to search for LD R-square
+valuesBy changing changing the p1
parameter it is possible to
+obtain SNP effects for constructing polygenic risk scores.
For standard two sample MR it is important to ensure that the +instruments for the exposure are independent. Once instruments have been +identified for an exposure variable, the IEU OpenGWAS database can be +used to perform clumping.
+You can provide a list of SNP IDs, the SNPs will be extracted from +1000 genomes data, LD calculated between them, and amongst those SNPs +that have LD R-square above the specified threshold only the SNP with +the lowest P-value will be retained. To do this, use the following +command:
+
+bmi_exp_dat <- clump_data(bmi2014_exp_dat)
+str(bmi_exp_dat)
+#> 'data.frame': 30 obs. of 16 variables:
+#> $ SNP : chr "rs10767664" "rs13078807" "rs1514175" "rs1558902" ...
+#> $ beta.exposure : num 0.19 0.1 0.07 0.39 0.11 0.13 0.06 0.09 0.13 0.06 ...
+#> $ se.exposure : num 0.0306 0.0204 0.0204 0.0204 0.0204 ...
+#> $ effect_allele.exposure: chr "A" "G" "A" "A" ...
+#> $ other_allele.exposure : chr "T" "A" "G" "T" ...
+#> $ eaf.exposure : num 0.78 0.2 0.43 0.42 0.31 0.78 0.41 0.24 0.21 0.21 ...
+#> $ pval.exposure : num 5e-26 4e-11 8e-14 5e-120 3e-13 ...
+#> $ units.exposure : chr "kg/m2" "kg/m2" "kg/m2" "kg/m2" ...
+#> $ gene.exposure : chr "BDNF" "CADM2" "TNNI3K" "FTO" ...
+#> $ samplesize.exposure : int 225238 221431 207641 222476 247166 227886 209051 218439 209849 220081 ...
+#> $ exposure : chr "BMI" "BMI" "BMI" "BMI" ...
+#> $ mr_keep.exposure : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
+#> $ pval_origin.exposure : chr "reported" "reported" "reported" "reported" ...
+#> $ units.exposure_dat : chr "kg/m2" "kg/m2" "kg/m2" "kg/m2" ...
+#> $ id.exposure : chr "FXhiAH" "FXhiAH" "FXhiAH" "FXhiAH" ...
+#> $ data_source.exposure : chr "textfile" "textfile" "textfile" "textfile" ...
The clump_data()
function takes any data frame that has
+been formatted to be an exposure data type of data frame. Note that for
+the instruments in the MRInstruments package the SNPs are already LD
+clumped.
Note: The LD reference panel only includes SNPs (no +INDELs). There are five super-populations from which LD can be +calculated, by default European samples are used. Only SNPs with MAF +> 0.01 within-population are available.
+NOTE: If a variant is dropped from your unclumped +data it could be because it is absent from the reference panel. For more +flexibility, including using your own LD reference data, see here: https://mrcieu.github.io/ieugwasr/
+vignettes/gwas2020.Rmd
+ gwas2020.Rmd
This document details changes and new features specifically relating +to the TwoSampleMR R package and the GWAS database behind it.
+We have made a new system for naming datasets, and all datasets are
+organised into data batches. Either new datasets are uploaded one at a
+time in which case they are added to the ieu-a
data batch,
+or there is a bulk upload in which case a new batch is created. For
+example, ukb-a
is a bulk upload of the first round of the
+Neale lab UKBiobank GWAS, and ukb-b
is the IEU GWAS
+analysis of the UKBiobank data. In most cases, a dataset is then
+numbered arbitrarily within the batch. For example, the Locke et al 2014
+BMI analysis was previously known as 2
, and it is now known
+as ieu-a-2
.
There is backward compatibility built into the R packages that access +the data, so if you use an ‘old’ ID, it will automatically translate +that to the new one. But it will give you a warning, and we urge you to +update your scripts to reflect this change.
+Previously you would automatically be asked to authenticate any query +to the database, through google. Now, we are making authentication +voluntary - something that you do at the start of a session only if you +need access to specific private datasets on the database. For the vast +majority of use cases this is not required.
+Another change is that the R package that managed the authentication +has updated, and the file tokens generated are slightly different. For +full information on how to deal with this, see here: https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication
+We conducted a large GWAS analysis using a pipeline that +systematically analysed every PHESANT phenotype +in UK Biobank. There were previously ~20k traits with complete GWAS +data, but a majority of these were binary traits based on very few +numbers of cases. We have now filtered out unreliable datasets, there +are 2514 traits remaining, with any binary traits removed that had fewer +than 1000 cases. Another issue is the combination of small numbers of +cases and allele frequency - here minor allele count (MAC) for a +particular association could be very small which would lead to high +false positives when using Bolt-LMM. The remaining traits have been +filtered to only retain associations where the MAC > 90.
+Document detailing this investigation here: https://htmlpreview.github.io/?https://raw.githubusercontent.com/MRCIEU/ukbb-gwas-analysis/master/docs/ldsc_clumped_analysis.html?token=AAOV6TBQXEXEPT7SUXXLWMC6DWP3O
+Previously the data were QC’d to remove malformed results and then +deposited as we found them. We are now also pre-harmonising all the +data. This means that all alleles are coded on the forward strand, and +the non-effect allele is always aligned to the human genome reference +sequence B37 (so the effect allele is the non-reference allele). This +does mean that sometimes variants have been removed if they did not map +to the human genome, and for most datasets the effect allele has been +switched for approximately half of all sites. When an effect allele +changes we do of course switch the sign of the effect size, so it should +not impact any MR results.
+We have updated the LD reference panel to be harmonised against human +genome build 37, and as a consequence a few variants have been lost from +the version that was previously used.
+Previously we were pre-clumping the tophits and storing them in the +MRInstruments R package, and there was often a delay in updating the +MRInstruments R package after new datasets were uploaded to the +database. We have moved away from this model. Everything dataset is +pre-clumped, but that is stored in the database. If you request default +clumping values when extracting the tophits of a dataset, it will still +be fast but it is retrieving the data from the server, and not from the +MRInstruments package. You can continue to use the MRInstruments package +for GWAS hits from e.g. GTEx or the EBI GWAS catalog.
+We have a new home for the GWAS summary data: https://gwas.mrcieu.ac.uk/.
+All variants have been mapped to chromosome and position
+(hg19/build37). You can query based on chromosome position coordinates.
+This means either a list of <chr:pos>
values, or a
+list of <chr:pos1-pos2>
ranges.
Previously we were excluding these, but they are now retained. Be +warned that if you extract a variant that has multiple alleles then you +may get more than one row for that variant.
+Automated download from the EBI repository, and an automated upload +system and batch data processing system means that more data can be +added faster to keep the database current.
+Previously if a query to the database failed it didn’t give a reason, +hopefully there is more clarity regarding what is happening now. You can +also check the status of the server here: https://api.opengwas.io/api/
+We are trying to make it as flexible as possible to access the data. +The TwoSampleMR R package was previously the only programmatic way to +access the data, now we have the following options:
+It is now possible to perform clumping, or create LD matrices, using +your own local LD reference dataset. You can download the one that we +have been using here: https://github.com/mrcieu/gwasglue#reference-datasets, +or create your own plink format dataset e.g. with larger samples or for +different ancestries. See the LD clumping functions in the ieugwasr package for more +details.
+Previously the data was only accessible through the database. Now the +data can be downloaded in “GWAS VCF” format from here https://gwas.mrcieu.ac.uk/. (IEU members can access all +the data on RDSF or bluecrystal4 directly). This means that if you want +to perform very large or numerous operations, you can do it on HPC or +locally in a more performant manner by using the data files directly. +Please see the gwasvcf R +package on how to work with these data.
+Either the data in the database, or the GWAS VCF files, can be +queried and the results translated into the formats for a bunch of +different R packages for MR, colocalisation, fine mapping, etc. Have a +look at the gwasglue R +package, to see what is available and how to do this. It’s still +under construction, but feel free to try it, make suggestions, and +contribute code.
+We have setup a github issues page here: https://github.com/MRCIEU/opengwas-requests/issues
+Please visit here to make a log of new data requests, or to +contribute new data.
+To install the new version of TwoSampleMR, perform as normal:
+
+install.packages("remotes")
+remotes::install_github("MRCIEU/TwoSampleMR")
To update the package just run the
+remotes::install_github("MRCIEU/TwoSampleMR")
command
+again.
We recommend using this new version going forwards but for a limited +time we are enabling backwards compatibility, in case you are in the +middle of analysis or need to reproduce old analysis. In order to use +the legacy version of the package and the database, install using:
+
+install.packages("remotes")
+remotes::install_github("MRCIEU/TwoSampleMR@0.4.26")
The exposure data and outcome data are now obtained, e.g.:
+
+bmi_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')
+chd_out_dat <- extract_outcome_data(snps = bmi_exp_dat$SNP, outcomes = 'ieu-a-7')
but it is important to harmonise the effects. This means that the +effect of a SNP on the exposure and the effect of that SNP on the +outcome must each correspond to the same allele.
+Note: The IEU GWAS database contains data that is +already harmonised, meaning that the non-effect allele is aligned to the +human genome reference sequence (build 37). It’s still recommended to +harmonise, but in principle everything should be on the forward strand +and effect alleles always relating to the same allele. Some +discrepancies could arise if there are multi-allelic variants that are +represented as different bi-allelic variants in different studies.
+To harmonise the exposure and outcome data, do the following:
+
+dat <- harmonise_data(
+ exposure_dat = bmi_exp_dat,
+ outcome_dat = chd_out_dat
+)
+#> Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
This creates a new data frame that has the exposure data and outcome +data combined.
+If there were 3 exposure traits and 3 outcome traits then there will +be 9 sets of harmonisations being performed - harmonising the SNP +effects of exposure trait 1 against outcome trait 1; exposure trait 1 +against outcome trait 2; and so on.
+Recent GWASs typically present the effects of a SNP in reference to +the allele on the forward strand. But as reference panels are updated +the forward strand sometimes changes, and GWASs from a few years ago +aren’t guaranteed to be using forward strand conventions.
+Some examples are shown below:
+exposure effect = 0.5
+effect allele = A
+other allele = G
+
+outcome effect = 0.05
+effect allele = A
+other allele = G
+Here the effect allele on the exposure and the outcome is the +same
+exposure effect = 0.5
+effect allele = A
+other allele = G
+
+outcome effect = -0.05
+effect allele = C
+other allele = T
+Here the outcome GWAS is presenting the effect for the alternate +allele on the reverse strand. We need to flip the outcome effect to 0.05 +to correspond to the same allele as the exposure GWAS on the forward +strand.
+exposure effect = 0.5
+effect allele = A
+other allele = G
+
+outcome effect = -0.05
+effect allele = A
+other allele = C
+Here the alleles do not correspond for the same SNP, so this SNP will +be discarded from the analysis.
+exposure effect = 0.5
+effect allele = A
+other allele = T
+effect allele frequency = 0.11
+
+outcome effect = -0.05
+effect allele = A
+other allele = T
+effect allele frequency = 0.91
+Here the alleles correspond, but it is a palindromic SNP, such that +the alleles on the forward strand are the same as on the reverse strand +(A/T on forward is T/A on the reverse). However, the allele frequency of +the effect allele gives us information - if the outcome effect allele +(A) were on the forward strand we would expect it to have a low allele +frequency, but given it has a high frequency (0.91) we infer that the +outcome GWAS is presenting the effect on the reverse strand for the +alternative allele. We would flip the effect to 0.05 for the outcome +GWAS.
+exposure effect = 0.5
+effect allele = A
+other allele = T
+effect allele frequency = 0.50
+
+outcome effect = -0.05
+effect allele = A
+other allele = T
+effect allele frequency = 0.50
+This is similar to the above, except the allele frequency no longer +gives us information about the strand. We would discard this SNP. This +is done for any palindromic SNPs that have minor allele frequency above +0.42.
+There are three options to harmonising the data.
+By default, the harmonise_data
function uses option 2,
+but this can be modified using the action
argument,
+e.g. harmonise_data(exposure_dat, outcome_dat, action = 3)
.
After data harmonisation, users may find that their dataset contains +duplicate exposure-outcome summary sets. This can arise, for example, +when a GWAS consortium has released multiple results from separate GWAS +analyses for the same trait. For example, there are multiple GWAS +summary datasets for body mass index and coronary heart disease:
+
+ao <- available_outcomes()
+ao[ao$trait == "Body mass index", c("trait", "id", "pmid", "author", "sample_size", "nsnp")]
+#> trait id pmid author
+#> 3958 Body mass index ebi-a-GCST90103751 35051171 Wong HS
+#> 4015 Body mass index ebi-a-GCST90095039 35399580 Fern<U+00E1>ndez-Rhodes L
+#> 4020 Body mass index ebi-a-GCST90095034 35399580 Fern<U+00E1>ndez-Rhodes L
+#> 6032 Body mass index ebi-a-GCST90029007 29892013 Loh PR
+#> 6821 Body mass index ebi-a-GCST90025994 34226706 Barton AR
+#> 7045 Body mass index ebi-a-GCST90018947 34594039 Sakaue S
+#> 7259 Body mass index ebi-a-GCST90018727 34594039 Sakaue S
+#> 10738 Body mass index ieu-a-94 23754948 Randall JC
+#> 12717 Body mass index ieu-a-2 25673413 Locke AE
+#> 14254 Body mass index ieu-a-95 23754948 Randall JC
+#> 16676 Body mass index ieu-a-974 25673413 Locke AE
+#> 19343 Body mass index bbj-a-3 28892062 Ishigaki K
+#> 26475 Body mass index ebi-a-GCST006368 30108127 Hoffmann TJ
+#> 28065 Body mass index ieu-b-4815 NA Howe LJ
+#> 28419 Body mass index bbj-a-2 28892062 Ishigaki K
+#> 32371 Body mass index ieu-b-4816 NA Howe LJ
+#> 33869 Body mass index ieu-a-785 25673413 Locke AE
+#> 39217 Body mass index ebi-a-GCST002783 25673413 Locke AE
+#> 40065 Body mass index bbj-a-1 28892062 Ishigaki K
+#> 43262 Body mass index ebi-a-GCST004904 28892062 Akiyama M
+#> 43743 Body mass index ebi-a-GCST006802 26961502 Wood AR
+#> 47894 Body mass index ieu-a-835 25673413 Locke AE
+#> 48917 Body mass index ebi-a-GCST008025 31217584 Wojcik GL
+#> 49208 Body mass index ieu-a-1089 26961502 Wood
+#> sample_size nsnp
+#> 3958 21930 6370138
+#> 4015 330793 2401077
+#> 4020 56161 8764141
+#> 6032 532396 11973091
+#> 6821 457756 4238669
+#> 7045 359983 19066885
+#> 7259 163835 12502877
+#> 10738 60586 2736876
+#> 12717 339224 2555511
+#> 14254 73137 2736876
+#> 16676 171977 2494613
+#> 19343 72390 6108953
+#> 26475 315347 27854527
+#> 28065 51852 NA
+#> 28419 85894 6108953
+#> 32371 99998 7191606
+#> 33869 152893 2477659
+#> 39217 236781 2529499
+#> 40065 158284 5961600
+#> 43262 158284 5952516
+#> 43743 119688 8580466
+#> 47894 322154 2554668
+#> 48917 21955 34343880
+#> 49208 120286 8654252
+ao[ao$trait == "Coronary heart disease", c("trait", "id", "pmid", "author", "ncase", "ncontrol", "nsnp")]
+#> trait id pmid author ncase
+#> 14897 Coronary heart disease ieu-a-7 26343387 Nikpay 60801
+#> 23614 Coronary heart disease ieu-a-9 23202125 Deloukas 63746
+#> 27414 Coronary heart disease ebi-a-GCST000998 21378990 Schunkert H 22233
+#> 38602 Coronary heart disease ieu-a-8 21378990 Schunkert H 22233
+#> 45294 Coronary heart disease ieu-a-6 21378988 Peden 15420
+#> ncontrol nsnp
+#> 14897 123504 9455779
+#> 23614 130681 79129
+#> 27414 64762 2415020
+#> 38602 64762 2420361
+#> 45294 15062 540233
There are therefore multiple potential combinations of body mass +index and coronary heart disease, which would likely lead to duplicate +MR analyses. We recommend that users prune their datasets so that only +the exposure-outcome combination with the highested expected power is +retained. This can be done by selecting the exposure-outcome summary set +with the largest sample size for the outcome, using the power_prune +function:
+
+dat <- power_prune(dat, method = 1, dist.outcome = "binary")
This drops the duplicate exposure-outcome sets with the smaller +outcome sample size (number of cases for binary outcomes). Remaining +duplicates are then dropped on the basis of the exposure sample size. +However, if there are a large number of SNPs available to instrument an +exposure, the outcome GWAS with the better SNP coverage may provide +better power than the outcome GWAS with the larger sample size. This can +occur, for example, if the larger outcome GWAS has used a targeted +genotyping array. In such instances, it may be better to prune studies +on the basis of instrument strength (i.e. variation in exposure +explained by the instrumental SNPs) as well as sample size. This can be +done by setting the method argument to 2:
+
+dat <- power_prune(dat, method = 2, dist.outcome = "binary")
This procedure drops duplicate exposure-outcome sets on the basis of +instrument strength and sample size, and assumes that the SNP-exposure +effects correspond to a continuous trait with a normal distribution +(i.e. exposure should not be binary). The SNP-outcome effects can +correspond to either a binary or continuous trait (default behaviour is +to assume a binary distribution). If the exposure is binary then method +1 should be used.
+The following pages provide detailed illustrations of the TwoSampleMR package.
Two sample Mendelian randomisation (2SMR) is a method to estimate the +causal effect of an exposure on an outcome using only summary statistics +from genome wide association studies (GWAS). Though conceptually +straightforward, there are a number of steps that are required to +perform the analysis properly, and they can be cumbersome. The +TwoSampleMR package aims to make this easy by combining three important +components
+The general principles (G. Davey Smith and +Ebrahim 2003; George Davey Smith and Hemani 2014), and +statistical methods (Pierce and Burgess 2013; +Bowden, Davey Smith, and Burgess 2015) can be found elsewhere, +here we will just outline how to use the R package.
+This package uses the ieugwasr package to +connect to the database of thousands of complete GWAS summary data.
+To install directly from the GitHub repository do the following:
+
+library(remotes)
+install_github("MRCIEU/TwoSampleMR")
If you don’t have the remotes
package install it from
+CRAN using install.packages("remotes")
.
The workflow for performing MR is as follows:
+A diagrammatic overview is shown here:
+A basic analysis, e.g. the causal effect of body mass index on +coronary heart disease, looks like this:
+
+library(TwoSampleMR)
+
+# List available GWASs
+ao <- available_outcomes()
+
+# Get instruments
+exposure_dat <- extract_instruments("ieu-a-2")
+
+# Get effects of instruments on outcome
+outcome_dat <- extract_outcome_data(snps=exposure_dat$SNP, outcomes = "ieu-a-7")
+
+# Harmonise the exposure and outcome data
+dat <- harmonise_data(exposure_dat, outcome_dat)
+
+# Perform MR
+res <- mr(dat)
Each step is documented on other pages in the documentation.
+The statistical methods in TwoSampleMR can be used on any data, but +there are a number of functions that connect to the OpenGWAS database +for data extraction. These OpenGWAS data access functions require +authentication.
+Authentication is changing The main differences are +that:
+Detailed information is given here: https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication.
+Once instruments for the exposure trait have been specified, those +variants need to be extracted from the outcome trait.
+The IEU GWAS database (IGD) contains complete GWAS summary statistics +from a large number of studies. You can browse them here:
+ +To obtain details about the available GWASs programmatically do the +following:
+
+ao <- available_outcomes()
+head(ao)
+#> id trait ncase group_name year author consortium
+#> 1 ieu-b-5103 Schizophrenia 1234 public 2022 Trubetskoy V PGC
+#> 2 ieu-b-5102 Schizophrenia 52017 public 2022 Trubetskoy V PGC
+#> 3 ieu-b-5101 Schizophrenia 12305 public 2022 Trubetskoy V PGC
+#> 4 ieu-b-5100 Schizophrenia 64322 public 2022 Trubetskoy V PGC
+#> 5 ieu-b-5099 Schizophrenia 76755 public 2022 Trubetskoy V PGC
+#> 6 ieu-b-5098 Schizophrenia 5998 public 2022 Trubetskoy V PGC
+#> sex pmid population unit
+#> 1 Males and Females 35396580 Hispanic or Latin American logOR
+#> 2 Males and Females 35396580 European logOR
+#> 3 Males and Females 35396580 East Asian logOR
+#> 4 Males and Females 35396580 Mixed logOR
+#> 5 Males and Females 35396580 Mixed logOR
+#> 6 Males and Females 35396580 African American or Afro-Caribbean logOR
+#> sample_size build ncontrol category subcategory ontology
+#> 1 4324 HG19/GRCh37 3090 Disease NA MONDO:0005090
+#> 2 127906 HG19/GRCh37 75889 Disease NA MONDO:0005090
+#> 3 27363 HG19/GRCh37 15058 Disease NA MONDO:0005090
+#> 4 155269 HG19/GRCh37 90947 Disease NA MONDO:0005090
+#> 5 320404 HG19/GRCh37 243649 Disease NA MONDO:0005090
+#> 6 9824 HG19/GRCh37 3826 Disease NA MONDO:0005090
+#> note mr
+#> 1 <NA> NA
+#> 2 <NA> NA
+#> 3 <NA> NA
+#> 4 Core - East Asian and European meta analysis NA
+#> 5 Primary - meta analysis of Eur, East Asian, African American and Latino NA
+#> 6 <NA> NA
+#> nsnp doi coverage study_design priority sd
+#> 1 NA <NA> <NA> <NA> NA NA
+#> 2 NA <NA> <NA> <NA> NA NA
+#> 3 NA <NA> <NA> <NA> NA NA
+#> 4 NA <NA> <NA> <NA> NA NA
+#> 5 NA <NA> <NA> <NA> NA NA
+#> 6 NA <NA> <NA> <NA> NA NA
For information about authentication see https://mrcieu.github.io/ieugwasr/articles/guide.html#authentication.
+The available_outcomes
function returns a table of all
+the available studies in the database. Each study has a unique ID.
+e.g.
If we want to perform MR of BMI against coronary heart disease, we +need to identify the SNPs that influence the BMI, and then extract those +SNPs from a GWAS on coronary heart disease.
+Let’s get the Locke et al 2014 instruments for BMI as an example:
+
+bmi_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')
+head(bmi_exp_dat)
+#> pval.exposure samplesize.exposure chr.exposure se.exposure beta.exposure
+#> 1 2.18198e-08 339152 1 0.0030 -0.0168
+#> 2 4.56773e-11 339065 1 0.0031 0.0201
+#> 3 5.05941e-14 313621 1 0.0087 0.0659
+#> 4 5.45205e-10 338768 1 0.0029 0.0181
+#> 5 1.88018e-28 338123 1 0.0030 0.0331
+#> 6 2.28718e-40 339078 1 0.0037 0.0497
+#> pos.exposure id.exposure SNP effect_allele.exposure
+#> 1 47684677 ieu-a-2 rs977747 G
+#> 2 78048331 ieu-a-2 rs17381664 C
+#> 3 110082886 ieu-a-2 rs7550711 T
+#> 4 201784287 ieu-a-2 rs2820292 C
+#> 5 72837239 ieu-a-2 rs7531118 C
+#> 6 177889480 ieu-a-2 rs543874 G
+#> other_allele.exposure eaf.exposure exposure
+#> 1 T 0.5333 Body mass index || id:ieu-a-2
+#> 2 T 0.4250 Body mass index || id:ieu-a-2
+#> 3 C 0.0339 Body mass index || id:ieu-a-2
+#> 4 A 0.5083 Body mass index || id:ieu-a-2
+#> 5 T 0.6083 Body mass index || id:ieu-a-2
+#> 6 A 0.2667 Body mass index || id:ieu-a-2
+#> mr_keep.exposure pval_origin.exposure data_source.exposure
+#> 1 TRUE reported igd
+#> 2 TRUE reported igd
+#> 3 TRUE reported igd
+#> 4 TRUE reported igd
+#> 5 TRUE reported igd
+#> 6 TRUE reported igd
We now need to find a suitable GWAS for coronary heart disease. We +can search the available studies:
+
+ao[grepl("heart disease", ao$trait), ]
+#> id
+#> 7958 finn-b-I9_CHD
+#> 10829 ukb-b-3983
+#> 12268 ukb-e-I25_AFR
+#> 14221 ukb-b-2205
+#> 14897 ieu-a-7
+#> 15028 finn-b-I9_SECONDRIGHT_EXNONE
+#> 16883 ukb-b-7436
+#> 18046 ukb-a-534
+#> 18048 ukb-e-I25_CSA
+#> 20207 finn-b-I9_OTHHEART
+#> 22249 finn-b-I9_VHD_EXNONE
+#> 22725 finn-b-I9_PULMHEART
+#> 23614 ieu-a-9
+#> 24603 finn-b-FG_OTHHEART
+#> 27414 ebi-a-GCST000998
+#> 33264 finn-b-FG_PULMHEART
+#> 35220 ukb-d-I9_CHD
+#> 35231 finn-b-I9_OTHILLHEART
+#> 36931 ukb-b-8184
+#> 37302 finn-b-I9_OTHILLHEART_EXNONE
+#> 37894 ukb-b-1668
+#> 38319 finn-b-I9_IHD
+#> 38448 finn-b-I9_RHEUFEV
+#> 38602 ieu-a-8
+#> 41994 finn-b-I9_ISCHHEART
+#> 42095 ukb-d-I9_IHD
+#> 42882 finn-b-I9_SECONDRIGHT
+#> 43747 ukb-d-I9_CHD_NOREV
+#> 45294 ieu-a-6
+#> 46126 finn-b-I9_VHD
+#> 46689 ukb-b-16606
+#> trait
+#> 7958 Major coronary heart disease event
+#> 10829 Diagnoses - main ICD10: I25.9 Chronic ischaemic heart disease, unspecified
+#> 12268 I25 Chronic ischaemic heart disease
+#> 14221 Diagnoses - secondary ICD10: Z82.4 Family history of ischaemic heart disease and other diseases of the circulatory system
+#> 14897 Coronary heart disease
+#> 15028 Secondary right heart disease (no controls excluded)
+#> 16883 Diagnoses - secondary ICD10: I25.1 Atherosclerotic heart disease
+#> 18046 Diagnoses - main ICD10: I25 Chronic ischaemic heart disease
+#> 18048 I25 Chronic ischaemic heart disease
+#> 20207 Other heart diseases (I9_OTHHEART)
+#> 22249 Valvular heart disease including rheumatic fever (no controls excluded)
+#> 22725 Pulmonary heart disease, diseases of pulmonary circulation
+#> 23614 Coronary heart disease
+#> 24603 Other heart diseases (FG_OTHHEART)
+#> 27414 Coronary heart disease
+#> 33264 Pulmonary heart disease
+#> 35220 Major coronary heart disease event
+#> 35231 Other or ill-defined heart diseases
+#> 36931 Diagnoses - secondary ICD10: I25.9 Chronic ischaemic heart disease, unspecified
+#> 37302 Other or ill-defined heart diseases (no controls excluded)
+#> 37894 Diagnoses - main ICD10: I25.1 Atherosclerotic heart disease
+#> 38319 Ischaemic heart disease, wide definition
+#> 38448 Rheumatic fever incl heart disease
+#> 38602 Coronary heart disease
+#> 41994 Ischemic heart diseases
+#> 42095 Ischaemic heart disease, wide definition
+#> 42882 Secondary right heart disease
+#> 43747 Major coronary heart disease event excluding revascularizations
+#> 45294 Coronary heart disease
+#> 46126 Valvular heart disease including rheumatic fever
+#> 46689 Diagnoses - secondary ICD10: I25.8 Other forms of chronic ischaemic heart disease
+#> ncase group_name year author consortium sex
+#> 7958 21012 public 2021 NA NA Males and Females
+#> 10829 1195 public 2018 Ben Elsworth MRC-IEU Males and Females
+#> 12268 302 public 2020 Pan-UKB team NA Males and Females
+#> 14221 9330 public 2018 Ben Elsworth MRC-IEU Males and Females
+#> 14897 60801 public 2015 Nikpay CARDIoGRAMplusC4D Males and Females
+#> 15028 428 public 2021 NA NA Males and Females
+#> 16883 5771 public 2018 Ben Elsworth MRC-IEU Males and Females
+#> 18046 8755 public 2017 Neale Neale Lab Males and Females
+#> 18048 1205 public 2020 Pan-UKB team NA Males and Females
+#> 20207 62081 public 2021 NA NA Males and Females
+#> 22249 38209 public 2021 NA NA Males and Females
+#> 22725 4564 public 2021 NA NA Males and Females
+#> 23614 63746 public 2013 Deloukas CARDIoGRAMplusC4D Males and Females
+#> 24603 58173 public 2021 NA NA Males and Females
+#> 27414 22233 public 2011 Schunkert H NA NA
+#> 33264 4185 public 2021 NA NA Males and Females
+#> 35220 10157 public 2018 Neale lab NA Males and Females
+#> 35231 713 public 2021 NA NA Males and Females
+#> 36931 5861 public 2018 Ben Elsworth MRC-IEU Males and Females
+#> 37302 713 public 2021 NA NA Males and Females
+#> 37894 12171 public 2018 Ben Elsworth MRC-IEU Males and Females
+#> 38319 31640 public 2021 NA NA Males and Females
+#> 38448 573 public 2021 NA NA Males and Females
+#> 38602 22233 public 2011 Schunkert H CARDIoGRAM Males and Females
+#> 41994 30952 public 2021 NA NA Males and Females
+#> 42095 20857 public 2018 Neale lab NA Males and Females
+#> 42882 428 public 2021 NA NA Males and Females
+#> 43747 10157 public 2018 Neale lab NA Males and Females
+#> 45294 15420 public 2011 Peden C4D Males and Females
+#> 46126 38209 public 2021 NA NA Males and Females
+#> 46689 5738 public 2018 Ben Elsworth MRC-IEU Males and Females
+#> pmid population unit sample_size
+#> 7958 NA European NA NA
+#> 10829 NA European SD 463010
+#> 12268 NA African American or Afro-Caribbean NA 6636
+#> 14221 NA European SD 463010
+#> 14897 26343387 Mixed log odds 184305
+#> 15028 NA European NA NA
+#> 16883 NA European SD 463010
+#> 18046 NA European SD 337199
+#> 18048 NA South Asian NA 8876
+#> 20207 NA European NA NA
+#> 22249 NA European NA NA
+#> 22725 NA European NA NA
+#> 23614 23202125 Mixed log odds 194427
+#> 24603 NA European NA NA
+#> 27414 21378990 European logOR 86995
+#> 33264 NA European NA NA
+#> 35220 NA European NA 361194
+#> 35231 NA European NA NA
+#> 36931 NA European SD 463010
+#> 37302 NA European NA NA
+#> 37894 NA European SD 463010
+#> 38319 NA European NA NA
+#> 38448 NA European NA NA
+#> 38602 21378990 European log odds 86995
+#> 41994 NA European NA NA
+#> 42095 NA European NA 361194
+#> 42882 NA European NA NA
+#> 43747 NA European NA 361194
+#> 45294 21378988 Mixed log odds 30482
+#> 46126 NA European NA NA
+#> 46689 NA European SD 463010
+#> build ncontrol category subcategory ontology
+#> 7958 HG19/GRCh37 197780 Binary NA NA
+#> 10829 HG19/GRCh37 461815 Binary NA NA
+#> 12268 HG19/GRCh37 6334 Binary NA NA
+#> 14221 HG19/GRCh37 453680 Binary NA NA
+#> 14897 HG19/GRCh37 123504 Disease Cardiovascular NA
+#> 15028 HG19/GRCh37 218364 Binary NA NA
+#> 16883 HG19/GRCh37 457239 Binary NA NA
+#> 18046 HG19/GRCh37 328444 NA NA NA
+#> 18048 HG19/GRCh37 7671 Binary NA NA
+#> 20207 HG19/GRCh37 156711 Binary NA NA
+#> 22249 HG19/GRCh37 180583 Binary NA NA
+#> 22725 HG19/GRCh37 214228 Binary NA NA
+#> 23614 HG19/GRCh37 130681 Disease Cardiovascular NA
+#> 24603 HG19/GRCh37 160619 Binary NA NA
+#> 27414 HG19/GRCh37 64762 NA NA NA
+#> 33264 HG19/GRCh37 214607 Binary NA NA
+#> 35220 HG19/GRCh37 351037 Binary NA NA
+#> 35231 HG19/GRCh37 156711 Binary NA NA
+#> 36931 HG19/GRCh37 457149 Binary NA NA
+#> 37302 HG19/GRCh37 218079 Binary NA NA
+#> 37894 HG19/GRCh37 450839 Binary NA NA
+#> 38319 HG19/GRCh37 187152 Binary NA NA
+#> 38448 HG19/GRCh37 218219 Binary NA NA
+#> 38602 HG19/GRCh37 64762 Disease Cardiovascular NA
+#> 41994 HG19/GRCh37 187840 Binary NA NA
+#> 42095 HG19/GRCh37 340337 Binary NA NA
+#> 42882 HG19/GRCh37 214228 Binary NA NA
+#> 43747 HG19/GRCh37 351037 Binary NA NA
+#> 45294 HG19/GRCh37 15062 Disease Cardiovascular NA
+#> 46126 HG19/GRCh37 156711 Binary NA NA
+#> 46689 HG19/GRCh37 457272 Binary NA NA
+#> note
+#> 7958 I9_CHD
+#> 10829 41202#I259: Output from GWAS pipeline using Phesant derived variables from UKBiobank
+#> 12268 NA
+#> 14221 41204#Z824: Output from GWAS pipeline using Phesant derived variables from UKBiobank
+#> 14897 <NA>
+#> 15028 I9_SECONDRIGHT_EXNONE
+#> 16883 41204#I251: Output from GWAS pipeline using Phesant derived variables from UKBiobank
+#> 18046 NA
+#> 18048 NA
+#> 20207 I9_OTHHEART
+#> 22249 I9_VHD_EXNONE
+#> 22725 I9_PULMHEART
+#> 23614 <NA>
+#> 24603 FG_OTHHEART
+#> 27414 NA
+#> 33264 FG_PULMHEART
+#> 35220 NA
+#> 35231 I9_OTHILLHEART
+#> 36931 41204#I259: Output from GWAS pipeline using Phesant derived variables from UKBiobank
+#> 37302 I9_OTHILLHEART_EXNONE
+#> 37894 41202#I251: Output from GWAS pipeline using Phesant derived variables from UKBiobank
+#> 38319 I9_IHD
+#> 38448 I9_RHEUFEV
+#> 38602 <NA>
+#> 41994 I9_ISCHHEART
+#> 42095 NA
+#> 42882 I9_SECONDRIGHT
+#> 43747 NA
+#> 45294 <NA>
+#> 46126 I9_VHD
+#> 46689 41204#I258: Output from GWAS pipeline using Phesant derived variables from UKBiobank
+#> mr nsnp doi coverage study_design priority sd
+#> 7958 1 16380466 <NA> <NA> <NA> 0 NA
+#> 10829 1 9851867 <NA> <NA> <NA> 1 NA
+#> 12268 1 15478580 <NA> <NA> <NA> 0 NA
+#> 14221 1 9851867 <NA> <NA> <NA> 1 NA
+#> 14897 1 9455779 <NA> <NA> <NA> 1 NA
+#> 15028 1 16380466 <NA> <NA> <NA> 0 NA
+#> 16883 1 9851867 <NA> <NA> <NA> 1 NA
+#> 18046 1 10894596 <NA> <NA> <NA> 1 NA
+#> 18048 1 9811287 <NA> <NA> <NA> 0 NA
+#> 20207 1 16380466 <NA> <NA> <NA> 0 NA
+#> 22249 1 16380466 <NA> <NA> <NA> 0 NA
+#> 22725 1 16380466 <NA> <NA> <NA> 0 NA
+#> 23614 1 79129 <NA> <NA> <NA> 1 NA
+#> 24603 1 16380466 <NA> <NA> <NA> 0 NA
+#> 27414 1 2415020 <NA> <NA> <NA> 0 NA
+#> 33264 1 16380466 <NA> <NA> <NA> 0 NA
+#> 35220 1 13295130 <NA> <NA> <NA> 0 NA
+#> 35231 1 16380177 <NA> <NA> <NA> 0 NA
+#> 36931 1 9851867 <NA> <NA> <NA> 1 NA
+#> 37302 1 16380466 <NA> <NA> <NA> 0 NA
+#> 37894 1 9851867 <NA> <NA> <NA> 1 NA
+#> 38319 1 16380466 <NA> <NA> <NA> 0 NA
+#> 38448 1 16380466 <NA> <NA> <NA> 0 NA
+#> 38602 1 2420361 <NA> <NA> <NA> 2 NA
+#> 41994 1 16380466 <NA> <NA> <NA> 0 NA
+#> 42095 1 13586589 <NA> <NA> <NA> 0 NA
+#> 42882 1 16380459 <NA> <NA> <NA> 0 NA
+#> 43747 1 13295130 <NA> <NA> <NA> 0 NA
+#> 45294 1 540233 <NA> <NA> <NA> 3 NA
+#> 46126 1 16380358 <NA> <NA> <NA> 0 NA
+#> 46689 1 9851867 <NA> <NA> <NA> 1 NA
The most recent CARDIOGRAM GWAS is ID number ieu-a-7
. We
+can extract the BMI SNPs from this GWAS as follows:
+chd_out_dat1 <- extract_outcome_data(
+ snps = bmi_exp_dat$SNP,
+ outcomes = 'ieu-a-7'
+)
The extract_outcome_data()
function is flexible. The
+snps
argument only requires an array of rsIDs, and the
+outcomes
argument can be a vector of outcomes, e.g.
+chd_out_dat2 <- extract_outcome_data(
+ snps = c("rs234", "rs17097147"),
+ outcomes = c('ieu-a-2', 'ieu-a-7')
+)
will extract the two SNPs from each of the outcomes
+ieu-a-2
and ieu-a-7
.
By default if a particular requested SNP is not present in the +outcome GWAS then a SNP (proxy) that is in LD with the requested SNP +(target) will be searched for instead. LD proxies are defined using 1000 +genomes European sample data. The effect of the proxy SNP on the outcome +is returned, along with the proxy SNP, the effect allele of the proxy +SNP, and the corresponding allele (in phase) for the target SNP.
+The parameters for handling LD proxies are as follows:
+proxies
= TRUE or FALSE (TRUE by default)rsq
= numeric value of minimum rsq to find a proxy.
+Default is 0.8, minimum is 0.6palindromes
= Allow palindromic SNPs? Default is 1
+(yes)maf_threshold
= If palindromes allowed then what is the
+maximum minor allele frequency of palindromes allowed? Default is
+0.3.If you have GWAS summary data that is not present in IEU GWAS +database, this can still be used to perform analysis.
+Supposing there is a GWAS summary file called “gwas_summary.csv” with +e.g. 2 million rows and it looks like this:
+rsid,effect,SE,a1,a2,a1_freq,p-value,Units,Gene,n
+rs10767664,0.19,0.030612245,A,T,0.78,5.00E-26,kg/m2,BDNF,225238
+rs13078807,0.1,0.020408163,G,A,0.2,4.00E-11,kg/m2,CADM2,221431
+rs1514175,0.07,0.020408163,A,G,0.43,8.00E-14,kg/m2,TNNI3K,207641
+rs1558902,0.39,0.020408163,A,T,0.42,5.00E-120,kg/m2,FTO,222476
+...
+...
+To extract the exposure SNPs from this data, we would use the +following command:
+
+outcome_dat <- read_outcome_data(
+ snps = bmi_exp_dat$SNP,
+ filename = "gwas_summary.csv",
+ sep = ",",
+ snp_col = "rsid",
+ beta_col = "effect",
+ se_col = "SE",
+ effect_allele_col = "a1",
+ other_allele_col = "a2",
+ eaf_col = "a1_freq",
+ pval_col = "p-value",
+ units_col = "Units",
+ gene_col = "Gene",
+ samplesize_col = "n"
+)
This returns an outcome data frame with only the SNPs that were +requested (if those SNPs were present in the “gwas_summary.csv” +file).
+The extract_outcome_data
function returns a table of SNP
+effects for the requested SNPs on the requested outcomes. The format of
+the data is similar to the exposure data format, except the main columns
+are as follows:
SNP
beta.outcome
se.outcome
samplesize.outcome
ncase.outcome
ncontrol.outcome
pval.outcome
eaf.outcome
effect_allele.outcom
other_allele.outcome
units.outcome
outcome
consortium.outcome
year.outcome
pmid.outcome
id.outcome
originalname.outcome
proxy.outcome
target_snp.outcome
proxy_snp.outcome
target_a1.outcome
target_a2.outcome
proxy_a1.outcome
proxy_a2.outcome
mr_keep.outcome
data_source.outcome
We have developed a summary data format called “GWAS VCF”, which is +designed to store GWAS results in a strict and performant way. It is +possible to use this format with the TwoSampleMR package. Going down +this avenue also allows you to use LD proxy functionality using your own +LD reference files (or ones that we provide). For more details, see this +package that explains the format and how to query it in R:
+https://github.com/mrcieu/gwasvcf
+and this package for how to connect the data to other packages +including TwoSampleMR
+ +vignettes/perform_mr.Rmd
+ perform_mr.Rmd
Let’s continue with the example of BMI on CHD:
+
+bmi_exp_dat <- extract_instruments(outcomes = 'ieu-a-2')
+chd_out_dat <- extract_outcome_data(snps = bmi_exp_dat$SNP, outcomes = 'ieu-a-7')
+dat <- harmonise_data(bmi_exp_dat, chd_out_dat)
+#> Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
Once the exposure and outcome data are harmonised, we have effects +and standard errors for each instrument SNP available for the exposure +and outcome traits. We can use this information to perform Mendelian +randomisation. To do this, simply run:
+
+res <- mr(dat)
+#> Analysing 'ieu-a-2' on 'ieu-a-7'
+res
+#> id.exposure id.outcome outcome
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 2 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 3 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 4 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 5 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure method nsnp b
+#> 1 Body mass index || id:ieu-a-2 MR Egger 79 0.5024935
+#> 2 Body mass index || id:ieu-a-2 Weighted median 79 0.3870065
+#> 3 Body mass index || id:ieu-a-2 Inverse variance weighted 79 0.4459091
+#> 4 Body mass index || id:ieu-a-2 Simple mode 79 0.3401554
+#> 5 Body mass index || id:ieu-a-2 Weighted mode 79 0.3790910
+#> se pval
+#> 1 0.14396056 8.012590e-04
+#> 2 0.07639889 4.071091e-07
+#> 3 0.05898302 4.032020e-14
+#> 4 0.15001315 2.612742e-02
+#> 5 0.10221374 3.881092e-04
This returns a data frame of estimates of the causal effect of the +exposure on the outcome for a range of different MR methods.
+If there were multiple exposures against multiple outcomes in
+dat
, the mr()
function will perform each MR
+method for each combination of exposure-outcome traits.
The list of available MR methods can be obtained:
+
+mr_method_list()
+#> obj
+#> 1 mr_wald_ratio
+#> 2 mr_two_sample_ml
+#> 3 mr_egger_regression
+#> 4 mr_egger_regression_bootstrap
+#> 5 mr_simple_median
+#> 6 mr_weighted_median
+#> 7 mr_penalised_weighted_median
+#> 8 mr_ivw
+#> 9 mr_ivw_radial
+#> 10 mr_ivw_mre
+#> 11 mr_ivw_fe
+#> 12 mr_simple_mode
+#> 13 mr_weighted_mode
+#> 14 mr_weighted_mode_nome
+#> 15 mr_simple_mode_nome
+#> 16 mr_raps
+#> 17 mr_sign
+#> 18 mr_uwr
+#> name PubmedID
+#> 1 Wald ratio
+#> 2 Maximum likelihood
+#> 3 MR Egger 26050253
+#> 4 MR Egger (bootstrap) 26050253
+#> 5 Simple median
+#> 6 Weighted median
+#> 7 Penalised weighted median
+#> 8 Inverse variance weighted
+#> 9 IVW radial
+#> 10 Inverse variance weighted (multiplicative random effects)
+#> 11 Inverse variance weighted (fixed effects)
+#> 12 Simple mode
+#> 13 Weighted mode
+#> 14 Weighted mode (NOME)
+#> 15 Simple mode (NOME)
+#> 16 Robust adjusted profile score (RAPS)
+#> 17 Sign concordance test
+#> 18 Unweighted regression
+#> Description use_by_default
+#> 1 TRUE
+#> 2 FALSE
+#> 3 TRUE
+#> 4 FALSE
+#> 5 FALSE
+#> 6 TRUE
+#> 7 FALSE
+#> 8 TRUE
+#> 9 FALSE
+#> 10 FALSE
+#> 11 FALSE
+#> 12 TRUE
+#> 13 TRUE
+#> 14 FALSE
+#> 15 FALSE
+#> 16 FALSE
+#> 17 Tests for concordance of signs between exposure and outcome FALSE
+#> 18 Doesn't use any weights FALSE
+#> heterogeneity_test
+#> 1 FALSE
+#> 2 TRUE
+#> 3 TRUE
+#> 4 FALSE
+#> 5 FALSE
+#> 6 FALSE
+#> 7 FALSE
+#> 8 TRUE
+#> 9 TRUE
+#> 10 FALSE
+#> 11 FALSE
+#> 12 FALSE
+#> 13 FALSE
+#> 14 FALSE
+#> 15 FALSE
+#> 16 FALSE
+#> 17 FALSE
+#> 18 TRUE
To perform them, they can be specified in the mr()
+function, e.g. to only perform MR Egger regression and Inverse variance
+weighted methods,
+mr(dat, method_list = c("mr_egger_regression", "mr_ivw"))
+#> Analysing 'ieu-a-2' on 'ieu-a-7'
+#> id.exposure id.outcome outcome
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 2 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure method nsnp b
+#> 1 Body mass index || id:ieu-a-2 MR Egger 79 0.5024935
+#> 2 Body mass index || id:ieu-a-2 Inverse variance weighted 79 0.4459091
+#> se pval
+#> 1 0.14396056 8.01259e-04
+#> 2 0.05898302 4.03202e-14
By default, all the methods that are labelled TRUE
in
+the use_by_default
column are used by the mr()
+function.
Some of the MR methods can also perform tests for heterogeneity. To +obtain those statistics:
+
+mr_heterogeneity(dat)
+#> id.exposure id.outcome outcome
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 2 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure method Q Q_df
+#> 1 Body mass index || id:ieu-a-2 MR Egger 143.3046 77
+#> 2 Body mass index || id:ieu-a-2 Inverse variance weighted 143.6508 78
+#> Q_pval
+#> 1 6.841585e-06
+#> 2 8.728420e-06
As with the mr()
function, the
+mr_heterogeneity()
function can take an argument to only
+perform heterogeneity tests using specified methods, e.g.
+mr_heterogeneity(dat, method_list = c("mr_egger_regression", "mr_ivw"))
+#> id.exposure id.outcome outcome
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 2 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure method Q Q_df
+#> 1 Body mass index || id:ieu-a-2 MR Egger 143.3046 77
+#> 2 Body mass index || id:ieu-a-2 Inverse variance weighted 143.6508 78
+#> Q_pval
+#> 1 6.841585e-06
+#> 2 8.728420e-06
The intercept term in MR Egger regression can be a useful indication +of whether directional horizontal pleiotropy is driving the results of +an MR analysis. This can be obtained as follows:
+
+mr_pleiotropy_test(dat)
+#> id.exposure id.outcome outcome
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure egger_intercept se pval
+#> 1 Body mass index || id:ieu-a-2 -0.001719304 0.003985962 0.6674266
To obtain the MR estimates using each of the SNPs singly we can do +the following:
+
+res_single <- mr_singlesnp(dat)
This returns a data.frame of results that is similar to the output
+from mr()
except it performs the analysis multiple times
+for each exposure-outcome combination - each time using a different
+single SNP to perform the analysis.
The method used to perform the single SNP MR is the Wald ratio by +default, though this can be changed, e.g. to use the fixed effects meta +analysis method instead:
+
+res_single <- mr_singlesnp(dat, single_method = "mr_meta_fixed")
The mr_singlesnp()
function calculates the full MR using
+all available SNPs as well, and by default it uses the IVW and MR Egger
+methods. This can be specified as so:
+res_single <- mr_singlesnp(dat, all_method = "mr_two_sample_ml")
will perform only the maximum likelihood method for the combined +test.
+It is possible to perform a leave-one-out analysis, where the MR is +performed again but leaving out each SNP in turn, to identify if a +single SNP is driving the association.
+
+res_loo <- mr_leaveoneout(dat)
By default the method used is the inverse variance weighted method,
+but this can be changed by using the method
argument.
There are a few ways to visualise the results, listed below
+We can depict the relationship of the SNP effects on the exposure +against the SNP effects on the outcome using a scatter plot.
+
+res <- mr(dat)
+#> Analysing 'ieu-a-2' on 'ieu-a-7'
+p1 <- mr_scatter_plot(res, dat)
A scatter plot is created for each exposure-outcome test, and stored
+in p1
as a list of plots. For example, to plot the first
+scatter plot:
+p1[[1]]
And to see how many plots there are:
+
+length(p1)
+#> [1] 1
Lines are drawn for each method used in mr(dat)
, the
+slope of the line corresponding to the estimated causal effect. To limit
+which lines are drawn, simply specify the desired methods, e.g. to only
+draw MR Egger and IVW:
+res <- mr(dat, method_list = c("mr_egger_regression", "mr_ivw"))
+#> Analysing 'ieu-a-2' on 'ieu-a-7'
+p1 <- mr_scatter_plot(res, dat)
It is possible to save this plot using the ggsave()
+function from the ggplot2
package, e.g. to save as a
+pdf
+ggsave(p1[[1]], file = "filename.pdf", width = 7, height = 7)
or save as a png
+
+ggsave(p1[[1]], file = "filename.png", width = 7, height = 7)
See ?ggplot2::ggsave
for more info.
Use the mr_forest_plot()
function to compare the MR
+estimates using the different MR methods against the single SNP
+tests.
+res_single <- mr_singlesnp(dat)
+p2 <- mr_forest_plot(res_single)
+p2[[1]]
Here, the plot shows the causal effect as estimated using each of the +SNPs on their own, and comparing against the causal effect as estimated +using the methods that use all the SNPs.
+To get plots that use different methods, specify them in the
+mr_singlesnp()
function:
+res_single <- mr_singlesnp(dat, all_method = c("mr_ivw", "mr_two_sample_ml"))
+p2 <- mr_forest_plot(res_single)
+p2[[1]]
Use the mr_leaveoneout_plot()
function to visualise the
+leave-one-out analysis:
+res_loo <- mr_leaveoneout(dat)
+p3 <- mr_leaveoneout_plot(res_loo)
+p3[[1]]
Specify the test to use
+e.g. mr_leaveoneout(dat, method = mr_egger_regression)
to
+use MR-Egger regression.
Asymmetry in a funnel plot is useful for gauging the reliability of a +particular MR analysis. Funnel plots can be produced using the single +SNP results as follows:
+
+res_single <- mr_singlesnp(dat)
+p4 <- mr_funnel_plot(res_single)
+p4[[1]]
A 1-to-many MR analysis interrogates the effect of a single exposure
+on multiple outcomes or multiple exposures on a single outcome. The
+results of this analysis can be visualised using the 1-to-many forest
+plot, with or without stratification on a categorical variable. From a
+visual point of view, the function works best for 50 or fewer results
+and is not really designed to handle more than a 100 results. If your
+number of results is much greater than 50, it may be better to split
+these across two separate plots. For example, if you have 100 sets of
+results you could divide these equally across two plots and then combine
+the two plots together in another programme like Powerpoint. The
+function assumes the results are already in the right order for
+plotting. As such, users are advised to sort their results according to
+how they would like them to appear in the plot. Users can use their own
+code to do this or they can use the sort_1_to_many()
+function.
+exp_dat <- extract_instruments(outcomes = c(2, 100, 1032, 104, 1, 72, 999))
+table(exp_dat$exposure)
+chd_out_dat <- extract_outcome_data(
+ snps = exp_dat$SNP,
+ outcomes = 7
+)
+
+dat2 <- harmonise_data(
+ exposure_dat = exp_dat,
+ outcome_dat = chd_out_dat
+)
+res <- mr(dat2)
In this example we wish to plot results from an MR analysis of the +effect of multiple exposures on coronary heart disease, with results +sorted by decreasing effect size (largest effect at the top of the plot) +and with one MR method for each unique exposure-outcome combination. We +will also make the size of each point estimate proportional to its +inverse variance. This is a useful way to draw attention towards the +most reliable results and away from results with very wide confidence +intervals. To specify the size of the point estimate, set the weight +argument to the name of the column in the data with the weight +information.
+
+res <- subset_on_method(res) # default is to subset on either the IVW method (>1 instrumental SNP) or Wald ratio method (1 instrumental SNP).
+res <- sort_1_to_many(res, b = "b", sort_action = 4) # this sorts results by decreasing effect size (largest effect at top of the plot)
+res <- split_exposure(res) # to keep the Y axis label clean we exclude the exposure ID labels from the exposure column
+res$weight <- 1/res$se
+
+min(exp(res$b - 1.96*res$se)) # identify value for 'lo' in forest_plot_1_to_many
+max(exp(res$b + 1.96*res$se)) # identify value for 'up' in forest_plot_1_to_many
+
+forest_plot_1_to_many(
+ res,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ ao_slc = FALSE,
+ lo = 0.3,
+ up = 2.5,
+ TraitM = "exposure",
+ col1_width = 2,
+ by = NULL,
+ trans = "log2",
+ xlab = "OR for CHD per SD increase in risk factor (95% confidence interval)",
+ weight = "weight"
+)
It is also possible to add additional columns and column titles and +to choose the size of the text in the columns:
+
+res$pval<-formatC(res$pval, format = "e", digits = 2)
+forest_plot_1_to_many(
+ res,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ ao_slc = FALSE,
+ lo = 0.3,
+ up = 2.5,
+ TraitM = "exposure",
+ by = NULL,
+ trans = "log2",
+ xlab = "OR for CHD per SD increase in risk factor (95% CI)",
+ weight = "weight",
+ subheading_size = 11,
+ col1_title = "Risk factor",
+ col1_width = 2.5,
+ col_text_size = 4,
+ addcols = c("nsnp", "pval"),
+ addcol_widths = c(1.0, 1.0),
+ addcol_titles = c("No. SNPs", "P-val")
+)
In my own workflow I prefer to to keep the plot free of axis and +column titles and to add them separately in a program like +powerpoint:
+
+forest_plot_1_to_many(
+ res,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ ao_slc = FALSE,
+ lo = 0.3,
+ up = 3.0,
+ TraitM = "exposure",
+ col1_width = 2.0,
+ by = NULL,
+ trans = "log2",
+ xlab = "",
+ addcols = c("nsnp", "pval"),
+ weight = "weight",
+ col_text_size = 4,
+ addcol_widths = c(0.5, 1.0),
+ addcol_titles = c("", "")
+)
In this next example we plot the results from an analysis of the +effect of multiple exposures on coronary heart disease using multiple +methods, with results grouped by exposure. We also want the result for +the IVW method to be given priority and to go above the other methods. +We also want the exposure with the largest IVW effect size to go the top +of the plot. We also set the TraitM argument to the column describing +the MR method. This is because we are grouping the results on the +exposures. Normally the row labels would correspond to the exposures but +in this example we want the row names to correspond to the MR +method.
+
+res <- mr(dat2)
+res <- split_exposure(res) # to keep the Y axis label clean we exclude the exposure ID labels from the exposure column
+
+res <-
+ sort_1_to_many(
+ res,
+ group = "exposure",
+ sort_action = 3,
+ priority = "Inverse variance weighted",
+ trait_m = "method"
+ )
+
+forest_plot_1_to_many(
+ res,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ trans = "log2",
+ ao_slc = FALSE,
+ lo = 0.03,
+ up = 22,
+ col1_width = 2,
+ by = "exposure",
+ TraitM = "method",
+ xlab = "OR for CHD per SD increase in risk factor (95% confidence interval)",
+ subheading_size = 12,
+ col_text_size = 4
+)
In this next example we plot the same results as above but with +results stratified by a grouping variable. We also select one MR method +for each unique exposure-outcome combination and sort the results by +decreasing effect size within each group (i.e. largest effect at the +top).
+
+res <- mr(dat2)
+res <- split_exposure(res)
+res <- subset_on_method(res)
+res$subcategory[res$exposure %in% c("Adiponectin", "Hip circumference", "Waist circumference")] <- "Group 1"
+res$subcategory[is.na(res$subcategory)] <- "Group 2"
+res$weight <- 1/res$se
+res <- sort_1_to_many(res, sort_action = 1, group = "subcategory")
+
+forest_plot_1_to_many(
+ res,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ trans = "log2",
+ ao_slc = FALSE,
+ lo = 0.3,
+ up = 2.5,
+ TraitM = "exposure",
+ col_text_size = 4,
+ col1_width = 1.5,
+ by = "subcategory",
+ xlab = "OR for CHD per SD increase in risk factor (95% confidence interval)",
+ subheading_size = 14,
+ weight = "weight"
+)
In the above example we made up an arbitrary grouping variable called +“subcategory” with values “Group 1” and “Group 2”. Typically, however, +the grouping variable might correspond to something like a trait +ontology (e.g. anthropometric and glycemic traits) or study design +(e.g. MR and observational studies).
+The plot function works best with 50 or fewer rows and is not really +designed to handle more than a 100. Visualising a single-column forest +plot with 100 results is also quite difficult. If your number of results +is much greater than 50, it is advisable to split the results across two +different plots. In the example below we select BMI as the exposure and +test this against 103 diseases in the IEU GWAS database:
+
+exp_dat <- extract_instruments(outcomes = 2) # extract instruments for BMI
+ao <- available_outcomes()
+ao <- ao[ao$category == "Disease", ] # identify diseases
+ao <- ao[which(ao$ncase > 100), ]
+
+dis_dat <- extract_outcome_data(
+ snps = exp_dat$SNP,
+ outcomes = ao$id
+)
+
+dat3 <- harmonise_data(
+ exposure_dat = exp_dat,
+ outcome_dat = dis_dat
+)
+
+res <- mr(dat3, method_list = c("mr_wald_ratio", "mr_ivw"))
+res <- split_outcome(res) # to keep the Y axis label clean we exclude the exposure ID labels from the exposure column
+
+res <- sort_1_to_many(res, b = "b", sort_action = 4) # this sorts results by decreasing effect size (largest effect at top of the plot)
MR results for 103 diseases can be difficult to visualise in a
+single-column forest plot. In my own workflow I would split these across
+two plots and then join them together in a separate program, such as
+Powerpoint, and do further refinements there. I typically save my plots
+using the pdf()
graphics device. In this particular example
+the disease labels probably require some cleaning up (some are a bit
+long) or alternatively the column text size could be made smaller. It is
+also possible to change the colour of the plot and the shape of the
+point estimates. Type ?forest_plot_1_to_many
for further
+details.
+res1 <- res[1:52, ]
+res2 <- res[53:103, ]
+
+plot1 <- forest_plot_1_to_many(
+ res1,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ trans = "log2",
+ ao_slc = FALSE,
+ lo = 0.004,
+ up = 461,
+ col1_width = 2,
+ TraitM = "outcome",
+ col_text_size = 3,
+ xlab = ""
+)
+
+plot2 <- forest_plot_1_to_many(
+ res2,
+ b = "b",
+ se = "se",
+ exponentiate = TRUE,
+ trans = "log2",
+ ao_slc = FALSE,
+ lo = 0.004,
+ up = 461,
+ col1_width = 2,
+ TraitM = "outcome",
+ subheading_size = 11,
+ col_text_size = 3,
+ xlab = ""
+)
+
+plot1
+plot2
+
+pdf("plot1.pdf", height = 10, width = 8)
+plot1
+dev.off()
MR.RAPS (Robust Adjusted Profile Score) is a recently proposed method +that considers the measurement error in SNP-exposure effects, is +unbiased when there are many (e.g. hundreds of) weak instruments, and is +robust to systematic and idiosyncratic pleiotropy. See the arXiv preprint for more +detail about the statistical methodology.
+MR.RAPS is implemented in the R package mr.raps that is +available on CRAN. It can be directly called from TwoSampleMR by
+ +MR.RAPS comes with two main options: over.dispersion
+(whether the method should consider systematic pleiotropy) and
+loss.function
(either "l2"
,
+"huber"
, or "tukey"
). The latter two loss
+functions are robust to idiosyncratic pleiotropy. The default option is
+over.dispersion = TRUE
and
+loss.function = "tukey"
. To change these options, modify
+the parameters
argument of mr()
by (for
+example)
+res <-
+ mr(
+ dat,
+ method_list = c("mr_raps"),
+ parameters = list(over.dispersion = FALSE, loss.function = "l2")
+ )
A report can be generated that performs all MR analyses, sensitivity +analyses, and plots, and presents them in a single self-contained html +web page, word document, or pdf document.
+
+mr_report(dat)
By default this produces a html file in the current working +directory, but see the help pages on how to modify this.
+This function will create a separate report file for every
+exposure-outcome combination that is present in the dat
+object.
This is an implementation of the method described here:
+ +In MR it is assumed that the instruments influence the exposure first +and then the outcome through the exposure. But sometimes this is +difficult to evaluate, for example is a cis-acting SNP influencing gene +expression levels or DNA methylation levels first? The causal direction +between the hypothesised exposure and outcomes can be tested using the +Steiger test (Hemani, Tilling, and Davey Smith +2017). For example:
+
+out <- directionality_test(dat)
+#> r.exposure and/or r.outcome not present.
+#> Calculating approximate SNP-exposure and/or SNP-outcome correlations, assuming all are quantitative traits. Please pre-calculate r.exposure and/or r.outcome using get_r_from_lor() for any binary traits
+knitr::kable(out)
id.exposure | +id.outcome | +exposure | +outcome | +snp_r2.exposure | +snp_r2.outcome | +correct_causal_direction | +steiger_pval | +
---|---|---|---|---|---|---|---|
ieu-a-2 | +ieu-a-7 | +Body mass index || id:ieu-a-2 | +Coronary heart disease || id:ieu-a-7 | +0.0158082 | +0.0013505 | +TRUE | +0 | +
It calculates the variance explained in the exposure and the outcome +by the instrumenting SNPs, and tests if the variance in the outcome is +less than the exposure.
+This test is, like many others, liable to give inaccurate causal +directions under some measurement error parameters in the exposure and +the outcome (e.g. if the outcome has much lower measurement precision +then its proportion of variance explained will be underestimated). +Sensitivity can be applied to evaluate the extent to which the inferred +causal direction is liable to measurement error, in two ways.
+These tests are obtained using:
+
+mr_steiger(
+ p_exp = dat$pval.exposure,
+ p_out = dat$pval.outcome,
+ n_exp = dat$samplesize.exposure,
+ n_out = dat$samplesize.outcome,
+ r_xxo = 1,
+ r_yyo = 1,
+ r_exp=0
+)
When SNPs instrument multiple potential exposures, for example in the +case of different lipid fractions, one method for overcoming this +problem is to estimate the influence of each lipid conditioning on the +effects of the SNPs on the other lipids. Multivariable MR can be +performed using the R package as follows. Practically speaking, this is +the process that needs to occur from the perspective of generating the +data in the correct format:
+Example - The GWAS IDs for HDL, LDL and total cholesterol are
+ieu-a-299
, ieu-a-300
and
+ieu-a-302
. The GWAS ID for coronary heart disease (CHD) is
+ieu-a-7
. In this example we will estimate the multivariable
+effects of HDL, LDL and total cholesterol on CHD.
+id_exposure <- c("ieu-a-299", "ieu-a-300", "ieu-a-302")
+id_outcome <- "ieu-a-7"
First obtain the instruments for each lipid fraction. This entails +obtaining a combined set of SNPs including all instruments, and getting +those SNPs for each lipid fraction. Therefore, if there are e.g. 20 +instruments for each of 3 lipid fractions, but combined there are 30 +unique SNPs, then we need to extract each of the 30 SNPs from each lipid +fraction (exposure).
+
+mv_exposure_dat <- mv_extract_exposures(id_exposure)
Next, also extract those SNPs from the outcome.
+
+mv_outcome_dat <- extract_outcome_data(exposure_dat$SNP, id_outcome)
Once the data has been obtained, harmonise so that all are on the +same reference allele.
+
+mvdat <- mv_harmonise_data(mv_exposure_dat, mv_outcome_dat)
+#> Harmonising HDL cholesterol || id:ieu-a-299 (ieu-a-299) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
Finally, perform the multivariable MR analysis
+
+res <- mv_multiple(mvdat)
This generates a table of results.
+There are several different ways in which this analysis can be +formulated. e.g. consider 3 exposures against one outcome, one +could:
+mv_multiple()
function but the latter was how MV MR was
+originally described by Burgess et al 2015 and can be done with
+mv_residual()
.With these three different parameters there are eight different ways +to do MV analysis. We recommend the default settings as described +above.
+Plots can be generated using the plots = TRUE
argument
+for mv_multiple()
and mv_residual()
.
The current plots being generated are not necessarily adequate +because while they show the slope through the raw points, they do not +demonstrate that the raw points might be effectively different between +plots because they are conditional on the other exposures.
+If you want to perform analysis with your local summary data
+(i.e. not in the OpenGWAS database) then use then look up the
+mv_extract_exposures_local()
function instead of the
+mv_extract_exposures()
function.
In the examples shown so far it is assumed that instruments are +independent (i.e. they are not in linkage disequilibrium, LD). This is +to avoid ‘double counting’ effects. An alternative approach is to +estimate the MR effects accounting for the correlation between +variants.
+The TwoSampleMR package has not implemented this yet, but the MendelianRandomization +R package by Olena Yavorska and Stephen Burgess does have this +functionality. We can use the TwoSampleMR package to extract, format and +harmonise data, and then convert to the format required by the +MendelianRandomization package. The IEU GWAS database server has the +individual level genetic data for ~500 Europeans in 1000 genomes data, +and can obtain the LD matrix for a set of SNPs using these data. For +example:
+ +
+ld_mat
+#> rs234_A_G rs1205_T_C
+#> rs234_A_G 1.0000000 0.0797023
+#> rs1205_T_C 0.0797023 1.0000000
Here ld_matrix()
returns the LD correlation values (not
+R2) for each pair of variants present in the 1000 genomes
+data set.
+dat <- harmonise_data(
+ exposure_dat = bmi_exp_dat,
+ outcome_dat = chd_out_dat
+)
+#> Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)
Convert to the MRInput
format for the
+MendelianRandomization package:
+dat2 <- dat_to_MRInput(dat)
+#> Converting:
+#> - exposure: Body mass index || id:ieu-a-2
+#> - outcome: Coronary heart disease || id:ieu-a-7
This produces a list of MRInput
objects that can be used
+with the MendelianRandomization functions, e.g.
+MendelianRandomization::mr_ivw(dat2[[1]])
+#>
+#> Inverse-variance weighted method
+#> (variants uncorrelated, random-effect model)
+#>
+#> Number of Variants : 79
+#>
+#> ------------------------------------------------------------------
+#> Method Estimate Std Error 95% CI p-value
+#> IVW 0.446 0.059 0.330, 0.562 0.000
+#> ------------------------------------------------------------------
+#> Residual standard error = 1.357
+#> Heterogeneity test statistic (Cochran's Q) = 143.6508 on 78 degrees of freedom, (p-value = 0.0000). I^2 = 45.7%.
+#> F statistic = 65.6.
Alternatively, convert to the MRInput
format but also
+obtaining the LD matrix for the instruments
+dat2 <- try(dat_to_MRInput(dat, get_correlation = TRUE))
+#> Converting:
+#> - exposure: Body mass index || id:ieu-a-2
+#> - outcome: Coronary heart disease || id:ieu-a-7
+#> - obtaining LD matrix
+#> Please look at vignettes for options on running this locally if you need to run many instances of this command.
+#> Warning in ieugwasr::ld_matrix(variants = snps, with_alleles = with_alleles, : The following variants are not present in the LD reference panel
+#> rs2033529
+if (class(dat2) != "try-error") MendelianRandomization::mr_ivw(dat2[[1]], correl = TRUE)
+#>
+#> Inverse-variance weighted method
+#> (variants correlated, random-effect model)
+#>
+#> Number of Variants : 78
+#>
+#> ------------------------------------------------------------------
+#> Method Estimate Std Error 95% CI p-value
+#> IVW 0.441 0.056 0.331, 0.551 0.000
+#> ------------------------------------------------------------------
+#> Residual standard error = 1.414
+#> Heterogeneity test statistic (Cochran's Q) = 153.8519 on 77 degrees of freedom, (p-value = 0.0000). I^2 = 50.0%.
+#> F statistic = 80.3.
+#>
+#> (Estimates with correlated variants are sensitive to the signs in the correlation matrix
+#> - please ensure that your correlations are expressed with respect to the same effect alleles as your summarized association estimates.)
We recently developed MR-MoE, a method to choose the most appropriate +amongst several MR tests using a machine learning algorithm. Note that +the method is still under review, but full details are described here: +https://doi.org/10.1101/173682.
+MR-MoE operates by taking a set of harmonised data, inferring some +characteristics about the dataset, and using those characteristics to +predict how well each of the different MR methods will perform on the +dataset, in terms of maximising power while minimising false discovery +rates.
+In order to run the analysis you must download an RData object that +contains the trained random forests that are used to predict the +efficacy of each method. This can be downloaded from here:
+ +Caution: this is a large file (approx 167Mb)
+Once downloaded, read in the object and use the mr_moe()
+function to perform the analysis. An example is shown here, estimating
+the causal effect of BMI on coronary heart disease:
+# Extact instruments for BMI
+exposure_dat <- extract_instruments("ieu-a-2")
+
+# Get corresponding effects for CHD
+outcome_dat <- extract_outcome_data(exposure_dat$SNP, "ieu-a-7")
+
+# Harmonise
+dat <- harmonise_data(exposure_dat, outcome_dat)
+
+# Load the downloaded RData object. This loads the rf object
+load("rf.rdata")
+
+# Obtain estimates from all methods, and generate data metrics
+res_all <- mr_wrapper(dat)
+
+# MR-MoE - predict the performance of each method
+res_moe <- mr_moe(res_all, rf)
The function does the following:
+For every exposure / outcome combination in the dat
+object, the MR-MoE method is applied. The function returns a list which
+is as long as the number of exposure / outcome combinations. In this
+case, it will be of length 1, containing the result for BMI on CHD.
The result object itself is a list with the following elements:
+estimates
(results from each MR)heterogeneity
(results from heterogeneity for different
+filtering approaches)directional_pleiotropy
(egger intercepts)info
(metrics used to generate MOE)Looking at the estimates
, we see that there is a column
+called MOE
which is the predicted AUROC curve performance
+of each method.
The TwoSampleMR package also provides the following functions for +managing or editing MR results.
+The outcome column in the output of mr() combines the original +outcome name with the outcome trait ID.
+
+head(res)
+#> $result
+#> id.exposure exposure id.outcome
+#> 1 ieu-a-299 HDL cholesterol || id:ieu-a-299 ieu-a-7
+#> 2 ieu-a-300 LDL cholesterol || id:ieu-a-300 ieu-a-7
+#> 3 ieu-a-302 Triglycerides || id:ieu-a-302 ieu-a-7
+#> outcome nsnp b se pval
+#> 1 Coronary heart disease || id:ieu-a-7 79 -0.08919724 0.05970552 1.351879e-01
+#> 2 Coronary heart disease || id:ieu-a-7 68 0.37853543 0.04976846 2.828614e-14
+#> 3 Coronary heart disease || id:ieu-a-7 42 0.13584165 0.06738291 4.380354e-02
The outcome column can be split into separate columns for the id and +outcome name using the split_outcome function:
+
+res <- mr(dat)
+#> Analysing 'ieu-a-2' on 'ieu-a-7'
+split_outcome(res)
+#> id.exposure id.outcome outcome exposure
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease Body mass index || id:ieu-a-2
+#> 2 ieu-a-2 ieu-a-7 Coronary heart disease Body mass index || id:ieu-a-2
+#> 3 ieu-a-2 ieu-a-7 Coronary heart disease Body mass index || id:ieu-a-2
+#> 4 ieu-a-2 ieu-a-7 Coronary heart disease Body mass index || id:ieu-a-2
+#> 5 ieu-a-2 ieu-a-7 Coronary heart disease Body mass index || id:ieu-a-2
+#> method nsnp b se pval
+#> 1 MR Egger 79 0.5024935 0.14396056 8.012590e-04
+#> 2 Weighted median 79 0.3870065 0.07717818 5.318417e-07
+#> 3 Inverse variance weighted 79 0.4459091 0.05898302 4.032020e-14
+#> 4 Simple mode 79 0.3401554 0.15276076 2.885059e-02
+#> 5 Weighted mode 79 0.3790910 0.10761091 7.173381e-04
Similarly to the outcome column, the exposure column in the output of
+mr()
combines the original exposure name with the exposure
+trait ID. This can be split into separate columns for the id and
+exposure name using the split_exposure function.
Users can convert log odds ratios into odds ratios with 95% +confidence intervals using:
+
+generate_odds_ratios(res)
+#> id.exposure id.outcome outcome
+#> 1 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 2 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 3 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 4 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> 5 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure method nsnp b
+#> 1 Body mass index || id:ieu-a-2 MR Egger 79 0.5024935
+#> 2 Body mass index || id:ieu-a-2 Weighted median 79 0.3870065
+#> 3 Body mass index || id:ieu-a-2 Inverse variance weighted 79 0.4459091
+#> 4 Body mass index || id:ieu-a-2 Simple mode 79 0.3401554
+#> 5 Body mass index || id:ieu-a-2 Weighted mode 79 0.3790910
+#> se pval lo_ci up_ci or or_lci95 or_uci95
+#> 1 0.14396056 8.012590e-04 0.22033081 0.7846562 1.652838 1.246489 2.191653
+#> 2 0.07717818 5.318417e-07 0.23573724 0.5382757 1.472566 1.265842 1.713051
+#> 3 0.05898302 4.032020e-14 0.33030238 0.5615158 1.561909 1.391389 1.753328
+#> 4 0.15276076 2.885059e-02 0.04074434 0.6395665 1.405166 1.041586 1.895659
+#> 5 0.10761091 7.173381e-04 0.16817361 0.5900084 1.460956 1.183142 1.804004
It is sometimes useful to subset results on MR method, so that there +is one unique result for each exposure-outcome combination:
+
+subset_on_method(res)
+#> id.exposure id.outcome outcome
+#> 3 ieu-a-2 ieu-a-7 Coronary heart disease || id:ieu-a-7
+#> exposure method nsnp b
+#> 3 Body mass index || id:ieu-a-2 Inverse variance weighted 79 0.4459091
+#> se pval
+#> 3 0.05898302 4.03202e-14
The default is to subset on the IVW method when >1 SNP is +available and to use the Wald ratio method when a single SNP is +available. Users can specify which multi-SNP method to subset on.
+It is often useful to combine all results and study level
+characterists into a single dataframe or table, e.g. for sharing results
+with collaborators or when the user wishes to present all results in a
+single table or figure. This can be done using the
+combine_all_mrresults()
function:
+res <- mr(dat)
+het <- mr_heterogeneity(dat)
+plt <- mr_pleiotropy_test(dat)
+sin <- mr_singlesnp(dat)
+all_res <-
+ combine_all_mrresults(
+ res,
+ het,
+ plt,
+ sin,
+ ao_slc = TRUE,
+ Exp = TRUE,
+ split.exposure = FALSE,
+ split.outcome = TRUE
+ )
+head(all_res[, c(
+ "Method",
+ "outcome",
+ "exposure",
+ "nsnp",
+ "b",
+ "se",
+ "pval",
+ "intercept",
+ "intercept_se",
+ "intercept_pval",
+ "Q",
+ "Q_df",
+ "Q_pval",
+ "consortium",
+ "ncase",
+ "ncontrol",
+ "pmid",
+ "population"
+)])
This combines all results from mr()
,
+mr_heterogeneity()
, mr_pleiotropy_test()
and
+mr_singlesnp()
into a single dataframe. It also merges the
+results with outcome study level characteristics from the
+available_outcomes()
function, including sample size
+characteristics. If requested, it also exponentiates results (e.g. if
+the user wants log odds ratio converted into odds ratios with 95 percent
+confidence intervals).