Mendelian Randomization Analysis in Observational Epidemiology

Lee, Kwan; Lim, Chi-Yeon

doi:10.12997/jla.2019.8.2.67

J Lipid Atheroscler. 2019 Sep;8(2):67-77. English.
Published online Sep 17, 2019.
https://doi.org/10.12997/jla.2019.8.2.67

Review

Kwan Lee,¹ and Chi-Yeon Lim²

- ¹Department of Preventive Medicine, Dongguk University College of Medicine, Goyang, Korea.
- ²Department of Biostatistics, Dongguk University College of Medicine, Goyang, Korea.
Correspondence to Chi-Yeon Lim. Department of Biostatistics, Dongguk University College of Medicine, 123 Dongdae-ro, Goyang 10326, Korea. Email: rachun@hanmail.net

Received July 28, 2019; Accepted September 01, 2019.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Mendelian randomization (MR) in epidemiology is the use of genetic variants as instrumental variables (IVs) in a non-experimental design to infer the causal effect of a modifiable exposure on an outcome or disease. It assesses the causal effect of a risk factor on a clinical outcome. The main reason to use MR is to avoid the problem of residual confounding. Genotype is fixed from early pregnancy, so it is not affected by the disease, and the genotype of an individual cannot be changed. For this reason, results resembling those of a study with random assignment can be obtained by regressing the measurements. The IVs used in MR are genetic variants for estimating causality. Usually the outcome is a disease and the exposure is a risk factor or an intermediate phenotype, which may be a biomarker. The choice of the genetic variant as the IV (Z) is essential to a successful MR analysis. MR is also called ‘Mendelian deconfounding’ because it gives an estimate of causality free from biases due to confounding (C). To obtain an unbiased estimate of the causal effect of the exposure (X) on the clinically relevant outcome (Y), Z must satisfy 3 core assumptions (A1-A3): A1) Z is independent of C; A2) Z is associated with X; and A3) Z is independent of Y given X and C. The purpose of this review is to provide an overview of MR analysis and to explain that using an IV has been proposed as an alternative statistical method for estimating the causal effect of an exposure on an outcome while controlling for confounders.

Keywords

Mendelian randomization analysis; Genetic epidemiology; Instrument; Causality; Confounding factors

INTRODUCTION

One analytical approach to establishing the causal relationships behind observed associations between a modifiable exposure or potential risk factor and a clinically relevant outcome is Mendelian randomization.1 Randomized controlled trials (RCTs) reduce bias at the planning stage, allow the results observed in a sample to be generalized, and establish causal relationships. The main goal of statistics is to infer the characteristics of the population of interest from the information obtained from a sample. The rationale for this inference is randomization, as in random assignment.2 However, this is difficult to do in observational epidemiological studies.

The ‘gold standard’ for empirically testing scientific hypotheses in clinical research is an RCT. This design involves randomly assigning different treatments to the experimental units (e.g., individuals) of a population. In the simplest form, one ‘active group’ (e.g., intervention on a risk factor) is compared against a ‘control group.’3

Making inferences about causal effects from observational data using genetic variants as instrumental variables (IVs) is known as Mendelian randomization. Mendelian randomization is also called ‘Mendelian deconfounding’ because it aims to provide an estimate of causality free from biases caused by confounding.4 Mendelian randomization was first proposed by Gray and Wheatley in 1986 and is a method of estimating the causality of a disease without prior randomization. Recently, Mendelian randomization has been used to estimate the causality of genetic products. Genetic polymorphism is conceptually randomized: the genotype of an individual cannot be changed, and interpersonal covariates are, in theory, perfectly balanced among people with different polymorphisms. Thus, Mendelian randomization was developed to theoretically eliminate all confounding of any relationship between a genetic polymorphism and disease.5, 6

Recent research areas in which Mendelian randomization analysis has been applied include coronary heart disease and cardiovascular disease,7, 8, 9 and the causal relationship between lipid metabolism and insulin resistance, studied in the presence of pleiotropy and a lack of information on genotype and ethnicity.10 Mendelian randomization is a good design for controlling reverse causation and confounding, which are often encountered in epidemiological studies. In other words, it is a method to test or estimate a causal effect from observational data subject to confounding.10

In epidemiological studies, it is not easy to determine whether a potential risk factor influences a particular disease. Unlike RCTs, observational epidemiological studies cannot easily observe outcomes while intervening on the exposure.

It is difficult to say that the genetic variation an individual inherits from his or her parents is completely random, but it can be assumed that genetic variation in a population is distributed randomly with respect to the exposure, which is a potential risk factor for a particular result (a disease of interest). Thus, Mendelian randomization was introduced to address these limitations. Observational epidemiology continually seeks a rigorous approach to assessing causality, with the goal of emulating randomized controlled trials in observational studies. One special case in which such randomization can be observed is Mendelian randomization in genetic epidemiology.11

OBSERVATIONAL STUDY LIMITATION

Types of observational studies include cohort studies, case-control studies, and cross-sectional studies. From the point of view of Mendelian randomization, prospective cohort studies define the disease of interest and observe which phenotypes develop as the study population is followed up. In a case-control study, participants are divided into 2 groups according to their phenotype.

The limitations of observational studies are as follows. The first is reverse causation: X appears to be the cause of Y, but Y is in fact the cause of X. This can arise when measured values are used in a retrospective observational study. The second is the difficulty of controlling confounders. There is also uncertainty about the temporal order in which diseases occur and difficulty in following up the occurrence of events. In addition, for phenotypes with low incidence, statistical testing may be difficult when the sample size is small.

MENDELIAN RANDOMIZATION

The method of using an IV has long been used in economics,12 and it has more recently been introduced into epidemiological studies. Mendelian randomization studies using instruments can avoid the limitations of observational studies. Issues such as generalizability, feasibility, ethics, and high cost, which are limitations of RCTs, make Mendelian randomization more attractive.13, 14

Mendelian randomization is a method of testing or evaluating, from observational data, the causal effects of potential risk factors on a given disease. It assumes that mate selection is not associated with genotype. Since genotypes are randomly assigned when transmitted from parent to offspring during meiosis, population genotype distributions are typically not associated with the confounding factors that adversely affect observational epidemiologic studies. In this regard, Mendelian randomization can be considered a natural RCT. Because a polymorphism serves as the IV, Mendelian randomization relies on previously established genetic associations to provide good candidate genes for the intermediate phenotype.

Mendelian randomization is the use of genetic variants in a non-experimental design to infer causality about the effect of an exposure on an outcome.3

Mendelian randomization may also be used to identify causality in epidemiological studies. Some examples of Mendelian randomization are shown in Table 1 with the exposure and outcome by type of exposure.

Table 1
Epidemiological evidence for causal relationships assessed by Mendelian randomization

NOTATION AND GLOSSARY FOR MENDELIAN RANDOMIZATION

The word “exposure (X)” throughout this paper denotes a risk factor or an intermediate phenotype, which can be a biomarker or potential risk factor that may affect the outcome (Y, disease). The IVs (Z) used in Mendelian randomization analysis are genetic variants for estimating causality. The choice of the genetic variant as the IV (Z) is essential to a successful Mendelian randomization analysis. Mendelian randomization is also called ‘Mendelian deconfounding’ because it gives an estimate of causality free from biases due to confounding (C). Before the concepts required to establish IVs are introduced in more detail, some notation and a glossary are shown in Table 2.

Table 2
Some notation and glossary used in Mendelian randomization approach

EXPLANATION OF VARIABLES

It is helpful to understand two statistical terms for Mendelian randomization analysis: confounders and instrumental variables. Let Z be the IV, X the variable that represents the cause, and Y the outcome or disease. To confirm that Y is affected by X, a variable Z that affects only X and does not affect Y directly can be added to the model. If Y then varies with Z, X is a causal variable that really affects Y. The variable Z used in this way is called the IV. Fig. 1 gives a conceptual description of Mendelian randomization with its 3 core assumptions (A1, A2, and A3).

Fig. 1
Conceptual description of Mendelian randomization. (A) A1: Z is independent of C; A2: Z is associated with X. (B) A3: Z is independent of Y given X and C.
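
As a rough illustration of why an instrument helps, the sketch below simulates data with the structure in Fig. 1 and compares a naive regression of Y on X with the ratio (Wald) estimate, that is, the Z-Y slope divided by the Z-X slope. The ratio estimator, the effect sizes, the allele frequency, and all function names are assumptions made for this illustration; none of them are taken from the studies reviewed here.

```python
import random

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=50000, true_effect=0.3, seed=1):
    """Draw (z, x, y) where c confounds x and y but is independent of z."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = sum(rng.random() < 0.3 for _ in range(2))  # allele count 0, 1 or 2
        ci = rng.gauss(0, 1)                            # unmeasured confounder C
        xi = 0.5 * zi + ci + rng.gauss(0, 1)            # exposure X depends on Z and C
        yi = true_effect * xi + ci + rng.gauss(0, 1)    # outcome Y has no direct Z term
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

def wald_ratio(z, x, y):
    """Ratio estimate: (slope of y on z) divided by (slope of x on z)."""
    return slope(z, y) / slope(z, x)

z, x, y = simulate()
print("naive slope of Y on X:", round(slope(x, y), 3))
print("ratio (IV) estimate:  ", round(wald_ratio(z, x, y), 3))
```

Because C enters both X and Y, the naive slope is pulled away from the simulated effect, whereas the ratio estimate should land close to it, since Z is generated independently of C and reaches Y only through X.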

IVs

Mendelian randomization can be called random allocation using the genetic variants as IVs for assessing the causal effect of the exposure on the outcome.28, 29

IVs are associated with the exposure, but they are not associated with confounders of the exposure-outcome association, and they do not lie on any causal pathway to the outcome other than through the exposure.30

Let Z be the IV or instrument. It is associated with the exposures (X), also called risk factors or intermediate phenotypes, but it is not associated with the outcome (Y) except through its association with the exposures (X).

The choice of genetic variants as the IV (Z) plays an important role in the success of Mendelian randomization.1 A genetic IV can be identified by searching published databases or reports that estimate genetic associations with the exposure of interest. Genome-wide association studies (GWASs) have been used for Mendelian randomization analysis over the past decade.31

SELECTION PROCESS AND CORE ASSUMPTIONS OF IVs

The most important consideration is the validation of the genetic variants to be used as IVs in Mendelian randomization. Genotyping is performed to diagnose diseases caused by genetic variants and to provide information for the development of treatments. It plays an important role in genetic analysis, which enables the understanding and prevention of genetic diseases and the development of personalized medicine. Before a Mendelian randomization study is undertaken, it is necessary to choose the single nucleotide polymorphisms (SNPs) that define and affect the diseases of interest. For example, a GWAS is a research method that collectively explores the genetic factors of disease and drug reactivity. The results of GWAS research are reported by the National Human Genome Research Institute and listed in the GWAS Catalog (https://www.ebi.ac.uk/gwas).32, 33

For SNPs to be used as IVs, the following 3 assumptions must be satisfied. These assumptions can be examined by statistical methods, but above all, the validity of using specific genetic variants as an IV should be considered.34 Fig. 1 shows this concept graphically.

A1) The IV (Z, SNP) should be related to the exposure (X, phenotype).
A2) The IV (Z, SNP) should not be related to any confounder (C). That is, the IV (Z, SNP) should be unrelated to any other risk factor or confounder (C) that may affect the outcome (Y, disease).
A3) The IV (Z, SNP) should not have a direct association with the outcome (Y, disease).

However, A2 is often considered fulfilled because alleles are randomly allocated to gametes.35

Statistical tests have also been proposed to address specific threats to A3.36

An overview of selected issues that can lead to violation of the 3 core assumptions (A1–A3) is provided in Table 3.1, 37

Table 3
Selected issues to inference from Mendelian randomization

EXPLANATION OF IVs

Each of the 3 assumptions of an IV can be interpreted as follows:

A1) The genetic variant used as an IV (Z) must be related to the exposure (X, phenotype). The stronger the association, the more the exposure is likely to differ across the subgroups (genotypes) of the IV (Z).
A2) The distribution of demographic characteristics and confounders should not differ by genotype of the IV (Z, SNPs).
A3) Given the exposure (X, phenotype) and the confounders (C) of the exposure-outcome association, the IV (Z) should be conditionally independent of the outcome (Y).
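
The strength of the Z-X association described under A1 in the list above is commonly summarized by the F statistic from the regression of the exposure on the instrument. The following is a minimal sketch for a single SNP coded as an allele count. The function name, the toy data, and the conventional F > 10 rule of thumb for a sufficiently strong instrument are assumptions brought in for illustration and are not taken from this review.

```python
def first_stage_f(z, x):
    """F statistic for the regression of exposure x on a single instrument z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((a - mz) ** 2 for a in z)
    sxx = sum((b - mx) ** 2 for b in x)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    r2 = szx ** 2 / (szz * sxx)           # squared correlation between z and x
    if r2 >= 1.0:
        return float("inf")               # perfect fit: F is unbounded
    return r2 * (n - 2) / (1.0 - r2)      # one instrument: F on (1, n - 2) df

# Hypothetical toy data: allele counts and a measured exposure.
genotype = [0, 1, 2, 0, 1, 2]
exposure = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]

f_stat = first_stage_f(genotype, exposure)
print("first-stage F:", round(f_stat, 1))
print("passes the F > 10 rule of thumb:", f_stat > 10)
```

A small F suggests that the exposure barely differs across genotype subgroups, in which case an IV estimate built on that SNP would be unstable.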

STATISTICAL ANALYSIS IN GWAS

In GWAS, the objective of statistical analysis is estimation and testing. First, the distributions of demographics and potential risk factors are described and tested, using graphs, tables, and statistics, according to the genotypes of the SNPs used as IVs. The odds ratio (OR) is estimated for the difference in allele frequency at a SNP locus, mainly between patients and healthy controls, or a large number of SNP loci are tested using contingency tables. If environmental factors differ between the groups being compared, a regression model is used to account for factors other than the SNP locus, such as age and gender. In addition, analysis of variance is performed to show an association between potential risk factors and IVs, and various other statistical analyses are carried out, including survival analysis and machine learning models.
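
The OR calculation for a single SNP can be sketched from a 2 × 2 table of carrier status by case-control status. The counts below are hypothetical, and the log-based (Woolf) 95% confidence interval is one common choice assumed here for illustration; the review itself does not prescribe an interval method.

```python
import math

def odds_ratio(case_carrier, case_noncarrier,
               control_carrier, control_noncarrier, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval."""
    or_ = (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / case_carrier + 1 / case_noncarrier
                       + 1 / control_carrier + 1 / control_noncarrier)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts: 30 of 100 patients and 15 of 100 controls carry the allele.
or_, lower, upper = odds_ratio(30, 70, 15, 85)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

This unadjusted estimate ignores age, gender, and other environmental factors; accounting for those would require the regression model mentioned in the text.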

ASSOCIATION

To determine whether an exposure (X) and an outcome (Y) are related, the observational association needs to be checked. If there are predictive factors relevant to the association between X and Y, the analysis can adjust for them as covariates. If a variable transformation is applied to a potential exposure (X) when checking its association with the outcome (Y), then X should be analyzed with the same transformation in the Mendelian randomization analysis.

Relative risk is used as a measure of association in prospective cohort studies and clinical trials, and the OR is used in retrospective case-control studies.
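
The two measures can be compared directly when full cohort counts are available. The sketch below uses hypothetical counts and illustrative function names; it shows that the OR is close to the relative risk when the outcome is rare and drifts away from it when the outcome is common.

```python
def relative_risk(exp_cases, exp_total, unexp_cases, unexp_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)

def cohort_odds_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Odds of disease in the exposed group divided by odds in the unexposed group."""
    exp_odds = exp_cases / (exp_total - exp_cases)
    unexp_odds = unexp_cases / (unexp_total - unexp_cases)
    return exp_odds / unexp_odds

# Hypothetical cohorts: a common outcome and a rare outcome.
for label, counts in [("common", (20, 100, 10, 100)),
                      ("rare", (2, 1000, 1, 1000))]:
    rr = relative_risk(*counts)
    or_ = cohort_odds_ratio(*counts)
    print(f"{label} outcome: RR = {rr:.3f}, OR = {or_:.3f}")
```

In a case-control study the group totals are fixed by the sampling design, so the risks in the first function cannot be estimated, which is why the OR is the measure reported there.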

R FUNCTIONS FOR MENDELIAN RANDOMIZATION

Yavorska and Staley38 described an R package for Mendelian randomization using summarized data, available for R version 3.0.1 or later. The description covers several methods for conducting Mendelian randomization with summarized data on the genetic associations with the exposure and with the outcome, such as data obtained from a consortium. These methods can be used to estimate causality using IVs. Some useful functions are described in Table 4.39

Table 4
R functions for Mendelian randomization
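
To give a sense of what an analysis of summarized data involves, the following Python sketch computes a fixed-effect inverse-variance weighted (IVW) estimate from per-variant association estimates. It is an illustration only and does not reproduce the R package's code or interface. The function names and the summary statistics are hypothetical, and the sketch assumes uncorrelated variants and ignores the uncertainty in the variant-exposure estimates.

```python
import math

def wald_ratios(beta_x, beta_y):
    """Per-variant ratio estimates: variant-outcome over variant-exposure."""
    return [by / bx for bx, by in zip(beta_x, beta_y)]

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of beta_y on beta_x through the
    origin, with weights 1 / se_y**2.
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)

# Hypothetical summary statistics for three uncorrelated variants.
beta_x = [0.10, 0.20, 0.30]    # variant-exposure associations
beta_y = [0.05, 0.11, 0.14]    # variant-outcome associations
se_y = [0.010, 0.020, 0.015]   # standard errors of beta_y

estimate, se = ivw_estimate(beta_x, beta_y, se_y)
print("per-variant ratios:", [round(r, 3) for r in wald_ratios(beta_x, beta_y)])
print(f"IVW estimate = {estimate:.3f} (SE {se:.3f})")
```

The pooled estimate is a weighted average of the per-variant ratios, so a single variant that violates A3 can shift it; this is one reason several estimation methods are usually compared.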

DISCUSSION

Although Mendelian randomization appears to be a perfect epidemiological approach to directly estimating a causal effect, there are still limitations and assumptions in its application, as there are in all research designs, including randomized controlled studies. Estimating and testing the magnitude of the causal effect is not always of interest or may not be feasible, but Mendelian randomization can still be used to evaluate whether a causal relationship exists. Genetic epidemiology still has wider areas to explore, and methodological developments in Mendelian randomization using genetic variants as IVs should be considered.

CONCLUSION

A study using Mendelian randomization can be carried out by following the procedure in Fig. 2 as a checklist. A well-designed Mendelian randomization study that satisfies the assumptions often provides more reliable evidence than a conventional observational epidemiology study. However, the findings must be interpreted carefully and compared with existing evidence from different studies. It is very important to check whether the assumptions of a Mendelian randomization analysis are satisfied, and a valid instrumental variable might not be available for every research question because of a lack of knowledge. Moreover, the relevance of the results for clinical decisions should be interpreted in light of other sources of evidence.

Fig. 2
Flow chart of process for Mendelian randomization analysis.

Notes

Funding: None.

Conflict of Interest: The authors have no conflicts of interest to declare.

References

1. Sekula P, Del Greco M F, Pattaro C, Köttgen A. Mendelian randomization as an approach to assess causality using observational data. J Am Soc Nephrol 2016;27:3253–3265.
2. Lim CY, In J. Randomization in clinical studies. Korean J Anesthesiol 2019;72:221–232.
3. Burgess S, Thompson SG. In: Mendelian randomization: methods for using genetic variants in causal estimation. Boca Raton (FL): CRC Press; 2015.
4. Tobin MD, Minelli C, Burton PR, Thompson JR. Commentary: development of Mendelian randomization: from hypothesis test to ‘Mendelian deconfounding’. Int J Epidemiol 2004;33:26–29.
5. Gray R, Wheatley K. How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplant 1991;7 Suppl 3:9–12.
6. Wheatley K, Gray R. Commentary: Mendelian randomization--an update on its use to evaluate allogeneic stem cell transplantation in leukaemia. Int J Epidemiol 2004;33:15–17.
7. Jansen H, Samani NJ, Schunkert H. Mendelian randomization studies in coronary artery disease. Eur Heart J 2014;35:1917–1924.
8. Ference BA, Yoo W, Alesh I, Mahajan N, Mirowska KK, Mewada A, et al. Effect of long-term exposure to lower low-density lipoprotein cholesterol beginning early in life on the risk of coronary heart disease: a Mendelian randomization analysis. J Am Coll Cardiol 2012;60:2631–2639.
9. Bu SY. Genetically mediated lipid metabolism and risk of insulin resistance: insights from Mendelian randomization studies. J Lipid Atheroscler. 2019
10. Smith GD. Mendelian randomization for strengthening causal inference in observational studies: application to gene × environment interactions. Perspect Psychol Sci 2010;5:527–545.
11. Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol 2000;29:722–729.
12. Wright S. Appendix. In: Wright PG, editor. The tariff on animal and vegetable oils. New York (NY): Macmillan; 1928.
13. Goldberger AS. Structural equation methods in the social sciences. Econometrica 1972;40:979–1001.
14. Ogbuanu IU, Zhang H, Karmaus W. Can we apply the Mendelian randomization methodology without considering epigenetic effects? Emerg Themes Epidemiol 2009;6:3.
15. Almon R, Alvarez-Leon EE, Engfeldt P, Serra-Majem L, Magnuson A, Nilsson TK. Associations between lactase persistence and the metabolic syndrome in a cross-sectional study in the Canary Islands. Eur J Nutr 2010;49:141–146.
16. Bech BH, Autrup H, Nohr EA, Henriksen TB, Olsen J. Stillbirth and slow metabolizers of caffeine: comparison by genotypes. Int J Epidemiol 2006;35:948–953.
17. Chen L, Smith GD, Harbord RM, Lewis SJ. Alcohol intake and blood pressure: a systematic review implementing a Mendelian randomization approach. PLoS Med 2008;5:e52.
18. Mumby HS, Elks CE, Li S, Sharp SJ, Khaw KT, Luben RN, et al. Mendelian randomisation study of childhood BMI and early menarche. J Obes 2011;2011:180729.
19. von Hinke Kessler Scholder S, Smith GD, Lawlor DA, Propper C, Windmeijer F. In: Genetic markers as instrumental variables: an application to child fat mass and academic achievement. Bristol: The Centre for Market and Public Organisation; 2010.
20. Ebrahim S, Davey Smith G. Mendelian randomization: can genetic epidemiology help redress the failures of observational epidemiology? Hum Genet 2008;123:15–33.
21. Timpson NJ, Nordestgaard BG, Harbord RM, Zacho J, Frayling TM, Tybjærg-Hansen A, et al. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes 2011;35:300–308.
22. Kivimäki M, Lawlor DA, Smith GD, Kumari M, Donald A, Britton A, et al. Does high C-reactive protein concentration increase atherosclerosis? The Whitehall II Study. PLoS One 2008;3:e3013.
23. Allin KH, Nordestgaard BG, Zacho J, Tybjaerg-Hansen A, Bojesen SE. C-reactive protein and the risk of cancer: a mendelian randomization study. J Natl Cancer Inst 2010;102:202–206.
24. Timpson NJ, Lawlor DA, Harbord RM, Gaunt TR, Day IN, Palmer LJ, et al. C-reactive protein and its role in metabolic syndrome: mendelian randomisation study. Lancet 2005;366:1954–1959.
25. Trompet S, Jukema JW, Katan MB, Blauw GJ, Sattar N, Buckley B, et al. Apolipoprotein e genotype, plasma cholesterol, and cancer: a Mendelian randomization study. Am J Epidemiol 2009;170:1415–1421.
26. Casas JP, Bautista LE, Smeeth L, Sharma P, Hingorani AD. Homocysteine and stroke: evidence on a causal link from mendelian randomisation. Lancet 2005;365:224–232.
27. Ding W, Lehrer SF, Rosenquist JN, Audrain-McGovern J. The impact of poor health on academic performance: New evidence using genetic markers. J Health Econ 2009;28:578–597.
28. Wehby GL, Ohsfeldt RL, Murray JC. ‘Mendelian randomization’ equals instrumental variable analysis with genetic instruments. Stat Med 2008;27:2745–2749.
29. Thomas DC, Conti DV. Commentary: the concept of ‘Mendelian randomization’. Int J Epidemiol 2004;33:21–25.
30. Burgess S, Small DS, Thompson SG. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res 2017;26:2333–2355.
31. Pearl J. In: Causality. Cambridge: Cambridge University Press; 2000.
32. Nitsch D, Molokhia M, Smeeth L, DeStavola BL, Whittaker JC, Leon DA. Limits to causal inference based on Mendelian randomization: a comparison with randomized controlled trials. Am J Epidemiol 2006;163:397–403.
33. Teumer A. Common methods for performing mendelian randomization. Front Cardiovasc Med 2018;5:51.
34. Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res 2007;16:309–330.
35. Lawlor DA, Harbord RM, Sterne JA, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med 2008;27:1133–1163.
36. Greco M FD, Minelli C, Sheehan NA, Thompson JR. Detecting pleiotropy in Mendelian randomisation studies with summary data and a continuous outcome. Stat Med 2015;34:2926–2940.
37. Lim HR. In: Effect of blood lead concentration on attention deficit hyperactivity disorder in Korean children: a Mendelian randomization study [dissertation]. Seoul: Seoul National University; 2017. pp. 78.
38. Yavorska O, Staley J. Package ‘MendelianRandomization’ [Internet]. place unknown: CRAN; 2019 [accessed on 10 June 2019]. Available from: https://cran.r-project.org/web/packages/MendelianRandomization/MendelianRandomization.pdf.
39. Yavorska OO, Burgess S. MendelianRandomization: an R package for performing Mendelian randomization analyses using summarized data. Int J Epidemiol 2017;46:1734–1739.