Published online Sep 17, 2019.
https://doi.org/10.12997/jla.2019.8.2.67
Mendelian Randomization Analysis in Observational Epidemiology
Abstract
Mendelian randomization (MR) in epidemiology is the use of genetic variants as instrumental variables (IVs) in a non-experimental design to infer the causal effect of a modifiable exposure on an outcome or disease. It assesses the causal effect of a risk factor on a clinical outcome. The main reason to use MR is to avoid the problem of residual confounding. An individual's genotype is fixed at conception, is not altered by the later onset of disease, and cannot be changed. For this reason, regressing the measurements on genotype can give results comparable to those of a study with random assignment. In MR, genetic variants are used as IVs to estimate causality. Usually the outcome is a disease and the exposure is a risk factor or an intermediate phenotype, which may be a biomarker. The choice of the genetic variant used as the IV (Z) is essential to a successful MR analysis. MR is also called ‘Mendelian deconfounding’ because it aims to give an estimate of causality free from biases due to confounding (C). To obtain an unbiased estimate of the causal effect of the exposure (X) on the clinically relevant outcome (Y), Z must satisfy 3 core assumptions (A1-A3): A1) Z is independent of C; A2) Z is associated with X; and A3) Z is independent of Y given X and C. The purpose of this review is to provide an overview of MR analysis and to explain how an IV can be used as an alternative statistical method to estimate the causal effect of an exposure on an outcome while controlling for confounding.
INTRODUCTION
Mendelian randomization is one analytical approach to establishing whether an observed association between a modifiable exposure or potential risk factor and a clinically relevant outcome is causal.1 Randomized controlled trials (RCTs) reduce bias in the planning phase, which allows results observed in a sample to be generalized and a causal relationship to be established. The main goal of statistics is to infer the characteristics of a population of interest from the information obtained from a sample. The rationale for this inference rests on randomization, such as random assignment.2 However, randomization is difficult to achieve in observational epidemiological studies.
The ‘gold standard’ for empirically testing scientific hypotheses in clinical research is the RCT. This design involves randomly assigning different treatments to the experimental units (e.g., individuals) of a population. In its simplest form, one ‘active group’ (e.g., receiving an intervention on a risk factor) is compared against a ‘control group.’3
Making inferences about causal effects from observational data by using genetic variants as instrumental variables (IVs) is known as Mendelian randomization. Mendelian randomization is also called ‘Mendelian deconfounding’ because it aims to provide an estimate of causality free from biases caused by confounding.4 Mendelian randomization was first proposed by Gray and Wheatley in 1991 and is a method of estimating the causes of a disease without prior randomization. More recently, Mendelian randomization has been used to estimate the causal effects of gene products. Genetic polymorphism is conceptually randomized: the genotype of an individual cannot be changed, and in theory covariates are perfectly balanced among people carrying different polymorphisms. Mendelian randomization was therefore developed to eliminate, in theory, all confounding of the relationship between a genetic polymorphism and disease.5, 6
Recent applications of Mendelian randomization analysis include studies of coronary heart disease and cardiovascular disease7, 8, 9 and of the causal relationship between lipid metabolism and insulin resistance, where pleiotropy and a lack of information on genotype and ethnicity have been noted as limitations.10 Mendelian randomization is a good design for controlling reverse causation and confounding, which are often encountered in epidemiological studies. In other words, it is a method to test or estimate a causal effect from observational data that are subject to confounding.10
In epidemiological studies, it is not easy to determine whether a potential risk factor influences a particular disease. Unlike in RCTs, in observational epidemiological studies it is not easy to observe outcomes while intervening on the exposure.
It is difficult to say that the genetic variation an individual inherits from his or her parents is completely random, but it can be assumed that genetic variation in a population is distributed randomly with respect to the exposure, which is a potential risk factor for a particular outcome (a disease of interest). Mendelian randomization was therefore introduced to address these limitations. Observational epidemiology continually seeks a rigorous approach to assessing causality, with the goal of emulating a randomized controlled trial in observational data. One special case in which such randomization can be observed is Mendelian randomization in genetic epidemiology.11
OBSERVATIONAL STUDY LIMITATION
Types of observational studies include cohort studies, case-control studies, and cross-sectional studies. From the point of view of Mendelian randomization, a prospective cohort study defines the disease of interest and observes which phenotypes develop as the study population is followed over time. In a case-control study, participants are divided into 2 groups according to their phenotype.
The limitations of observational studies are as follows. Reverse causation may occur: X is assumed to cause Y, but Y may in fact cause X. This arises, for example, when values measured after disease onset are used in a retrospective observational study. Confounders are also difficult to control. Further limitations are uncertainty about the temporal order in which diseases occur and the difficulty of following up the occurrence of events. In addition, for phenotypes with low incidence, statistical testing may be difficult when the sample size is small.
MENDELIAN RANDOMIZATION
The IV method has long been used in economics,12 and it has more recently been introduced into epidemiological studies. Mendelian randomization studies using instruments can avoid the limitations of observational studies. Issues such as generalizability, feasibility, ethics, and high cost, which are limitations of RCTs, make Mendelian randomization more attractive.13, 14
Mendelian randomization is a method of testing or evaluating the causal effect of a potential risk factor on a particular disease from observational data. It assumes that mate selection is not associated with genotype. Because genotypes are randomly assigned when transmitted from parent to offspring during meiosis, population genotype distributions are typically not associated with the confounding factors that adversely affect observational epidemiologic studies. In this regard, Mendelian randomization can be considered a natural RCT. Because a polymorphism serves as the IV, Mendelian randomization relies on previously established genetic associations to provide good candidate genes that influence the intermediate phenotype.
Mendelian randomization is the use of genetic variants in a non-experimental design to infer the causal effect of an exposure on an outcome.3
Mendelian randomization may also be used to identify causality in epidemiological studies. Some examples of Mendelian randomization are shown in Table 1 with the exposure and outcome by type of exposure.
Table 1
Epidemiological evidence for causal relationships assessed by Mendelian randomization
NOTATION AND GLOSSARY FOR MENDELIAN RANDOMIZATION
Throughout this paper, “exposure (X)” denotes a risk factor or intermediate phenotype, which can be a biomarker or potential risk factor that may affect the outcome (Y, disease). In Mendelian randomization analysis, genetic variants are used as IVs (Z) to estimate causality. The choice of the genetic variant used as the IV (Z) is essential to a successful Mendelian randomization analysis. Mendelian randomization is also called ‘Mendelian deconfounding’ because it aims to give an estimate of causality free from biases due to confounding (C). Before the concepts required to establish IVs are introduced in more detail, some notation and a glossary are given in Table 2.
Table 2
Some notation and glossary used in Mendelian randomization approach
EXPLANATION OF VARIABLES
Two statistical terms are needed to understand Mendelian randomization analysis: confounders and instrumental variables. Let Z be the IV, X the variable that represents the cause, and Y the outcome or disease. To confirm that Y is affected by X, a variable Z that affects only X and does not affect Y directly can be added to the model. If Y then varies with Z, the variation must be transmitted through X, so X is a causal variable that really affects Y. The variable Z used in this way is called the IV. Fig. 1 gives a conceptual description of Mendelian randomization with its 3 core assumptions (A1, A2, and A3).
Fig. 1
Conceptual description of Mendelian randomization. (A) A1: Z is independent of C; A2: Z is associated with X. (B) A3: Z is independent of Y given X and C.
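The role of Z can be illustrated with a small simulation. The sketch below is not taken from the original article: it assumes a linear model, a single biallelic variant with allele frequency 0.3, an unmeasured confounder C, and a true effect of 0.5, all chosen purely for illustration. It compares the naive regression of Y on X with the ratio (Wald) estimate, which divides the slope of Y on Z by the slope of X on Z.

```python
import random
import statistics


def slope(x, y):
    """Ordinary least-squares slope of y regressed on a single predictor x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_effect=0.5, seed=1):
    """Simulate Z -> X -> Y with an unmeasured confounder C of X and Y.

    All coefficients are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        # Genotype: number of effect alleles (0, 1 or 2), allele frequency 0.3
        g = (rng.random() < 0.3) + (rng.random() < 0.3)
        c = rng.gauss(0, 1)                           # unmeasured confounder
        xi = 0.5 * g + c + rng.gauss(0, 1)            # exposure depends on Z and C
        yi = true_effect * xi + c + rng.gauss(0, 1)   # outcome depends on X and C, not on Z directly
        z.append(g)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
naive = slope(x, y)                # observational estimate, biased by C
wald = slope(z, y) / slope(z, x)   # ratio (Wald) instrumental-variable estimate
print(f"naive estimate: {naive:.2f}")
print(f"IV estimate:    {wald:.2f} (simulated true effect: 0.50)")
```

Under these assumptions the naive slope is pulled away from the simulated effect by the confounder, while the ratio estimate stays close to it, because Z is independent of C by construction.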
IVs
Mendelian randomization can be regarded as random allocation that uses genetic variants as IVs to assess the causal effect of the exposure on the outcome.28, 29
IVs are associated with the exposure, but they are not associated with confounders of the exposure-outcome association, and they are not on any causal pathway to the outcome other than through the exposure.30
Let Z be the IV or instrument. It is associated with the exposure (X), also called a risk factor or intermediate phenotype, but is not associated with the outcome (Y) except through its association with the exposure (X).
The choice of genetic variants as the IV (Z) plays an important role in the success of Mendelian randomization.1 A genetic IV can be identified by searching published databases or reports that estimate genetic associations with the exposure of interest. Genome-wide association studies (GWAS) have been used for Mendelian randomization analysis over the past decade.31
SELECTION PROCESS AND CORE ASSUMPTIONS OF IVs
The most important consideration is the validity of the genetic variants to be used as IVs in Mendelian randomization. Genotyping is performed to diagnose diseases caused by genetic variants and to provide information for the development of treatments. It plays an important role in genetic analysis, which enables the understanding and prevention of genetic diseases and the development of personalized medicine. Before a Mendelian randomization study is undertaken, it is necessary to choose the single nucleotide polymorphisms (SNPs) that characterize and affect the disease of interest. For example, a GWAS is a research method that comprehensively explores the genetic factors underlying disease and drug response. GWAS results are collected by the National Human Genome Research Institute and listed in the GWAS Catalog (https://www.ebi.ac.uk/gwas).32, 33
To be used as IVs, SNPs must satisfy the following 3 assumptions. These assumptions can be examined by statistical methods, but above all, the validity of using specific genetic variants as an IV should be considered.34 Fig. 1 shows these assumptions graphically.
A1) The IV (Z, SNP) should be related to the exposure (X, phenotype).
A2) The IV (Z, SNP) should not be directly related to any confounder (C). That is, the IV (Z, SNP) should be unrelated to any other risk factor or confounder (C) that may affect the outcome variable (Y, disease).
A3) The IV (Z, SNP) should not have a direct association with the outcome (Y, disease).
However, A2 is often considered fulfilled because alleles are randomly allocated to gametes.35
Statistical tests have also been proposed to address specific threats to A3.36
An overview of selected issues that can lead to violation of the 3 core assumptions (A1–A3) is provided in Table 3.1, 37
Table 3
Selected issues to inference from Mendelian randomization
EXPLANATION OF IVs
Each of the 3 assumptions of an IV can be interpreted as follows:
A1) The genetic variant used as an IV (Z) must be related to the exposure (X, phenotype). The stronger the association, the larger the expected difference in the exposure between subgroups (genotypes) of the IV (Z).
A2) The distributions of demographic characteristics and confounders should not differ by genotype of the IV (Z, SNPs).
A3) The IV (Z) must be independent of the outcome (Y) conditional on the exposure (X, phenotype) and the confounders (C) of the exposure-outcome association.
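Two of these interpretations lend themselves to simple numerical checks. The sketch below is illustrative only and uses simulated, hypothetical data: the strength of the instrument (A1) is summarized by the F-statistic from regressing the exposure on a single genotype, for which F > 10 is a commonly quoted rule of thumb, and the balance of a covariate across genotypes (A2) is inspected through its mean within each genotype group. The helper names `first_stage_f` and `mean_by_genotype` are assumptions made for this example.

```python
import random
import statistics


def first_stage_f(z, x):
    """F-statistic for the regression of exposure x on a single instrument z."""
    n = len(z)
    mz, mx = statistics.fmean(z), statistics.fmean(x)
    szz = sum((a - mz) ** 2 for a in z)
    szx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    sxx = sum((b - mx) ** 2 for b in x)
    r2 = szx ** 2 / (szz * sxx)       # squared correlation between Z and X
    return r2 * (n - 2) / (1 - r2)    # F with (1, n - 2) degrees of freedom


def mean_by_genotype(z, values):
    """Mean of a covariate within each genotype group (0, 1 or 2 alleles)."""
    groups = {}
    for g, v in zip(z, values):
        groups.setdefault(g, []).append(v)
    return {g: statistics.fmean(vs) for g, vs in sorted(groups.items())}


# Hypothetical simulated cohort: genotype, age (a covariate) and an exposure.
rng = random.Random(7)
z, age, x = [], [], []
for _ in range(5000):
    g = (rng.random() < 0.4) + (rng.random() < 0.4)   # allele count, frequency 0.4
    a = rng.gauss(50, 10)                              # age, generated independently of genotype
    z.append(g)
    age.append(a)
    x.append(0.3 * g + 0.02 * a + rng.gauss(0, 1))     # exposure depends on genotype and age

print("first-stage F:", round(first_stage_f(z, x), 1))
print("mean age by genotype:", mean_by_genotype(z, age))
```

A3 cannot be verified from the data in the same way, because it concerns pathways from Z to Y that are not observed.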
STATISTICAL ANALYSIS IN GWAS
In GWAS, the objective of statistical analysis is estimation and testing. First, the distributions of demographic characteristics and potential risk factors are summarized and tested by genotype of the SNPs used as IVs, using graphs, tables, and summary statistics. The odds ratio (OR) is estimated for the difference in allele frequency at a SNP locus, mainly between patients and healthy controls, and a large number of SNP loci are tested using contingency tables. If environmental factors differ between the groups being compared, a regression model is used to adjust for factors other than the SNP locus, such as age and gender. In addition, analysis of variance is performed to show an association between potential risk factors and IVs, and various other statistical analyses are carried out, including survival analysis and machine learning models.
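The contingency-table step can be sketched for a single SNP as follows. The allele counts are hypothetical and the function names are chosen for this example; the confidence interval uses the standard log-scale (Woolf) approximation and the test is the Pearson chi-square without continuity correction, which are assumptions of this sketch rather than methods prescribed by the article.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% confidence interval for a 2x2 table.

    a: cases carrying the allele,    b: cases not carrying it
    c: controls carrying the allele, d: controls not carrying it
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical allele counts in patients and healthy controls.
estimate, lower, upper = odds_ratio(300, 700, 200, 800)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"chi-square = {chi_square_2x2(300, 700, 200, 800):.2f}")
```

In a genome-wide setting the same calculation is repeated for every SNP locus, so the resulting tests need to be interpreted with multiple testing in mind.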
ASSOCIATION
To establish whether an exposure (X) and an outcome (Y) are related, the observational association needs to be checked. If there are predictive factors relevant to the association between X and Y, they can be adjusted for as covariates. If a variable transformation is applied to a potential exposure (X) when checking its association with the outcome (Y), the same transformed variable should be used for X in the Mendelian randomization analysis.
The relative risk is used as a measure of association in prospective cohort studies and clinical trials, and the OR is used in retrospective case-control studies.
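The difference between the two measures can be seen in a short calculation. The counts below are hypothetical and the function names are assumptions made for this example. When the outcome is rare the OR is close to the relative risk, but the two diverge as the outcome becomes common.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the risk in the exposed group to the risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def odds_ratio_from_cohort(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the odds of the outcome in the exposed and unexposed groups."""
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return odds_exposed / odds_unexposed


# Hypothetical cohorts of 1000 exposed and 1000 unexposed individuals.
rare = (20, 1000, 10, 1000)       # outcome in 2% vs 1%
common = (400, 1000, 200, 1000)   # outcome in 40% vs 20%

for label, counts in (("rare outcome", rare), ("common outcome", common)):
    rr = relative_risk(*counts)
    odds = odds_ratio_from_cohort(*counts)
    print(f"{label}: RR = {rr:.2f}, OR = {odds:.2f}")
```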
R FUNCTIONS FOR MENDELIAN RANDOMIZATION
Yavorska and Staley38 described an R package for Mendelian randomization using summarized data, available for R version 3.0.1 or later. The package implements several methods for conducting Mendelian randomization with summarized data on genetic associations with the exposure and the outcome, such as those obtained from a consortium. It can be used to estimate causality using IVs. Some useful functions are described in Table 4.39
Table 4
R functions for Mendelian randomization
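To make the idea of working from summarized data concrete, the sketch below computes a fixed-effect inverse-variance weighted (IVW) estimate from per-SNP association estimates. It is written in Python purely for illustration and is not the code of the R package; the function name `ivw_estimate` and the numbers are hypothetical, and the weights ignore uncertainty in the SNP-exposure estimates, which is a simplifying assumption.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted estimate from summarized data.

    beta_x: per-SNP associations with the exposure
    beta_y: per-SNP associations with the outcome
    se_y:   standard errors of the SNP-outcome associations
    """
    # Each SNP gives a ratio estimate beta_y / beta_x with weight beta_x^2 / se_y^2.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_x, se_y)]
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1 / total)
    return estimate, std_error


# Hypothetical summarized associations for three SNPs.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.04, 0.11, 0.075]
se_y = [0.01, 0.02, 0.015]

estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
print(f"IVW estimate = {estimate:.3f} (SE {std_error:.3f})")
```

With real data the inputs would come from published GWAS summary statistics, and the SNPs would need to be independent of one another for these weights to be appropriate.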
DISCUSSION
Although Mendelian randomization appears to be a perfect epidemiological approach to directly estimating a causal effect, there are still limitations and assumptions in its application, as there are in all research designs, including randomized controlled studies. Estimating and testing the magnitude of the causal effect is not always of interest, or may not be possible, but Mendelian randomization can still be used to evaluate whether a causal relationship exists. Genetic epidemiology covers wider areas still to be explored, and further methodological developments in Mendelian randomization using genetic variants as IVs should be considered.
CONCLUSION
A study using Mendelian randomization can be carried out by following the procedure in Fig. 2 as a checklist. A well-designed Mendelian randomization study that satisfies the assumptions often provides more reliable evidence than a conventional observational epidemiological study. However, the findings must be interpreted with care and compared with existing evidence from different studies. It is very important to check whether the assumptions of a Mendelian randomization analysis are satisfied, and a valid instrumental variable may not be available for every research question because of a lack of knowledge. Moreover, the relevance of the results for clinical decisions should be interpreted in light of other sources of evidence.
Fig. 2
Flow chart of process for Mendelian randomization analysis.
Funding: None.
Conflict of Interest: The authors have no conflicts of interest to declare.
References
Burgess S, Thompson SG. Mendelian randomization: methods for using genetic variants in causal estimation. Boca Raton (FL): CRC Press; 2015.
Gray R, Wheatley K. How to avoid bias when comparing bone marrow transplantation with chemotherapy. Bone Marrow Transplant 1991;7 Suppl 3:9–12.
Bu SY. Genetically mediated lipid metabolism and risk of insulin resistance: insights from Mendelian randomization studies. J Lipid Atheroscler. 2019
Wright S. Appendix. In: Wright PG, editor. The tariff on animal and vegetable oils. New York (NY): Macmillan; 1928.
Goldberger AS. Structural equation methods in the social sciences. Econometrica 1972;40:979–1001.
von Hinke Kessler Scholder S, Smith GD, Lawlor DA, Propper C, Windmeijer F. Genetic markers as instrumental variables: an application to child fat mass and academic achievement. Bristol: The Centre for Market and Public Organisation; 2010.
Timpson NJ, Nordestgaard BG, Harbord RM, Zacho J, Frayling TM, Tybjærg-Hansen A, et al. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes 2011;35:300–308.
Kivimäki M, Lawlor DA, Smith GD, Kumari M, Donald A, Britton A, et al. Does high C-reactive protein concentration increase atherosclerosis? The Whitehall II Study. PLoS One 2008;3:e3013.
Pearl J. Causality. Cambridge: Cambridge University Press; 2000.
Lim HR. Effect of blood lead concentration on attention deficit hyperactivity disorder in Korean children: a Mendelian randomization study [dissertation]. Seoul: Seoul National University; 2017. pp. 78.
Yavorska O, Staley J. Package ‘MendelianRandomization’ [Internet]. place unknown: CRAN; 2019 [accessed on 10 June 2019]. Available from: https://cran.r-project.org/web/packages/MendelianRandomization/MendelianRandomization.pdf.