
Machine learning based multi-modal prediction of future decline toward Alzheimer’s disease: An empirical study

  • Batuhan K. Karaman,

    Roles Conceptualization, Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft

    Affiliations School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, United States of America, Department of Radiology, Weill Cornell Medicine, New York, NY, United States of America

  • Elizabeth C. Mormino,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, United States of America

  • Mert R. Sabuncu ,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Writing – original draft

    msabuncu@cornell.edu

    Affiliations School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY, United States of America, Department of Radiology, Weill Cornell Medicine, New York, NY, United States of America

  • for the Alzheimer’s Disease Neuroimaging Initiative

    Membership of the Alzheimer’s Disease Neuroimaging Initiative is provided in the Acknowledgments.

Abstract

Alzheimer’s disease (AD) is a neurodegenerative condition that progresses over decades. Early detection of individuals at high risk of future progression toward AD is likely to be of critical significance for the successful treatment and/or prevention of this devastating disease. In this paper, we present an empirical study to characterize how predictable an individual subject’s future AD trajectory is, several years in advance, based on rich multi-modal data, and using modern deep learning methods. Crucially, the machine learning strategy we propose can handle different future time horizons and can be trained with heterogeneous data that exhibit missingness and non-uniform follow-up visit times. Our experiments demonstrate that our strategy yields predictions that are more accurate than a model trained on a single time horizon (e.g., 3 years), which is common practice in prior literature. We also provide a comparison between linear and nonlinear models, verifying the well-established insight that the latter can offer a boost in performance. Our results also confirm that predicting future decline for cognitively normal (CN) individuals is more challenging than for individuals with mild cognitive impairment (MCI). Intriguingly, however, we discover that prediction accuracy decreases with increasing time horizon for CN subjects, but the trend is in the opposite direction for MCI subjects. Additionally, we quantify the contribution of different data types in prediction, which yields novel insights into the utility of different biomarkers. We find that molecular biomarkers are not as helpful for CN individuals as they are for MCI individuals, whereas magnetic resonance imaging biomarkers (hippocampus volume, specifically) offer a significant boost in prediction accuracy for CN individuals. Finally, we show how our model’s prediction reveals the evolution of individual-level progression risk over a five-year time horizon.
Our code is available at https://github.com/batuhankmkaraman/mlbasedad.

Introduction

Alzheimer’s Disease (AD) is the most common type of dementia among the elderly population, accounting for nearly 70% of all dementia patients [1] and ranking as the seventh leading cause of death in the United States [2]. Many mechanisms in the development of AD have been uncovered by decades of experimental and clinical studies [3–5], but the puzzle is still unsolved. In the realm of AD, public and private databases [6–10] serve as important resources for the application of machine learning algorithms that help characterize disease heterogeneity [11], guide therapy [12], and develop and evaluate potential treatments [13, 14]. An area where machine learning can play a crucial role is the prediction of future clinical decline at the individual level, which can inform clinical and other life decisions.

Clinically, in the context of Alzheimer’s disease, individuals are classically grouped into one of the following three stages: cognitively normal (CN), mild cognitive impairment (MCI), and AD dementia. MCI is considered a high-risk, transitionary stage between healthy aging and dementia. Future clinical decline toward dementia is considered to be more predictable in those with MCI than in CN individuals [15, 16]. As our results further corroborate, CN-to-MCI conversion is inherently a more difficult prediction problem than MCI-to-AD conversion. The vast majority of the published studies dealing with individual-level future decline predictions with machine learning focus on early detection of MCI-to-AD conversion [17].

There is growing literature showing factors, including certain biomarkers, that predict progression from CN to MCI [18, 19]. This work has converged to suggest that the risk associated with specific factors is relatively small, and requires a long follow-up to observe the effect. Furthermore, papers tackling the CN-to-MCI conversion prediction problem with machine learning have been relatively limited [20, 21]. Thus, there is a need to understand what combination of variables can yield accurate individual-level predictions; and whether the significance of the variables changes as a function of disease stage (e.g., CN or MCI at baseline).

Many prior studies have primarily focused on building models that predict future conversion within a specific time horizon, e.g., three years [22, 23]. Although some of these studies test their modeling strategy for variable follow-up years, they do this by training new models for each time horizon. Studies that utilize survival (event-time) models can, in theory, make predictions for any future time-point [24], yet, they need to make strong assumptions about the evolution of the underlying hazard function, which might potentially limit performance.

In this work, we rely on the neural network (deep learning) framework, which gives us the flexibility to explore the effect of various modeling choices, namely nonlinearity and predicting arbitrary time horizons, while holding other design parameters constant. We implement three different classifiers, two of which are trained to predict the clinical status (CN, MCI, or AD) at a single time point in the future. The first of these two models is linear and called the “Linear Single-year Model (LSM)”. The other one is nonlinear and referred to as the “Nonlinear Single-year Model (NSM)”. In the third model, we employ a nonlinear architecture and modify it to make it capable of predicting the clinical status at any time point in the future. We refer to this as the “Nonlinear Multi-year Model (NMM).” Comparing these models allows us to assess the predictive value of nonlinearity and of accounting for different time horizons. For instance, our results verify that a linear model (LSM) can yield high-quality predictions for the relatively easy MCI-to-AD conversion prediction problem, whereas higher-capacity nonlinear models are needed for making more accurate predictions on the more challenging task of predicting CN-to-MCI conversion.

Our analyses further offer some novel insights. In CN individuals, predicting who will convert to MCI within a year is easier than predicting for a 5-year time window. For individuals with MCI, however, the situation is reversed. It is harder to predict who will progress to AD in the shorter term. We train our models to handle arbitrary missingness patterns in the input data, which in turn, allows us to interrogate the predictive value of different data types. For example, our results suggest that the molecular biomarkers we consider in our study are very helpful in the MCI stage, but not as much in CN individuals. On the other hand, structural magnetic resonance imaging (MRI) biomarkers (hippocampus volume, specifically) offer a significant performance boost for CN-to-MCI conversion but not MCI-to-AD conversion. Finally, we present our model’s prediction of future conversion risk as a continuous function of time, which reveals different trajectories.

Materials and methods

In this section, we first present the dataset, which was derived from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [25]. We then discuss important implementation details that were critical in handling missingness and class imbalance at different time points.

Dataset

All participants used in this work are from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. ADNI aims to evaluate the structure and function of the brain across different disease states and uses clinical measures and biomarkers to monitor disease progression. We select the participants who did not have clinical AD at the baseline (screening) visit and had at least one follow-up diagnosis within the next five years. We excluded the CN baseline participants who converted to AD in the next five years because they are very few (n = 6), the CN baseline participants who converted to a later stage but reverted back to an earlier stage (n = 23), and the MCI baseline participants who either were diagnosed as CN in a later follow-up year (n = 87) or converted to AD and reverted back (n = 18), since these subjects might have been diagnosed incorrectly at some point. Including these individuals did not alter our main conclusions. After these exclusions, we are left with 1404 participants. We note that all individuals were either CN or MCI at baseline. Table 1 lists summary statistics for the participants, including sex, age, number of years of education completed, count of Apolipoprotein E4 (APOE4) alleles, Clinical Dementia Rating (CDR), and Mini Mental State Examination (MMSE) scores at baseline.

Table 1. Summary statistics of the participants at baseline.

https://doi.org/10.1371/journal.pone.0277322.t001

A critical aspect of the data, as is common in many real-world longitudinal studies, is that there are missing follow-up visits, with imperfect timings, and many subjects drop out of the study before the planned completion. Table 2 shows the number of available subjects in each diagnostic group for annual follow-up visits. We note that Table 2 can be used to infer the number of subjects progressing from one stage to another during follow-up years of interest. In Table 2, and in all subsequent analyses, any subject who progressed (from CN to MCI, or from MCI to AD) before dropping out of the study was considered to remain MCI or AD until year 5. Non-converter subjects without a visit in a certain follow-up year are not used for either training or testing in that follow-up year. Some subject groups have different follow-up visit schedules. For example, in ADNI-2 and ADNI-3, CN baseline participants are only clinically evaluated every other follow-up year. Therefore years 1 and 3 have fewer CN diagnoses than years 2 and 4, respectively. ADNI includes a limited number of subjects who have been monitored for more than five years. However, due to the very limited number of visits beyond 5 years, we excluded these timepoints from our analyses.
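The label-carrying rule described above (a subject who progresses is considered to remain in the later stage until year 5, while non-converters with a missing visit are simply unlabeled for that year) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name and the stage encoding (0 = CN, 1 = MCI, 2 = AD) are our own assumptions.

```python
def propagate_labels(baseline_stage, visits):
    """Derive annual labels for years 1..5 from sparse follow-up visits.

    baseline_stage: 0 (CN) or 1 (MCI).
    visits: dict mapping follow-up year -> observed stage (0/1/2);
            years with no visit are absent from the dict.
    Returns a dict of year -> label; years with no label are excluded
    from both training and testing for that follow-up year.
    """
    labels = {}
    converted_stage = None  # later stage reached, once conversion is observed
    for year in range(1, 6):
        obs = visits.get(year)
        if obs is not None and obs > baseline_stage:
            converted_stage = obs if converted_stage is None else max(converted_stage, obs)
        if obs is not None:
            labels[year] = obs
        elif converted_stage is not None:
            # progressed before dropping out: carry the later stage forward
            labels[year] = converted_stage
    return labels
```

For example, a CN baseline subject observed as MCI at year 2 and then lost to follow-up is labeled MCI for years 2 through 5, whereas a stable subject who misses year 2 simply contributes no label for that year.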

Table 2. The number of available subjects in each diagnostic group for annual follow-up visits.

https://doi.org/10.1371/journal.pone.0277322.t002

Input features

We use the clinical data and biomarkers collected at baseline as our input features. Clinical data includes subject demographics (age, gender, number of years of education completed, ethnicity, race, marital status), genotype (number of APOE4 alleles), clinical assessments (clinical dementia rating, or CDR; Activities of Daily Living, or FAQ; Everyday Cognition, or ECog), cognitive assessments (Mini-Mental State Exam, or MMSE; Alzheimer’s Disease Assessment Scale, or ADAS-Cog; Montreal Cognitive Assessment, or MoCA; Rey Auditory Verbal Learning Test Trials 1–6; Logical Memory Delayed Recall; Trail Making Test Part B; Digit Symbol Substitution, Digit and Trails B versions of the Preclinical Alzheimer’s Cognitive Composite score [26, 27]), and the baseline diagnosis (CN or MCI). The biomarkers are Cerebrospinal Fluid (CSF) measurements [28, 29] (Amyloid-Beta 1–42; Total Tau, or T-Tau; Phosphorylated Tau, or P-Tau), Magnetic Resonance Imaging (MRI) volume measurements [30, 31] (Ventricles; Hippocampus; WholeBrain; Entorhinal; Fusiform; MidTemp; Intracranial Volume, or ICV; all computed using the FreeSurfer software [32–51]), and Positron Emission Tomography (PET) standardized uptake value ratio (SUVR) scores (for the following tracers: Fluorine-18-Fluorodeoxyglucose, or FDG; Florbetapir, or AV45; Pittsburgh Compound B, or PIB) [52, 53]. We note that the CSF, FDG, and PIB biomarkers are referred to as molecular biomarkers. Furthermore, we employ single, global PET SUVR measurements instead of regional values.

The regional volume measurements derived from the MRI scans were computed, quality-controlled, and made publicly available by researchers at UCSF. This pipeline processes the images through the following steps: Talairach transform computation, intensity normalization, skull stripping, creation of the white-matter and pial surfaces, segmentation of the gray- and white-matter volumetric regions of interest, and creation of the cortical parcellation, as described in [54]. Note that we input ICV as a separate feature, instead of normalizing the other volumetric MRI measurements by it.

The ADNI study consists of multiple phases (1, GO, 2, and 3). Each phase implemented a slightly different data acquisition protocol. Additionally, as mentioned above, follow-up data collection was also heterogeneous, with missing visits and non-uniform visit intervals. The degree of missingness for the different baseline data modalities is shown in Table 3 for the 1404 participants we use. Rather than dropping subjects with incomplete baseline data, we substitute placeholder values for their missing features and keep them in our dataset. Our substitution procedure consists of two parts. First, we record the binary missingness mask for the feature set. Each participant has their own binary missingness mask indicating which variables are observed for that particular individual. Then, following prior work [55], we perform mode substitution for missing categorical variables (sex, ethnicity, race, marital status, APOE4) and mean substitution for missing numerical variables. It is true that when the class labels are unbalanced, these substituted values will be biased by the values in the majority class. However, we would like to emphasize that we consider these substituted values as dummy placeholders. In other words, we use mode/mean values solely to make sure that the substituted values are within an appropriate range and thus the numerical optimization is not compromised. By concatenating the missingness mask to the feature vector, we expect the model to learn to treat these placeholders appropriately. To prevent any information leakage, we substitute the mode and mean values computed on the non-missing portion of the training set as placeholders for the missing values in the training, validation, and test sets. We compute a single mode/mean value for each feature, weighting all non-missing values in the training set equally.
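The two-step substitution procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the feature names (`apoe4`, `age`) and the dict-based data layout are our own assumptions.

```python
from statistics import mean, mode

def fit_placeholders(train_rows, categorical):
    """Compute per-feature placeholders from the non-missing TRAINING values only,
    to avoid leaking information from validation/test sets."""
    placeholders = {}
    for feat in train_rows[0]:
        observed = [r[feat] for r in train_rows if r[feat] is not None]
        # mode for categorical features, mean for numerical ones
        placeholders[feat] = mode(observed) if feat in categorical else mean(observed)
    return placeholders

def impute(row, placeholders):
    """Return (imputed_row, missingness_mask); mask is 1 where a value was observed."""
    mask = {f: int(v is not None) for f, v in row.items()}
    filled = {f: (v if v is not None else placeholders[f]) for f, v in row.items()}
    return filled, mask
```

The mask is later concatenated to the feature vector, so the model can learn to discount the placeholder values.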

Table 3. The degree of missingness (%) in different baseline data modalities for two patient groups.

https://doi.org/10.1371/journal.pone.0277322.t003

The categorical variables, except the baseline diagnosis, are one-hot encoded (i.e., represented with dummy variables encoding presence), and the numerical variables are z-score normalized in the last step of feature processing. We note that the z-score normalization parameters (mean and variance) are first computed on the training data and then applied to the validation and test data, mirroring the second step of the substitution procedure.

We have six discrete (categorical) variables, which are sex (encoded as a one-hot vector for either female or male), ethnicity (encoded as a one-hot vector for either Hispanic/Latino or not Hispanic/Latino), race (encoded as a one-hot vector for one of the following: Asian, Black, Hawaiian/Other PI, Indian/Alaskan, More than one, White), marital status (encoded as a one-hot vector for one of the following: divorced, married, never married, widowed), number of APOE4 alleles (encoded as 0, 1, or 2 copies of the E4 allele), and baseline diagnosis (a scalar that is 0 for CN and 1 for MCI). The real-valued variables are the number of years of education completed, clinical test scores, cognitive assessments, and biomarker values. All real-valued (numerical) features are scalars except for the clinical assessment of ECog and the cognitive assessments of ADAS-Cog and the Rey Auditory Verbal Learning Test Trials 1–6, which are vectors with dimensionalities 14, 3, and 4, respectively. In total, we have 45 real-valued features. We note that we also compute a binary missingness mask for the input feature set. The mask has no entry for the baseline diagnosis, since that variable has no missingness; therefore, the binary missingness mask has a dimensionality of 50. Concatenating the categorical features, numerical features, and binary missingness mask yields a feature vector of length 113.
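The assembly of the final feature vector (one-hot categoricals, scalar baseline diagnosis, z-scored numericals, and the missingness mask) can be illustrated on a toy two-feature example. This sketch uses illustrative names only and keeps the key property of the paper's pipeline: normalization statistics come from the training data alone.

```python
from statistics import mean, pstdev

def one_hot(value, categories):
    """Dummy-variable encoding of a categorical value."""
    return [1.0 if value == c else 0.0 for c in categories]

def fit_zscore(train_values):
    """Return a z-score transform with mean/std estimated on training data."""
    mu, sd = mean(train_values), pstdev(train_values)
    return lambda x: (x - mu) / sd if sd > 0 else 0.0

SEXES = ["female", "male"]  # illustrative category list

def build_vector(sex, age, baseline_dx, mask_bits, zscore_age):
    # one-hot categoricals + scalar baseline DX + z-scored numericals + mask
    return one_hot(sex, SEXES) + [float(baseline_dx)] + [zscore_age(age)] + mask_bits
```

In the actual feature set, the same pattern repeated over all variables yields 17 one-hot entries, 1 baseline-diagnosis scalar, 45 numerical values, and 50 mask bits, for a total length of 113.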

Models

We are interested in the prediction of an individual’s future diagnostic status (CN, MCI, or AD) based on the input features at a baseline visit. A large number of studies have looked at this question as a classification problem, often at a single follow-up time, e.g., three years after the baseline. However, this formulation has two drawbacks. First, many subjects might drop out of the study before that follow-up time, which means these subjects cannot be used for training. Secondly, this approach groups together subjects who convert after the intended follow-up time, with those who are stable through the study. To distinguish these two subject groups, one would need to train a new model from scratch corresponding to a different follow-up time. An alternative approach that addresses these issues involves survival modeling [56]. However, these methods require strong assumptions about the underlying hazard function that can constrain the model’s performance.

All our models follow the neural network architecture template depicted in Fig 1, which was designed based on optimizing empirical performance on validation data in a single split. The output is a length-3 probability vector, computed by a soft-max layer, that corresponds to the individual’s CN, MCI, and AD probabilities at the future time point. In the single-year models, the prediction is for a fixed follow-up time and thus time-to-follow-up is not an input feature. Therefore, the single-year models have an input layer width of 113, and the input layer width of the multi-year model is 114. We train a separate single-year model for each time horizon. The linear single-year model, LSM, is made up of linear (fully connected) layers, whereas its nonlinear counterpart, NSM, contains nonlinear activation functions, which are elementwise rectified linear units, or ReLUs, between linear layers. The most flexible model, the nonlinear multi-year model, or NMM, accepts the time-to-follow-up (in months) as an input feature and computes the output corresponding to that input value. Thus, we train a single NMM for all follow-up times. The NMM can be viewed as a family of models, parameterized by the follow-up time. We note that all three models have roughly the same number of learnable parameters.

Fig 1. Feed-forward, fully-connected neural network architecture.

Nonlinear models have rectified linear units (ReLU) between layers. The final layer implements a soft-max. Ll: Number of neurons in layer l. Input features include the following. Demog.: Demographics. Clinical Assess.: Clinical Assessments. Cog. Assess.: Cognitive Assessments. Baseline DX: Baseline Diagnosis. prob.: probability.

https://doi.org/10.1371/journal.pone.0277322.g001

We also experimented with two alternate models. The alternative single-year linear model was a standard linear regression model, implemented as a single fully connected layer neural network, with an L2 penalty (weight decay) on the weights (coefficients). This is equivalent to a ridge regression approach. As the results presented in the Supplementary Material demonstrate, this model performs no better than the LSM model described above. The second alternative is a slight modification of the NMM model, where the input time-to-follow-up feature is encoded as the closest annual visit time. This model (results in S1 and S2 Tables) performed very close to the NMM model we present here.

Loss function

We use the categorical cross-entropy loss to train our neural networks:

ℒ = −(1/N) Σ_{i=1}^{N} ⟨y_i, log ŷ_i⟩  (1)

where N is the number of training datapoints, ⟨·, ·⟩ is the inner product operator, y_i is the one-hot encoded vector for the ground-truth label of sample i (i.e., a dummy variable vector encoding presence), and ŷ_i is the probability vector for sample i computed by the classifier. The expression in Eq (1) is the average of the losses across the entire training dataset, which implies that each sample has the same weight. This is undesirable in unbalanced problems because the majority class contributes more to the loss function than the minority class.

As can be seen in Table 2, there are two types of imbalance we need to consider.

  1. The distribution of class labels varies significantly over the years. The number of CN baseline participants who convert to MCI is smaller than the number of non-converters over the five-year period, but the relative difference shrinks in later years. For participants who are MCI at baseline the situation is more drastic. Those who convert to AD represent a small minority in the early years, yet MCI-to-AD converters are the majority at the 5-year mark.
  2. The number of available clinical labels decreases with each follow-up year, since individuals drop out of the longitudinal study.

Naively using the loss term in Eq (1) would encourage certain types of errors. For example, the model would care less about accurately classifying CN-to-MCI converters, particularly in earlier years. This would affect all three of our models. The second imbalance factor, on the other hand, would exclusively impact the performance of NMM. With the loss in Eq (1), NMM would pay less attention to later follow-up years than earlier ones, which means the performance of NMM would suffer in longer time horizons. We note that the LSM and NSM approaches are not affected by this because a separate model is trained for each follow-up year.

There are well-established ways of addressing imbalance issues. Under-sampling the majority class, over-sampling the minority class, and using re-weighted loss functions are the most popular options. In this work, we employ a loss re-weighting scheme. In this approach, the model is penalized more for an error in the minority class than an error in the majority class, using sample-level weights. The loss function we use in this work is

ℒ = −(1/N) Σ_{i=1}^{N} w_i ⟨y_i, log ŷ_i⟩  (2)

where w_i is the weight of sample point i. We propose a weighting scheme that accounts for both sources of imbalance discussed above. Although converter CN baseline participants and non-converter MCI baseline participants belong to the same ground-truth class, we do not weigh those samples equally, as they represent different prediction scenarios. Therefore, for a given follow-up year, we consider four possible categories of participants: CN baseline non-converters, CN baseline converters, MCI baseline non-converters, and MCI baseline converters. In the first step, we compute the weight for each category as one over the size of the category. In step 2, we scale these weights so that the total weight of each follow-up year is equal. By doing so, we address the second imbalance.
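The two-step weighting scheme above can be sketched as follows. This is an illustrative reconstruction (not the authors' code); the category labels are our own shorthand for the four participant groups.

```python
from collections import Counter

def compute_weights(samples):
    """samples: list of (year, category) tuples, with category one of
    'CN_stable', 'CN_conv', 'MCI_stable', 'MCI_conv'.

    Step 1: within a follow-up year, weight = 1 / (category size),
            so minority categories are penalized more per sample.
    Step 2: rescale so every follow-up year carries equal total weight,
            compensating for dropout in later years.
    """
    sizes = Counter(samples)  # count per (year, category) group
    raw = [1.0 / sizes[s] for s in samples]
    year_totals = {}
    for s, w in zip(samples, raw):
        year_totals[s[0]] = year_totals.get(s[0], 0.0) + w
    return [w / year_totals[s[0]] for s, w in zip(samples, raw)]
```

After step 2, each follow-up year contributes the same total weight to the loss, and within a year each of the four categories contributes equally.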

In the single-year models, the ground-truth clinical status, e.g., for one year after the baseline, was the diagnosis made at the visit corresponding to that follow-up time. We note that the timing of this visit is typically not exact and can deviate by several months. For example, a planned 1-year follow-up could have occurred around 15 months after the baseline visit. Thus, the ground-truth labels can be viewed as noisy. On the other hand, for the NMM model, since the follow-up time is not fixed and is treated as an input variable (coded in months), the ground-truth label can be viewed as more accurate. That said, as we described above, we implemented a version of the NMM model that accepts the rounded annualized visit times as input (see Supplementary Material) and we observed no meaningful difference in results.

Experimental details

We implemented a randomized, diagnosis-stratified 80–20 split of the data into train-test sets. We repeated this 80–20 split 200 times, and all presented results are averaged over these repeats. For each 80–20 split, we also conducted a 5-fold cross-validation on the train sets, where the validation loss was used for early stopping. For each cross-validation fold and modeling choice, we trained 5 different models with different random initializations. Thus, for each test case, the final prediction is computed as the average of the 25 model predictions (5 cross-validation folds × 5 random initializations). For the NMM model, we ensure that all longitudinal follow-up data for a participant is in the same partition. All three of our models use the same data splits.

The model architecture and hyperparameter values are fixed for NMM and the single-year models, with the only difference being the absence of the Δt input neuron in LSM and NSM. These choices were manually determined based on inspecting the validation loss in a single split. The architecture, illustrated in Fig 1, has 3 hidden layers with a width of 128, followed by 5 hidden layers of width 64 and 2 hidden layers of width 32. We perform early stopping during training based on the validation loss. We employ an L2 penalty on the weights and biases, with a coefficient of 10⁻⁶ in each hidden layer. We implement dropout after each hidden layer with a rate of 0.2. Our optimizer is Adam with a learning rate of 10⁻⁵. We use the softmax activation function at the output layer.

In order to explore the influence of the network architecture on results, we conduct two more experiments, using the same training strategy, activation functions, and hyperparameters as NMM but modifying the architecture illustrated in Fig 1. The first experiment uses an elementary 3-layer nonlinear architecture with fewer parameters than NMM. The second experiment is a computationally expensive approach in which we optimize the nonlinear architecture via a grid search over depth and width hyperparameters in each of the 200 train/test splits. In this experiment, each test set has its own optimal architecture, identified by the hyperparameter values that yield the best performance on a validation set. Results of both experiments are presented in S1 Fig. We note that the overall trend and pattern of the NMM’s results remain consistent across these architectural choices. The 3-layer model’s performance is slightly worse, and the optimized-architecture results are the best, as expected. The 11-layer model we present in Fig 1 under-performs slightly compared to the optimized architectures. We note that the 11-layer model was manually designed to optimize performance in a single train/validation split.

We analyze prediction accuracy in two different patient groups: CN at baseline and MCI at baseline. Therefore, each result we show has two parts, one corresponding to the CN-to-MCI conversion task and the other to the MCI-to-AD conversion task. In each task, clinical conversion is considered a positive event. For example, for the CN-to-MCI conversion task, a true positive sample refers to the subject progressing to MCI and the model predicting this correctly. Accordingly, the true positive rate is defined as the ratio of true positive samples against all converter subjects, and the false positive rate is the ratio of false positives against all stable subjects.

Due to the heavily unbalanced nature of the data, which can be seen in Table 2, we use the receiver operating characteristic (ROC) curve to inspect the performance of our models. Although there are no subjects who convert from CN to AD or from MCI to CN in our dataset, we do not implement any mechanism to prevent our models from making such predictions. Therefore, both the CN-to-MCI conversion and MCI-to-AD conversion tasks are multi-class problems for our models. There are two different types of ROC analyses for multi-class problems: one-versus-one analysis and one-versus-rest analysis. As we demonstrate with our results, our models capture the disease progression dynamics sufficiently well that both analyses give nearly identical ROC curves. In other words, the predicted AD probability for CN baseline subjects and the predicted CN probability for MCI baseline subjects are close to 0. Thus, we only share one-vs-rest results, where the positive class for CN-to-MCI conversion is MCI and for MCI-to-AD conversion it is AD.

The area under the ROC curve (ROC AUC) is a scalar that summarizes the overall performance of a classifier. Our data has a time horizon of five follow-up years, and we evaluate against annual diagnoses. For each of our three models, we compute an ROC AUC value corresponding to each follow-up year and each baseline group (CN baseline and MCI baseline). Therefore, each model has five ROC AUC values associated with each baseline group. To statistically compare the ROC AUC values achieved by two different models, we implement a pairwise permutation testing strategy, yielding a p-value for the null hypothesis that the two models’ predictions are indistinguishable. Our test statistic is the difference between the mean ROC AUC values (averaged over the annual follow-ups) for the two models in a given 80–20 train-test split. We then average this over all 200 random splits of our data. To create the null distribution of the test statistic, we randomly permute (10⁵ times) the two models when computing the ROC AUC difference for each split. Finally, the normalized rank of the observed (unpermuted) test statistic value among all sorted (permuted) test statistic values yields the p-value, which we denote with ρ.
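As a concrete illustration of this evaluation pipeline, the sketch below computes ROC AUC via its rank (Mann-Whitney) formulation and approximates the permutation test by randomly swapping the two models' roles per split, which flips the sign of that split's AUC difference. This is a simplified stand-in for the paper's procedure; function names and data layout are our own assumptions.

```python
import random

def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a converter outscores a non-converter
    (Mann-Whitney statistic divided by n_pos * n_neg); ties count as 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

def paired_permutation_pvalue(diffs, n_perm=100000, seed=0):
    """diffs: per-split differences in mean ROC AUC between two models.
    Under the null, the two models are interchangeable, so each split's
    difference keeps or flips its sign with equal probability."""
    rng = random.Random(seed)
    observed = sum(diffs) / len(diffs)
    hits = 0
    for _ in range(n_perm):
        stat = sum(d * rng.choice((-1.0, 1.0)) for d in diffs) / len(diffs)
        if abs(stat) >= abs(observed):
            hits += 1
    return hits / n_perm
```

With consistent per-split differences in one model's favor, the p-value is small; differences that change sign across splits drive it toward 1.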

Results

Impact of modeling choices

In CN-to-MCI conversion, we observe that there is a substantial difference between the linear and nonlinear models. For example, for the 1-year follow-up, LSM yields 83.88% ROC AUC, whereas its nonlinear counterpart, NSM, achieves 88.73%. This difference remains stable over all follow-up years and is statistically significant (ρ < 0.0001). The multi-year training strategy, on the other hand, further boosts prediction accuracy. For instance, for the 1-year follow-up, NMM achieves an ROC AUC of 90.40%. The difference with the NSM model is consistent over the follow-up years and statistically significant (ρ = 0.0001). Finally, we note that for CN-to-MCI conversion, all models tend to achieve worse performance as the time-horizon increases. For instance, the best-performing NMM model suffers more than a 6% drop in ROC AUC between 1- and 5-year follow-up predictions. This result suggests that it is easier to predict who will convert from CN to MCI in the relatively short term, say within a year, than in the longer term, say within 5 years.

We notice that the performance of LSM fluctuates as a function of the time horizon. There are two local minima, one at 2- and another at 4-year follow-up. This is likely because those two years include a higher percentage of CN subjects, due to the study design of ADNI 2 and 3, as can be seen in Table 2. We see that this affects the performance of the nonlinear single-year model, NSM, too. However, for the NMM the issue is mitigated, which is likely because the multi-year model can leverage the data from the other follow-up years to “smooth out” its predictions.

In MCI-to-AD conversion, there is an overall diminished difference between the performance of the three models. For the single-year models, the linear and nonlinear counterparts are statistically indistinguishable (ρ = 0.3735). The multi-year model, on the other hand, offers a statistically significant (ρ = 0.0004 against LSM, ρ = 0.0014 against NSM), yet subtle boost in ROC AUC, specifically for 1- and 2-year follow-ups. In the remaining follow-up years, all three models achieve essentially the same performance level. The most striking observation from the MCI-to-AD conversion results is that prediction accuracy improves for later years, and there is a very consistent increase in ROC AUC values across all modeling choices. This indicates that it is relatively easier to predict who will convert from MCI to AD in the 4–5 year horizon compared to the 1–2 year horizon. Fig 2 shows corresponding ROC curves of NMM for each follow-up year and each patient group.

Fig 2. ROC curves of NMM for CN-to-MCI and MCI-to-AD conversion in five-year time horizon.

Displayed are averages of 200 train-test splits.

https://doi.org/10.1371/journal.pone.0277322.g002

Contribution of different biomarkers

As mentioned above, our models are capable of handling missing values in the input. This allows us to inspect the contribution of different data types to prediction accuracy. We perform this analysis on our best-performing model, NMM, where we focus only on test participants with complete baseline data and systematically mask each input feature, treating it as missing.

Our baseline scenario is where only clinical data (CD) is available. Fig 4 shows the difference in ROC AUC (Δ ROC AUC) achieved with the utilization of additional biomarkers: FDG PET (a single global marker of sugar metabolism), CSF (global markers of tau and amyloid burden), AV45 PET (a single global marker of brain amyloid load), and MRI volumetric measurements (markers of brain atrophy). For MRI, we consider two scenarios. First, we only use the value of hippocampus volume, normalized by the intracranial volume (ICV) (CD+ICV normalized Hippocampus size in Fig 4). In the second scenario, we use seven MRI-derived AD-associated biomarkers (CD+MRI in Fig 4). As a reference, we also show the results for including all available biomarkers for these test subjects with complete baseline data (CD+All Biomarkers in Fig 4). We were not able to quantify the contribution of PIB PET, as only a very limited number of participants have PIB PET scans.
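The masking-based ablation described in this analysis can be sketched as follows. This is a hypothetical illustration, not the study's code: `score_fn` stands in for a trained model that tolerates missing inputs, the column groupings are invented for the demo, and the AUC helper assumes tie-free continuous scores.

```python
import numpy as np

def _roc_auc(y, scores):
    """Mann-Whitney ROC AUC (assumes continuous, tie-free scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def delta_auc_by_modality(score_fn, X, y, modalities, clinical_cols):
    """Masking ablation: ROC AUC gained by adding each biomarker modality
    on top of a clinical-data-only baseline. Columns outside the kept set
    are set to NaN, i.e., treated as missing by the model."""
    def auc_with(cols):
        masked = np.full(X.shape, np.nan)
        masked[:, cols] = X[:, cols]
        return _roc_auc(y, score_fn(masked))
    base = auc_with(clinical_cols)
    return {name: auc_with(np.concatenate([clinical_cols, cols])) - base
            for name, cols in modalities.items()}

# Synthetic demo: the label depends on columns 0 ("clinical") and 2 ("MRI"),
# so masking out column 3 ("CSF") should cost nothing.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)

def score_fn(M):
    M = np.nan_to_num(M)              # toy model: missing -> 0
    return M[:, 0] + M[:, 2]

deltas = delta_auc_by_modality(score_fn, X, y,
                               {"MRI": np.array([2]), "CSF": np.array([3])},
                               np.array([0]))
```

In this toy setup the "MRI" column yields a large positive Δ ROC AUC while the uninformative "CSF" column yields none, which is the kind of contrast the Fig 4 analysis quantifies on real biomarkers.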

In CN-to-MCI conversion, molecular biomarkers (FDG, CSF, and AV45), by themselves, do not significantly improve performance over the baseline CD-only scenario, particularly beyond the 1-year follow-up (ρ = 0.1898 for CD+FDG, ρ = 0.2082 for CD+CSF, ρ = 0.3001 for CD+AV45). However, we observe a substantial accuracy boost when MRI data are available (ρ < 0.0001), much of which can be attributed to the hippocampus volume (ρ < 0.0001). All biomarkers combined achieve the highest ROC AUC values (ρ < 0.0001). The performance gain grows over the years, suggesting that additional biomarkers are more useful for making longer-term predictions.

Overall, the performance gain offered by additional biomarkers is relatively smaller for the easier MCI-to-AD conversion problem. Here, MRI markers add around 1% ROC AUC to the CD-only baseline. FDG consistently yields a greater boost than the MRI biomarkers in each follow-up year, which is in contrast to what we observe in CN-to-MCI conversion. Crucially, we find that hippocampus volume does not provide a statistically significant performance boost (ρ = 0.2412), while FDG and MRI markers improve the model performance subtly but significantly (ρ < 0.0001 for CD+FDG, ρ = 0.0024 for CD+MRI). Beyond year 1, CSF consistently outperforms AV45 (ρ < 0.0010 for CD+CSF, ρ = 0.0253 for CD+AV45), where the latter yields a boost on par with MRI. This highlights the potential importance of tau markers, particularly in the MCI stage. Overall, however, a striking observation is that the model that has access to all the biomarkers is substantially more accurate than a model with a single biomarker type.

Disease progression risk predictions

Even though we consider the problem as a three-label classification task for a given follow-up time, the underlying process can be viewed as a continuous evolution of MCI and AD dementia risk [57]. Using our NMM model, we can compute a prediction for arbitrary time horizons for the test subjects and interpret the output probabilities as a longitudinal estimation of risk. The softmax outputs of the MCI channel for CN baseline participants and AD channel for the MCI baseline participants are shown in Figs 5 and 6, respectively. We average these values over test subjects who have the same conversion time profile.
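A minimal sketch of how such risk curves can be read off a model that accepts an arbitrary time-to-follow-up Δt is given below. The `toy_model` is a stand-in for a trained NMM (not the study's code), and the function names are ours; the class ordering (CN, MCI, AD) is an assumption for the demo.

```python
import numpy as np

def risk_curves(predict_softmax, x_baseline, horizons):
    """Query the model at a grid of time horizons and read the softmax
    outputs as a continuous risk trajectory (class order: CN, MCI, AD)."""
    probs = np.array([predict_softmax(x_baseline, dt) for dt in horizons])
    return {"CN": probs[:, 0], "MCI": probs[:, 1], "AD": probs[:, 2]}

# Toy stand-in for a trained model: MCI risk rises smoothly with the horizon.
def toy_model(x, dt):
    p_mci = 1.0 / (1.0 + np.exp(-(dt - 2.5)))
    return np.array([1.0 - p_mci, p_mci, 0.0])

horizons = np.linspace(0.0, 5.0, 21)
curves = risk_curves(toy_model, None, horizons)
```

Averaging such per-subject curves over test subjects with the same conversion time profile produces the trajectories shown in Figs 5 and 6.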

For individuals who remain stable CN throughout the 5-year follow-up period, we observe that NMM’s MCI prediction is consistently below 0.5. Intriguingly, for those stable subjects who were last observed earlier, the predicted MCI probabilities tend to be higher. In fact, for stable CN subjects last seen before the end of year 2, average predicted MCI probabilities exceed 0.5 around the year-4 mark. We emphasize that the model has no access to follow-up information, as the only input is baseline data. For subjects who convert to MCI at year 1, average predicted MCI probabilities exceed 0.5 before the first annual follow-up visit. Similarly, for those who convert around the second year, the average predicted MCI probabilities exceed 0.5 between years one and two. One notable exception is the group of individuals who progress to MCI at the third-year visit. In this group, the NMM prediction is that MCI conversion will happen, on average, at around the 5-year mark.

For the MCI baseline subjects, we observe similar patterns. For the stable subjects, the predicted AD probabilities remain under 0.5 until the last follow-up visit. For MCI-to-AD converters, the average predicted AD probability exceeds 0.5 before the AD diagnosis, except for the subjects who convert at the 5-year follow-up, where the average predicted AD probability is slightly below 0.5 at the 5-year mark. On the other hand, the timing of the average predicted conversion seems to be less accurate than with CN baselines. In most scenarios (e.g. conversion at 2, 3, and 4 years), the average predicted AD probability exceeds 0.5 before the corresponding time interval. This suggests that NMM tends to predict MCI-to-AD conversion earlier than observed.

Discussion

In this work, we present an empirical study to characterize how predictable an individual subject’s future AD-associated clinical trajectory is, several years in advance, based on rich multi-modal data, and using modern deep learning methods. We present a novel machine learning strategy that can handle variable follow-up time queries, missingness patterns, and unbalanced class labels in the data, to make accurate predictions about the future decline in CN and MCI baseline participants.

Comparing the prediction accuracy for CN-to-MCI and MCI-to-AD conversions in Fig 3, our results verify that the CN-to-MCI conversion prediction is a harder task than the MCI-to-AD conversion prediction. On the other hand, we also confirm that more sophisticated modeling, such as a nonlinear multi-year (NMM) architecture, offers a larger boost for the harder CN-to-MCI conversion prediction task. This verifies that there is a bigger gap in performance between what a relatively simple model can achieve and the upper bound of what is achievable (also known as the Bayes-optimal performance) in the harder problem of CN-to-MCI conversion.

Fig 3. Predictive performance of different models for different follow-up years.

ROC AUC values are averaged across 200 80–20 data splits. Error bars indicate the standard error across these splits. LSM, Linear Single-year Model; NSM, Nonlinear Single-year Model; NMM, Nonlinear Multi-year Model.

https://doi.org/10.1371/journal.pone.0277322.g003

Five years is a relatively short time window for studying CN-to-MCI conversion. On the other hand, in many real-world clinical scenarios, 5 years is a useful horizon to consider. Moreover, we note that at year 5, around 30% of the baseline CN subjects who remained in the study had converted to MCI, as we show in Table 2. Our analysis demonstrates that the prediction of CN-to-MCI conversion gets harder for distant time horizons, and we achieve higher accuracy for shorter time frames. This insight might be useful for detecting CN subjects who are on the cusp of developing MCI.

Despite the missingness in the data, Fig 4 suggests that NMM does not rely solely on a single modality. Additional biomarkers, in general, do not make the prediction performance worse. This finding parallels the fact that multi-modal data, such as different MRI sequences and various PET tracers, are often combined in the literature for predicting MCI-to-AD conversion [58–60]. However, our results also demonstrate that the predictive value of each additional biomarker can vary. For example, for CN baseline participants, although there is a substantial accuracy increase with the use of MRI, molecular biomarkers (CSF, FDG, and AV45) do not offer a significant boost beyond the first-year horizon. For the prediction of MCI-to-AD conversion, however, the situation is different: molecular biomarkers offer a significant boost. Furthermore, using the different MRI biomarkers together seems to be much more helpful for predicting MCI-to-AD conversion than relying on a single MRI biomarker, namely the ICV-normalized hippocampal volume. These results highlight the importance of characterizing the diagnostic and predictive utility of different data types at different stages of the disease process.

Fig 4. Δ ROC AUC values obtained with NMM by the addition of various biomarker combinations to the clinical data (participant demographics, clinical assessments, and cognitive assessments).

ROC AUC values are averaged across 200 80–20 data splits. Error bars indicate the standard error across these splits. +: Used together. CD, Clinical data; AV45, Florbetapir PET; CSF, Cerebrospinal Fluid; FDG, Fluorine-18-Fluorodeoxyglucose PET; MRI, Magnetic Resonance Imaging; ICV, Intracranial Volume.

https://doi.org/10.1371/journal.pone.0277322.g004

One interpretation of the patterns of results we present in this study might be that amyloid or tau-associated biomarker changes have a relatively longer timecourse than MRI derived measurements, such as hippocampal volume. Furthermore, MRI markers may be less specific and reflect a multitude of effects that result in atrophy, particularly at later ages. Thus, MRI might predict more proximal decline from CN to MCI, but its utility will be less during the MCI stage, where tau/amyloid markers might offer some specific insights into the Alzheimer’s pathology dynamics that will play out over the next several years.

The conversion risk predictions that we show in Figs 5 and 6 suggest that NMM captures the continuous disease dynamics. However, NMM’s predictions are not always exactly aligned with the timing of events. This issue can be related to various biases in subject recruitment and follow-up in the ADNI [61]. For example, the data suffer from a “temporal bias” [62] that is caused by the fact that baseline visits are not distributed uniformly over latent disease stages. These shortcomings require further investigation, likely demanding novel methodological approaches that can address the selection and temporal biases in the data, possibly by exploiting other cohorts, as in [63].

Fig 5. Conversion risk predictions of NMM for CN baseline participants with different ground truth disease trajectories.

Blue line is the average MCI conversion risk with 68% confidence. Red dots represent the observed diagnosis time (x-coordinate) and the observed diagnosis (y-coordinate) of the participants with the corresponding trajectory. Grey dots are for reference.

https://doi.org/10.1371/journal.pone.0277322.g005

Fig 6. Conversion risk predictions of NMM for MCI baseline participants with different ground truth disease trajectories.

Blue line is the average AD conversion risk with 68% confidence. Red dots represent the observed diagnosis time (x-coordinate) and the observed diagnosis (y-coordinate) of the participants with the corresponding trajectory. Grey dots are for reference.

https://doi.org/10.1371/journal.pone.0277322.g006

Conclusion

We have presented a machine learning approach that uses participants’ multimodal baseline data with arbitrary missingness, to predict their future diagnostic status at any time point. We have demonstrated that our model can capture disease progression dynamics and produce future conversion predictions that are highly accurate. Our analyses allow us to dissect the impact of modeling choices and input data types. We found that molecular biomarkers are more useful for predicting MCI-to-AD conversion than CN-to-MCI conversion. Our results show that MRI features are essential for both types of predictions, yet different types of MRI-derived measurements can be useful in different stages.

Supporting information

S1 Table. Performance of each model in terms of ROC AUC for CN baseline participants.

Data format is mean ± standard error. The LSM variant reported here is a standard linear ridge regression implementation of LSM (Linear Single-year Model). The NMM variant is a slight modification of NMM (Nonlinear Multi-year Model), where the input time-to-follow-up (Δt) feature is encoded as the closest annual visit time. NSM: Nonlinear Single-year Model.

https://doi.org/10.1371/journal.pone.0277322.s001

(PDF)

S2 Table. Performance of each model in terms of ROC AUC for MCI baseline participants.

See caption of S1 Table.

https://doi.org/10.1371/journal.pone.0277322.s002

(PDF)

S1 Fig. Predictive performance of NMMs with different architectures.

ROC AUC values are averaged across 200 80–20 data splits. Error bars indicate the standard error across these splits. NMM, Nonlinear Multi-year Model with the architecture shown in Fig 1; NMM (3-layer), Nonlinear Multi-year Model with a three-layer architecture; NMM (Optimized), Nonlinear Multi-year Model with optimized architectures for each test set. Details of NMM (3-layer) and NMM (optimized) can be found in S1 Text.

https://doi.org/10.1371/journal.pone.0277322.s003

(PDF)

S1 Text. Details of NMM (3-layer) and NMM (optimized).

We use the same hyperparameters and activation functions for NMM (3-layer) and NMM (optimized) as for NMM. NMM (3-layer) has an architecture consisting of 2 hidden layers with a width of 128 and an output layer of width 3. NMM (optimized) architectures for each test split are searched over a 3 × 3 grid, characterized by two parameters: width (W) and depth (D). W represents the number of neurons in the first hidden layer, and it can be either 64, 128, or 256. D represents the depth of the architecture in terms of equally wide blocks in Fig 1, i.e., a D of 1 means the architecture has 3 hidden layers of width W; a D of 2 means the architecture has 3 hidden layers of width W, followed by 5 hidden layers of width W/2; and a D of 3 means the architecture has 3 hidden layers of width W, followed by 5 hidden layers of width W/2, followed by 2 hidden layers of width W/4. All architectures have an output layer with a width of 3. The best architecture is chosen by monitoring the validation loss in one of the train/validation splits.
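For concreteness, the width/depth grid described above can be enumerated as follows. This is a small sketch under the stated W and D conventions; the helper name is ours.

```python
def hidden_widths(W, D):
    """Hidden-layer widths for the architecture grid: D=1 gives 3 layers
    of width W; D=2 appends 5 layers of width W//2; D=3 further appends
    2 layers of width W//4. A width-3 output layer follows in all cases."""
    widths = [W] * 3
    if D >= 2:
        widths += [W // 2] * 5
    if D >= 3:
        widths += [W // 4] * 2
    return widths

# The 3 x 3 search grid from the text: W in {64, 128, 256}, D in {1, 2, 3}.
grid = {(W, D): hidden_widths(W, D) for W in (64, 128, 256) for D in (1, 2, 3)}
```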

https://doi.org/10.1371/journal.pone.0277322.s004

(PDF)

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments on the manuscript.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database http://adni.loni.usc.edu. Applications for ADNI data use can be submitted through the ADNI website at https://adni.loni.usc.edu/data-samples/access-data/. Others would be able to access the data in the same manner as the authors. The authors did not have any special access privileges that others would not have. The investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. Michael Weiner (E-mail: Michael.Weiner@ucsf.edu) serves as the principal investigator for ADNI. A complete listing of ADNI investigators and their affiliations can be found below.

Michael Weiner4, Paul Aisen5, Ronald Petersen6, Clifford R. Jack Jr.6, William Jagust7, John Q. Trojanowki8, Arthur W. Toga9, Laurel Beckett10, Robert C. Green11, Andrew J. Saykin12, John Morris13, Leslie M. Shaw14, Enchi Liu15, Tom Montine16, Ronald G. Thomas5, Michael Donohue5, Sarah Walter5, Devon Gessert5, Tamie Sather5, Gus Jiminez5, Danielle Harvey10, Michael Donohue5, Matthew Bernstein6, Nick Fox17, Paul Thompson18, Norbert Schuff19, Charles DeCArli10, Bret Borowski6, Jeff Gunter6, Matt Senjem6, Prashanthi Vemuri6, David Jones6, Kejal Kantarci6, Chad Ward6, Robert A. Koeppe20, Norm Foster21, Eric M. Reiman22, Kewei Chen22, Chet Mathis23, Susan Landau7, Nigel J. Cairns13, Erin Householder13, Lisa Taylor Reinwald13, Virginia Lee24, Magdalena Korecka24, Michal Figurski24, Karen Crawford9, Scott Neu9, Tatiana M. Foroud12, Steven Potkin25, Li Shen12, Faber Kelley12, Sungeun Kim12, Kwangsik Nho12, Zaven Kachaturian26, Richard Frank27, Peter J. Snyder28, Susan Molchan29, Jeffrey Kaye30, Joseph Quinn30, Betty Lind30, Raina Carter30, Sara Dolen30, Lon S. Schneider31, Sonia Pawluczyk31, Mauricio Beccera31, Liberty Teodoro31, Bryan M. Spann31, James Brewer32, Helen Vanderswag32, Adam Fleisher22, Judith L. Heidebrink20, Joanne L. Lord20, Ronald Petersen6, Sara S. Mason6, Colleen S. Albers6, David Knopman6, Kris Johnson6, Rachelle S. Doody33, Javier Villanueva Meyer33, Munir Chowdhury33, Susan Rountree33, Mimi Dang33, Yaakov Stern34, Lawrence S. Honig34, Karen L. Bell34, Beau Ances35, John C. Morris35, Maria Carroll35, Sue Leon35, Erin Householder13, Mark A. Mintun35, Stacy Schneider35, Angela OliverNG36, Randall Griffith36, David Clark36, David Geldmacher36, John Brockington36, Erik Roberson36, Hillel Grossman37, Effie Mitsis37, Leyla de Toledo-Morrell38, Raj C. Shah38, Ranjan Duara39, Daniel Varon39, Maria T. Greig39, Peggy Roberts39, Marilyn Albert40, Chiadi Onyike40, Daniel D’Agostino II40, Stephanie Kielb40, James E. Galvin41, Dana M. 
Pogorelec41, Brittany Cerbone41, Christina A. Michel41, Henry Rusinek41, Mony J. de Leon41, Lidia Glodzik41, Susan De Santi41, P. Murali Doraiswamy42, Jeffrey R. Petrella42, Terence Z. Wong42, Steven E. Arnold14, Jason H. Karlawish14, David Wolk14, Charles D. Smith43, Greg Jicha43, Peter Hardy43, Partha Sinha43, Elizabeth Oates43, Gary Conrad43, Oscar L. Lopez23, MaryAnn Oakley23, Donna M. Simpson23, Anton P. Porsteinsson44, Bonnie S. Goldstein44, Kim Martin44, Kelly M. Makino44, M. Saleem Ismail44, Connie Brand44, Ruth A. Mulnard45, Gaby Thai45, Catherine Mc Adams Ortiz45, Kyle Womack46, Dana Mathews46, Mary Quiceno46, Ramon Diaz Arrastia46, Richard King46, Myron Weiner46, Kristen Martin Cook46, Michael DeVous46, Allan I. Levey47, James J. Lah47, Janet S. Cellar47, Jeffrey M. Burns48, Heather S. Anderson48, Russell H. Swerdlow48, Liana Apostolova49, Kathleen Tingus49, Ellen Woo49, Daniel H. S. Silverman49, Po H. Lu49, George Bartzokis49, Neill R. Graff Radford50, Francine Parfitt50, Tracy Kendall50, Heather Johnson50, Martin R. Farlow12, Ann Marie Hake12, Brandy R. Matthews12, Scott Herring12, Cynthia Hunt12, Christopher H. van Dyck51, Richard E. Carson51, Martha G. MacAvoy51, Howard Chertkow52, Howard Bergman52, Chris Hosein52, Sandra Black53, Bojana Stefanovic53, Curtis Caldwell53, Ging Yuek Robin Hsiung54, Howard Feldman54, Benita Mudge54, Michele Assaly Past54, Andrew Kertesz55, John Rogers55, Dick Trost55, Charles Bernick56, Donna Munic56, Diana Kerwin57, Marek Marsel Mesulam57, Kristine Lipowski57, Chuang Kuo Wu57, Nancy Johnson57, Carl Sadowsky58, Walter Martinez58, Teresa Villena58, Raymond Scott Turner59, Kathleen Johnson59, Brigid Reynolds59, Reisa A. Sperling60, Keith A. Johnson60, Gad Marshall60, Meghan Frey60, Jerome Yesavage61, Joy L. Taylor61, Barton Lane61, Allyson Rosen61, Jared Tinklenberg61, Marwan N. Sabbagh62, Christine M. Belden62, Sandra A. Jacobson62, Sherye A. Sirrel62, Neil Kowall63, Ronald Killiany63, Andrew E. 
Budson63, Alexander Norbash63, Patricia Lynn Johnson63, Thomas O. Obisesan64, Saba Wolday64, Joanne Allard64, Alan Lerner65, Paula Ogrocki65, Leon Hudson65, Evan Fletcher66, Owen Carmichael66, John Olichney66, Charles DeCarli66, Smita Kittur67, Michael Borrie68, T. Y. Lee68, Rob Bartha68, Sterling Johnson69, Sanjay Asthana69, Cynthia M. Carlsson69, Steven G. Potkin70, Adrian Preda70, Dana Nguyen70, Pierre Tariot22, Adam Fleisher22, Stephanie Reeder22, Vernice Bates71, Horacio Capote71, Michelle Rainka71, Douglas W. Scharre72, Maria Kataki72, Anahita Adeli72, Earl A. Zimmerman73, Dzintra Celmins73, Alice D. Brown73, Godfrey D. Pearlson74, Karen Blank74, Karen Anderson74, Robert B. Santulli75, Tamar J. Kitzmiller75, Eben S. Schwartz75, Kaycee M. SinkS76, Jeff D. Williamson76, Pradeep Garg76, Franklin Watkins76, Brian R. Ott77, Henry Querfurth77, Geoffrey Tremont77, Stephen Salloway78, Paul Malloy78, Stephen Correia78, Howard J. Rosen4, Bruce L. Miller4, Jacobo Mintzer79, Kenneth Spicer79, David Bachman79, Elizabether Finger80, Stephen Pasternak80, Irina Rachinsky80, John Rogers55, Andrew Kertesz55, Dick Drost80, Nunzio Pomara81, Raymundo Hernando81, Antero Sarrael81, Susan K. Schultz82, Laura L. Boles Ponto82, Hyungsub Shim82, Karen Elizabeth Smith82, Norman Relkin83, Gloria Chaing83, Lisa Raudin83, Amanda Smith84, Kristin Fargher84, Balebail Ashok Raj84

4 UC San Francisco, San Francisco, CA, USA. 5 UC San Diego, San Diego, CA, USA. 6 Mayo Clinic, Rochester, NY, USA. 7 UC Berkeley, Berkeley, CA, USA. 8 U Pennsylvania, Pennsylvania, CA, USA. 9 USC, Los Angeles, CA, USA. 10 UC Davis, Davis, CA, USA. 11 Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA. 12 Indiana University, Bloomington, IN, USA. 13 Washington University St. Louis, St. Louis, MO, USA. 14 University of Pennsylvania, Philadelphia, PA, USA. 15 Janssen Alzheimer Immunotherapy, South San Francisco, CA, USA. 16 University of Washington, Seattle, WA, USA. 17 University of London, London, UK. 18 USC School of Medicine, Los Angeles, CA, USA. 19 UCSF MRI, San Francisco, CA, USA. 20 University of Michigan, Ann Arbor, MI, USA. 21 University of Utah, Salt Lake City, UT, USA. 22 Banner Alzheimer’s Institute, Phoenix, AZ, USA. 23 University of Pittsburgh, Pittsburgh, PA, USA. 24 UPenn School of Medicine, Philadelphia, PA, USA. 25 UC Irvine, Newport Beach, CA, USA. 26 Khachaturian, Radebaugh & Associates, Inc and Alzheimer’s Association’s Ronald and Nancy Reagan’s Research Institute, Chicago, IL, USA. 27 General Electric, Boston, MA, USA. 28 Brown University, Providence, RI, USA. 29 National Institute on Aging/National Institutes of Health, Bethesda, MD, USA. 30 Oregon Health and Science University, Portland, OR, USA. 31 University of Southern California, Los Angeles, CA, USA. 32 University of California San Diego, San Diego, CA, USA. 33 Baylor College of Medicine, Houston, TX, USA. 34 Columbia University Medical Center, New York, NY, USA. 35 Washington University, St. Louis, MO, USA. 36 University of Alabama Birmingham, Birmingham, MO, USA. 37 Mount Sinai School of Medicine, New York, NY, USA. 38 Rush University Medical Center, Chicago, IL, USA. 39 Wien Center, Vienna, Austria. 40 Johns Hopkins University, Baltimore, MD, USA. 41 New York University, New York, NY, USA. 42 Duke University Medical Center, Durham, NC, USA. 
43 University of Kentucky, city of Lexington, NC, USA. 44 University of Rochester Medical Center, Rochester, NY, USA. 45 University of California, Irvine, CA, USA. 46 University of Texas Southwestern Medical School, Dallas, TX, USA. 47 Emory University, Atlanta, GA, USA. 48 University of Kansas, Medical Center, Lawrence, KS, USA. 49 University of California, Los Angeles, CA, USA. 50 Mayo Clinic, Jacksonville, FL, USA. 51 Yale University School of Medicine, New Haven, CT, USA. 52 McGill Univ., Montreal Jewish General Hospital, Montreal, WI, USA. 53 Sunnybrook Health Sciences, Toronto, ON, Canada. 54 U.B.C. Clinic for AD & Related Disorders, British Columbia, BC, Canada. 55 Cognitive Neurology St. Joseph’s, Toronto, ON, Canada. 56 Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, USA. 57 Northwestern University, Evanston, IL, USA. 58 Premiere Research Inst Palm Beach Neurology, West Palm Beach, FL, USA. 59 Georgetown University Medical Center, Washington, DC, USA. 60 Brigham and Women’s Hospital, Boston, MA, USA. 61 Stanford University, Santa Clara County, CA, USA. 62 Banner Sun Health Research Institute, Sun City, AZ, USA. 63 Boston University, Boston, MA, USA. 64 Howard University, Washington, DC, USA. 65 Case Western Reserve University, Cleveland, OH, USA. 66 University of California, Davis Sacramento, CA, USA. 67 Neurological Care of CNY, New York, NY, USA. 68 Parkwood Hospital, Parkwood, CA, USA. 69 University of Wisconsin, Madison, WI, USA. 70 University of California, Irvine BIC, Irvine, CA, USA. 71 Dent Neurologic Institute, Amherst, MA, USA. 72 Ohio State University, Columbus, OH, USA. 73 Albany Medical College, Albany, NY, USA. 74 Hartford Hosp, Olin Neuropsychiatry Research Center, Hartford, CT, USA. 75 Dartmouth Hitchcock Medical Center, Albany, NY, USA. 76 Wake Forest University Health Sciences, Winston-Salem, NC, USA. 77 Rhode Island Hospital, Rhode Island, USA. 78 Butler Hospital, Providence, RI, USA. 
79 Medical University South Carolina, Charleston, SC, USA. 80 St. Joseph’s Health Care, Toronto, Canada. 81 Nathan Kline Institute, Orangeburg, SC, USA. 82 University of Iowa College of Medicine, Iowa City, IA, USA. 83 Cornell University, Ithaca, NY, USA. 84 University of South Florida, USF Health Byrd Alzheimer’s Institute, Tampa, FL, USA.

References

  1. World Health Organization. Dementia; 2021. Available from: https://www.who.int/news-room/fact-sheets/detail/dementia.
  2. Centers for Disease Control and Prevention. Leading causes of death; 2022. Available from: https://www.cdc.gov/nchs/fastats/leading-causes-of-death.htm.
  3. James BD, Bennett DA. Causes and Patterns of Dementia: An Update in the Era of Redefining Alzheimer’s Disease. Annual Review of Public Health. 2019;40(1):65–84. pmid:30642228
  4. Breijyeh Z, Karaman R. Comprehensive Review on Alzheimer’s Disease: Causes and Treatment. Molecules. 2020;25(24). pmid:33302541
  5. Munoz DG, Feldman H. Causes of Alzheimer’s disease. CMAJ. 2000;162(1):65–72. pmid:11216203
  6. LaMontagne PJ, Benzinger TL, Morris JC, Keefe S, Hornbeck R, Xiong C, et al. OASIS-3: Longitudinal Neuroimaging, Clinical, and Cognitive Dataset for Normal Aging and Alzheimer Disease. medRxiv. 2019.
  7. Malone IB, Cash D, Ridgway GR, MacManus DG, Ourselin S, Fox NC, et al. MIRIAD—Public release of a multiple time point Alzheimer’s MR imaging dataset. NeuroImage. 2013;70:33–36. pmid:23274184
  8. Birkenbihl C, Westwood S, Shi L, Nevado-Holgado A, Westman E, Lovestone S, et al. ANMerge: A Comprehensive and Accessible Alzheimer’s Disease Patient-Level Dataset. Journal of Alzheimer’s Disease. 2021;79:423–431. pmid:33285634
  9. Ellis KA, Bush AI, Darby D, De Fazio D, Foster J, Hudson P, et al. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. International Psychogeriatrics. 2009;21(4):672–687. pmid:19470201
  10. Beekly DL, Ramos EM, van Belle G, Deitrich W, Clark AD, Jacka ME, et al. The National Alzheimer’s Coordinating Center (NACC) Database: An Alzheimer Disease Database. Alzheimer Disease & Associated Disorders. 2004;18:270–277. pmid:15592144
  11. Zhang X, Mormino EC, Sun N, Sperling RA, Sabuncu MR, Yeo BT, et al. Bayesian model reveals latent atrophy factors with dissociable cognitive trajectories in Alzheimer’s disease. Proceedings of the National Academy of Sciences. 2016;113(42):E6535–E6544. pmid:27702899
  12. Kivipelto M, Mangialasche F, Ngandu T. Lifestyle interventions to prevent cognitive impairment, dementia and Alzheimer disease. Nature Reviews Neurology. 2018;14:653–666. pmid:30291317
  13. Mangialasche F, Solomon A, Winblad B, Mecocci P, Kivipelto M. Alzheimer’s disease: clinical trials and drug development. The Lancet Neurology. 2010;9(7):702–716. pmid:20610346
  14. Cummings J, Lee G, Ritter A, Sabbagh M, Zhong K. Alzheimer’s disease drug development pipeline: 2019. Alzheimer’s & Dementia: Translational Research & Clinical Interventions. 2019;5:272–293. pmid:31334330
  15. Rosenberg PB, Mielke MM, Appleby BS, Oh ES, Geda YE, Lyketsos CG. The Association of Neuropsychiatric Symptoms in MCI with Incident Dementia and Alzheimer Disease. The American Journal of Geriatric Psychiatry. 2013;21(7):685–695. pmid:23567400
  16. Feldman H, Scheltens P, Scarpini E, Hermann N, Mesenbrink P, Mancione L, et al. Behavioral symptoms in mild cognitive impairment. Neurology. 2004;62(7):1199–1201. pmid:15079026
  17. Grueso S, Viejo-Sobera R. Machine learning methods for predicting progression from mild cognitive impairment to Alzheimer’s disease dementia: a systematic review. Alzheimer’s Research & Therapy. 2021;13. pmid:34583745
  18. Chen Y, Denny KG, Harvey D, Farias ST, Mungas D, DeCarli C, et al. Progression from normal cognition to mild cognitive impairment in a diverse clinic-based and community-based elderly cohort. Alzheimer’s & Dementia. 2017;13:399–405.
  19. Peavy GM, Jacobson MW, Salmon DP, Gamst AC, Patterson TL, Goldman S, et al. The Influence of Chronic Stress on Dementia-related Diagnostic Change in Older Adults. Alzheimer Disease & Associated Disorders. 2012;26:260–266. pmid:22037597
  20. Popuri K, Balachandar R, Alpert K, Lu D, Bhalla M, Mackenzie IR, et al. Development and validation of a novel dementia of Alzheimer’s type (DAT) score based on metabolism FDG-PET imaging. NeuroImage: Clinical. 2018;18:802–813. pmid:29876266
  21. Yee E, Popuri K, Beg MF. Quantifying brain metabolism from FDG–PET images into a probability of Alzheimer’s dementia score. Human Brain Mapping. 2019. pmid:31507022
  22. Rathore S, Habes M, Iftikhar MA, Shacklett A, Davatzikos C. A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer’s disease and its prodromal stages. NeuroImage. 2017;155:530–548. pmid:28414186
  23. Ocasio E, Duong TQ. Deep learning prediction of mild cognitive impairment conversion to Alzheimer’s disease at 3 years after diagnosis using longitudinal and whole-brain 3D MRI. PeerJ Computer Science. 2021;7:e560. pmid:34141888
  24. Pavisic IM, Nicholas JM, O’Connor A, Rice H, Lu K, Fox NC, et al. Disease duration in autosomal dominant familial Alzheimer disease. Neurology Genetics. 2020;6(5). pmid:33225064
  25. Mueller SG, Weiner MW, Thal LJ, Petersen RC, Jack CR, Jagust W, et al. Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s & Dementia. 2005;1:55–66.
  26. Donohue MC, Sperling RA, Salmon DP, Rentz DM, Raman R, Thomas RG, et al. The Preclinical Alzheimer Cognitive Composite. JAMA Neurology. 2014;71:961. pmid:24886908
  27. Donohue MC, Sperling RA, Petersen R, Sun CK, Weiner MW, Aisen PS, et al. Association Between Elevated Brain Amyloid and Subsequent Cognitive Decline Among Cognitively Normal Persons. JAMA. 2017;317:2305–2316. pmid:28609533
  28. Olsson A, Vanderstichele H, Andreasen N, De Meyer G, Wallin A, Holmberg B, et al. Simultaneous measurement of beta-amyloid(1-42), total tau, and phosphorylated tau (Thr181) in cerebrospinal fluid by the xMAP technology. Clinical Chemistry. 2005;51:336–345. pmid:15563479
  29. Jellinger KA, Janetzky B, Attems J, Kienzl E. Biomarkers for early diagnosis of Alzheimer disease: ‘ALZheimer ASsociated gene’- a new blood biomarker? Journal of Cellular and Molecular Medicine. 2008;12:1094–1117. pmid:18363842
  30. Jack CR, Bernstein MA, Fox NC, Thompson P, Alexander G, Harvey D, et al. The Alzheimer’s disease neuroimaging initiative (ADNI): MRI methods. Journal of Magnetic Resonance Imaging. 2008;27:685–691. pmid:18302232
  31. Jack CR, Barnes J, Bernstein MA, Borowski BJ, Brewer J, Clegg S, et al. Magnetic resonance imaging in Alzheimer’s Disease Neuroimaging Initiative 2. Alzheimer’s & Dementia. 2015;11:740–756. pmid:26194310
  32. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage. 2006;31(3):968–980. pmid:16530430
  33. Fischl B, van der Kouwe A, Destrieux C, Halgren E, Ségonne F, Salat DH, et al. Automatically Parcellating the Human Cerebral Cortex. Cerebral Cortex. 2004;14(1):11–22. pmid:14654453
  34. Fischl B, Salat DH, van der Kouwe AJW, Makris N, Ségonne F, Quinn BT, et al. Sequence-independent segmentation of magnetic resonance images. NeuroImage. 2004;23(Supplement 1):S69–S84. pmid:15501102
  35. 35. Fischl B, Dale AM. Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proceedings of the National Academy of Sciences of the United States of America. 2000;97(20):11050–11055. pmid:10984517
  36. 36. Fischl B, Liu A, Dale AM. Automated manifold surgery: constructing geometrically accurate and topologically correct models of the human cerebral cortex. IEEE Medical Imaging. 2001;20(1):70–80. pmid:11293693
  37. 37. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, et al. Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron. 2002;33:341–355. pmid:11832223
  38. 38. Fischl B, Sereno MI, Tootell RBH, Dale AM. High-resolution intersubject averaging and a coordinate system for the cortical surface. Human Brain Mapping. 1999;8(4):272–284. pmid:10619420
  39. 39. Jovicich J, Czanner S, Greve D, Haley E, van der Kouwe A, Gollub R, et al. Reliability in multi-site structural MRI studies: Effects of gradient non-linearity correction on phantom and human data. NeuroImage. 2006;30(2):436–443. pmid:16300968
  40. 40. Kuperberg GR, Broome M, McGuire PK, David AS, Eddy M, Ozawa F, et al. Regionally localized thinning of the cerebral cortex in Schizophrenia. Archives of General Psychiatry. 2003;60:878–888. pmid:12963669
  41. 41. Rosas HD, Liu AK, Hersch S, Glessner M, Ferrante RJ, Salat DH, et al. Regional and progressive thinning of the cortical ribbon in Huntington’s disease. Neurology. 2002;58(5):695–701. pmid:11889230
  42. 42. Salat D, Buckner RL, Snyder AZ, Greve DN, Desikan RS, Busa E, et al. Thinning of the cerebral cortex in aging. Cerebral Cortex. 2004;14:721–730. pmid:15054051
  43. 43. Segonne F, Dale AM, Busa E, Glessner M, Salat D, Hahn HK, et al. A hybrid approach to the skull stripping problem in MRI. NeuroImage. 2004;22(3):1060–1075. pmid:15219578
  44. 44. Dale A, Fischl B, Sereno MI. Cortical Surface-Based Analysis: I. Segmentation and Surface Reconstruction. NeuroImage. 1999;9(2):179–194. pmid:9931268
  45. 45. Fischl B, Sereno MI, Dale A. Cortical Surface-Based Analysis: II: Inflation, Flattening, and a Surface-Based Coordinate System. NeuroImage. 1999;9(2):195–207. pmid:9931269
  46. 46. Han X, Jovicich J, Salat D, van der Kouwe A, Quinn B, Czanner S, et al. Reliability of MRI-derived measurements of human cerebral cortical thickness: The effects of field strength, scanner upgrade and manufacturer. NeuroImage. 2006;32(1):180–194. pmid:16651008
  47. 47. Sled JG, Zijdenbos AP, Evans AC. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging. 1998;17:87–97. pmid:9617910
  48. 48. Segonne F, Pacheco J, Fischl B. Geometrically accurate topology-correction of cortical surfaces using nonseparating loops. IEEE Trans Med Imaging. 2007;26:518–529. pmid:17427739
  49. 49. Reuter M, Rosas HD, Fischl B. Highly Accurate Inverse Consistent Registration: A Robust Approach. NeuroImage. 2010;53(4):1181–1196. pmid:20637289
  50. 50. Reuter M, Fischl B. Avoiding Asymmetry-Induced Bias in Longitudinal Image Processing. NeuroImage. 2011;57(1):19–21. pmid:21376812
  51. 51. Reuter M, Schmansky NJ, Rosas HD, Fischl B. Within-Subject Template Estimation for Unbiased Longitudinal Image Analysis. NeuroImage. 2012;61(4):1402–1418. pmid:22430496
  52. 52. Jagust WJ, Bandy D, Chen K, Foster NL, Landau SM, Mathis CA, et al. The Alzheimer’s Disease Neuroimaging Initiative positron emission tomography core. Alzheimer’s & Dementia. 2010;6:221–229.
  53. 53. Jagust WJ, Landau SM, Koeppe RA, Reiman EM, Chen K, Mathis CA, et al. The Alzheimer’s Disease Neuroimaging Initiative 2 PET Core: 2015. Alzheimer’s & Dementia. 2015;11:757–771.
  54. 54. Hartig M, Truran-Sacrey D, Raptentsetsang S, Simonson A, Mezher A, Schuff N, et al. UCSF FreeSurfer Methods; 2014. Available from: https://adni.bitbucket.io/reference/docs/UCSFFSX51/UCSF%20FreeSurfer%20Methods%20and%20QC_OFFICIAL.pdf.
  55. 55. Campos S, Pizarro L, Valle C, Gray KR, Rueckert D, Allende H. Evaluating Imputation Techniques for Missing Data in ADNI: A Patient Classification Study. In: Pardo A, Kittler J, editors. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Cham: Springer International Publishing; 2015. p. 3–10.
  56. 56. Wu Y, Zhang X, He Y, Cui J, Ge X, Han H, et al. Predicting Alzheimer’s disease based on survival data and longitudinally measured performance on cognitive and functional scales. Psychiatry Research. 2020;291:113201. pmid:32559670
  57. 57. Li D, Iddi S, Aisen PS, Thompson WK, Donohue MC. The relative efficiency of time-to-progression and continuous measures of cognition in presymptomatic Alzheimer’s disease. Alzheimer’s & Dementia: Translational Research & Clinical Interventions. 2019;5:308–318. pmid:31367671
  58. 58. Lin W, Tong T, Gao Q, Guo D, Du X, Yang Y, et al. Convolutional Neural Networks-Based MRI Image Analysis for the Alzheimer’s Disease Prediction From Mild Cognitive Impairment. Frontiers in Neuroscience. 2018;12. pmid:30455622
  59. 59. Pagani M, Nobili F, Morbelli S, Arnaldi D, Giuliani A, Öberg J, et al. Early identification of MCI converting to AD: a FDG PET study. European Journal of Nuclear Medicine and Molecular Imaging. 2017;44:2042–2052. pmid:28664464
  60. 60. Nozadi SH, Kadoury S. Classification of Alzheimer’s and MCI Patients from Semantically Parcelled PET Images: A Comparison between AV45 and FDG-PET. International Journal of Biomedical Imaging. 2018;2018:1–13. pmid:29736165
  61. 61. Mendelson AF, Zuluaga MA, Lorenzi M, Hutton BF, Ourselin S. Selection bias in the reported performances of AD classification pipelines. NeuroImage: Clinical. 2017;14:400–416. pmid:28271040
  62. 62. Yuan W, Beaulieu-Jones BK, Yu KH, Lipnick SL, Palmer N, Loscalzo J, et al. Temporal bias in case-control design: preventing reliable predictions of the future. Nature Communications. 2021;12.
  63. 63. Shishegar R, Cox T, Rolls D, Bourgeat P, Doré V, Lamb F, et al. Using imputation to provide harmonized longitudinal measures of cognition across AIBL and ADNI. Scientific Reports. 2021;11:23788. pmid:34893624