Applying Machine Learning Methods in Medical Research Studies
Daniel Stahl, Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London
daniel.r.stahl@kcl.ac.uk
Machine learning in medical research
Machine learning (ML) explores the study and construction of algorithms that can learn from and make predictions on data (Mund 2013). Machine learning is concerned with prediction and automated model building.
Typically used to analyse large, complex datasets where the number of variables is often larger than the sample size (p >> n):
Bioinformatics
Brain imaging (fMRI)
Clinical prediction models from large databases
Multiple tests: DNA microarrays
The aim of DNA microarray experiments is to detect differential gene expression, e.g. to identify gene expression changes under different treatment conditions or among different types of cell samples.
Often hundreds of thousands of genes are tested on the array for expression changes.
R. Jansen et al. (2016) Gene expression in major depressive disorder. Mol. Psychiatry 21, 339-347.
3-Dimensional Data
fMRI: detect changes in brain activity somewhere in the brain:
64 x 64 voxel matrix, 43 slices in the brain = 176,128 voxels
About 1/3 of this area is studied: ~50,000 voxels / hypothesis tests
Add time as a 4th dimension and you get into the 100,000s
Does the brain activity change after a stimulus?
Images: T. Nichols: www.sph.umich.edu/~nichols/FDR/
Clinical Prediction Models
Clinical prediction models inform health care providers and patients about the risk of developing a disease, the risk of the presence of a disease, and the future course of an illness, based on currently available information about the patient.
Main types of clinical prediction models:
Risk prediction models
Diagnostic models
Prognostic models
Prediction models: predict the likely benefit of a treatment (personalized medicine)
Example: Chronic fatigue syndrome (CFS)
Risk prediction: What is the risk that a person who has family members with CFS develops the condition themselves?
Diagnosis: Can we reliably diagnose a patient as having CFS rather than suffering from an illness with similar symptoms (e.g. depression)?
Prognosis: How likely is it that a patient diagnosed with CFS will recover if untreated?
Prediction and personalized medicine: Which of the three main treatments will most likely benefit a patient? Which patients are most likely to benefit from cognitive behavioural therapy?
Machine learning (ML)
Typically used to analyse large, complex datasets where the number of variables is often larger than the sample size (p >> n):
Bioinformatics
Brain imaging
Clinical prediction models from large databases:
Risk prediction models
Diagnostic models
Prognostic models
Prediction of treatment success (personalized medicine)
Statistical modelling is predominantly used in other medical research areas. Why?
Outline
Why are machine learning methods not more widely used in many medical research areas?
Differences between machine learning and statistical modelling
Can we implement machine learning methods alongside statistical modelling to improve medical research?
Example: problems of low reproducibility in medical research due to selective inference (i.e. multiple testing)
Example from autism research using lasso regression methods
Outlook
Describing and explaining the world
Statistical modelling is the formalization of relationships between variables in the form of mathematical equations: we infer the process by which the data were generated! Theory-driven.
Statistical models, such as regression, are therefore typically used in explanatory research to assess causal hypotheses that explain why and how empirical phenomena occur (Gregor 2006).
Explanatory research usually infers from a random sample to an expected mean response of the underlying population.
Example of explaining the world
D. Stahl et al. (2014) Mechanisms of change underlying the efficacy of cognitive behaviour therapy for chronic fatigue syndrome: a mediation analysis. Psychological Medicine 44, 1331-1344.
Statistical modelling
A theoretical construct causes another construct via a function F (the theoretical model). The constructs are operationalized by measurable variables X and Y, and F by a statistical model f, such as E(Y) = f(X).
In explanatory research we aim to match f as closely as possible to F in order to assess (or develop new) theoretical hypotheses/models.
X and Y are tools for estimating f so that causal hypotheses about F can be assessed.
The prediction is theoretical (theory predicts an association between X and Y) rather than empirical.
Machine Learning
In machine learning the true data-generating process is of less importance: X and Y themselves are of interest, and the function f is used as a tool for generating accurate predictions of unseen Y values.
Even if Y = F(X) describes the causal relationship, functions other than F and variables other than X can be better predictors of Y.
ML algorithms are optimized for the purpose of predicting new or future observations, while in explanatory research minimizing the bias is the key criterion for selecting the best model.
This has implications for model selection and estimation:
Expected Prediction Error
We assume that there is a relationship between outcome Y (e.g. depression score) and at least one of the p independent variables X (clinical and demographic characteristics).
We can model the relationship as Y = f(X) + e, where f is an unknown function and e is random error with mean 0 and variance σ².
Then the expected prediction error (EPE) for a new observation is
EPE = E[(Y − f̂(X))²]
Bias and Variance
EPE = Bias²(f̂(X)) + Var(f̂(X)) + σ²
= (model bias)² + model estimation variance + irreducible error
Bias is the result of misspecifying the model f.
Estimation variance is the result of using a sample to estimate f.
σ² is the irreducible error that remains even if the model f is correctly specified and estimated.
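The decomposition above can be illustrated with a small simulation (an illustrative sketch, not from the slides): a deliberately too-simple straight-line model is repeatedly fit to noisy samples from a known curved truth, and the squared bias, estimation variance and irreducible error at one test point are estimated empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
f_true = lambda x: np.sin(2 * np.pi * x)   # true data-generating function
sigma = 0.3                                 # SD of the irreducible noise
x_test = 0.25                               # fixed point at which we predict

preds = []
for _ in range(500):                        # draw many training samples
    x = rng.uniform(0, 1, 30)
    y = f_true(x) + rng.normal(0, sigma, 30)
    coefs = np.polyfit(x, y, deg=1)         # misspecified: linear fit to a sine
    preds.append(np.polyval(coefs, x_test))

preds = np.array(preds)
bias_sq = (preds.mean() - f_true(x_test)) ** 2   # (model bias)^2
variance = preds.var()                            # estimation variance
epe = bias_sq + variance + sigma ** 2             # expected prediction error
```

The linear model cannot follow the sine, so `bias_sq` stays large no matter how many samples are drawn; `variance` would shrink with larger training sets, while `sigma**2` is the floor that no model can beat.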
Explanatory vs. Prediction Modelling
Explanatory modelling: minimize model bias first, then sampling variance, by minimizing the residual sum of squares between observed and predicted responses on the same data set (cf. the best linear unbiased estimator, BLUE).
Machine learning and predictive modelling: minimize the combination of model bias and sampling variance by minimizing the residual sum of squares between observed and predicted responses on an unseen (new) data set (using e.g. cross-validation). The theoretical model itself is not of interest ("black box").
There is a tension between explanatory and predictive modelling because the best explanatory model may differ from the best predictive model (Sober 2006).
Explanatory research vs. Prediction
Who is interested? Explanatory: mainly academia. Prediction: until recently, more outside academia.
Methodology: Explanatory: statistical modelling using inference, usually with probability models. Prediction: machine learning, which learns from data in order to make predictions.
Main interest: Explanatory: unbiased estimates (correct model). Prediction: prediction of Y for unseen cases.
Biggest threat: Explanatory: confounding. Prediction: poor prediction of new cases.
Omitted variables: Explanatory: omission of confounder variables can invalidate conclusions. Prediction: only a concern because of the loss of predictive power.
Explained variance (r²): Explanatory: nice but not necessary; a large sample size may compensate. Prediction: crucial.
Multicollinearity: Explanatory: major concern because accurate parameter estimates are under threat. Prediction: minor concern because interpretation is not of importance.
Missing data: Explanatory: major concern because of potential serious bias; major developments in the last 30 years. Prediction: hardly addressed; a missing-data indicator is often sufficient, and variables with many missing values are simply useless as predictors.
Measurement error: Explanatory: great concern because measurement error in predictors produces bias. Prediction: mostly ignored because interpretation of the model is not of interest.
Adapted from: Paul Allison (2014) http://statisticalhorizons.com/prediction-vs-causation-in-regression-analysis
Explanatory research and selective inference
The explanatory approach using statistical modelling works very well with a few well-specified models.
However, with a large number of variables and variable selection procedures to select a subset of them, problems with overfitting and with the validity of statistical inference arise.
Presenting a final reduced model while ignoring the assessments of excluded variables (selective inference) greatly exaggerates the apparent strength of a predictor and the assessment of its significance!
Selective inference is a main cause of the low reproducibility of studies in medical research and other sciences.
Using ML for explanatory modelling
Machine learning is mainly used for prediction modelling.
Can we use machine learning in medical research beyond prediction modelling?
Can we use it for explanatory modelling, in theory building and theory testing?
Can we use it to reduce the problem of over-fitting and selective inference?
ML methods for explanatory research
A model that achieves both explanation and prediction will have to compromise:
If causal explanation is the main purpose, we can assess the predictive ability of a causal model and may modify the model if its predictive ability is not satisfactory.
If prediction is the primary aim, we can build a prediction model that is relatively transparent but sacrifices some predictive power.
Regularized models are a promising way to use ML for explanatory research.
Statistical Learning
In recent years ML and statistics have been merging into statistical learning theory.
The theory describes the properties of learning algorithms in a probabilistic framework and says how well algorithms can be expected to do at producing rules with minimum expected error on new cases.
Statistical learning methods can be used to fit regression models with good prediction abilities while still allowing us to understand the process of prediction: penalized or regularized methods.
Regularized regression methods
A modern approach to prediction modelling is regularized or penalized methods, which can be applied both to large data sets (bioinformatics, neuroimaging) and to small data sets with a large number of variables (RCTs, experimental studies, cohort studies).
Not really new: ridge regression was introduced by Arthur Hoerl and Robert Kennard (1970), but limited computer power restricted its use.
Loss function
The performance of our model function is measured by a loss function penalizing errors in prediction. A popular choice is the squared loss, L(y, f(x)) = (y − f(x))².
We choose the function f(x) which minimizes the expected loss, here the expected mean squared error (MSE).
The expected MSE can be estimated by cross-validation or bootstrapping methods.
In the linear regression case, OLS usually does not provide the best solution.
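Estimating the expected squared loss by cross-validation, as described above, might look like this minimal scikit-learn sketch (the data and coefficients here are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
# synthetic outcome: two real predictors, noise SD = 1
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(0, 1, 200)

# the squared loss is averaged over held-out folds, not the training data
neg_mse = cross_val_score(LinearRegression(), X, y, cv=10,
                          scoring="neg_mean_squared_error")
expected_mse = -neg_mse.mean()   # CV estimate of the expected MSE
```

Because the folds are held out, `expected_mse` estimates performance on unseen cases; here it should land near the irreducible noise variance of 1, since the linear model is correctly specified.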
Avoiding overfitting by penalising
When a model over-fits the data, standard estimates of regression coefficients are inflated or unstable: poor prediction and generalization power.
Shrinkage of regression coefficients is an important technique to battle overfitting and to improve prediction accuracy in ML.
Estimates can be stabilised (regularised) by adding a penalty to the estimating equations.
For linear regression, the penalty is added to the sum of squared errors (SSE): minimize SSE + λ × penalty term, where λ ≥ 0 controls the strength of the penalty.
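For the ridge penalty this shrinkage can be seen directly, because the penalized estimating equations have a closed form. A small numpy sketch on simulated data (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[0] = 2.0                                  # only one true predictor
y = X @ beta + rng.normal(0, 1, n)

# OLS: minimizes the SSE alone
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: minimizes SSE + lambda * sum(b_j^2); closed form adds lambda*I
lam = 10.0
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Adding `lam * np.eye(p)` stabilises the estimating equations: every ridge coefficient is pulled toward zero, so the penalized solution has a smaller squared norm than the OLS solution.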
Penalty functions
Three commonly used penalty functions are:
1. Ridge penalty: the sum of squared coefficients (Σ βj²) forms the penalty. Also called the L2 norm.
2. LASSO (Least Absolute Shrinkage and Selection Operator): the sum of absolute coefficients (Σ |βj|) forms the penalty. Also called the L1 norm.
3. Elastic net: a combination of L1 and L2 norm regularization.
Regularized LASSO regression
LASSO penalizes the size of the regression coefficients based on their L1 norm: minimize SSE + λ Σ |βj|.
LASSO can shrink coefficients exactly to zero and tends to select one variable from a group of correlated variables: automatic variable selection.
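The selection behaviour can be demonstrated on simulated data (an illustrative sketch using scikit-learn, with made-up variables): two nearly identical predictors carry the signal, a third is pure noise.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n = 200
z = rng.normal(size=n)
x1 = z + 0.01 * rng.normal(size=n)   # x1 and x2 are almost perfectly correlated
x2 = z + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)              # unrelated to the outcome
X = np.column_stack([x1, x2, x3])
y = z + 0.5 * rng.normal(size=n)

coefs = Lasso(alpha=0.1).fit(X, y).coef_
```

The L1 penalty sets the irrelevant coefficient for `x3` to (essentially) zero and concentrates the weight of the correlated pair on one of the two, rather than sharing it equally as ridge would.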
How to select lambda?
The goal is to evaluate the model in terms of its ability to predict future observations: the model needs to be evaluated on a dataset that was not used to build the model (test set).
We assess different lambdas and choose the one which best predicts unseen cases, using cross-validation.
This best lambda is then used to fit the model on the complete data set.
Example: Regularized regression
Data set with 21 predictor variables, N = 100.
Regression coefficients of the final model (λ = 0.17):
Variables 1 to 20: 0
Variable 21: 0.47
Using hold-out data for prediction accuracy estimation
Using CV to select the optimal λ selects the best set of predictors for unseen cases.
However, the resulting prediction accuracy measures are over-optimistic estimates of the accuracy in a future sample: the CV test data were used to select our model!
This is often ignored.
Nested Cross-validation
We need to retain an independent test dataset that is never used for parameter tuning; otherwise our results will be over-optimistic.
Internal validation fold / model selection fold.
From: Sebastian Raschka, https://www.quora.com/i-train-my-system-based-on-the-10-fold-cross-validation-framework-nowit-gives-me-10-different-models-which-model-to-select-as-a-representative
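A minimal nested cross-validation sketch in scikit-learn (simulated data, not the ERP study): an inner `GridSearchCV` tunes the penalty strength on training folds only, and an outer loop scores the tuned model on folds it never saw during tuning.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# synthetic two-group data with 36 predictors, echoing the ERP setting
X, y = make_classification(n_samples=120, n_features=36, n_informative=4,
                           random_state=0)

# inner loop: choose the L1 penalty strength C on the training folds only
inner = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)

# outer loop: an honest accuracy estimate on data unused for tuning
outer_scores = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")
honest_auc = outer_scores.mean()
```

Because tuning and evaluation use disjoint data, `honest_auc` is not inflated by the lambda selection, unlike an AUC read off the same CV that chose the model.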
Application of Lasso
Example of a prediction model to classify infants at risk for autism based on event-related potential (ERP) studies.
Event-related potential (ERP)
ERPs are electrical brain activity recordings that are the direct result of a specific stimulus (an external or internal event).
ERPs are commonly quantified by measuring the amplitude and latency of observable peaks of the signal, time-locked to the stimulus.
These signals provide real-time indices of neural information processing and allow assessment of cognitive processes.
From: http://www.uel.ac.uk/ircd/babylab/eeglab.htm
Applying Lasso to ERP data
Elsabbagh et al. (2009) investigated ERP correlates of eye gaze processing in 36 ten-month-old infants.
Two groups: infants with autistic siblings ("at-risk" group) versus control infants.
Two experimental conditions: direct gaze and averted gaze.
18 averaged ERP measurements per experimental condition (= 36 measurements per infant) were available.
Standard logistic regression is not possible; univariate tests (or stepwise model selection) will result in inflated alpha errors.
Methodological tasks:
How best to classify infants into their group (at-risk versus control)?
Which ERP signals are responsible for group differences?
Regularized logistic regression (Lasso)
In logistic regression we replace the residual sum of squares by the corresponding sum of squared deviance residuals (equivalent to −2 log-likelihood, the deviance), and add the L1 penalty as before.
Methods used
Regularized logistic regression (Lasso) with nested cross-validation:
500 outer loops of 10-fold cross-validation to estimate prediction accuracy (internal validity)
100 inner loops of 10-fold cross-validation to select the optimal lambda
Final model based on the averaged lambda
Measures of predictive power: area under the curve (AUC, the c statistic), sensitivity, specificity, % correctly classified
Note: the c statistic is an estimated conditional probability that, for any pair of a case and a control, the predicted risk of an event is higher for the case.
Nested Cross-validation
Results
The final model selected 8 out of 36 variables. P400lat_D_Occ shows the largest effect size and is the only electrode selected under a stronger penalty (λ within 1 SE of the minimum).
Electrode: standardized odds ratio (per IQR)
N1_D_Left: 1.30
N1_A_Left: 1.61
N2_D_Left: 1.16
N2_A_Right: 1.54
N1Lat_D_Right: 1.19
N1Lat_D_Cent: 1.09
P400_D_Occ: 1.22
P400lat_D_Occ: 2.44
Internal validity
Internal validity estimates (95% range) based on nested CV:
Area under the curve (AUC): 0.63 (0.52 to 0.71)
Accuracy: 58.3% (50% to 66.7%)
Specificity for the risk group: 52.9% (47.2% to 63.2%)
Sensitivity for the risk group: 63.2% (57.2% to 64.7%)
A permutation test showed that these measures are above chance.
Summary of results
Lasso regression predicted group membership above chance, which suggests that the at-risk group differs from the control group.
Automatic variable selection suggests that prolonged latency of the occipital P400 ERP electrode in the direct gaze condition is mainly responsible for the discrimination.
This signal is known to be sensitive to face processing in infants: an atypical response in 10-month-old at-risk infants?
In agreement with theory about the development of autism!
Case-control studies
Before concluding that an individual study's conclusions are valid, one must consider three sources of error that might provide an alternative explanation for the findings:
Random error
Confounding
Other bias, e.g. selection bias (control-group parents are "more social")
Cross-validation only controls for random error!
Conclusion
We need to be aware of the differences between explanatory and prediction modelling research and their methodologies.
Machine learning methods are prediction models: they build models which minimize the prediction error for unseen data.
Machine learning methods combined with cross-validation can also be useful in explanatory research and causal modelling.
Regularized ML methods are useful for increasing the reproducibility of studies in medical research by avoiding multiple-testing and model-selection problems.
Outlook
Medical research should integrate machine learning as a promising tool for explanatory research, such as:
New theory generation / explorative data analyses
Measurement development
Comparison of competing theories
Improvement of existing models
Relevance assessment
Assessment of the predictive power of empirical models
ML can be used for the discovery of new constructs, identifying relationships, refining existing models and identifying unknown patterns.
The ML community can also learn from statisticians: confounding, sampling bias, missing data.
Combining machine learning and statistical methods ("statistical learning") is of great interest for improving explanatory medical research.
Thank you for your attention
Thanks to Mayada Elsabbagh (McGill) and Mark H. Johnson (Birkbeck) and the BASIS team for providing their data.
References:
Shmueli, G. (2010) To explain or to predict? Statistical Science 25, 289-310.
Gregor, S. (2006) The nature of theory in information systems. MIS Quarterly 30, 611-642.
Sober, E. (2006) Parsimony. In: The Philosophy of Science: An Encyclopedia. Routledge: Oxford.