Applying Machine Learning Methods in Medical Research Studies

Applying Machine Learning Methods in Medical Research Studies. Daniel Stahl, Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology & Neuroscience (IoPPN), King's College London. daniel.r.stahl@kcl.ac.uk

Machine learning in medical research. Machine learning (ML) explores the study and construction of algorithms that can learn from and make predictions on data (Mund 2013). Machine learning is concerned with prediction and automated model building. It is typically used to analyse large, complex datasets in which the number of variables is often larger than the sample size (p >> n): bioinformatics, brain imaging (fMRI), and clinical prediction models from large databases.

Multiple tests: DNA microarrays. The aim of DNA microarray experiments is to detect differential gene expression, e.g. to identify gene expression changes under different treatment conditions or among different types of cell samples. Often hundreds of thousands of genes are tested on the array for expression changes. R. Jansen et al. (2016) Gene expression in major depressive disorder. Mol. Psychiatry 21, 339-347.

3-Dimensional data. fMRI: detect changes in brain activity somewhere in the brain. A 64x64 voxel matrix across 43 slices of the brain = 176,128 voxels. About 1/3 of this area is studied, giving roughly 50,000 voxels/hypothesis tests. Add time as a 4th dimension and you get into the 100,000s. Does the brain activity change after a stimulus? Images: T. Nichols: www.sph.umich.edu/~nichols/FDR/

Clinical prediction models inform health care providers and patients about the risk of developing a disease, the risk of the presence of a disease, and the likely future course of an illness, based on currently available information about the patient. Main types of clinical prediction models: risk prediction models, diagnostic models, prognostic models, and prediction models that predict the likely benefit of a treatment (personalized medicine).

Example: chronic fatigue syndrome (CFS). Risk prediction: what is the risk that a person who has family members with CFS will develop the condition themselves? Diagnosis: can we reliably diagnose a patient as having CFS rather than suffering from an illness with similar symptoms (e.g. depression)? Prognosis: how likely is it that a patient diagnosed with CFS will recover if untreated? Prediction and personalized medicine: which of the three main treatments will most likely benefit a patient? Which patients are most likely to benefit from cognitive behavioural therapy?

Machine learning (ML) is typically used to analyse large, complex datasets in which the number of variables is often larger than the sample size (p >> n): bioinformatics, brain imaging, and clinical prediction models from large databases (risk prediction models, diagnostic models, prognostic models, and prediction of treatment success/personalized medicine). Statistical modelling is predominantly used in other medical research areas. Why?

Outline. Why are machine learning methods not more widely used in many medical research areas? Differences between machine learning and statistical modelling. Can we implement machine learning methods alongside statistical modelling to improve medical research? Example: problems of low reproducibility in medical research due to selective inference (i.e. multiple testing). Example from autism research using lasso regression methods. Outlook.

Describing and explaining the world. Statistical modelling is the formalization of relationships between variables in the form of mathematical equations: we infer the process by which the data were generated. It is theory-driven. Statistical models, such as regression, are therefore typically used for explanatory research to assess causal hypotheses that explain why and how empirical phenomena occur (Gregor 2006). Explanatory research usually infers from a random sample to an expected mean response of the underlying population.


Example of explaining the world: D. Stahl et al. (2014) Mechanisms of change underlying the efficacy of cognitive behaviour therapy for chronic fatigue syndrome: a mediation analysis. Psychological Medicine 44, 1331-1344.

Statistical modelling. Construct $\mathcal{X}$ causes construct $\mathcal{Y}$ via the function $\mathcal{F}$, such that $\mathcal{Y} = \mathcal{F}(\mathcal{X})$. $\mathcal{X}$ and $\mathcal{Y}$ are operationalized by measurable variables X and Y, and $\mathcal{F}$ by a statistical model f, such as $E(Y) = f(X)$. In explanatory research we aim to match f as closely as possible to $\mathcal{F}$ in order to assess (or develop new) theoretical hypotheses/models. X and Y are tools for estimating $\mathcal{F}$ to assess causal hypotheses. This is theoretical prediction (the theory predicts an association between X and Y) rather than empirical prediction.

Machine learning. In machine learning the true data-generating process is of less importance: X and Y are of interest, and the function f is used as a tool for generating accurate predictions of unseen Y values. Even if $\mathcal{Y} = \mathcal{F}(\mathcal{X})$ describes the causal relationship, functions other than f and variables other than X can be better predictors of Y. ML algorithms are optimized for the purpose of predicting new or future observations, while in explanatory research minimizing bias is the key criterion for selecting the best model. This has implications for model selection and estimation.

Expected prediction error. We assume that there is a relationship between the outcome Y (e.g. depression score) and at least one of the p independent variables X (clinical and demographic characteristics). We can model the relationship as $Y = f(X) + \varepsilon$, where $f$ is an unknown function and $\varepsilon$ a random error with mean 0 and variance $\sigma^2$. Then the expected prediction error (EPE) is $\mathrm{EPE} = E[(Y - \hat{f}(X))^2]$.

Bias and variance. $\mathrm{EPE} = \mathrm{Bias}^2[\hat{f}(X)] + \mathrm{Var}[\hat{f}(X)] + \sigma^2$: squared bias + model estimation variance + irreducible error. Bias is the result of misspecifying the model f. Estimation variance is the result of using a sample to estimate f. $\sigma^2$ is the irreducible error that remains even if the model f is correctly specified and correctly estimated.
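The decomposition can be made explicit at a fixed point $x_0$; the following is a sketch of the standard textbook argument (not spelled out on the slide), using the assumptions above, with $\varepsilon$ independent of $\hat{f}$:

```latex
\begin{align*}
\mathrm{EPE}(x_0) &= E\big[(Y - \hat{f}(x_0))^2 \mid X = x_0\big]
                   = E\big[(f(x_0) + \varepsilon - \hat{f}(x_0))^2\big] \\
  &= \underbrace{\big(f(x_0) - E[\hat{f}(x_0)]\big)^2}_{\text{Bias}^2}
   + \underbrace{E\big[\big(\hat{f}(x_0) - E[\hat{f}(x_0)]\big)^2\big]}_{\text{estimation variance}}
   + \underbrace{\sigma^2}_{\text{irreducible error}}
\end{align*}
```

The cross terms vanish because $\varepsilon$ has mean zero and is independent of $\hat{f}(x_0)$.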

Explanatory vs. prediction modelling. Explanatory modelling minimizes model bias first and then sampling variance, by minimizing the residual sum of squares between observed and predicted responses on the same data set (i.e. BLUE). Machine learning and predictive modelling minimize a combination of model bias and sampling variance, by minimizing the residual sum of squares between observed and predicted responses on an unseen (new) data set (using e.g. cross-validation); the theoretical model itself is not of interest ("black box"). There is a tension between explanatory and predictive modelling, because the best explanatory model may differ from the best predictive model (Sober 2006).

Explanatory research vs. prediction (adapted from Paul Allison (2014), http://statisticalhorizons.com/prediction-vs-causation-in-regression-analysis):

Who is interested? Explanatory: mainly academia. Prediction: until recently, more outside academia.
Methodology. Explanatory: statistical modelling using inference, usually based on probability models. Prediction: machine learning, learning from data in order to make predictions.
Main interest. Explanatory: unbiased estimates (the correct model). Prediction: prediction of Y for unseen cases.
Biggest threat. Explanatory: confounding. Prediction: poor prediction of new cases.
Omitted variables. Explanatory: omission of confounder variables can invalidate conclusions. Prediction: only a concern because of the loss of predictive power.
Explained variance (r²). Explanatory: nice but not necessary; a large sample size may compensate. Prediction: crucial.
Multicollinearity. Explanatory: major concern because accurate parameter estimates are under threat. Prediction: minor concern because interpretation is not of importance.
Missing data. Explanatory: major concern because of potential serious bias; major methodological developments in the last 30 years. Prediction: hardly addressed; a missing-data indicator is often sufficient for the model, and variables with many missing values are useless.
Measurement error. Explanatory: great concern because measurement error in predictors produces bias. Prediction: mostly ignored because interpretation of the model is not of interest.

Explanatory research and selective inference. The explanatory approach using statistical modelling works very well with a few well-specified models. However, with a large number of variables and variable selection procedures applied to select a subset of them, problems with overfitting and with the validity of statistical inference occur. Presenting a final reduced model while ignoring the assessments of the excluded variables (selective inference) greatly exaggerates the apparent strength of a predictor and the assessment of its significance. Selective inference is one of the main causes of the low reproducibility of studies in medical research and other sciences.
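The inflation caused by selective inference is easy to demonstrate: screen many pure-noise predictors, report only the "best" one, and the nominal p-value becomes meaningless. A minimal simulation sketch (illustrative, not from the slides; sample sizes and counts are arbitrary):

```python
# Selective inference: test 50 noise predictors against a noise outcome,
# keep only the smallest p-value, and see how often it falls below 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, n_sims = 100, 50, 1000
selected_pvals = []

for _ in range(n_sims):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # outcome is independent of all predictors
    pvals = [stats.pearsonr(X[:, j], y)[1] for j in range(p)]
    selected_pvals.append(min(pvals))  # report only the "best" predictor

# Nominal alpha is 0.05, but the post-selection false-positive rate is
# about 1 - 0.95**50, roughly 0.9:
print(np.mean(np.array(selected_pvals) < 0.05))
```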

Using ML for explanatory modelling. Machine learning is mainly used for prediction modelling. Can we use machine learning in medical research beyond prediction modelling? Can we use it for explanatory modelling in theory building and theory testing? Can we use it to reduce the problem of overfitting and selective inference?

ML methods for explanatory research. A model that achieves both explanation and prediction will have to compromise: if causal explanation is the main purpose, we can assess the predictive ability of a causal model and modify the model if its predictive ability is not satisfactory; if prediction is the primary aim, we can build a prediction model that is relatively transparent but sacrifices some predictive power. Regularized models are a promising way to use ML for explanatory research.

Statistical learning. In recent years ML and statistics have been merging into statistical learning theory. The theory describes the properties of learning algorithms in a probabilistic framework and says how well algorithms can be expected to do at producing rules with minimum expected error on new cases. Statistical learning methods can be used to fit regression models with good prediction abilities while still allowing us to understand the process of prediction: penalized or regularized methods.

Regularized regression methods. A modern approach to prediction modelling is regularized or penalized methods, which can be applied both to large data sets (bioinformatics, neuroimaging) and to small data sets with a large number of variables (RCTs, experimental studies, cohort studies). They are not really new: ridge regression goes back to Arthur Hoerl and Robert Kennard (1970), but limited computer power long restricted its use.

Loss function. The performance of our model function is measured by a loss function that penalizes errors in prediction. A popular choice is the squared loss function $L(Y, f(X)) = (Y - f(X))^2$. We choose the function f(X) which minimizes the expected loss, here the expected mean squared error (MSE). The expected MSE can be estimated by cross-validation or bootstrapping methods. In the linear regression case, OLS usually does not provide the best solution.
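Estimating the expected MSE of a model by cross-validation takes only a few lines; a minimal sketch using scikit-learn on synthetic data (the data set and settings are illustrative, not the slides' data):

```python
# Estimate the expected squared loss (MSE) of a linear model
# by 10-fold cross-validation on synthetic regression data.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=21, noise=10.0,
                       random_state=0)

# Negative MSE is scikit-learn's convention for "higher is better" scoring.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=10)
print("CV estimate of expected MSE:", -scores.mean())
```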

Avoiding overfitting by penalising. When a model overfits the data, standard estimates of the regression coefficients are inflated or unstable, giving poor prediction and generalization power. Shrinkage of regression coefficients is an important technique in ML to combat overfitting and improve prediction accuracy. Estimates can be stabilised (regularised) by adding a penalty to the estimating equations. For linear regression, the penalty is added to the sum of squared errors (SSE): $\mathrm{SSE}_{\text{penalized}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \text{penalty term}$.

Penalty functions. Three commonly used penalty functions are: 1. Ridge penalty: the sum of squared coefficients, $\lambda \sum_j \beta_j^2$, forms the penalty (also called the L2 norm). 2. LASSO (Least Absolute Shrinkage and Selection Operator): the sum of absolute coefficients, $\lambda \sum_j |\beta_j|$, forms the penalty (also called the L1 norm). 3. Elastic net: a combination of L1 and L2 norm regularization.
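To see how the three penalties behave, one can fit all three on the same data and compare coefficients: the lasso drives some coefficients exactly to zero, while ridge only shrinks them. A small sketch on synthetic data (penalty strengths are illustrative):

```python
# Compare ridge (L2), lasso (L1) and elastic net coefficients fitted on
# the same data: only the L1-penalized models zero out coefficients.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso, ElasticNet

X, y = make_regression(n_samples=100, n_features=21, n_informative=3,
                       noise=10.0, random_state=0)

for model in (Ridge(alpha=1.0), Lasso(alpha=1.0),
              ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(type(model).__name__, "zero coefficients:", n_zero)
```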

Regularized LASSO regression. The LASSO penalizes the size of the regression coefficients based on their L1 norm: $\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$. LASSO tends to select one variable from a group of correlated variables: automatic variable selection.

How to select lambda? The goal is to evaluate the model in terms of its ability to predict future observations: the model needs to be evaluated on a dataset that was not used to build it (test sets). We assess different lambdas and, using cross-validation, choose the one which best predicts unseen cases. This best lambda is then used to fit the model on the complete data set.
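In scikit-learn this tune-then-refit step is what LassoCV does: it cross-validates a grid of lambda values (called alpha there) and refits on the full data with the best one. A minimal sketch on synthetic data:

```python
# Choose lambda (alpha in scikit-learn) by 10-fold cross-validation,
# then refit the lasso on the complete data set with that alpha.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=100, n_features=21, n_informative=1,
                       noise=5.0, random_state=0)

lasso = LassoCV(cv=10, random_state=0).fit(X, y)
print("best alpha:", lasso.alpha_)
print("non-zero coefficients:", int((lasso.coef_ != 0).sum()))
```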

Example: regularized regression. Data set with 21 predictor variables, N = 100. Regression coefficients of the final model (lambda = 0.17): variables 1 to 20: 0; variable 21: 0.47.

Using hold-out data for prediction accuracy estimation. Using CV to select the optimal lambda selects the set of predictors that best predicts unseen cases. However, the resulting prediction accuracy measures are over-optimistic estimates of the accuracy in a future sample, because the CV test data were used to select the model. This is often ignored!

Nested cross-validation. We need to retain an independent test dataset that is never used for parameter tuning; otherwise our results will be over-optimistic. [Figure: outer internal-validation folds wrapped around inner model-selection folds.] From: Sebastian Raschka, https://www.quora.com/i-train-my-system-based-on-the-10-fold-cross-validation-framework-nowit-gives-me-10-different-models-which-model-to-select-as-a-representative
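A common way to implement this is an inner CV loop for choosing lambda wrapped in an outer CV loop that only estimates accuracy. A sketch with scikit-learn (synthetic data; fold counts and the alpha grid are illustrative, whereas the study below uses repeated 10-fold loops):

```python
# Nested cross-validation: the inner loop tunes alpha, the outer loop
# estimates prediction accuracy on data never used for tuning.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=100, n_features=21, n_informative=3,
                       noise=10.0, random_state=0)

inner = GridSearchCV(Lasso(max_iter=10000),
                     {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=10)
outer_scores = cross_val_score(inner, X, y, cv=10,
                               scoring="neg_mean_squared_error")
print("nested-CV MSE estimate:", -outer_scores.mean())
```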

Application of the lasso. An example of a prediction model to classify infants at risk for autism, based on event-related potential (ERP) studies.

Event-related potentials (ERP). ERPs are electrical brain activity recordings that are the direct result of a specific stimulus (an external or internal event). ERPs are commonly quantified by measuring the amplitude and latency of observable peaks of the signal, time-locked to the stimulus. These signals provide real-time indices of neural information processing and allow cognitive processes to be assessed. From: http://www.uel.ac.uk/ircd/babylab/eeglab.htm

Applying the lasso to ERP data. Elsabbagh et al. (2009) investigated ERP correlates of eye gaze processing in 36 ten-month-old infants. Two groups: infants with autistic siblings ("at-risk" group) versus control infants. Two experimental conditions: direct gaze and averted gaze. 18 averaged ERP measurements per experimental condition (= 36 measurements per infant) were available. Standard logistic regression is not possible, and univariate tests (or stepwise model selection) will result in inflated alpha errors. Methodological tasks: how best to classify infants into their group (at-risk versus control), and which ERP signals are responsible for group differences?

Regularized logistic regression (lasso). In logistic regression we replace the residual sum of squares by the corresponding sum of squared deviance residuals (equivalent to $-2\log L$, the deviance), so the lasso solves $\hat{\beta} = \arg\min_{\beta}\,\{-2\log L(\beta) + \lambda \sum_{j} |\beta_j|\}$.
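The same idea is available in scikit-learn as L1-penalized logistic regression; a minimal sketch (synthetic binary-outcome data standing in for the ERP measurements; the solver and CV settings are illustrative):

```python
# L1-penalized (lasso) logistic regression with cross-validation over
# the penalty strength; the 'liblinear' solver supports the L1 penalty.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=36, n_features=36, n_informative=4,
                           random_state=0)

clf = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5,
                           Cs=10, scoring="roc_auc").fit(X, y)
print("selected variables:", int((clf.coef_ != 0).sum()))
```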

Methods used. Regularized logistic regression (lasso) with nested cross-validation: 500 outer loops of 10-fold cross-validation to estimate prediction accuracy (internal validity); 100 inner loops of 10-fold cross-validation to select the optimal lambda; the final model is based on the averaged lambda. Measures of predictive power: area under the curve (AUC) or c statistic (the estimated probability that, for any pair of a case and a control, the predicted risk of an event is higher for the case), sensitivity, specificity, and % correctly classified.
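Given predicted probabilities from such a model, these measures are straightforward to compute; a sketch with scikit-learn metrics (labels and probabilities below are made-up illustrations):

```python
# Compute AUC (c statistic), sensitivity, specificity and accuracy
# from true labels and predicted probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix, accuracy_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])               # illustrative
y_prob = np.array([.2, .4, .7, .6, .3, .1, .8, .55])      # illustrative
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC:", roc_auc_score(y_true, y_prob))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("accuracy:", accuracy_score(y_true, y_pred))
```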

Nested cross-validation (figure).

Results. The final model selected 8 out of 36 variables. P400lat_D_Occ shows the largest effect size and is the only electrode selected if we apply the tolerance rule (minimal lambda + 1 SE, i.e. a stronger penalty).
Electrode: st. odds ratio (IQR)
N1_D_Left: 1.30
N1_A_Left: 1.61
N2_D_Left: 1.16
N2_A_Right: 1.54
N1Lat_D_Right: 1.19
N1Lat_D_Cent: 1.09
P400_D_Occ: 1.22
P400lat_D_Occ: 2.44

Internal validity. Internal validity estimates (95% range) based on nested CV: area under the curve (AUC): 0.63 (0.52 to 0.71); accuracy: 58.3% (50% to 66.7%); specificity for the risk group: 52.9% (47.2% to 63.2%); sensitivity for the risk group: 63.2% (57.2% to 64.7%). A permutation test showed that these measures are above chance.

Summary of results. Lasso regression predicted group membership above chance, which suggests that the at-risk group differs from the control group. Automatic variable selection suggests that a prolonged latency of the occipital P400 ERP electrode in the direct gaze condition is mainly responsible for the discrimination. This signal is known to be sensitive to face processing in infants: an atypical response in 10-month-old at-risk infants? This is in agreement with theory about the development of autism.

Case-control studies. Before concluding that an individual study's conclusions are valid, one must consider three sources of error that might provide an alternative explanation for the findings: random error, confounding, and other bias, e.g. selection bias (control-group parents are "more social"). Cross-validation only controls for random error!

Conclusion. We need to be aware of the differences between explanatory and prediction modelling research and their methodologies. Machine learning methods are prediction models: they build models which minimize the prediction error on unseen data. Machine learning methods combined with cross-validation can also be useful in explanatory research and causal modelling. Regularized ML methods are useful for increasing the reproducibility of studies in medical research by avoiding multiple testing and model selection problems.

Outlook. Medical research should integrate machine learning as a promising tool for explanatory research, such as: new theory generation/explorative data analyses, measurement development, comparison of competing theories, improvement of existing models, relevance assessment, and assessment of the predictive power of empirical models. ML can be used for the discovery of new constructs, identifying relationships, refining existing models and identifying unknown patterns. The ML community can also learn from statisticians: confounding, sampling bias, missing data. Combining machine learning and statistical methods ("statistical learning") is of great interest for improving both explanatory and predictive medical research.

Thank you for your attention. Thanks to Mayada Elsabbagh (McGill) and Mark H. Johnson (Birkbeck) and the BASIS team for providing their data.
Shmueli, G. (2010). To explain or to predict? Statistical Science 25, 289-310.
Gregor, S. (2006). The nature of theory in information systems. MIS Quarterly 30, 611-642.
Sober, E. (2006). Parsimony. In: The Philosophy of Science: An Encyclopaedia. Routledge, Oxford.