Comparison of cross-validation and bagging for building a seasonal runoff forecast model

1 Comparison of cross-validation and bagging for building a seasonal runoff forecast model
Simon Schick, Ole Rössler, Rolf Weingartner
University of Bern, Switzerland: Institute of Geography; Oeschger Centre for Climate Change Research

5 Why resampling?
- training, selection, testing (Hastie et al., 2009)
- small sample sizes
- weak relationships and a large pool of candidate predictors
- Can we benefit from the models out of resampling?

6 Comparison of 2 approaches
- best guess (BGS): use all data points; testing by leave-one-out cross-validation (LOO)
- bagged (BAG): bagging, i.e. bootstrap aggregating (Breiman, 1996); testing by out-of-bag predictions (OOB)
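The two testing schemes can be illustrated with a minimal sketch. It assumes a one-predictor least-squares line as a stand-in for the forecast model and uses synthetic data; the function names, the number of bootstrap replicates and the data are illustrative assumptions, not taken from the slides.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # degenerate sample: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def loo_predictions(xs, ys):
    """BGS-style testing: predict each point from a fit on all other points."""
    preds = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds

def oob_predictions(xs, ys, n_boot=200, seed=1):
    """BAG-style testing: for each point, average the predictions of the
    bootstrap models whose resample did not contain that point."""
    rng = random.Random(seed)
    n = len(xs)
    sums = [0.0] * n
    counts = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                sums[i] += a + b * xs[i]
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def mse(ys, preds):
    """Mean squared error over the points that received a prediction."""
    pairs = [(y, p) for y, p in zip(ys, preds) if p is not None]
    return sum((y - p) ** 2 for y, p in pairs) / len(pairs)

# Synthetic data (assumption): 32 "years" with a weak linear signal.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(32)]
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
loo = loo_predictions(x, y)
oob = oob_predictions(x, y)
print("LOO MSE:", round(mse(y, loo), 3))
print("OOB MSE:", round(mse(y, oob), 3))
```

In both schemes every prediction comes from models that never saw the predicted point, which is what makes the two error estimates comparable.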

9 Regression
$$Y = \beta_0 + \beta_1 x_Q + \beta_2 x_P + \beta_3 x_T + \varepsilon$$
- $Y_{i,j}$: mean runoff for $i = 30, 60, 90$ days, starting with the 1st and 16th of every month ($j = 1, \dots, 24$); centered and scaled
- $x$: initial conditions, parametrized by runoff $Q$, precipitation $P$, and temperature $T$; univariate screening for 10, 20, ..., 720 days
- $\beta$: estimation by partial least squares; at least the first PLS direction is selected
- $\varepsilon$: residuals
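A rough sketch of how such a regression could be assembled is given below. It rests on several assumptions that the slides do not state: the screening criterion is taken to be the absolute Pearson correlation, each predictor is the mean over the last k days before the forecast date, only the first PLS direction is used, and the data are synthetic. All function names are hypothetical.

```python
import random

def standardize(v):
    """Center to zero mean and scale to unit sample standard deviation."""
    n = len(v)
    m = sum(v) / n
    s = (sum((a - m) ** 2 for a in v) / (n - 1)) ** 0.5
    return [(a - m) / s for a in v]

def correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    zu, zv = standardize(u), standardize(v)
    return sum(a * b for a, b in zip(zu, zv)) / (len(u) - 1)

def screen_window(daily_by_year, target, windows=range(10, 721, 10)):
    """Univariate screening: pick the aggregation window (days before the
    forecast date) whose mean correlates most strongly with the target."""
    best_k, best_score, best_feature = None, -1.0, None
    for k in windows:
        feature = [sum(series[-k:]) / k for series in daily_by_year]
        score = abs(correlation(feature, target))
        if score > best_score:
            best_k, best_score, best_feature = k, score, feature
    return best_k, best_feature

def pls_first_direction(columns, y):
    """One-component PLS1 on standardized predictors and response.
    Returns one coefficient per standardized predictor."""
    zy = standardize(y)
    zx = [standardize(c) for c in columns]
    w = [sum(a * b for a, b in zip(c, zy)) for c in zx]  # weights ~ X'y
    norm = sum(a * a for a in w) ** 0.5
    w = [a / norm for a in w]
    t = [sum(w[j] * zx[j][i] for j in range(len(zx))) for i in range(len(zy))]
    q = sum(a * b for a, b in zip(t, zy)) / sum(a * a for a in t)
    return [q * a for a in w]

# Synthetic daily series (assumption): 32 years, 720 days before each date.
rng = random.Random(0)
runoff = [[rng.gauss(0, 1) for _ in range(720)] for _ in range(32)]
precip = [[rng.gauss(0, 1) for _ in range(720)] for _ in range(32)]
temp = [[rng.gauss(0, 1) for _ in range(720)] for _ in range(32)]
# Noise-free toy target: driven only by the last 30 days of runoff.
y = [sum(r[-30:]) / 30 for r in runoff]

k_q, x_q = screen_window(runoff, y)
k_p, x_p = screen_window(precip, y)
k_t, x_t = screen_window(temp, y)
beta = pls_first_direction([x_q, x_p, x_t], y)
print("windows (Q, P, T):", k_q, k_p, k_t)
print("coefficients:", [round(b, 3) for b in beta])
```

Because precipitation and temperature are pure noise here, any window selected for them reflects chance correlation, which is the kind of selection effect that resampling is meant to expose.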

10 (32 years)
- 66 catchments (no nesting)
- P and T from the E-OBS gridded data set (Haylock et al., 2008)

15 Hindcast experiment
Stages: training, selection, testing
- best guess (BGS): leave one out; predictor screening; cross validation; partial least squares; testing: LOO
- bagged (BAG): bootstrap; predictor screening; cross validation; partial least squares; testing: OOB
- seasonal regime (SRG): leave one out; empirical mean; testing: LOO
- outer cross validation (8 folds)
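The role of the outer cross-validation can be sketched for the simplest of the three models, the seasonal regime (empirical mean). The sketch assumes contiguous folds and synthetic data, neither of which is specified on the slide; `outer_folds` and `loo_mse_of_mean` are hypothetical helper names.

```python
import random

def mean(v):
    return sum(v) / len(v)

def outer_folds(n, k=8):
    """Split indices 0..n-1 into k contiguous folds of nearly equal size."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def loo_mse_of_mean(y):
    """Inner LOO error estimate for the empirical-mean forecast:
    each value is predicted by the mean of all other values."""
    total = sum(y)
    n = len(y)
    return mean([(yi - (total - yi) / (n - 1)) ** 2 for yi in y])

# Synthetic target (assumption): 32 years of standardized mean runoff.
rng = random.Random(0)
y = [rng.gauss(0, 1) for _ in range(32)]

inner_estimates, outer_errors = [], []
for test_idx in outer_folds(len(y)):
    held_out = set(test_idx)
    train = [y[i] for i in range(len(y)) if i not in held_out]
    # Error estimate available from resampling the training years only.
    inner_estimates.append(loo_mse_of_mean(train))
    # Error actually observed on the years left out by the outer fold.
    forecast = mean(train)
    outer_errors.append(mean([(y[i] - forecast) ** 2 for i in test_idx]))

print("mean inner LOO estimate:", round(mean(inner_estimates), 3))
print("mean outer error:       ", round(mean(outer_errors), 3))
```

Comparing the two averages is one way to check whether an inner resampling estimate such as LOO or OOB tracks the error seen on data that were held out entirely.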

16 Mean squared error of prediction (n=66)

17 Mean squared error of prediction
H0: The less complex model performs equally well or even better than the more complex one.
H1: The more complex model performs better.
Table: p-values of right-sided t-test using paired differences of Ê_MSP (only outer cross-validation)

| | Y_30 | Y_60 | Y_90 |
|---|---|---|---|
| SRG - BGS | 0.21 | >0.99 | >0.99 |
| SRG - BAG | <0.01 | 0.03 | 0.67 |
| BGS - BAG | <0.01 | <0.01 | <0.01 |
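A right-sided paired test of this kind can be sketched as follows. The sketch reads each paired difference as the error of the less complex model minus the error of the more complex one, so that H1 corresponds to a positive mean difference. The p-value is obtained by a nonparametric bootstrap of the t statistic under H0, which is one common construction and not necessarily the one used for the slides; the differences are synthetic.

```python
import math
import random

def t_statistic(d):
    """One-sample t statistic of the paired differences against zero."""
    n = len(d)
    m = sum(d) / n
    var = sum((x - m) ** 2 for x in d) / (n - 1)
    if var == 0.0:
        # Degenerate resample: all values identical.
        return 0.0 if m == 0.0 else math.copysign(float("inf"), m)
    return m / (var / n) ** 0.5

def bootstrap_p_right(d, n_boot=2000, seed=1):
    """Right-sided p-value: share of bootstrap t statistics, drawn from the
    differences centered at zero (H0), that reach the observed statistic."""
    rng = random.Random(seed)
    t_obs = t_statistic(d)
    m = sum(d) / len(d)
    centered = [x - m for x in d]  # impose a zero mean difference
    exceed = 0
    for _ in range(n_boot):
        sample = [rng.choice(centered) for _ in d]
        if t_statistic(sample) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

# Synthetic paired differences for 66 catchments (assumption):
# error(less complex) - error(more complex), clearly positive on average.
rng = random.Random(0)
diffs = [rng.gauss(0.5, 0.5) for _ in range(66)]
p = bootstrap_p_right(diffs)
print("t =", round(t_statistic(diffs), 2), " p =", round(p, 4))
```

A small p-value rejects H0 in favour of the more complex model; a p-value close to 1 indicates that the less complex model had the lower error on average.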

18 Mean squared error of prediction
How much does bagging reduce Ê_MSP?
Table: Ê_MSP and reduction (only outer cross-validation). Columns: Y_30, Y_60, Y_90. Rows: BGS, BAG, reduction (%).

23
- LOO and OOB provide, on average, accurate estimates of prediction error.
- SRG outperforms BGS and BAG in many catchments.
- Most likely, BAG outperforms BGS on average.
- Error reduction is strongest when BGS has low or no skill.
- Suboptimal: the comparison rests on resampling.

24
- runoff series and catchment boundaries: Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg; Bayerisches Landesamt für Umwelt; Land Vorarlberg (data.vorarlberg.gv.at); Bundesministerium für Land- und Forstwirtschaft, Umwelt und Wasserwirtschaft Österreich; Schweizerisches Bundesamt für Umwelt
- precipitation and temperature series: E-OBS data set (EU-FP6 project ENSEMBLES, ensembles-eu.metoffice.com, and ECA&D project, ecad.eu)
- digital elevation model: EU-DEM, produced using Copernicus data and information funded by the European Union
- Bernhard Wehren made available additional runoff data for the river Kander at Hondrich

25
- Breiman, L.: Bagging Predictors. Machine Learning, 24(2), 1996.
- Garen, D. C.: Improved techniques in regression-based streamflow volume forecasting. Journal of Water Resources Planning and Management, 118(6).
- Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical Learning. Data Mining, Inference, and Prediction. Second Edition. Springer New York Inc., 2009.
- Haylock, M. R., Hofstra, N., Klein Tank, A. M. G., Klok, E. J., Jones, P. D., and New, M.: A European daily high-resolution gridded data set of surface temperature and precipitation for. Journal of Geophysical Research: Atmospheres, 113, D20119, 2008.
- Mevik, B.-H., and Wehrens, R.: The pls Package: Principal Component and Partial Least Squares Regression in R. Journal of Statistical Software, 18(2).
experimental forecasts:

26 Ê_MSP: mean squared error of prediction
H0: The less complex model performs equally well or even better than the more complex one.
H1: The more complex model performs better.
Table: p-values of right-sided t-test (nonparametric bootstrap) using paired differences of Ê_MSP (only outer cross-validation)

| | Y_30 | Y_60 | Y_90 |
|---|---|---|---|
| SRG - BGS | 0.21 (0.21) | >0.99 (>0.99) | >0.99 (>0.99) |
| SRG - BAG | <0.01 (<0.01) | 0.03 (0.02) | 0.67 (0.69) |
| BGS - BAG | <0.01 (<0.01) | <0.01 (<0.01) | <0.01 (<0.01) |

27 Ê_MAP: mean absolute error of prediction (n=66)

28 Ê_MAP: mean absolute error of prediction
H0: The less complex model performs equally well or even better than the more complex one.
H1: The more complex model performs better.
Table: p-values of right-sided t-test (nonparametric bootstrap) using paired differences of Ê_MAP (only outer cross-validation)

| | Y_30 | Y_60 | Y_90 |
|---|---|---|---|
| SRG - BGS | 0.02 (0.01) | 0.92 (0.92) | >0.99 (>0.99) |
| SRG - BAG | <0.01 (<0.01) | <0.01 (<0.01) | 0.25 (0.25) |
| BGS - BAG | <0.01 (<0.01) | <0.01 (<0.01) | <0.01 (<0.01) |

29 Nash-Sutcliffe Efficiency (n=66; six outliers in [-3.7,-1.8] are not shown for readability)
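For reference, the three verification scores named on these slides can be written down in a few lines. The sketch uses the standard textbook definitions; in particular, the Nash-Sutcliffe Efficiency is computed against the mean of the observations passed in, which is an assumption about the reference forecast and may differ from the one used for the figures.

```python
def mse(obs, pred):
    """Mean squared error of prediction."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mae(obs, pred):
    """Mean absolute error of prediction."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nse(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect forecast, 0 matches the
    observed mean, negative values are worse than the observed mean."""
    m = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    spread = sum((o - m) ** 2 for o in obs)
    return 1.0 - residual / spread

obs = [1.0, 2.0, 3.0, 4.0]
print(nse(obs, obs))                   # perfect forecast
print(nse(obs, [2.5, 2.5, 2.5, 2.5]))  # forecast equal to the observed mean
print(nse(obs, [4.0, 3.0, 2.0, 1.0]))  # reversed forecast, negative score
```

Strongly negative values, such as the outliers mentioned in the caption, arise when the squared forecast errors are several times larger than the variance of the observations.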

34 alpine: Landwasser, Davos (Y_60)

35 lake: Aabach, Hitzkirch (Y_60)

36 lowland: Töss, Neftenbach (Y_60)

37 regulated: Julia, Tiefencastel (Y_60)


2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0% Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of

More information

Using climate models to project the future distributions of climate-sensitive infectious diseases

Using climate models to project the future distributions of climate-sensitive infectious diseases Liverpool Marine Symposium, 17 Jan 2011 Using climate models to project the future distributions of climate-sensitive infectious diseases Prof. Matthew Baylis Liverpool University Climate and Infectious

More information

Practical Regression: Convincing Empirical Research in Ten Steps

Practical Regression: Convincing Empirical Research in Ten Steps DAVID DRANOVE 7-112-001 Practical Regression: Convincing Empirical Research in Ten Steps This is one in a series of notes entitled Practical Regression. These notes supplement the theoretical content of

More information

Predictive Models for Healthcare Analytics

Predictive Models for Healthcare Analytics Predictive Models for Healthcare Analytics A Case on Retrospective Clinical Study Mengling Mornin Feng mfeng@mit.edu mornin@gmail.com 1 Learning Objectives After the lecture, students should be able to:

More information

STAT 151B. Administrative Info. Statistics 151B: Introduction Modern Statistical Prediction and Machine Learning. Overview and introduction

STAT 151B. Administrative Info. Statistics 151B: Introduction Modern Statistical Prediction and Machine Learning. Overview and introduction Statistics 151B: Modern Statistical Prediction and Machine Learning Overview and introduction information Homepage: http://www.stat.berkeley.edu/ jon/ stat-151b-spring-2012 All announcements and materials

More information

Score Tests of Normality in Bivariate Probit Models

Score Tests of Normality in Bivariate Probit Models Score Tests of Normality in Bivariate Probit Models Anthony Murphy Nuffield College, Oxford OX1 1NF, UK Abstract: A relatively simple and convenient score test of normality in the bivariate probit model

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Classification and Statistical Analysis of Auditory FMRI Data Using Linear Discriminative Analysis and Quadratic Discriminative Analysis

Classification and Statistical Analysis of Auditory FMRI Data Using Linear Discriminative Analysis and Quadratic Discriminative Analysis International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: 2347-5552, Volume-2, Issue-6, November-2014 Classification and Statistical Analysis of Auditory FMRI Data Using

More information

Visual and Decision Informatics (CVDI)

Visual and Decision Informatics (CVDI) University of Louisiana at Lafayette, Vijay V Raghavan, 337.482.6603, raghavan@louisiana.edu Drexel University, Xiaohua (Tony) Hu, 215.895.0551, xh29@drexel.edu Tampere University (Finland), Moncef Gabbouj,

More information

A methodology for the analysis of medical data

A methodology for the analysis of medical data Please cite this book chapter as: A. Tsanas, M.A. Little, P.E. McSharry, A methodology for the analysis of medical data, Handbook of Systems and Complexity in Health, Springer, New York, pp. 113-125, 2013

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION doi:10.1038/nature16467 Supplementary Discussion Relative influence of larger and smaller EWD impacts. To examine the relative influence of varying disaster impact severity on the overall disaster signal

More information

On testing dependency for data in multidimensional contingency tables

On testing dependency for data in multidimensional contingency tables On testing dependency for data in multidimensional contingency tables Dominika Polko 1 Abstract Multidimensional data analysis has a very important place in statistical research. The paper considers the

More information

Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University

Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University Multivariate Regression with Small Samples: A Comparison of Estimation Methods W. Holmes Finch Maria E. Hernández Finch Ball State University High dimensional multivariate data, where the number of variables

More information

Supplementary Material. other ethnic backgrounds. All but six of the yoked pairs were matched on ethnicity. Results

Supplementary Material. other ethnic backgrounds. All but six of the yoked pairs were matched on ethnicity. Results Supplementary Material S1 Methodological Details Participants The sample was 80% Caucasian, 16.7% Asian or Asian American, and 3.3% from other ethnic backgrounds. All but six of the yoked pairs were matched

More information

Detection and Classification of Diabetic Retinopathy using Retinal Images

Detection and Classification of Diabetic Retinopathy using Retinal Images Detection and Classification of Diabetic Retinopathy using Retinal Images Kanika Verma, Prakash Deep and A. G. Ramakrishnan, Senior Member, IEEE Medical Intelligence and Language Engineering Lab Department

More information

Behavioral Data Mining. Lecture 4 Measurement

Behavioral Data Mining. Lecture 4 Measurement Behavioral Data Mining Lecture 4 Measurement Outline Hypothesis testing Parametric statistical tests Non-parametric tests Precision-Recall plots ROC plots Hardware update Icluster machines are ready for

More information

Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer

Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Pei Wang Department of Statistics Stanford University Stanford, CA 94305 wp57@stanford.edu Young Kim, Jonathan Pollack Department

More information

AP Statistics. Semester One Review Part 1 Chapters 1-5

AP Statistics. Semester One Review Part 1 Chapters 1-5 AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically

More information

Class Outlier Detection. Zuzana Pekarčíková

Class Outlier Detection. Zuzana Pekarčíková Class Outlier Detection Zuzana Pekarčíková Outline λ What is an Outlier? λ Applications of Outlier Detection λ Types of Outliers λ Outlier Detection Methods Types λ Basic Outlier Detection Methods λ High-dimensional

More information

Performance, Labour and Economic Aspects of Different Farrowing Systems

Performance, Labour and Economic Aspects of Different Farrowing Systems 1 Performance, Labour and Economic Aspects of Different Farrowing Systems University of Natural Resources and Applied Life Sciences, Department of Sustainable Agricultural Systems, Division of Agricultural

More information

Swiss Brown Swiss in different environments: Does GxE play an important role? Beat Bapst Qualitas AG, Switzerland

Swiss Brown Swiss in different environments: Does GxE play an important role? Beat Bapst Qualitas AG, Switzerland Swiss Brown Swiss in different environments: Does GxE play an important role? Beat Bapst Qualitas AG, Switzerland 07.04.2016 World Brown Swiss Congress, Mende Introduction/Background Brown Swiss Dairy

More information

Extraversion. The Extraversion factor reliability is 0.90 and the trait scale reliabilities range from 0.70 to 0.81.

Extraversion. The Extraversion factor reliability is 0.90 and the trait scale reliabilities range from 0.70 to 0.81. MSP RESEARCH NOTE B5PQ Reliability and Validity This research note describes the reliability and validity of the B5PQ. Evidence for the reliability and validity of is presented against some of the key

More information

Ensemble based probabilistic forecasting of meteorology and air quality in Oslo, Norway

Ensemble based probabilistic forecasting of meteorology and air quality in Oslo, Norway Ensemble based probabilistic forecasting of meteorology and air quality in Oslo, Norway Sam Erik Walker, Bruce Rolstad Denby, Núria Castell NILU Norwegian Institute for Air Research 21 August 2014 World

More information

UNCERTAINTY, HEURISTICS AND INJURY PREDICTION

UNCERTAINTY, HEURISTICS AND INJURY PREDICTION UNCERTAINTY, HEURISTICS AND INJURY PREDICTION Written by Mladen Jovanovic, Serbia Predicting injuries in high-performance sports is of great importance for both players and clubs, but also for fans Having

More information

Importance of factors contributing to work-related stress: comparison of four metrics

Importance of factors contributing to work-related stress: comparison of four metrics Importance of factors contributing to work-related stress: comparison of four metrics Mounia N. Hocine, Natalia Feropontova, Ndèye Niang, Karim Aït-Bouziad, Gilbert Saporta Conservatoire national des arts

More information

NORTH DAKOTA 2011 FLOOD EVENT

NORTH DAKOTA 2011 FLOOD EVENT NORTH DAKOTA 2011 FLOOD EVENT OUTLINE Missouri River Basin Geography Generic Reservoir Operations Weather and Climate 2011 Basin-wide Hydrology Missouri River Timeline Missouri River Damages Mouse River

More information

Statistics 571: Statistical Methods Summer 2003 Final Exam Ramón V. León

Statistics 571: Statistical Methods Summer 2003 Final Exam Ramón V. León Name: Statistics 571: Statistical Methods Summer 2003 Final Exam Ramón V. León This exam is closed-book and closed-notes. However, you can use up to twenty pages of personal notes as an aid in answering

More information

QUANTIFYING CEREBRAL CONTRIBUTIONS TO PAIN 1

QUANTIFYING CEREBRAL CONTRIBUTIONS TO PAIN 1 QUANTIFYING CEREBRAL CONTRIBUTIONS TO PAIN 1 Supplementary Figure 1. Overview of the SIIPS1 development. The development of the SIIPS1 consisted of individual- and group-level analysis steps. 1) Individual-person

More information

Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer

Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer Survival Prediction Models for Estimating the Benefit of Post-Operative Radiation Therapy for Gallbladder Cancer and Lung Cancer Jayashree Kalpathy-Cramer PhD 1, William Hersh, MD 1, Jong Song Kim, PhD

More information

Migratory Bird classification and analysis Aparna Pal

Migratory Bird classification and analysis Aparna Pal Migratory Bird classification and analysis Aparna Pal apal4@wisc.edu Abstract The use of classification vectors to classify land and seabirds act as a first step to pattern classification of migratory

More information

NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management

NPTEL Project. Econometric Modelling. Module 14: Heteroscedasticity Problem. Module 16: Heteroscedasticity Problem. Vinod Gupta School of Management 1 P age NPTEL Project Econometric Modelling Vinod Gupta School of Management Module 14: Heteroscedasticity Problem Module 16: Heteroscedasticity Problem Rudra P. Pradhan Vinod Gupta School of Management

More information