Machine Learning Statistical Learning. Prof. Matteo Matteucci



Statistical Learning Outline
o What Is Statistical Learning? Why estimate f? How do we estimate f? The trade-off between prediction accuracy and model interpretability.
o Some important taxonomies (I expect you'll know these by heart!): Prediction vs. Inference; Parametric vs. Non-Parametric models; Regression vs. Classification problems; Supervised vs. Unsupervised learning.
Prof. Matteo Matteucci - Machine Learning

Example: Increasing Sales by Advertising

What is Statistical Learning?
o Suppose we observe Y_i and X_i = (X_i1, ..., X_ip) for i = 1, ..., n. Assume a relationship exists between Y and at least one of the observed X's, and assume we can model this relationship as Y_i = f(X_i) + ε_i, where f is an unknown function (the systematic part) and ε_i is a zero-mean random error.
[Figure: simulated data points scattered around a smooth curve f(x)]
o The term statistical learning refers to using the data to learn f.
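To make the setup concrete, here is a minimal simulation of the model Y_i = f(X_i) + ε_i. The sine function and the noise level are made-up choices for illustration; in practice the true f is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical systematic component; in practice f is unknown.
    return 0.05 * np.sin(2 * np.pi * x)

n = 100
x = rng.uniform(0.0, 1.0, n)        # observed predictors X_i
eps = rng.normal(0.0, 0.01, n)      # zero-mean random errors eps_i
y = f(x) + eps                      # responses Y_i = f(X_i) + eps_i
```

Statistical learning then asks: given only the pairs (x, y), how well can we recover f?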

Reducible vs Irreducible Error
o The error of our estimate has two components, since Y_i = f(X_i) + ε_i:
Reducible error, due to the choice of model complexity.
Irreducible error, due to the presence of ε_i in the training set.

Because noise matters
[Figure: four panels of the same simulated data with noise sd = 0.001, 0.005, 0.01, 0.03]

Reducible vs Irreducible Error (Part 2)
o The error of our estimate has two components, since Y_i = f(X_i) + ε_i:
Reducible error, due to the choice of model complexity.
Irreducible error, due to the presence of ε_i in the training set.
o Let's assume f̂ and X fixed for the time being.

Reducible vs Irreducible Error (Part 3)
o Can you derive this? With f̂ and X fixed, and Ŷ = f̂(X):
E[(Y − Ŷ)²] = E[(f(X) + ε − f̂(X))²]
            = [f(X) − f̂(X)]² + 2 E[ε] [f(X) − f̂(X)] + E[ε²]
            = [f(X) − f̂(X)]² + Var(ε)
using E[ε] = 0: the first term is the reducible error, Var(ε) the irreducible error.
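The decomposition can be checked numerically: for a fixed X and a fixed (deliberately biased) estimate f̂, the mean squared prediction error approaches the reducible term plus Var(ε). The particular f, f̂, and σ below are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                 # "true" function (invented for the demo)
    return 0.05 * np.sin(2 * np.pi * x)

def f_hat(x):             # a fixed, deliberately biased estimate of f
    return f(x) + 0.02

sigma = 0.03              # sd of the irreducible error eps
x0 = 0.3                  # X held fixed, as in the derivation
y = f(x0) + rng.normal(0.0, sigma, 200_000)   # many draws of Y at X = x0

mse = np.mean((y - f_hat(x0)) ** 2)
reducible = (f(x0) - f_hat(x0)) ** 2          # [f(X) - f_hat(X)]^2
irreducible = sigma ** 2                      # Var(eps)
# mse comes out close to reducible + irreducible
```

No matter how good f̂ is, `irreducible` puts a floor under the expected squared error.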

Example: Income vs. Education and Seniority
o The function f might also involve multiple variables.

Why do we estimate f?
o There are two reasons for estimating f: Prediction and Inference.
o Prediction: If we can produce a good estimate for f (and the variance of ε is not too large) we can make accurate predictions for the response, Y, based on a new value of X.
o Inference: We may be interested in the type of relationship between Y and the X's, to control/influence Y. Which particular predictors actually affect the response? Is the relationship positive or negative? Is the relationship a simple linear one, or is it more complicated?

Examples for Prediction & Inference
o Direct Mail (Prediction): Interested in predicting how much money an individual will donate, based on observations from 90,000 people on which we have recorded over 400 different characteristics. We don't care too much about each individual characteristic; we just want to know: for a given individual, should I send out a mailing?
o Median House Price (Inference): Which factors have the biggest effect on the response, and how big is the effect? E.g., we want to know how much impact a river view has on the house value.

How Do We Estimate f?
o We have observed a set of training data {(X_1, Y_1), ..., (X_n, Y_n)}.
o Use a statistical method/model to estimate f so that Y ≈ f̂(X) for any (X, Y).
o Statistical methods/models are usually divided in: Parametric Methods/Models; Non-parametric Methods/Models.

Parametric Methods (Part 1)
o Parametric methods leverage an assumption about the model underlying f: they reduce the problem of estimating f down to the one of estimating a set of parameters. They involve a two-step, model-based approach.
o STEP 1: Make some assumption about the functional form of f, i.e., come up with a model, e.g., a linear model: f(X_i) = β_0 + β_1 X_i1 + ... + β_p X_ip
o STEP 2: Use the training data to fit the model, i.e., estimate f through the unknown parameters β_0, β_1, ..., β_p.

Parametric Methods (Part 2)
o Parametric methods leverage an assumption about the model underlying f: they reduce the problem of estimating f down to the one of estimating a set of parameters. They involve a two-step, model-based approach.
o STEP 1: In this course we will examine far more complicated, and flexible, models for f w.r.t. linear ones. In a sense, the more flexible the model the more realistic it is.
o STEP 2: The most common approach for estimating the parameters in a linear model is Ordinary Least Squares (OLS), but there are often superior approaches.

Example: A Linear Regression Estimate
o Even if the standard deviation is low, we will still get a bad answer if we use the wrong model: f̂ = b_0 + b_1 × Education + b_2 × Seniority
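A sketch of the two-step parametric approach on made-up Income/Education/Seniority data (the "true" coefficients 5, 3 and 0.5 are invented), with STEP 2 done by ordinary least squares via `numpy.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up training data: Income generated from invented coefficients.
n = 200
education = rng.uniform(10, 20, n)
seniority = rng.uniform(0, 40, n)
income = 5.0 + 3.0 * education + 0.5 * seniority + rng.normal(0, 2.0, n)

# STEP 1: assume a linear form f(X) = b0 + b1*Education + b2*Seniority.
# STEP 2: estimate (b0, b1, b2) by ordinary least squares.
X = np.column_stack([np.ones(n), education, seniority])
b, *_ = np.linalg.lstsq(X, income, rcond=None)   # b = (b0, b1, b2)
```

The whole estimation problem has been reduced to finding three numbers, which is exactly the point of the parametric approach.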

Non-parametric Methods
o Sometimes referred to as sample-based or instance-based methods; they do not make explicit assumptions about the functional form of f, they exploit the training data directly.
o Advantages: They accurately fit a wider range of possible shapes of f. They do not require a training phase.
o Disadvantages: A very large number of observations is required to obtain an accurate estimate of f. Higher computational cost at testing time.
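As a sketch of the instance-based idea, here is a tiny k-nearest-neighbours regression estimate that uses the training data directly, with no assumed functional form for f (the data-generating function below is made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up 1-D training set: sine-plus-noise data.
x_train = rng.uniform(0.0, 1.0, 200)
y_train = 0.05 * np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.01, 200)

def knn_predict(x0, k=10):
    """Estimate f(x0) by averaging the k nearest training responses."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

y_hat = knn_predict(0.25)   # under the made-up f, the true value is 0.05
```

Note there is no training phase at all: every prediction re-reads the stored sample, which is exactly the higher testing-time cost the slide mentions.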

Example: A Thin-Plate Spline Estimate
o Smooth thin-plate spline fit: non-parametric regression methods are more flexible, thus they can potentially provide more accurate estimates.

Prediction Accuracy vs Model Interpretability
o Why not just use a more flexible method if it is more realistic?
Reason 1: A simple method such as linear regression produces a model which is much easier to interpret (the inference part is better). E.g., in a linear model, β_j is the average increase in Y for a one-unit increase in X_j, holding all other variables constant.
Reason 2: Even if you are only interested in prediction, it is often possible to get more accurate predictions with a simple model instead of a complicated one. This seems counter-intuitive, but it has to do with the fact that it is harder to fit a more flexible model properly.

A Poor Estimate
o Non-parametric regression methods can also be too flexible and produce poor estimates for f, e.g., a thin-plate spline fit with zero training error.

Flexibility vs Model Interpretability

Supervised vs. Unsupervised Learning (Part 1)
o Machine Learning usually makes a clear distinction between Supervised Models and Unsupervised Models.
o Supervised Learning: both the predictors, X_i, and the response, Y_i, are observed.

Supervised vs. Unsupervised Learning (Part 2)
o Machine Learning usually makes a clear distinction between Supervised Models and Unsupervised Models.
o Unsupervised Learning: only the X_i's are observed; we use them to build a high-level representation, possibly for modeling some Y.

Regression vs. Classification
o Supervised learning problems can be further divided into:
Regression problems, covering situations where Y is continuous/numerical. E.g., predicting the value of the Dow in 6 months; predicting the value of a given house based on various inputs.
Classification problems, covering situations where Y is categorical. E.g., will the Dow be up (U) or down (D) in 6 months? Is this email SPAM or not?

A Simple Clustering Example
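This slide's clustering example can be sketched with a bare-bones k-means loop on two made-up groups of unlabeled points (k-means is one common choice of clustering algorithm, not necessarily the one on the slide):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two made-up groups of unlabeled 2-D points (no Y is observed).
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])

# Bare-bones k-means with k = 2: alternate assignment and update steps.
centers = X[rng.choice(len(X), size=2, replace=False)]
for _ in range(10):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                 # nearest-center assignment
    centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
```

Only the X_i's are used: the algorithm discovers the two groups as structure in the inputs, which is exactly the unsupervised setting of the previous slides.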

What about higher dimensions?

Wrap up!
o What Is Statistical Learning? Why estimate f? How do we estimate f? The trade-off between prediction accuracy and model interpretability.
o Some important taxonomies (I expect you'll know these by heart!): Prediction vs. Inference; Parametric vs. Non-Parametric models; Regression vs. Classification problems; Supervised vs. Unsupervised learning.