Machine Learning Statistical Learning. Prof. Matteo Matteucci



Statistical Learning Outline
o What Is Statistical Learning? Why estimate f? How do we estimate f? The trade-off between prediction accuracy and model interpretability.
o Some important taxonomies (I expect you'll know these by heart!): Prediction vs. Inference; Parametric vs. Non-Parametric models; Regression vs. Classification problems; Supervised vs. Unsupervised learning.
Prof. Matteo Matteucci - Machine Learning

Example: Increasing Sales by Advertising

What is Statistical Learning?
o Suppose we observe Y_i and X_i = (X_i1, ..., X_ip) for i = 1, ..., n. Assume a relationship exists between Y and at least one of the observed X's, and assume we can model this relationship as Y_i = f(X_i) + ε_i, where f is an unknown function (the systematic part) and ε_i is a zero-mean random error.
[Figure: simulated data points scattered around a smooth curve f(x)]
o The term statistical learning refers to using the data to learn f.
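To make the setup concrete, here is a minimal simulation of the model Y_i = f(X_i) + ε_i. The sine function and the noise level are made-up choices for illustration; in practice the true f is unknown.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical systematic component; in practice f is unknown.
    return 0.05 * np.sin(2 * np.pi * x)

n = 100
x = rng.uniform(0.0, 1.0, n)        # observed predictors X_i
eps = rng.normal(0.0, 0.01, n)      # zero-mean random errors eps_i
y = f(x) + eps                      # responses Y_i = f(X_i) + eps_i
```

Statistical learning then asks: given only the pairs (x, y), how well can we recover f?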

Reducible vs Irreducible Error
o The error of our estimate has two components, since Y_i = f(X_i) + ε_i:
Reducible error, due to the choice of model complexity.
Irreducible error, due to the presence of ε_i in the training set.

Because noise matters
[Figure: four panels of the same simulated data with noise sd = 0.001, 0.005, 0.01, 0.03]

Reducible vs Irreducible Error (Part 2)
o The error of our estimate has two components, since Y_i = f(X_i) + ε_i:
Reducible error, due to the choice of model complexity.
Irreducible error, due to the presence of ε_i in the training set.
o Let's assume f̂ and X fixed for the time being.

Reducible vs Irreducible Error (Part 3)
o Can you derive this? With f̂ and X fixed, and Ŷ = f̂(X):
E[(Y − Ŷ)²] = E[(f(X) + ε − f̂(X))²]
            = [f(X) − f̂(X)]² + 2 E[ε] [f(X) − f̂(X)] + E[ε²]
            = [f(X) − f̂(X)]² + Var(ε)
using E[ε] = 0: the first term is the reducible error, Var(ε) the irreducible error.
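The decomposition can be checked numerically: for a fixed X and a fixed (deliberately biased) estimate f̂, the mean squared prediction error approaches the reducible term plus Var(ε). The particular f, f̂, and σ below are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                 # "true" function (invented for the demo)
    return 0.05 * np.sin(2 * np.pi * x)

def f_hat(x):             # a fixed, deliberately biased estimate of f
    return f(x) + 0.02

sigma = 0.03              # sd of the irreducible error eps
x0 = 0.3                  # X held fixed, as in the derivation
y = f(x0) + rng.normal(0.0, sigma, 200_000)   # many draws of Y at X = x0

mse = np.mean((y - f_hat(x0)) ** 2)
reducible = (f(x0) - f_hat(x0)) ** 2          # [f(X) - f_hat(X)]^2
irreducible = sigma ** 2                      # Var(eps)
# mse comes out close to reducible + irreducible
```

No matter how good f̂ is, `irreducible` puts a floor under the expected squared error.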

Example: Income vs. Education and Seniority
o The function f might also involve multiple variables.

Why do we estimate f?
o There are two reasons for estimating f: Prediction and Inference.
o Prediction: If we can produce a good estimate for f (and the variance of ε is not too large) we can make accurate predictions for the response, Y, based on a new value of X.
o Inference: We may be interested in the type of relationship between Y and the X's, to control/influence Y. Which particular predictors actually affect the response? Is the relationship positive or negative? Is the relationship a simple linear one, or is it more complicated?

Examples for Prediction & Inference
o Direct Mail (Prediction): Interested in predicting how much money an individual will donate, based on observations from 90,000 people on which we have recorded over 400 different characteristics. We don't care too much about each individual characteristic; we just want to know: for a given individual, should I send out a mailing?
o Median House Price (Inference): Which factors have the biggest effect on the response, and how big is the effect? E.g., we want to know how much impact a river view has on the house value.

How Do We Estimate f?
o We have observed a set of training data {(X_1, Y_1), ..., (X_n, Y_n)}.
o Use a statistical method/model to estimate f so that Y ≈ f̂(X) for any (X, Y).
o Statistical methods/models are usually divided in: Parametric Methods/Models; Non-parametric Methods/Models.

Parametric Methods (Part 1)
o Parametric methods leverage an assumption about the model underlying f: they reduce the problem of estimating f down to the one of estimating a set of parameters. They involve a two-step, model-based approach.
o STEP 1: Make some assumption about the functional form of f, i.e., come up with a model, e.g., a linear model: f(X_i) = β_0 + β_1 X_i1 + ... + β_p X_ip
o STEP 2: Use the training data to fit the model, i.e., estimate f through the unknown parameters β_0, β_1, ..., β_p.

Parametric Methods (Part 2)
o Parametric methods leverage an assumption about the model underlying f: they reduce the problem of estimating f down to the one of estimating a set of parameters. They involve a two-step, model-based approach.
o STEP 1: In this course we will examine far more complicated, and flexible, models for f w.r.t. linear ones. In a sense, the more flexible the model the more realistic it is.
o STEP 2: The most common approach for estimating the parameters in a linear model is Ordinary Least Squares (OLS), but there are often superior approaches.

Example: A Linear Regression Estimate
o Even if the standard deviation is low, we will still get a bad answer if we use the wrong model: f̂ = b_0 + b_1 × Education + b_2 × Seniority
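A sketch of the two-step parametric approach on made-up Income/Education/Seniority data (the "true" coefficients 5, 3 and 0.5 are invented), with STEP 2 done by ordinary least squares via `numpy.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up training data: Income generated from invented coefficients.
n = 200
education = rng.uniform(10, 20, n)
seniority = rng.uniform(0, 40, n)
income = 5.0 + 3.0 * education + 0.5 * seniority + rng.normal(0, 2.0, n)

# STEP 1: assume a linear form f(X) = b0 + b1*Education + b2*Seniority.
# STEP 2: estimate (b0, b1, b2) by ordinary least squares.
X = np.column_stack([np.ones(n), education, seniority])
b, *_ = np.linalg.lstsq(X, income, rcond=None)   # b = (b0, b1, b2)
```

The whole estimation problem has been reduced to finding three numbers, which is exactly the point of the parametric approach.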

Non-parametric Methods
o Sometimes referred to as sample-based or instance-based methods; they do not make explicit assumptions about the functional form of f, they exploit the training data directly.
o Advantages: They accurately fit a wider range of possible shapes of f. They do not require a training phase.
o Disadvantages: A very large number of observations is required to obtain an accurate estimate of f. Higher computational cost at testing time.
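As a sketch of the instance-based idea, here is a tiny k-nearest-neighbours regression estimate that uses the training data directly, with no assumed functional form for f (the data-generating function below is made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up 1-D training set: sine-plus-noise data.
x_train = rng.uniform(0.0, 1.0, 200)
y_train = 0.05 * np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.01, 200)

def knn_predict(x0, k=10):
    """Estimate f(x0) by averaging the k nearest training responses."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

y_hat = knn_predict(0.25)   # under the made-up f, the true value is 0.05
```

Note there is no training phase at all: every prediction re-reads the stored sample, which is exactly the higher testing-time cost the slide mentions.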

Example: A Thin-Plate Spline Estimate
o Smooth thin-plate spline fit: non-parametric regression methods are more flexible, thus they can potentially provide more accurate estimates.

Prediction Accuracy vs Model Interpretability
o Why not just use a more flexible method if it is more realistic?
Reason 1: A simple method such as linear regression produces a model which is much easier to interpret (the inference part is better). E.g., in a linear model, β_j is the average increase in Y for a one-unit increase in X_j, holding all other variables constant.
Reason 2: Even if you are only interested in prediction, it is often possible to get more accurate predictions with a simple model instead of a complicated one. This seems counter-intuitive, but it has to do with the fact that it is harder to fit a more flexible model properly.

A Poor Estimate
o Non-parametric regression methods can also be too flexible and produce poor estimates for f, e.g., a thin-plate spline fit with zero training error.

Flexibility vs Model Interpretability

Supervised vs. Unsupervised Learning (Part 1)
o Machine Learning usually makes a clear distinction between Supervised Models and Unsupervised Models.
o Supervised Learning: both the predictors, X_i, and the response, Y_i, are observed.

Supervised vs. Unsupervised Learning (Part 2)
o Machine Learning usually makes a clear distinction between Supervised Models and Unsupervised Models.
o Unsupervised Learning: only the X_i's are observed; we use them to build a high-level representation, possibly for modeling some Y.

Regression vs. Classification
o Supervised learning problems can be further divided into:
Regression problems, covering situations where Y is continuous/numerical. E.g., predicting the value of the Dow in 6 months; predicting the value of a given house based on various inputs.
Classification problems, covering situations where Y is categorical. E.g., will the Dow be up (U) or down (D) in 6 months? Is this email SPAM or not?

A Simple Clustering Example
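This slide's clustering example can be sketched with a bare-bones k-means loop on two made-up groups of unlabeled points (k-means is one common choice of clustering algorithm, not necessarily the one on the slide):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two made-up groups of unlabeled 2-D points (no Y is observed).
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(3.0, 0.5, (50, 2))])

# Bare-bones k-means with k = 2: alternate assignment and update steps.
centers = X[rng.choice(len(X), size=2, replace=False)]
for _ in range(10):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                 # nearest-center assignment
    centers = np.array([X[labels == j].mean(axis=0) for j in range(2)])
```

Only the X_i's are used: the algorithm discovers the two groups as structure in the inputs, which is exactly the unsupervised setting of the previous slides.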

What about higher dimensions?

Wrap up!
o What Is Statistical Learning? Why estimate f? How do we estimate f? The trade-off between prediction accuracy and model interpretability.
o Some important taxonomies (I expect you'll know these by heart!): Prediction vs. Inference; Parametric vs. Non-Parametric models; Regression vs. Classification problems; Supervised vs. Unsupervised learning.