Learning from data when all models are wrong
Peter Grünwald (CWI / Leiden)

Menu
Two Pictures
1. Introduction
2. Learning when Models are Seriously Wrong
3. Learning when Models are Almost True
(Joint work with John Langford, Tim van Erven, Steven de Rooij, Wouter Koolen, Wojciech Kotlowski.)

Wrong yet useful models. Example 1: Regression (Curve Fitting)
Scientists routinely use models that are wrong (nonlinear relations modeled as linear, dependent variables modeled as independent), yet useful: they lead to reasonable predictions. Examples: speech and digit recognition, DNA sequence analysis, ...
[Regression figures: polynomial curves fit to training data; a too-flexible fit shows overfitting!]
Standard Setting
Training data (X_1, Y_1), ..., (X_n, Y_n). Usual assumption: independent, identically distributed (i.i.d.) according to a true distribution P*. A model (or model class) M is a set of distributions.

Polynomial Regression Example
The model class M consists of parametric sub-models M_1, M_2, ...; each element is a conditional distribution of Y given X, expressing, e.g., Y = (degree-k polynomial in X) + noise.

Learning: based on the training data, hypothesize an estimate of P*. A learning algorithm (estimator) is a function from data samples of arbitrary length to distributions in M. Example learning algorithms: (penalized) least squares, (penalized) maximum likelihood.
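As a concrete illustration of the setting above, here is a minimal sketch (the data-generating process and degree choices are hypothetical) of unpenalized least-squares polynomial fitting. The truth is a square root, so every polynomial model is "wrong"; because the polynomial model spaces are nested, training error can only decrease with degree, which is exactly why the training fit alone cannot detect overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the true regression function is a square root, so every
# polynomial model is "wrong", yet low-degree fits remain useful.
n = 30
x = rng.uniform(0.0, 1.0, n)
y = np.sqrt(x) + rng.normal(0.0, 0.1, n)

def training_mse(degree):
    """Mean squared error on the training data of a least-squares polynomial fit."""
    coef = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coef, x) - y) ** 2))

# Nested model spaces: training error can only decrease with degree.
errors = {d: training_mse(d) for d in (1, 3, 8)}
assert errors[8] <= errors[3] <= errors[1]
```

The degree-8 fit has the smallest training error yet wiggles badly between the data points; that is the overfitting the slide warns about.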
Prediction
Based on the training data (X_1, Y_1), ..., (X_n, Y_n) and a new X_{n+1}, predict Y_{n+1}.

Four Situations
1. When the Model is True: P* ∈ M; e.g. M is the set of all polynomials with Gaussian noise and P* is a 3rd-degree polynomial with Gaussian noise.
2. When the Model is Almost True: P* ∉ M, but inf_{P ∈ M} d(P*, P) = 0; e.g. M is the set of all polynomials and the regression function of P* is a square root.
3. When the Model is Wrong: P* ∉ M and inf_{P ∈ M} d(P*, P) > 0; e.g. M is the set of all polynomials with Gaussian noise, but the real noise is not Gaussian.
4. When there is No Truth!
4 Situations, 3 Goals
The three goals are estimation, model selection, and prediction; the four situations are: model true, model almost true, model wrong (a truth exists), and no truth.
- Model true: Bayes is usually o.k., both for estimation/model selection and for prediction.
- Model almost true: Bayes factor model selection is suboptimal; Bayes factor model averaging is suboptimal.
- Model wrong, truth exists: Bayes can be inconsistent! For prediction, Bayes is not ok in general unless the model is convex.
- No truth (universal prediction): Bayes is ok for log-score, but not for other scores.

Theme 1: Bayes!
Theme 2: when the model is wrong, the distance measure becomes much more important.
Theme 3: most existing results are from the theoretical machine learning community (NIPS, COLT).

Statistical Consistency
Let P̂ : ∪_{n ≥ 1} (X × Y)^n → M be an estimator/learner. P̂ is called consistent (relative to d) if for all P* ∈ M, with P*-probability 1, d(P*, P̂(X^n, Y^n)) → 0.
[Picture: the truth P* lies inside the model M.]
Consistency when Model is Wrong
[Pictures: the truth P* lies outside the model M; the estimator should now converge to the best approximation within M.] The distance measure d becomes more important when the model is wrong.

Bayesian Inconsistency when Model is Wrong (Grünwald and Langford, Machine Learning 66(2-3), 2007)
There exist
- a countable set of distributions M = {P_0, P_1, P_2, ...} (a logistic regression setting),
- a true distribution P* and a best approximation P̃ = P_0, with P̃ = arg min_{P ∈ M} d(P*, P) and d(P*, P̃) = ε > 0,
- a prior distribution W on M with W(P̃) = 1/2,
...such that, P*-almost surely, the standard Bayesian estimators never converge to P̃. Here d = KL divergence (so this is real bad).
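To make the "best approximation" concrete, here is a small numerical sketch (the distributions are invented for illustration): a finite model class M over three outcomes, the KL divergence as d, and the projection P̃ = arg min_{P ∈ M} d(P*, P) with ε > 0 because the truth lies outside M.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical misspecified setting: the truth P* is not in the model class M.
p_star = [0.5, 0.3, 0.2]                                      # true distribution P*
model = [[1/3, 1/3, 1/3], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2]]   # model class M

# Best approximation P~ = argmin over M of KL(P* || P); epsilon = min distance.
dists = [kl(p_star, p) for p in model]
best = model[int(np.argmin(dists))]
epsilon = min(dists)
assert epsilon > 0   # the model is wrong: no P in M matches P* exactly
```

A consistent estimator in this misspecified setting should converge to `best`; the inconsistency theorem above says that standard Bayes can fail to do so even in this KL sense.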
Bayesian inference is based on the posterior W(P | X^n, Y^n):
- prediction: average over the posterior;
- estimation: output a subset of M with large posterior probability.

The Geometry of Inconsistency
For all K > 0, W(P : d(P*, P) ≤ K | X^n, Y^n) → 0, P*-a.s. Note that each individual P's posterior weight W(P | X^n, Y^n) does tend to 0, eventually, a.s.
[Pictures: the Bayes predictive distribution relative to P*, P̃, and the model M.]

Essential points:
- It is essential that we take a nonparametric model (or model selection with an infinite number of models); if the model is parametric, then we get consistency, but we need too much data.
- Prediction is o.k. for log loss (which relates to KL divergence, shown in the picture), but not for other loss functions.
The Role of Convexity
The Bayes predictive distribution is a mixture of P ∈ M. Indeed, Bayes is consistent (with d = KL divergence, countable M, P* ∉ M) if the model is convex, i.e. closed under taking mixtures.
Bayes implicitly punishes complex sub-models. To get consistency when the model is wrong, we need to punish them significantly more: if the prior W(P) is changed to W(P)^n, then we do get consistency even for nonconvex models.

Menu
Two Pictures
1. Introduction
2. Learning when Models are Seriously Wrong
3. Model Selection when Models are Almost True

Model Selection Methods
Suppose we observe data y^n = y_1, ..., y_n. We want to know which model in our list of candidate models M_1, M_2, ... best explains the data; θ̂_k is the maximum likelihood estimator (best fit) within model M_k. A model selection method is a function k̂ mapping data sequences of arbitrary length to model indices; k̂(y^n) is the model chosen for data y^n.

The AIC-BIC Dilemma
Two main types of model selection methods:
1. AIC-type: the Akaike Information Criterion (AIC, 1973) picks the k̂(y^n) minimizing −log p_{θ̂_k}(y^n) + k. Related methods: leave-one-out cross-validation, DIC, C_p.
2. BIC-type: the Bayesian Information Criterion (BIC, 1978) picks the k̂(y^n) minimizing −log p_{θ̂_k}(y^n) + (k/2) log n. Related methods: prequential validation, Bayes factor model selection, standard Minimum Description Length (MDL).
AIC-type methods are inconsistent (asymptotic overfitting); BIC-type methods are consistent.
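The two criteria can be compared directly in code. This is a hypothetical sketch (the data, noise level, and degree grid are invented): each polynomial degree is scored by −log p_{θ̂_k}(y^n) plus the AIC penalty k or the BIC penalty (k/2) log n. Since BIC's per-parameter penalty (log(n)/2, about 2.3 here) exceeds AIC's (1) whenever n > e², BIC never selects a more complex model than AIC.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: data from a degree-1 polynomial with Gaussian noise;
# candidate models are polynomials of degree 0..5 with Gaussian noise.
n = 100
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

def neg_log_lik(degree):
    """-log p_{theta_hat_k}(y^n): Gaussian log likelihood at the ML fit."""
    resid = y - np.polyval(np.polyfit(x, y, degree), x)
    sigma2 = float(np.mean(resid ** 2))               # ML noise variance
    return 0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def aic(degree):
    k = degree + 2        # free parameters: coefficients plus noise variance
    return neg_log_lik(degree) + k

def bic(degree):
    k = degree + 2
    return neg_log_lik(degree) + 0.5 * k * np.log(n)

best_aic = min(range(6), key=aic)
best_bic = min(range(6), key=bic)
# BIC penalizes each parameter more heavily, so it never picks a larger model.
assert best_bic <= best_aic
```

With this much data and a strong degree-1 signal, both criteria typically agree; the dilemma shows up in their asymptotics, not in any single sample.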
AIC-type methods are inconsistent but achieve the optimal rate; BIC-type methods are consistent but converge at a slower rate (asymptotic underfitting).

The Best of Both Worlds
We give a novel analysis of the slower convergence rate of BIC-type methods: the catch-up phenomenon. This allows us to define a model selection/averaging method that, in a wide variety of circumstances,
1. is provably consistent, and
2. provably achieves the optimal convergence rate,
even though it had been suggested that this is impossible (Yang 2005, Forster 2001, Sober 2004). For many model classes, the method is computationally feasible.

Sub-Menu
1. Bayes Factor Model Selection: predictive interpretation
2. The Catch-Up Phenomenon, as exhibited by the Bayes factor method
3. Conclusion

Bayes Factor Model Selection
W(k) is a prior on model indices; the w_k are priors on the parameters within each model. Bayes factor model selection picks the k maximizing the a posteriori probability W(k | y^n), i.e. the k minimizing −log W(k) − log p̄_k(y^n).
[Plots: Bayes factor model selection between a 1st-order and a 2nd-order Markov model for the text of The Picture of Dorian Gray; beyond some sample size, the complex model is selected.]

The Catch-Up Phenomenon
Suppose we select between a simple model M_1 and a complex model M_2. Common phenomenon: for some sample size n_0, the simple model predicts better if n < n_0 and the complex model predicts better if n > n_0. This seems to be the very reason why it makes sense to prefer a simple model even if the complex one is true. We would expect the Bayes factor method to switch at about n_0... but is this really where Bayes switches!?

Bayesian Prediction and Logarithmic Loss
Given model M_k, the Bayesian marginal likelihood is p̄_k(y^n) = ∫ p_θ(y^n) w_k(θ) dθ. If we measure prediction quality by log loss and predict each y_{i+1} by the predictive distribution p̄_k(y_{i+1} | y^i), then the accumulated loss satisfies
sum_{i=0}^{n−1} −log p̄_k(y_{i+1} | y^i) = −log p̄_k(y^n),
so that accumulated log loss = minus log likelihood.
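The identity "accumulated log loss = minus log likelihood" is just the chain rule of probability. A minimal numerical check, assuming a Bernoulli model with a uniform Beta(1,1) prior (so the predictive distribution is Laplace's rule of succession, and the marginal likelihood has a closed form):

```python
import math

# Toy binary data; the identity holds for any sequence.
ys = [1, 0, 1, 1, 0, 1]

# Accumulated log loss of sequential prediction with the Bayes predictive.
loss = 0.0
ones = 0
for i, y in enumerate(ys):
    p_one = (ones + 1) / (i + 2)      # Laplace's rule: P(y_{i+1}=1 | y^i)
    loss += -math.log(p_one if y == 1 else 1 - p_one)
    ones += y

# Marginal likelihood under the Beta(1,1) prior: p(y^n) = k! (n-k)! / (n+1)!
n, k = len(ys), sum(ys)
marginal = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)
assert abs(loss - (-math.log(marginal))) < 1e-12
```

The telescoping product of predictive probabilities reproduces the marginal likelihood exactly, which is what makes the prequential reading of Bayes factors possible.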
The Most Important Slide
Bayes picks the k minimizing −log W(k) − log p̄_k(y^n). Prequential interpretation of Bayes model selection: select the model that, when used as a sequential prediction strategy, minimizes accumulated sequential prediction error (Dawid 1984, Rissanen 1984).

The Catch-Up Phenomenon, Illustrated
[Plot.] The green curve depicts the difference in accumulated prediction error between predicting with M_1 and predicting with M_2. M_2 starts to outperform M_1 at n_0, so we would like to switch to M_2 at n_0, but switching only happens at a later sample size. Bayes exhibits inertia: the complex model has to catch up, so we prefer the simpler model for a while even after n_0.
Model averaging does not help! Can we modify Bayes so as to do as well as the black curve? Almost!

The Switch Distribution
Suppose we switch from M_1 to M_2 at sample size s. Our total prediction error is then
sum_{i < s} −log p̄_1(y_{i+1} | y^i) + sum_{i ≥ s} −log p̄_2(y_{i+1} | y^i).
If we define q_s as the strategy that predicts with M_1 before sample size s and with M_2 from s onwards, then the total prediction error is −log q_s(y^n); q_s may be viewed both as a prediction strategy and as a distribution over infinite sequences.

We want to predict using some distribution q such that, no matter what data are observed, −log q(y^n) is close to −log q_ŝ(y^n) for all y^n, where ŝ maximizes q_s(y^n) over s. We achieve this by treating s as a parameter, putting a prior on it, and then integrating s out (adopting a "Bayesian solution to a Bayesian problem").
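A brute-force sketch of this construction (the two fixed Bernoulli predictors are made-up stand-ins for the models' predictive distributions): mixing the switch strategies q_s under a uniform prior on s guarantees that −log q exceeds the best −log q_s by at most log of the number of switch points, whatever the data turn out to be.

```python
import math

# Hypothetical stand-ins for the two models' predictive distributions.
ys = [0, 0, 1, 0, 1, 1, 1, 1]
p1 = 0.3                       # "simple" model's predicted P(y = 1)
p2 = 0.8                       # "complex" model's predicted P(y = 1)

def loss_switch(s):
    """Accumulated log loss -log q_s(y^n) when switching from p1 to p2 at time s."""
    total = 0.0
    for i, y in enumerate(ys):
        p = p1 if i < s else p2
        total += -math.log(p if y == 1 else 1 - p)
    return total

# Treat the switch point s as a parameter with a uniform prior; integrate it out.
S = len(ys) + 1                # possible switch points 0..n
log_q = math.log(sum(math.exp(-loss_switch(s)) for s in range(S)) / S)
best = min(loss_switch(s) for s in range(S))

# Whatever the data, the mixture is within log(S) of the best switch point.
assert -log_q <= best + math.log(S) + 1e-9
```

The log(S) overhead is the price of not knowing ŝ in advance; it is negligible next to the linear-in-n loss the wrong model would accumulate.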
Switch Distribution (very briefly)
The switch distribution is obtained by putting a (cleverly constructed) prior on sequences of models rather than on individual models: there is prior mass on, e.g., the meta-hypothesis that M_1 is best at sample sizes 1, 2, 6, 7 and M_2 is best at sample sizes 3, 4, 5, 8, 9, ... The construction is readily extended to more than 2 models. We then apply Bayes using this new prior; prediction and model selection can be done efficiently by dynamic programming.

Results
Theorem 1: In all situations in which Bayes factor model selection is consistent, model selection based on the switching prior is consistent (like BIC).
Theorem 2: In many "models almost true" situations, prediction based on the switching prior achieves the optimal rate of convergence (like AIC).
Practical experiments confirm this. The method is formally Bayes, but the Bayesian interpretation is very tenuous: it's prequential!

Discussion
Model(s) almost true: standard Bayes is too slow. We can interpret the switch distribution as a repair of Bayes, but also as a frequentist hypothesis test to check whether the Bayesian model and prior assumptions are justified, and to step out of the model if necessary.
Model(s) seriously wrong: standard Bayes can be inconsistent unless the model is convex.
In recent work we provide a frequentist empirical convexity test (does the convex closure of the model fit the data better?) to check whether the Bayesian model and prior assumptions are justified, and to step out of the model if necessary.
We are effectively combining Bayes + Popper + (..new principle..).
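As a sketch of why prediction with a prior on model sequences stays computationally feasible: the sum over all model sequences collapses into a per-step recursion over the current model, as in the forward algorithm of a hidden Markov model. Everything here is hypothetical (the fixed predictors and the switch rate alpha), and this fixed-share-style recursion is a simplified stand-in for the actual switch-distribution prior, not the construction from the talk.

```python
import math

# Hypothetical stand-ins: two fixed Bernoulli predictors in place of the
# models' predictive distributions; alpha is the per-step prior switch rate.
ys = [0, 0, 1, 0, 1, 1, 1, 1]
p_one = [0.3, 0.8]            # each model's predicted P(y = 1)
alpha = 0.05

w = [0.5, 0.5]                # prior weight on "model k is best right now"
neg_log_q = 0.0
for y in ys:
    probs = [p if y == 1 else 1 - p for p in p_one]
    step = w[0] * probs[0] + w[1] * probs[1]                 # predictive prob of y
    neg_log_q += -math.log(step)
    post = [w[0] * probs[0] / step, w[1] * probs[1] / step]  # Bayes update
    w = [(1 - alpha) * post[0] + alpha * post[1],            # allow a switch
         (1 - alpha) * post[1] + alpha * post[0]]

# The recursion equals full Bayes over model sequences (an HMM forward pass),
# so q is within log 2 + (n-1) * (-log(1-alpha)) of the best single model.
n = len(ys)
loss = [sum(-math.log(p if y == 1 else 1 - p) for y in ys) for p in p_one]
bound = min(loss) + math.log(2) - (n - 1) * math.log(1 - alpha)
assert neg_log_q <= bound + 1e-9
```

The cost is O(n × number of models) rather than exponential in n, which is the point of the dynamic-programming claim above.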
Conclusion
We are effectively combining Bayes + Popper + (..new principle..). Bayes + Popper has been proposed before, on an informal level, by eminent Bayesians and others:
- A.P. Dawid (1982): the well-calibrated Bayesian
- A. Gelman (various papers, blog)
The new principle: our ongoing work!

Literature:
- Grünwald and Langford, Machine Learning 66(2-3), 2007.
- Van Erven, Grünwald and De Rooij, NIPS.
- Van Erven's PhD thesis (2010).
(Notice: machine learning is important!)
More informationA Race Model of Perceptual Forced Choice Reaction Time
A Race Model of Perceptual Forced Choice Reaction Time David E. Huber (dhuber@psych.colorado.edu) Department of Psychology, 1147 Biology/Psychology Building College Park, MD 2742 USA Denis Cousineau (Denis.Cousineau@UMontreal.CA)
More informationCognitive Modeling. Lecture 12: Bayesian Inference. Sharon Goldwater. School of Informatics University of Edinburgh
Cognitive Modeling Lecture 12: Bayesian Inference Sharon Goldwater School of Informatics University of Edinburgh sgwater@inf.ed.ac.uk February 18, 20 Sharon Goldwater Cognitive Modeling 1 1 Prediction
More informationA Race Model of Perceptual Forced Choice Reaction Time
A Race Model of Perceptual Forced Choice Reaction Time David E. Huber (dhuber@psyc.umd.edu) Department of Psychology, 1147 Biology/Psychology Building College Park, MD 2742 USA Denis Cousineau (Denis.Cousineau@UMontreal.CA)
More informationBayesian Networks Representation. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University
Bayesian Networks Representation Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University March 16 th, 2005 Handwriting recognition Character recognition, e.g., kernel SVMs r r r a c r rr
More informationResearch Questions, Variables, and Hypotheses: Part 2. Review. Hypotheses RCS /7/04. What are research questions? What are variables?
Research Questions, Variables, and Hypotheses: Part 2 RCS 6740 6/7/04 1 Review What are research questions? What are variables? Definition Function Measurement Scale 2 Hypotheses OK, now that we know how
More informationAdvanced Bayesian Models for the Social Sciences. TA: Elizabeth Menninga (University of North Carolina, Chapel Hill)
Advanced Bayesian Models for the Social Sciences Instructors: Week 1&2: Skyler J. Cranmer Department of Political Science University of North Carolina, Chapel Hill skyler@unc.edu Week 3&4: Daniel Stegmueller
More informationAClass: A Simple, Online Probabilistic Classifier. Vikash K. Mansinghka Computational Cognitive Science Group MIT BCS/CSAIL
AClass: A Simple, Online Probabilistic Classifier Vikash K. Mansinghka Computational Cognitive Science Group MIT BCS/CSAIL AClass: A Simple, Online Probabilistic Classifier or How I learned to stop worrying
More informationComparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation
Comparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation U. Baid 1, S. Talbar 2 and S. Talbar 1 1 Department of E&TC Engineering, Shri Guru Gobind Singhji
More informationOn the diversity principle and local falsifiability
On the diversity principle and local falsifiability Uriel Feige October 22, 2012 1 Introduction This manuscript concerns the methodology of evaluating one particular aspect of TCS (theoretical computer
More informationMark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA
Journal of Statistical Science and Application (014) 85-9 D DAV I D PUBLISHING Practical Aspects for Designing Statistically Optimal Experiments Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis,
More information10CS664: PATTERN RECOGNITION QUESTION BANK
10CS664: PATTERN RECOGNITION QUESTION BANK Assignments would be handed out in class as well as posted on the class blog for the course. Please solve the problems in the exercises of the prescribed text
More informationMarcus Hutter Canberra, ACT, 0200, Australia
Marcus Hutter Canberra, ACT, 0200, Australia http://www.hutter1.net/ Australian National University Abstract The approaches to Artificial Intelligence (AI) in the last century may be labelled as (a) trying
More informationResponse to Comment on Cognitive Science in the field: Does exercising core mathematical concepts improve school readiness?
Response to Comment on Cognitive Science in the field: Does exercising core mathematical concepts improve school readiness? Authors: Moira R. Dillon 1 *, Rachael Meager 2, Joshua T. Dean 3, Harini Kannan
More informationOscillatory Neural Network for Image Segmentation with Biased Competition for Attention
Oscillatory Neural Network for Image Segmentation with Biased Competition for Attention Tapani Raiko and Harri Valpola School of Science and Technology Aalto University (formerly Helsinki University of
More informationPredicting the Truth: Overcoming Problems with Poppers Verisimilitude Through Model Selection Theory
Acta Cogitata Volume 4 Article 5 : Overcoming Problems with Poppers Verisimilitude Through Model Selection Theory K. Raleigh Hanson Washington State University Follow this and additional works at: http://commons.emich.edu/ac
More informationCombining Risks from Several Tumors Using Markov Chain Monte Carlo
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln U.S. Environmental Protection Agency Papers U.S. Environmental Protection Agency 2009 Combining Risks from Several Tumors
More informationModel reconnaissance: discretization, naive Bayes and maximum-entropy. Sanne de Roever/ spdrnl
Model reconnaissance: discretization, naive Bayes and maximum-entropy Sanne de Roever/ spdrnl December, 2013 Description of the dataset There are two datasets: a training and a test dataset of respectively
More informationAdvanced Bayesian Models for the Social Sciences
Advanced Bayesian Models for the Social Sciences Jeff Harden Department of Political Science, University of Colorado Boulder jeffrey.harden@colorado.edu Daniel Stegmueller Department of Government, University
More informationAdaptive EAP Estimation of Ability
Adaptive EAP Estimation of Ability in a Microcomputer Environment R. Darrell Bock University of Chicago Robert J. Mislevy National Opinion Research Center Expected a posteriori (EAP) estimation of ability,
More informationSelection and Combination of Markers for Prediction
Selection and Combination of Markers for Prediction NACC Data and Methods Meeting September, 2010 Baojiang Chen, PhD Sarah Monsell, MS Xiao-Hua Andrew Zhou, PhD Overview 1. Research motivation 2. Describe
More informationSCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA. Sistemi di Elaborazione dell Informazione. Introduzione. Ruggero Donida Labati
SCUOLA DI SPECIALIZZAZIONE IN FISICA MEDICA Sistemi di Elaborazione dell Informazione Introduzione Ruggero Donida Labati Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy http://homes.di.unimi.it/donida
More informationBayesian Analysis by Simulation
408 Resampling: The New Statistics CHAPTER 25 Bayesian Analysis by Simulation Simple Decision Problems Fundamental Problems In Statistical Practice Problems Based On Normal And Other Distributions Conclusion
More informationUnit 1 Exploring and Understanding Data
Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile
More informationPsychology Research Process
Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:
More informationWhat is a probability? Two schools in statistics: frequentists and Bayesians.
Faculty of Life Sciences Frequentist and Bayesian statistics Claus Ekstrøm E-mail: ekstrom@life.ku.dk Outline 1 Frequentists and Bayesians What is a probability? Interpretation of results / inference 2
More informationTechnical Specifications
Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically
More informationRobust learning of inhomogeneous PMMs
Ralf Eggeling Teemu Roos Petri Myllymäki Ivo Grosse Helsinki Insitute for Helsinki Insitute for Information Technology Information Technology Finland Finland Martin Luther University Halle-Wittenberg Germany
More informationModelling heterogeneity variances in multiple treatment comparison meta-analysis Are informative priors the better solution?
Thorlund et al. BMC Medical Research Methodology 2013, 13:2 RESEARCH ARTICLE Open Access Modelling heterogeneity variances in multiple treatment comparison meta-analysis Are informative priors the better
More informationThe Statistical Analysis of Failure Time Data
The Statistical Analysis of Failure Time Data Second Edition JOHN D. KALBFLEISCH ROSS L. PRENTICE iwiley- 'INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents Preface xi 1. Introduction 1 1.1
More informationBREAST CANCER EPIDEMIOLOGY MODEL:
BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham University of Wisconsin 1 University of Wisconsin Breast Cancer
More informationMeasurement Error in Nonlinear Models
Measurement Error in Nonlinear Models R.J. CARROLL Professor of Statistics Texas A&M University, USA D. RUPPERT Professor of Operations Research and Industrial Engineering Cornell University, USA and L.A.
More informationKARUN ADUSUMILLI OFFICE ADDRESS, TELEPHONE & Department of Economics
LONDON SCHOOL OF ECONOMICS & POLITICAL SCIENCE Placement Officer: Professor Wouter Den Haan +44 (0)20 7955 7669 w.denhaan@lse.ac.uk Placement Assistant: Mr John Curtis +44 (0)20 7955 7545 j.curtis@lse.ac.uk
More informationIndividual Differences in Attention During Category Learning
Individual Differences in Attention During Category Learning Michael D. Lee (mdlee@uci.edu) Department of Cognitive Sciences, 35 Social Sciences Plaza A University of California, Irvine, CA 92697-5 USA
More informationGame Theoretic Statistics
Game Theoretic Statistics Glenn Shafer CWI, Amsterdam, December 17, 2018 Game theory as an alternative foundation for probability Using the game theoretic foundation in statistics 1 May 2019 2 ON THE BACK
More informationAn Efficient Method for the Minimum Description Length Evaluation of Deterministic Cognitive Models
An Efficient Method for the Minimum Description Length Evaluation of Deterministic Cognitive Models Michael D. Lee (michael.lee@adelaide.edu.au) Department of Psychology, University of Adelaide South Australia,
More informationStatistical Decision Theory and Bayesian Analysis
Statistical Decision Theory and Bayesian Analysis Chapter 3: Prior Information and Subjective Probability Lili MOU moull12@sei.pku.edu.cn http://sei.pku.edu.cn/ moull12 11 MayApril 2015 Reference 3, James
More informationEmotion Recognition using a Cauchy Naive Bayes Classifier
Emotion Recognition using a Cauchy Naive Bayes Classifier Abstract Recognizing human facial expression and emotion by computer is an interesting and challenging problem. In this paper we propose a method
More informationBayesian growth mixture models to distinguish hemoglobin value trajectories in blood donors
Bayesian growth mixture models to distinguish hemoglobin value trajectories in blood donors Kazem Nasserinejad 1 Joost van Rosmalen 1 Mireille Baart 2 Katja van den Hurk 2 Dimitris Rizopoulos 1 Emmanuel
More informationDescription of components in tailored testing
Behavior Research Methods & Instrumentation 1977. Vol. 9 (2).153-157 Description of components in tailored testing WAYNE M. PATIENCE University ofmissouri, Columbia, Missouri 65201 The major purpose of
More information