Interpreting Confidence Intervals

Similar documents
Using Statistical Intervals to Assess System Performance Best Practice

STA Module 9 Confidence Intervals for One Population Mean

Generalization and Theory-Building in Software Engineering Research

The random variable must be a numeric measure resulting from the outcome of a random experiment.

Slide 1 - Introduction to Statistics Tutorial: An Overview Slide notes

UNIT 4 ALGEBRA II TEMPLATE CREATED BY REGION 1 ESA UNIT 4

Agreement Coefficients and Statistical Inference

Bayesian and Frequentist Approaches

Chapter 23. Inference About Means. Copyright 2010 Pearson Education, Inc.

STATISTICAL INFERENCE 1 Richard A. Johnson Professor Emeritus Department of Statistics University of Wisconsin

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

9. Interpret a Confidence level: "To say that we are 95% confident is shorthand for..

Chapter 8 Estimating with Confidence. Lesson 2: Estimating a Population Proportion

STATISTICS & PROBABILITY

Linking Assessments: Concept and History

A Case Study: Two-sample categorical data

The Institute of Chartered Accountants of Sri Lanka

Module 28 - Estimating a Population Mean (1 of 3)

Practitioner s Guide To Stratified Random Sampling: Part 1

Unit 1 Exploring and Understanding Data

Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball

Chapter 8 Estimating with Confidence. Lesson 2: Estimating a Population Proportion

STA Module 1 Introduction to Statistics and Data

A Brief Introduction to Bayesian Statistics

Chapter 8: Estimating with Confidence

USE AND MISUSE OF MIXED MODEL ANALYSIS VARIANCE IN ECOLOGICAL STUDIES1

Proposal to reduce continuing professional development (CPD) audit sample size from 5% to 2.5% from June 2009 and CPD update

STA Module 1 The Nature of Statistics. Rev.F07 1

STA Rev. F Module 1 The Nature of Statistics. Learning Objectives. Learning Objectives (cont.

Kepler tried to record the paths of planets in the sky, Harvey to measure the flow of blood in the circulatory system, and chemists tried to produce

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Data collection, summarizing data (organization and analysis of data) The drawing of inferences about a population from a sample taken from

Survival Skills for Researchers. Study Design

Implicit Information in Directionality of Verbal Probability Expressions

Instructions for doing two-sample t-test in Excel

Chapter 8: Estimating with Confidence

Elaboration for p. 20:

Chapter 12. The One- Sample

STATS8: Introduction to Biostatistics. Overview. Babak Shahbaba Department of Statistics, UCI

You can t fix by analysis what you bungled by design. Fancy analysis can t fix a poorly designed study.

STA 291 Lecture 4 Jan 26, 2010

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 1.1-1

Statistical Issues in Road Safety Part I: Uncertainty, Variability, Sampling

Welcome to this series focused on sources of bias in epidemiologic studies. In this first module, I will provide a general overview of bias.

6. Unusual and Influential Data

Chapter 5: Field experimental designs in agriculture

Basic Concepts in Research and DATA Analysis

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Methods for Determining Random Sample Size

Chapter 19. Confidence Intervals for Proportions. Copyright 2010 Pearson Education, Inc.

Fixed-Effect Versus Random-Effects Models

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Statistical Decision Theory and Bayesian Analysis

STA Learning Objectives. What is Population Proportion? Module 7 Confidence Intervals for Proportions

STA Module 7 Confidence Intervals for Proportions

AP STATISTICS 2013 SCORING GUIDELINES

SEVERAL aspects of performance are important characteristics

Section 1: Exploring Data

10.1 Estimating with Confidence. Chapter 10 Introduction to Inference

Shiken: JALT Testing & Evaluation SIG Newsletter. 12 (2). April 2008 (p )

Creative Commons Attribution-NonCommercial-Share Alike License

Lecture Outline Biost 517 Applied Biostatistics I

Statistics and Probability

Inference About Magnitudes of Effects

9 research designs likely for PSYC 2100

Math 243 Sections , 6.1 Confidence Intervals for ˆp

Creative Commons Attribution-NonCommercial-Share Alike License

MATH& 146 Lesson 6. Section 1.5 Experiments

Bayesian Analysis by Simulation

Full title: A likelihood-based approach to early stopping in single arm phase II cancer clinical trials

How to interpret results of metaanalysis

Bayesian Confidence Intervals for Means and Variances of Lognormal and Bivariate Lognormal Distributions

Chapter 6: Confidence Intervals

Using historical data for Bayesian sample size determination

Chapter 1 Data Types and Data Collection. Brian Habing Department of Statistics University of South Carolina. Outline

Evaluation Models STUDIES OF DIAGNOSTIC EFFICIENCY

MAY 1, 2001 Prepared by Ernest Valente, Ph.D.

CHAPTER 8 Estimating with Confidence

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Chapter 19. Confidence Intervals for Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

LOCF and MMRM: Thoughts on Comparisons

The Confidence Interval. Finally, we can start making decisions!

Introduction to Statistics

The Effect of Extremes in Small Sample Size on Simple Mixed Models: A Comparison of Level-1 and Level-2 Size

Chapter 1: Data Collection Pearson Prentice Hall. All rights reserved

Paper presentation: Preliminary Guidelines for Empirical Research in Software Engineering (Kitchenham et al. 2002)

Algebra 2 P Experimental Design 11 5 Margin of Error

Math 2a: Lecture 1. Agenda. Course organization. History of the subject. Why is the subject relevant today? First examples

Introduction. Lecture 1. What is Statistics?

January 2, Overview

Introduction. Patrick Breheny. January 10. The meaning of probability The Bayesian approach Preview of MCMC methods

COMMITMENT &SOLUTIONS UNPARALLELED. Assessing Human Visual Inspection for Acceptance Testing: An Attribute Agreement Analysis Case Study

Chapter-2 RESEARCH DESIGN

Chapter 17 Sensitivity Analysis and Model Validation

I. Introduction and Data Collection B. Sampling. 1. Bias. In this section Bias Random Sampling Sampling Error

Introduction to ROC analysis

Data Collection. MATH 130, Elements of Statistics I. J. Robert Buchanan. Fall Department of Mathematics

STP226 Brief Class Notes Instructor: Ela Jackiewicz

Section Introduction

Transcription:

STAT COE-Report-0-014 Interpreting Confidence Intervals Authored by: Jennifer Kensler, PhD Luis A. Cortes 4 December 014 Revised 1 October 018 The goal of the STAT COE is to assist in developing rigorous, defensible test strategies to more effectively quantify and characterize system performance and provide information that reduces risk. This and other COE products are available at www.afit.edu/stat. STAT Center of Excellence 950 Hobson Way Wright-Patterson AFB, OH 45433

STAT COE-Report-0-014 Table of Contents Executive Summary... 1 Introduction... 1 Statistical Inference... 1 An Illustrative Example... DoD Testing... Foundations of Confidence Intervals... Confidence Interval Fundamentals... Sample Size for Confidence Intervals... 5 Calculating a Confidence Interval... 5 Planning... 5 Analyzing the Data... 6 Conclusion... 7 References... 7 Revision 1, 1 Oct 018: Formatting and minor typographical/grammatical edits.

STAT COE-Report-0-014 Executive Summary This best practice introduces confidence intervals, which use observed data to obtain an interval estimate of an unknown population parameter such as a mean or proportion. Confidence intervals are a key tool of inferential statistics and appear in many contexts. While there are many types of confidence intervals, the underlying principles are the same. This best practice provides an overview of confidence intervals and introduces key concepts and terminology. It highlights the importance of understanding and correctly interpreting a confidence interval as well as common errors and misunderstandings. A simple example of calculating a confidence interval for a proportion illustrates the concepts presented in the context of Department of Defense (DoD) testing. Keywords: confidence interval, statistical inference, confidence level, sample size, margin of error Introduction Statistical Inference Often one is interested in drawing conclusions about a population, but examining the entire population is usually impractical or impossible. Statistical inference involves using information obtained from a sample to draw conclusions about a larger population. Statistical inference is key to having rigorous and adequate DoD tests because we are often interested in future performance of a system under similar conditions. Since we do not know what the future holds we are dependent on statistical inference to make statements about future performance. Therefore, the sample must be representative of the population; this objective is usually achieved by obtaining a randomly selected sample. Figure 1 illustrates that a sample is a subset of the population. A confidence interval, one form of statistical inference, uses data observed from a sample to estimate a population parameter. The data one observes will be different depending on which individuals of the population the sample captures. Confidence intervals address this random sampling error (i.e. variation) and allow one to estimate the value of a single parameter or function of parameters. Figure 1. Sample vs. population Page 1

STAT COE-Report-0-014 An Illustrative Example The Navy wants to estimate the probability that a missile will hit its target. The population contains all missiles, and the proportion of missiles in the entire population that hit their target is denoted by π. The proportion of missiles in the sample that hit their target is denoted by p (read p-hat ). While p will be used to estimate π, the Navy knows that p will not be exactly π. Therefore, the Navy wants to use the sample data to come up with an interval estimate likely to contain π. This best practice uses the simple missile example to illustrate the concept of a confidence interval. Usually a continuous response variable (e.g. miss distance from target) is preferable to a binary response variable (e.g. hit or miss). For a discussion on the advantages and disadvantages of continuous and binary response variables, see Ortiz (013). Furthermore, it is often necessary to examine the performance of the missile across the operating environment instead of at a single point. DoD Testing Throughout the acquisition lifecycle many questions must be answered regarding the performance and suitability of a system. A common objective is to summarize the sample data with a mean value and to estimate the real but unknown population mean. What is the mean miles between system failures? What is the mean target location error? What is the maximum range of a weapon? What is the mean detection range of a radar? Confidence intervals are a method for answering these questions. Foundations of Confidence Intervals Confidence Interval Fundamentals Estimation is a key objective of many statistical analyses. A point estimate is a single numerical value used to estimate a population parameter. For example, the sample proportion, p, is a point estimate used to estimate the population proportion, π. A point estimate is our best guess estimate for the population parameter; however, we know that the point estimate will not be exactly the population parameter. A confidence interval provides an estimate of the population parameter and the accompanying confidence level indicates the proportion of intervals that will cover the parameter. In other words, a confidence interval provides a range of values that would contain the true population parameter for a specified confidence level. A 100(1 )% confidence interval is an interval estimate where if we could repeat the process of interval estimation an infinite number of times the intervals would contain the true value of the parameter 100(1 )% of the time. For example, a 95% confidence interval means that in the long run 95% of confidence intervals constructed in this manner will contain the true parameter. However, we cannot know whether the interval estimate we calculated is one of the intervals that contains the true parameter or one of the intervals that does not. Figure shows 90% confidence intervals for 100 samples (each with 0 observations) drawn from a binomial population with π = 0.8. Note that 90 of the intervals cover the population proportion of 0.8. Page

STAT COE-Report-0-014 Figure : 90% confidence intervals Note that the probability refers to the method, not the individual interval. A 90% confidence interval does not mean there is a 90% probability that the parameter is in the interval. The interval either covers the parameter or it does not. So the probability that the parameter is in the interval is either 1 or 0. Consider a telecommunications system that correctly receives 99% of messages, thus there is a 99% probability that a message you send will be received. However, after you send the message there is no longer any randomness. The message was either received or not received. In the case of the confidence interval the data sample is what is random. Hence, once the data is collected and the confidence interval constructed, it no longer makes sense to talk about the probability of a parameter being in the interval. The interval either covers the parameter or it does not. Salsburg (001) explains Note that, to Neyman, the probability associated with the confidence interval was not the probability that we are correct. It was the frequency of correct statements that a statistician who uses his method will make in the long run. It says nothing about how accurate the current estimate is. A confidence interval is valid or accurate if it contains the population parameter. On the other hand, the narrower the confidence interval the more precise it is. Figure 3 illustrates the effect of changing the confidence level to 80%. Compared to the intervals in Figure, the intervals in Figure 3 are more precise (narrower), but less accurate (more intervals fail to cover 0.8). Note that a higher confidence level requires a wider interval in order to cover the parameter more often. Page 3

STAT COE-Report-0-014 Figure 3: 80% confidence intervals Figure 4: 90% confidence intervals Page 4

STAT COE-Report-0-014 Figure 4 shows 90% confidence intervals from samples of size 50. The intervals in Figure 4 are more precise that the intervals in Figure, but have the same level of accuracy. The narrower intervals reflect the increased information from the larger sample. Sample Size for Confidence Intervals One may be interested in estimating a parameter with a confidence interval of a certain precision. That is, we may want to control the width of the confidence interval. In the case of a proportion, quantities that affect the width of the confidence interval include the confidence level, sample size, and the sample proportion. Using the confidence level, desired interval width, and a planning value for the proportion; the necessary sample size can be calculated. Calculating a Confidence Interval Planning The Navy wants to estimate the proportion of missiles that hit their targets. Currently, the information the Navy has suggests that a missile has an 80% probability of hitting the target, this value is used as the planning value for the population proportion. The navy decides to use a 90% confidence level and decides to compare the sample sizes required to obtain various interval widths (Table 1). Table 1: Required sample sizes Interval Half-Width Sample Size 0.05 17 0.10 43 0.15 18 0.0 10 The sample size, n, required to obtain a Wilson score 100(1 )% confidence interval with a halfwidth e (two-sided confidence interval for continuous variables) is: n = z 1 [π 0 (1 π 0 ) e + π 0 (1 π 0 ) 4e π 0 (1 π 0 ) + e ] e where z 1 is the 1 quantile of the standard normal distribution and π 0 is the planning value of the proportion. If no information is known about proportion, then π 0 = 1 should be used. The interval with the half-width of 0.05 is ruled out because the sample size of 17 is too expensive. The interval with a half-width of 0.0 is ruled out because it is too wide to be useful. Given their current budget, the Navy decides to use a sample size of 18 to obtain an interval with a half-width of 0.15. Page 5

STAT COE-Report-0-014 Analyzing the Data The Navy runs the test and observes 13 of 15 missiles hitting the target. Figure 5 shows the statistical output from JMP (Cary, NC). The sample proportion of hits is p = 16 = 0.89. The 90% Wilson score confidence interval for the 18 population proportion of hits is (0.71, 0.96). The Wilson score 100(1 )% confidence interval for a population proportion (two-sided confidence interval for binomial variables) is: p ( n + z 1 n ) + 1 z 1 ( ) ± z n + z 1 1 1 [p (1 p ) ( n + z 1 n ) + 1 z 1 4 ( )] n + z 1 n + z 1 where z 1 is the 1 quantile of the standard normal distribution, p is the sample proportion, and n is the sample size. Figure 5: Statistical output Page 6

STAT COE-Report-0-014 Conclusion Confidence intervals allow us to take information from a sample and use it to form an interval estimate for a population parameter or function of parameters. In DoD testing, confidence intervals are often calculated for almost every performance measure (such as mean time between failures, proportions, etc.) required for the evaluation. Up front planning scopes the sample size needed to obtain an interval of a certain width. This planning helps avoid the situation where not enough data is collected resulting in a confidence interval too wide to be useful. Regardless of the particular interval being calculated, it is important to correctly interpret the confidence interval. References Agresti, Alan. Categorical Data Analysis. nd ed., John Wiley & Sons, 00. Ortiz, Francisco. Categorical Data in a Designed Experiment Part 1: Avoiding Categorical Data. Scientific Test and Analysis Techniques Center of Excellence (STAT COE), 014. Salsburg, D. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. W.H. Freeman and Company, 001. Sullivan, Michael. Statistics: Informed Decisions Using Data. Prentice Hall/Pearson Education, 004. Page 7