Multi-Stage Stratified Sampling for the Design of Large Scale Biometric Systems

Similar documents
AP Statistics Exam Review: Strand 2: Sampling and Experimentation Date:

Unit 1 Exploring and Understanding Data

Dr. Allen Back. Sep. 30, 2016

BIOSTATISTICAL METHODS

Analysis of TB prevalence surveys

SAMPLING AND SCREENING PROBLEMS IN RHEUMATIC HEART DISEASE CASE. FINDING STUDY

Chapter 3. Producing Data

Chapter 1: Exploring Data

Big Data Challenges & Opportunities. L. Miriam Dickinson, PhD

CHL 5225 H Advanced Statistical Methods for Clinical Trials. CHL 5225 H The Language of Clinical Trials

Face Recognition Performance Under Aging

Chapter 14: More Powerful Statistical Methods

Dr. Allen Back. Oct. 7, 2016

INTERNAL VALIDITY, BIAS AND CONFOUNDING

Sampling. (James Madison University) January 9, / 13

Ch. 1 Collecting and Displaying Data

aps/stone U0 d14 review d2 teacher notes 9/14/17 obj: review Opener: I have- who has

1.4 - Linear Regression and MS Excel

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

Section on Survey Research Methods JSM 2009

Identity Verification Using Iris Images: Performance of Human Examiners

Selection and estimation in exploratory subgroup analyses a proposal

IAPT: Regression. Regression analyses

heterogeneity in meta-analysis stats methodologists meeting 3 September 2015

A NEW TRIAL DESIGN FULLY INTEGRATING BIOMARKER INFORMATION FOR THE EVALUATION OF TREATMENT-EFFECT MECHANISMS IN PERSONALISED MEDICINE

PubH 7405: REGRESSION ANALYSIS. Propensity Score

Vocabulary. Bias. Blinding. Block. Cluster sample

Examining Relationships Least-squares regression. Sections 2.3

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Statistical Power Sampling Design and sample Size Determination

Weight Adjustment Methods using Multilevel Propensity Models and Random Forests

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Statistics Mathematics 243

Estimating HIV incidence in the United States from HIV/AIDS surveillance data and biomarker HIV test results

Supplementary Appendix

Issues in Randomization

Pre-Calculus Multiple Choice Questions - Chapter S4

Supplementary Material. other ethnic backgrounds. All but six of the yoked pairs were matched on ethnicity. Results

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

Mediation Analysis With Principal Stratification

Practitioner s Guide To Stratified Random Sampling: Part 1

Things you need to know about the Normal Distribution. How to use your statistical calculator to calculate The mean The SD of a set of data points.

Applied Econometrics for Development: Experiments II

MA 250 Probability and Statistics. Nazar Khan PUCIT Lecture 7

Chapter 1: The Nature of Probability and Statistics

Critical Appraisal Series

Observation Studies, Sampling Designs and Bias

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

Nature Methods: doi: /nmeth.3115

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)

Observational study is a poor way to gauge the effect of an intervention. When looking for cause effect relationships you MUST have an experiment.

AP Stats Review for Midterm

Introduction to Statistics

Math 140 Introductory Statistics

Predicting clinical outcomes in neuroblastoma with genomic data integration

Supplementary Appendix

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

How many speakers? How many tokens?:

Types of Data. Systematic Reviews: Data Synthesis Professor Jodie Dodd 4/12/2014. Acknowledgements: Emily Bain Australasian Cochrane Centre

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

Sample Size, Power and Sampling Methods

CARE Cross-project Collectives Analysis: Technical Appendix


Section 6.1 Sampling. Population each element (or person) from the set of observations that can be made (entire group)

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

Greg Pond Ph.D., P.Stat.

Selection and Combination of Markers for Prediction

OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP

Analysis of gene expression in blood before diagnosis of ovarian cancer

CHAMP: CHecklist for the Appraisal of Moderators and Predictors

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Measuring cancer survival in populations: relative survival vs cancer-specific survival

INTRODUCTION TO STATISTICS SORANA D. BOLBOACĂ

EXERCISE: HOW TO DO POWER CALCULATIONS IN OPTIMAL DESIGN SOFTWARE

Designing Studies of Diagnostic Imaging

etable 3.1: DIABETES Name Objective/Purpose

RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

AP Statistics Practice Test Ch. 3 and Previous

In this second module in the clinical trials series, we will focus on design considerations for Phase III clinical trials. Phase III clinical trials

How to select study subjects using Sampling Technique

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

Chapter 1 Overview. Created by Tom Wegleitner, Centreville, Virginia. Copyright 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley.

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Summary Report: The Effectiveness of Online Ads: A Field Experiment

Problems for Chapter 8: Producing Data: Sampling. STAT Fall 2015.

How to select study subjects using Sampling Technique

Chapter 1 - Sampling and Experimental Design

ExperimentalPhysiology

DATA is derived either through. Self-Report Observation Measurement

Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1. Chapter 1 Chapter 1

Design, Sampling, and Probability

Glossary From Running Randomized Evaluations: A Practical Guide, by Rachel Glennerster and Kudzai Takavarasha

ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study

Subject index. bootstrap...94 National Maternal and Infant Health Study (NMIHS) example

Transcription:

Multi-Stage Stratified Sampling for the Design of Large Scale Biometric Systems Jad Ramadan, Mark Culp, Ken Ryan, Bojan Cukic West Virginia University 1

Problem How to create a set of biometric samples for research? How many subjects to include in a sample? How are subjects chosen? Performance prediction requires adequate population samples too. Convenience sampling introduces strong bias. Alternative sampling methods have cost and practicality implications for data collections. 2

Stratification Benefits Stratification - the process of dividing population into homogeneous, mutually exclusive subgroups. Multi-stage stratified sampling design increases trustworthiness of match rate estimates Lower costs and smaller performance prediction errors. We address the following specific questions: 1. How can a researcher use existing large datasets to generate stratified samples for the purpose of biometric performance prediction? 2. What are practical benefits of stratification? 3

Our Approach The process: Face Images Performance Prediction Grouping Data by Covariate Sample Size Estimation Get Sample Deployed Biometric System Feature Extraction Preprocessing Matching / Decision Making We investigate the Performance Prediction phase. Sample size estimation approach for Rank 1 identification rate estimation. 4

Stratified Sampling in Biometrics Stratified Sampling first partitions the population into L available groups (e.g. males, females). Within each group, a sample is created by taking an independent simple random sample. Goal: Participants within each group are as similar as possible. Individual stratum variances are minimized. What is the criteria for effective grouping? There should be clear differences in match rates between strata. May be algorithm dependent! Strata based on eye color, facial hair or hair color do not exhibit this. In face recognition, age group, ethnicity and gender could be used as strata. 5

Stratified and Simple Random Sampling: Difference Simple Random Sampling takes a sample from a population in a way so that each sample has the same chance of being selected. In stratified random sampling, the population is first separated into non-overlapping strata. A sample is created by simple random sampling from each stratum. Sample size from each strata may differ. 6

Intuition: How tall are NBA players? # Players: 434; Mean height: 79.04in; Variance: 12.9 in 2 How many players must be sampled to estimate the average height to within one inch? Grouping the players by position reduces variance 5.94 in. 2 (guards), 2.32 in. 2 (forwards), 1.85 in. 2 (centers) Simple random sampling: 47 observations. Stratified sampling: 13 (optimally allocated) observations. A stratified sample of 7 guards, 4 forwards, and 2 centers selected from any NBA season will yield an estimate of the mean height from that season, within an inch, 95% of the time. 7

Large Face Data Sets In large data sets, the number of false matches tends to increase. Imposter score correlations close to 0 within each cluster helps reduce the FMR. We investigated imposter score correlations within the strata (e.g. African American females, Caucasian males). Pinellas County Sherriff s Office data set. Most of the subjects are white males. 2.5K each for male/female and black/white demographics. Experiment: FaceVACS 8.6.0, 10,000x10,000 match scores. 8

Genuine/Imposter Score Distributions FaceVACS Similarity Score Distribution FaceVACS Genuine Score Comparison Density Density Genuine Score FaceVACS Imposter Score Comparison Score distributions change with demographic information. Black female similarity scores exhibit a larger variance. Added uncertainty will have a significant impact in matching. Density Similarity Score Imposter Score 9

Cohort Interactions ROC curve for 4 cohort combinations Black/White ROC Curve Comparison GAR GAR FMR Male/Female ROC Curve Comparison If the variation in similarity scores among black females is reduced, And If no imposter score correlation existed, Black females would become more identifiable. GAR FMR FMR 10

Impostor Score Correlation Imposter Scores of 2 Black Females Imposter Scores of 2 White Females Subject 2 Subject 2 Subject 1 The correlation coefficient above is 0.595. Similarity scores between different, unrelated subjects exhibit almost a linear relationship. The matcher has difficulties differentiating between the black females. Subject 1 The correlation coefficient here is -0.006 (no relationship). This is desirable because the matcher is having a much easier time differentiating the individuals in these images. 11

Stratified Sampling, Correlation, Large Samples Stratified sampling assigns a higher weight to cohorts that are seen to cause difficulties in facial recognition Higher Variance Lower Variance Lower Variance Correlations among imposter scores of black females likely due to insufficient training with black female samples [Klare et al.] 12

Match Results Table: Genuine Accept Rates (GAR) at a fixed False Accept Rate of 0.01% Demographic Black Females Black Males White Females White Males Overall GAR 0.8048 0.87 0.8684 0.916 0.853 Grouped by Gender Male Female 94.4% 89.5% Grouped by Ethnicity Black White Hispanic 88.7% 94.4% 95.7% Grouped by Age 18-30 30-50 50-70 91.7% 94.6% 94.4% Large difference in GAR between black females and white males. Face Recognition Performance: Role of Demographic Information [Klare et al.] There seem to be extra interactions with gender and ethnicity that increase differences in match rates. Dynamic face matcher selection. 13

Sample Size Equations B represents a chosen bound. N (N k ) is overall sample (strata) size. p (p k ) is the GAR at FAR 0.01%. Stratified Random Sampling: n = L 4 N p 1 p ( ) k k k k = 1 L 2 2 NB + Np i i pi i= 1 4 (1 ) 2 Simple Random Sampling: n = 4 Np(1 p) 2 ( N 1) B + 4 p(1 p) 14

Data Stratification Required Sample Size (probes) Results using our 4 Cohorts as Strata SRS Stratified Note: The error bound for the plot above ranges from 0.9% to 1%. Table: Allocation of the sample based on Stratification Black Female Black Male White Female White male ~30% ~25% ~25% ~20% Stratified sampling, using the 4 cohorts of interest, now allows for 230 fewer tests to estimate performance within 1%. An added bonus: The next collection may emphasize the sampling of black females, the most troublesome cohort. 15

Results The total sample sizes below were obtained using an error bound of 1% Simple Random Sampling Total Size: 3341 Allocation: 843 black females 834 black males 842 white females 822 white males Estimated GAR of 85.3% at an FMR of 0.01%. Stratified Random Sampling Total Size: 3109 Allocation: 933 black females 777 black males 777 white females 622 white males Estimated GAR of 85.3% at an FMR of 0.01%. Stratified random sampling achieved the same performance using 232 fewer subjects. 16

Data Extrapolation The total sample sizes below were obtained using an error bound of 1%. Differences when predicting a population of one billion. From previous studies, we are assuming a GAR of 85.3% at.01% FMR SRS vs. Stratified Comparison Allocation for Stratified Sampling Required Sample Size 4,544 5,016 Difference: 472 1,136 909 1,363 Bound 17

Data Extrapolation Now, we calculate the necessary sample size using an error bound of 0.01%. How many samples from a population of 1 billion would we need to estimate the GAR at 0.01% FMR to within 0.0001? SRS vs. Stratified Comparison Allocation for Stratified Sampling Required Sample Size 43,430,444 47,760,886 Difference: 4,330,442 Required Sample Size 13,029,133 8,686,089 10,857,611 Bound Bound 18

Sample Size Reduction Required Sample Size Stratified sampling requires around 10% fewer subjects to achieve the same performance estimate, regardless of the chosen error bound. In general, the choice of error bound will not have an impact on the sample size reduction due to stratification. Bound 19

The Effect of Errors in Stratification In a simulation, 10% and 33% of African American population was reclassified as white and vice versa. Simulate the effects of an incorrect classification by an algorithm or experimenter. Results (baseline is the leftmost table): GAR at.01% FMR (no errors) GAR at.01% FMR (10% errors) 33% errors Black White Black White Black White 83.3 % 89.02 % 83.4 % 88.26 % 84.3 % 86.06 % Differences: Black White +0.1 % -0.76% Black White +1.0 % -2.96 % 20

Summary Applied a stratified sampling design to face recognition. Approach offers savings in performance prediction for large systems. Offers guidance for performance prediction from existing collections. Unbiased performance predictions from a stratified sample. Given valid assumptions, performance predictions are accurate The reward comes from the ability to allocate the sample. Investigated the effect of errors in demographic information. The strata seem robust to small strata misclassification. Should be extended to other biometric modalities. The role of matching algorithms. 21