Evaluation: Controlled Experiments



Outline: Evaluation beyond usability tests; Controlled experiments; Other evaluation methods

Evaluation Beyond Usability Tests

Usability Evaluation (last week): expert tests / walkthroughs; usability tests with users. Main goal: formative evaluation, i.e., identify usability problems and improve the tool.

Summative Evaluation (focus today): How good is it? Is it useful? Is it better than other tools?

Formative and Summative: usually combined, with evaluation shifting from formative to summative over time.

Evaluation goals (summative): Generalizability, results can be applied to other people. Precision, we measured what we wanted to measure (controlling factors we did not intend to study). Realism, the study context is realistic. ... there is usually a trade-off between them!

McGrath / Carpendale: The selection of a research method depends on the research question and the object under study!

Controlled Experiments

Controlled experiment, also known as: laboratory experiment, lab study, user study, A/B testing (a term used in marketing).

Focus: precision; generalizability (?). Overall goal: reveal cause-effect relationships, e.g., smoking causes cancer.

Scenario: variants A and B. Which is better?

Test it with users! (Carpendale)

Hypothesis: a precise problem statement. Example: H1 = participants will buy more beer when using variant B than variant A. Null hypothesis: H0 = no difference in beer purchases between A and B.

Independent Variables: the factors to be studied. Typical independent variables (in HCI): different types of design; task type, e.g., searching vs. browsing; participant demographics, e.g., male/female; different technologies, e.g., touch pad vs. keyboard. Control of the independent variable. Levels: the number of conditions of each factor, limited by the length of the study and the number of participants. How different should conditions be? Entire interfaces vs. very specific parts.

Control the Environment: make sure nothing else could cause your effect; control confounding variables; randomize!

Different Designs: Between-Subjects. Divide the participants into groups; each group does one condition (Group 1: A, Group 2: B). Randomize the group assignment. Potential problem?
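Balanced random assignment for a between-subjects design can be sketched in a few lines; the participant IDs, group sizes, and seed below are made up for illustration:

```python
import random

random.seed(7)  # fixed seed only so the example is reproducible

participants = [f"P{i:02d}" for i in range(1, 17)]  # hypothetical IDs
conditions = ["A", "B"]

# Shuffle once, then deal round-robin so the groups stay the same size
# (independent coin flips per participant would not guarantee that).
random.shuffle(participants)
groups = {c: [] for c in conditions}
for i, p in enumerate(participants):
    groups[conditions[i % len(conditions)]].append(p)

for c, members in sorted(groups.items()):
    print(c, members)
```

Shuffling before dealing is what makes the assignment random; the round-robin deal only keeps the design balanced.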

Different Designs: Within-Subjects. Everybody does all the conditions. Can account for individual differences and reduce noise (that's why it may be more powerful and require fewer participants). Severely limits the number of conditions, and even the types of tasks tested (this may be worked around by running multiple sessions). Can lead to ordering effects -> randomize the order.
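Counterbalancing the condition order can be sketched as follows, assuming two hypothetical conditions A and B: every possible order is used equally often across participants, so ordering effects cancel out on average.

```python
from itertools import permutations

conditions = ["A", "B"]
# Full counterbalancing: enumerate every possible condition order.
orders = list(permutations(conditions))  # [('A', 'B'), ('B', 'A')]

participants = [f"P{i}" for i in range(1, 9)]  # hypothetical IDs
# Cycle through the orders so each one is assigned equally often.
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for p, order in schedule.items():
    print(p, "->", " then ".join(order))
```

With more than a handful of conditions, full counterbalancing needs too many participants (k! orders), which is where Latin-square designs or per-participant random orders come in.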

Dependent Variables: the things that you measure. Performance indicators: task completion time, error rates, mouse movement (number of beers bought). Subjective participant feedback: satisfaction ratings, closed-ended questions, interviews, questionnaires (HCI lecture last week). Observations: behaviors, signs of frustration.

Tasks. Specifying good tasks for controlled experiments is tricky, especially if you are measuring performance criteria. Task criteria: comparability across different interfaces; a clear end point. Example, usability test: "buy a book for a 4-year-old"; controlled experiment: "find and buy the book Doctor Faustus by Thomas Mann".

Results: Application of Statistics. Descriptive statistics: describe the data you gathered (e.g., visually). Inferential statistics: make predictions/inferences from your study about the larger population.

Descriptive statistics. Central tendency: mean {1, 2, 4, 5} = 3; median {15, 19, 22, 29, 33, 45, 50} = 29; mode {12, 15, 22, 22, 22, 34, 34} = 22. Measures of spread: range; variance = sum of squared deviations from the mean, divided by N; standard deviation = square root of the variance. Note: for inferential statistics the divisor N in the standard deviation becomes (N - 1), giving an estimate for the sampled population.
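The central-tendency examples above can be checked with Python's standard `statistics` module, which also makes the N vs. (N - 1) distinction for spread explicit:

```python
import statistics

# Central tendency, using the example sets from the slide.
print(statistics.mean([1, 2, 4, 5]))                    # 3
print(statistics.median([15, 19, 22, 29, 33, 45, 50]))  # 29
print(statistics.mode([12, 15, 22, 22, 22, 34, 34]))    # 22

# Spread: the population variance divides by N; the sample
# ("inferential") version divides by N - 1 to estimate the
# sampled population.
times = [12, 15, 22, 22, 22, 34, 34]
print(statistics.pvariance(times))  # divides by N
print(statistics.variance(times))   # divides by N - 1
print(statistics.stdev(times))      # sample standard deviation (N - 1)
```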

Visualization of descriptive statistics, e.g., a boxplot: mean, 25%/75% quartiles, min/max (alternatively: whiskers with outliers drawn separately).

Inferential statistics. Goal: generalize findings to the larger population. http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci

Excursus: the tragedy of the error bars. CI = confidence interval; SE = standard error (the SD of the sampling distribution of the sample mean); SD = standard deviation.

Excursus: 95% confidence intervals. USE THEM! Interpretation: we can be 95% confident that the true mean lies within our confidence interval! More intuition about stats: Seeing Theory, http://students.brown.edu/seeing-theory/
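A minimal sketch of computing a 95% confidence interval for a mean with the normal approximation; the task-completion times below are invented for illustration:

```python
import statistics

# Hypothetical task-completion times (seconds) from one condition.
times = [31, 28, 35, 30, 33, 29, 36, 27, 32, 34, 30, 31, 33, 28, 35]

n = len(times)
mean = statistics.mean(times)
se = statistics.stdev(times) / n ** 0.5  # SE = sample SD / sqrt(n)

# Normal approximation: 1.96 * SE on each side. For small n, a
# t critical value (about 2.145 for df = 14) gives a slightly
# wider, more honest interval.
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f} s, 95% CI = [{low:.2f}, {high:.2f}]")
```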

Null Hypothesis Testing. Statistically significant results: p < .05, where .05 bounds the probability of incorrectly rejecting the null hypothesis (a Type I error). Many different tests: t-test, ANOVA, ...
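The Type I error rate can be made concrete with a small simulation: both groups are drawn from the same distribution (so the null hypothesis is true), yet roughly 5% of simulated experiments still cross the significance threshold. All parameters here are illustrative:

```python
import random

random.seed(42)  # fixed seed only for reproducibility

N = 20        # participants per group
RUNS = 2000   # simulated experiments
false_positives = 0

for _ in range(RUNS):
    # Both groups come from the SAME distribution: H0 is true.
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    # Two-sample z statistic; SD is known to be 1, so SE = sqrt(2 / N).
    z = (sum(a) / N - sum(b) / N) / (2 / N) ** 0.5
    if abs(z) > 1.96:  # two-sided threshold for alpha = .05
        false_positives += 1

rate = false_positives / RUNS
print(f"Type I error rate: {rate:.3f}")  # close to .05 by construction
```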

Validity. Errors: Type I, false positives; Type II, false negatives. External validity: can we generalize from the study, e.g., from undergrad students to the larger population? Internal validity: is there a causal relationship? Are there alternative causes?

Internal Validity: storks deliver babies!? R. Matthews, "Storks Deliver Babies (p = 0.008)", Teaching Statistics, vol. 22, issue 2, pages 36-38, 2000. There is a correlation coefficient of r = 0.62 (reasonably high) between stork populations and birth rates, and a statistical test shows that this correlation is in fact significant (p = 0.008). What are the flaws?

Pragmatically: a step-by-step how-to

Experimental Procedure, a typical example: identify the research hypothesis; specify the design of the study; think about statistics *before* you run the study; run a pilot study; recruit participants; run the actual data-collection sessions; analyze the data; report the results.

Run a pilot study: to test the study design, to test the system, to test the study instruments.

Recruit participants. Reflecting the larger population? In the best case, yes, though often it is a pragmatic decision. How many? Depends on the effect size and the study design (the power of the experiment); usually 15+ per group. Note: much higher than for a usability test (~5).
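A rough, simulation-based illustration of why the required sample size depends on effect size (a sketch with made-up parameters, not a substitute for a proper power analysis): power, the chance of detecting a real difference, grows with participants per group.

```python
import random

random.seed(1)  # fixed seed only for reproducibility

def simulated_power(n_per_group, effect=0.8, runs=2000):
    """Estimate the power of a two-sample z-test (known SD = 1)
    for a given true effect size, by simulation."""
    hits = 0
    for _ in range(runs):
        a = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [random.gauss(effect, 1.0) for _ in range(n_per_group)]
        # Mean difference divided by its standard error, sqrt(2 / n).
        z = (sum(b) - sum(a)) / n_per_group / (2 / n_per_group) ** 0.5
        hits += abs(z) > 1.96
    return hits / runs

for n in (5, 15, 30):
    print(f"n = {n:2d} per group -> power ~ {simulated_power(n):.2f}")
```

Even for a fairly large effect (0.8 SDs), 5 participants per group detect it only rarely, which is one reason controlled experiments need far more participants than usability tests.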

Run the actual data-collection process. System and instruments ready? Greet participants. Introduce the purpose of the study and the procedure, or deliberately don't. Don't bias participants: avoid framings like "compare my interface vs. this other interface". Get the participants' consent: ethics! Assign participants to their experimental condition according to the pre-defined randomization method. Introduce the system(s) and/or run training tasks. Participants complete the actual tasks; take measures of the dependent variables. Participants answer the questionnaire (if any). Debriefing session. Payment (if any): money, coupons, chocolate, ...

Report the results: introduction/motivation, study design, results, discussion, conclusions, references/appendix. See, for instance, Saul Greenberg's recommendations: http://pages.cpsc.ucalgary.ca/~saul/hci_topics/assignments/controlled_expt/ass1_reports.html

Other Evaluation Methods

Field Studies. Focus: realism. Reveal "a richer understanding by using a more holistic approach" (Carpendale, 08).

Qualitative Methods. Observation techniques: fly-on-the-wall techniques; interruptions by the observer. Interview techniques: contextual?

Qualitative Methods as an Add-on. Often: controlled experiment + experimenter observations, collecting participants' opinions, think-aloud protocol (be careful!). Helpful for: usability improvement (cf. the HCI lectures of the last weeks); new insights, explanations of unforeseen results, new questions. Can help to confirm results.

Qualitative Methods as Primary. Pre-design studies: rich understanding of a complex domain; problems, challenges, domain language. During- and post-design studies: case studies / field studies. Helpful for: a holistic understanding.

Qualitative Methods as Primary: in-situ observations, participatory observations, laboratory observational studies, contextual interviews, focus groups.

Qualitative Challenges. Sample sizes: can you do intensive studies with many participants? Time? Amount of data produced? Subjectivity: the social relationship with participants? Analyzing the data: grounded theory; open and axial coding.

New Ways of Evaluation: Mechanical Turk (more and more popular); measuring brain activities; ...