Cognitive Modeling. Empirical Research Methods. Ute Schmid. Kognitive Systeme, Angewandte Informatik, Universität Bamberg

Similar documents
CogSysIII Lecture 4/5: Empirical Evaluation of Software-Systems

CogSysIII Lecture 4/5: Empirical Evaluation of Software-Systems

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Psychology Research Process

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Still important ideas

9 research designs likely for PSYC 2100

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

The following are questions that students had difficulty with on the first three exams.

investigate. educate. inform.

Designing Psychology Experiments: Data Analysis and Presentation

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Still important ideas

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Research Landscape. Qualitative = Constructivist approach. Quantitative = Positivist/post-positivist approach Mixed methods = Pragmatist approach

HPS301 Exam Notes- Contents

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

Business Statistics Probability

Chapter 7. Marketing Experimental Research. Business Research Methods Verónica Rosendo Ríos Enrique Pérez del Campo Marketing Research

EXPERIMENTAL RESEARCH DESIGNS

26:010:557 / 26:620:557 Social Science Research Methods

Experimental Research. Types of Group Comparison Research. Types of Group Comparison Research. Stephen E. Brock, Ph.D.

Designing Psychology Experiments: Data Analysis and Presentation

Georgina Salas. Topics EDCI Intro to Research Dr. A.J. Herrera

Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13

Sample Exam Questions Psychology 3201 Exam 1

Empirical Research Methods for Human-Computer Interaction. I. Scott MacKenzie Steven J. Castellucci

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Evaluation: Controlled Experiments. Title Text

Collecting & Making Sense of

In this chapter we discuss validity issues for quantitative research and for qualitative research.

Understanding and serving users

Experimental Research I. Quiz/Review 7/6/2011

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

Research Approach & Design. Awatif Alam MBBS, Msc (Toronto),ABCM Professor Community Medicine Vice Provost Girls Section

RESEARCH METHODS. Winfred, research methods, ; rv ; rv

Study Guide for the Final Exam

RESEARCH METHODS. Winfred, research methods,

Samples, Sample Size And Sample Error. Research Methodology. How Big Is Big? Estimating Sample Size. Variables. Variables 2/25/2018

CHAPTER NINE DATA ANALYSIS / EVALUATING QUALITY (VALIDITY) OF BETWEEN GROUP EXPERIMENTS

Psychology 205, Revelle, Fall 2014 Research Methods in Psychology Mid-Term. Name:

Measuring the User Experience

Experimental and Quasi-Experimental designs

Analysis A step in the research process that involves describing and then making inferences based on a set of data.

RESEARCH METHODS. A Process of Inquiry. tm HarperCollinsPublishers ANTHONY M. GRAZIANO MICHAEL L RAULIN

Doing Quantitative Research 26E02900, 6 ECTS Lecture 6: Structural Equations Modeling. Olli-Pekka Kauppila Daria Kautto

AS Psychology Curriculum Plan & Scheme of work

CHAPTER VI RESEARCH METHODOLOGY

UNIVERSITY OF THE FREE STATE DEPARTMENT OF COMPUTER SCIENCE AND INFORMATICS CSIS6813 MODULE TEST 2

Experimental Research in HCI. Alma Leora Culén University of Oslo, Department of Informatics, Design

One slide on research question Literature review: structured; holes you will fill in Your research design

PRINCIPLES OF STATISTICS

Designing Experiments... Or how many times and ways can I screw that up?!?

Classical Psychophysical Methods (cont.)

Political Science 15, Winter 2014 Final Review

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Psychology Research Process

HOW STATISTICS IMPACT PHARMACY PRACTICE?

Experimental Design Part II

OBSERVATION METHODS: EXPERIMENTS

Prepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies

Lecture 3. Previous lecture. Learning outcomes of lecture 3. Today. Trustworthiness in Fixed Design Research. Class question - Causality

Measures of Dispersion. Range. Variance. Standard deviation. Measures of Relationship. Range. Variance. Standard deviation.

Experimental Design. Dewayne E Perry ENS C Empirical Studies in Software Engineering Lecture 8

Who? What? What do you want to know? What scope of the product will you evaluate?

Review and Wrap-up! ESP 178 Applied Research Methods Calvin Thigpen 3/14/17 Adapted from presentation by Prof. Susan Handy

Collecting & Making Sense of

Last week: Introduction, types of empirical studies. Also: scribe assignments, project details

Types of questions. You need to know. Short question. Short question. Measurement Scale: Ordinal Scale

Introduction to Statistical Data Analysis I

Previous Example. New. Tradition

Experimental Research

UNIT III: Research Design. In designing a needs assessment the first thing to consider is level for the assessment

Topic #2. A key criterion in evaluating any test, measure, or piece of research is validity.

Formulating Research Questions and Designing Studies. Research Series Session I January 4, 2017

The essential focus of an experiment is to show that variance can be produced in a DV by manipulation of an IV.

Validity and Reliability. PDF Created with deskpdf PDF Writer - Trial ::

Research Methods. for Business. A Skill'Building Approach SEVENTH EDITION. Uma Sekaran. and. Roger Bougie

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Chapter 11. Experimental Design: One-Way Independent Samples Design

POLS 5377 Scope & Method of Political Science. Correlation within SPSS. Key Questions: How to compute and interpret the following measures in SPSS

Evaluation: Scientific Studies. Title Text

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

Controlled Experiments

Topic #6. Quasi-experimental designs are research studies in which participants are selected for different conditions from pre-existing groups.

Lecture 4: Research Approaches

The Logic of Data Analysis Using Statistical Techniques M. E. Swisher, 2016

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

DEPENDENT VARIABLE. TEST UNITS - subjects or entities whose response to the experimental treatment are measured or observed. Dependent Variable

Communication Research Practice Questions

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good

Evaluation of CBT for increasing threat detection performance in X-ray screening

Measures. David Black, Ph.D. Pediatric and Developmental. Introduction to the Principles and Practice of Clinical Research

SINGLE-CASE RESEARCH. Relevant History. Relevant History 1/9/2018

VALIDITY OF QUANTITATIVE RESEARCH

SECTION A. You are advised to spend at least 5 minutes reading the information provided.

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Transcription:

Cognitive Modeling Empirical Research Methods Ute Schmid Kognitive Systeme, Angewandte Informatik, Universität Bamberg last change: 30. Oktober 2013 U. Schmid (CogSys) KogMod-EmpRes 1 / 45

Why and When Empirical Methods (in Computer Science)? Analysis of (heuristic) algorithms, e.g. performance comparisons, if formal evaluation not feasible: is the run time of algorithm A significantly faster than algorithm B on a set of sample problems? User-centred approach to design of interactive systems: design based on real data not on imagination abour users: collecting small sets of data from observations/interviews during the development process; posthoc usability evaluations Psychonic : exploring human problem solving strategies as inspiration for algorithm design Cognitive Modeling: deriving hypotheses from cognitive models and testing them U. Schmid (CogSys) KogMod-EmpRes 2 / 45

Empirical Research Methods Research Methods: soft skills How to generate a operational hypothesis e.g., x 1 < x 2 Number of errors in test group (with help menu) is significantly smaller than number of errors in control group (without help menu) starting point: a verbally described hypothesis (derived from some theory of human information processing) Take care about scale niveau of data, select appropriate test Present results of inference statistics in combination with descriptive charts Knowledge about statistics necessary but: focus on the general procedure of empirical research see: Jürgen Bortz, Lehrbuch der empirischen Sozialforschung U. Schmid (CogSys) KogMod-EmpRes 3 / 45

Kinds of Empirical Studies Case-study vs. random sample Look at a small number of people in detail (typically qualitative data from log-files, other protocolls) Use a representative sample of users (typically in experimental settings) Single shot vs. longitudinal general characteristics vs. learnability questions special problems of longitudinal studies Regression to the mean In natural setting or in laboratory question of external validity Quasi-experimental or experimental U. Schmid (CogSys) KogMod-EmpRes 4 / 45

What is an Experiment? Wilhelm Wundt (1886): Das Experiment besteht in einer Beobachtung, die sich mit der willkürlichen Einwirkung des Beobachers auf die Entstehung und den Verlauf der zu beobachtenden Erscheinung verbindet. (1908) additionally: Wiederholbarkeit and Variierbarkeit Willkürlichkeit: assign a treatment (independent variable) to a probe/subject Wiederholbarkeit: experiment can be performed at another time (another place, with other probes/subjects) leading to the same results (dependent variable) Variierbarkeit: treatment can be assigned deliberatly and systematically U. Schmid (CogSys) KogMod-EmpRes 5 / 45

Example Word superiority effect (Wheeler, 1970) Treatment: Subjects are randomly assigned to one of the following conditions: (a) short presentation of a letter (e.g. d ), (b) presentation of a word (e.g. word ) Afterwards: recognition (a) either d or k, (b) either word or work Dependent variable: recognition errors Result, significantly less errors in condition (b) U. Schmid (CogSys) KogMod-EmpRes 6 / 45

Quasi-Experiment Some characteristics are invariably correlated with the subject (no random assignment of treatment) Examples: gender, intelligence quotient, color of a person Example: Patients in single bed rooms vs. multi-bed hospital rooms Dependent variable: days until discharge Problem: unrecognized factors correlated with the dependet variable (single rooms are more sunny, doctors take more time with patients in single rooms etc.) U. Schmid (CogSys) KogMod-EmpRes 7 / 45

Experiment and Causal Explanation If a treatment can be assigned randomly to subjects, we can assume that all other factors which might influence the outcome of the experiment are distributed randomly over the subjects In experiments, we can test hypotheses of effects (differences between different treatments, typically including a control group without treatment) Hypotheses about correlations do not allow causal explanations, they only allow statements about the kind and intensity of the co-variation of variables! U. Schmid (CogSys) KogMod-EmpRes 8 / 45

Correlation and Causal Explanation Possible causal explanations for a correlation: x y x y x y x z y x y x z y z w U. Schmid (CogSys) KogMod-EmpRes 9 / 45

Internal/External Validity Internal validity: no (not many) alternative explanations for results (effect can be ascribed to treatment) External validity: Generalizability of results from the experimental setting to a larger domain (necessary: representativeness of sample) U. Schmid (CogSys) KogMod-EmpRes 10 / 45

Example: Training Often randomization not possible (school classes, trainees) Therefore: Measurement of baseline (before, O 1 ) and treatment effect (after, O 2 ) U. Schmid (CogSys) KogMod-EmpRes 11 / 45

Factors Jeopardizing Internal Validity History: events occuring between O 1 and O 2 in addition to treatment Maturation: growing older, hungrier, more tired Testing: effect of O 1 on O 2 Instrumentation: changes in calibration of a measuring instrument (e.g. bias of raters) Statistical regression (when groups have been selected on basis of extreme scores) Selection biases (selection of subjects for groups) Experimental mortality: differential loss of subjects U. Schmid (CogSys) KogMod-EmpRes 12 / 45

Factors Jeopardizing External Validity Reactive effect of testing: pretest increases/decreases sensitivity/responsiveness to treatment Reactive effects to the experimental setting Multiple-treatment interference U. Schmid (CogSys) KogMod-EmpRes 13 / 45

Experimental Designs Classical experiment: random assignement of subjects to experimental group(s) (with treatment) and control group (without treatment); high internal validity X O - O One-shot case study: bad internal and external validity X O Pretest-Posttest Control Group Design: high internal validity O X O O - O U. Schmid (CogSys) KogMod-EmpRes 14 / 45

Experimental Designs Solomon Four-Group Design: high internal and good external validity O X O O - O - X O - - O U. Schmid (CogSys) KogMod-EmpRes 15 / 45

Number of Factors One factor: e.g. with/without training hypothesis: effect of treatment on dependent variable Two factors: e.g. gender and training two main effects and one interaction effect d.v. d.v. A1 A2 A1 A2 d.v. d.v. A1 A2 A1 A2 U. Schmid (CogSys) KogMod-EmpRes 16 / 45

Types of Statistics Descriptive Statistics: Means and standard deviations of data (kind of measure depends on scale niveau, e.g. median vs. arithmetic middle) Inference Statistics: Generalize from sample to population with a certain amount of error (e.g. analysis of variance, chi-square,...) Data Aggregation: cluster analysis, multidimensional scaling, principal component analysis U. Schmid (CogSys) KogMod-EmpRes 17 / 45

Presentation of Results Example: RT ms 1750 1500 1250 1000 not neg neg ANOVA F (1, 31) = 17.09, p <.01 U. Schmid (CogSys) KogMod-EmpRes 18 / 45

Testing of Hypotheses about Differences general case: analyses of variance (ANOVA) special case: two independent treatments, t-test Null-Hypothesis (no effect of treatment) vs. alternative hypothesis (effect of treatment) Inference statistics: tests whether the result can occur if it is assumed that the null-hypothesis holds in the population (from which we tested a sample), tests whether differences in the mean of the dependent variable are significant Significance: with an error (typically 5 or 1 percent) we can conclude that the null hypothesis does not hold U. Schmid (CogSys) KogMod-EmpRes 19 / 45

Accept/Reject Hypothesis H 1 : x 1 > x 2 H 0 : x 1 x 2 = 0 U. Schmid (CogSys) KogMod-EmpRes 20 / 45

Two Kinds of Errors In population H 0 H 1 In sample H 0 correct dec. β error H 1 α error correct dec. U. Schmid (CogSys) KogMod-EmpRes 21 / 45

t-test for Independent Samples H 0 : the means of two populations which are normally distributed (Gauss) and with homogenuous variances are not different (µ 1 = µ 2 ) Specific (directed) alternative hypothesis: µ 1 > µ 2 If samples of size n are drawn from a normally-distributed population, the sample means are t-distributed (with n 1 degrees of freedom because n 1 means determine the last mean) t = x 1 x 2 (µ 1 µ 2 ) ˆσ x1 x 2 U. Schmid (CogSys) KogMod-EmpRes 22 / 45

t-test cont. Because null hypothesis µ 1 µ 2 = 0 t = x 1 x 2 ˆσ x1 x 2 Because population variance is unknown, it is estimated from the sample ˆσ x1 x 2 = n1 i=1 (x i1 x 1 ) 2 + n 2 i=1 (x i2 x 1 ) 2 ( 1 + 1 ) (n 1 1) + (n 2 1) n 1 n 2 U. Schmid (CogSys) KogMod-EmpRes 23 / 45

Analysis of Variance More general than t-test For different designs one factor with arbitrary number of groups e.g. factor with 4 groups source of noise: sea and wind, street, construction work, airport multi-factorial with repeated measurement F-distribution: H 0 : no difference in the means, i.e. variance of means is zero H 0 : µ 0 = µ 1 = µ p H 0 : σ µ = 0 H 1 : σ µ = c Degrees of freedom (df) for one factor: p 1 U. Schmid (CogSys) KogMod-EmpRes 24 / 45

Multifactorial Analysis of Variance 5 4 3 2 1 male female E.g., two-factorial: factor A with p steps, factor B with q steps Interaction: A B Example (quasi-experimental): Dependent variable: attitude to a new law for parental leave (rating 5 to +5) Factor A Political orientation (3 groups), Factor B Gender (2 groups) X Interaction: gender influence differs for the different political groups 0 1 G1 G2 G3 X 2 X 3 4 5 U. Schmid (CogSys) KogMod-EmpRes 25 / 45

Chi-Square Tests For categorial data e.g. two independent variables H 0 : An attribute with k values is independent of an attribute with l values Example: Frequency of depression and schizophrenia is distributed differently in different social groups U. Schmid (CogSys) KogMod-EmpRes 26 / 45

Which Kind of Statistical Test Scale niveau of data Distribution of measurements Sample Size U. Schmid (CogSys) KogMod-EmpRes 27 / 45

Scale Niveau Nominal Ordinal: rank order Intervall: determined up to shift of zero and scaling αx + β Rational: determined zero Absolute U. Schmid (CogSys) KogMod-EmpRes 28 / 45

Experimental Ethics Trade off between scientific progress and human dignity (e.g., threshold of pain, learned helplessness,...) Obligation to inform subjects Free decision to participate/quit participation Anonymity of results Experiments in social psychology (e.g. Milgram) vs. in cognitive psychology U. Schmid (CogSys) KogMod-EmpRes 29 / 45

Report of Empirical Studies Highly standardized General structure Introduction Theoretical Background Experiment/Study General Discussion U. Schmid (CogSys) KogMod-EmpRes 30 / 45

Introduction Section name: Introduction Give a short motivation for your study What is the general empirical question? Relevance of the question Need for the study/experiment Advanced organizer for rest of paper U. Schmid (CogSys) KogMod-EmpRes 31 / 45

Theoretical Background Section-name: specific for your study E.g.: Color Perception and Interpretation of Diagrams Describe state of research related to your study Argue why a (further) study is needed to give more precise information Formulate the hypothesis in a general way and explain how you derived this hypothesis U. Schmid (CogSys) KogMod-EmpRes 32 / 45

Experiment Section name: Experiment or Empirical Study Subsections Design (operational hypothesis and resulting design: independent and dependent variables) Method Participants Material Procedure Results Discussion U. Schmid (CogSys) KogMod-EmpRes 33 / 45

Participants The study was conducted in May 2004 with students from Bamberg university. 42 students participated in the study (23 male, 19 female, average age 21.3, sd = 2.4). 2 subjects were excluded from the analysis because of missing values. U. Schmid (CogSys) KogMod-EmpRes 34 / 45

Material Four charts were drawn with XX (see fig. 1 to 4). All charts depicted the same imaginary technical device, called Megafux. Two charts represented temperatur distribution and two charts distribution of frequencies (factor: content type). For each content type, factor representation type was varied such that one chart color coded the distributed value directly on the device and the other chart gave values in a line graph. For each chart, the subjects had to answer one question The temperature/frequence is higher on the upper part than on the lower part of the Megaflux: yes/no. The information was presented in a webbrowser via http. The dependent variables were obtained via keystrokes using php. U. Schmid (CogSys) KogMod-EmpRes 35 / 45

Procedure Subjects were tested individually. One experminental session took about five minutes time. First subjects read a webpage giving general instructions. Afterwards, the question was presented. The subject clicked the OK button if he/she was sure that he/she understood the question. Then a chart appeared. The subjects now clicked either yes or no and the answer together with the time from presentation of the chart until mouse click were stored. U. Schmid (CogSys) KogMod-EmpRes 36 / 45

Results Give descriptive statistics, such as: overall number of correct answers in percent, overall mean and standard deviation of times. Present results for hypothesis in a bar or line graph and give the results of the statistical test If you have controlled some possible extraneous variables, such as position of yes/no button, (gender, age), report their influence U. Schmid (CogSys) KogMod-EmpRes 37 / 45

Discussion Only now relate the findings to your hypothesis Is your hypothesis confirmed by the results? Are the alternative explanations for your result? If your hypothesis could be not confirmed, have you hypotheses about additional factors which have influenced the results? a more specific hypothesis? U. Schmid (CogSys) KogMod-EmpRes 38 / 45

General Discussion Give a short summary of your hypothesis and the main result What are the conclusions of the results (e.g. implication for information presentation) Which follow-up studies/experiments should be conducted? U. Schmid (CogSys) KogMod-EmpRes 39 / 45

User Study Methods Remember: user-centered approach to design Different data collection methods Behavioral data: solution times, errors (high quality) Subjective data: extracted from from interviews, observations, questionnaires U. Schmid (CogSys) KogMod-EmpRes 40 / 45

Interviews Can/should be conducted very early in design (no prototype available) unstructured (if you have really no idea) vs. structured Points to be covered Explain the purpose of the interview Enumerate activities to be supported e.g., form general questions to more specific questions Explore work methods (hardest part, may be difficult to understand for an outsider) Tracing interconnections uncover performance issues U. Schmid (CogSys) KogMod-EmpRes 41 / 45

Analysis of Interview Data Recording of interviews get transcribed Extract categories for the different aspects of users activities There are some software tools for interview analysis http://www.qualitative-research.net If necessary: try to make analysis more objective by using different raters which assign text segments to categories and evaluate interrater reliability U. Schmid (CogSys) KogMod-EmpRes 42 / 45

Observations Best used to observe task performance of users Best not interrupt users activities (non-reactive approach) Data collection via Video Logfiles Thinking aloud U. Schmid (CogSys) KogMod-EmpRes 43 / 45

Data from Observations As in interviews: qualitative data Be aware of Hawthorne effect: workers productivity might increase simply as response to being studied Most important is a careful design of tasks to be studied (representative, varying from routine to special,...) Usability criteria should be translated into dependent variables! U. Schmid (CogSys) KogMod-EmpRes 44 / 45

Questionnairs Best used to assess additional aspects of system use which cannot be derived more directly E.g.: Subjective impressions as satisfaction, feeling in control Questionaire design issues Items should be unambiguous Answer modalities: yes/no, multiple choice, rating scales, open answers Test theoretical questions: reliability, validity For tests which are developped for a broader application! U. Schmid (CogSys) KogMod-EmpRes 45 / 45