Course summary, final remarks


Course "Empirical Evaluation in Informatics" Prof. Dr. Lutz Prechelt Freie Universität Berlin, Institut für Informatik http://www.inf.fu-berlin.de/inst/ag-se/ Role of empiricism Generic method Concrete methods: Benchmarking Controlled experiment Quasi-experiment Survey Case study Course summary, final remarks Data analysis Presenting the results Checking study quality What we have not discussed 1/ 29

"Empirische Bewertung in der Informatik" Zusammenfassung, Nachbemerkungen Prof. Dr. Lutz Prechelt Freie Universität Berlin, Institut für Informatik http://www.inf.fu-berlin.de/inst/ag-se/ Rolle von Empirie Allgemeines Vorgehen Konkrete Methoden: Benchmarking Kontrolliertes Experiment Quasi-Experiment Umfrage Fallstudie Datenanalyse Präsentation Qualitätsprüfung Was wir nicht besprochen haben 2/ 29

The role of empiricism
Activities in Informatics research or development occur in one of three modes:
- T (Theory): creating formal systems involving terminology and rules
- C (Construction): creating software systems
- E (Empiricism): learning about the characteristics of systems or processes
Empirical methods can be applied to support both T and C, and to both the inputs and the outputs of T or C:
- validating or quantifying assumptions
- evaluating results

The role of empiricism (2)
Roles of empiricism:
- Validation: empirically testing whether a theory is correct by testing a hypothesis derived from the theory
- Quantification: providing quantitative information about phenomena that are qualitatively known, by measuring those phenomena
- Exploration: providing insight in order to create ideas for forming theories or constructing systems, e.g. exploratory case studies, usability studies, benchmarking
- Refinement: adding detail to theories or models; a form of exploration, e.g. via surveys and case studies

Quality criteria for empirical work
In order to have impact, empirical work must be taken seriously by its audience; in order to take it seriously, the audience must believe in its usefulness.
Usefulness is bounded by two phenomena:
- Internal validity: the results are correct as they stand
- External validity: the results are applicable to contexts other than those observed
The audience's willingness to believe in validity (and thus take the work seriously) is described by:
- Credibility: a reasonable degree of internal and external validity is reasonably obvious; the conclusions are warranted
- Relevance: the results appear applicable to some situations of real interest

Generic method
How to conduct an empirical study:
1. Decide on ultimate goal
2. Formulate question for the study
3. Characterize the observations sought
4. Design the study
5. Find or create the observation context
6. Observe
7. Analyze observations
8. Interpret results
(Slide diagram: the steps mapped onto the lifecycle phases Requirements, Design & Implementation, Use)
Credibility can be ruined in any step; relevance can be ruined in any of steps 1-5.

Hints: How to make a good study
- Each step is difficult and has potential for disaster, but mistakes in later steps are more easily repaired, so make sure you have a really good question
- Just like for software development, a waterfall model is often not a good approach to performing a study: iterate all the design phases if you possibly can; prototyping is almost always helpful to get a really good study
- Attacking the same question with more than one concrete empirical method ("multi-method research") increases the chances of meaningful and credible results. Example: when deciding for/against using method X, check X in a small-scale controlled experiment first, then perform a medium-scale case study ("pilot project") plus a broad-scale survey

Concrete methods
What we have talked about in some depth:
- Benchmarking
- Controlled experiment
- Quasi-experiment
- Survey
- Case study
What we have talked about only briefly:
- Simulation studies
- Literature studies
- Analysis of legacy data ("software archeology")

Criteria for method selection
Criteria, each with an illustrative negative example:
- Practical feasibility (e.g. a controlled experiment comparing risk management methods)
- Size of potential contribution to the research goal (e.g. a survey on issues of the subconscious)
- Potential for answering a relevant question successfully (e.g. a quasi-experiment on the project-level impact of improved compiler error messages)
- Expected cost/benefit ratio (e.g. expensive experiments that do not generalize well)
- The empiricist's skill with the method (e.g. doing a case study without ever having practiced qualitative research)

Benchmarking
Description: measure the performance of a system (or method) in a standardized way; collect many results.
Advantages:
- Objective and repeatable; high credibility
- Results easy to understand
- Supports accumulation of results over many studies
Disadvantages:
- Not practically feasible in many fields
- Requires a shared previous understanding of the performance criteria
- Obtaining high relevance requires much work
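To illustrate the "standardized, repeatable measurement" idea, here is a minimal benchmark-harness sketch in Python. It is not part of the course material; the task set, the solve() entry point, and the repetition count are assumptions chosen purely for illustration.

```python
import statistics
import time

def run_benchmark(solve, tasks, repetitions=5):
    """Run `solve` on every benchmark task several times and report
    the median wall-clock time per task (a simple, repeatable protocol)."""
    results = {}
    for name, task_input in tasks.items():
        timings = []
        for _ in range(repetitions):
            start = time.perf_counter()
            solve(task_input)                    # the system/method under evaluation
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results

# Hypothetical usage: time the built-in sort on two standardized inputs.
tasks = {"small": list(range(1_000, 0, -1)), "large": list(range(50_000, 0, -1))}
print(run_benchmark(sorted, tasks))
```

Taking the median over several repetitions is one simple way to keep the protocol repeatable in the face of timing noise.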

Controlled experiment
Description: change one thing, keep everything else constant, observe what happens.
Advantages:
- The only method for proving causal relationships
- High credibility (if done well)
- Supports strong quantitative statistical analysis
- Results easy to interpret
Disadvantages:
- Usually rather costly
- Does not scale to large-scale, human-related questions
- Difficult to generalize, hence relevance is dubious
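As a minimal sketch of the kind of quantitative analysis such an experiment supports, the following compares a control and a treatment group with a two-sample t-test. The data, the outcome measure, and the use of scipy are assumptions made for illustration, not part of the course.

```python
from scipy import stats

# Hypothetical outcome measure (e.g. task completion time in minutes) for
# subjects randomly assigned to the control or the treatment condition.
control = [31.0, 28.5, 35.2, 30.1, 29.8, 33.4]
treatment = [25.9, 27.3, 24.1, 26.8, 28.0, 23.7]

t_stat, p_value = stats.ttest_ind(control, treatment, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Random assignment is what licenses the causal reading of a significant difference here.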

Quasi-experiment
Description: change one thing, keep everything else as constant as possible, observe what happens.
Advantages:
- Can have a very good cost/benefit ratio
- Reasonably high credibility (if done well)
Disadvantages:
- Opportunistic: the study can often not be designed as would be necessary
- Can be difficult to argue why credibility is good
- Often difficult to generalize, hence relevance is often dubious

Survey
Description: ask many people what you want to know.
Advantages:
- Cheap
- Very flexible method
Disadvantages:
- Subjective; validating the correctness of answers is difficult
- Hard-to-resolve credibility problems (at least for many kinds of questions)
- Results are almost always ambiguous

Case study
Description: observe something specific as it happens and broadly include as many information sources as possible.
Advantages:
- Very rich results
- Extremely credible (if done well)
Disadvantages:
- Difficult method, requires many skills
- Generalizing any results is difficult; hence relevance is often hard to judge

Simulation
Description: create and run an executable model of something; tweak parameters; observe.
Advantages:
- Can investigate questions that are otherwise unfeasible
- Flexible, cheap, yet credible and relevant (if done well)
Disadvantages:
- Difficult to validate the correctness of the model
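As an illustration of "executable model, tweak parameters, observe", here is a small Monte Carlo sketch. The model (defects detected independently by each reviewer with a fixed probability) and all parameter values are assumptions invented for this example.

```python
import random

def simulate_reviews(num_defects=100, detection_prob=0.35, num_reviewers=3, runs=2_000):
    """Estimate how many defects survive review under a simple independence model."""
    total_missed = 0
    for _ in range(runs):
        total_missed += sum(
            1 for _ in range(num_defects)
            if all(random.random() > detection_prob for _ in range(num_reviewers))
        )
    return total_missed / runs

# Tweak one parameter and observe its effect on the outcome.
for reviewers in (1, 2, 3, 4):
    print(reviewers, "reviewer(s) ->", simulate_reviews(num_reviewers=reviewers), "defects missed")
```

The disadvantage named above shows up directly: the result is only as trustworthy as the independence assumption built into the model.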

Literature study (meta study)
Description: review and analyze the data and/or results of several published studies together; in particular, combined statistical analysis of multiple experiments ("meta analysis").
Advantages:
- May obtain results not possible with any one study
- May have high robustness, hence good credibility
Disadvantages:
- Limitations of the given reports cannot be overcome
- Biased by non-publication of "uninteresting" results
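A minimal sketch of what "combined statistical analysis" can mean: a fixed-effect, inverse-variance weighted pooling of per-study effect sizes. The effect sizes and standard errors below are invented for illustration only.

```python
import math

# Hypothetical (effect size, standard error) pairs from three published experiments.
studies = [(0.42, 0.15), (0.30, 0.20), (0.55, 0.12)]

weights = [1 / se ** 2 for _, se in studies]            # inverse-variance weights
pooled = sum(w * effect for (effect, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled effect = {pooled:.2f}, 95% CI half-width = {1.96 * pooled_se:.2f}")
```

Note the publication-bias caveat from the slide: pooling cannot repair a literature in which null results were never published.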

Studies of legacy data ("software archeology")
Description: analyze data gathered in some pre-existing process.
Advantages:
- Perhaps large amounts of data
- Low cost
Disadvantages:
- Limitations of the data cannot be overcome
- Data may be biased in difficult-to-detect ways
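Version-control history is a typical source of such pre-existing data. A minimal sketch (the repository path and the question asked of it are assumptions for illustration) that counts commits per author from an existing git log:

```python
import collections
import subprocess

def commits_per_author(repo_path="."):
    """Count commits per author e-mail in an existing git history (legacy data)."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%ae"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return collections.Counter(log)

print(commits_per_author().most_common(10))
```

The bias caveat applies immediately: commit counts reflect committing habits as much as actual contribution.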

Data analysis
The process of turning raw data (as collected) into results data that directly allows drawing conclusions: by exploring, by measuring, by comparing, by modeling.
Data analysis steps:
1. Make data available: collect, collate, reformat, pre-process, read
2. Validate data: find and correct gaps, mistakes, and inconsistencies
3. Explore data: check for expected and unexpected coarse characteristics
4. Perform the analysis: measure, model, or compare
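A compact sketch of these four steps using pandas; the file name, column names, and the comparison at the end are hypothetical and only meant to show the shape of such a script.

```python
import pandas as pd

# 1. Make data available: read and pre-process (hypothetical file and columns)
df = pd.read_csv("measurements.csv")

# 2. Validate: find gaps and impossible values
df = df.dropna(subset=["group", "duration_s"])
assert (df["duration_s"] >= 0).all(), "negative durations found"

# 3. Explore: coarse characteristics per group
print(df.groupby("group")["duration_s"].describe())

# 4. Analyze: compare the groups (here simply via their means)
print(df.groupby("group")["duration_s"].mean())
```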

Data validation advice
Be very sceptical:
- Have some redundancy in your data and check it, e.g. invariants, impossible combinations, etc.
- Double-check manually entered data
- Check expectations, e.g. counts, frequencies, ranges, limits, etc.
- Mistrust unexpected regularities
- Mistrust unexpected irregularities
- Mistrust outliers
- Mistrust data anywhere near where you found an error
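A sketch of how such checks might look in code; the data set, column names, and limits are assumptions chosen only to illustrate the kinds of checks listed above.

```python
import pandas as pd

df = pd.read_csv("survey_responses.csv")        # hypothetical data set

# Check expectations: ranges and limits
assert df["age"].between(18, 99).all(), "age outside plausible range"

# Check redundancy / invariants: impossible combinations of fields
suspicious = df[df["years_experience"] > df["age"] - 15]
print("suspicious rows (experience vs. age):", len(suspicious))

# Flag outliers for manual inspection rather than silently dropping them
q1, q3 = df["answer_time_s"].quantile([0.25, 0.75])
outliers = df[df["answer_time_s"] > q3 + 3 * (q3 - q1)]
print("possible outliers:", len(outliers))
```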

Data exploration advice
- Use your common sense as much as possible!
- Make sure you understand what your variables really mean
- Formulate your expectation before you look at the data
- Graphics! Graphics! Graphics!
- Try out many things
- Explain to outsiders what the data are
- Ask outsiders what they think the data mean
- Ask outsiders for ideas what to analyze
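In the spirit of "Graphics! Graphics! Graphics!", a small sketch of graphical exploration with matplotlib; the data set and columns are, again, assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("measurements.csv")            # hypothetical data set

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
df["duration_s"].hist(bins=30, ax=axes[0])      # does the distribution match expectations?
axes[0].set_xlabel("duration [s]")
df.boxplot(column="duration_s", by="group", ax=axes[1])   # coarse comparison across groups
plt.tight_layout()
plt.show()
```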

Data analysis advice
- Stick to techniques you understand
- Make sure you know (and respect) the assumptions of the techniques you use; if you need to think hard about what the result would mean, this is not an appropriate analysis
- Graphics! Graphics! Graphics!
- Credibility is much more important than precision
- Validity is much more important than precision
- Illustrativeness is much more important than precision
- Get professional help if you can
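One concrete way to "respect the assumptions" is to test them and fall back to a less demanding technique when they do not hold. A sketch with invented data and scipy as an assumed dependency:

```python
from scipy import stats

control = [31.0, 28.5, 35.2, 30.1, 29.8, 33.4, 55.0]   # hypothetical, skewed by one outlier
treatment = [25.9, 27.3, 24.1, 26.8, 28.0, 23.7, 26.2]

# Shapiro-Wilk tests the normality assumption behind the t-test.
if min(stats.shapiro(control)[1], stats.shapiro(treatment)[1]) > 0.05:
    print(stats.ttest_ind(control, treatment, equal_var=False))
else:
    # Rank-based fallback that does not assume normality.
    print(stats.mannwhitneyu(control, treatment, alternative="two-sided"))
```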

Conclusion-drawing advice
Checklist:
- Do all your conclusions really contribute to answering the research question? If not, are they really worth mentioning? At least separate them from the others ("Further results")
- Are all your conclusions solidly backed up by your data? If not, formulate them very weakly
- Can all conclusions easily be traced backwards through the study: back to the analysis results, from there to the analysis technique, from there to the raw data, and from there to the study design and setup?

Conclusion-drawing advice (2)
- Do you really trust all your conclusions? If not, why should anybody else?
- Can you characterize the contexts to which you presume your conclusions generalize? And why you think so? (plausibility, evidence)

Presentation of results
A good write-up (article, technical report) or interactive presentation of an empirical study:
- makes the elements of the generic method clearly visible:
  1. Decide on ultimate goal
  2. Formulate question for the study
  3. Characterize the observations sought
  4. Design the study
  5. Find or create the observation context
  6. Observe
  7. Analyze observations
  8. Interpret results
- provides much detail about setup and raw data, perhaps in appendices

Presentation of results (2)
- uses plain, simple language wherever possible
- presents the data analysis in an easy-to-grasp manner, using graphical presentation whenever appropriate
- openly discusses strengths and weaknesses of the study: threats to internal validity, threats to external validity
- lists newly found open questions
- summarizes the results in the Abstract rather than just announcing them
Advice: prepare a rather long, detailed, comprehensive report first, then a short version focused on the most interesting parts.

Quick-check for empirical study quality
The quality of an empirical study is determined by its credibility and its relevance: if they are high, many deficiencies can be tolerated; if they are low, technical perfection does not help.
Good studies (which are not too common) can often be recognized quickly by these simple checks:
- Is there a clear research question at the beginning?
- Is there a clear study result at the end? Note this does not mean a clear answer; a good study may well be inconclusive.
- Can the result easily be traced back to the data analysis result(s)?
- Is the connection from analysis results to conclusion convincing?
Most bad studies are clearly bad in at least one of these aspects.

Things not covered in this course
What we have not (or almost not) talked about:
- Plenty of methodological details of the individual methods, e.g. experiments: design of experiments when manipulating more than one variable; surveys: instrument development; case studies: annotating and analyzing qualitative data
- Practical technical issues of the methods, e.g. measurement infrastructure, calibration and validation of measurements, data handling and archiving
- Ethical considerations (e.g. privacy, copyright, informed consent)
- and more

So what?
Where will you apply empirical methods?
Note: most diploma/master's theses can benefit a lot from a good empirical evaluation.
- In fact, most would be essentially worthless without one
- In fact, many are worthless for that reason
- And many of these would never even have been done, had an evaluation been planned from the start
Please consider this when you choose a thesis topic.

Thank you!