Knowledge discovery tools 381


hours, and prime time is prime time precisely because more people tend to watch television at that time.

Compare histograms from different periods of time. Changes in histogram patterns from one time period to the next can be very useful in finding ways to improve the process.

Stratify the data by plotting separate histograms for different sources of data. For example, with the rod diameter histogram we might want to plot separate histograms for shafts made from different vendors' materials or made by different operators or machines. This can sometimes reveal things that even control charts don't detect.

Exploratory data analysis

Data analysis can be divided into two broad phases: an exploratory phase and a confirmatory phase. Data analysis can be thought of as detective work. Before the trial one must collect evidence and examine it thoroughly. One must have a basis for developing a theory of cause and effect. Is there a gap in the data? Are there patterns that suggest some mechanism? Or, are there patterns that are simply mysterious (e.g., are all of the numbers even or odd)? Do outliers occur? Are there patterns in the variation of the data? What are the shapes of the distributions? This activity is known as exploratory data analysis (EDA). Tukey's 1977 book with this title elevated this task to acceptability among serious devotees of statistics.

Four themes appear repeatedly throughout EDA: resistance, residuals, re-expression, and visual display. Resistance refers to the insensitivity of a method to a small change in the data. If a small amount of the data is contaminated, the method shouldn't produce dramatically different results. Residuals are what remain after removing the effect of a model or a summary. For example, one might subtract the mean from each value, or look at deviations about a regression line. Re-expression involves examination of different scales on which the data are displayed.
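The resistance and residuals themes can be illustrated with a minimal Python sketch. The data values and the helper name `residuals` are hypothetical, chosen only for illustration; the point is that one contaminated value moves the mean substantially while leaving the median unchanged.

```python
from statistics import mean, median


def residuals(values, summary):
    """Return what remains after subtracting a summary (e.g. the median) from each value."""
    center = summary(values)
    return [v - center for v in values]


# Hypothetical measurements; the second list has one contaminated value.
clean = [10, 11, 12, 13, 14]
contaminated = [10, 11, 12, 13, 140]

# How far does each summary move when one value is contaminated?
mean_shift = mean(contaminated) - mean(clean)
median_shift = median(contaminated) - median(clean)

print("shift in mean:", mean_shift)
print("shift in median:", median_shift)
print("residuals about the median:", residuals(clean, median))
```

In this sketch the median behaves as a resistant summary and the mean does not, which is one reason EDA tends to favor medians and hinges.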
Tukey focused most of his attention on simple power transformations such as $y = \sqrt{x}$, $y = x^2$, $y = 1/x$. Visual display helps the analyst examine the data graphically to grasp regularities and peculiarities in the data.

EDA is based on a simple basic premise: it is important to understand what you can do before you learn to measure how well you seem to have done it (Tukey, 1977). The objective is to investigate the appearance of the data, not to confirm some prior hypothesis. While there are a large number of EDA methods and techniques, there are two which are commonly encountered in Six Sigma work: stem-and-leaf plots and boxplots. These techniques are commonly included in most statistics packages. (SPSS was used to create the figures used in this book.) However, the graphics of EDA are simple enough to be done easily by hand.

STEM-AND-LEAF PLOTS

Stem-and-leaf plots are a variation of histograms and are especially useful for smaller data sets (n < 200). A major advantage of stem-and-leaf plots over the histogram is that the raw data values are preserved, sometimes completely and sometimes only partially. There is a loss of information in the histogram because the histogram reduces the data by grouping several values into a single cell.

The figure is a stem-and-leaf plot of diastolic blood pressures. As in a histogram, the length of each row corresponds to the number of cases that fall into a particular interval. However, a stem-and-leaf plot represents each case with a numeric value that corresponds to the actual observed value. This is done by dividing observed values into two components: the leading digit or digits, called the stem, and the trailing digit, called the leaf. For example, the value 75 has a stem of 7 and a leaf of 5.

Figure: Stem-and-leaf plot of diastolic blood pressures. From SPSS for Windows Base System User's Guide. Copyrighted material, used by permission of the publisher, SPSS, Inc., Chicago, IL.
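The stem and leaf split described above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure: the function names and the sample readings are hypothetical, and the sketch assumes non-negative integer data with a single trailing digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split each value into a stem (leading digits) and a leaf (last digit)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 75 -> stem 7, leaf 5
        rows[stem].append(leaf)
    return dict(rows)


def format_plot(rows):
    """Render one text row per stem: frequency, stem, then the leaves."""
    lines = []
    for stem in sorted(rows):
        leaves = "".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{len(rows[stem]):>3}  {stem:>3} . {leaves}")
    return "\n".join(lines)


# Hypothetical readings, not the data behind the figure.
readings = [75, 70, 82, 88, 75, 91, 110, 113, 68]
plot_rows = stem_and_leaf(readings)
print(format_plot(plot_rows))
```

Because every leaf is printed, the raw values can be read back from the plot, which is the advantage over a histogram noted above. Stems with no cases simply do not appear as rows.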

In this example, each stem is divided into two rows. The first row of each pair has cases with leaves of 0 through 4, while the second row has cases with leaves of 5 through 9. Consider the two rows that correspond to the stem of 11. In the first row, we can see that there are four cases with diastolic blood pressure of 110 and one case with a reading of 113. In the second row, there are two cases with a value of 115 and one case each with a value of 117, 118, and 119.

The last row of the stem-and-leaf plot is for cases with extreme values (values far removed from the rest). In this row, the actual values are displayed in parentheses. In the frequency column, we see that there are four extreme cases. Their values are 125, 133, and 160. Only distinct values are listed.

When there are few stems, it is sometimes useful to subdivide each stem even further. Consider Figure 11.15, a stem-and-leaf plot of cholesterol levels. In this figure, stems 2 and 3 are divided into five parts, each representing two leaf values. The first row, designated by an asterisk, is for leaves of 0 and 1; the next, designated by t, is for leaves of 2s and 3s; the third, designated by f, is for leaves of 4s and 5s; the fourth, designated by s, is for leaves of 6s and 7s; and the fifth, designated by a period, is for leaves of 8s and 9s. Rows without cases are not represented in the plot. For example, in Figure 11.15, the first two rows for stem 1 (corresponding to 0-1 and 2-3) are omitted.

Figure 11.15. Stem-and-leaf plot of cholesterol levels. From SPSS for Windows Base System User's Guide, p. 185. Copyrighted material, used by permission of the publisher, SPSS, Inc., Chicago, IL.

This stem-and-leaf plot differs from the previous one in another way. Since cholesterol values have a wide range (from 106 to 515 in this example), using the first two digits for the stem would result in an unnecessarily detailed plot. Therefore, we will use only the hundreds digit as the stem, rather than the first two digits. The stem setting of 100 appears in the row labeled Stem width. The leaf is then the tens digit. The last digit is ignored. Thus, from this particular stem-and-leaf plot, it is not possible to determine the exact cholesterol level for a case. Instead, each is classified by only its first two digits.

BOXPLOTS

A display that further summarizes information about the distribution of the values is the boxplot. Instead of plotting the actual values, a boxplot displays summary statistics for the distribution. It is a plot of the 25th, 50th, and 75th percentiles, as well as values far removed from the rest.

The figure shows an annotated sketch of a boxplot. The lower boundary of the box is the 25th percentile and the upper boundary is the 75th percentile. Tukey refers to the 25th and 75th percentiles as "hinges." Note that the 50th percentile is the median of the overall data set, the 25th percentile is the median of those values below the median, and the 75th percentile is the median of those values above the median. The horizontal line inside the box represents the median. Fifty percent of the cases are included within the box. The box length corresponds to the interquartile range, which is the difference between the 25th and 75th percentiles.

The boxplot includes two categories of cases with outlying values. Cases with values that are more than 3 box-lengths from the upper or lower edge of the box are called extreme values. On the boxplot, these are designated with an asterisk (*). Cases with values that are between 1.5 and 3 box-lengths from the upper or lower edge of the box are called outliers and are designated with a circle. The largest and smallest observed values that aren't outliers are also shown. Lines are drawn from the ends of the box to these values. (These lines are sometimes called whiskers and the plot is then called a box-and-whiskers plot.)

Despite its simplicity, the boxplot contains an impressive amount of information. From the median you can determine the central tendency, or location. From the length of the box, you can determine the spread, or variability, of your observations. If the median is not in the center of the box, you know that the observed values are skewed. If the median is closer to the bottom of the box than to the top, the data are positively skewed. If the median is closer to the top of the box than to the bottom, the opposite is true: the distribution is negatively skewed. The length of the tail is shown by the whiskers and the outlying and extreme points.
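The quantities a boxplot displays can be computed directly from the definitions above. The following Python sketch is one possible reading of those definitions, not the SPSS implementation: the function name, the sample data, and the handling of values that fall exactly on the 1.5 or 3 box-length boundaries are assumptions, and the hinges are taken as medians of the lower and upper halves with the middle value excluded when the number of cases is odd.

```python
from statistics import median


def boxplot_summary(values):
    """Compute hinges, box length, whisker ends, outliers, and extremes."""
    data = sorted(values)
    n = len(data)
    lower_hinge = median(data[: n // 2])        # median of values below the median
    upper_hinge = median(data[(n + 1) // 2:])   # median of values above the median
    box_length = upper_hinge - lower_hinge      # interquartile range

    def distance_beyond_box(v):
        """Distance of v from the nearest box edge, or 0 if v is inside the box."""
        if v > upper_hinge:
            return v - upper_hinge
        if v < lower_hinge:
            return lower_hinge - v
        return 0

    extremes = [v for v in data if distance_beyond_box(v) > 3 * box_length]
    outliers = [v for v in data
                if 1.5 * box_length < distance_beyond_box(v) <= 3 * box_length]
    regular = [v for v in data if distance_beyond_box(v) <= 1.5 * box_length]
    return {
        "median": median(data),
        "lower_hinge": lower_hinge,
        "upper_hinge": upper_hinge,
        "box_length": box_length,
        "whisker_low": min(regular),    # smallest value that isn't an outlier
        "whisker_high": max(regular),   # largest value that isn't an outlier
        "outliers": outliers,           # drawn as circles
        "extremes": extremes,           # drawn as asterisks
    }


# Hypothetical sample, chosen so that one outlier and one extreme value appear.
sample = [2, 4, 5, 6, 7, 8, 9, 10, 12, 25, 40]
summary = boxplot_summary(sample)
print(summary)
```

In this hypothetical sample the median sits closer to the lower hinge than to the upper one, which by the rule above indicates positive skew.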

2. Write the names of the categories above and below the horizontal line. Think of these as branches from the main trunk of the tree.
3. Draw in the detailed cause data for each category. Think of these as limbs and twigs on the branches.

A good cause and effect diagram will have many "twigs," as shown in Fig. 10.4. If your cause and effect diagram doesn't have a lot of smaller branches and twigs, it shows that the understanding of the problem is superficial. Chances are that you need the help of someone outside of your group to aid in the understanding, perhaps someone more closely associated with the problem.

Cause and effect diagrams come in several basic types. The dispersion analysis type is created by repeatedly asking "why does this dispersion occur?" For example, we might want to know why all of our fresh peaches don't have the same color.

The production process class cause and effect diagram uses production processes as the main categories, or branches of the diagram. The processes are shown joined by the horizontal line. Figure 10.5 is an example of this type of diagram.

The cause enumeration cause and effect diagram simply displays all possible causes of a given problem grouped according to rational categories. This type of cause and effect diagram lends itself readily to the brainstorming approach we are using.

A variation of the basic cause and effect diagram, developed by Dr. Ryuji Fukuda of Japan, is cause and effect diagrams with the addition of cards, or CEDAC. The main difference is that the group gathers ideas outside of the meeting room on small cards, as well as in group meetings. The cards also serve as a vehicle for gathering input from people who are not in the group; they can be distributed to anyone involved with the process. Often the cards provide more information than the brief entries on a standard cause and effect diagram. The cause and effect diagram is built by actually placing the cards on the branches.
Boxplots

A boxplot displays summary statistics for a set of distributions. It is a plot of the 25th, 50th, and 75th percentiles, as well as values far removed from the rest. Figure 10.6 shows an annotated sketch of a boxplot. The lower boundary of the box is the 25th percentile. Tukey refers to the 25th and 75th percentiles as "hinges." Note that the 50th percentile is the median of the overall data set, the 25th percentile is the median of those values below the median, and the 75th percentile is the median of those values above the median. The horizontal line inside the box represents the median. Fifty percent of the cases are included within the box. The box length corresponds to the interquartile range, which is the difference between the 25th and 75th percentiles.

The boxplot includes two categories of cases with outlying values. Cases with values that are more than 3 box-lengths from the upper or lower edge of the box are called extreme values. On the boxplot, these are designated with an asterisk (*). Cases with values that are between 1.5 and 3 box-lengths from the upper or lower edge of the box are called outliers and are designated with a circle. The largest and smallest observed values that aren't outliers are also shown. Lines are drawn from the ends of the box to these values. (These lines are sometimes called whiskers and the plot is then called a box-and-whiskers plot.)

Despite its simplicity, the boxplot contains an impressive amount of information. From the median you can determine the central tendency, or location. From the length of the box, you can determine the spread, or variability, of your observations. If the median is not in the center of the box, you know that the observed values are skewed. If the median is closer to the bottom of the box than to the top, the data are positively skewed. If the median is closer to the top of the box than to the bottom, the opposite is true: the distribution is negatively skewed. The length of the tail is shown by the whiskers and the outlying and extreme points.

FIGURE 10.5 Production process class cause and effect diagram.

FIGURE 10.6 Annotated boxplot. Labels, from top to bottom: values more than 3 box-lengths above the 75th percentile (extremes, marked *); values more than 1.5 box-lengths above the 75th percentile (outliers, marked o); largest observed value that isn't an outlier; 75th percentile; median (50th percentile); 25th percentile; smallest observed value that isn't an outlier; values more than 1.5 box-lengths below the 25th percentile (outliers, marked o); values more than 3 box-lengths below the 25th percentile (extremes, marked *).

FIGURE 10.7 Boxplots of salary by job category (horizontal axis: employment category).

Boxplots are particularly useful for comparing the distribution of values in several groups. Figure 10.7 shows boxplots for the salaries for several different job titles. The boxplot makes it easy to see the different properties of the distributions. The location, variability, and shapes of the distributions are obvious at a glance. This ease of interpretation is something that statistics alone cannot provide.

Statistical Inference

This section discusses the basic concept of statistical inference. The reader should also consult the glossary in the Appendix for additional information. Inferential statistics belong to the enumerative class of statistical methods. All statements made in this section are valid only for stable processes, that is, processes in statistical control. Although most applications of Six Sigma are analytic, there are times when enumerative statistics prove useful.

The term inference is defined as (1) the act or process of deriving logical conclusions from premises known or assumed to be true, or (2) the act of reasoning from factual knowledge or evidence. Inferential statistics provide information that is used in the process of inference. As can be seen from the definitions, inference involves two domains: the premises and the evidence or factual knowledge. Additionally, there are two conceptual frameworks for addressing premises questions in inference: the design-based approach and the model-based approach.

As discussed by Koch and Gillings (1983), a statistical analysis whose only assumptions are random selection of units or random allocation of units to experimental conditions results in design-based inferences; or, equivalently, randomization-based inferences.
The objective is to structure sampling such that the sampled population has the same


More information

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of numbers. Also, students will understand why some measures

More information

Observational studies; descriptive statistics

Observational studies; descriptive statistics Observational studies; descriptive statistics Patrick Breheny August 30 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 38 Observational studies Association versus causation

More information

ANOVA in SPSS (Practical)

ANOVA in SPSS (Practical) ANOVA in SPSS (Practical) Analysis of Variance practical In this practical we will investigate how we model the influence of a categorical predictor on a continuous response. Centre for Multilevel Modelling

More information

One-Way Independent ANOVA

One-Way Independent ANOVA One-Way Independent ANOVA Analysis of Variance (ANOVA) is a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment.

More information

Human-Computer Interaction IS4300. I6 Swing Layout Managers due now

Human-Computer Interaction IS4300. I6 Swing Layout Managers due now Human-Computer Interaction IS4300 1 I6 Swing Layout Managers due now You have two choices for requirements: 1) try to duplicate the functionality of an existing applet; or, 2) create your own (ideally

More information

International Statistical Literacy Competition of the ISLP Training package 3

International Statistical Literacy Competition of the ISLP   Training package 3 International Statistical Literacy Competition of the ISLP http://www.stat.auckland.ac.nz/~iase/islp/competition Training package 3 1.- Drinking Soda and bone Health http://figurethis.org/ 1 2 2.- Comparing

More information

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES 4 Chapter 2 CHAPTER 2. MEASURING AND DESCRIBING VARIABLES 1. A. Age: name/interval; military dictatorship: value/nominal; strongly oppose: value/ ordinal; election year: name/interval; 62 percent: value/interval;

More information

Copyright 2014, 2011, and 2008 Pearson Education, Inc. 1-1

Copyright 2014, 2011, and 2008 Pearson Education, Inc. 1-1 1-1 Statistics for Business and Economics Chapter 1 Statistics, Data, & Statistical Thinking 1-2 Contents 1. The Science of Statistics 2. Types of Statistical Applications in Business 3. Fundamental Elements

More information

Instructions and Checklist

Instructions and Checklist BIOSTATS 540 Fall 2015 Exam 1 Corrected 9-28-2015 Page 1 of 11 BIOSTATS 540 - Introductory Biostatistics Fall 2015 Examination 1 Due: Monday October 5, 2015 Last Date for Submission with Credit: Monday

More information

Examining differences between two sets of scores

Examining differences between two sets of scores 6 Examining differences between two sets of scores In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you

More information

Six Sigma Glossary Lean 6 Society

Six Sigma Glossary Lean 6 Society Six Sigma Glossary Lean 6 Society ABSCISSA ACCEPTANCE REGION ALPHA RISK ALTERNATIVE HYPOTHESIS ASSIGNABLE CAUSE ASSIGNABLE VARIATIONS The horizontal axis of a graph The region of values for which the null

More information

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj Statistical Techniques Masoud Mansoury and Anas Abulfaraj What is Statistics? https://www.youtube.com/watch?v=lmmzj7599pw The definition of Statistics The practice or science of collecting and analyzing

More information

q2_2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

q2_2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. q2_2 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. A sporting goods retailer conducted a customer survey to determine its customers primary reason

More information

2.4.1 STA-O Assessment 2

2.4.1 STA-O Assessment 2 2.4.1 STA-O Assessment 2 Work all the problems and determine the correct answers. When you have completed the assessment, open the Assessment 2 activity and input your responses into the online grading

More information

AP Statistics. Semester One Review Part 1 Chapters 1-5

AP Statistics. Semester One Review Part 1 Chapters 1-5 AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically

More information

The normal curve and standardisation. Percentiles, z-scores

The normal curve and standardisation. Percentiles, z-scores The normal curve and standardisation Percentiles, z-scores The normal curve Frequencies (histogram) Characterised by: Central tendency Mean Median Mode uni, bi, multi Positively skewed, negatively skewed

More information

STP226 Brief Class Notes Instructor: Ela Jackiewicz

STP226 Brief Class Notes Instructor: Ela Jackiewicz CHAPTER 2 Organizing Data Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that can be assigned a numerical value or nonnumerical

More information

Introduction & Basics

Introduction & Basics CHAPTER 1 Introduction & Basics 1.1 Statistics the Field... 1 1.2 Probability Distributions... 4 1.3 Study Design Features... 9 1.4 Descriptive Statistics... 13 1.5 Inferential Statistics... 16 1.6 Summary...

More information

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE 1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using. a. descriptive

More information

Math 214 REVIEW SHEET EXAM #1 Exam: Wednesday March, 2007

Math 214 REVIEW SHEET EXAM #1 Exam: Wednesday March, 2007 Math 214 REVIEW SHEET EXAM #1 Exam: Wednesday March, 2007 THOUGHT QUESTIONS: 1. Suppose you are interested in determining if women are safer drivers than men in New York. Can you go to the Dept. of Motor

More information

Before we get started:

Before we get started: Before we get started: http://arievaluation.org/projects-3/ AEA 2018 R-Commander 1 Antonio Olmos Kai Schramm Priyalathta Govindasamy Antonio.Olmos@du.edu AntonioOlmos@aumhc.org AEA 2018 R-Commander 2 Plan

More information

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. 1. The bell-shaped frequency curve is so common that if a population has this shape, the measurements are

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Overview of statistical methods 283. Figure 9.5. Linearity illustrated.

Overview of statistical methods 283. Figure 9.5. Linearity illustrated. Overview of statistical methods 283 Figure 9.5. Linearity illustrated. OVERVIEW OF STATISTICAL METHODS Enumerative versus analytic statistical methods How would you respond to the following question? A

More information

DOWNLOAD PDF SUMMARIZING AND INTERPRETING DATA : USING STATISTICS

DOWNLOAD PDF SUMMARIZING AND INTERPRETING DATA : USING STATISTICS Chapter 1 : Summarizing Numerical Data Sets Worksheets Stem and Leaf Activity Sheets with Answers. Students first create the stem and leaf plot. Then they use it to answer questions. This is a great way

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Data, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics

Data, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics Clinical Biostatistics Data, frequencies, and distributions Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Types of data Qualitative data arise when individuals

More information

Two-Way Independent ANOVA

Two-Way Independent ANOVA Two-Way Independent ANOVA Analysis of Variance (ANOVA) a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There

More information

Chapter 1. Picturing Distributions with Graphs

Chapter 1. Picturing Distributions with Graphs Chapter 1 Picturing Distributions with Graphs Statistics Statistics is a science that involves the extraction of information from numerical data obtained during an experiment or from a sample. It involves

More information

Section 1.2 Displaying Quantitative Data with Graphs. Dotplots

Section 1.2 Displaying Quantitative Data with Graphs. Dotplots Section 1.2 Displaying Quantitative Data with Graphs Dotplots One of the simplest graphs to construct and interpret is a dotplot. Each data value is shown as a dot above its location on a number line.

More information

Research Methodology in Social Sciences. by Dr. Rina Astini

Research Methodology in Social Sciences. by Dr. Rina Astini Research Methodology in Social Sciences by Dr. Rina Astini Email : rina_astini@mercubuana.ac.id What is Research? Re ---------------- Search Re means (once more, afresh, anew) or (back; with return to

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

How to interpret scientific & statistical graphs

How to interpret scientific & statistical graphs How to interpret scientific & statistical graphs Theresa A Scott, MS Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott 1 A brief introduction Graphics:

More information

Statistical Summaries. Kerala School of MathematicsCourse in Statistics for Scientists. Descriptive Statistics. Summary Statistics

Statistical Summaries. Kerala School of MathematicsCourse in Statistics for Scientists. Descriptive Statistics. Summary Statistics Kerala School of Mathematics Course in Statistics for Scientists Statistical Summaries Descriptive Statistics T.Krishnan Strand Life Sciences, Bangalore may be single numerical summaries of a batch, such

More information

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J.

Quality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J. Quality Digest Daily, March 3, 2014 Manuscript 266 Statistics and SPC Two things sharing a common name can still be different Donald J. Wheeler Students typically encounter many obstacles while learning

More information

Graphic Organizers. Compare/Contrast. 1. Different. 2. Different. Alike

Graphic Organizers. Compare/Contrast. 1. Different. 2. Different. Alike 1 Compare/Contrast When you compare and contrast people, places, objects, or ideas, you are looking for how they are alike and how they are different. One way to organize your information is to use a Venn

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

4.3 Measures of Variation

4.3 Measures of Variation 4.3 Measures of Variation! How much variation is there in the data?! Look for the spread of the distribution.! What do we mean by spread? 1 Example Data set:! Weight of contents of regular cola (grams).

More information

Table of Contents. EHS EXERCISE 1: Risk Assessment: A Case Study of an Investigation of a Tuberculosis (TB) Outbreak in a Health Care Setting

Table of Contents. EHS EXERCISE 1: Risk Assessment: A Case Study of an Investigation of a Tuberculosis (TB) Outbreak in a Health Care Setting Instructions: Use this document to search by topic (e.g., exploratory data analysis or study design), by discipline (e.g., environmental health sciences or health policy and management) or by specific

More information

Test 1 Version A STAT 3090 Spring 2018

Test 1 Version A STAT 3090 Spring 2018 Multiple Choice: (Questions 1 20) Answer the following questions on the scantron provided using a #2 pencil. Bubble the response that best answers the question. Each multiple choice correct response is

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

UNIT V: Analysis of Non-numerical and Numerical Data SWK 330 Kimberly Baker-Abrams. In qualitative research: Grounded Theory

UNIT V: Analysis of Non-numerical and Numerical Data SWK 330 Kimberly Baker-Abrams. In qualitative research: Grounded Theory UNIT V: Analysis of Non-numerical and Numerical Data SWK 330 Kimberly Baker-Abrams In qualitative research: analysis is on going (occurs as data is gathered) must be careful not to draw conclusions before

More information

Chapter 25. Paired Samples and Blocks. Copyright 2010 Pearson Education, Inc.

Chapter 25. Paired Samples and Blocks. Copyright 2010 Pearson Education, Inc. Chapter 25 Paired Samples and Blocks Copyright 2010 Pearson Education, Inc. Paired Data Data are paired when the observations are collected in pairs or the observations in one group are naturally related

More information

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival* LAB ASSIGNMENT 4 1 INFERENCES FOR NUMERICAL DATA In this lab assignment, you will analyze the data from a study to compare survival times of patients of both genders with different primary cancers. First,

More information

Using Lertap 5 in a Parallel-Forms Reliability Study

Using Lertap 5 in a Parallel-Forms Reliability Study Lertap 5 documents series. Using Lertap 5 in a Parallel-Forms Reliability Study Larry R Nelson Last updated: 16 July 2003. (Click here to branch to www.lertap.curtin.edu.au.) This page has been published

More information

Descriptive Statistics Lecture

Descriptive Statistics Lecture Definitions: Lecture Psychology 280 Orange Coast College 2/1/2006 Statistics have been defined as a collection of methods for planning experiments, obtaining data, and then analyzing, interpreting and

More information