Knowledge discovery tools 381
- Spencer Dennis
- 6 years ago
hours, and prime time is prime time precisely because more people tend to watch television at that time.
- Compare histograms from different periods of time. Changes in histogram patterns from one time period to the next can be very useful in finding ways to improve the process.
- Stratify the data by plotting separate histograms for different sources of data. For example, with the rod diameter histogram we might want to plot separate histograms for shafts made from different vendors' materials or made by different operators or machines. This can sometimes reveal things that even control charts don't detect.

Exploratory data analysis

Data analysis can be divided into two broad phases: an exploratory phase and a confirmatory phase. Data analysis can be thought of as detective work. Before the trial one must collect evidence and examine it thoroughly. One must have a basis for developing a theory of cause and effect. Is there a gap in the data? Are there patterns that suggest some mechanism? Or, are there patterns that are simply mysterious (e.g., are all of the numbers even or odd)? Do outliers occur? Are there patterns in the variation of the data? What are the shapes of the distributions? This activity is known as exploratory data analysis (EDA). Tukey's 1977 book with this title elevated this task to acceptability among serious devotees of statistics.

Four themes appear repeatedly throughout EDA: resistance, residuals, re-expression, and visual display. Resistance refers to the insensitivity of a method to a small change in the data. If a small amount of the data is contaminated, the method shouldn't produce dramatically different results. Residuals are what remain after removing the effect of a model or a summary. For example, one might subtract the mean from each value, or look at deviations about a regression line. Re-expression involves examination of different scales on which the data are displayed.
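As a rough illustration of the resistance and residuals themes, the Python sketch below compares how the mean and the median react when one value in a small data set is contaminated, and then computes residuals about the median. The readings and the helper name `residuals` are hypothetical and chosen only for illustration; they do not come from the book.

```python
from statistics import mean, median

def residuals(values, summary):
    """Return each value minus a summary of the data (e.g. mean or median)."""
    center = summary(values)
    return [v - center for v in values]

# Hypothetical readings; the last one is then mis-keyed as 820 instead of 82.
clean = [70, 72, 75, 75, 78, 80, 82]
contaminated = clean[:-1] + [820]

# A resistant summary barely moves when a small part of the data is contaminated.
mean_shift = mean(contaminated) - mean(clean)
median_shift = median(contaminated) - median(clean)

# Residuals about the median make the contaminated value stand out.
median_residuals = residuals(contaminated, median)

print(f"shift in mean:   {mean_shift:.1f}")
print(f"shift in median: {median_shift}")
print("residuals about the median:", median_residuals)
```

In this example the median is unchanged by the bad reading while the mean moves by more than 100 units, which is the sense in which the median is a resistant summary.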
Tukey focused most of his attention on simple power transformations such as $y = \sqrt{x}$, $y = x^2$, and $y = 1/x$. Visual display helps the analyst examine the data graphically to grasp regularities and peculiarities in the data.

EDA is based on a simple basic premise: it is important to understand what you can do before you learn to measure how well you seem to have done it (Tukey, 1977). The objective is to investigate the appearance of the data, not to confirm some prior hypothesis. While there are a large number of EDA methods and techniques, there are two which are commonly encountered in Six Sigma work: stem-and-leaf plots and boxplots. These techniques are commonly included in most statistics packages. (SPSS was used to create the figures used in this book.) However, the graphics of EDA are simple enough to be done easily by hand.

STEM-AND-LEAF PLOTS

Stem-and-leaf plots are a variation of histograms and are especially useful for smaller data sets (n < 200). A major advantage of stem-and-leaf plots over the histogram is that the raw data values are preserved, sometimes completely and sometimes only partially. There is a loss of information in the histogram because the histogram reduces the data by grouping several values into a single cell.

The figure is a stem-and-leaf plot of diastolic blood pressures. As in a histogram, the length of each row corresponds to the number of cases that fall into a particular interval. However, a stem-and-leaf plot represents each case with a numeric value that corresponds to the actual observed value. This is done by dividing observed values into two components: the leading digit or digits, called the stem, and the trailing digit, called the leaf. For example, the value 75 has a stem of 7 and a leaf of 5.

Figure: Stem-and-leaf plot of diastolic blood pressures. From SPSS for Windows Base System User's Guide. Used by permission of the publisher, SPSS, Inc., Chicago, IL.
In this example, each stem is divided into two rows. The first row of each pair has cases with leaves of 0 through 4, while the second row has cases with leaves of 5 through 9. Consider the two rows that correspond to the stem of 11. In the first row, we can see that there are four cases with diastolic blood pressure of 110 and one case with a reading of 113. In the second row, there are two cases with a value of 115 and one case each with a value of 117, 118, and 119.

The last row of the stem-and-leaf plot is for cases with extreme values (values far removed from the rest). In this row, the actual values are displayed in parentheses. In the frequency column, we see that there are four extreme cases. Their values are 125, 133, and 160. Only distinct values are listed.

When there are few stems, it is sometimes useful to subdivide each stem even further. Consider Figure 11.15, a stem-and-leaf plot of cholesterol levels. In this figure, stems 2 and 3 are divided into five parts, each representing two leaf values. The first row, designated by an asterisk, is for leaves of 0 and 1; the next, designated by t, is for leaves of 2 and 3; the third, designated by f, is for leaves of 4 and 5; the fourth, designated by s, is for leaves of 6 and 7; and the fifth, designated by a period, is for leaves of 8 and 9. Rows without cases are not represented in the plot. For example, in Figure 11.15, the first two rows for stem 1 (corresponding to 0-1 and 2-3) are omitted.

Figure 11.15: Stem-and-leaf plot of cholesterol levels. From SPSS for Windows Base System User's Guide, p. 185. Used by permission of the publisher, SPSS, Inc., Chicago, IL.
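The stem-and-leaf construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure: the function name `stem_and_leaf`, its `rows_per_stem` parameter, and the sample readings are assumptions made for the example, and the sketch omits the frequency column, the separate row for extreme values, and the five-way stem split.

```python
from collections import defaultdict

def stem_and_leaf(values, rows_per_stem=2):
    """Build stem-and-leaf rows for non-negative integers.

    The stem is every digit except the last; the leaf is the last digit.
    With rows_per_stem=2, each stem is split into leaves 0-4 and leaves 5-9.
    """
    if rows_per_stem not in (1, 2):
        raise ValueError("rows_per_stem must be 1 or 2")
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = leaf // 5 if rows_per_stem == 2 else 0
        rows[(stem, half)].append(leaf)
    # Rows without cases never get created, so they are absent from the output.
    return [(stem, "".join(str(d) for d in leaves))
            for (stem, _), leaves in sorted(rows.items())]

# Hypothetical readings chosen to mirror the stem-11 rows described in the text.
readings = [110, 110, 110, 110, 113, 115, 115, 117, 118, 119, 75]

for stem, leaves in stem_and_leaf(readings):
    print(f"{stem:>3} | {leaves}")
```

Each printed row shows a stem followed by its leaves, so the raw values can be read back directly: the row `11 | 00003` holds four readings of 110 and one of 113.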
This stem-and-leaf plot differs from the previous one in another way. Since cholesterol values have a wide range (from 106 to 515 in this example), using the first two digits for the stem would result in an unnecessarily detailed plot. Therefore, we will use only the hundreds digit as the stem, rather than the first two digits. The stem setting of 100 appears in the row labeled Stem width. The leaf is then the tens digit. The last digit is ignored. Thus, from this particular stem-and-leaf plot, it is not possible to determine the exact cholesterol level for a case. Instead, each is classified by only its first two digits.

BOXPLOTS

A display that further summarizes information about the distribution of the values is the boxplot. Instead of plotting the actual values, a boxplot displays summary statistics for the distribution. It is a plot of the 25th, 50th, and 75th percentiles, as well as values far removed from the rest.

The figure shows an annotated sketch of a boxplot. The lower boundary of the box is the 25th percentile and the upper boundary is the 75th percentile. Tukey refers to the 25th and 75th percentiles as "hinges." Note that the 50th percentile is the median of the overall data set, the 25th percentile is the median of those values below the median, and the 75th percentile is the median of those values above the median. The horizontal line inside the box represents the median. Fifty percent of the cases are included within the box. The box length corresponds to the interquartile range, which is the difference between the 25th and 75th percentiles.

The boxplot includes two categories of cases with outlying values. Cases with values that are more than 3 box-lengths from the upper or lower edge of the box are called extreme values. On the boxplot, these are designated with an asterisk (*). Cases with values that are between 1.5 and 3 box-lengths from the upper or lower edge of the box are called outliers and are designated with a circle.
The largest and smallest observed values that aren't outliers are also shown. Lines are drawn from the ends of the box to these values. (These lines are sometimes called whiskers and the plot is then called a box-and-whiskers plot.)

Despite its simplicity, the boxplot contains an impressive amount of information. From the median you can determine the central tendency, or location. From the length of the box, you can determine the spread, or variability, of your observations. If the median is not in the center of the box, you know that the observed values are skewed. If the median is closer to the bottom of the box than to the top, the data are positively skewed. If the median is closer to the top of the box than to the bottom, the opposite is true: the distribution is negatively skewed. The length of the tail is shown by the whiskers and the outlying and extreme points.
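The boxplot rules described above (hinges, box length, outliers between 1.5 and 3 box-lengths from the box, extremes beyond 3 box-lengths, and whiskers drawn to the most distant values that are not outliers) can be sketched in Python as follows. The function name `boxplot_summary` and the sample data are hypothetical. The hinges are computed as the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd; this is one reading of the text, and statistics packages may use slightly different percentile definitions.

```python
from statistics import median

def boxplot_summary(values):
    """Summarize data the way the boxplot description in the text does."""
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least four values")
    half = n // 2
    q1 = median(data[:half])        # median of the values below the median
    q3 = median(data[n - half:])    # median of the values above the median
    box = q3 - q1                   # box length (interquartile range)

    def distance(v):
        # Distance from the nearest edge of the box; 0 for values inside it.
        return max(q1 - v, v - q3, 0)

    extremes = [v for v in data if distance(v) > 3 * box]
    outliers = [v for v in data if 1.5 * box < distance(v) <= 3 * box]
    inside = [v for v in data if distance(v) <= 1.5 * box]
    return {
        "q1": q1, "median": median(data), "q3": q3, "box_length": box,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers, "extremes": extremes,
    }

# Hypothetical readings, not taken from the book's figures.
sample = [70, 72, 75, 75, 78, 80, 82, 85, 88, 110, 160]
summary = boxplot_summary(sample)
for name, value in summary.items():
    print(f"{name:>12}: {value}")
```

For this sample the box runs from 75 to 88, so the box length is 13; the value 110 lies 22 units above the box and is flagged as an outlier, while 160 lies 72 units above it and is flagged as an extreme value.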
2. Write the names of the categories above and below the horizontal line. Think of these as branches from the main trunk of the tree.
3. Draw in the detailed cause data for each category. Think of these as limbs and twigs on the branches.

A good cause and effect diagram will have many "twigs," as shown in Fig. 10.4. If your cause and effect diagram doesn't have a lot of smaller branches and twigs, it shows that the understanding of the problem is superficial. Chances are that you need the help of someone outside of your group to aid in the understanding, perhaps someone more closely associated with the problem.

Cause and effect diagrams come in several basic types. The dispersion analysis type is created by repeatedly asking "why does this dispersion occur?" For example, we might want to know why all of our fresh peaches don't have the same color. The production process class cause and effect diagram uses production processes as the main categories, or branches of the diagram. The processes are shown joined by the horizontal line. Figure 10.5 is an example of this type of diagram. The cause enumeration cause and effect diagram simply displays all possible causes of a given problem grouped according to rational categories. This type of cause and effect diagram lends itself readily to the brainstorming approach we are using.

A variation of the basic cause and effect diagram, developed by Dr. Ryuji Fukuda of Japan, is cause and effect diagrams with the addition of cards, or CEDAC. The main difference is that the group gathers ideas outside of the meeting room on small cards, as well as in group meetings. The cards also serve as a vehicle for gathering input from people who are not in the group; they can be distributed to anyone involved with the process. Often the cards provide more information than the brief entries on a standard cause and effect diagram. The cause and effect diagram is built by actually placing the cards on the branches.
Boxplots

A boxplot displays summary statistics for a set of distributions. It is a plot of the 25th, 50th, and 75th percentiles, as well as values far removed from the rest. Figure 10.6 shows an annotated sketch of a boxplot. The lower boundary of the box is the 25th percentile. Tukey refers to the 25th and 75th percentile "hinges." Note that the 50th percentile is the median of the overall data set, the 25th percentile is the median of those values below the median, and the 75th percentile is the median of those values above the median. The horizontal line inside the box represents the median. Fifty percent of the cases are included within the box. The box length corresponds to the interquartile range, which is the difference between the 25th and 75th percentiles.

The boxplot includes two categories of cases with outlying values. Cases with values that are more than 3 box-lengths from the upper or lower edge of the box are called extreme values. On the boxplot, these are designated with an asterisk (*). Cases with values that are between 1.5 and 3 box-lengths from the upper or lower edge of the box are called outliers and are designated with a circle. The largest and smallest observed values that aren't outliers are also shown. Lines are drawn from the ends of the box to these values. (These lines are sometimes called whiskers and the plot is then called a box-and-whiskers plot.)

FIGURE 10.5 Production process class cause and effect diagram.

FIGURE 10.6 Annotated boxplot. Labels, from top to bottom: values more than 3 box-lengths above the 75th percentile (extremes, marked *); values more than 1.5 box-lengths above the 75th percentile (outliers, marked o); largest observed value that isn't an outlier; 75th percentile; median (50th percentile); 25th percentile; smallest observed value that isn't an outlier; values more than 1.5 box-lengths below the 25th percentile (outliers, marked o); values more than 3 box-lengths below the 25th percentile (extremes, marked *).

Despite its simplicity, the boxplot contains an impressive amount of information. From the median you can determine the central tendency, or location. From the length of the box, you can determine the spread, or variability, of your observations. If the median is not in the center of the box, you know that the observed values are skewed. If the median is closer to the bottom of the box than to the top, the data are positively skewed. If the median is closer to the top of the box than to the bottom, the opposite is true: the distribution is negatively skewed. The length of the tail is shown by the whiskers and the outlying and extreme points.
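The rule for reading skewness from the position of the median inside the box can be written as a small Python check. This is only a sketch: the function name `skew_from_box` and the salary figures are hypothetical, and the quartiles come from the standard library's `statistics.quantiles`, whose percentile definition may differ slightly from the hinges described in the text.

```python
from statistics import quantiles

def skew_from_box(q1, med, q3):
    """Read the direction of skew from where the median sits in the box."""
    lower = med - q1   # distance from the median to the bottom of the box
    upper = q3 - med   # distance from the median to the top of the box
    if lower < upper:
        return "positive"   # median closer to the bottom of the box
    if lower > upper:
        return "negative"   # median closer to the top of the box
    return "none apparent"

# Hypothetical salaries (in thousands) with a long upper tail.
salaries = [30, 32, 35, 38, 45, 60, 90]
q1, med, q3 = quantiles(salaries, n=4, method="inclusive")
direction = skew_from_box(q1, med, q3)
print(f"25th={q1}, median={med}, 75th={q3}, skew={direction}")
```

A check like this only looks at the middle half of the data; as the text notes, the whiskers and the outlying and extreme points still need to be inspected to judge the length of the tails.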
FIGURE 10.7 Boxplots of salary by job category. (Horizontal axis: employment category.)

Boxplots are particularly useful for comparing the distribution of values in several groups. Figure 10.7 shows boxplots for the salaries for several different job titles. The boxplot makes it easy to see the different properties of the distributions. The location, variability, and shapes of the distributions are obvious at a glance. This ease of interpretation is something that statistics alone cannot provide.

Statistical Inference

This section discusses the basic concept of statistical inference. The reader should also consult the glossary in the Appendix for additional information. Inferential statistics belong to the enumerative class of statistical methods. All statements made in this section are valid only for stable processes, that is, processes in statistical control. Although most applications of Six Sigma are analytic, there are times when enumerative statistics prove useful.

The term inference is defined as (1) the act or process of deriving logical conclusions from premises known or assumed to be true, or (2) the act of reasoning from factual knowledge or evidence. Inferential statistics provide information that is used in the process of inference. As can be seen from the definitions, inference involves two domains: the premises and the evidence or factual knowledge. Additionally, there are two conceptual frameworks for addressing premises questions in inference: the design-based approach and the model-based approach.

As discussed by Koch and Gillings (1983), a statistical analysis whose only assumptions are random selection of units or random allocation of units to experimental conditions results in design-based inferences; or, equivalently, randomization-based inferences.
The objective is to structure sampling such that the sampled population has the same
More information2.4.1 STA-O Assessment 2
2.4.1 STA-O Assessment 2 Work all the problems and determine the correct answers. When you have completed the assessment, open the Assessment 2 activity and input your responses into the online grading
More informationAP Statistics. Semester One Review Part 1 Chapters 1-5
AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically
More informationThe normal curve and standardisation. Percentiles, z-scores
The normal curve and standardisation Percentiles, z-scores The normal curve Frequencies (histogram) Characterised by: Central tendency Mean Median Mode uni, bi, multi Positively skewed, negatively skewed
More informationSTP226 Brief Class Notes Instructor: Ela Jackiewicz
CHAPTER 2 Organizing Data Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that can be assigned a numerical value or nonnumerical
More informationIntroduction & Basics
CHAPTER 1 Introduction & Basics 1.1 Statistics the Field... 1 1.2 Probability Distributions... 4 1.3 Study Design Features... 9 1.4 Descriptive Statistics... 13 1.5 Inferential Statistics... 16 1.6 Summary...
More informationChapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE
Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE 1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using. a. descriptive
More informationMath 214 REVIEW SHEET EXAM #1 Exam: Wednesday March, 2007
Math 214 REVIEW SHEET EXAM #1 Exam: Wednesday March, 2007 THOUGHT QUESTIONS: 1. Suppose you are interested in determining if women are safer drivers than men in New York. Can you go to the Dept. of Motor
More informationBefore we get started:
Before we get started: http://arievaluation.org/projects-3/ AEA 2018 R-Commander 1 Antonio Olmos Kai Schramm Priyalathta Govindasamy Antonio.Olmos@du.edu AntonioOlmos@aumhc.org AEA 2018 R-Commander 2 Plan
More informationPart 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.
Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. 1. The bell-shaped frequency curve is so common that if a population has this shape, the measurements are
More informationReadings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F
Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions
More informationOverview of statistical methods 283. Figure 9.5. Linearity illustrated.
Overview of statistical methods 283 Figure 9.5. Linearity illustrated. OVERVIEW OF STATISTICAL METHODS Enumerative versus analytic statistical methods How would you respond to the following question? A
More informationDOWNLOAD PDF SUMMARIZING AND INTERPRETING DATA : USING STATISTICS
Chapter 1 : Summarizing Numerical Data Sets Worksheets Stem and Leaf Activity Sheets with Answers. Students first create the stem and leaf plot. Then they use it to answer questions. This is a great way
More informationDescribe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationData, frequencies, and distributions. Martin Bland. Types of data. Types of data. Clinical Biostatistics
Clinical Biostatistics Data, frequencies, and distributions Martin Bland Professor of Health Statistics University of York http://martinbland.co.uk/ Types of data Qualitative data arise when individuals
More informationTwo-Way Independent ANOVA
Two-Way Independent ANOVA Analysis of Variance (ANOVA) a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There
More informationChapter 1. Picturing Distributions with Graphs
Chapter 1 Picturing Distributions with Graphs Statistics Statistics is a science that involves the extraction of information from numerical data obtained during an experiment or from a sample. It involves
More informationSection 1.2 Displaying Quantitative Data with Graphs. Dotplots
Section 1.2 Displaying Quantitative Data with Graphs Dotplots One of the simplest graphs to construct and interpret is a dotplot. Each data value is shown as a dot above its location on a number line.
More informationResearch Methodology in Social Sciences. by Dr. Rina Astini
Research Methodology in Social Sciences by Dr. Rina Astini Email : rina_astini@mercubuana.ac.id What is Research? Re ---------------- Search Re means (once more, afresh, anew) or (back; with return to
More informationBusiness Statistics Probability
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationHow to interpret scientific & statistical graphs
How to interpret scientific & statistical graphs Theresa A Scott, MS Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott 1 A brief introduction Graphics:
More informationStatistical Summaries. Kerala School of MathematicsCourse in Statistics for Scientists. Descriptive Statistics. Summary Statistics
Kerala School of Mathematics Course in Statistics for Scientists Statistical Summaries Descriptive Statistics T.Krishnan Strand Life Sciences, Bangalore may be single numerical summaries of a batch, such
More informationQuality Digest Daily, March 3, 2014 Manuscript 266. Statistics and SPC. Two things sharing a common name can still be different. Donald J.
Quality Digest Daily, March 3, 2014 Manuscript 266 Statistics and SPC Two things sharing a common name can still be different Donald J. Wheeler Students typically encounter many obstacles while learning
More informationGraphic Organizers. Compare/Contrast. 1. Different. 2. Different. Alike
1 Compare/Contrast When you compare and contrast people, places, objects, or ideas, you are looking for how they are alike and how they are different. One way to organize your information is to use a Venn
More informationStill important ideas
Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement
More information4.3 Measures of Variation
4.3 Measures of Variation! How much variation is there in the data?! Look for the spread of the distribution.! What do we mean by spread? 1 Example Data set:! Weight of contents of regular cola (grams).
More informationTable of Contents. EHS EXERCISE 1: Risk Assessment: A Case Study of an Investigation of a Tuberculosis (TB) Outbreak in a Health Care Setting
Instructions: Use this document to search by topic (e.g., exploratory data analysis or study design), by discipline (e.g., environmental health sciences or health policy and management) or by specific
More informationTest 1 Version A STAT 3090 Spring 2018
Multiple Choice: (Questions 1 20) Answer the following questions on the scantron provided using a #2 pencil. Bubble the response that best answers the question. Each multiple choice correct response is
More informationStill important ideas
Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still
More informationUNIT V: Analysis of Non-numerical and Numerical Data SWK 330 Kimberly Baker-Abrams. In qualitative research: Grounded Theory
UNIT V: Analysis of Non-numerical and Numerical Data SWK 330 Kimberly Baker-Abrams In qualitative research: analysis is on going (occurs as data is gathered) must be careful not to draw conclusions before
More informationChapter 25. Paired Samples and Blocks. Copyright 2010 Pearson Education, Inc.
Chapter 25 Paired Samples and Blocks Copyright 2010 Pearson Education, Inc. Paired Data Data are paired when the observations are collected in pairs or the observations in one group are naturally related
More informationLAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*
LAB ASSIGNMENT 4 1 INFERENCES FOR NUMERICAL DATA In this lab assignment, you will analyze the data from a study to compare survival times of patients of both genders with different primary cancers. First,
More informationUsing Lertap 5 in a Parallel-Forms Reliability Study
Lertap 5 documents series. Using Lertap 5 in a Parallel-Forms Reliability Study Larry R Nelson Last updated: 16 July 2003. (Click here to branch to www.lertap.curtin.edu.au.) This page has been published
More informationDescriptive Statistics Lecture
Definitions: Lecture Psychology 280 Orange Coast College 2/1/2006 Statistics have been defined as a collection of methods for planning experiments, obtaining data, and then analyzing, interpreting and
More information