DOWNLOAD PDF TAKE FULL ADVANTAGE OF DESCRIPTIVE STATISTICS


Chapter 1 : Understanding Descriptive and Inferential Statistics

Descriptive statistics are used to describe the basic features of the data in a study.

Statistics: The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling.

Descriptive Statistics: Descriptive statistics include the numbers, tables, charts, and graphs used to describe, organize, summarize, and present raw data.

Advantages Of Descriptive Statistics: Descriptive statistics can be essential for arranging and displaying data; form the basis of rigorous data analysis; be much easier to work with, interpret, and discuss than raw data; help examine the tendencies, spread, normality, and reliability of a data set; be rendered both graphically and numerically; include useful techniques for summarizing data in visual form; and form the basis for more advanced statistical methods.

Disadvantages Of Descriptive Statistics: Descriptive statistics can be misused, misinterpreted, and incomplete; be of limited use when samples and populations are small; demand a fair amount of calculation and explanation; fail to fully specify the extent to which non-normal data are a problem; offer little information about causes and effects; and be dangerous if not analyzed completely.

Mean: The mean is the average, the most common measure of central tendency. The mean of a population is designated by the Greek letter mu (μ). The mean of a sample is designated by the symbol x-bar (x̄). The mean may not always be the best measure of central tendency, especially if the data are skewed. For example, average income is often misleading, since the few individuals with extremely high incomes raise the overall average.
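As a minimal sketch of the income example above, the following Python snippet shows how a single very high income raises the mean. The income figures and variable names are invented for illustration and do not come from the source.

```python
from statistics import mean

# Hypothetical annual incomes; the figures are invented for illustration.
typical_incomes = [30_000, 35_000, 40_000, 45_000, 50_000]

# The same group with one extremely high income added.
with_high_earner = typical_incomes + [1_000_000]

mean_typical = mean(typical_incomes)        # mean of the five typical incomes
mean_with_outlier = mean(with_high_earner)  # mean after adding the extreme income

print(mean_typical)
print(mean_with_outlier)
```

In this made-up group, one extreme value is enough to pull the mean far above what any of the other five people earn, which is why the mean can mislead for skewed data.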
Median: The median is the value in the middle of the data set when the measurements are arranged in order of magnitude. For example, if 11 individuals were weighed and their weights arranged in ascending or descending order, the sixth value is the median, since five values fall above and five fall below it.

Mode: The mode is the value occurring most often in the data. If the largest group of people in a sample measuring age were 25 years old, then 25 would be the mode. The mode is the least commonly used measure of central tendency, particularly in large data sets. However, the mode is still important for describing a data set, especially when more than one value occurs frequently.

Variance: Variance is the sum of the squared differences between each observation and the mean, divided by the number of observations for a population (N) or by one less than the sample size (n - 1) for a sample. For populations, it is designated by the square of the Greek letter sigma (σ²). For samples, it is designated by the square of the letter s (s²). Variance is used less frequently than standard deviation as a measure of dispersion. Variance can be used when we want to quickly compare the variability of two or more sets of interval data.

Standard Deviation: Standard deviation is the positive square root of the variance, i.e., σ for populations and s for samples. Loosely, it describes the typical distance between observed values and the mean. The standard deviation is used when expressing dispersion in the same units as the original measurements. It is used more commonly than the variance in expressing the degree to which data are spread out.

Coefficient Of Variation: The coefficient of variation measures relative dispersion by dividing the standard deviation by the mean and then multiplying by 100 to render a percent. This number is designated as V for populations and v for samples. Because it is unit-free, it compares the variability of two data sets better than the standard deviation when their means or units differ.
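The dispersion measures above can be sketched with Python's standard `statistics` module. The data values are hypothetical. The snippet relies on the module's convention that `pvariance` and `pstdev` divide by N (population), while `variance` and `stdev` divide by n - 1 (sample).

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Hypothetical measurements, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # sum of squared deviations divided by N
samp_var = variance(data)   # sum of squared deviations divided by n - 1
pop_sd = pstdev(data)       # positive square root of the population variance
samp_sd = stdev(data)       # positive square root of the sample variance

# Coefficient of variation: standard deviation relative to the mean, as a percent.
cv_percent = pop_sd / mean(data) * 100

print(pop_var, samp_var, pop_sd, samp_sd, cv_percent)
```

The sample versions are slightly larger than the population versions because they divide by a smaller number; the gap shrinks as the sample grows.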
Range: Range measures the distance between the lowest and highest values in the data set and generally describes how spread out the data are. For example, after an exam, an instructor may tell the class that the lowest score was 65 and report the highest score as well; the range is the difference between the two.

Percentiles: Percentiles measure the percentage of data points which lie below a certain value when the values are ordered. For example, if a student's scorecard informs her that she is in the 90th percentile of students taking an exam, then 90 percent of the students scored lower than she did.

Quartiles: Quartiles group observations such that 25 percent are arranged together according to their values. The top 25 percent of values are referred to as the upper quartile. The lowest 25 percent of the values are referred to as the lower quartile. Often the two quartiles on either side of the median are reported together as the interquartile range.

Measures Of Skew: Measures of skew describe how concentrated data points are at the high or low end of the scale of measurement. Skew is designated by the symbols Sk for populations and Sk for samples. Skew indicates the degree of asymmetry in a data set. The more skewed the distribution, the higher the variability of the measures, and the higher the variability, the less reliable are the data. If the distribution is skewed right (positive skew), the mean typically lies to the right of the median and the mode. But if the distribution is skewed left (negative skew), the mean lies to the left of the median and the mode.

Measures Of Kurtosis: Measures of kurtosis describe how concentrated data are around a single value, usually the mean. Thus, kurtosis assesses how peaked or flat the data distribution is. The more peaked or flat the distribution, the less normally distributed the data. And the less normal the distribution, the less reliable the data. Mesokurtic distributions are, like the normal bell curve, neither peaked nor flat. Platykurtic distributions are flatter than the normal bell curve. Leptokurtic distributions are more peaked than the normal bell curve.

Inferential Statistics: Inferential statistics pertain to the procedures used to make forecasts, estimates, or judgments about a large set of data on the basis of the statistical characteristics of a smaller set (a sample). Inferential statistics are frequently used to answer cause-and-effect questions and make predictions.

Advantages Of Inferential Statistics: Inferential statistics can provide more detailed information than descriptive statistics; yield insight into relationships between variables; reveal causes and effects and make predictions; generate convincing support for a given theory; and be generally accepted due to widespread use in business and academia.

Disadvantages Of Inferential Statistics:

Such tests are normally used with contingency tables, which group observations based on common characteristics. ANOVA does this by comparing the dispersion of samples in order to make inferences about their means. Ideally, variables should move independently of one another, regardless of their means.
Unfortunately, in the real world, groups of observations usually differ on a number of dimensions, making simple analysis-of-variance tests problematic, since differences in other characteristics could cause observed differences in the values of the variables of interest.

Correlation: Correlation (ρ), like covariance, is used to measure the similarity in the changes of values of interval variables, but it is not influenced by the units of measure. Another advantage of correlation is that it is always bounded by the interval from -1 to 1. A value of 0 indicates no linear relationship.

Regression analysis: Regression analysis is often used to determine the effect of independent variables on a dependent variable. Regression measures the relative impact of each independent variable and is useful in forecasting. It is used most appropriately when both the independent and dependent variables are interval, though some social scientists also use regression on ordinal data. Like correlation, regression analysis assumes that the relationship between variables is linear.

Logistic regression analysis: Logistic regression analysis is used to examine relationships between variables when the dependent variable is nominal, whether the independent variables are nominal, ordinal, interval, or some mixture thereof. For example, one could use several independent variables such as GED completion, job training, post-secondary education and the like to predict the odds of getting a job.

Discriminant analysis: Discriminant analysis is similar to logistic regression in that the outcome variable is categorical. However, here the independent variables must be interval.

Factor analysis: Factor analysis simultaneously examines multiple variables to determine if they reflect larger underlying dimensions. Factor analysis is commonly used when analyzing data from multi-question surveys to reduce the numerous questions to a smaller set of more global issues.

Forecasting: Forecasting exists in many variations. The predictive power of regression analysis can be an effective forecasting tool, but time series forecasting is more common when time is a significant independent variable.

3 Chapter 2 : Descriptive Statistics Statistics for Engineers 4. Introduction to Statistics Descriptive Statistics Types of data A variate or random variable is a quantity or attribute whose value may vary from one. Descriptive statistics are distinguished from inferential statistics or inductive statistics, in that descriptive statistics aim to summarize a data set quantitatively without employing a probabilistic formulation, rather than use the data to make inferences about the population that the data are thought to represent. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example in a paper reporting on a study involving human subjects, there typically appears a table giving the overall sample size, sample sizes in important subgroups e. Inferential statistics Inferential statistics tries to make inferences about a population from the sample data. We also use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one, or that it might have happened by chance in this study. Use in statistical analyses Descriptive statistics provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of quantitative analysis of data. Descriptive statistics summarize data. For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. The percentage summarizes or describes multiple discrete events. Or, consider the scourge of many students, the grade point average. This single number describes the general performance of a student across the range of their course experiences. Describing a large set of observations with a single indicator risks distorting the original data or losing important detail. 
Despite these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.

Univariate analysis: Univariate analysis involves the examination across cases of a single variable, focusing on three characteristics: the distribution, the central tendency, and the dispersion. It is common to compute all three for each study variable.

Distribution: The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of cases that had that value. For instance, computing the distribution of gender in the study population means computing the percentages that are male and female. The gender variable has only two values, making it possible and meaningful to list each one. However, this does not work for a variable such as income that has many possible values. Typically, specific values are not particularly meaningful: two incomes that differ by a small amount are not meaningfully different. Grouping the raw scores using ranges of values reduces the number of categories to something more meaningful. For instance, we might group incomes into a handful of ranges. Frequency distributions are depicted as a table or as a graph. Table 1 shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 2. This type of graph is often referred to as a histogram or bar chart.

Central tendency: The central tendency of a distribution locates the "center" of a distribution of values. The three major types of estimates of central tendency are the mean, the median, and the mode. The mean is the most commonly used method of describing central tendency. To compute the mean, take the sum of the values and divide by the count. For example, the mean quiz score is determined by summing all the scores and dividing by the number of students taking the exam.
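The grouping and mean calculations described above can be sketched as follows. The incomes, the bin width of 10,000, the quiz scores, and the helper name `bin_label` are all assumptions made for illustration; the source does not specify them.

```python
from collections import Counter

# Hypothetical incomes; invented for illustration.
incomes = [12_500, 18_000, 23_400, 27_900, 31_000, 38_250, 39_999, 52_000]
bin_width = 10_000  # assumed width of each income range


def bin_label(value, width):
    """Return a label such as '10000-19999' for the range containing value."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Frequency distribution: number of cases falling in each income range.
frequency = Counter(bin_label(x, bin_width) for x in incomes)

# Mean: sum of the values divided by the count.
quiz_scores = [7, 8, 9, 10, 6]  # hypothetical quiz scores
mean_score = sum(quiz_scores) / len(quiz_scores)

for label in sorted(frequency):
    print(label, frequency[label])
print(mean_score)
```

Eight distinct income values collapse into four ranges here, which is the reduction in categories that makes a frequency table or histogram readable.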
The median is the score found at the exact middle of the set of values. One way to compute the median is to sort the values in numerical order and then locate the value in the middle of the list. If there is an odd number of values, the value in the middle position is the median; if there is an even number of observations, the median is the mean of the two middle scores. In an example with 7 sorted scores, the median is the 4th score. If there were an 8th observation, with a value of 25, the median would become the average of the 4th and 5th scores.

The mode is the most frequently occurring value in the set. To determine the mode, compute the distribution as above. The mode is the value with the greatest frequency. In the example, the modal value, 15, occurs three times. In some distributions there is a "tie" for the highest frequency, i.e., there are multiple modal values. These are called multi-modal distributions.

Notice that the three measures typically produce different results. The term "average" obscures the difference between them and is better avoided. The three values are equal if the distribution is perfectly "normal" (i.e., bell-shaped).

Dispersion: Dispersion is the spread of values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value.

Test statistic: In statistical hypothesis testing, a hypothesis test is typically specified in terms of a test statistic, which is a function of the sample; it is considered as a numerical summary of a set of data that reduces the data to one or a small number of values that can be used to perform a hypothesis test. Given a null hypothesis and a test statistic T, we can specify a "null value" T0 such that values of T close to T0 present the strongest evidence in favor of the null hypothesis, whereas values of T far from T0 present the strongest evidence against the null hypothesis. An important property of a test statistic is that we must be able to determine its sampling distribution under the null hypothesis, which allows us to calculate p-values. For example, suppose we wish to test whether a coin is fair (i.e., has equal probabilities of producing a head or a tail). If we flip the coin a number of times and record the results, the raw data can be represented as a sequence of Heads and Tails. Taking the test statistic T to be the number of heads, the exact sampling distribution of T is the binomial distribution, but for larger sample sizes the normal approximation can be used. Using one of these sampling distributions, it is possible to compute either a one-tailed or two-tailed p-value for the null hypothesis that the coin is fair. Note that the test statistic in this case reduces a set of numbers to a single numerical summary that can be used for testing. A test statistic shares some of the same qualities of a descriptive statistic, and many statistics can be used as both test statistics and descriptive statistics.
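The coin example can be sketched with an exact binomial calculation. This is one common way to compute a two-tailed p-value, not necessarily the one the source had in mind. It assumes the test statistic T is the number of heads; the flip count (20), the observed heads (15), and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when P(head) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_p_value(heads, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(heads, n, p)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= observed * (1 + 1e-9)
    )


n_flips = 20          # hypothetical number of flips
observed_heads = 15   # hypothetical value of the test statistic T

p_value = two_tailed_p_value(observed_heads, n_flips)
print(p_value)
```

The whole sequence of Heads and Tails is reduced to the single number `observed_heads`, and the binomial distribution under the null hypothesis supplies the p-value.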
However a test statistic is specifically intended for use in statistical testing, whereas the main quality of a descriptive statistic is that it is easily interpretable. Some informative descriptive statistics, such as the sample range, do not make good test statistics since it is difficult to determine their sampling distribution. Range statistics In descriptive statistics, the range is the length of the smallest interval which contains all the data. It is calculated by subtracting the smallest observation sample minimum from the greatest sample maximum and provides an indication of statistical dispersion. It is measured in the same units as the data. Since it only depends on two of the observations, it is a poor and weak measure of dispersion except when the sample size is large. The range, in the sense of the difference between the highest and lowest scores, is also called the crude range. When a new scale for measurement is developed, then a potential maximum or minimum will emanate from this scale. This is called the potential crude range. Of course this range should not be chosen too small, in order to avoid a ceiling effect. When the measurement is obtained, the resulting smallest or greatest observation, will provide the observed crude range. The midrange point, i. Again it is not particularly robust for small samples. Mathematical statistics Mathematical statistics is the study of statistics from a mathematical standpoint, using probability theory as well as other branches of mathematics such as linear algebra and analysis. The term "mathematical statistics" is closely related to the term " statistical theory " but also embraces modelling for actuarial science and non-statistical probability theory, particularly in Scandinavia. Statistics deals with gaining information from data. In practice, data often contain some randomness or uncertainty. Statistics handles such data using methods of probability theory. 
Introduction: Statistical science is concerned with the planning of studies, especially with the design of randomized experiments and with the planning of surveys using random sampling. The initial analysis of the data from properly randomized studies often follows the study protocol. Of course, the data from a randomized study can be analyzed to consider secondary hypotheses or to suggest new ideas. A secondary analysis of the data from a planned study uses tools from data analysis. Data analysis is divided into descriptive statistics and inferential statistics. For example, inferential statistics involves selecting a model for the data, checking whether the data fulfill the conditions of a particular model, and quantifying the involved uncertainty. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data (for example, from natural experiments and observational studies), in which case the inference is dependent on the model chosen by the statistician, and so subjective. Mathematical statistics has been inspired by and has extended many procedures in applied statistics.

Statistics, mathematics, and mathematical statistics: Mathematical statistics has substantial overlap with the discipline of statistics. Statistical theorists study and improve statistical procedures with mathematics, and statistical research often raises mathematical questions. Statistical theory relies on probability and decision theory. Mathematicians and statisticians like Gauss, Laplace, and C. S. Peirce used decision theory with probability distributions and loss functions (or utility functions). The decision-theoretic approach to statistical inference was reinvigorated by Abraham Wald and his successors.

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.

6 Chapter 3 : Descriptive Statistics Descriptive statistics Although entering a large set of numbers into a spreadsheet can take some time, once they are there you can carry out a wide range of operations and calculations very easily and. May 5, by April Klazema Statistical analysis allows you to use math to reach conclusions about various situations. This type of analysis can be performed in several ways, but you will typically find yourself using both descriptive and inferential statistics in order to make a full analysis of a set of data. There are key differences between these two types of analysis, and using them both can aid you in getting accurate conclusions about your test subjects. So where is this type of analysis applicable? Believe it or not, most people use statistical analysis for many aspects of their life whether they realize it or not. You can really enhance your understanding of the world with just a little more understanding of statistics and how it works. If you want to learn all there is to know about the subject beyond the basics of descriptive and inferential statistics, then check out the Workshop in Probability Udemy course. Descriptive Statistics In order to understand the key differences between descriptive and inferential statistics, as well as know when to use them, you must first understand what each type of statistics does, and what it is used to analyze. Descriptive statistics is a form of analysis that helps you by describing, summarizing, or showing data in a meaningful way. Descriptive statistics only give us the ability to describe what is shown before us. You cannot draw any specific conclusions based on any hypothesis you have with just descriptive statistics. A good example would be grades for a body of students. Using just descriptive statistics, you can find patterns of the test scores, such as a small number of students get high and low test scores and a large number of students get average test scores. 
If you were to simply present the data as it is, then you would not be able to easily visualize what the data is trying to show or tell you. This is even more difficult when you have a lot of data to process. The first type is the Measures of Central Tendency. This type of statistics describes the central position for a frequency distribution when it comes to a specific group of data. In reference to the exam grades, finding out that the average test score is 77 would be a measure of the central tendencies of the data. This type of analysis helps people summarize data by describing the way in which the data is spread out. Look at the test scores again; the median score may be This form of analysis takes a look at these test scores and evaluates how many students made a score between 83 and, as well as how many made scores between 0 and The different ways in which to do this form of analysis includes finding things such as the absolute deviation, variance, quartiles, standard deviation, and the range. Some of these concepts may seem complicated, but you can learn statistics quickly and easily with Udemy. There are all sorts of courses and tutorials out there to help you master statistics. Inferential Statistics Above we explore descriptive analysis and it helps with a great amount of summarizing data. The examples regarding the test scores was an analysis of a population. When you use descriptive statistics, you have to have the entire population at your disposal, since descriptive analysis gives you the properties of the population as a whole, such the mean or the absolute deviation. Instead of getting the data from the entire school, you would take a small sample, such as the test scores that you already have. The technique you use for inferential statistics is a bit different from the ones you use with descriptive statistics. Inferential statistics involves you taking several samples and trying to find one that accurately represents the population as a whole. 
You then test that sample and use it to make generalizations about the entire population, which in this case is every student within the school. There are two methods used in inferential statistics: You can come to a close estimate of what the test scores of the population will be like, but you have no way of accurately knowing what the parameters of the test scores truly are without having the data yourself. As with descriptive statistics, you can learn to do inferential statistics with computer software. This can make things it a lot easier and will allow you to input data for a much larger set of numbers. This course teaches you everything you need to know about doing inferential statistics with the SPSS software. You will also get a step-by-step guide to help ensure that you are able to learn the concepts with ease. The Differences of Descriptive and Inferential Statistics Both descriptive and inferential statistics have their benefits and shortcomings. Descriptive statistics are great for a small population. As mentioned before, you have the Page 6

7 accuracy that you may want, but it is all limited to a very small population, at least in comparison to inferential statistics. You can make an educated guess on what the parameters of the entire population are, no matter how large it may be. Unfortunately, this does prevent you from having accurate data. Understanding Key Types of Statistics As easy as learning the concepts of statistics may seem, it can be a difficult thing for someone to apply in a real world situation. Having a few dozen pieces of data to analyze may not be much of a task, but when that data reaches into the hundreds or thousands, things can become a bit more difficult. It can teach you many of the key concepts of statistical analysis. Page 7

8 Chapter 4 : Statistic - Wikipedia Statistics: Descriptive- Chapter 2 Elementary Statistics: Picturing the World 5th edition Larson & Ferber Frequency Distribution & their graphs More Graphs and displays Measure of Central Tendency Measures of Variation Measure of Position. Better numbers exist to summarize location, association and spread: Statistics professors tend to gloss over basic descriptive statistics because they want to spend as much time as possible on margins of error and t-tests and regression. Forget what you think you know about descriptives and let me give you a whirlwind tour of the real stuff. The average The arithmetic mean is one of many measures of central tendency. One particularly useful feature of the mean is that, whenever we lack outside information like a scientific theory, it is our best possible guess for what to expect in the future. For example, some statistics about Homo sapiens: The problem with the arithmetic mean is that it does not correspond to anything or anyone, it just blends everything together. The median, on the other hand, can be interpreted as a typical sort of value. If you look at how tall the average adult human is, you will find a bump around cm and another around cm. In the case of human height, the answer is obvious: Once you split the data by gender, the bimodal distribution disappears. However, the typical adult human is not therefore a cm tall woman. The median of a dataset with two or more dimensions is not accurately represented by the median of each individual dimension. What you want is the centerpoint or the half-space. With very many dimensions, however, the concept of a central value becomes less and less useful. Interestingly, this is true not just for humans but for machines as well: The spread The standard deviation measures how spread out different values are. Why would you square something only to take its square root a couple steps later? 
We square the distances to the mean to make them positive. Squaring is a mathematical hack that makes differentiation easy. That is nice, but not terribly relevant when all you want to do is describe the spread of your data. The standard deviation lacks an easy interpretation. When communicating how far apart values are, use the mean absolute deviation or the median absolute deviation (MAD). These statistics have the distinct advantage that they stand for what your audience will think they stand for. An acceptable substitute, also quite easy to interpret, is the interquartile range. Sort the data, put it into four bins of equal size, and return the lower and upper bound of the two bins in the middle, otherwise known as the first and third quartile. Half of your data is in between these goal posts. The interquartile range is the measure of spread you will usually see pictured as the box in a boxplot. The interquartile range is also sometimes communicated as a single number, the difference between the third and first quartiles.

The location: Statisticians and mathematicians are lazy, so instead of devising one statistical method that works for data with a mean of 2 and a variance of 5, and another statistical method for data with a mean of 23 and a variance of 8, they standardize the data first. These standardized numbers are called pivotal quantities, quantities whose distribution makes no reference to the mean or variance or any other parameter of a statistical distribution, and they are used a lot in statistics. One such pivotal quantity is the z-score. To convert a dataset into z-scores, subtract the mean from each value and then divide the result by the standard deviation. This rescales the data to a mean of 0 and a standard deviation of 1; it does not change the shape of the distribution, so the result follows a standard normal distribution only if the data were normal to begin with. Once in that standardized format, you can run all kinds of statistical tests, in particular Wald tests. Normalized data is also useful when comparing things.
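The spread and location measures discussed above can be sketched with Python's standard `statistics` module. The data are hypothetical, and `quantiles` uses the module's default quartile method, which is only one of several conventions.

```python
from statistics import mean, median, pstdev, quantiles

# Hypothetical data, chosen so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data)
med = median(data)

# Mean absolute deviation: average distance from the mean.
mean_abs_dev = mean([abs(x - center) for x in data])

# Median absolute deviation (MAD): median distance from the median.
median_abs_dev = median([abs(x - med) for x in data])

# Interquartile range: distance between the first and third quartiles.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1

# z-scores: subtract the mean, then divide by the standard deviation.
sd = pstdev(data)
z_scores = [(x - center) / sd for x in data]

print(mean_abs_dev, median_abs_dev, iqr)
print(z_scores)
```

After standardizing, the z-scores have a mean of 0 and a standard deviation of 1, but their histogram has exactly the same shape as the original data.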
If you took a test and got 15 out of 20 questions right, is that above or below average, and exactly how far above or below? Z-scores are great for statistical tests. As a basis for comparisons, they are flawed. A more easily interpretable number is the percentile rank. The 50th percentile is our good friend the median. Percentile refers to the actual value; percentile rank is the fraction of the data it corresponds to. You can calculate the percentile rank for any value in a dataset. As with the median, percentile ranks are immune to skew and kurtosis. Strangely, I almost never hear statisticians talk about z-scores, but they pop up from time to time in news articles.

The skew: Data is skewed when it contains a disproportionate number of small or large values, rather than the data being nicely spread out in both directions around the mean. If you graph the distribution, it will look lopsided, with the bulk of the data on one side and a long tail on the other. Negative skewness means the data is skewed to the left, which means it has a fat left tail, and positive skewness shows up as a fat right tail on a histogram. Skewness is another statistic where I sometimes see non-statisticians trying to outdo statisticians. Skewness is a number that is used so little in statistics that even an experienced data scientist would have a hard time drawing a distribution of approximately the right shape if you gave them a skew statistic. How can we convey skewness if not through a statistic? For a technical audience, a QQ-plot can communicate how two distributions differ in shape. In every other situation, use a histogram. A histogram organizes the data into an arbitrary number of equal intervals, counts how many points fit in each interval, and plots those counts as a bar chart. It takes up more space than a number but you get to see the true shape of the data.

In fact, analysis of just the bottom or top of a dataset opens you up to regression toward the mean, which will invalidate your conclusions. But there are moments when you do need a way to spot anomalies, perhaps to detect fraud or malfunctioning machines. It is common to look for outliers by identifying values that are more than 3 standard deviations from the mean. Intuitively this makes a lot of sense, because the standard deviation and the mean were probably the first things you calculated when you got the data, and a normal distribution has very little density at 3 standard deviations beyond the mean. However, x deviations from the mean is a self-defeating heuristic. Instead of hunting for outliers per se, we leave out one observation from the model at a time and check whether this single observation affects the model parameters one way or the other, the idea being that something can only count as outlandish if it has an outlandish impact on how we see the world. Because it adds or removes the entire observation, with all of its component variables, this technique can detect highly unusual observations that at first sight look perfectly normal.
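A minimal sketch of both ideas follows, using hypothetical sensor readings and treating the mean as a stand-in for "the model parameters" (an assumption; the source does not name a model). One well-known reason the 3-standard-deviation rule can defeat itself is that an extreme value inflates the very mean and standard deviation used to judge it. The function name `leave_one_out_influence` is illustrative.

```python
from statistics import mean, pstdev

# Hypothetical sensor readings; the last one looks anomalous.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 14.5]

# Rule of thumb: flag values more than 3 standard deviations from the mean.
center = mean(readings)
spread = pstdev(readings)
flagged_by_3_sigma = [x for x in readings if abs(x - center) / spread > 3]


def leave_one_out_influence(values):
    """For each observation, return how far the mean shifts when it is left out."""
    full = mean(values)
    return [mean(values[:i] + values[i + 1:]) - full for i in range(len(values))]


influence = leave_one_out_influence(readings)

# Index of the observation whose removal changes the mean the most.
most_influential = max(range(len(readings)), key=lambda i: abs(influence[i]))

print(flagged_by_3_sigma)
print(readings[most_influential], influence[most_influential])
```

In this small made-up sample the 3-standard-deviation rule flags nothing, because the anomalous reading has stretched the standard deviation, while the leave-one-out check singles it out as the observation with by far the largest effect on the estimate.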
The correlation

Take a daily aspirin and you are less likely to succumb to a heart attack. Higher temperatures, maintained for longer, kill more bacteria. Machinery subjected to heavier loads will break down sooner. A relationship between any two variables is an association; an association between two quantities (not gender or color, but distance or weight) is a correlation. Negative correlations simply mean that as one thing goes up, the other goes down. The longer folks have to wait for the bus, the worse their mood will be. Digging a little deeper, we see that the Pearson product-moment correlation is a measure of linear association. It works by drawing lines. Statisticians can do all sorts of crazy things with lines that make them not lines anymore while they get to pretend that they still are. The squiggly curves of polynomial regression still count as linear regression, for one. Fundamentally, though, a correlation is still just a line, and not every relationship between two variables can be captured by a linear relationship that states "for each additional x, increase y by this amount". Toxins are generally harmless below a certain threshold and then very quickly become dangerous. Cheaper goods sell more, but below a certain price point other factors weigh more heavily on our purchasing decisions. Or you might remember from an intro to stats that taking the square root or the logarithm of the dependent variable in a hockey stick graph will straighten it out, and Pearson may live to fight another day. Not really though, Karl Pearson died in 1936. But why do you want a number at all? Instead, just draw a scatterplot, which shows the relationship in all its messy glory, no matter how bendy or how straight. Still not happy and absolutely want a number? You would do well to shun correlations even so. While statisticians are generally quite good at estimating a correlation from a picture and vice versa, most people are not.
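A small sketch can make the "correlation is just a line" point concrete. The data below are made up for illustration, and `pearson_r` is a hand-written helper rather than a function from any particular statistics package. The first example is a perfect but U-shaped relationship, the second a hockey stick that a logarithm straightens out.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# A perfect relationship that is not a line: y is fully determined by x,
# yet the best-fitting straight line is flat.
xs = [-3, -2, -1, 0, 1, 2, 3]
u_shape = [x ** 2 for x in xs]
r_u = pearson_r(xs, u_shape)

# A hockey stick: exponential growth, before and after taking logarithms.
steps = [0, 1, 2, 3, 4, 5]
hockey_stick = [math.exp(s) for s in steps]
r_raw = pearson_r(steps, hockey_stick)
r_log = pearson_r(steps, [math.log(v) for v in hockey_stick])

print(f"U-shape:            r = {r_u:.3f}")
print(f"hockey stick:       r = {r_raw:.3f}")
print(f"after log of y:     r = {r_log:.3f}")
```

The U-shaped data produce a correlation of zero despite the perfect dependence, which is the kind of case where a scatterplot says more than the number.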
Communicate the linear relationship between two variables through its slope instead, the "for each additional x, y will increase by this amount" thing we mentioned earlier. To calculate the slope of a linear association, multiply the correlation by the standard deviation of the variable on the y-axis and divide it by the standard deviation of the variable on the x-axis.

The discipline we call statistics is a two-headed beast. Descriptive statistics is the attempt to make sense of large amounts of data. Each observation brings its own idiosyncrasies, so we must distill the data down to easier-to-read summaries, charts and comparisons between groups. Inferential statistics then takes these summaries and judges whether they are likely to hold true in general or whether they contain quirks, patterns that are particular to just your data. Descriptive statistics is when you ask five people and they all tell you coffee makes them sleepy. Means and medians are descriptive; hypotheses and margins of error are inferential. Statistics attracts people from many different backgrounds but above all it attracts mathematicians. Descriptive statistics is a matter of communication, cognition, numeracy, even user experience. Inferential statistics, on the other hand, is a theoretical delight. Similarly, much of Bayesian statistics relies on brute force simulation in lieu of the elegant little theorems of frequentist statistics. This sheds some light on the psychology of the statistician who turns sour at the first mention of a posterior probability. The disdain of statisticians for descriptive work has contributed to a peculiar situation where innovations in visualization are generally the work of outsiders and fringe figures like John Tukey, William Cleveland and Edward Tufte. Another consequence is that the descriptive statistics we use so much (the mean, the standard deviation, the correlation) are our go-to numbers not because they are the nicest way to describe a dataset, but because they are useful building blocks for statistical inference. It would be nice to have numbers that can do double duty, statistics that work equally well for description and inference.
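The slope recipe given earlier in this chapter (correlation times the standard deviation of y, divided by the standard deviation of x) can be checked against a direct least-squares fit. The following is a sketch on made-up data; the helper functions are written out by hand and are not taken from the original text.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(xs, ys):
    """Least-squares slope of y on x, computed directly."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
# Slope = correlation * SD(y) / SD(x); both SDs must use the same divisor.
slope_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)
slope_direct = ols_slope(xs, ys)

print(f"correlation:           {r:.3f}")
print(f"slope via correlation: {slope_from_r:.3f}")
print(f"slope via least squares: {slope_direct:.3f}")
```

Both routes give the same number, which can then be read as "for each additional unit of x, y goes up by this much".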

Chapter 5 : Descriptive and Inferential Statistics: How to Analyze Your Data

In descriptive statistics, we simply state what the data shows and tells us. Interpreting the results and trends beyond this involves inferential statistics, which is a separate branch altogether. Descriptive statistics provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. Descriptive statistics are typically distinguished from inferential statistics. With descriptive statistics you are simply describing what is or what the data shows. With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Descriptive statistics are used to present quantitative descriptions in a manageable form. In a research study we may have lots of measures. Or we may measure a large number of people on any measure. Descriptive statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary. For instance, consider a simple number used to summarize how well a batter is performing in baseball, the batting average. This single number is simply the number of hits divided by the number of times at bat, reported to three significant digits. A batter who is hitting. The single number describes a large number of discrete events. This single number describes the general performance of a student across a potentially wide range of course experiences.
Every time you try to describe a large set of observations with a single indicator you run the risk of distorting the original data or losing important detail. Even given these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.

Univariate Analysis

Univariate analysis involves the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at: the distribution, the central tendency, and the dispersion. The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of persons who had each value. For instance, a typical way to describe the distribution of college students is by year in college, listing the number or percent of students at each of the four years. Or, we describe gender by listing the number or percent of males and females. In these cases, the variable has few enough values that we can list each one and summarize how many sample cases had the value. But what do we do for a variable like income or GPA? With these variables there can be a large number of possible values, with relatively few people having each one. In this case, we group the raw scores into categories according to ranges of values. For instance, we might look at GPA according to the letter grade ranges. Or, we might group income into four or five ranges of income values. One of the most common ways to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories first, so that the values are grouped into ranges and the frequencies determined for each range. Frequency distributions can be depicted in two ways, as a table or as a graph. Table 1 shows an age frequency distribution with five categories of age ranges defined.
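Grouping raw values into ranges and counting them can be sketched as follows. The ages and the five range boundaries are invented for illustration (the table from the original document is not reproduced in this transcription), and `frequency_table` is a hypothetical helper name.

```python
def frequency_table(values, edges):
    """Count values falling in the half-open ranges [edges[i], edges[i + 1]).

    Returns one (label, count, percent) tuple per range. The labels assume
    whole-number values, e.g. ages in years.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    rows = []
    for i, count in enumerate(counts):
        label = f"{edges[i]}-{edges[i + 1] - 1}"
        percent = 100 * count / total if total else 0.0
        rows.append((label, count, percent))
    return rows


# Hypothetical ages, grouped into five categories.
ages = [19, 23, 24, 31, 35, 36, 38, 42, 47, 51, 58, 63]
age_edges = [18, 30, 40, 50, 60, 70]

table = frequency_table(ages, age_edges)
for label, count, percent in table:
    print(f"{label:>7}  {count:>3}  {percent:5.1f}%")
```

The same rows serve both presentations described in the text: printed as they are they form the frequency table, and the counts or percentages are the bar heights of the corresponding chart.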
The same frequency distribution can be depicted in a graph as shown in Figure 1. This type of graph is often referred to as a histogram or bar chart.

[Figure 1: Frequency distribution bar chart.]

Distributions may also be displayed using percentages.

The central tendency of a distribution is an estimate of the "center" of a distribution of values. There are three major types of estimates of central tendency: the mean, the median and the mode.

The Mean or average is probably the most commonly used method of describing central tendency. To compute the mean all you do is add up all the values and divide by the number of values. For example, the mean or average quiz score is determined by summing all the scores and dividing by the number of students taking the exam. For example, consider a set of eight test scores.

The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample. For example, in a long ordered list, the score at the halfway position would be the median. If we order the 8 scores, the halfway point falls between the two middle scores. Since both of these scores are 20, the median is 20. If the two middle scores had different values, you would have to interpolate to determine the median.

The mode is the most frequently occurring value in the set of scores. To determine the mode, you might again order the scores as shown above, and then count each one. The most frequently occurring value is the mode. In our example, the value 15 occurs three times and is the mode. In some distributions there is more than one modal value. For instance, in a bimodal distribution there are two values that occur most frequently. Notice that for the same set of 8 scores we got three different values for the mean, median and mode. If the distribution is truly normal (i.e., bell-shaped), the mean, median and mode are all equal to each other.

Dispersion refers to the spread of the values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. The Standard Deviation is a more accurate and detailed estimate of dispersion because an outlier can greatly exaggerate the range, as was true in this example where the single outlier value of 36 stands apart from the rest of the values. The Standard Deviation shows the relation that a set of scores has to the mean of the sample. Again let's take the set of scores. We know the mean from above, so the next step is to compute each score's difference from the mean.
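The measures discussed in this chapter can be computed with Python's standard `statistics` module. The score list below is an assumption: the transcription does not preserve the original values, so these eight scores were chosen only to match the description in the text (15 occurs three times, the two middle scores are 20, and 36 is the single high value).

```python
import statistics

# Assumed scores, consistent with the description in the text.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)
median = statistics.median(scores)      # average of the two middle scores
mode = statistics.mode(scores)          # most frequently occurring value
value_range = max(scores) - min(scores)

# Standard deviation, step by step: differences from the mean, squared,
# summed, divided by n - 1, then the square root.
deviations = [score - mean for score in scores]
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / (len(scores) - 1)
std_dev = variance ** 0.5

print("ordered scores:", sorted(scores))
print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", value_range)
print("deviations from the mean:", deviations)
print(f"standard deviation: {std_dev:.4f}")
```

The step-by-step standard deviation uses the n - 1 divisor, so it agrees with `statistics.stdev(scores)`; `statistics.pstdev` would divide by n instead.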

Chapter 6 : Comparative Study Between Descriptive Statistics

Statistical analysis allows you to use math to reach conclusions about various situations. This type of analysis can be performed in several ways, but you will typically find yourself using both descriptive and inferential statistics in order to make a full analysis of a set of data.

The first subject received Treatment 1, and had Outcome 1. X and Y are the values of two measurements on each subject. We were unable to get a measurement for Y on the second subject, or on X for the last subject, so these cells are blank. The subjects are entered in the order that the data became available, so the data is not ordered in any particular way. We used this data to do some simple analyses and compared the results with a standard statistical package. The comparison considered the accuracy of the results as well as the ease with which the interface could be used for bigger data sets. The Data Analysis tool pack includes a variety of choices including simple descriptive statistics, t-tests, correlations, 1 or 2-way analysis of variance, regression, etc. Two other Excel features are useful for certain analyses, but the Data Analysis tool pack is the only one that provides reasonably complete tests of statistical significance. Pivot Table in the Data menu can be used to generate summary tables of means, standard deviations, counts, etc. Also, you could use functions to generate some statistical measures, such as a correlation coefficient. Functions generate a single number, so using functions you will likely have to combine bits and pieces to get what you want. Even so, you may not be able to generate all the parts you need for a complete analysis. In order to check a variety of statistical tests, we chose the following tasks: Get means and standard deviations of X and Y for the entire group, and for each treatment group. Get the correlation between X and Y.
Do a two sample t-test to test whether the two treatment groups differ on X and Y. Do a paired t-test to test whether X and Y are statistically different from each other. Compare the number of subjects with each outcome by treatment group, using a chi-squared test. All of these tasks are routine for a data set of this nature, and all of them could be easily done using any of the above listed statistical packages.

Look in the Tools menu. If you do not have a Data Analysis item, you will need to install the Data Analysis tools. Search Help for "Data Analysis Tools" for instructions.

Missing Values

A blank cell is the only way for Excel to deal with missing data. If you have any other missing value codes, you will need to change them to blanks.

Data Arrangement

Different analyses require the data to be arranged in various ways. If you plan on a variety of different tests, there may not be a single arrangement that will work. You will probably need to rearrange the data several ways to get everything you need. The typical dialog box will have the following items:
Input Range - Type the upper left and lower right corner cells. You can only choose adjacent rows and columns. Unless there is a checkbox for grouping data by rows or columns (and there usually is not), all the data is considered as one glop.
Labels - There is sometimes a box you can check off to indicate that the first row of your sheet contains labels. If you have labels in the first row, check this box, and your output MAY be labeled with your label. Then again, it may not.
Output location - New Sheet is the default. Or, type in the cell address of the upper left corner of where you want to place the output in the current sheet. New Worksheet is another option, which I have not tried. Ramifications of this choice are discussed below.
Other items, depending on the analysis.
Output location

The output from each analysis can go to a new sheet within your current Excel file (this is the default), or you can place it within the current sheet by specifying the upper left corner cell where you want it placed. Either way is a bit of a nuisance. If each output is in a new sheet, you end up with lots of sheets, each with a small bit of output. You will want to make this column wide in order to be able to read the labels. But if a simple Frequency output is right underneath, then the column displaying the values being counted, which may just contain small integers, will also be wide.

Results of Analyses

Descriptive Statistics

The quickest way to get means and standard deviations for an entire group is using Descriptives in the Data Analysis tools. You can choose several adjacent columns for the Input Range (in this case the X and Y columns), and each column is analyzed separately. The labels in the first row are used to label the output, and the empty cells are ignored. If you have more, non-adjacent columns you need to analyze, you will have to repeat the process for each group of contiguous columns. The procedure is straightforward, can manage many columns reasonably efficiently, and empty cells are treated properly. Getting the means and standard deviations of X and Y for each treatment group requires the use of Pivot Tables, unless you want to rearrange the data sheet to separate the two groups. Finally, drag X in one more time, leaving it as Count of X. This will give us the average, standard deviation and number of observations in each treatment group for X. Do the same for Y, so we will get the average, standard deviation and number of observations for Y also. This will put a total of six items in the Data box (three for X and three for Y). As you can see, if you want to get a variety of descriptive statistics for several variables, the process will get tedious. A statistical package lets you choose as many variables as you wish for descriptive statistics, whether or not they are contiguous. You can get the descriptive statistics for all the subjects together, or broken down by a categorical variable such as treatment. You can select the statistics you want to see once, and it will apply to all variables chosen.

Correlations

Using the Data Analysis tools, the dialog for correlations is much like the one for descriptives: you can choose several contiguous columns, and get an output matrix of all pairs of correlations. Empty cells are ignored appropriately. The output does NOT include the number of pairs of data points used to compute each correlation (which can vary, depending on where you have missing data), and does not indicate whether any of the correlations are statistically significant. If you want correlations on non-contiguous columns, you would either have to include the intervening columns, or copy the desired columns to a contiguous location. A statistical package would permit you to choose non-contiguous columns for your correlations. The output would tell you how many pairs of data points were used to compute each correlation, and which correlations are statistically significant.
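For comparison, the per-group summary that takes several Pivot Table steps can be sketched in a few lines of Python. Everything here is illustrative: the rows are invented (the data sheet from the original document is not reproduced in this transcription), missing measurements are represented as `None`, and `describe_by_group` is a hypothetical helper, not part of Excel or of any statistical package.

```python
import statistics
from collections import defaultdict


def describe_by_group(rows, group_key, value_key):
    """Return n, mean and sample SD of value_key for each level of group_key.

    Rows where the value is missing (None) are skipped, so n can differ
    between variables. Each group needs at least two values for the SD.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row[value_key]
        if value is not None:
            groups[row[group_key]].append(value)
    return {
        group: {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),
        }
        for group, values in groups.items()
    }


# Hypothetical data sheet: one dict per subject, in order of entry.
rows = [
    {"treatment": 1, "x": 10, "y": 8},
    {"treatment": 2, "x": 12, "y": None},   # Y not measured
    {"treatment": 1, "x": 14, "y": 11},
    {"treatment": 2, "x": 16, "y": 13},
    {"treatment": 1, "x": 12, "y": 9},
    {"treatment": 2, "x": None, "y": 10},   # X not measured
]

x_stats = describe_by_group(rows, "treatment", "x")
y_stats = describe_by_group(rows, "treatment", "y")

for name, stats in (("X", x_stats), ("Y", y_stats)):
    for group in sorted(stats):
        s = stats[group]
        print(f"{name}, treatment {group}: n={s['n']} "
              f"mean={s['mean']:.2f} sd={s['sd']:.2f}")
```

Reporting n next to each mean makes it visible when a missing measurement has reduced the number of observations behind a figure.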
Two-Sample T-test

This test can be used to check whether the two treatment groups differ on the values of either X or Y. In order to do the test you need to enter a cell range for each group. Since the data were not entered by treatment group, we first need to sort the rows by treatment. Be sure to take all the other columns along with treatment, so that the data for each subject remains intact. After the data is sorted, you can enter the range of cells containing the X measurements for each treatment. Do not include the row with the labels, because the second group does not have a label row. Therefore your output will not be labeled to indicate that this output is for X. If you want the output labeled, you have to copy the cells corresponding to the second group to a separate column, and enter a row with a label for the second group. The empty cells are ignored, and other than the problems with labeling the output, the results are correct. A statistical package would do this task without any need to sort the data or copy it to another column, and the output would always be properly labeled to the extent that you provide labels for your variables and treatment groups. It would also allow you to choose more than one variable at a time for the t-test.

Paired t-test

The paired t-test is a method for testing whether the difference between two measurements on the same subject is significantly different from 0. In this example, we wish to test the difference between X and Y measured on the same subject. The important feature of this test is that it compares the measurements within each subject. If you scan the X and Y columns separately, they do not look obviously different. But if you look at each X-Y pair, you will notice that in every case, X is greater than Y. The paired t-test should be sensitive to this difference. In the two cases where either X or Y is missing, it is not possible to compare the two measures on a subject.
Hence, only 8 rows are usable for the paired t-test. When you run the paired t-test on this data, you get a t-statistic of 0. The test does not find any significant difference between X and Y. Looking at the output more carefully, we notice that it says there are 9 observations. As noted above, there should only be 8. It appears that Excel has failed to exclude the observations that did not have both X and Y measurements. To get the correct results, copy X and Y to two new columns and remove the data in the cells that have no value for the other measure. Now re-run the paired t-test. This time the t-statistic is 6. The conclusion is completely different! Of course, this is an extreme example. But the point is that Excel does not calculate the paired t-test correctly when some observations have one of the measurements but not the other. Although it is possible to get the correct result, you would have no reason to suspect the results you get unless you are sufficiently alert to notice that the number of observations is wrong. There is nothing in online help that would warn you about this issue. Apparently the functions and the Data Analysis tools are not consistent in how they deal with missing cells. Nevertheless, I cannot recommend the use of functions in preference to the Data Analysis tools, because the result of using a function is a single number: in this case, the 2-tail probability of the t-statistic. The function does not give you the t-statistic itself or the degrees of freedom.
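The correct handling of incomplete pairs can be sketched as follows: drop every subject that lacks either measurement, then compute the t-statistic from the remaining within-subject differences. The data are invented to mirror the situation described (ten subjects, one missing Y, one missing X, and X greater than Y in every complete pair), and `paired_t` is a hypothetical helper rather than an Excel or package function.

```python
import math
import statistics


def paired_t(xs, ys):
    """Paired t-statistic using only subjects with both measurements.

    Returns (t, degrees_of_freedom, number_of_pairs).
    """
    diffs = [x - y for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1, n


# Hypothetical measurements; None marks a missing value.
x = [11, 14, 9, 16, 12, 15, 10, 13, 17, None]
y = [10, None, 8, 14, 11, 13, 9, 11, 16, 7]

t_stat, df, n_pairs = paired_t(x, y)
print(f"pairs used: {n_pairs}, df: {df}, t = {t_stat:.3f}")
```

Returning the number of pairs alongside the statistic makes the check described above routine: if that count does not match the number of complete rows, the result should not be trusted.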


More information

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet

Standard Deviation and Standard Error Tutorial. This is significantly important. Get your AP Equations and Formulas sheet Standard Deviation and Standard Error Tutorial This is significantly important. Get your AP Equations and Formulas sheet The Basics Let s start with a review of the basics of statistics. Mean: What most

More information

Chapter 12. The One- Sample

Chapter 12. The One- Sample Chapter 12 The One- Sample z-test Objective We are going to learn to make decisions about a population parameter based on sample information. Lesson 12.1. Testing a Two- Tailed Hypothesis Example 1: Let's

More information

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data

Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data TECHNICAL REPORT Data and Statistics 101: Key Concepts in the Collection, Analysis, and Application of Child Welfare Data CONTENTS Executive Summary...1 Introduction...2 Overview of Data Analysis Concepts...2

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES 4 Chapter 2 CHAPTER 2. MEASURING AND DESCRIBING VARIABLES 1. A. Age: name/interval; military dictatorship: value/nominal; strongly oppose: value/ ordinal; election year: name/interval; 62 percent: value/interval;

More information

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months? Medical Statistics 1 Basic Concepts Farhad Pishgar Defining the data Population and samples Except when a full census is taken, we collect data on a sample from a much larger group called the population.

More information

Appendix B Statistical Methods

Appendix B Statistical Methods Appendix B Statistical Methods Figure B. Graphing data. (a) The raw data are tallied into a frequency distribution. (b) The same data are portrayed in a bar graph called a histogram. (c) A frequency polygon

More information

OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010

OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010 OCW Epidemiology and Biostatistics, 2010 David Tybor, MS, MPH and Kenneth Chui, PhD Tufts University School of Medicine October 27, 2010 SAMPLING AND CONFIDENCE INTERVALS Learning objectives for this session:

More information

One-Way Independent ANOVA

One-Way Independent ANOVA One-Way Independent ANOVA Analysis of Variance (ANOVA) is a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment.

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics : Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different

More information

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points.

Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. Part 1. For each of the following questions fill-in the blanks. Each question is worth 2 points. 1. The bell-shaped frequency curve is so common that if a population has this shape, the measurements are

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Analysing and Understanding Learning Assessment for Evidence-based Policy Making Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015 Australian Council for Educational Research Structure

More information

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI Statistics Nur Hidayanto PSP English Education Dept. RESEARCH STATISTICS WHAT S THE RELATIONSHIP? RESEARCH RESEARCH positivistic Prepositivistic Postpositivistic Data Initial Observation (research Question)

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Descriptive statistics

Descriptive statistics CHAPTER 3 Descriptive statistics 41 Descriptive statistics 3 CHAPTER OVERVIEW In Chapter 1 we outlined some important factors in research design. In this chapter we will be explaining the basic ways of

More information

Examining differences between two sets of scores

Examining differences between two sets of scores 6 Examining differences between two sets of scores In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you

More information

Unit 7 Comparisons and Relationships

Unit 7 Comparisons and Relationships Unit 7 Comparisons and Relationships Objectives: To understand the distinction between making a comparison and describing a relationship To select appropriate graphical displays for making comparisons

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Measuring the User Experience

Measuring the User Experience Measuring the User Experience Collecting, Analyzing, and Presenting Usability Metrics Chapter 2 Background Tom Tullis and Bill Albert Morgan Kaufmann, 2008 ISBN 978-0123735584 Introduction Purpose Provide

More information

3.2 Least- Squares Regression

3.2 Least- Squares Regression 3.2 Least- Squares Regression Linear (straight- line) relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these

More information

An Introduction to Research Statistics

An Introduction to Research Statistics An Introduction to Research Statistics An Introduction to Research Statistics Cris Burgess Statistics are like a lamppost to a drunken man - more for leaning on than illumination David Brent (alias Ricky

More information

Two-Way Independent ANOVA

Two-Way Independent ANOVA Two-Way Independent ANOVA Analysis of Variance (ANOVA) a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There

More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

Statisticians deal with groups of numbers. They often find it helpful to use

Statisticians deal with groups of numbers. They often find it helpful to use Chapter 4 Finding Your Center In This Chapter Working within your means Meeting conditions The median is the message Getting into the mode Statisticians deal with groups of numbers. They often find it

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Chapter 1: Introduction to Statistics Variables A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

Gage R&R. Variation. Allow us to explain with a simple diagram.

Gage R&R. Variation. Allow us to explain with a simple diagram. Gage R&R Variation We ve learned how to graph variation with histograms while also learning how to determine if the variation in our process is greater than customer specifications by leveraging Process

More information

Section 3.2 Least-Squares Regression

Section 3.2 Least-Squares Regression Section 3.2 Least-Squares Regression Linear relationships between two quantitative variables are pretty common and easy to understand. Correlation measures the direction and strength of these relationships.

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

MEASURES OF ASSOCIATION AND REGRESSION

MEASURES OF ASSOCIATION AND REGRESSION DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 MEASURES OF ASSOCIATION AND REGRESSION I. AGENDA: A. Measures of association B. Two variable regression C. Reading: 1. Start Agresti

More information

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures

Stats 95. Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures Stats 95 Statistical analysis without compelling presentation is annoying at best and catastrophic at worst. From raw numbers to meaningful pictures Stats 95 Why Stats? 200 countries over 200 years http://www.youtube.com/watch?v=jbksrlysojo

More information

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA PART 1: Introduction to Factorial ANOVA ingle factor or One - Way Analysis of Variance can be used to test the null hypothesis that k or more treatment or group

More information

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu What you should know before you collect data BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Types and levels of study Descriptive statistics Inferential statistics How to choose a statistical test

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose

More information

Chapter 20: Test Administration and Interpretation

Chapter 20: Test Administration and Interpretation Chapter 20: Test Administration and Interpretation Thought Questions Why should a needs analysis consider both the individual and the demands of the sport? Should test scores be shared with a team, or

More information

Statistics is a broad mathematical discipline dealing with

Statistics is a broad mathematical discipline dealing with Statistical Primer for Cardiovascular Research Descriptive Statistics and Graphical Displays Martin G. Larson, SD Statistics is a broad mathematical discipline dealing with techniques for the collection,

More information

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. INTRO TO RESEARCH METHODS: Empirical Knowledge: based on observations. Answer questions why, whom, how, and when. Experimental research: treatments are given for the purpose of research. Experimental group

More information

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,

More information

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis: Section 1.0 Making Sense of Data Statistics: Data Analysis: Individuals objects described by a set of data Variable any characteristic of an individual Categorical Variable places an individual into one

More information

Statistics: Making Sense of the Numbers

Statistics: Making Sense of the Numbers Statistics: Making Sense of the Numbers Chapter 9 This multimedia product and its contents are protected under copyright law. The following are prohibited by law: any public performance or display, including

More information

V. Gathering and Exploring Data

V. Gathering and Exploring Data V. Gathering and Exploring Data With the language of probability in our vocabulary, we re now ready to talk about sampling and analyzing data. Data Analysis We can divide statistical methods into roughly

More information

Statistics: Interpreting Data and Making Predictions. Interpreting Data 1/50

Statistics: Interpreting Data and Making Predictions. Interpreting Data 1/50 Statistics: Interpreting Data and Making Predictions Interpreting Data 1/50 Last Time Last time we discussed central tendency; that is, notions of the middle of data. More specifically we discussed the

More information

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego Biostatistics Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego (858) 534-1818 dsilverstein@ucsd.edu Introduction Overview of statistical

More information

Psychology Research Process

Psychology Research Process Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

More information

Welcome to OSA Training Statistics Part II

Welcome to OSA Training Statistics Part II Welcome to OSA Training Statistics Part II Course Summary Using data about a population to draw graphs Frequency distribution and variability within populations Bell Curves: What are they and where do

More information

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D.

Statistics Guide. Prepared by: Amanda J. Rockinson- Szapkiw, Ed.D. This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical

More information

Before we get started:

Before we get started: Before we get started: http://arievaluation.org/projects-3/ AEA 2018 R-Commander 1 Antonio Olmos Kai Schramm Priyalathta Govindasamy Antonio.Olmos@du.edu AntonioOlmos@aumhc.org AEA 2018 R-Commander 2 Plan

More information

15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA

15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA 15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA Statistics does all kinds of stuff to describe data Talk about baseball, other useful stuff We can calculate the probability.

More information

1) What is the independent variable? What is our Dependent Variable?

1) What is the independent variable? What is our Dependent Variable? 1) What is the independent variable? What is our Dependent Variable? Independent Variable: Whether the font color and word name are the same or different. (Congruency) Dependent Variable: The amount of

More information

Displaying the Order in a Group of Numbers Using Tables and Graphs

Displaying the Order in a Group of Numbers Using Tables and Graphs SIXTH EDITION 1 Displaying the Order in a Group of Numbers Using Tables and Graphs Statistics (stats) is a branch of mathematics that focuses on the organization, analysis, and interpretation of a group

More information

Intro to SPSS. Using SPSS through WebFAS

Intro to SPSS. Using SPSS through WebFAS Intro to SPSS Using SPSS through WebFAS http://www.yorku.ca/computing/students/labs/webfas/ Try it early (make sure it works from your computer) If you need help contact UIT Client Services Voice: 416-736-5800

More information

MODULE S1 DESCRIPTIVE STATISTICS

MODULE S1 DESCRIPTIVE STATISTICS MODULE S1 DESCRIPTIVE STATISTICS All educators are involved in research and statistics to a degree. For this reason all educators should have a practical understanding of research design. Even if an educator

More information

STP226 Brief Class Notes Instructor: Ela Jackiewicz

STP226 Brief Class Notes Instructor: Ela Jackiewicz CHAPTER 2 Organizing Data Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that can be assigned a numerical value or nonnumerical

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0% Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3 Capstone #2: Review of Chapter 4 Capstone #3: Review of

More information

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES CHAPTER SIXTEEN Regression NOTE TO INSTRUCTORS This chapter includes a number of complex concepts that may seem intimidating to students. Encourage students to focus on the big picture through some of

More information

STATISTICS & PROBABILITY

STATISTICS & PROBABILITY STATISTICS & PROBABILITY LAWRENCE HIGH SCHOOL STATISTICS & PROBABILITY CURRICULUM MAP 2015-2016 Quarter 1 Unit 1 Collecting Data and Drawing Conclusions Unit 2 Summarizing Data Quarter 2 Unit 3 Randomness

More information

MTH 225: Introductory Statistics

MTH 225: Introductory Statistics Marshall University College of Science Mathematics Department MTH 225: Introductory Statistics Course catalog description Basic probability, descriptive statistics, fundamental statistical inference procedures

More information