DOWNLOAD PDF SUMMARIZING AND INTERPRETING DATA : USING STATISTICS


Chapter 1 : Summarizing Numerical Data Sets Worksheets

Stem and Leaf Activity Sheets with Answers. Students first create the stem and leaf plot, then use it to answer questions. This is a great way to see how stem and leaf plots help us make sense of data quickly.

Boxplot
A boxplot provides a graphical summary of the distribution of a sample. The boxplot shows the shape, central tendency, and variability of the data.

Interpretation
Use a boxplot to examine the spread of the data and to identify any potential outliers. Boxplots are best when the sample size is sufficiently large.

Skewed data
Examine the shape of your data to determine whether your data appear to be skewed. When data are skewed, the majority of the data are located on the high or low side of the graph. Often, skewness is easiest to detect with a histogram or boxplot. The boxplot with right-skewed data shows wait times. Most of the wait times are relatively short, and only a few wait times are long. The boxplot with left-skewed data shows failure time data. A few items fail immediately, and many more items fail later.

Outliers
Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a boxplot. Try to identify the cause of any outliers. Correct any data-entry errors or measurement errors. Consider removing data values for abnormal, one-time events (also called special causes). Then, repeat the analysis.

CoefVar
The coefficient of variation (CoefVar) is a measure of spread that describes the variation in the data relative to the mean. The coefficient of variation is adjusted so that the values are on a unitless scale. Because of this adjustment, you can use the coefficient of variation instead of the standard deviation to compare the variation in data that have different units or that have very different means.
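To make the definition concrete, here is a minimal Python sketch of the coefficient of variation as the sample standard deviation divided by the mean. The function name and both data sets are hypothetical and chosen only for illustration; some packages report the same ratio multiplied by 100.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / mean

# Hypothetical measurements in different units (illustrative values only).
lengths_mm = [98.0, 100.0, 102.0]   # mean 100, sample standard deviation 2
weights_kg = [4.0, 5.0, 6.0]        # mean 5, sample standard deviation 1

cv_lengths = coefficient_of_variation(lengths_mm)
cv_weights = coefficient_of_variation(weights_kg)
print(f"lengths: CV = {cv_lengths:.0%}")   # 2%
print(f"weights: CV = {cv_weights:.0%}")   # 20%
```

In this made-up example the lengths have the larger standard deviation (2 versus 1) but the smaller coefficient of variation, because their spread is small relative to their mean.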
Interpretation
The larger the coefficient of variation, the greater the spread in the data. For example, you are the quality control inspector at a milk bottling plant that bottles small and large containers of milk. You take a sample of each product and observe that the mean volume of the small containers is 1 cup. Although the standard deviation of the gallon container is five times greater than the standard deviation of the small container, their coefficients of variation support a different conclusion. In other words, although the large container has a greater standard deviation, the small container has much more variability relative to its mean.

For this ordered data, the first quartile Q1 is 9.

Histogram, with normal curve
A histogram divides sample values into many intervals and represents the frequency of data values in each interval with a bar.

Interpretation
Use a histogram to assess the shape and spread of the data. Histograms are best when the sample size is sufficiently large. You can use a histogram of the data overlaid with a normal curve to examine the normality of your data. A normal distribution is symmetric and bell-shaped, as indicated by the curve. It is often difficult to evaluate normality with small samples. A probability plot is best for determining the distribution fit.

Individual value plot
An individual value plot displays the individual values in the sample. Each circle represents one observation. An individual value plot is especially useful when you have relatively few observations and when you also need to assess the effect of each observation.

Interpretation
Use an individual value plot to examine the spread of the data and to identify any potential outliers. Individual value plots are best when the sample size is relatively small. The individual value plot with right-skewed data shows wait times. The individual value plot with left-skewed data shows failure time data.
On an individual value plot, unusually low or high data values indicate possible outliers.

For this ordered data, the interquartile range is 8.

Interpretation
Use the interquartile range (IQR) to describe the spread of the data. As the spread of the data increases, the IQR becomes larger.

Kurtosis
Kurtosis indicates how the peak and tails of a distribution differ from the normal distribution.

Interpretation
Use kurtosis to initially understand general characteristics about the distribution of your data.

Kurtosis value of 0
Normally distributed data establish the baseline for kurtosis. A kurtosis value of 0 indicates that the peak and tails of the data match those of a normal distribution. A kurtosis value that significantly deviates from 0 may indicate that the data are not normally distributed.

Positive kurtosis
A distribution that has a positive kurtosis value has heavier tails and a sharper peak than the normal distribution. For example, data that follow a t-distribution have a positive kurtosis value. In the accompanying plot, the solid line shows the normal distribution, and the dotted line shows a distribution that has a positive kurtosis value.

Negative kurtosis
A distribution with a negative kurtosis value has lighter tails and a flatter peak than the normal distribution. For example, data that follow a beta distribution with first and second shape parameters equal to 2 have a negative kurtosis value. In the accompanying plot, the solid line shows the normal distribution, and the dotted line shows a distribution that has a negative kurtosis value.

Maximum
The maximum is the largest data value.
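The sign convention for kurtosis described above can be checked with a short Python sketch. It uses the plain moment-based definition (fourth central moment divided by the squared variance, minus 3), which is an assumption on my part: statistical packages often apply small-sample corrections, so their reported values may differ. Both data sets are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical samples, chosen only to show the sign of the statistic.
flat = [1, 2, 3, 4, 5]                  # evenly spread, light tails
heavy = [-10, -1, 0, 0, 0, 0, 1, 10]    # most values near 0, two far out

print(excess_kurtosis(flat))    # negative: lighter tails than normal
print(excess_kurtosis(heavy))   # positive: heavier tails than normal
```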

Chapter 2 : ViSta: The Visual Statistics System

Exam 3 (Chapter 7: Summarizing and Interpreting Data: Using Statistics) study guide by sec88 includes 72 questions covering vocabulary, terms and more.

More explicitly, exactly half of the values in the group are smaller than the median, and the other half of the values in the group are greater than the median. If there is an odd number of measurements, the median is simply equal to the middle value of the group when the values are arranged in ascending order. If there is an even number of measurements, as here, the median is equal to the mean of the two middle values (again, when the values are arranged in ascending order). For the "without compost" group, the median is equal to the mean of the 3rd and 4th values, which happen to be 4 and 5: (4 + 5) / 2 = 4.5. Notice that, by definition, three of the values (3, 4, and 4) are less than the median, and the other three values (5, 6, and 8) are greater than the median. What is the median of the "with compost" group?

The mode is the value that appears most frequently in the group of measurements. For the "without compost" group, the mode is 4, because that value appears twice, while all of the other values appear only once. What is the mode of the "with compost" group? It is entirely possible for a group of data to have no mode at all, or for it to have more than one mode. If all values occur with the same frequency (for example, if all values occur only once), then the group has no mode. If more than one value occurs at the highest frequency, then each of those values is a mode. In one example of a group of raw data with two modes, the modes are 26 and 41, since each of those values appears twice, while all the other values appear only once. A data set with two modes is sometimes called "bimodal."

Mean, Median, or Mode: Which Measure Should I Use?
When would you choose to use one in preference to another? The following illustration shows the mean, median, and mode of the "without compost" data sample on a graph. The x-axis shows the number of leaves per plant. The height of each bar (y-axis) shows the number of plants that had a certain number of leaves. Compare the graph with the data in the table, and you will see that all of the raw data values are shown in the graph. This graph shows why the mean, median, and mode are all called measures of central tendency. The data values are spread out across the horizontal axis of the graph, but the mean, median, and mode are all clustered towards the center. Each one is a slightly different measure of what happened "on average" in the experiment. The mode (4) shows which number of leaves per plant occurred most frequently. The mean (5) is the arithmetic average of all the data points.

In general, the mean is the descriptive statistic most often used to describe the central tendency of a group of measurements. Of the three measures, it is the most sensitive, because its value always reflects the contributions of each of the data values in the group. The median and the mode are less sensitive to "outliers," data values at the extremes of a group. Imagine that, for the "without compost" group, the plant with the greatest number of leaves had 11 leaves, not 8. Both the median and the mode would remain unchanged. Check for yourself and confirm that this is true. The mean, however, would now be 5.5. On the other hand, sometimes it is an advantage to have a measure of central tendency that is less sensitive to changes in the extremes of the data. For example, if your data set contains a small number of outliers at one extreme, the median may be a better measure of the central tendency of the data than the mean. If your results involve categories instead of continuous numbers, then the best measure of central tendency will probably be the most frequent outcome (the mode).
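The comparison above can be reproduced with Python's standard `statistics` module. The "without compost" values (3, 4, 4, 5, 6, 8) come from the text; the bimodal list and the survey answers are hypothetical, made up so that 26 and 41 each appear twice and one category wins.

```python
import statistics

without_compost = [3, 4, 4, 5, 6, 8]   # leaves per plant, values from the text

print(statistics.mean(without_compost))    # 5
print(statistics.median(without_compost))  # 4.5
print(statistics.mode(without_compost))    # 4

# Replace the largest value (8) with 11, as in the thought experiment above.
with_outlier = [3, 4, 4, 5, 6, 11]
print(statistics.mean(with_outlier))       # 5.5 (the mean moves)
print(statistics.median(with_outlier))     # 4.5 (the median does not)
print(statistics.mode(with_outlier))       # 4   (the mode does not)

# multimode returns every value tied for the highest frequency.
bimodal = [22, 26, 26, 30, 35, 41, 41, 47]   # hypothetical data
print(statistics.multimode(bimodal))       # [26, 41]

# For categorical results the mode is the natural summary.
answers = ["patch", "gum", "patch", "cold turkey", "patch", "gum"]  # hypothetical survey
print(statistics.mode(answers))            # patch
```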
For example, imagine that you conducted a survey on the most effective way to quit smoking. A reasonable measure of the central tendency of your results would be the method that works most frequently, as determined from your survey. It is important to think about what you are trying to accomplish with descriptive statistics, not just use them blindly. If your data contain more than one mode, then summarizing them with a simple measure of central tendency such as the mean or median will obscure this fact.

First, what are you trying to describe? Second, what does your data look like? The best measure of central tendency depends on both answers.

Groups, or classes of things. Survey results often fall in this category, such as, "What is the most effective way to quit smoking?"

Position on a ranking scale, such as: "The median movie ranking in this survey was 2."

Measures on a linear scale. Here the best measure depends on the shape of the data:

Symmetrical data. The shape of this data is approximately the same on the left and the right side of the graph, so we call this symmetrical data. For symmetrical data, the mean is the best measurement of central tendency.

Skewed data. Notice how the data in this graph is non-symmetrical. The peak of the data is not centered, and the body mass values fall off more sharply on the left of the peak than on the right. When the peak is shifted like this to one side or the other, we call it skewed data. For skewed data, the median is the best choice to measure central tendency.

Bimodal data. Notice how this graph has two peaks. We call data with two prominent peaks bimodal data. In the case of a bimodal distribution, you may have two populations, each with its own separate central tendency.

Multimodal data. Notice how this graph has three peaks and lots of overlap between the tails of the peaks. We call this multimodal data. There is no single central tendency. It is easiest to describe data like this by referring to the graph.

Scattered data. In this case, the data is scattered all over the place. In some cases, this may indicate that you need to collect more data. In this case there is no central tendency.

Range, Variance, and Standard Deviation
Measures of central tendency describe the "average" of a data set. Another important quality to measure is the "spread" of a data set. For example, consider two data sets that both have the same mean, 5. For which data set would you feel more comfortable using the average description of "5"? It would be nice to have another measure to describe the "spread" of a data set. Such a measure could let us know at a glance whether the values in a data set are generally close to or far from the mean. The descriptive statistics that measure the amount of scatter are called measures of dispersion. When added to the measures of central tendency discussed previously, measures of dispersion give a more complete picture of the data set.
We will discuss three such measurements: range, variance, and standard deviation.

Range
The range of a data set is the simplest of the three measures. The range is defined by the smallest and largest data values in the set. The range gives only minimal information about the spread of the data, by defining the two extremes. It says nothing about how the data are distributed between those two endpoints. Two other related measures of dispersion, the variance and the standard deviation, provide a numerical summary of how much the data are scattered.
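As a rough illustration of the three measures, the sketch below compares two hypothetical data sets that share the same mean of 5 but differ in spread. The values are invented for this example and are not the data sets the original text referred to.

```python
import statistics

# Two hypothetical data sets with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 2, 5, 8, 9]

for name, data in (("tight", tight), ("wide", wide)):
    smallest, largest = min(data), max(data)
    print(
        f"{name}: mean={statistics.mean(data)}, "
        f"range={smallest} to {largest} (width {largest - smallest}), "
        f"variance={statistics.variance(data)}, "      # sample variance (n - 1)
        f"std dev={statistics.stdev(data):.2f}"        # square root of the variance
    )
```

Both sets would be summarized as "5 on average", yet the sample variance of the second set (12.5) is 25 times that of the first (0.5), which is exactly the information a measure of dispersion adds.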

Chapter 3 : Summarizing Data

The term statistics refers to the analysis and interpretation of this numerical data. Psychologists use statistics to organize, summarize, and interpret the information they collect.

The median is another measure of central tendency. To get the median you have to order the data from lowest to highest. The median is the number in the middle. If the number of cases is odd, the median is the single middle value; for an even number of cases, the median is the average of the two numbers in the middle. Excel provides a built-in function for it.

The mode is the most frequent value. For example, by age there are more students 19 years old in the sample than in any other group.

The sample variance measures the dispersion of the data from the mean. It is roughly the average squared distance from the mean: the sum of the squared distances divided by n - 1.

The standard deviation indicates how close the data are to the mean. Excel provides a built-in formula for it.

Skewness measures the asymmetry of the distribution. Dividing it by its standard error (SE) gives a rough test for normality in the data. If it is positive, there is more data on the left side of the curve (right skewed; the median and the mode are lower than the mean). A negative value indicates that the mass of the data is concentrated on the right of the curve (the left tail is longer, left skewed; the median and the mode are higher than the mean). A normal distribution has a skew of 0. Skewness can also be estimated with an Excel function.

The common view of kurtosis argues that it measures the peak of a distribution. According to Peter Westfall, that view is not quite correct. High kurtosis may suggest the presence of outliers. Technically speaking, kurtosis focuses more on the tails of the distribution than on the peak, so positive kurtosis indicates more cases in the tails than a normal distribution (leptokurtic), and negative kurtosis indicates fewer cases in the tails (platykurtic). A normal distribution has a kurtosis of 0 given a correction of -3; otherwise it will have a kurtosis of 3.
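The sample variance and skewness described above can be sketched in plain Python. The skewness here uses the adjusted Fisher-Pearson formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations; to my understanding this matches what spreadsheet skewness functions compute, but treat that correspondence as an assumption. All data values are hypothetical.

```python
import math

def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (needs at least 3 values)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    s = math.sqrt(sample_variance(values))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Hypothetical samples chosen to show the sign of the statistic.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [1, 2, 3, 4, 10]   # long tail to the right
left_skewed = [1, 7, 8, 9, 10]    # long tail to the left

print(sample_variance(symmetric))      # 10.0
print(sample_skewness(symmetric))      # approximately 0
print(sample_skewness(right_skewed))   # positive
print(sample_skewness(left_skewed))    # negative
```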
Excel also provides a built-in function for kurtosis.

Exploring data using pivot tables
To explore the data by groups, you can sort the columns for the variables you want (for example gender, major, or country). You can also use pivot tables. In step 2 of the pivot table wizard, select the range of all values. On the right side of the wizard layout you can see the list of all variables in the data. The resulting table is a crosstabulation between gender and major. Each cell represents the average SAT score for students of a given gender and major. For example, female students with an econ major have the average SAT score shown in cell B5 of the picture, while male students with an econ major have the average shown in cell B6. Overall, econ major students have the average SAT score shown in cell B7. The table also gives the overall average SAT score for female students in this sample.
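For readers working outside a spreadsheet, the same kind of crosstabulation can be sketched in a few lines of Python. The records, field names, and the `pivot_mean` helper are all hypothetical; they only mimic the gender-by-major average described above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records; every name and score is made up for illustration.
students = [
    {"gender": "Female", "major": "Econ", "sat": 1800},
    {"gender": "Female", "major": "Econ", "sat": 1900},
    {"gender": "Male", "major": "Econ", "sat": 1700},
    {"gender": "Male", "major": "Econ", "sat": 1750},
    {"gender": "Female", "major": "Math", "sat": 2000},
    {"gender": "Male", "major": "Math", "sat": 1900},
    {"gender": "Male", "major": "Math", "sat": 2100},
]

def pivot_mean(rows, row_key, col_key, value_key):
    """Average value_key for every (row_key, col_key) combination."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[row_key], row[col_key])].append(row[value_key])
    return {key: mean(values) for key, values in cells.items()}

table = pivot_mean(students, "gender", "major", "sat")
for (gender, major), average in sorted(table.items()):
    print(f"{gender:<6} {major:<5} {average:.1f}")
```

Each printed row corresponds to one cell of the pivot table: one gender, one major, and the average score for that combination.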

Chapter 4 : Interpret all statistics and graphs for Descriptive Statistics - Minitab Express

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. Descriptive statistics are methods of organizing, summarizing and presenting data in an informative way (describing raw data with a chart or graph).

A specific quantile or percentile is a value in the data set that holds a specific percentage of the values at or below it. The median is the 50th percentile, the third quartile is the 75th percentile, and the maximum is the 100th percentile. A box-whisker plot is a graphical display of these percentiles. The horizontal lines represent, from the top, the maximum, the third quartile, the median (also indicated by the dot), the first quartile, and the minimum. A box-whisker plot is meant to convey the distribution of a variable at a quick glance.

Recall that in the full sample we determined that there were outliers both at the low and the high end (see the earlier table). In Figure 12 the outliers are displayed as horizontal lines at the top and bottom of the distribution. At the low end of the distribution, there are 5 values that are considered outliers. At the high end of the distribution, there are 12 values that are considered outliers. The "whiskers" of the plot (boldfaced horizontal brackets) are the limits we determined for detecting outliers.

Figure 13 shows side-by-side box-whisker plots of the distributions of weights, in pounds, for men and women in the Framingham Offspring Study. The figure clearly shows a shift in the distributions, with men having much higher weights. In fact, the 25th percentile of the weights in men is approximately equal to the 75th percentile in women. There are many outliers at the high end of the distribution among both men and women. There are two outlying low values among men. There are again many outliers in the distributions in both men and women.
However, when taking height into account by comparing body mass index instead of comparing weights alone, we see that the most extreme outliers are among the women. Some statistical computing packages use their own rules to determine outliers.

Summary
The first important aspect of any statistical analysis is an appropriate summary of the key analytic variables. This involves first identifying the type of variable being analyzed. This step is extremely important, as the appropriate numerical and graphical summaries depend on the type of variable being analyzed. Variables are dichotomous, ordinal, categorical or continuous. The best numerical summaries for dichotomous, ordinal and categorical variables involve relative frequencies. The best numerical summaries for continuous variables include the mean and standard deviation or the median and interquartile range, depending on whether or not there are outliers in the distribution. In each pair, the first statistic summarizes central tendency (also called location) and the second summarizes dispersion. The best graphical summary for dichotomous and categorical variables is a bar chart, and the best graphical summary for an ordinal variable is a histogram. Both bar charts and histograms can be designed to display frequencies or relative frequencies, with the latter being the more popular display. Box-whisker plots provide a very useful and informative summary for continuous variables. Box-whisker plots are also useful for comparing the distributions of a continuous variable among mutually exclusive groups. The following table summarizes key statistics and graphical displays organized by variable type.
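The exact outlier limits used for the figures are not reproduced in this text. As a sketch under that caveat, the code below applies one widely used convention, Tukey's rule, which flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. The weights are hypothetical, and because packages compute quartiles in slightly different ways, borderline points may be classified differently elsewhere.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical weights in pounds (illustrative values only).
weights = [150, 155, 160, 162, 165, 168, 170, 175, 180, 310]

low, high = tukey_fences(weights)
outliers = [w for w in weights if w < low or w > high]
print(f"limits: {low} to {high}")   # limits: 132.5 to 202.5
print(f"outliers: {outliers}")      # outliers: [310]
```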

Chapter 5 : Statistics science blog.quintoapp.com

Descriptive statistics are useful for describing the basic features of data, for example, the summary statistics for the scale variables and measures of the data. In a research study with large data, these statistics may help us to manage the data and present it in a summary table.

The first step of data analysis is to accurately summarize all of this data, both graphically and numerically, so that we can understand what the data reveals. To be able to use and interpret the data correctly is essential to making informed decisions. For instance, when you see a survey of opinion about a certain TV program, you may be interested in the proportion of those people who indeed like the program. In this unit, you will learn about descriptive statistics, which are used to summarize and display data. After completing this unit, you will know how to present your findings once you have collected data.

For example, suppose you want to buy a new mobile phone with a particular type of camera. Suppose you are not sure about the prices of any of the phones with this feature, so you access a website that provides you with a sample data set of prices, given your desired features. Looking at all of the prices in a sample can sometimes be confusing. A better way to compare this data might be to look at the median price and the variation of prices. The median and variation are two of several ways that you can describe data. You can also graph the data so that it is easier to see what the price distribution looks like. In this unit, you will study precisely this; namely, you will learn both numerical and graphical ways to describe and display your data. You will understand the essentials of calculating common descriptive statistics for measuring center, variability, and skewness in data. You will learn to calculate and interpret these measurements and graphs.
Descriptive statistics are, as their name suggests, descriptive. They do not generalize beyond the data considered. Descriptive statistics illustrate what the data shows. Numerical descriptive measures computed from data are called statistics. Numerical descriptive measures of the population are called parameters. Inferential statistics can be used to generalize the findings from sample data to a broader population. Completing this unit should take you approximately 22 hours.

Elements of Probability and Random Variables
Probabilities affect our everyday lives. In this unit, you will learn about probability and its properties, how probability behaves, and how to calculate and use it. You will study the fundamentals of probability and will work through examples that cover different types of probability questions. These basic probability concepts will provide a foundation for understanding more statistical concepts, for example, interpreting polling results. Though you may have already encountered concepts of probability, after this unit, you will be able to formally and precisely predict the likelihood of an event occurring given certain constraints.

Probability theory is a discipline that was created to deal with chance phenomena. For instance, before getting a surgery, a patient wants to know the chances that the surgery might fail; before taking medication, you want to know the chances that there will be side effects; before leaving your house, you want to know the chance that it will rain today. Probability is a measure of likelihood that takes on values between 0 and 1, inclusive, with 0 representing impossible events and 1 representing certainty. The chances of events occurring fall between these two values. The skill of calculating probability allows us to make better decisions.

We will also talk about random variables. A random variable describes the outcomes of a random experiment.
A statistical distribution describes the number of times each possible outcome occurs in a sample. The values of a random variable can vary with each repetition of an experiment. Intuitively, a random variable, summarizing a certain chance phenomenon, takes on values with certain probabilities. A random variable can be classified as being either discrete or continuous, depending on the values it assumes. Suppose you count the number of people who go to a coffee shop during a fixed period starting at 4 p.m., and you also record how long each of them waits. In this case, the number of people is an example of a discrete random variable and the amount of waiting time they spend is an example of a continuous random variable. Completing this unit should take you approximately 25 hours.

Sampling Distributions
The concept of sampling distribution lies at the very foundation of statistical inference. It is best to introduce sampling distribution using an example. Suppose you want to estimate a parameter of a population, say the population mean. There are two natural estimators: the sample mean and the sample median. In particular, for a sample of even size n, the median is the mean of the middle two numbers. But which one is better, and in what sense? This involves repeated sampling, and you want to choose the estimator that would do better on average. It is clear that different samples may give different sample means and medians; some of them may be closer to the truth than the others. Consequently, we cannot compare these two sample statistics or, in general, any two sample statistics on the basis of their performance with a single sample. Instead, you should recognize that sample statistics are themselves random variables; therefore, sample statistics have frequency distributions that take into account all possible samples. In this unit, you will study the sampling distribution of several sample statistics. This unit will show you how the central limit theorem can help to approximate sampling distributions in general. Completing this unit should take you approximately 15 hours.

Estimation with Confidence Intervals
In this unit, you will learn how to use the central limit theorem and confidence intervals, the latter of which enable you to estimate unknown population parameters. The central limit theorem provides us with a way to make inferences from samples of non-normal populations. This theorem states that, given any population with finite variance, as the sample size increases, the sampling distribution of the mean approaches a normal distribution. This powerful theorem allows us to assume that, given a large enough sample, the sampling distribution of the mean will be approximately normal. You will also learn about confidence intervals, which provide you with a way to estimate a population parameter. Instead of giving just a one-number estimate of a parameter, a confidence interval gives a range of likely values for it.
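A small simulation can make both ideas tangible. The sketch below draws repeated samples from a right-skewed (exponential) population whose true mean is 1, shows that the sample means cluster more tightly as the sample size grows, and then builds an approximate 95% interval from a single sample. The population, the sample sizes, the seed, and the use of the normal multiplier 1.96 are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_means(sample_size, repetitions=2000):
    """Means of repeated samples from an exponential population with mean 1."""
    means = []
    for _ in range(repetitions):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

small = sample_means(5)
large = sample_means(50)

# The spread of the sampling distribution shrinks roughly like 1 / sqrt(n).
print(f"n=5:  std dev of sample means = {statistics.stdev(small):.3f}")
print(f"n=50: std dev of sample means = {statistics.stdev(large):.3f}")

# Approximate 95% confidence interval for the mean from one sample of 100 values.
sample = [random.expovariate(1.0) for _ in range(100)]
center = statistics.fmean(sample)
half_width = 1.96 * statistics.stdev(sample) / math.sqrt(len(sample))
print(f"approximate 95% interval: {center - half_width:.2f} to {center + half_width:.2f}")
```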
This is useful, because point estimates will vary from sample to sample, so an interval with a certain confidence level is better than a single point estimate. After completing this unit, you will know how to construct such confidence intervals and how to choose the level of confidence. Completing this unit should take you approximately 10 hours.

Hypothesis Test
A hypothesis test involves collecting and evaluating data from a sample. The data gathered and evaluated are then used to make a decision as to whether or not the data support the claim that is made about the population. This unit will teach you how to conduct hypothesis tests and how to identify and differentiate between the errors associated with them. Many times, you need answers to questions in order to make efficient decisions. The process of hypothesis testing is a way of decision-making. In this unit, you will learn to establish your assumptions through null and alternative hypotheses. The null hypothesis is the hypothesis that is assumed to be true and the hypothesis you hope to nullify, while the alternative hypothesis is the research hypothesis that you claim to be true. This means that you need to conduct the correct tests to be able to reject, or fail to reject, the null hypothesis. You will learn how to compare sample characteristics to see whether there is enough evidence to reject the null hypothesis. Completing this unit should take you approximately 12 hours.

Linear Regression
In this unit, we will discuss situations in which the mean of a population, treated as a variable, depends on the value of another variable. One of the main reasons why we conduct such analyses is to understand how two variables are related to each other. The most common type of relationship is a linear relationship. For example, you may want to know what happens to one variable when you increase or decrease the other variable.
You want to answer questions such as, "Does one variable increase as the other increases, or does the variable decrease?" In this unit, you will also learn to measure the degree of a relationship between two or more variables. Both correlation and regression are measures for comparing variables. Correlation quantifies the strength of a relationship between two variables and is a measure of existing data. On the other hand, regression is the study of the strength of a linear relationship between an independent and a dependent variable, and it can be used to predict the value of the dependent variable when the value of the independent variable is known.
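The distinction between the two can be sketched with plain Python: the Pearson correlation coefficient measures the strength of the linear relationship, while the least-squares line is what you would use to predict. The function names and the hours/scores data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: hours studied (independent) and test score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r = pearson_r(hours, scores)
slope, intercept = least_squares_line(hours, scores)
print(f"correlation r = {r:.3f}")
print(f"fitted line: score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score after 6 hours: {intercept + slope * 6:.1f}")
```

With these made-up numbers the fitted line is score = 47.7 + 4.1 * hours, so the prediction for 6 hours is 72.3.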

Chapter 6 : Interpreting quartiles (practice) Khan Academy

Part Two: Preliminary Skills Needed for Conducting Research
You have already familiarized yourself with the major sources and locations of published research. That was the first phase in learning to use the library profitably.

Remember that the quality of the output depends on the quality of the input. Garbage in, garbage out. Data scientists spend much of their time on data preparation before they jump into modelling, because understanding, generating and selecting useful features impacts model performance. It also helps data scientists check the assumptions required for fitting models. Depending on the size and type of data, understanding and interpreting data sets can be challenging. What can be done? Use different exploratory data analysis and visualization techniques to get a better understanding. This includes summarizing the main data set characteristics, finding representative or critical points, and discovering relevant features. After gaining an overall understanding of the data set, you can think about which observations and features to use in modeling.

Summary statistics with visualization
Summary statistics help to analyze information about the sample data. They indicate something about the continuous (interval) and discrete (nominal) variables in the data set. Analyze those variables individually or together. The distribution of feature values across different features can be compared, as can feature statistics for training and test data sets. This helps uncover differences between them. Be careful about summary statistics. Excessive trust in summary statistics can hide problems in the data set. Consider using additional techniques for a full understanding.

Example-based explanations
Assume the data set has millions of observations with thousands of variables.
One approach to solve this problem is to use example-based explanations: techniques that can help pick important observations and dimensions. They can help interpret highly complex big data sets with different distributions. The techniques available include finding observations and dimensions to characterize, to criticize and to distinguish the data set groups.

As humans, we usually use representative examples from the data for categorization and decision making. Those examples, usually called prototypes, are observations that best describe data set categories. They can be used to interpret categories, since it is hard to make interpretations using all the observations in a certain category. Finding prototypes alone is not sufficient to understand the data, since it overgeneralizes. We need to show exceptions (criticisms) to the rules. Those observations can be considered as minority observations that are very different from the prototype but still belong in the same category. In the illustrations, robot pictures in each category consist of robots with different head and body shapes. Robots in costumes can also belong to one of those categories, although they can be very different from a typical robot picture. Those pictures are needed to understand the data, since they are important minorities.

Finding representatives may not always be enough. If the number of features is high, it will still be hard to understand the selected observations. This is because humans cannot comprehend long and complicated explanations. The explanations need to be simple. The most important features for those selected observations must be considered. Subspace representation is a solution to that problem. Using the prototype and subspace representation helps interpretability. For that, find distinguishing dimensions in the data.
A mind the gap model (MGM) combines extractive and selective approaches and reports a global set of distinguishable dimensions to assist with further exploration. In the example above, by looking at the features extracted from different robot pictures, we can say that the shape of the head is a distinguishing dimension.

Embedding techniques

An embedding is a mapping from discrete values, such as words or observations, to vectors. Different embedding techniques help visualize a lower-dimensional representation. Embeddings can have hundreds of dimensions. The common way to understand them is to project them into two or three dimensions. They are useful for many things:

Use them to explore local neighborhoods. It may help to explore the closest points to a given point to make sure that they are related to each other. Select those points and do further analysis.
Use them to understand the behavior of a model.
Use them to analyze the global structure, seeking groups of points. This helps find clusters and outliers.

There are many methods for obtaining embeddings, including:

Principal component analysis (PCA): This is an effective algorithm for reducing the dimensionality of data, especially if strong linear relationships exist among the variables. It can be used to highlight the variations and eliminate dimensions. The principal components that remain after the leading ones account for trivial amounts of variance and should not be retained for interpretability and analysis.
T-distributed stochastic neighbor embedding (t-SNE): It is nonlinear and nondeterministic, and it allows the creation of 2D or 3D projections. t-SNE finds structures that other methods may miss. While preserving local structure, it may distort global structure. If more information is needed about t-SNE, check out a great article at distill.

Topological data analysis (TDA)

Topology studies geometric features that are preserved when we deform an object without tearing it. Topological data analysis provides tools to study the geometric features of data using topology. This includes detecting and visualizing features, and the statistical measures related to them. Geometric features can be distinct clusters, loops, and tendrils in the data. Mapper algorithms in TDA are useful for data visualization and clustering. Topological networks of a data set can be created in which the nodes are groups of similar observations and an edge connects two nodes if they have an observation in common. If there is a loop in this network, the conclusion is that a pattern occurs periodically.

Conclusion

When it comes to understanding and interpreting data, there is no one solution that fits all. Pick the one that best meets your needs.

Ilknur Kaynar Kabul is a scientist and manager at SAS, working at the intersection of computer science, statistics and optimization. Her work involves building scalable machine learning algorithms that help solve big data problems. She holds a doctorate of computer science from the University of North Carolina.
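Returning to the principal component description earlier in this chapter, the claim that the trailing components account for trivial amounts of variance can be checked on a small example. The sketch below is an illustration only: it handles the two-variable case, where the eigenvalues of the 2x2 sample covariance matrix have a closed form, and the data values and function name are hypothetical.

```python
import math
import statistics

def pca_2d_variances(xs, ys):
    """Return the variances along the two principal components, largest first.

    These are the eigenvalues of the 2x2 sample covariance matrix
    [[a, b], [b, c]], computed with the closed-form formula.
    """
    a = statistics.variance(xs)
    c = statistics.variance(ys)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return half_trace + spread, half_trace - spread

# Hypothetical data with a strong linear relationship between the two variables.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

first, second = pca_2d_variances(xs, ys)
explained = first / (first + second)
print(f"share of variance on the first component: {explained:.4f}")
```

When the variables are strongly linearly related, as here, almost all of the variance falls on the first component, which is why the second one can be dropped with little loss.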

Chapter 7 : Course: MA Introduction to Statistics

The field of statistics provides principles and methods for collecting, summarizing, and analyzing data, and for interpreting the results. You use statistics to describe data and make inferences. Then, you use the inferences to improve processes and products. Then, you examine the interval plot, individual value plot, and boxplot together to assess the equality of the means.

Interpret the residual plots

Use residual plots, which are available with many statistical commands, to verify statistical assumptions.

Normal Probability Plot: Use this plot to detect nonnormality. Points that approximately follow a straight line indicate that the residuals are normally distributed.
Histogram: Use this plot to detect multiple peaks, outliers, and nonnormality. Look for a normal histogram, which is approximately symmetric and bell-shaped.
Versus Fits: Use this plot to detect nonconstant variance, missing higher-order terms, and outliers. Look for residuals that are scattered randomly around zero.
Versus Order: Use this plot to detect the time dependence of the residuals. Inspect the plot to ensure that the residuals display no obvious pattern.

For the shipping data, the four-in-one residual plots indicate no violations of statistical assumptions.

Note: In Minitab, you can display each of the residual plots on a separate page.

Interpret the interval plot, individual value plot, and boxplot

Examine the interval plot, individual value plot, and boxplot. Each graph indicates that the delivery time varies by shipping center, which is consistent with the histograms from the previous chapter. The boxplot for the Eastern shipping center has an asterisk. The asterisk identifies an outlier. This outlier is an order that has an unusually long delivery time. Examine the interval plot again. Hold the pointer over the points on the graph to view the means.
The interval plot shows that the Western shipping center has the fastest mean delivery time 2. The Tukey confidence intervals show the following pairwise comparisons:

Eastern shipping center mean minus Central shipping center mean
Western shipping center mean minus Central shipping center mean
Western shipping center mean minus Eastern shipping center mean

Hold the pointer over the points on the graph to view the middle, upper, and lower estimates. The interval for the Eastern minus Central comparison is 0. That is, the mean delivery time of the Eastern shipping center minus the mean delivery time of the Central shipping center is between 0. You interpret the other Tukey confidence intervals similarly. Also, notice the dashed line at zero. If an interval does not contain zero, the corresponding means are significantly different. Here, no interval contains zero; therefore, all the shipping centers have significantly different average delivery times.

Minitab provides detailed information about the Session window output and graphs for most statistical commands. On the Standard toolbar, click the Help button. Save all your work in a Minitab project. Navigate to the folder that you want to save your files in. In File name, enter.
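The rule for reading the Tukey confidence intervals earlier in this chapter (an interval that excludes zero indicates significantly different means) can be written as a small check. The interval endpoints below are hypothetical placeholders, because the actual values are truncated in this transcription, and the function name is illustrative rather than anything provided by Minitab.

```python
# Hypothetical Tukey confidence intervals (lower, upper) for the difference
# in mean delivery time, in days. The real endpoints are not given here.
intervals = {
    "Eastern - Central": (0.2, 1.1),
    "Western - Central": (-1.9, -0.8),
    "Western - Eastern": (-2.6, -1.4),
}

def differs_significantly(lower, upper):
    """Return True when the interval excludes zero."""
    return not (lower <= 0.0 <= upper)

results = {pair: differs_significantly(lower, upper)
           for pair, (lower, upper) in intervals.items()}

for pair, significant in results.items():
    verdict = "significantly different" if significant else "not significantly different"
    print(f"{pair}: {verdict}")
```

With these placeholder intervals every comparison excludes zero, which mirrors the conclusion in the text that all three shipping centers differ.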

Chapter 8 : Summarizing and Interpreting Data Sets Worksheets

Join Curt Frye for an in-depth discussion in this video, Summarizing data using descriptive statistics, part of Excel Business Statistics.

Descriptive statistics

Descriptive statistics are tabular, graphical, and numerical summaries of data. The purpose of descriptive statistics is to facilitate the presentation and interpretation of data. Most of the statistical presentations appearing in newspapers and magazines are descriptive in nature. Univariate methods of descriptive statistics use data to enhance the understanding of a single variable; multivariate methods focus on using statistics to understand the relationships among two or more variables. To illustrate methods of descriptive statistics, the previous example in which data were collected on the age, gender, marital status, and annual income of individuals will be examined.

Tabular methods

The most commonly used tabular summary of data for a single variable is a frequency distribution. A frequency distribution shows the number of data values in each of several nonoverlapping classes. Another tabular summary, called a relative frequency distribution, shows the fraction, or percentage, of data values in each class. The most common tabular summary of data for two variables is a cross tabulation, a two-variable analogue of a frequency distribution. For a qualitative variable, a frequency distribution shows the number of data values in each qualitative category. For instance, the variable gender has two categories: male and female. Thus, a frequency distribution for gender would have two nonoverlapping classes to show the number of males and females. A relative frequency distribution for this variable would show the fraction of individuals that are male and the fraction of individuals that are female. Constructing a frequency distribution for a quantitative variable requires more care in defining the classes and the division points between adjacent classes.
For instance, if the age data of the example above ranged from 22 to 78 years, the following six nonoverlapping classes could be used: 20-29, 30-39, 40-49, 50-59, 60-69, and 70-79. A frequency distribution would show the number of data values in each of these classes, and a relative frequency distribution would show the fraction of data values in each. A cross tabulation is a two-way table with the rows of the table representing the classes of one variable and the columns of the table representing the classes of another variable. To construct a cross tabulation using the variables gender and age, gender could be shown with two rows, male and female, and age could be shown with six columns corresponding to the age classes 20-29, 30-39, 40-49, 50-59, 60-69, and 70-79. The entry in each cell of the table would specify the number of data values with the gender given by the row heading and the age given by the column heading. Such a cross tabulation could be helpful in understanding the relationship between gender and age.

Graphical methods

A number of graphical methods are available for describing data. A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency distribution. Labels for the categories of the qualitative variable are shown on the horizontal axis of the graph. A bar above each label is constructed such that the height of each bar is proportional to the number of data values in the category. A bar graph of the marital status for the individuals in the above example is shown in Figure 1. There are 4 bars in the graph, one for each class. A pie chart is another graphical device for summarizing qualitative data. The size of each slice of the pie is proportional to the number of data values in the corresponding class. A pie chart for the marital status of the individuals is shown in Figure 2.

Figure 2: A pie chart for the marital status of individuals.
A histogram is the most common graphical presentation of quantitative data that have been summarized in a frequency distribution. The values of the quantitative variable are shown on the horizontal axis. A rectangle is drawn above each class such that the base of the rectangle is equal to the width of the class interval and its height is proportional to the number of data values in the class.
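The tabular summaries described in this chapter (frequency distribution, relative frequency distribution, and cross tabulation) can be sketched as follows. The records are hypothetical, since the raw data of the example are not listed, and the helper name `age_class` is an assumption made for illustration. The frequency counts are also what a histogram would draw as rectangle heights over the ten-year classes.

```python
from collections import Counter

# Hypothetical (gender, age) records standing in for the example data.
people = [("male", 23), ("female", 35), ("female", 41), ("male", 58),
          ("female", 62), ("male", 78), ("female", 29), ("male", 44)]

def age_class(age):
    """Map an age to its ten-year class label, for example 35 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Frequency distribution: number of data values in each age class.
frequency = Counter(age_class(age) for _, age in people)

# Relative frequency distribution: fraction of data values in each class.
relative = {cls: count / len(people) for cls, count in frequency.items()}

# Cross tabulation: counts for each (gender, age class) cell.
crosstab = Counter((gender, age_class(age)) for gender, age in people)

for cls in sorted(frequency):
    print(f"{cls}: count={frequency[cls]}, fraction={relative[cls]:.3f}, "
          f"male={crosstab[('male', cls)]}, female={crosstab[('female', cls)]}")
```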

Chapter 9 : Descriptive Statistics Excel/Stata

Summarize, represent, and interpret data on two categorical and quantitative variables. Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies).

Science: Science is based on the empirical method for making observations, that is, for systematically obtaining information. It consists of methods for making observations.
Observations: Observations are the basic empirical "stuff" of science.
Statistics: Statistics is a set of methods and rules for organizing, summarizing and interpreting information. The methods and rules enable scientific researchers to describe and analyze the observations they have made. Statistical methods are tools for science. Science consists of methods for making observations; statistics consists of methods for describing and analyzing the observations. We will also refer to populations of scores.
Samples: A sample is a set of individuals selected from a population, usually intended to represent the population in a study. We will also refer to samples of scores. The data we gathered in class are a "sample" of scores obtained with a sample of individuals. The population we sampled from is the population of UNC undergraduates.
Parameters: A parameter is a value, usually a numerical value, that describes a population. A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population.
Statistics: A statistic is a value, usually a numerical value, that describes a sample. A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample. Here are some "statistics" computed from our sample of data:
Data: Data (plural) are measurements or observations. A data set is a collection of measurements or observations.
A datum (singular) is a single measurement or observation and is commonly called a data value, a score, or a raw score.

Descriptive Statistics: Descriptive statistics are statistical procedures used to summarize, organize and simplify data. The term also names the branch of statistical activity that focuses on the use of such procedures. These procedures are the focus of chapters 1 through 5.
Statistical Visualization: Recently developed computational statistical procedures used to visually summarize, organize and simplify data. The statistical system we are using is named ViSta, for "Visual Statistics", because it includes statistical visualization. A statistical visualization of our data is shown below. Higher satisfaction is associated with higher GPA.
Exploratory Statistics: The process of exploring data by using descriptive and visualization methods to "see what the data seem to say". Also, the branch of statistics that focuses on "seeing what the data seem to say" (Tukey, 19??).
Inferential Statistics: Inferential statistics consist of techniques that allow us to study samples and then to make generalizations about the populations from which the samples were selected. These procedures are the focus of chapters 8 through the remainder of the text. The groundwork for statistical inference is laid in chapters 6 and 7.
Sampling Error: Sampling error is the discrepancy, or amount of error, that exists between a sample statistic and the corresponding population parameter.

The Scientific Method and the Design of Experiments

Science attempts to discover orderliness in the universe, that is, to discover regularity in changes. Something that can change is called a variable.

Variables: A variable is a characteristic or condition that changes or has different values for different individuals. In the data we gathered, the variables include "Gender", "Age", etc. A constant is a characteristic or condition that does not vary, and is the same for every individual.
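The definitions of parameter, statistic, and sampling error given above can be illustrated with a short simulation. This is a sketch under stated assumptions: the population of scores is generated artificially (it is not the class data mentioned in the text), and the sample size of 25 is an arbitrary choice.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores.
population = [random.gauss(3.0, 0.5) for _ in range(10_000)]
parameter = statistics.mean(population)   # describes the population

# A sample of 25 individuals drawn from that population.
sample = random.sample(population, 25)
statistic = statistics.mean(sample)       # describes the sample

# Sampling error: discrepancy between the statistic and the parameter.
sampling_error = statistic - parameter

print(f"population mean (parameter): {parameter:.3f}")
print(f"sample mean (statistic):     {statistic:.3f}")
print(f"sampling error:              {sampling_error:+.3f}")
```

Drawing a different sample changes the statistic and therefore the sampling error, while the parameter stays fixed.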
The Correlational Method: The scientific method in which two or more variables are observed without manipulation i. The correlational method cannot establish cause-and-effect: correlation is not causation! The data we gathered are an example of the correlational method.
The Experimental Method: The scientific method which can establish a cause-and-effect relationship between two or more variables. The researcher manipulates one variable and observes what happens to the other. More than one variable may be manipulated or observed. To correctly establish cause-and-effect, the researcher must exercise some control over the experimental situation to ensure that no other variable influences the relationship being watched. The experimental conditions must be identical, other than differing on values of the manipulated variable.
Independent Variable (also called the predictor variable): The variable which is manipulated by the researcher.
Dependent Variable (also called the response variable): The variable which is observed by the researcher for changes in order to assess the effect of the treatment. The treatment is the manipulation of the predictor variable.
Confounding Variable: An uncontrolled variable that is unintentionally allowed to vary systematically with the independent variable. It confounds the results (bad, bad, bad!).
The control group: This is a condition of the independent variable that does not receive the experimental treatment. Usually, the control group receives either no treatment or a placebo treatment.
The experimental group: This is a condition of the independent variable that does receive an experimental treatment. There may be several experimental groups.
The Quasi-Experimental Method: Examines differences between pre-existing groups of subjects such as men vs.
Hypotheses: A hypothesis is a prediction about the outcome of an experiment. In experimental research, a hypothesis makes a prediction about how the manipulation of the independent (predictor) variable will affect the dependent (response) variable.

Measurement

Data are measurements of observations which involve categorizing, ordering or using numbers to characterize amount. Several levels of measurement are involved. These in turn determine what statistics can be computed. Measurements may also be discrete or continuous.

Scales (Levels of Measurement)

Nominal: The nominal level of measurement labels observations so that they fall into different categories. Football jersey numbers and home street addresses are common examples. In ViSta, nominal variables are called "Category" variables.
Ordinal: The ordinal level of measurement consists of categories that are ordered in a sequence. Order of finish in a race is a common example. In ViSta, ordinal variables are called "Ordinal" variables.
Interval: The interval level of measurement consists of ordered categories where all of the categories are intervals of exactly the same size. Temperature is a common example. Here, equal differences between numbers reflect equal differences in magnitude of the observed variable.
Ratio: The ratio level of measurement is an interval scale with an absolute zero point. Length and weight are common examples. Here, ratios of numbers reflect ratios of variable magnitude.
In ViSta, interval and ratio variables are called "Numeric" variables.

Discrete and Continuous Variables

Discrete: A discrete variable has separate, indivisible categories. No values can exist in between two neighboring categories.
Continuous: A continuous variable has an infinite number of possible values falling between any two observed values.

Mathematical Notation

In statistical calculations you will constantly be required to add a set of values to find a specific total. We use algebraic expressions to represent the values being added. For example, X means "Scores on a Variable." Thus, we write

Note that the order of operations is as follows:
All calculations within parentheses are done first.
Squaring, multiplying, and dividing are done second, and should be completed in order from left to right.
Adding and subtracting (including summation) are third, and should be completed in order from left to right.

The following term, which is called the "squared sum", works as shown:
Because of the order of operations, the following term, which is called "the sum of squares", works as shown:
Consider how the following summation equation works:
On the other hand, the next summation equation works differently:
Finally, consider how this last summation equation works:
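The difference between the "squared sum" and the "sum of squares" named above follows directly from the order of operations, and a small computation makes it concrete. The scores are hypothetical; in standard summation notation the two terms are written (ΣX)² and ΣX².

```python
# Hypothetical scores on a variable X.
scores = [3, 1, 4, 2]

# ΣX: add the scores.
sum_x = sum(scores)

# (ΣX)², the squared sum: the parentheses come first, so add, then square the total.
squared_sum = sum(scores) ** 2

# ΣX², the sum of squares: squaring comes before summation, so square each score, then add.
sum_of_squares = sum(x ** 2 for x in scores)

print("sum of X:      ", sum_x)
print("squared sum:   ", squared_sum)
print("sum of squares:", sum_of_squares)
```

The two results differ because the squaring is applied at different stages: once to the total in the first case, and once to each score in the second.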

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 13 & Appendix D & E (online) Plous Chapters 17 & 18 - Chapter 17: Social Influences - Chapter 18: Group Judgments and Decisions Still important ideas Contrast the measurement

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Chapter 1: Introduction to Statistics Variables A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

Introduction to Statistical Data Analysis I

Introduction to Statistical Data Analysis I Introduction to Statistical Data Analysis I JULY 2011 Afsaneh Yazdani Preface What is Statistics? Preface What is Statistics? Science of: designing studies or experiments, collecting data Summarizing/modeling/analyzing

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F

Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Readings: Textbook readings: OpenStax - Chapters 1 13 (emphasis on Chapter 12) Online readings: Appendix D, E & F Plous Chapters 17 & 18 Chapter 17: Social Influences Chapter 18: Group Judgments and Decisions

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 5, 6, 7, 8, 9 10 & 11)

More information

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA Data Analysis: Describing Data CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA In the analysis process, the researcher tries to evaluate the data collected both from written documents and from other sources such

More information

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences.

SPRING GROVE AREA SCHOOL DISTRICT. Course Description. Instructional Strategies, Learning Practices, Activities, and Experiences. SPRING GROVE AREA SCHOOL DISTRICT PLANNED COURSE OVERVIEW Course Title: Basic Introductory Statistics Grade Level(s): 11-12 Units of Credit: 1 Classification: Elective Length of Course: 30 cycles Periods

More information

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI

Statistics. Nur Hidayanto PSP English Education Dept. SStatistics/Nur Hidayanto PSP/PBI Statistics Nur Hidayanto PSP English Education Dept. RESEARCH STATISTICS WHAT S THE RELATIONSHIP? RESEARCH RESEARCH positivistic Prepositivistic Postpositivistic Data Initial Observation (research Question)

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

Chapter 7: Descriptive Statistics

Chapter 7: Descriptive Statistics Chapter Overview Chapter 7 provides an introduction to basic strategies for describing groups statistically. Statistical concepts around normal distributions are discussed. The statistical procedures of

More information

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months?

Medical Statistics 1. Basic Concepts Farhad Pishgar. Defining the data. Alive after 6 months? Medical Statistics 1 Basic Concepts Farhad Pishgar Defining the data Population and samples Except when a full census is taken, we collect data on a sample from a much larger group called the population.

More information

V. Gathering and Exploring Data

V. Gathering and Exploring Data V. Gathering and Exploring Data With the language of probability in our vocabulary, we re now ready to talk about sampling and analyzing data. Data Analysis We can divide statistical methods into roughly

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

Section 6: Analysing Relationships Between Variables

Section 6: Analysing Relationships Between Variables 6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations

More information

Undertaking statistical analysis of

Undertaking statistical analysis of Descriptive statistics: Simply telling a story Laura Delaney introduces the principles of descriptive statistical analysis and presents an overview of the various ways in which data can be presented by

More information

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj Statistical Techniques Masoud Mansoury and Anas Abulfaraj What is Statistics? https://www.youtube.com/watch?v=lmmzj7599pw The definition of Statistics The practice or science of collecting and analyzing

More information

STATISTICS AND RESEARCH DESIGN

STATISTICS AND RESEARCH DESIGN Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have

More information

9 research designs likely for PSYC 2100

9 research designs likely for PSYC 2100 9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one

More information

Analysis and Interpretation of Data Part 1

Analysis and Interpretation of Data Part 1 Analysis and Interpretation of Data Part 1 DATA ANALYSIS: PRELIMINARY STEPS 1. Editing Field Edit Completeness Legibility Comprehensibility Consistency Uniformity Central Office Edit 2. Coding Specifying

More information

Statistics is a broad mathematical discipline dealing with

Statistics is a broad mathematical discipline dealing with Statistical Primer for Cardiovascular Research Descriptive Statistics and Graphical Displays Martin G. Larson, SD Statistics is a broad mathematical discipline dealing with techniques for the collection,

More information

CHAPTER ONE CORRELATION

CHAPTER ONE CORRELATION CHAPTER ONE CORRELATION 1.0 Introduction The first chapter focuses on the nature of statistical data of correlation. The aim of the series of exercises is to ensure the students are able to use SPSS to

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Chapter 1: Introduction to Statistics Statistics, Science, and Observations Definition: The term statistics refers to a set of mathematical procedures for organizing, summarizing, and interpreting information.

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Quantitative Methods in Computing Education Research (A brief overview tips and techniques)

Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu

More information

Knowledge discovery tools 381

hours, and prime time is prime time precisely because more people tend to watch television at that time. Compare histograms from different periods of time. Changes in histogram

C-1: Variables which are measured on a continuous scale are described in terms of three key characteristics central tendency, variability, and shape.

MODULE 02: DESCRIBING DATA. SECTION C: KEY POINTS. C-2:

3 CONCEPTUAL FOUNDATIONS OF STATISTICS

In this chapter, we examine the conceptual foundations of statistics. The goal is to give you an appreciation and conceptual understanding of some basic statistical

Chapter 1: Introduction to Statistics

Statistics, Science, and Observations. Definition: The term statistics refers to a set of mathematical procedures for organizing, summarizing, and interpreting

bivariate analysis: The statistical analysis of the relationship between two variables.

cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ²) test for

Chapter 2 Norms and Basic Statistics for Testing MULTIPLE CHOICE

1. When you assert that it is improbable that the mean intelligence test score of a particular group is 100, you are using ____. a. descriptive

Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments

Outline: Finish sampling slides from Tuesday. Study design: what do you do with the subjects/units once you select them? (OI Sections 1.4-1.5) Observational studies vs. experiments. Descriptive statistics

Population. Sample. AP Statistics Notes for Chapter 1 Section 1.0 Making Sense of Data. Statistics: Data Analysis:

Individuals: objects described by a set of data. Variable: any characteristic of an individual. Categorical Variable: places an individual into one

Chapter 2--Norms and Basic Statistics for Testing

Student: 1. Statistical procedures that summarize and describe a series of observations are called A. inferential statistics. B. descriptive statistics.

Measuring the User Experience

Collecting, Analyzing, and Presenting Usability Metrics. Chapter 2: Background. Tom Tullis and Bill Albert. Morgan Kaufmann, 2008. ISBN 978-0123735584. Introduction Purpose Provide

Appendix B Statistical Methods

Figure B. Graphing data. (a) The raw data are tallied into a frequency distribution. (b) The same data are portrayed in a bar graph called a histogram. (c) A frequency polygon

2.75: 84% 2.5: 80% 2.25: 78% 2: 74% 1.75: 70% 1.5: 66% 1.25: 64% 1.0: 60% 0.5: 50% 0.25: 25% 0: 0%

Capstone Test (will consist of FOUR quizzes and the FINAL test grade will be an average of the four quizzes). Capstone #1: Review of Chapters 1-3. Capstone #2: Review of Chapter 4. Capstone #3: Review of

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Scott S. Emerson, M.D., Ph.D., Professor of Biostatistics, University of Washington. Lecture 3: Overview of Descriptive Statistics. October 3, 2005. Lecture Outline Purpose

Introduction to statistics Dr Alvin Vista, ACER Bangkok, 14-18, Sept. 2015

Analysing and Understanding Learning Assessment for Evidence-based Policy Making. Australian Council for Educational Research. Structure

Readings: Textbook readings: OpenStax - Chapters 1 4 Online readings: Appendix D, E & F Online readings: Plous - Chapters 1, 5, 6, 13

Introductory comments. Describe how familiarity with statistical methods

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Zifeiliu@ksu.edu. Types and levels of study. Descriptive statistics. Inferential statistics. How to choose a statistical test

Observational studies; descriptive statistics

Patrick Breheny, August 30. University of Iowa, Biostatistical Methods I (BIOS 5710). Observational studies. Association versus causation

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION

Variables. In the social sciences data are the observed and/or measured characteristics of individuals and groups

Before we get started:

http://arievaluation.org/projects-3/ AEA 2018 R-Commander. Antonio Olmos, Kai Schramm, Priyalathta Govindasamy. Antonio.Olmos@du.edu AntonioOlmos@aumhc.org Plan

ANOVA in SPSS (Practical)

Analysis of Variance practical. In this practical we will investigate how we model the influence of a categorical predictor on a continuous response. Centre for Multilevel Modelling

Six Sigma Glossary Lean 6 Society

Terms: ABSCISSA, ACCEPTANCE REGION, ALPHA RISK, ALTERNATIVE HYPOTHESIS, ASSIGNABLE CAUSE, ASSIGNABLE VARIATIONS. ABSCISSA: The horizontal axis of a graph. ACCEPTANCE REGION: The region of values for which the null

AP Statistics. Semester One Review Part 1 Chapters 1-5

AP Statistics Topics: Describing Data, Producing Data, Probability, Statistical Inference. Describing Data. Ch 1: Describing Data: Graphically and Numerically

How to interpret scientific & statistical graphs

Theresa A Scott, MS, Department of Biostatistics. theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/theresascott A brief introduction. Graphics:

CHAPTER 2. MEASURING AND DESCRIBING VARIABLES

1. A. Age: name/interval; military dictatorship: value/nominal; strongly oppose: value/ordinal; election year: name/interval; 62 percent: value/interval;

Descriptive statistics

CHAPTER 3. CHAPTER OVERVIEW: In Chapter 1 we outlined some important factors in research design. In this chapter we will be explaining the basic ways of

Unit 7 Comparisons and Relationships

Objectives: To understand the distinction between making a comparison and describing a relationship. To select appropriate graphical displays for making comparisons

Organizing Data. Types of Distributions. Uniform distribution All ranges or categories have nearly the same value a.k.a. rectangular distribution

Frequency: How many of the data are in a category or range. Just count up how many there are. Notation: x = number in one category; n = total number in sample (all categories combined). Relative

Psy201 Module 3 Study and Assignment Guide. Using Excel to Calculate Descriptive and Inferential Statistics

What is Excel? Excel is a spreadsheet program that allows one to enter numerical values or data

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of

Students will understand the definition of mean, median, mode and standard deviation and be able to calculate these functions with given set of numbers. Also, students will understand why some measures

Examining differences between two sets of scores

In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you

On the purpose of testing:

Why Evaluation & Assessment is Important: Feedback to students; Feedback to teachers; Information to parents; Information for selection and certification; Information for accountability; Incentives to increase

STATISTICS 8 CHAPTERS 1 TO 6, SAMPLE MULTIPLE CHOICE QUESTIONS

Circle the best answer. This scenario applies to Questions 1 and 2: A study was done to compare the lung capacity of coal miners to the lung

Psychology Research Process

Logical Processes. Induction: Observation/Association/Using Correlation. Trying to assess, through observation of a large group/sample, what is associated with what? Examples:

HW 1 - Bus Stat. Student:

1. An identification of police officers by rank would represent a(n) ____ level of measurement. A. Nominative C. Interval D. Ratio 2. A(n) ____ variable is a qualitative variable such that

CCM6+7+ Unit 12 Data Collection and Analysis

CCM6+7+ Unit 12 Packet: Statistics and Data Analysis. Big Ideas: What is data/statistics? Measures of Reliability and Variability: Sampling,

PRINCIPLES OF STATISTICS

STA-201-TE. This TECEP is an introduction to descriptive and inferential statistics. Topics include: measures of central tendency, variability, correlation, regression, hypothesis

Empirical Knowledge: based on observations. Answer questions why, whom, how, and when.

INTRO TO RESEARCH METHODS: Experimental research: treatments are given for the purpose of research. Experimental group

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017

EXAM 1 RUBRIC. Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology. Date: October 12, 2017 (Thursday). Section 903: 9:35-10:50am

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

(858) 534-1818, dsilverstein@ucsd.edu. Introduction: Overview of statistical

1. To review research methods and the principles of experimental design that are typically used in an experiment.

Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING. Computer Lab Exercise, Lab #7 (there was no Lab #6). Treatment for Depression: A Randomized Controlled Clinical Trial. Objectives: 1. To review

Descriptive Statistics Lecture

Definitions: Lecture, Psychology 280, Orange Coast College, 2/1/2006. Statistics have been defined as a collection of methods for planning experiments, obtaining data, and then analyzing, interpreting and

Summarizing Data. (Ch 1.1, 1.3, 1.10-1.13, 2.4.3, 2.5)

Populations and Samples. An investigation of some characteristic of a population of interest. Example: You want to study the average GPA of juniors

Student name: SOCI 420 Advanced Methods of Social Research Fall 2017

EXAM 1 RUBRIC. Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology. Date: October 12, 2017 (Thursday). Section 904: 2:20-3:35pm

Distributions and Samples. Clicker Question. Review

Clicker Question: The major difference between an observational study and an experiment is that A. An experiment manipulates features of the situation B. An experiment does not

Chapter 11. Experimental Design: One-Way Independent Samples Design

Advantages and Limitations; Comparing Two Groups; Comparing t Test to ANOVA; Independent Samples t Test; Independent Samples ANOVA; Comparing

AP Psych - Stat 1 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1) In a set of incomes in which most people are in the $15,000

Dr. Kelly Bradley Final Exam Summer {2 points} Name

You MUST work alone: no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. This exam is being scored out of 00 points.

The degree to which a measure is free from error. (See page 65) Accuracy

Accuracy: The degree to which a measure is free from error. (See page 65) Case studies: A descriptive research method that involves the intensive examination of unusual people or organizations. (See page

Quizzes (and relevant lab exercises): 20% Midterm exams (2): 25% each Final exam: 30%

Intro to statistics, continued. Grading policy: Quizzes (and relevant lab exercises): 20%; Midterm exams (2): 25% each; Final exam: 30%. Cutoffs based on final avgs (A, B, C): 91-100, 82-90, 73-81. Numerical

Outcome Measure Considerations for Clinical Trials Reporting on ClinicalTrials.gov

What is an Outcome Measure? An outcome measure is the result of a treatment or intervention that is used to objectively

Statistics Mathematics 243

Michael Stob, February 2, 2005. These notes are supplementary material for Mathematics 243 and are not intended to stand alone. They should be used in conjunction with the textbook

Chapter 3 CORRELATION AND REGRESSION

Topics: Linear Regression Defined; Regression Equation; The Slope or b; The Y-Intercept or a; What Value of the Y-Variable Should be Predicted When r = 0?; The Regression

Statistics Guide. Prepared by: Amanda J. Rockinson-Szapkiw, Ed.D.

This guide contains a summary of the statistical terms and procedures. This guide can be used as a reference for course work and the dissertation process. However, it is recommended that you refer to statistical

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test February 2016

STAB22H3 Statistics I, LEC 01 and LEC 02. Duration: 1 hour and 45 minutes. Last Name: First Name:

Statistics and Probability

Statistics and a single count or measurement variable. S.ID.1: Represent data with plots on the real number line (dot plots, histograms, and box plots). S.ID.2: Use statistics appropriate to the shape

Two-Way Independent ANOVA

Analysis of Variance (ANOVA) is a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There

One-Way Independent ANOVA

Analysis of Variance (ANOVA) is a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment.

AP Stats Review for Midterm

NAME: Format: 10% of final grade. There will be 20 multiple-choice questions and 3 free response questions. The multiple-choice questions will be worth 2 points each and the

ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY. Chapter 2- Methods of Psychology

OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology: observation, case study, surveys, psychological tests, experimentation

AP Psych - Stat 2 Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

1) In a set of incomes in which most people are in the $15,000

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

STAT 280 Sample Test Problems. 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

Table of Contents. Plots. Essential Statistics for Nursing Research 1/12/2017

Kristen Carlin, MPH. Seattle Nursing Research Workshop, January 30, 2017. Table of Contents: Plots; Descriptive statistics; Sample size/power; Correlations; Hypothesis

Frequency Distributions

In this section, we look at ways to organize data in order to make it more user friendly. It is difficult to obtain any meaningful information from the data as presented in the

Test 1 Version A STAT 3090 Spring 2018

Multiple Choice: (Questions 1-20) Answer the following questions on the scantron provided using a #2 pencil. Bubble the response that best answers the question. Each multiple choice correct response is