Statistics: Making Sense of the Numbers
Chapter 9

The Growing Popularity of Living Together In Sin

What To Do Once You Have The Numbers
Data coding = The process of putting raw quantitative information into a computer-readable format.
Data record = Information on one person, unit, or case (your unit of analysis) in a computer-friendly format.
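As a rough illustration of data coding, the sketch below turns raw questionnaire answers into numeric data records. The variable names, categories, numeric codes, and the missing-value code 9 are all invented for this example; a real study would define its own.

```python
# Hypothetical coding scheme: the variables, categories, and numeric codes
# below are invented for illustration only.
CODES = {
    "sex": {"male": 1, "female": 2},
    "marital_status": {"single": 1, "married": 2, "cohabiting": 3},
}
MISSING = 9  # assumed code for an unanswered or unreadable response


def code_response(raw):
    """Turn one respondent's raw answers into a data record of numeric codes."""
    record = {"id": raw["id"]}
    for variable, mapping in CODES.items():
        # Normalize spelling so "Female" and " female " get the same code.
        answer = str(raw.get(variable, "")).strip().lower()
        record[variable] = mapping.get(answer, MISSING)
    return record


raw_answers = [
    {"id": 1, "sex": "Female", "marital_status": "Cohabiting"},
    {"id": 2, "sex": "male", "marital_status": ""},
]
records = [code_response(r) for r in raw_answers]
for record in records:
    print(record)
```

Each dictionary in `records` plays the role of one data record: one case, stored as codes a program can count and compare.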
What To Do Once You Have The Numbers
Codebook = A document that describes the coding procedure and the location of data for variables in a format that computers can use.
Precoding

Cleaning Up the Numbers
After coding, check and clean the data.
Verify coding.
Statistics refers both to a set of collected numbers and to a branch of applied mathematics.

Looking at Results with One Variable
Frequency distribution = A simple table showing how many, or what percent, of the cases fall into each variable category.
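A frequency distribution can be sketched in a few lines. The example below counts how many cases fall into each category and converts the counts to percentages; the marital-status answers are made up for illustration.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (category, count, percent) rows, sorted by category name."""
    counts = Counter(values)
    total = len(values)
    return [
        (category, n, round(100 * n / total, 1))
        for category, n in sorted(counts.items())
    ]


# Made-up answers from eight respondents.
marital = ["married", "single", "married", "cohabiting",
           "single", "married", "married", "single"]

table = frequency_distribution(marital)
for category, count, percent in table:
    print(f"{category:<12}{count:>5}{percent:>8.1f}%")
```

The same table answers both questions in the definition: the count column shows how many cases, and the percent column shows what share of all cases.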
Looking at Results with One Variable
Where Is the Middle? (The Average)
Measures of central tendency = Measures of the center of a set of numerical data: the mean, median, and mode.
The mode is the most common or most frequently occurring number.
The median is the middle point, the 50th percentile.
The mean, the arithmetic average, is the most widely used measure of central tendency.
Skewed distribution = A distribution of cases that is not bell-shaped or normal, but instead has many cases at one of the extreme values (very high or very low) of a variable.

Looking at Results with One Variable
What Is the Spread?
You can measure variation in three ways: range, percentile, and standard deviation.
The range is the distance between the largest and smallest scores.
Percentiles tell us the score at a specific place within the distribution.
Standard deviation = A widely used measure of the variability of a variable that indicates roughly the average distance of cases from the mean value.
Z-scores = A standardized measure that allows comparisons of groups that differ in their means and standard deviations.
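The measures above can be tried out with Python's standard `statistics` module. This is a minimal sketch on made-up scores; the one extreme value (40) is included on purpose to show how the mean gets pulled away from the median. Using the population formula (`pstdev`) for the standard deviation is an assumption of this example.

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 40]  # made-up scores; 40 is an extreme value

# Central tendency
mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # middle score, the 50th percentile
mean = statistics.mean(scores)      # arithmetic average

# Spread
score_range = max(scores) - min(scores)
quartiles = statistics.quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
sd = statistics.pstdev(scores)                 # population standard deviation


def z_score(value, group_mean, group_sd):
    """Distance of a score from its group mean, in standard deviation units."""
    return (value - group_mean) / group_sd


print(mode, median, mean, score_range)
# Comparing scores from two groups with different means and spreads
# (the group means and standard deviations here are invented):
print(z_score(80, 70, 10), z_score(65, 50, 5))
```

In the last line, the score of 65 is numerically lower than 80 but sits further above its own group's mean once both are expressed as z-scores.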
Looking at Results with One Variable
Displaying information on a map.
Information along a time line.

A Bivariate Relationship
Covariation = When two variables go together or are associated with one another.
Statistical independence = The absence of an association or covariation between two variables.
Seeing the Relationship: Scattergrams
Scattergram = A graph on which you plot the value of each case or observation. Each axis of the graph represents the values of one variable, and the graph can reveal bivariate relations.
How to construct a scattergram

Seeing the Relationship: Scattergrams
What can you learn from a scattergram?
Form.
Direction.
Positive relationship = A connection between two variables such that as one increases the other variable also increases, and vice versa.
Negative relationship = A connection between two variables such that as one rises the other variable falls, and vice versa.
Precision.
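Direction and precision can also be summarized numerically. The sketch below uses Pearson's correlation coefficient r, which is not named on the slides and is swapped in here as one common choice for a linear form: its sign gives the direction, and its closeness to +1 or -1 indicates how tightly the points cluster around a straight line. The data are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def direction(r):
    """Label the direction of a relationship from the sign of r."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Made-up cases: each position is one case measured on both variables.
years_schooling = [10, 12, 12, 14, 16, 18]
income_thousands = [22, 30, 28, 41, 50, 63]
tv_hours = [1, 2, 3, 4, 5, 6]
grade_average = [3.8, 3.5, 3.4, 3.0, 2.6, 2.5]

r_up = pearson_r(years_schooling, income_thousands)
r_down = pearson_r(tv_hours, grade_average)
print(direction(r_up), round(r_up, 2))
print(direction(r_down), round(r_down, 2))
```

A coefficient like this only describes straight-line patterns, so looking at the scattergram's form first is still the safer order of work.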
Bivariate Tables
Condensed scattergrams
Cross-tabulation = Placing two variables in a table at the same time, which allows you to see how cases that have values on one variable align with values on a second variable for those same cases.
Contingency table = A table with two or more variables that have been cross-tabulated.
Raw counts are often converted to percentages.
Reading a percentaged table.
Measure of association = Information about a bivariate relationship condensed into a single number. It expresses the strength, and often the direction, of the relationship.
Six Measures of Association
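To make the step from raw counts to a percentaged table concrete, here is a small sketch. The cases, the variable names `age` and `view`, and the choice to percentage down the columns (within each age group) are assumptions made for illustration.

```python
from collections import Counter


def crosstab(cases, row_var, col_var):
    """Count cases for every (row category, column category) pair."""
    return Counter((case[row_var], case[col_var]) for case in cases)


def column_percentages(counts):
    """Convert raw cell counts to percentages of each column total."""
    col_totals = Counter()
    for (_, col), n in counts.items():
        col_totals[col] += n
    return {
        (row, col): round(100 * n / col_totals[col], 1)
        for (row, col), n in counts.items()
    }


# Made-up cases: 10 respondents under 30 and 10 aged 30 and over.
cases = (
    [{"age": "under 30", "view": "approve"}] * 6
    + [{"age": "under 30", "view": "disapprove"}] * 4
    + [{"age": "30 and over", "view": "approve"}] * 3
    + [{"age": "30 and over", "view": "disapprove"}] * 7
)

counts = crosstab(cases, row_var="view", col_var="age")
percents = column_percentages(counts)
for (view, age), pct in sorted(percents.items()):
    print(f"{view:<12}{age:<14}{pct:>6.1f}%")
```

Reading the result means comparing the same row across columns: here the share approving differs between the two made-up age groups, which is what an association looks like in a percentaged table.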
Results with More Than Two Variables
Statistical Control
Control variables = Variables measured in nonexperimental research studies that represent alternative explanations for a causal relationship.

Results with More Than Two Variables
Control in Percentage Tables
Multiple Regression Analysis
Multiple regression tells you two things:
R-squared (R²), or how accurately the set of independent variables predicts the dependent variable, expressed as the share of its variation that they explain.
The direction and numerical size of each independent variable's impact on a dependent variable.
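The two outputs of multiple regression can be seen in a small ordinary least squares sketch written in plain Python. The helper names, the made-up data (hours studied and hours slept predicting an exam score), and the choice to solve the normal equations by Gaussian elimination are all assumptions of this example; statistical software does this work for you in practice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def multiple_regression(xs, y):
    """Return ([intercept, b1, b2, ...], R-squared) for least squares."""
    rows = [[1.0] + list(r) for r in xs]  # leading 1.0 carries the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    predicted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Made-up cases: (hours studied, hours slept) and exam score.
xs = [(2, 8), (4, 6), (6, 7), (8, 5), (10, 6), (12, 4)]
y = [55, 62, 70, 74, 85, 88]

coefs, r_squared = multiple_regression(xs, y)
print("intercept and slopes:", [round(c, 2) for c in coefs])
print("R-squared:", round(r_squared, 3))
```

The sign of each slope is the direction of that variable's impact, its size is the expected change in the dependent variable for a one-unit change while the other variable is held constant, and R-squared summarizes how well the whole set predicts.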
Going Beyond Description: Inferential Statistics
The Purpose of Inferential Statistics
To determine whether results from a sample hold for a population.
To evaluate the strength of relationships between variables.
Statistical significance = A way to determine how likely sample results could be due to random processes.

Statistical Significance
Levels of significance = A simplified way to indicate the statistical significance of a relationship.
Statistical significance is often set at one of three levels:
.05 level
.01 level
.001 level
Levels of Significance
When the level is set at .05, a result is called significant only if a result that strong would occur by chance about 5 times out of 100 when there is no true relationship in the population. You can therefore be reasonably confident, though never certain, that the sample result reflects the population.

Type I and Type II Errors
Type I error is a false positive: you falsely reject the null hypothesis when there is no true relationship between the variables.
Type II error is a false negative: you falsely accept (fail to reject) the null hypothesis when there is a true relationship in the population.
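One way to see the three significance levels at work is a chi-square test on a two-by-two table. The slides do not name a specific test, so chi-square is an assumption chosen for this sketch, and the table counts are made up. The critical values used are the standard chi-square cutoffs for one degree of freedom.

```python
# Chi-square critical values for 1 degree of freedom, strictest level first.
CRITICAL_VALUES = [(0.001, 10.828), (0.01, 6.635), (0.05, 3.841)]


def chi_square(table):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi += (observed - expected) ** 2 / expected
    return chi


def significance_level(chi):
    """Strictest of the three levels the statistic reaches, or None."""
    for level, critical in CRITICAL_VALUES:
        if chi >= critical:
            return level
    return None


# Made-up 2x2 table of counts: rows are two groups, columns are two answers.
observed = [[30, 20],
            [15, 35]]

chi = chi_square(observed)
level = significance_level(chi)
print(round(chi, 2), level)
```

Choosing a stricter level such as .001 lowers the risk of a Type I error but makes a Type II error more likely, because real relationships then need stronger sample evidence before they count as significant.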