Statistics 1

STATISTICS AND RESEARCH DESIGN

These are subjects that are frequently confused. Both often evoke student anxiety and avoidance. To further complicate matters, both areas appear to have a variety of different terms or processes that are all equally applicable, making precise definition a challenge. In this topic, as in most of the course, we are confronted by varying terminology in our required and supplementary readings as well as in assorted reference texts. I have tried to adopt a terminology that is as close as possible to most texts and that enables you to see the logical connections among the terms and processes.

Statistics and research design are tools we use to test research hypotheses. According to Kerlinger (1986), statistics are "the theory and method of analyzing quantitative data obtained from samples of observations in order to study and compare sources of variance of phenomena, to help make decisions to accept or reject hypothesized relations between the phenomena, and to aid in making reliable inferences from empirical observations" (p. 175), and research design is "the plan and structure of investigation so conceived as to obtain answers to research questions" (p. 279).

Statistics are the tools of research design (Williams, 1986). Research design is the overall plan for maximizing validity in inquiry (we will discuss validity later). Each study should set out with a specific research design (Tuckman, 1988). Statistical design, and then specific statistical tests, are planned accordingly. Statistical procedures and design are tools that are applied according to the research design.

Analogy: Research design is the blueprint of the study. Statistical design and procedures are the craft and tools used to conduct quantitative studies. The logic of hypothesis testing is the decision-making process that links statistical design to research design.
STATISTICS*

There are two types of statistics:
- Descriptive Statistics
- Inferential Statistics

The choice of a particular statistic depends on:
- the type of data collected
- what information is needed from the data
- how the data will be used to test the hypotheses

Descriptive Statistics

Descriptive statistics involve tabulating, depicting, and describing data. They are tools for describing, summarizing, or reducing to comprehensible form the properties of an otherwise unwieldy mass of data. An entire set of data can be described in tables, graphs, measures of central tendency, or measures of dispersion. An individual observation in a data set can be described in relation to the entire set by converting the individual score into a type of transformed score, such as a percentile, rank, or z-score.

A. Scales of Measurement
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale

B. Frequency Distributions
1. In tables, the frequency distribution is constructed by summarizing data in terms of the number or frequency of observations in each category, score, or score interval.
2. In graphs, the data can be concisely summarized in bar graphs, histograms, or frequency polygons. Frequency polygons are constructed by joining the midpoints of the histogram bars.
3. Distribution Types
Normal Curve:
- the right half of the curve is the mirror image of the left half
- the observations are approximately symmetrical
- skewness = 0
- approximately 68% of all scores fall within one standard deviation above and below the mean
- approximately 95% of all scores fall within two standard deviations above and below the mean
- approximately 99% of all scores fall within three standard deviations above and below the mean
Bimodal Curve: a distribution that has two distinctly different points around which scores cluster
Rectangular: the frequency of scores is constant across all values of X
Positively Skewed: there is a greater number of scores at the lower end, tailing off at the high or positive end
Negatively Skewed: the scores are clustered together at the high end and tail off toward the low or negative end

C. Measures of Central Tendency
1. Mode
- the score or category that occurs most often in a set of data
- a distribution can have more than one mode
- example: 2 3 4 4 4 6 8 9 10 11 11 (mode = 4)
2. Median
- the score that divides a set of data in half when the data have been ordered from lowest to highest
- it is the middle score when there is an odd number of observations, or the value that lies midway between the two middle values when there is an even number of observations
- the 50th percentile of a distribution
- the second most used measure of central tendency
- example: 2 3 4 4 4 6 8 9 10 11 11 (median = 6)
3. Mean
- the balance point of a set of scores, obtained by finding the sum of all the scores and dividing by N
- the mean is preferred whenever possible and is the only measure of central tendency used in advanced statistical calculations because it is generally the most reliable or accurate of the three measures of central tendency and is better suited to arithmetical computations
- the mean is sensitive to extreme scores and so is often not the appropriate measure to use when a distribution is skewed
- example: 2 3 4 4 4 6 8 9 10 11 11 (mean = 72/11, approximately 6.55)

D. Measures of Variability (Dispersion)
1. Range
- calculated by subtracting the lowest score from the highest score
- used only with ordinal, interval, or ratio scales, since the data must be ordered
2. Variance
- indicates the extent to which individual scores in a distribution of scores differ from one another
3. Standard Deviation
- the square root of the variance
- the most widely used measure of the dispersion among a set of observations in a distribution
- often used in computing other statistics

E. Standard Scores
Raw scores are frequently transformed to standard scores to aid interpretation. With standard scores, the mean and standard deviation are constant and known to the user. Any observation expressed in standard deviation units from the mean is a standard score. If the mean and standard deviation are known, the relationship of one score to the others in the distribution is easily seen.
1. Z-Scores
- the most widely used standard score in statistics
- example: a z-score of 1.5 means that the score is 1.5 standard deviations above the mean; a z-score of -1.5 means that the score is 1.5 standard deviations below the mean
- z-scores have the same meaning in all distributions: the mean is always 0 and the standard deviation is always 1
- to find a percentile rank, first convert the raw score to a z-score and then look up the percentile rank in a normal-curve table
2. T-Scores
- the most commonly used standard score for reporting performance
- T-scores may be converted from z-scores (T = 50 + 10z) and are always rounded to two figures, thereby eliminating decimals
- they are always reported as positive numbers
- the mean is always 50 and the standard deviation is always 10

F. Correlation or Covariation
A correlation coefficient is a statistical summary of the degree or magnitude and direction of the relationship or association between two variables. Two variables are correlated if they tend to go together. A correlation can be negative or positive. A causal relationship cannot be assumed simply because two variables are correlated.
1. The Pearson Product-Moment Correlation Coefficient is widely used in virtually all empirical disciplines. It is also called the Pearson r, or simply r.
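The descriptive measures in Sections C through E can be illustrated on the handout's worked example data set (2 3 4 4 4 6 8 9 10 11 11). This is a minimal sketch using Python's standard library; the choice of the raw score 10 for the z-score and T-score conversion is arbitrary, for illustration only.

```python
import statistics

data = [2, 3, 4, 4, 4, 6, 8, 9, 10, 11, 11]

# Section C: measures of central tendency.
mode = statistics.mode(data)      # most frequent score
median = statistics.median(data)  # middle score (11 values -> the 6th)
mean = statistics.mean(data)      # sum of all scores divided by N

# Section D: measures of dispersion (population variance and
# standard deviation, i.e., dividing by N).
variance = statistics.pvariance(data)
sd = statistics.pstdev(data)

# Section E: a z-score for the raw score 10, and its T-score
# equivalent via T = 50 + 10z, rounded to eliminate decimals.
z = (10 - mean) / sd
t = round(50 + 10 * z)

print(mode, median, round(mean, 2))  # 4 6 6.55
print(round(sd, 2), round(z, 2), t)  # 3.2 1.08 61
```

Note that the z-score of about 1.08 says the raw score 10 sits roughly one standard deviation above the mean, matching the interpretation given in Section E.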
2. The Spearman Rank Correlation is the degree to which scores maintain the same relative positions or ranks on two measures. Also known as the rank-order correlation, r_s.
3. Scatterplots are also used to study the relationship between two variables. Each dot or tally represents the intersection of two scores, or a pair of observations, e.g., the heights of fathers and sons.

G. Linear Regression
Not only does the correlation coefficient between X and Y on a scatterplot describe the degree of association between X and Y, but it also allows predictions to be made. The purpose of a regression equation is to make predictions about a new sample of observations from the findings on a previous sample, e.g., to predict the heights of sons who were not in the original sample given their fathers' heights. On a scatterplot, when the line connecting the actual means of Y for points all along the X-axis does not differ significantly from a straight line, the regression is said to be linear.

Inferential Statistics

Inferential statistics predict or estimate characteristics of a population from knowledge of the characteristics of only a sample of that population. Statistical methods assist researchers in describing data, in drawing inferences to larger bodies of data, and in studying causal relationships. Inferential statistics allow the researcher to make an inference about the relationship between independent and dependent variables in a population based on data collected from a sample or samples drawn from that population. To understand inferential statistics, one must distinguish between sample values and population values. When an entire population is not available for a study, the researcher uses a statistic (sample value) to estimate a parameter (population value). Inferential statistical methods are similar to inductive reasoning: reasoning that goes from the particular to the general, from the seen to the unseen.

A.
Types of Samples and Sampling Error: we will discuss this on June 17th.

B. Interval Estimate: a range or band within which the parameter is thought to lie, instead of a single point or value as the estimate of the parameter.

C. Sampling Distributions
1. The sampling distribution of the mean is a frequency distribution, not of observations, but of means of samples, each based on n observations. It is the frequency distribution that would result if random samples of a certain size (n) were drawn from the parent population, the mean were computed for those n observations, and the process were repeated hundreds of times. The
frequency distribution of these numerous mean values is the sampling distribution of the mean.
2. The standard error of the mean is used as an estimate of the magnitude of sampling error. It is the standard deviation of the sampling distribution of the sample means.

D. Confidence Intervals
As has been stated, 68% of the cases in a normal distribution lie within 1 standard deviation of the mean. The .68 is referred to as the confidence coefficient of the confidence interval. However, it is not known whether the particular mean being studied is one of the 68% that fall within 1 standard deviation. Thus, it is usually best to have more than a .68 confidence coefficient, so a wider confidence interval, the .95 confidence interval, is more commonly used. In the long run, the parameter will then be contained within the .95 confidence interval 95% of the time (19 out of 20). To be even more confident, the .99 confidence interval is often used (99 out of 100).

E. Central Limit Theorem
States that the distribution of sample statistics (means, medians, variances, and most other statistical measures) approaches a normal distribution as the sample size, n, increases.

F. Hypothesis Testing: we will cover this on June 3rd.

G. Types of Statistical Analysis
1. Descriptive Measures quantify the degree of relationship between variables.
- Parametric tests are used to test hypotheses under stringent assumptions about the observations: the observations must be normally distributed in the population, the variances of the populations must be homogeneous, and the observations in the sample must be randomly drawn. Parametric tests are used with interval or ratio scales. Examples are the t-test and ANOVA.
- Nonparametric tests are used with data in a nominal or ordinal scale. One cannot assume that there is a normal distribution of data or that there is homogeneity of variances. Examples are the Chi-Square test, the Mann-Whitney U test, and the Wilcoxon test.
2.
Inferential Statistical Tests allow generalization about populations using data from samples.
- The test for nominal data is the Chi-Square test.
- Tests for ordinal data are the Kolmogorov test, the Kolmogorov-Smirnov test, the Mann-Whitney U test, and the Wilcoxon Matched-Pairs Signed-Ranks test.
- Tests for interval and ratio data include the t-test, Analysis of Variance (ANOVA), Analysis of Covariance (ANCOVA), and post-hoc ANOVA tests.

* All of this information is adapted from: Bullard, B., Lawless, L., Williams, M., & Bergstrom, D. (1999). Clinical Mental Health Counselor: Handbook and Study Guide. Dubuque, IA: Kendall/Hunt.
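As a closing illustration, the Chi-Square test listed above for nominal data can be computed by hand. This is a minimal sketch with hypothetical counts (not from the handout): 60 "yes" and 40 "no" responses, tested against an expected even 50/50 split.

```python
# Hypothetical observed frequencies for two nominal categories,
# and the expected frequencies under the hypothesis of an even split.
observed = [60, 40]
expected = [50, 50]

# Chi-square statistic: the sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(chi_square)  # 4.0
```

The obtained value (4.0) would then be compared against a critical value from a chi-square table (3.84 at the .05 level with 1 degree of freedom) to decide whether to reject the hypothesis of an even split; here, it would be rejected.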