In this module we will cover correlation and validity.
A correlation coefficient is a statistic that is often used as an estimate of measurement qualities such as validity and reliability. You will learn about the strength of correlation and the direction of correlation. Validity is a crucial aspect of test quality. You will learn three types of validity: content validity, criterion validity, and construct validity.
Correlation is a statistical procedure used to measure the relationship or association between two variables. Correlations allow us to answer questions such as: Are athletes poor scholars? Or do students with high grades in high school tend to get high grades in college? You may ask: Why do we need correlation? After all, this is a measurement course, not a statistics course. However, correlation is used heavily in certain aspects of measurement, including the estimation of validity and reliability.
A correlation statistic measures the relationship between two variables. There are two aspects to this association: the strength of the relationship and the direction of the relationship. Correlation coefficients range from negative 1 to positive 1.
The letter r is used to denote the correlation coefficient. The closer a coefficient gets to -1 or +1, the stronger the relationship between the two variables.
Scatterplots can be used to visually represent the association between two variables. They can also show the strength and direction of the relationship. The strength of the relationship shows how accurately a prediction can be made from the test score to the criterion. The direction of the relationship can be positive or negative. A positive association between two variables means that as one increases, the other increases, whereas a negative relationship means that as one increases, the other decreases.
Here is an example that uses a reading readiness test (denoted by X) to predict future reading skills (denoted by Y). We collected ten students' scores on the reading readiness test and on future reading skills, so each student has two scores. These scores can be transferred into a two-dimensional graph, known as a scatterplot. The X axis represents reading readiness scores and the Y axis represents future reading skills scores. Each dot in the scatterplot represents one student's pair of scores. For instance, Sheila's two scores are (2, 1); here is the corresponding dot. The rest of the students also have corresponding dots in the scatterplot. Based on a special formula and the data we have, the correlation coefficient can be calculated. The correlation coefficient in this case is 0.529, which is a positive, moderate correlation.
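The "special formula" referred to here is the Pearson product-moment correlation, and the calculation can be sketched in a few lines of Python. The ten score pairs below are made-up numbers for illustration, not the actual data from the Sheila example.

```python
# Hypothetical reading-readiness (x) and future reading-skill (y) scores
# for ten students -- illustrative numbers only, not the lecture's data.
x = [2, 3, 4, 4, 5, 6, 6, 7, 8, 9]
y = [1, 3, 2, 5, 4, 4, 6, 5, 7, 8]

def pearson_r(xs, ys):
    """Pearson product-moment correlation: the covariance of x and y
    divided by the product of their standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(x, y)
print(round(r, 3))  # a positive correlation for these made-up numbers
```

Plotting each (x, y) pair as a dot would give exactly the scatterplot described above; the value of r summarizes how tightly those dots cluster around a line.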
Generally speaking, the relationships between two variables can be classified into four categories: a positive correlation, a negative correlation, no relationship, and a curvilinear relationship.
As mentioned earlier, a positive correlation suggests that as scores on Test A increase, scores on Test B increase. This scatterplot shows the positive relationship between Tests A and B. There is a pattern among the dots running from the lower left to the upper right.
A negative correlation suggests that as scores on Test A increase, scores on Test B decrease. A popular example is that when the number of absences increases, GPA decreases. This scatterplot shows this negative correlation between GPA and the number of absences. There is a pattern among the dots running from the upper left to the lower right.
No relationship means that the two variables are not associated. The correlation coefficient would be close to zero. This scatterplot represents no relationship between Tests A and B. As you can see, there is no pattern among these dots.
A curvilinear relationship between two variables suggests that as scores on Test A increase, scores on Test B first increase, then decrease. There is a curvilinear pattern among the dots in this scatterplot.
Another thing we should know about is the restriction of range problem. When we look at the entire scatterplot, there is a clear positive correlation between the tests. But when a truncated or restricted range of scores is examined, such as high scores on both tests, the relationship becomes weak and the direction is unclear. A typical example of this situation is exploring the relationship between students' scores on the GRE and their first-year graduate GPA. This is a restricted range case because most of the students who are admitted into graduate school have high GRE scores.
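The restriction of range effect can be demonstrated with a small sketch. The scores below are made up for illustration: over the full range of scores the correlation is strong, but recomputed over only the high scorers it drops noticeably.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient for two paired score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical paired scores with a strong linear trend plus some noise.
x = list(range(1, 21))
y = [xi + (2 if xi % 2 == 1 else -2) for xi in x]

r_full = pearson_r(x, y)

# Now restrict the range to the high scorers only (x >= 15),
# as happens when we study only admitted students.
pairs = [(a, b) for a, b in zip(x, y) if a >= 15]
xr = [a for a, _ in pairs]
yr = [b for _, b in pairs]
r_restricted = pearson_r(xr, yr)

print(round(r_full, 2), round(r_restricted, 2))  # prints: 0.94 0.51
```

The same noise that barely disturbs the trend over the full range dominates once the range of scores is cut down, which is why GRE-versus-graduate-GPA correlations look weaker than the test's true predictive power.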
There are several different correlation coefficients. Two types are commonly used: the Pearson Product-Moment Correlation and the Rank Difference Correlation. The Pearson Product-Moment Correlation uses students' raw, continuous scores and requires a sample of at least 30. For the Rank Difference Correlation, the students' raw scores are ordered, and then the ranks, or ordered numbers, are used. This coefficient allows us to measure the association for small sample sizes. For information on the Pearson Product-Moment Correlation, refer to Appendix B of your text. For information on the Rank Difference Correlation, refer to Chapter 14.
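The rank difference procedure described here (order the raw scores, then correlate the ranks rather than the raw values) can be sketched as follows. The five score pairs are hypothetical, and this simple ranking function assumes no tied scores.

```python
def pearson_r(xs, ys):
    """Pearson correlation on two paired lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def ranks(scores):
    """Replace raw scores with their ranks (1 = lowest; assumes no ties)."""
    order = sorted(scores)
    return [order.index(s) + 1 for s in scores]

def spearman_rho(xs, ys):
    """Rank difference correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical raw scores for five students; y rises with x,
# but not linearly.
x = [10, 20, 30, 40, 50]
y = [1, 2, 4, 8, 16]

print(round(pearson_r(x, y), 3))     # below 1: the trend is not linear
print(round(spearman_rho(x, y), 3))  # 1.0: the rank orders agree perfectly
```

Because only the order of scores matters, the rank-based coefficient is the usual choice for the small samples mentioned above.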
Caution: correlation does not imply causality. You cannot conclude that one thing is the result of another just because two distributions are related. For instance, there may be a high correlation between the sales of ice cream and the number of drownings. It would be strange to say that eating ice cream causes drowning. The reasonable explanation is that perhaps both are affected by temperature.
We measure validity because it provides information about the usefulness of a test in a particular situation. We always make inferences from test scores, so it is important to make sure that those inferences are appropriate. That is why we care about validity.
Recall that validity and reliability are two very important aspects of test quality. Validity is the extent to which a test measures what it is supposed to measure; if a test is not measuring what it is supposed to measure, it is rather pointless at best. Reliability is the extent to which a test score is consistent. If test scores aren't reliable, they don't give us good information about a student's true achievement level.
There are three popular types of validity: content validity, criterion-related validity, and construct validity. Criterion-related validity can take the form of predictive criterion validity or concurrent criterion validity.
Content validity can indicate whether the instruction matches the objectives, whether the test blueprint matches the instruction or objectives, and whether the test items match the test blueprint. To increase content validity, each item should be linkable to the instruction or objectives, and there should be a proper balance of content and of cognitive process levels.
Content validity is relevant to achievement testing. To be content valid, the test should be representative of the topics and cognitive processes covered in the course unit, and must be consistent with the objectives and the instruction. Evidence for content validity is obtained by logically analyzing the course objectives and instruction and comparing these to the test.
To investigate criterion validity, test scores are correlated with a criterion variable. Criterion validity is established through an empirical process: first gather data on the test and the criterion of interest, then create a scatterplot and compute the correlation between the two sets of scores. The computed correlation represents criterion validity.
There are two types of criterion validity: predictive criterion validity and concurrent criterion validity. Predictive criterion validity is computed by correlating test scores from an instrument with scores from a criterion measure taken in the future. In contrast, concurrent criterion validity is the correlation between the scores from an instrument and the scores from a measure taken at the same time.
Here are some examples of predictive criterion validity: the correlation between SAT scores and college GPA; the correlation between GRE scores and graduate GPA; the correlation between a job placement test score and work productivity; and the correlation between a reading readiness score and future reading performance.
Here are some examples of concurrent criterion validity: the correlation between the score on a short group IQ test and the score on an individually administered IQ test; the correlation between the score on a standardized achievement test and the score on teacher-made assessments; and the correlation between the score on the Terra Nova reading test and the score on the FCAT reading test. These are examples of concurrent criterion validity because each pair of tests is administered at the same time.
Concurrent coefficients are generally higher than predictive coefficients. This does not mean that the test with the higher validity coefficient is better in a specific situation. Group variability affects the size of the validity coefficient: higher validity coefficients are derived from heterogeneous groups than from homogeneous ones. The relevance and reliability of the criterion need to be considered, as well as the test itself. A poor criterion will lower the validity coefficient.
Construct validity can be thought of in two ways. In the first way, construct validity is simply another type of validity, like content validity, predictive validity, and concurrent validity. It is used specifically to establish the validity of a test that is intended to measure some abstract trait or skill, such as intelligence, anxiety, or mechanical aptitude. The construct validity of a test is often investigated by correlating the test with tests of other attributes that, theoretically, ought to be related to the trait that this test is supposed to measure.
For example, scores on a test of intelligence ought to show a strong relationship with measures of achievement. For some tests, this is the only type of validity investigation that can be carried out. There may be no direct connection to another measure that could be used for predictive or concurrent validity analyses.
The second way of thinking about construct validity is to think of it as the foundational, or all-encompassing, type of validity. In this sense, all the other types of validity can be seen as also providing information about whether the test is actually measuring the abstract trait, or construct, that was theorized. Either way, construct validity is important in helping define the actual construct that a test measures.
Construct validity investigates whether the test score actually taps an abstract trait or ability (e.g., intelligence, test anxiety, academic motivation). For example: Does the Stanford-Binet IQ test measure the abstract trait of intelligence?
In other words, with construct validity we are interested in making inferences about the amount of the trait that is possessed. Therefore, establishing this form of validity usually involves finding evidence that agrees with logical and theoretical expectations, which means that construct validity is obtained by correlating observed test scores with other scores or measures.
Here is an example of construct validity. To establish the construct validity of the Academic Motivation Test (AMT), we could have teachers rate students' motivation; we would expect teacher ratings to correlate positively with the motivation scale. If so, we would have evidence for the construct validity of the AMT. We could also examine completed homework assignments (we would expect highly motivated students to complete their homework, and for this to relate to the motivation measure).
It is sometimes argued that all validity is construct validity. This is because all evidence that helps establish any type of test validity helps establish the construct as well. It all helps determine whether the test is measuring what it is supposed to.