Running head: USING REPLICATION TO INFORM DECISIONS ABOUT SCALE-UP


Using Replication to Help Inform Decisions about Scale-up: Three Quasi-experiments on a Middle School Unit on Motion and Forces

Bill Watson
Curtis Pyke
Sharon Lynch
Rob Ochsendorf
The George Washington University

This work was conducted by SCALE-uP: A collaboration between George Washington University and Montgomery County Public Schools (MD); Sharon Lynch, Joel Kuipers, Curtis Pyke, and Bonnie Hansen-Grafton, principal investigators. Funding for SCALE-uP was provided by the National Science Foundation, the U.S. Department of Education, and the National Institutes of Health (REC ). Any opinions, findings, conclusions, or recommendations are those of the authors and do not necessarily reflect the position, policy, or endorsement of the funding agencies.

Research programs that include experiments are becoming increasingly important in science education as a means through which to develop a sound and convincing empirical basis for understanding the effects of interventions and making evidence-based decisions about their scale-up in diverse settings. True experiments, which are characterized by the random assignment of members of a population to a treatment or a control group, are considered the gold standard in education research because they reduce the differences between groups to only random variation and the presence (or absence) of the treatment (Subotnik & Walberg, 2006). For researchers, these conditions increase the likelihood that two samples drawn from the same population are comparable to each other and to the population, thereby increasing confidence in causal inferences about effectiveness (Cook & Campbell, 1979). For practitioners, those making decisions about curriculum and instruction in schools, the Institute for Educational Sciences at the US Department of Education (USDOE) suggests that only studies with randomization be considered as strong evidence or possible evidence of an intervention's effectiveness (Institute for Educational Sciences, 2006).

Quasi-experiments are also a practical and valid means for the evaluation of interventions when a true experiment is impractical due to the presence of natural groups, such as classes and schools, within which students are clustered (Subotnik & Walberg, 2006). In these circumstances, a quasi-experiment that includes careful sampling (e.g., random selection of schools), a priori assignment of matched pairs to a treatment or control group, and/or a pretest used to control for any remaining group differences can often come close to providing the rigor of a true experiment (Subotnik & Walberg, 2006).
However, there are inherent threats to internal validity in quasi-experimental designs that the researcher must take care to address with supplemental data. Systematic variation introduced through the clustering of subjects that occurs in quasi-experiments can compete with the intervention studied as a cause of differences observed. Replications of quasi-experiments can provide opportunities to adjust procedures to address some threats to the internal validity of quasi-experiments and can study new samples to address external validity concerns. Replications can take many forms and serve a multitude of purposes (e.g., Hendrick, 1990; Kline, 2003). Intuitively, a thoughtful choice of replication of a quasi-experimental design can produce new and improved results or increase the confidence researchers have in the presence of a treatment effect found in an initial study. Therefore, replication can be important in establishing the effectiveness of an intervention when it fosters a sense of robustness in results or enhances the generalizability of findings from stand-alone studies (Cohen, 1994; Robinson & Levin, 1997).

This paper presents data to show the utility in combining a high-quality quasi-experimental design with multiple replications in school-based scale-up research. Scale-up research is research charged with producing evidence to inform scale-up decisions: decisions regarding which innovations can be expected to be effective for all students in a range of school contexts and settings, that is, "what works best, for whom, and under what conditions" (Brown, McDonald, & Schneider, 2006, p. 1). Scaling-up by definition is the introduction of interventions whose efficacy has been established in one context into new settings, with the goal of producing similarly positive impacts in larger, frequently more diverse, populations (Brown et al., 2006).

Our work shows that a good first step in scaling up an intervention is a series of experiments or quasi-experiments at small scale.

Replication in Educational Research

Quasi-experiments are often the most practical research design for an educational field study, including scale-up studies used to evaluate whether or not an intervention is worth taking to scale. However, because they are not true experiments and therefore do not achieve true randomization, the possibility for systematic error to occur is always present, and, with it, the risk of threats to internal and external validity of the study. For the purposes of this discussion, we consider internal validity to be the validity with which statements can be made about whether there is a causal relationship from one variable to another in the form in which the variables were manipulated or measured (Cook & Campbell, 1979, p. 38). External validity refers to the approximate validity with which conclusions are drawn about the generalizability of a causal relationship to and across populations of persons, settings, and times (Cook & Campbell, 1979). Unlike replications with experimental designs, which almost always add to the efficacy of a sound result, the replication of a quasi-experiment may not have an inherent value if the potential threats to validity found in the initial study are not addressed.

Replication: Frameworks

In social science research, replication of research has traditionally been understood to be a process in which different researchers repeat a study's methods independently with different subjects in different sites and at different times with the goal of achieving the same results and increasing the generalizability of findings (Meline & Paradiso, 2003; Thompson, 1996). However, the process of replication in social science research in field settings is considerably more nuanced than this definition might suggest.
In field settings, both the intervention and experimental procedures can be influenced by the local context and sample in ways that change the nature of the intervention or the experiment, or both, from one experiment to another. Before conducting a replication, an astute researcher must therefore ask: In what context, with what kinds of subjects, and by which researchers will the replication be conducted? (Rosenthal, 1990). The purpose of the replication must also be considered: Is the researcher interested in making adjustments to the study procedures or intervention to increase the internal validity of findings, or will the sampling be adjusted to enhance the external validity of initial results? A broader view of replication of field-based quasi-experiments might enable classification of different types according to the multiple purposes for replication when conducting research in schools. Hendrick (1990) proposed four kinds of replication that take into account the procedural variables associated with a study and contextual variables (e.g., subject characteristics, physical setting). Hendrick's taxonomy proposes that an exact replication adheres as closely as possible to the original variables and processes in order to replicate results. A partial replication varies some aspects of either the contextual or procedural variables, and a conceptual replication radically departs from one or more of the procedural variables. Hendrick argued for a fourth type of replication, systematic replication, which includes first a strict replication and then either a partial or conceptual replication to isolate the original effect and explore the intervention when new variables are considered. Rosenthal (1990) referred to such a succession of replications as a replication battery: "The simplest form of replication battery requires two replications of the original study: one of these replications is as similar as we can make it to the original study, the other is at least

moderately dissimilar to the original study" (p. 6). Rosenthal (1990) argued that if the same results were obtained with similar but not exact quasi-experimental procedures, internal validity would be increased because differences between groups could more likely be attributed to the intervention of interest and not to experimental procedures. Further, even if one of the replications is of poorer quality than the others, Rosenthal argued for its consideration in determining the overall effect of the intervention, albeit with less weight than more rigorous (presumably internally valid) replications. More recently, Kline (2003) also distinguished among several types of replication according to the different research purposes they address. For example, Kline's operational replications are like Hendrick's (1990) exact replication: the sampling and experimental methods of the original study are repeated to test whether results can be duplicated. Balanced replications are akin to partial and conceptual replications in that they appear to address the limitations of quasi-experiments by manipulating additional variables to rule out competing explanations for results. In a recent call for replication of studies in educational research, Schneider (2004) also suggested a degree of flexibility in replication, describing the process as "conducting an investigation repeatedly with comparable subjects and conditions" (p. 1473) while also suggesting that it might include making "controllable changes" to an intervention as part of its replication. Schneider's (2004) notion of controllable changes, Kline's (2003) description of balanced replication, Hendrick's (1990) systematic replication, and Rosenthal's (1990) argument in favor of the replication battery all suggest that a series of replications taken together can provide important information about an intervention's effectiveness beyond a single quasi-experiment.
Replication: Addressing Threats to Internal Validity

When multiple quasi-experiments (i.e., replications) are conducted with adjustments, the threats to internal validity inherent in quasi-experimentation might be more fully addressed (Cook & Campbell, 1979). Although changing quasi-experiments in the process of replicating them might decrease confidence in the external validity of an initial study finding, when a replication battery is considered, a set of studies might provide externally valid data to contribute to decision making within and beyond a particular school district. The particular threats to internal validity germane to the studies reported in this paper are those associated with the untreated control group design with pretest and posttest (Cook & Campbell, 1979). This classic and widely implemented quasi-experimental design features an observation of participants in two non-randomly assigned groups before and after one of the groups receives treatment with an intervention of interest. The internal validity of a study or set of studies ultimately depends on the confidence that the researcher has that differences between groups are caused by the intervention of interest (Cook & Campbell, 1979). Cook and Campbell (1979) provided considerable detail about threats to internal validity in quasi-experimentation that could reduce confidence in claims of causality (p ). However, they concluded that the untreated control group design with pretest and posttest usually controls for all but four threats to internal validity: selection-maturation, instrumentation, differential regression to the mean, and local history. Table 1 briefly describes each of these threats. In addition, they are not mutually exclusive.
In a study of the effectiveness of curriculum materials, for example, the extent to which the researchers are confident that differential regression to the mean is not a threat relies upon their confidence that sampling methods have produced two samples similar on performance and demographic variables

(selection-maturation) and that the assessment instrument has similar characteristics for all subjects (instrumentation). Cook and Campbell (1979) suggest that replication plays a role in establishing external validity by presenting the simplest case: An exact replication (Hendrick, 1990) of a quasi-experiment in which results are corroborated and confidence in internal validity is high. However, we argue that the relationship between replication and validity is more complex, given the multiple combinations of outcomes that are possible when different kinds of replications are conducted. Two dimensions of replication seem particularly important. The first is the consistency of results across replication. The second is whether a replication addresses internal validity threats that were not addressed in a previous study (i.e., it improves upon the study) or informs the interpretation of the presence or absence of threats in a prior study (i.e., it enhances interpretation of the study). In an exact replication, results can either be the same as or different from results in the original quasi-experiment. If results are different, it seems reasonable to suggest that some element of the local history - perhaps schools, teachers, or a cohort of students - could have an effect on the outcomes, in addition to (or instead of) the effect of an intervention. A partial replication therefore seems warranted to adjust the quasi-experimental procedures to address the threats. A partial replication would also be appropriate if the results are the same, but the researchers do not have confidence that threats to internal validity have been adequately addressed. Indeed, conducting partial replications in either of these scenarios is consistent with the recommendation of Hendrick (1990) to consider results from a set of replications when attempting to determine the effectiveness of an intervention.
Addressing threats to validity with partial replication is, in turn, not a straightforward process. What if results of a partial replication of a quasi-experiment are not the same as those found in either the original quasi-experiment or its exact replication? If the partial replication addresses a threat to internal validity where the original quasi-experiment or its exact replication did not, then the partial replication improves upon the study, and its results might be considered the most robust. If threats to internal validity are still not adequately addressed in the partial replication, the researcher must explore relationships between all combinations of the quasi-experiments. Alternatively, if the partial replication provides data that help to address threats to the internal validity of the original quasi-experiment or its exact replication, then the partial replication enhances interpretation of the original study, and its results might be considered with the results of the previous study.

Figure 1 provides a possible decision tree for researchers faced with data from a quasi-experiment and an exact replication. Because multiple replications of quasi-experiments in educational research are rare, Figure 1 is more an exercise in logic than a decision matrix supported by data produced in a series of actual replication batteries. However, the procedures and results described in this paper will provide data generated from a series of quasi-experiments with practical consequences for the scale-up of a set of curriculum materials in a large, suburban school district. We hope to support the logic of Figure 1 by applying it to the example to which we now turn.
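The reasoning in the preceding paragraphs can be written out as a short sketch. It is an illustration of the prose only and is not a reproduction of Figure 1; the function names, parameter names, and return strings are hypothetical labels chosen for this example, and the three partial-replication cases are checked in one assumed order.

```python
def after_exact_replication(results_consistent: bool, threats_addressed: bool) -> str:
    """Suggest a next step given an original quasi-experiment and its exact replication."""
    if results_consistent and threats_addressed:
        # The simplest case: results corroborated, confidence in internal validity high.
        return "treat the finding as corroborated"
    # Inconsistent results point to possible local history effects; consistent
    # results with unresolved threats still leave the causal claim in doubt.
    return "conduct a partial replication"


def weigh_partial_replication(addresses_new_threat: bool, informs_prior_threats: bool) -> str:
    """Suggest how to weigh a partial replication whose results differ from earlier studies."""
    if addresses_new_threat:
        return "improves upon the study: treat its results as the most robust"
    if informs_prior_threats:
        return "enhances interpretation: consider its results with the earlier study"
    return "threats remain: explore relationships among all of the quasi-experiments"


# Example: consistent results, but threats to internal validity not yet ruled out.
print(after_exact_replication(results_consistent=True, threats_addressed=False))
```

Written this way, the sketch makes one point of the argument explicit: agreement between an original study and its exact replication is not sufficient on its own, because unresolved threats to internal validity still lead to a partial replication.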
Replication in Practice: The SCALE-uP Studies

The Scaling-up Curriculum for Achievement, Learning, and Equity Project (SCALE-uP) conducted quasi-experiments with replication to study the effectiveness of three sets of middle school science curriculum materials as the first step in studying how they scale up in a large

suburban school district. The SCALE-uP research program called for the selection of three sets of curriculum materials that have a favorable rating according to the instructional analysis of the American Association for the Advancement of Science (AAAS) Project 2061 curriculum analysis protocol (Kesidou & Roseman, 2002). First, the original research questions for each quasi-experiment sought to determine the effectiveness of the materials:

1) Does use of curriculum materials that meet a majority of criteria in the AAAS Project 2061 instructional analysis produce higher mean scores on a test of concept understanding when compared to a district's regular curriculum offerings?
2) Does disaggregating the outcome data reveal differences in achievement for different subgroups of students not observed in the reports on aggregate mean scores?

Next, replications were conducted to verify the initial findings and provide evidence of reliability of the procedures. In retrospect, the intent was closest to an operational (Kline, 2003) or an exact (Hendrick, 1990) replication. Finally, the evidence obtained from quasi-experiments and their replications was to be used to inform a decision to scale up each unit within the school district. In the first quasi-experiment, SCALE-uP found that the first unit studied, Chemistry That Applies (State of Michigan, 1993), was effective for all students when compared to curriculum materials used in the control group. Replication with a new cohort of students in the same schools yielded similar results, providing support for a decision to scale up the unit to all schools in the district (Lynch, Kuipers, Pyke, & Szesze, 2005).
Data collected during the initial study and replication of the second unit, The Real Reasons for the Seasons (Lawrence Hall of Science, 2000), suggested that the unit was not as effective as the materials used in the control group for any subgroup of students (Pyke, Lynch, Kuipers, Szesze, & Watson, 2005a). The second unit was therefore not recommended for scale-up in the district. The results for the study of the third unit, Exploring Motion and Forces: Speed, Acceleration, and Friction (Harvard-Smithsonian Center for Astrophysics, 2001), were not as straightforward. They motivated an investigation of the internal validity of the studies and the role that replication can play in identifying suitable data for informing decisions about scale-up in a field setting.

SCALE-uP Studies of Motion and Forces

Motion and Forces (Harvard-Smithsonian Center for Astrophysics, 2001) is a six-week physical science curriculum unit designed for use with students in fifth through eighth grades that received an acceptable rating according to the SCALE-uP application of the Project 2061 curriculum analysis (Ochsendorf, et al., 2001). Its 18 Explorations are inquiry-centered and activity-based, with an emphasis on students' direct experience with phenomena. The curriculum materials consist of a Teacher Manual and a student Science Journal, but no traditional student textbook. Materials required to conduct the Explorations (e.g., sliding disks, ramps, rolling carts) are often constructed and assembled by the students. The unit's target concepts are closely associated with the following target concept from Benchmarks for Science Literacy (AAAS, 1993):

Changes in speed or direction of motion are caused by forces. The greater the force is, the greater the change in motion will be. The more massive an object is, the less effect a given force will have (4F, 3-5, #1; p. 89).
an object at rest stays that way unless acted on by a force an object in motion will continue to move unabated unless acted on by a force (p. 90).

The first SCALE-uP quasi-experiment conducted on Motion and Forces suggested that the unit was effective for some subgroups of students but not for others. The replication of the quasi-experiment confirmed this unusual result. Because scale-up (and SCALE-uP) is concerned with the effectiveness of the intervention for all students, neither the researchers nor the school district administrators were comfortable interpreting the data as positive evidence for scaling-up the unit. Given the investment that the school district had made in the materials and the potential for the unit to be effective, a balanced, or partial, replication (Hendrick, 1990; Kline, 2003) of the study (this time, with greater attention to potential threats to internal validity) seemed warranted. By conducting the second replication, SCALE-uP was in a unique position to identify and address threats to validity across all three quasi-experiments and to consider data from the set of quasi-experiments in making a decision about the scale-up of the unit.

General Design, Population, and Sampling for the Motion & Forces Studies

Each study of the effectiveness of Motion & Forces employed an untreated control group design with pretest and posttest (Cook & Campbell, 1979), intended to test for differences in outcomes between equivalent groups. There were three quasi-experiments conducted in three consecutive school years. Quasi-experiment 1 and Quasi-experiment 2 (an exact replication) were conducted in the same set of schools randomly selected from the sampling frame (see below), while Quasi-experiment 3 (a partial replication) was conducted in a different set of schools, also randomly selected from the sampling frame.

Population and Sample

The population under investigation was 6th grade students in Montgomery County Public Schools (MCPS).
MCPS is a large Maryland school district (approximately 136,000 students total, 32,000 in grades 6-8) located in the Washington, DC, metropolitan area, with a student population that is rapidly becoming more diverse, culturally, linguistically, and socioeconomically. MCPS consistently occupies a position among the top-performing school districts in the State of Maryland. The study sample for each quasi-experiment consisted of 6th grade students from MCPS middle schools. Schools were used as the sampling unit, with schools randomly selected from a sampling frame consisting of 5 different School Profile Categories (SPCs). The SPCs were developed by SCALE-uP researchers and MCPS administrators to identify five groups of schools, each containing schools that fit a similar demographic and achievement profile. Inclusion in an SPC was determined by a proxy for socio-economic status, the percentage of students in attendance eligible for the Free and Reduced Price Meals System (FARMS), and math and reading achievement data from 5th grade nationally norm-referenced tests. Two schools were randomly selected from each SPC. One school from each pair was randomly selected as a treatment school; the other was assigned to the comparison condition. This sampling method was developed to produce two samples representative of the study population, with enough students to provide power for significance tests on data for subgroups disaggregated by FARMS, ethnicity, and eligibility for English for Speakers of Other Languages (ESOL) and special education services.

The Motion and Forces Assessment

Student understanding of the target idea was determined by a score on the Motion and Forces Assessment (MFA), a curriculum-independent posttest given by teachers at the end of instruction. An analysis and development procedure developed in collaboration with the AAAS Project 2061 (DeBoer, 2005; Stern & Ahlgren, 2002) was used to develop the assessment. The MFA consists of 10 items (6 constructed response and 4 selected response) that provided the students with 4 different physical phenomena to respond to questions about motion and force. Raters coded student responses according to a rating guide that categorized student responses according to their alignment with a scientifically appropriate understanding of the target benchmark (average inter-rater reliability = 0.82). A weighting scheme that distributes contribution of students' ideas about different parts of the benchmark and balances the contribution of selected response and constructed response items and the difficulty and discrimination of each item was applied to raw scores to calculate a scale score for student understanding. A standard-setting process (Plake & Hambleton, 2001; Pyke & Hanson, 2005) established cut scores that distinguish among four levels of understanding of the target ideas: 0-20 = no understanding; = context-limited understanding; = some fluency with ideas; = flexible understanding.

General SCALE-uP Data Analysis Procedures

The statistical analyses to test the research hypotheses used Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA) techniques, with pretest scores as the covariate. Data were analyzed for overall differences in the mean posttest MFA score between the treatment and comparison conditions. They were also disaggregated according to gender, FARMS, ethnicity, and eligibility for ESOL and special education services.
Assumptions for these analyses were generally acceptable (see Pyke, Lynch, Kuipers, Szesze, & Watson, 2004, 2005b, 2006, for complete analyses) according to the guidelines established in the ANOVA/ANCOVA literature (cf. Tabachnick & Fidell, 1989). For simplicity in presenting results in this paper, only data disaggregated by Free and Reduced-price Meals Status (FARMS) subgroups are considered. (Never FARMS represents the subgroup that had never been eligible for services; Prior FARMS, the subgroup that was previously but not currently eligible for services; and Now FARMS, the subgroup eligible for services at the time of the study.) In addition to ANOVA/ANCOVA analyses, when significant main effects and interactions were found, exploratory follow-up analyses for simple main effects were conducted to explain the effects. Finally, effect size was calculated for each subgroup for all dependent variables by subtracting the adjusted comparison mean from the adjusted treatment mean and dividing the difference by the study sample standard deviation. All analyses were performed using SPSS for Windows, Version 12.

Quasi-experiment 1

Demographics of the Quasi-experiment 1 Sample

The study sample for Quasi-experiment 1 was selected according to the procedure described above. Comparison of the students in the treatment and comparison conditions suggested that the random selection of schools from five SPCs and the random assignment of schools to the treatment condition resulted in two samples of students probabilistically similar in demographic and prior performance variables (see Tables 1 and 2).

Pretest Procedures

Copies of the MFA were shipped to schools by the MCPS program evaluation team. The MFA was administered to students in the treatment and comparison conditions by classroom teachers on or before the date that instruction with the target benchmark began according to instructions provided by SCALE-uP. All assessments were collected at the school by the coordinator and sent to the program evaluation team through the MCPS mail delivery system, where they were picked up by SCALE-uP researchers. Pretests were rated by trained science graduate students and MCPS 8th grade teachers.

Intervention Procedures: Instructional Attributes

Materials used to teach Motion & Forces were distributed to teachers in the treatment condition from the MCPS central materials center before the fourth quarter of the school year. Teachers in the treatment condition were instructed to teach Motion & Forces according to a set of fidelity guidelines developed by SCALE-uP in conjunction with MCPS administrators and teachers (O'Donnell, Lynch, Watson, & Rethinam, 2007). Students in the treatment condition received photocopies of pages of the Student Journal for each lesson. Teachers in the comparison condition chose curriculum materials from a list of materials considered acceptable in MCPS to teach the target benchmark, as their usual practice dictated.

Posttest Procedures

Several weeks before the end of the quarter, copies of the MFA were shipped to schools by the MCPS program evaluation team. The MFA was administered to students in the treatment and comparison conditions by classroom teachers on or immediately after the date that instruction ended according to instructions provided by SCALE-uP. All assessments were collected at the school by the coordinator and sent to the program evaluation team through the MCPS mail delivery system, where they were picked up by SCALE-uP researchers. The same raters who rated the pretests rated the posttests.
Results

Pretest. A 1 X 2 between-groups Analysis of Variance (ANOVA) indicated no statistically significant difference in pretest score between the treatment and comparison conditions.

Posttest. A 1 X 2 between-groups Analysis of Covariance (ANCOVA) indicated a statistically significant main effect in favor of the treatment condition, with F(1, 2169) = 6.44, p < .05, Cohen's d = .10. The adjusted mean scores for the treatment condition (M = 56.98, SD = 22.17) and the comparison condition (M = 54.72, SD = 22.64) were both within the same level of understanding, some fluency with ideas. A 2 X 3 between-groups Analysis of Covariance (ANCOVA) indicated a statistically significant interaction between curriculum condition and FARMS, with F(2, 2165) = 8.094, p < .05. Follow-up tests were conducted to determine the nature of the interaction. These tests revealed that differences between the Never FARMS subgroup and the Prior FARMS and Now FARMS subgroups were the same in both conditions, but that only the Never FARMS subgroup mean was significantly higher in the treatment condition than in the comparison condition. There was no statistically significant difference in the means for the other two subgroups between the treatment and comparison conditions (see Figure 2).
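As a rough check on the overall effect size, the calculation described in the data analysis procedures (adjusted treatment mean minus adjusted comparison mean, divided by the study sample standard deviation) can be applied to the values reported above. The sketch below is illustrative only. The study sample standard deviation is not reported here, so it is approximated, as an assumption, by the root mean square of the two condition standard deviations; the function and variable names are hypothetical.

```python
import math


def effect_size(adj_treatment_mean: float, adj_comparison_mean: float, sample_sd: float) -> float:
    """Difference between adjusted means divided by a sample standard deviation."""
    return (adj_treatment_mean - adj_comparison_mean) / sample_sd


# Adjusted posttest values reported for Quasi-experiment 1.
treatment_mean, treatment_sd = 56.98, 22.17
comparison_mean, comparison_sd = 54.72, 22.64

# Assumption: stand in for the unreported study sample SD with the
# root mean square of the two condition SDs.
approx_sample_sd = math.sqrt((treatment_sd ** 2 + comparison_sd ** 2) / 2)

d = effect_size(treatment_mean, comparison_mean, approx_sample_sd)
print(f"approximate effect size: {d:.2f}")
```

Under this assumption the result is approximately 0.10, in line with the reported value: a difference of about 2.3 scale points against a spread of roughly 22 points.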

Potential Threats to Internal Validity

Selection-Maturation. Table 1 indicates that the treatment and comparison groups were similar in terms of their standardized test scores prior to the study. However, the average prior science grade point average (GPA) during the current school year was significantly higher in the comparison condition, perhaps suggesting a difference in prior knowledge. Pretest scores indicated no significant differences between students with similar levels of FARMS eligibility.

Instrumentation. Reliability data and skewness of score distributions suggest a possible instrumentation threat. The MFA was shown to have acceptable validity (Pyke & Ochsendorf, 2006), but only modest reliability (Cronbach's α = .54). Although in the aggregate the tests of assumptions for parametric statistics were not violated, examination of pretest scores indicated that pretest scores for the Now FARMS subgroup in both conditions were negatively skewed (i.e., skew statistics were more than two times the standard error of skewness, cf. Tabachnick & Fidell, 1989). The skewness of the pretest scores for this subgroup, combined with the low mean score (M = 34.19, SD = 17.19), suggests that the MFA did not reliably detect differences in performance for students with the lowest scores. Because the distributions were similarly skewed, comparisons between Now FARMS subgroups do not appear to be jeopardized, but pretest main effects for FARMS are more difficult to interpret. Further, skewed distributions for pretest scores used as covariates in the ANCOVA raise concerns about the interpretation of adjusted posttest scores. Scores for the Now FARMS subgroup in both conditions could be underadjusted, leading to more conservative estimates of posttest scores.

Differential Regression to the Mean.
The negatively skewed pretest and posttest distributions for the Now FARMS subgroups in the treatment and comparison conditions resulted in an unexpected proportion of scores from this subgroup below 20, the lowest cut-point on the MFA scale. Practically, differences between scores below 20 are not distinguishable, so MFA scores are likely to underestimate student understanding and include more error at pretest than at posttest for the Now FARMS subgroup. This subgroup is likely to have experienced gain due to statistical regression to the mean rather than (or in addition to) the treatment unit. However, this effect is more relevant to within group comparisons between levels of FARMS and not differential regression to the mean because the floor effect is present in both conditions. Local History. There are four potential local history concerns in Quasi-experiment 1 that could jeopardize the validity of the data, three of which concern differences in implementation of Motion & Forces among the treatment schools. First, little information beyond retrospective reflection was available about the curriculum materials used in the comparison condition and the fidelity with which Motion & Forces was implemented. (Fidelity of Implementation is used in this context to mean whether or not the intervention was implemented in accordance with the original program design. For a full discussion of Fidelity of Implementation in the Motion & Forces studies, please see O Donnell et al., 2007.) Therefore, it is unknown whether or not there was any diffusion of the intervention or results for students in the treatment condition due to the implementation of the unit. Second, according to MCPS staff and teachers who implemented the unit, the entire unit was completed in only 5 of 54 classrooms in which it was implemented; most classrooms completed between 9 and 13 of the unit's 18 explorations. MCPS personnel attributed this variation to a combination of two factors. 
The unit took teachers longer than the suggested 6-8 weeks to implement. Because it was taught in the fourth quarter of the school year, many teachers were unable to finish it before the end of the year. This situation introduces a threat to the fidelity with which the unit was taught in that it could not have been implemented with full fidelity if some of it was not implemented at all. It also introduces a potential difference between instruction provided in the treatment and comparison conditions, with students in the comparison condition potentially receiving broader instruction toward mastery of the target benchmark. Third, students in the treatment condition did not receive individual, bound, published student journals as suggested by the developer of the unit. This is a broader local history threat in that neither the district nor SCALE-uP had the resources to purchase student journals for all students. It was recommended that teachers make copies of individual pages for students to use. The fourth threat is a potential threat in classrooms and schools within and across conditions. The pretest was distributed to all students by their classroom teacher. Pretest conditions were not standardized, suggesting that perhaps some students had an opportunity to learn from the pretest. In addition, teachers had access to the pretest, suggesting that they could have gleaned information about the target benchmark and its assessment that could have affected teaching. Cook and Campbell (1979) suggested that diffusion of treatment could occur when teachers have the opportunity to talk with each other about interventions or are made aware of the salient features of the intervention. We consider a possible pretest effect to be a contamination effect related to diffusion.

Quasi-experiment 2: Exact Replication

Demographics of the Quasi-experiment 2 Sample

The study sample for Quasi-experiment 2 was selected according to the procedure described above. For Quasi-experiment 2, retrospectively considered an exact replication of Quasi-experiment 1, the study sample consisted of students from the same treatment and comparison schools. The two samples of students were probabilistically similar in prior performance variables (see Tables 1 and 2).
Changes to Procedures

There were no differences in pretest procedures between Quasi-experiment 1 and Quasi-experiment 2. Despite the local history threat due to teachers' potentially changing instruction according to the pretest or students learning from the pretest, SCALE-uP did not have the resources to alter pretest procedures for Quasi-experiment 2. The implementation procedures changed in that materials used to teach Motion & Forces were distributed to teachers in the treatment condition from the MCPS central materials center before the third quarter of the school year. The change from fourth-quarter implementation to third-quarter implementation was made to address concerns about the unit not being finished in most treatment classrooms. Other instructional attributes in Quasi-experiment 2 were the same as those in Quasi-experiment 1. Posttest procedures for Quasi-experiment 2 were also the same as those for Quasi-experiment 1.

Results

Pretest. A 1 X 2 between-groups Analysis of Variance (ANOVA) indicated a statistically significant difference in pretest score between the treatment and comparison conditions. Follow-up analyses suggested that the aggregate mean difference at pretest might be attributed to significantly greater pretest scores for the Now FARMS subgroup in the treatment condition than in the comparison condition, F(1, 567) = 6.270, p < .05.

Posttest. A 1 X 2 between-groups Analysis of Covariance (ANCOVA) indicated no statistically significant main effect for quasi-experimental condition, with F(1, 2251) = 2.546, p = .11. The adjusted mean scores for the treatment condition (M = 51.18, SD = 22.54) and the comparison condition (M = 52.53, SD = 21.68) were both within the same level of understanding, some fluency with ideas.

A 2 X 3 between-groups ANCOVA indicated a statistically significant interaction between curriculum condition and FARMS, with F(2, 2247) = , p < .05. Follow-up tests were conducted to determine the nature of the interaction. These tests revealed that differences between the Never FARMS subgroup and the Prior FARMS and Now FARMS subgroups were the same in both conditions. There was no significant posttest difference in mean scores for the Never FARMS subgroup, but the Prior FARMS subgroup and the Now FARMS subgroup means in the comparison condition were greater than those in the treatment condition (see Figure 3). The 95% confidence intervals for the means for the Prior FARMS and Now FARMS subgroups in the comparison condition and for the Prior FARMS subgroup in the treatment condition suggest that group means could be in either the second or third level of understanding.

Potential Threats to Internal Validity

Selection-Maturation. There was a statistically significant difference in prior science GPA in Quasi-experiment 2, just as there had been in Quasi-experiment 1. Further, the pretest difference observed suggested another potential selection threat. Finally, additional analysis of the pretest difference on the MFA indicated that scores for the Now FARMS subgroup were greater in the treatment condition than in the comparison condition. This suggests a possible selection-maturation effect for Now FARMS students in the treatment condition that would only be partially mitigated by adjusting for pretest scores in the ANCOVA.
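To make concrete what adjusting for pretest scores does, the sketch below computes covariate-adjusted posttest means in the standard one-covariate ANCOVA manner: each group's posttest mean is shifted by the pooled within-group slope times the distance of that group's pretest mean from the grand pretest mean. The function name `adjusted_means` and all scores are hypothetical illustrations, not SCALE-uP data or the authors' analysis code.

```python
from statistics import mean


def adjusted_means(groups):
    """Return ANCOVA-style adjusted posttest means.

    `groups` maps a condition label to a list of (pretest, posttest) pairs.
    Adjusted mean = posttest mean - slope * (group pretest mean - grand pretest mean),
    where slope is the pooled within-group regression slope.
    """
    grand_pre = mean(pre for pairs in groups.values() for pre, _ in pairs)
    sxy = sxx = 0.0
    group_means = {}
    for label, pairs in groups.items():
        pre_mean = mean(pre for pre, _ in pairs)
        post_mean = mean(post for _, post in pairs)
        group_means[label] = (pre_mean, post_mean)
        for pre, post in pairs:
            sxy += (pre - pre_mean) * (post - post_mean)
            sxx += (pre - pre_mean) ** 2
    slope = sxy / sxx  # pooled within-group slope of posttest on pretest
    return {
        label: post_mean - slope * (pre_mean - grand_pre)
        for label, (pre_mean, post_mean) in group_means.items()
    }


# Hypothetical scores: the treatment group starts 10 points higher at pretest.
example = {
    "treatment": [(30, 55), (40, 60), (50, 65)],
    "comparison": [(20, 50), (30, 55), (40, 60)],
}
adjusted = adjusted_means(example)
print(adjusted)
```

In this made-up example the raw posttest means differ by 5 points (60 versus 55), but the adjusted means coincide, because the whole raw difference is accounted for by the pretest advantage. The adjustment relies on the pretest measuring prior understanding well, which is why a selection difference may be only partially mitigated when the covariate is unreliable or skewed.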
Instrumentation. Reliability data and skewness of score distributions suggest a possible instrumentation threat in Quasi-experiment 2 similar to the threat described for Quasi-experiment 1. Additionally, there was a possible differential posttest floor effect because the Now FARMS subgroup distribution was positively skewed in the treatment and not the comparison condition. Although the MFA was shown to have acceptable validity (Pyke & Ochsendorf, 2006), its reliability in this sample was, again, modest (Cronbach's α = .60).

Differential Regression to the Mean. The implications of the possible floor effects for regression to the mean were similar to those described for Quasi-experiment 1. The effect was again considered to be more relevant to within-group comparisons between levels of FARMS than to differential regression to the mean, because the floor effect is present in both conditions.

Local History. Data collected during Quasi-experiment 2 do suggest that some local history threats present in Quasi-experiment 1 were addressed. First, teachers in the comparison condition were interviewed to determine the curriculum materials they used. The interviews suggested that the curriculum materials were sufficiently different from the treatment materials to attenuate concerns about intervention diffusion (Lynch et al., 2006). Second, because Motion & Forces was implemented during the third quarter, all teachers reported having completed all 18 of the explorations (Lynch et al., 2006). Third, conducting an exact replication in the same schools reduced the threat that the results of Quasi-experiment 1 were due to the particular cohort of students in sixth grade. However, there could be effects of individual schools or teachers that affect the results. Such effects would not be distinguishable in an exact replication in the same schools.
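The skewness criterion used in the instrumentation discussions (a skew statistic more than two times the standard error of skewness) can be sketched as follows. This is an illustration only: the function name `skew_exceeds_twice_se` and the scores are hypothetical, and the adjusted sample skewness and its usual standard error formula are assumed here, since the paper does not state which estimator was used.

```python
import math


def skew_exceeds_twice_se(scores):
    """Return (skew statistic, standard error of skewness, flag).

    Uses the bias-adjusted sample skewness and the standard error
    sqrt(6n(n-1) / ((n-2)(n+1)(n+3))). Requires at least 3 scores
    that are not all identical.
    """
    n = len(scores)
    center = sum(scores) / n
    m2 = sum((x - center) ** 2 for x in scores) / n
    m3 = sum((x - center) ** 3 for x in scores) / n
    skew = (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, abs(skew) > 2 * se


# Hypothetical scores piled up near the bottom of the scale with a long
# upper tail, as a floor effect would produce (positive skew).
floor_scores = [10] * 12 + [20] * 6 + [30] * 3 + [50] * 2 + [80]
skew, se, flagged = skew_exceeds_twice_se(floor_scores)
print(f"skew = {skew:.2f}, SE = {se:.2f}, flagged = {flagged}")

# A symmetric hypothetical sample for contrast.
sym_skew, sym_se, sym_flagged = skew_exceeds_twice_se([10, 20, 30, 40, 50])
```

The first sample is flagged by the criterion and the symmetric one is not. With subgroup samples in the hundreds the standard error becomes small, so even mild asymmetry can exceed the two-standard-error threshold.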

Quasi-experiment 3: Partial Replication

Demographics of the Quasi-experiment 3 Sample

Quasi-experiment 3 was conducted as a partial replication of Quasi-experiments 1 and 2 in order to address the limitations of the latter by manipulating additional variables to rule out competing explanations for results (Hendrick, 1990; Kline, 2003). A new sample of schools was randomly selected from within the sampling frame and randomly assigned to the treatment or comparison condition for Quasi-experiment 3. Comparison of the students in the treatment and comparison conditions suggested that the sampling procedure resulted in two samples of students probabilistically similar in demographic and prior performance variables (see Tables 1 and 2).

Changes to Procedures

To address potential local history threats to internal validity due to potential student learning from the pretest, a study-within-a-study was conducted to empirically identify pretest effects (Ochsendorf & Pyke, 2007). The pretest for Quasi-experiment 3 was distributed to a randomly selected sub-sample of students representative of the population (n = 295). The pretests were prepared by SCALE-uP researchers and sent to the MCPS program evaluation team. To address teacher learning from the pretest, representatives of the evaluation team distributed the pretests to the sub-sample of students according to a standardized procedure. All assessments were returned to the MCPS program evaluation team office by the representatives, where they were picked up by SCALE-uP researchers. Posttest procedures for Quasi-experiment 3 were the same as those for Quasi-experiments 1 and 2. To address the potential local history threats of intervention diffusion or lack of implementation of the intervention, fidelity of implementation data were collected by observing one full lesson taught in a sample of classrooms (n = 60) during implementation of instruction in the target benchmark.
Trained raters collected data on teachers' and students' adherence to the structure of the curriculum materials (in the treatment condition) and use of instructional strategies consistent with the treatment unit (treatment and comparison conditions) (O'Donnell et al., 2007). In addition, all students in the treatment condition were provided with a bound, printed Student Journal as recommended by the developers of Motion & Forces.

Results

Pretest. A 1 X 2 between-groups Analysis of Variance (ANOVA) indicated no statistically significant difference in pretest score between the sub-sample of students pretested in the treatment and comparison conditions.

Posttest. A 1 X 2 between-groups Analysis of Variance (ANOVA) indicated a statistically significant main effect in favor of the treatment condition, with F(1, 1759) = 24.26, p < .05, Cohen's d = .23. The adjusted mean scores for the treatment condition (M = 56.09, SD = 22.47) and the comparison condition (M = 50.85, SD = 22.18) were both within the third level of understanding, some fluency with ideas. However, the 95% confidence interval around the mean for the comparison condition suggested that the true mean could be in either the second or third level of understanding.

An ANOVA indicated a statistically significant interaction between curriculum condition and FARMS, F(2, 1755) = 3.80, p < .05. Follow-up tests conducted to determine simple main effects for the interaction indicated that the interaction could be explained by a lack of a statistically significant difference between the treatment and comparison conditions for the Prior FARMS subgroup. The main effect for curriculum condition was consistent for the Never FARMS and Now FARMS subgroups (see Figure 4).

Potential Threats to Internal Validity

Selection-Maturation. Tables 1 and 2 indicate that the treatment and comparison groups were similar demographically and in terms of their standardized test scores and GPA before the study. In the aggregate, pretest scores also indicated no significant differences between students with similar demographics and a significant difference between the scores from only one matched pair of schools.

Instrumentation. The MFA again showed only modest reliability for this sample (Cronbach's α = .52). In the aggregate, the tests of assumptions for parametric statistics were not violated, but examination of posttest scores indicated that those for the Now FARMS subgroup in the comparison condition were positively skewed. The skewness of the posttest scores for the Now FARMS subgroup makes comparisons across conditions and within the treatment condition more difficult to interpret.

Differential Regression to the Mean. Differential regression to the mean cannot be determined for Quasi-experiment 3 because pretest scores are unknown for the majority of the sample.

Local History. The potential local history threat of teachers learning from the pretest was addressed by administering the pretest to a limited sample of students. Further, comparison of results from the sub-sample of pretested students suggested that the threat of students learning from the pretest was not present in Quasi-experiment 3. Local history threats of diffusion of the intervention or low fidelity to the intervention in treatment classrooms were addressed with data collected during observations of treatment and comparison classrooms and during interviews with teachers.
These data indicated that Motion & Forces was implemented with sufficient fidelity to the structure of the unit (e.g., activities, procedures, sequence) to justify a conclusion that the treatment was implemented and that Motion & Forces was not used in comparison classrooms, thereby reducing the threat of diffusion. Finally, all students in the treatment condition received and used journals as recommended by the Motion & Forces curriculum materials (Harvard-Smithsonian Center for Astrophysics, 2001). The collection of fidelity data provides evidence that threats due to differential implementation in different locations are somewhat alleviated when considering the data from Quasi-experiment 3. On the other hand, the presence of observers could have had a different effect on different teachers and classrooms in different schools. Differences in outcomes between Quasi-experiments 1 and 2, particularly for the Prior FARMS and Now FARMS subgroups, and Quasi-experiment 3 suggest the possible threat that factors at the school level not detected by our sampling procedures could have influenced outcomes. However, demographic similarities at the student level, at which data were analyzed, partially alleviate this concern.

Analysis: Considering the Results of 3 Quasi-Experiments

The analyses of data in Quasi-experiments 1, 2, and 3 paint different pictures of the effectiveness of the Motion & Forces curriculum unit. The data from Quasi-experiments 1 and 2 suggest that the unit is either effective or does no harm for students from the highest socioeconomic level (i.e., Never FARMS) but could actually be worse than the standard fare for students of lower socioeconomic levels. The data from Quasi-experiment 3, however, suggest that the unit is more effective than materials used in the comparison condition in the aggregate and for the Never FARMS and Now FARMS subgroups, while it is neither better nor worse for the Prior FARMS subgroup. Faced with these data, it seems important to analyze the threats to internal validity to suggest which of the data are most valid and therefore suitable to use for making decisions about scaling up the unit.

Addressing Threats to Internal Validity through Replication

Table 3 summarizes the threats to internal validity present in each of the quasi-experiments on the effectiveness of Motion & Forces conducted by SCALE-uP when aggregate data are considered. Table 4 summarizes the same threats when data disaggregated by FARMS status are considered. If data indicated that a threat did not exist, a "no" was placed into the appropriate cell of the table. When data suggested that the threat did exist, a qualitative description of the threat level as low, moderate, or high, agreed upon through discussion among the researchers, was assigned to the appropriate cell. The reader should remember that in some cases, a threat in one area interacts with a threat posed in another. For example, the level of threat posed by differential regression to the mean between experimental conditions is dependent in part on a pretest difference between groups.

The first three categories of threats (selection-maturation, instrumentation, and differential regression to the mean) are generally independent across quasi-experiments. That is, the existence of differences between groups, floor or ceiling effects, and the subsequent differential regression to the mean are affected primarily by the specific sample studied in a given quasi-experiment.
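Regression to the mean, as discussed in the threat analyses above, can be illustrated with a small simulation in which no treatment is applied at all. This is a hedged sketch under classical test theory assumptions: each observed score is a stable true score plus independent error, the function name `low_scorer_gain` is hypothetical, and the reliability of .54 is borrowed from the Cronbach's α reported for Quasi-experiment 1 purely for illustration (α is an internal-consistency estimate, not a test-retest correlation).

```python
import math
import random
from statistics import mean


def low_scorer_gain(n=5000, reliability=0.54, seed=0):
    """Simulate pretest and posttest z-scores with no treatment effect.

    Returns (pretest mean, posttest mean) for the fifth of simulated
    students who scored lowest at pretest.
    """
    rng = random.Random(seed)
    true_sd = math.sqrt(reliability)       # share of variance that is stable
    error_sd = math.sqrt(1 - reliability)  # share that is measurement error
    students = []
    for _ in range(n):
        true_score = rng.gauss(0, true_sd)
        pre = true_score + rng.gauss(0, error_sd)
        post = true_score + rng.gauss(0, error_sd)
        students.append((pre, post))
    students.sort(key=lambda pair: pair[0])
    lowest = students[: n // 5]
    return mean(pre for pre, _ in lowest), mean(post for _, post in lowest)


pre_mean, post_mean = low_scorer_gain()
print(f"lowest fifth: pretest {pre_mean:.2f}, posttest {post_mean:.2f}")
```

The group selected for low pretest scores shows an apparent gain at posttest even though nothing changed between testings, because part of what placed those students at the bottom was measurement error. The lower the reliability, the larger this artifact, which is why gains for a low-scoring subgroup measured with a modestly reliable instrument call for cautious interpretation.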
Overall, selection-maturation threats are generally low, and instrumentation and differential regression to the mean threats appear to be low to moderate. The contrast between the description of threats for the aggregate data and that for the disaggregated data is noteworthy. When data are not disaggregated, the threat levels for floor and ceiling effects and for differential regression to the mean are low. The potential threats to validity when data are disaggregated, however, suggest that perhaps the instrument is not as effective for the Now FARMS subgroup (i.e., the lowest SES) and that gain registered for this subgroup might be differentially interpreted as regression to the mean. In any case, the threats to the validity of the data when data are disaggregated underscore the importance of disaggregation when making decisions about data that inform scale-up of an intervention to all students in a population.

Potential threats due to local history are not as consistent across quasi-experiments, although they are consistent across aggregate and disaggregated data. Threats are lower in the partial replication for all but one of the local history variables. This is not surprising, given the purpose of the partial replication to address the limitations of quasi-experiments by manipulating additional variables to rule out competing explanations for results (Kline, 2003). In Quasi-experiment 3, any reduction in confidence in the data caused by not administering a pretest to all students (and therefore raising potential concerns about selection-maturation) appears to be addressed not only by the results from the pretested sub-sample, but also by the added confidence resulting from the elimination of the threat of a pretest effect for students or teachers.
Further, we might conclude that the absence of a pretest effect on students in the study-within-a-study conducted in Quasi-experiment 3 (Ochsendorf & Pyke, 2007) suggests that it was not present in Quasi-experiments 1 and 2, thereby reducing the potential threat in those quasi-experiments. While not eliminating the threat, we are inclined to conclude that the threat of a


More information
