MEASURING MIDDLE GRADES STUDENTS' UNDERSTANDING OF FORCE AND MOTION CONCEPTS: INSIGHTS INTO THE STRUCTURE OF STUDENT IDEAS


The purpose of this study was to create an instrument that measures middle grades students' understanding of concepts relating to Newton's First Law of Motion. Design criteria included: 1) an exclusive focus on force and motion ideas (no mathematics or other science concepts); 2) distractors based upon the research on student thinking; 3) minimal burden on the test taker and the researcher; and 4) the measurement property of providing reliable information about students across a broad ability spectrum. After defining the content domain, multiple-choice items were drafted and revised using feedback from cognitive interviews with students. A pool of items was piloted, revised, field tested with approximately 5,000 students, and reviewed by a panel of physicists for content accuracy and domain coverage. Dimensionality analyses revealed that items clustered in two sets: one representing general knowledge of Newton's First Law, and one tapping a particularly prominent misconception, the idea that constant non-zero net force results in constant speed. Item response theory was used to select items for a 25-item scale. The study generated a valid, rigorously constructed, minimally burdensome instrument that researchers can use to study the effect of instructional strategies. Further, it added to the knowledge base on how student thinking about force and motion is organized.

P. Sean Smith, Horizon Research, Inc.
Eric R. Banilower, Horizon Research, Inc.

Introduction

The purpose of this study was to create a measure of middle grades students' understanding of concepts related to Newton's First Law of Motion. Specifically, we set out to create a tool that could be used by researchers to study the effect of different instructional strategies on student understanding. Similar tools exist.
For instance, the Force Concept Inventory (FCI) (Hestenes, Wells, & Swackhamer, 1992) is well-respected and widely used for studying force and motion learning at the undergraduate level. The FCI is often used at the high school level as well, but it is not appropriate for the middle grades because it covers topics beyond what national standards indicate middle school students should know. Our intent was to create a measure of the targeted concepts in as pure a form as possible; i.e., the instrument would not draw on other understandings such as mathematical or graphing skills. Finally, to enable wide-scale research, we set out to create an instrument that would be minimally burdensome, both for the test taker and the researcher. Thus, we opted for a multiple-choice format. Although this format has some limits, multiple-choice items can probe conceptual understanding, and the format is the one best suited to our purposes.

Smith & Banilower Page 1 of 13 Horizon Research. Inc.

Theoretical Underpinnings

This study is firmly rooted in the literature on student thinking (summarized in: Driver, Squires, Rushworth, & Wood-Robinson, 2002; Driver, Guesne, & Tiberghien, 2002), particularly the literature on the developmental progression of student ideas (correct and incorrect) in force and motion. The process we used to develop our instrument draws on and adds to this literature. The work is also situated in item response theory (IRT) (Swaminathan & Rogers, 1991), which we drew on to develop a scale that provides reliable information about student understanding across a wide range of ability levels.

Instrument Development

The development effort described in this paper is part of a much larger and well-funded project [1], which afforded the luxury of an elaborate and thorough development process. The process began with identifying the content domain, the idea that an unbalanced force acting on an object changes its speed (American Association for the Advancement of Science/Project 2061, 1993). For assessment purposes, we restricted the domain to motion in one dimension and defined the performance space by unpacking this idea into six sub-ideas. The content domain was reviewed by a panel of physicists and physics educators, which prompted minor revisions. The final version of the content domain is shown in Table 1.

[1] The project is titled ATLAST: Assessing Teacher Learning About Science Teaching. ATLAST is funded by the National Science Foundation under grant number EHR. The views expressed in this paper are those of the authors and do not necessarily represent the opinions of the National Science Foundation.

Table 1
Force and Motion Content Domain

Targeted Idea: An unbalanced force acting on an object changes its speed.

Sub-ideas:
A. A force is a push or pull interaction between two objects, and has both magnitude and direction.
B. All of the forces acting on an object combine through vector addition into a net force; they either balance each other out (net force is zero), or act like an unbalanced force (net force is not zero).
   1. If the sum of forces exerted on an object in one direction is the same strength as the sum of forces exerted on the object in the opposite direction, then the forces on the object are balanced (i.e., the net force is zero).
   2. If the sum of forces exerted on an object in one direction is greater than the sum of forces exerted on the object in the opposite direction, then the forces on the object are unbalanced (i.e., the net force is not zero).
C. If an object is moving faster and faster, then there is a net force acting on the object in the same direction as the motion.
D. If an object is moving slower and slower, then there is a net force acting on the object in the direction opposite to the object's motion.
E. If an object has constant speed in a straight line (or zero speed), then there is no net force acting on the object. This can occur either when:
   1. the forces on the object are balanced; or
   2. there are no forces exerted on the object.
F. The force of friction acts to oppose the relative motion of two objects in contact. Friction acts on both objects along the surfaces in contact with each other. The magnitude of friction depends upon the smoothness/roughness of the surfaces and how hard the objects are pushed together.

Force and motion is one of the few science topics that enjoys a robust literature on student thinking. After an extensive search of this literature, we associated known misconceptions [2] with the relevant sub-idea(s) in preparation for writing distractors. We then drafted multiple-choice items and began a months-long iterative process of conducting cognitive interviews with students (well over 50) and revising items. A pool of 35 items was piloted with approximately 2,000 middle grades students in spring 2004; at the same time, each item was critiqued through Project 2061's extensive item analysis procedure (DeBoer, 2005). Results of the piloting and analysis by Project 2061 were used to revise the item pool, which necessitated more student interviews. Ultimately, we field tested a pool of 48 items in fall 2004 with approximately 5,000 middle grades students. The items were split between two forms with 16 items common to each form.

[2] We use the term misconception to describe anything that precedes full understanding of a specific idea. Some misconceptions are prior conceptions and may represent important steps in a learning progression.
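The two-form design implies a simple count of items per form. As a worked example, assuming the 32 non-common items were divided evenly between the two forms (the text does not state how they were split):

```latex
\frac{48 - 16}{2} = 16 \ \text{unique items per form},
\qquad
16 + 16 = 32 \ \text{items per form}
```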

Lessons Learned in the Development Process

Assessment items with distractors based on misconceptions

Writing multiple-choice items with misconception-based distractors is a very appealing approach. Multiple-choice items are often criticized as focusing on factual recall rather than conceptual understanding. However, items that use misconceptions as distractors not only probe deeper understanding, but can also serve a diagnostic function in planning instruction. Misconceptions often represent important, perhaps even necessary, steps on a trajectory to full understanding. Items that provide evidence of where a student is on that trajectory can be very useful for diagnosing thinking and guiding instruction.

Given the obvious value of misconceptions-based distractors, one may wonder why the approach is not more common. Interestingly, such items present a challenge to development efforts using IRT. To understand this challenge, a bit of background on IRT is necessary. IRT affords many advantages to the test maker. Chief among these is the power to design a test that provides reliable information at the ability level of interest, in our case over a broad range of ability. As with any theory, IRT rests on a number of assumptions. One of the most important is that the probability of a correct response increases as respondent ability increases (i.e., that the item is monotonic). For monotonic items, graphing the probability of answering an item correctly by ability level results in an S-shaped curve like the one illustrated in Figure 1. In IRT, this graph is known as an item characteristic curve, or ICC, and is central to the theory. "The item characteristic curve is the basic building block of item response theory; all the other constructs of the theory depend upon this curve" (Baker, 2001, p. 7). No item will match the shape in Figure 1 exactly, but the general trend of increasing probability with increasing ability must hold.
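The S-shaped curve and the monotonicity assumption described above can be sketched with a logistic function. The following is a minimal illustration, assuming the two-parameter logistic form that the authors report fitting later in the paper; the function name `icc` and the parameter values are hypothetical and are not taken from the instrument.

```python
import math


def icc(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a two-parameter logistic model.

    theta: respondent ability, a: discrimination, b: difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters, chosen for illustration only.
a, b = 1.2, -0.5
abilities = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
probabilities = [icc(t, a, b) for t in abilities]

# Monotonicity check: each probability is at least as large as the previous one.
is_monotonic = all(p1 <= p2 for p1, p2 in zip(probabilities, probabilities[1:]))

for t, p in zip(abilities, probabilities):
    print(f"ability {t:+.1f} -> P(correct) = {p:.3f}")
print("monotonic:", is_monotonic)
```

A real item's empirical curve would be estimated from response data; the point here is only that a logistic curve rises steadily with ability, which is the behavior IRT assumes.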

[Figure 1: Item Characteristic Curve for item ES018V03; axes: Ability (horizontal), Probability (vertical)]

Sadler (1998) and others have conducted empirical studies suggesting that items with misconception-based distractors present a challenge to IRT. Specifically, such items may not be monotonic; that is, at some point in the ability spectrum, a respondent with higher ability is less likely than one with lower ability to choose the correct response. A possible explanation for this finding is as follows: a respondent with no understanding will likely guess and have a 25 percent chance of answering correctly, assuming four choices. A respondent with some understanding (ability) may be drawn to one of the misconception-based distractors, making the probability of choosing the correct answer less than 25 percent. Some of our items exhibit a small degree of non-monotonicity. However, most met the assumption of monotonicity, and we decided to proceed with an IRT-based model.

Insights into student thinking from item analysis

Our intent at the beginning of the development process was to generate a single scale that would measure students' understanding of the idea that an unbalanced force acting on an object changes

its speed. Analysis of the field test data included examining the dimensionality of the items via factor and cluster analyses. These analyses indicated that our items fell into two groups, each measuring different aspects of student thinking about Newton's First Law. This grouping provides some insight into how students' thinking about force and motion is organized. The first set includes items that address each sub-idea in Table 1, and can be thought of as general knowledge of the targeted idea. The second set includes items from only a few sub-ideas, primarily sub-ideas C (an object moves faster and faster as a result of a non-zero net force in the direction of motion), D (an object moves slower and slower as a result of a non-zero net force in the direction opposite its motion), and E (constant speed is a result of a zero net force). All of the items relate to the misconception that a constant non-zero net force applied to an object results in constant speed (or vice versa, that an object moving with constant speed must be acted on by a constant non-zero net force). Figure 3 shows an item with a choice based on this misconception. The most commonly selected choice was D (40 percent), the correct answer. However, 37 percent of students chose B, indicating they think a non-zero net force is needed to keep the bicycle moving at constant speed. We saw very similar results on the item in Figure 4; 40 percent chose B (the correct answer), and 36 percent chose C.

FM009V04
A boy is pedaling his bike on level ground so that he is moving at a constant speed. Which of the following is true about the forces on the bike?
A. There are no forces being applied to the bike.
B. The total force in the direction of the bike's motion is greater than the total force in the opposite direction.
C. The total force in the direction of the bike's motion is getting larger and larger.
D. The total force in the direction of the bike's motion is equal in strength to the total force in the opposite direction.

Figure 3

FM003V05
The total force acting on an object in one direction is greater than the total force acting on the object in the opposite direction. What is true about the object?
A. It is not moving.
B. It is changing speed.
C. It is moving at a constant speed.
D. It is moving back and forth.

Figure 4

This second set of items, as a group, was much more difficult than the first set, indicating that the misconception is very prevalent among middle school students and may dominate their thinking about force and motion. The power and pervasiveness of this misconception are not surprising. All motion on Earth is affected by friction, and unless students are aware of friction's effects, they can hardly help but form the idea that a constant force is needed to make an object move with constant speed. This pattern of student thinking is well documented in the literature. Gunstone and Watts (2002) provide a summary of studies that consistently identified the misconception among students.

Although the items related to this idea seemed to form a distinct subset, the inter-item reliability was quite low, below 0.4. To understand why, a bit more background on IRT is necessary. Figure 1 depicts the item characteristic curve (the ICC). Figure 1 illustrates two other key ideas from IRT as well. The first is the difficulty parameter (a.k.a. the b parameter). In classical test theory, item difficulty typically represents the probability of students answering an item correctly. In IRT, the difficulty parameter describes the ability level at which a respondent has a 50 percent chance of answering correctly (Swaminathan and Rogers, 1991). Difficulty parameters less than zero indicate items are relatively easy; difficulty parameters greater than zero indicate items are relatively difficult. In Figure 1, the item difficulty is below zero, indicating that the item is relatively easy.
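The 50 percent property of the difficulty parameter follows directly from the logistic form of the curve. A short derivation, assuming the two-parameter logistic model, with ability written as θ, discrimination as a, and difficulty as b (the symbols are used here for illustration):

```latex
P(\theta) = \frac{1}{1 + e^{-a(\theta - b)}}
\quad\Longrightarrow\quad
P(b) = \frac{1}{1 + e^{0}} = \frac{1}{2}
```

Because the exponent vanishes at θ = b regardless of a, the curve always passes through a probability of one half at the difficulty value, so an item with b below zero reaches the 50 percent point for respondents of below-average ability.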

The second key idea is that of item discrimination (a.k.a. the a parameter). Item discrimination describes how well an item can distinguish among respondents of different ability levels (Swaminathan and Rogers, 1991). Items for which there is a large change in the probability of responding correctly over a small change in ability are said to be highly discriminating. The more discriminating an item is, the more information it provides about a respondent; in other words, the more reliable the estimate of ability for that respondent is. With regard to ICCs, items that are more discriminating have steeper slopes. The effect of the discrimination parameter on item information is illustrated in Figure 2, which plots the item information for two items with roughly equal difficulty parameters but different discrimination parameters.

[Figure 2: Item information plotted against ability for two items with discrimination parameters of 1.56 and 0.95]

Narode (1987, cited in Sadler, 1998) found that mathematics items with misconceptions-based distractors were both more difficult (higher b parameter) and less discriminating (lower a parameter) than more traditional multiple-choice items. A scale constructed of items with low discriminating power cannot be very reliable, as the two are directly linked. The discrimination parameters of the items in the second group are shown in Table 2 below. Generally, a discrimination parameter below 1 (using a logistic metric) is less than desirable. Clearly, low discrimination presents a measurement dilemma; the field is very interested in assessing student thinking in areas that are laden with misconceptions, but including the misconceptions as distractors may make a reliable scale difficult to construct.
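The link between discrimination and information can be made concrete. The sketch below assumes the two-parameter logistic model on the logistic metric, for which item information is a² · P(θ) · (1 − P(θ)). It uses the two discrimination values labeled in Figure 2 (1.56 and 0.95) together with a hypothetical shared difficulty of 0, since the actual difficulty values of those items are not given in the text.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Information of a 2PL item on the logistic metric: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


B_ASSUMED = 0.0  # hypothetical shared difficulty, not a value from the paper

for a in (1.56, 0.95):
    # Under this model the information of an item peaks at theta = b.
    peak = item_information(B_ASSUMED, a, B_ASSUMED)
    print(f"a = {a}: peak information = {peak:.3f}")
```

Under these assumptions the peak equals a²/4, which is about 0.61 for a = 1.56 and about 0.23 for a = 0.95, so the more discriminating item supplies well over twice as much information near its difficulty.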

[Table 2: Discrimination and difficulty parameters by item; columns: Item, Discrimination, Difficulty]

Characteristics of the final scale

Given that we could not reliably measure what appeared to be a separate factor, we opted to focus our final scale on overall understanding of the idea that an unbalanced force acting on an object changes the object's speed. Using BILOG-MG 3.0 (Zimowski, Muraki, Mislevy, & Bock, 2003), we estimated the discrimination and difficulty parameters (i.e., a two-parameter logistic model) [3] for all items that loaded on the first, more general factor.

IRT allows the construction of scales with specific properties, a distinct advantage over classical test theory. The ultimate goal of a scale created using IRT is to generate ability estimates for test takers. In IRT, ability is plotted on a scale from negative to positive infinity in terms of standard deviations, with a mean of 0. However, practically all test takers fall within the range -3 to +3 on the ability scale. Our goal was to create a scale that would allow us to accurately estimate ability over a wide range. Using the difficulty and discrimination parameters, we selected 25 items that covered the content domain. Table 3 shows the number of items addressing each sub-idea. The items total to more than 25 because some items address more than one sub-idea.

[3] A three-parameter logistic model did not fit the data any better than the two-parameter model. In the interest of simplicity, we opted for the two-parameter model.
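To illustrate what generating an ability estimate involves, the sketch below finds the maximum-likelihood ability for a response pattern by grid search under a two-parameter logistic model. The item parameters and response patterns are hypothetical, and the grid search is a simplification chosen for readability; it is not the estimation procedure used by BILOG-MG.

```python
import math


def p_correct(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at a given ability."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_ability(items, responses, lo=-3.0, hi=3.0, steps=601):
    """Return the grid point in [lo, hi] with the highest log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))


# Hypothetical (discrimination, difficulty) pairs, ordered from easy to hard.
items = [(1.0, -1.5), (1.2, -0.5), (0.9, 0.0), (1.4, 0.5), (1.1, 1.5)]
weaker = [1, 1, 0, 0, 0]    # only the two easiest items answered correctly
stronger = [1, 1, 1, 1, 0]  # all but the hardest item answered correctly

print("weaker pattern:  ", round(estimate_ability(items, weaker), 2))
print("stronger pattern:", round(estimate_ability(items, stronger), 2))
```

The grid is restricted to the -3 to +3 range mentioned in the text; a pattern of all correct or all incorrect responses has no finite maximum-likelihood estimate and would simply land on the edge of the grid.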

Table 3
Number of Items Addressing Each Sub-idea

Columns: Sub-idea; Number of items

A. A force is a push or pull interaction between two objects, and has both magnitude and direction.
B. All of the forces acting on an object combine through vector addition into a net force; they either balance each other out (net force is zero), or act like an unbalanced force (net force is not zero).
C. If an object is moving faster and faster, then there is a net force acting on the object in the same direction as the motion.
D. If an object is moving slower and slower, then there is a net force acting on the object in the direction opposite to the object's motion.
E. If an object has constant speed in a straight line (or zero speed), then there is no net force acting on the object. This can occur either when the forces on the object are balanced or when there are no forces exerted on the object.
F. The force of friction acts to oppose the relative motion of two objects in contact. Friction acts on both objects along the surfaces in contact with each other. The magnitude of friction depends upon the smoothness/roughness of the surfaces and how hard the objects are pushed together.

Estimating ability accurately requires an adequate amount of information. The amount of information a test provides is described by the test information curve. The curve is constructed simply by summing the information contributed by each item. Again, we were interested in constructing a test that functions well over a broad ability range, which stands in contrast to other purposes, for example a credentialing exam. In the latter scenario, the test constructor's interest is in maximizing the amount of information at the ability determined to be necessary for credentialing. Figure 5 displays the test information curve for our 25-item scale. Clearly the scale provides a maximum amount of information near the middle of the ability scale. Consistent with our goals, the scale provides information for making sufficiently reliable ability estimates between about -2 and

[Figure 5: Test Information Curve for 25-item Scale; axes: Ability (horizontal), Information (vertical)]

Conclusions

We set out to develop an instrument that measures student understanding of ideas related to Newton's First Law; specifically, the idea that an unbalanced force acting on an object changes the object's speed. The instrument represents an important contribution to the field in two regards. First, the development process itself and the resulting instrument provide insight into student thinking about the targeted concepts. The misconception that a non-zero net force results in constant speed appears to be quite prevalent among middle grades students, so prevalent that it may dominate their thinking about Newton's First Law. More generally, although the instrument was not developed to be a diagnostic measure, it does shed light on student thinking, as most distractors were written from documented misconceptions about force and motion. Second, the work provides researchers with a valid, rigorously constructed, minimally burdensome tool to use in studying teaching and learning at the middle grades level. In particular, the tool allows researchers to study the effect of different instructional approaches on students' understanding of the targeted concepts.

The development process revealed a particularly challenging measurement dilemma. Items that used strongly held misconceptions as distractors tended to be poorly discriminating. That is,

they did not distinguish well between students who understood the target idea and those who did not. In an IRT measurement framework, such items do not function well in estimating student ability. Our work suggests that while using misconceptions-based distractors is very appealing from a diagnostic perspective, the approach can, especially when strongly held misconceptions are employed, make scale construction quite challenging. It is clear that more work is needed in this area.

References

American Association for the Advancement of Science/Project 2061 (1993). Benchmarks for Science Literacy. New York: Oxford University Press.
Baker, F.B. (2001). The basics of item response theory. ERIC Clearinghouse on Assessment and Evaluation, University of Maryland, College Park, MD.
DeBoer, G.E. (2005). Aligning student assessment to state and national content standards. Paper presented at the NSTA National Convention, Dallas, Texas.
Driver, R., Squires, A., Rushworth, P., and Wood-Robinson, V. (2002). Making Sense of Secondary Science: Research into Children's Ideas. London and New York, NY: RoutledgeFalmer.
Driver, R., Guesne, E., and Tiberghien, A. (2002). Children's Ideas in Science. Philadelphia, PA: Open University Press.
Gunstone, R. and Watts, M. (2002). Force and motion. In Driver, R., Guesne, E., and Tiberghien, A. (Eds.), Children's Ideas in Science (pp ). Philadelphia, PA: Open University Press.
Hestenes, D., Wells, M., and Swackhamer, G. (1992). Force Concept Inventory. The Physics Teacher, 30(3).
Narode, R. (1987). Standardized testing for alternative conceptions in basic mathematics. In J.D. Novak (Ed.), 2nd International Seminar on Misconception and Educational Strategies in Science and Mathematics (Vol. 1) (pp ). Ithaca, NY: Cornell University Press.
Sadler, P.M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35(3).
Swaminathan, H. and Rogers, H.J. (1991). Fundamentals of Item Response Theory. Thousand Oaks, CA: Sage Publications.
Zimowski, M., Muraki, E., Mislevy, R., and Bock, R. (2003). BILOG-MG 3. St. Paul, MN: Assessment Systems Corporation.
