An NCME Instructional Module on:
Developing a Personal Grading Plan
David A. Frisbie and Kristie K. Waltman, University of Iowa
Educational Measurement: Issues and Practice, Fall 1992
The purpose of this instructional module is to assist teachers in developing defensible grading practices that effectively and fairly communicate students' achievement status. In formulating such practices, it is essential that teachers first consider their personal grading philosophy and then create a compatible personal grading plan. The module delineates key philosophical issues that should be addressed and then outlines the procedural steps essential to establishing a grading plan. Finally, the features of several common methods of absolute and relative grading are compared.

This instructional module has been designed to help prospective and beginning teachers sort out the issues involved in formulating their grading procedures and to help experienced teachers reexamine the fairness and defensibility of their current grading practices. It can be applied at any grade level and in any subject-matter area in which letter grades are assigned to students at the end of a reporting period. The content focus is limited to grading, so other modes of evaluating and reporting student progress are not addressed.

With regard to the purpose of grades, the position we will assume and defend is that grades are intended mainly to communicate the achievement status of students. The grade, then, symbolizes the extent to which a student has attained the important instructional goals of the reporting period for which the grade is assigned. Grades would not be needed if there were no need to communicate achievement to students and parents (or others outside the school setting). Grades are not essential to the instructional process: teachers can teach without them, and students can and do learn without them.

Grades do, however, serve several other important functions that are secondary to their communication role. Grades provide incentives to learn for many students.
Most students are motivated to attain the highest grades and to receive the recognition that often accompanies such grades, and they are motivated to avoid the lowest grades and the negative outcomes that sometimes are associated with those grades. Grades also provide information to students for self-evaluation, for analysis of strengths and weaknesses, and for creating a general impression of academic promise, all of which may enter into educational and career planning. Finally, grades are used to communicate students' performance levels to others who want to know about past achievement or want to forecast future academic success. These include prospective employers and those who are charged with deciding who qualifies for honors, who is eligible for basketball, or who should be the class valedictorian.

This module is organized to demonstrate the process a teacher might follow in devising a grading plan. First, some of the philosophical issues inherent in the grading process are identified, and then steps to follow in creating a grading plan are outlined. Finally, some of the most common methods of assigning grades are analyzed.

The primary objectives of this module are to enable the reader to (a) describe the main questions of value that need to be considered in formulating a personal grading philosophy; (b) explain how written university grading policies, reporting forms, and department-level expectations can help or hinder the development of a personal grading philosophy; (c) identify the essential procedural steps in developing a grading plan; (d) explain how the decisions about defining the grade symbols directly influence subsequent decisions in creating a personal grading plan; and (e) analyze the strengths and weaknesses of each of several common methods of assigning grades.
Teachers who implement the recommendations of this module should end up with a defensible grading plan that is in harmony with their personal grading philosophy and with the grading policy of the institution in which the plan will be implemented.

Developing a Grading Philosophy

The process of grading requires teachers to make a number of decisions that are grounded in their personal value system. What to do about grading, or how to do it, is often less a matter of correctness and more a matter of preference and perceived value or importance. In this section, we identify a number of "should" questions, questions about which reasonable people might disagree because of their personal beliefs, values, and experiences: "What should a B [3.0] mean? Should any student be assigned an F [0.0] grade? How many A [4.0] grades should be assigned in a class?" Research studies cannot answer such questions, but each teacher who issues grades must answer them.

1. What meaning should each grade symbol carry? A grade of C [2.0] can tell how much Rudy knows, how he compares to his classmates, how hard he has tried, how much he has learned this quarter, or how well he has behaved this term. Since it cannot tell all of those things at once, what should it be limited to telling?

2. What should "failure" [0.0] mean? There is, undoubtedly, more emotion associated with the F grade than with any other, largely because of the negative consequences for many students who receive it. What does F mean? Should it mean the student knows nothing, knows the least within his class group, can do only the lowest level of work in the curriculum, hasn't tried to learn, or hasn't learned much in 10 weeks?

3. What elements of performance should be incorporated in a grade? Once a teacher has decided on the meaning the grade symbols should convey, much effort will be required to keep contaminating information out of the grade.
Teachers are constantly making observations and judgments about a variety of characteristics of their students. Such information can be used to evaluate communication skills, interpersonal relations, attitude, and motivation, but not all of the information gathered need be funneled into grading decisions. What should be included and what should be kept out?

4. How should grades in a class be distributed? In some universities, written grading policies dictate the nature of grade distributions (e.g., the percentages of As, Bs, etc.); however, most universities seem not to have such policies. Thus, most teachers are probably faced with a decision about the percentage of A grades or C grades they should issue. Should the average grade be C? Is it okay if everyone gets an A? Should there be an equal number of B and D grades?

5. What should the components that go into a final grade be like? The separate scores or grades that are combined to form the final grade for a reporting period must, above all, convey the meaning the teacher previously decided upon for the grade symbols. Should rough drafts count? How about scores from a test that turned out to be too hard? What about practice trials for performance tests? How many components should there be, at a minimum?

6. How should components of the grade be combined? Suppose Dr. Voss uses three tests, a short paper, and an individual project for third-quarter grading in his class. Should each of the five components be worth 20% of the final grade, or should some be more heavily weighted? What should he think about when making that decision?

7. What method should be used to assign grades? After component scores have been combined, a final grade needs to be assigned to each student. The method of assignment ought to be consistent with the decisions made earlier about the meaning each grade symbol should have. For example, it would be illogical to grade on the curve if grades are to be based on absolute standards of performance.
Which of the several methods of absolute grading is best?

8. Should borderline cases be reviewed? If borderline cases are to be reexamined to decide on the appropriateness of the grades, here are some questions the teacher needs to address: How close to a cutoff point does a score need to be before it is considered borderline? Should only grades just below a cutoff be checked, or should those just above be looked at also? What additional information should be examined to help make the borderline decisions? Should students be allowed to furnish extra-credit work to raise a borderline grade?

9. What other factors can influence the philosophy of grading? A teacher's personal philosophy of grading also can be shaped by department and university grading policies and by general practices within the teacher's discipline. For example, some university grade-report forms provide descriptive phrases to define each grade symbol. In such cases, written university policy is inherent in the reporting form even though grading procedures are not prescribed explicitly. In the absence of written policy, however, the grades issued most recently become the norm; practices that depart noticeably from the norm are likely to be squelched, regardless of the philosophy of the grader.

Establishing a Grading Plan

This section of the module details the sequential steps involved in applying a personal philosophy of grading to form a personal grading plan. It is a personal plan because it incorporates the personal values, beliefs, and attitudes of the particular teacher who will use it to assign grades. And though a philosophy of grading is the foundation for establishing a grading plan, the plan is also shaped and influenced by current research evidence, prevailing lore, reasoned judgment, and matters of practicality.

Step 1. Identify and implement university policy. If there is written university policy on grading, teachers are obligated professionally (and probably legally) to follow it.
The policy may be in the form of detailed rules, or it may be a set of general statements from a resolution or a student conduct code. It may simply be reflected in the reporting form sent to students, in the statements of purpose on the report card, or in the explanations of the meanings of the grade symbols used. What should you do if your philosophy and preferred grading procedures conflict with written policy? First, a discussion with your department chair may be the most reasonable approach, because the chair is the first line of enforcement of university policy. If the results of such a meeting are not satisfactory, a next step would be to follow the existing policy while informally surveying your colleagues to see whether they would support a change. If they would, efforts to alter the policy to fit the philosophies of the staff could be very productive.

Step 2. Decide what the meaning of each grade symbol will be. There are three facets to the meaning of a letter grade, and the teacher needs to make a decision about each facet for his or her plan. First, the grade compares performance either to a relative standard (norm-referenced) or to an absolute standard (criterion-referenced). For example, a relative comparison is being made if a C grade means "average performance compared to others in the class," but an absolute comparison is being made if it means "demonstrated attainment of the most important objectives." It is essential for the teacher who adopts a criterion-referenced meaning to develop a description of the learning outcomes that define each grade symbol. Figure 1 illustrates the types of phrases that can be used to differentiate levels of performance on the absolute grading scale (see also Sample UW Guideline). These phrases are contrasted with descriptors of relative grades, which depend entirely on average performance to obtain their meanings.
Note that to describe a "B student" using absolute standards, no reference is made to the achievements of other students. Instead, the comparison is based on the knowledge and skills studied and the extent to which prerequisites for future learning have been attained. The selection of a relative or an absolute grading standard is critical because, once that selection is made, all of the assessment tools used to obtain grading information should be designed in accordance with it: either norm-referenced or criterion-referenced.
A second facet of a grade's meaning is whether achievement or effort is being described. Obviously effort and achievement are not independent, but a single grade cannot describe both unambiguously. Ideally, separate grades or marks should be used for each trait so that both can be described more purely at the same time. If only one grade can be issued, however, describing achievement rather than effort seems more beneficial.
The third facet is a time-related reference: growth vs. status. If a grade is to indicate the amount of growth from the beginning of the grading period until the end, the highest grades should be assigned to those who demonstrate the greatest gains. In many subject areas, those with high beginning achievement levels will likely be able to grow the least. In fact, in some units of instruction, the highest-achieving student may grow very little, if at all. But assigning a C or D [2.0 or below] grade to such a student seems counter to the general notion of what grades usually connote. In short, most parents, students, and teachers are interested in whether growth has occurred, as they should be. But more important to them is the level of achievement at a particular time and whether that level is sufficient for moving on to the next sequence of the instructional program.

Step 3. Check the grade meanings against your instructional approach for logical consistency. A teacher who uses an outcomes-based approach or a highly individualized approach to instruction would not logically choose to use grades that have a norm-referenced meaning. Similarly, a teacher who depends heavily on the principles of cooperative learning would not likely use norm-referenced grades because of the competition they breed. Teachers who are devoted to a specific instructional or teaching philosophy need to develop a grading plan that is compatible with it.

Step 4. Identify evaluation variables, reporting variables, and grading variables separately. The interpretability of a course grade will be jeopardized if the grade is made to carry too many pieces of information. This is the main reason why effort should be separated from achievement, and growth from status, when establishing the meaning of each grade symbol.
Failure to make these separations will introduce irrelevant noise, and static in any communication leads to misunderstanding and to subsequent inappropriate decisions and impressions. One way of guarding against threats to clear communication involves planning for evaluation. That is, just as plans should be made about what to teach and how to teach, concurrent plans should be made about the type of evaluation information that should be gathered during instruction. Teachers gather preinstructional information about students' entering behaviors, they gather additional information to monitor student and class progress, and they obtain further information to decide whether students are ready to move on to a new instructional unit. Thus, the evaluation variables that teachers depend on include such learner characteristics as interests, preferences, academic ability, past achievements, attitudes, effort, conduct, study skills, interpersonal skills, and the like. There are too many such variables to enumerate, but teachers can identify many of them and make definite plans to gather information about them.

Having gathered such a wealth of information, however, teachers do not intend to report outcomes or judgments about all of it to students. Ordinarily, they select a small subset of these variables, which can be called reporting variables, as required by the institution's reporting methods, and they use symbols or narrative comments to pass on the selected information. Finally, from the set of reporting variables, a teacher will select those that provide information consistent with the meaning of the grades the teacher plans to assign. This subset of reporting variables can be labeled grading variables. The teacher who is determined to use grades to describe achievement levels will temporarily set aside indicators of effort, demeanor, attitude, and congeniality in favor of performance assessments and scores on tests, papers, and projects.
The latter reflect achievement more accurately. Note that it is possible to distort the meaning and value of certain grading components that, on the surface, appear to be relevant grading variables. For example, if the social studies essay scores of some students are reduced because of deficiencies in writing mechanics, how well do those scores describe achievement in social studies? If the teacher assigns an A to a group's project, what does that A mean for a member of the group who made little contribution to planning, conducting, or summarizing project activities? If the grade on a paper is dropped a full letter for each day it is late, what does the final grade on a late paper indicate about achievement in language arts? If a student has an unexcused absence on the day of a test, what does an F grade for that test contribute to a quarter grade that is supposed to describe achievement? This is not the place to argue the merits of such policies or to explore alternative actions, but it is germane to point out that "relevant" grading variables can be distorted. Tainted component scores cause tainted composites. Tainted composites lead to misinterpretation.

Step 5. Check to see what the grade distributions in your department have been like at your course level. If no written university policy exists, the grades issued in the most recent years will be the norm against which the reasonableness of each teacher's grades will be judged. How would your chair (and other faculty) react if your outcomes-based approach resulted in A grades for all of your students? This hypothetical question cannot be answered, but it points out that grading patterns that depart significantly from local history generally will be questioned. Suppose you teach an honors class in math and also have a regular math class. Should the grade distributions be similar in the two classes?
If the grades from the two classes were merged into a single distribution, would that larger distribution have the same number of A grades as would be assigned in two regular classes (assuming no honors section)? If written policy does not speak to these issues, the grades from the past few years are probably the best indication of what the current outcomes should be like.

Step 6. Decide on the kinds and number of grading components needed. Is it reasonable to base a 10-week English grade only on the score from a single test? Most would say, "Definitely not." Would scores from only two tests be sufficient? "Better," most would probably say, "but far from ideal." Generally, the more good information available for the assignment of grades, the more likely those grades will represent actual achievement levels accurately. There is no minimum number of tests or other grading components that should be used; the overriding concern is to assess attainment of as many of the instructional objectives as possible so that grades will represent accomplishments with respect to the entire domain. The types of grading components required should be determined by examining what the instructional objectives require.

At this stage, it is also important to rule out the use of certain achievement-oriented evaluation variables from the set of grading variables. All of the instructional activities and exercises that students complete for practice purposes should be regarded as evaluation variables that inform teachers about progress during learning, not as status indicators at the end of a learning experience. Daily homework, periodic quizzes, and responses to oral questioning are examples of evaluation variables that generally should not be regarded as grading variables. As long as a grade is intended to describe achievement status at the end of an instructional segment, assessments designed mainly to monitor progress during instruction should be excluded.
Should the contribution of individual students to a group project be factored into the grading of the project or the quarter's work? Can individual contributions be teased out? Should all group members be assigned the same grade? Should teachers simply provide evaluative reactions to group work but not treat such results as a grading variable? Surely a student's grade should not be embellished or tainted by the achievements of others. Again, tainted composites lead to misinterpretation.

Many assessment techniques require a particular communication skill (writing, reading, speaking, drawing) that may not be well developed in some students. For example, a preponderance of essay testing may favor good writers, or the use of only objective tests may disadvantage poor readers. Obviously, students with limited English proficiency will be at a disadvantage no matter which medium of communication is used. The components of a grade ought to be selected or developed so that achievement in the subject area of interest (e.g., social studies) will not be masked by the language skills required by the assessment method.

Step 7. Determine how much weight each grading component will have. The role of instructional objectives is central to the process of combining grading components, just as it is for deciding which components to use. The task of formulating weights involves deciding how important each component score or grade is in describing achievement at the end of the grading period. The information in Table 1 illustrates the process of determining weights.
Table 1 shows that three science units were completed during one quarter, that each unit consisted of 12 objectives, and that each unit was to have equal weight (about 33%) in the quarter grade. The objectives measured by each grading component are identified by number. Here is the initial thinking for determining the weights for the components of Unit I:
What factors entered into the thinking about component weights in the scenario above? One factor was the importance of the component, as indicated in part by the number of objectives it encompassed. Another factor was uniqueness: two components that measured some objectives in common were each given less weight than two components that measured an equal number of unique objectives. (Notice how Objective 36 in Table 1 was handled.) A third factor, not evident in the scenario or Table 1, is the accuracy of the scores obtained from a component. For measures of similar skills, the one that provides the most accurate scores ought to be given the most weight.

Step 8. Determine how components will be combined to create a composite score or final grade. Once component weights have been established, the teacher must decide how to combine the components. The considerations and procedures for proper weighting differ for the norm-referenced and criterion-referenced situations. The differences are detailed in another instructional module and will not be repeated here (Oosterhof, 1987). For norm-referenced purposes, the variability of the scores of each component influences the weight the component will have in the composite. For criterion-referenced purposes, it is the total points associated with each component that matters most.

Step 9. Choose a method for assigning grades. The relative merits of the various common methods of assigning grades to composite scores are reviewed below. At this stage of establishing a grading plan, it is important for the teacher to choose or adapt a method of grade assignment that is consistent with the meaning that the grade symbols are intended to carry. Unfortunately, some of the most common methods of assigning grades yield results that are neither norm-referenced nor criterion-referenced. Consequently, teachers need to look carefully at methods of grade assignment that seem worthy of adoption.
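The Step 8 distinction can be illustrated with a small sketch. It assumes two hypothetical components (a 100-point test and a 20-point paper), equal intended weights, and three hypothetical students; the function names are illustrative and are not taken from Oosterhof (1987). For a norm-referenced composite, each component is first converted to z-scores so that its influence on rank order comes from its weight rather than from its spread. For a criterion-referenced composite, each component is rescaled to a proportion of its maximum points so that its share of the composite equals its weight.

```python
from statistics import mean, pstdev

# Hypothetical data: two grading components for three students.
scores = {"test": [90, 60, 75], "paper": [12, 20, 16]}
max_points = {"test": 100, "paper": 20}
weights = {"test": 0.5, "paper": 0.5}  # intended weights (assumed to sum to 1)


def combine_norm_referenced(scores, weights):
    """Weighted sum of z-scores: each component's spread is equalized before weighting."""
    n = len(next(iter(scores.values())))
    composite = [0.0] * n
    for name, values in scores.items():
        m, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a component with no spread cannot affect rank order
        for i, x in enumerate(values):
            composite[i] += weights[name] * (x - m) / sd
    return composite


def combine_criterion_referenced(scores, max_points, weights):
    """Weighted sum of proportion-of-maximum scores (0 to 1 for each component)."""
    n = len(next(iter(scores.values())))
    composite = [0.0] * n
    for name, values in scores.items():
        for i, x in enumerate(values):
            composite[i] += weights[name] * x / max_points[name]
    return composite


raw_totals = [sum(s) for s in zip(*scores.values())]
print(raw_totals)  # [102, 80, 91]: the 100-point test dominates
print(combine_criterion_referenced(scores, max_points, weights))
print(combine_norm_referenced(scores, weights))
```

Summing raw points ranks the first student highest only because the 100-point test spreads students over a wider range than the paper does. With equal weights, the criterion-referenced composite ranks the second student first, and the z-score composite places all three students level, because their standings on the two components mirror each other.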
The final aspect of assigning grades is the matter of dealing with borderline grades. For some teachers, the question is not how to treat borderline cases; it is whether to review them at all. They regard their grading practices as rigid procedures that produce highly objective results. For them, a review of borderline cases could insert subjectivity into the process and lead to outcomes that they would feel uncomfortable defending. Others, however, are driven by the apparent subjectivity inherent in several aspects of the grading process and by the desire to be fair in grading. Their notion of fairness is to err in favor of the student and award the higher of two grades if an error is going to be made. The reconsideration of borderline cases, then, is one way to ensure that certain errors will not be too influential in determining a student's grade.

What basis should be used for deciding whether to raise a grade in a borderline situation? Nearly always, achievement information that was not used to assign the tentative final grade should be taken into consideration. This advice is consistent with the premise that a grade should describe achievement rather than effort or some other trait. Homework quality, quiz score average, quality of class participation, and contributions to cooperative learning experiences are all possible achievement-oriented evaluation variables that could be suitable for borderline reviews. Some teachers hold one high-quality piece of achievement data in reserve for just such purposes.

Grades derived from any of the relative grading methods will have certain shortcomings that are inherent in any grading intended to have a norm-referenced meaning. For example, unless the person interpreting the grade knows which reference group was used, the grade means very little. Was it the student's class, a combination of classes, or classes from the past two years?
Further, by definition, a norm-referenced grade does not tell what a student can do; there is no content basis other than the name of the subject area associated with the grade.

Grading on the Curve
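Grading on the curve applies fixed percentage quotas to the rank order of composite scores. The sketch below assumes 20 hypothetical students and the 20%, 35%, 30%, 10%, and 5% quotas discussed in this section; it shows how mechanical the method is, since nothing in it refers to what any student actually achieved. The function name and the half-up rounding of fractional head counts are assumptions, and tied composites may fall on different sides of a quota boundary because the sketch does not resolve ties.

```python
def grade_on_curve(composites, quotas):
    """Assign grades by fixed percentage quotas applied to the rank order of composites.

    quotas: (grade, percent) pairs from the highest grade to the lowest; percents sum to 100.
    """
    n = len(composites)
    # Student indices from highest composite to lowest (ties keep their input order).
    order = sorted(range(n), key=lambda i: composites[i], reverse=True)
    grades = [None] * n
    start, cumulative = 0, 0
    for grade, percent in quotas:
        cumulative += percent
        # Cumulative head count, rounded half up (an assumed rule for fractional counts).
        end = (n * cumulative + 50) // 100
        for i in order[start:end]:
            grades[i] = grade
        start = end
    return grades


quotas = [("A", 20), ("B", 35), ("C", 30), ("D", 10), ("F", 5)]
composites = list(range(100, 80, -1))  # 20 hypothetical composite scores, 100 down to 81
grades = grade_on_curve(composites, quotas)
print({g: grades.count(g) for g, _ in quotas})  # {'A': 4, 'B': 7, 'C': 6, 'D': 2, 'F': 1}
```

With a class of only five, the same quotas round to one A, two Bs, one C, one D, and no F, which illustrates how arbitrary the fixed percentages become in small groups.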
Since some teachers who use the method rightly believe that classroom groups are too small for their achievement scores to resemble a normal curve, they choose percentages that, in their judgment, are more realistic. So they may decide on 20%, 35%, 30%, 10%, and 5%. The percentages are selected arbitrarily and are treated like grade quotas, so that the top 20% of students in terms of their composite scores will earn an A, the next 35% will be assigned a B, and so on. Grading on the curve is a simple method to use, but it has serious drawbacks. The fixed percentages are nearly always determined arbitrarily, and the percentages do not account for the possibility that some classes are superior and others inferior relative to the phantom "typical" group the percentages are intended to represent. In addition, the use of the normal curve to model achievement in a single classroom is generally inappropriate, except in large required courses at the high school and college levels.

Distribution Gap Method
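In its simplest form, the distribution gap method ranks the composite scores and places grade cutoffs in the widest gaps between adjacent scores. The sketch below automates that idea by taking the four widest gaps as the cutoffs for five grades. In practice a teacher would inspect the distribution rather than apply such a rule mechanically, so treat the rule, the function names, and the scores as hypothetical. The second call nudges two scores by a few points, in the spirit of the example of Mike's and Theo's scores in this section, to show how random error can move a cutoff.

```python
def gap_cutoffs(composites, n_grades=5):
    """Return up to n_grades - 1 cutoffs, highest first, placed in the widest gaps.

    Each cutoff is the score just above a gap: scores at or above it earn the higher grade.
    """
    ranked = sorted(set(composites), reverse=True)
    # (gap width, score just above the gap) for each pair of adjacent distinct scores
    gaps = [(ranked[i] - ranked[i + 1], ranked[i]) for i in range(len(ranked) - 1)]
    # Widest gaps first; the sort is stable, so equal widths keep top-down order.
    widest = sorted(gaps, key=lambda g: g[0], reverse=True)[: n_grades - 1]
    return sorted((upper for _, upper in widest), reverse=True)


def assign(composites, cutoffs, letters="ABCDF"):
    """Grade each composite against cutoffs listed from highest to lowest."""
    grades = []
    for score in composites:
        for letter, cut in zip(letters, cutoffs):
            if score >= cut:
                grades.append(letter)
                break
        else:  # below every cutoff
            grades.append(letters[len(cutoffs)])
    return grades


scores = [212, 208, 205, 197, 194, 191, 183, 180, 172, 170, 161]  # hypothetical composites
print(gap_cutoffs(scores))  # [205, 191, 180, 170]

# Nudge two scores by amounts the size of measurement error (205 -> 200, 197 -> 203).
nudged = [212, 208, 200, 203, 194, 191, 183, 180, 172, 170, 161]
print(gap_cutoffs(nudged))  # [200, 191, 180, 170]
```

With the original scores, the student at 197 falls just below a wide A-B gap and earns a B; after the nudge, the same student's 203 sits above the new cutoff at 200 and earns an A, even though the change represents only measurement error.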
In some score distributions there are many wide gaps; in others there are only a few narrow gaps. The sizes and locations of the gaps are determined by random errors of measurement as well as by actual differences in achievement among students. For example, Mike's 197 might have been 203 (if there had been less error in his scores), and Theo's 205 might have been 200. Under those circumstances, the A-B gap would be less obvious, and too many final grade decisions would have to be made by reviewing borderline cases. When gaps are wide enough, this method helps the teacher avoid disputes with students about near misses. But when the gaps are narrow, too much emphasis is placed on borderline information that the teacher had decided was not relevant enough or accurate enough to be included among the grading components that formed the composite. Only occasionally will the distribution gap method yield results that are comparable to those obtained with more dependable and defensible methods.

Standard Deviation Method
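The cutoff arithmetic in the worked example of this method can be checked with a short sketch. The multiples of the standard deviation (1.5, 0.5, -0.5, -1.5) are the example's illustrative choices, not fixed rules, and the function names are hypothetical. The tie-handling rule in `assign_grade` is an assumption chosen only so that the sketch reproduces the ranges listed in the example: above the average a score must exceed a cutoff to earn the higher grade, while below the average reaching the cutoff is enough. When the average and standard deviation must be computed from raw composites, Python's `statistics.mean` and `statistics.pstdev` (population standard deviation) are one option; the sample formula would be an equally reasonable choice.

```python
from statistics import mean, pstdev


def sd_cutoffs(average, sd, multiples=(1.5, 0.5, -0.5, -1.5)):
    """A-B, B-C, C-D, and D-F cutoffs, each a chosen number of SDs from the average."""
    return [average + k * sd for k in multiples]


def cutoffs_from_composites(composites, multiples=(1.5, 0.5, -0.5, -1.5)):
    """Compute the average and population SD from raw composites, then the cutoffs."""
    return sd_cutoffs(mean(composites), pstdev(composites), multiples)


def assign_grade(score, average, cutoffs, letters="ABCDF"):
    """Grade one composite; the tie rule reproduces the ranges in the worked example."""
    for letter, cut in zip(letters, cutoffs):
        # Above the average a score must exceed the cutoff; below it, meeting it suffices.
        earns_higher = score > cut if cut > average else score >= cut
        if earns_higher:
            return letter
    return letters[len(cutoffs)]


cuts = sd_cutoffs(129, 10)
print(cuts)  # [144.0, 134.0, 124.0, 114.0]
print([assign_grade(s, 129, cuts) for s in (145, 144, 135, 134, 124, 123, 114, 113)])
# ['A', 'B', 'B', 'C', 'C', 'D', 'D', 'F']

# A particularly able class: A-B at 1.0 SD and B-C at 0.3 SD above the average.
print(sd_cutoffs(129, 10, multiples=(1.0, 0.3, -0.5, -1.5)))
```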
Suppose you have formed composite scores for your class of 25 students and that the average was 129 and the standard deviation was 10. (Consult an introductory measurement or statistics book to see how to compute these statistics simply.) Assuming C to be the average grade, we can find the cutoff between B and C by adding, for example, one-half of the standard deviation to the average (129 + (0.5)(10) = 134). Then the A-B cutoff is found by adding 1.5 standard deviations (for example) to the average (129 + (1.5)(10) = 144). By subtracting corresponding values from the average score, the C-D cutoff is found to be 124 and the D-F cutoff 114. (Can you verify these values?) The ranges for each grade are the following: A = 145 and up, B = 135 - 144, C = 124 - 134, D = 114 - 123, and F = 113 and below. These ranges can be made smaller or larger for groups of higher or lower ability by adjusting the number of standard deviations used to find the cutoffs. For a particularly able class, for example, the A-B cutoff might be only one standard deviation above the average and the B-C cutoff might be 0.3 standard deviation above it, rather than 0.5. Unlike grading on the curve, this method requires no fixed percentages in advance, and unlike the distribution gap method, the cutoff points are not tied to random error. When the teacher has some notion of what the grade distribution should be like, some trial and error might be needed to decide how many standard deviations each grade cutoff should be from the composite average. When a relative grading method is desired, the standard deviation method is the most attractive, despite its computational requirements.

Absolute grading methods produce grades that share some general shortcomings, independent of the particular method that generated the grades. For example, unless they are accompanied by a description of the performance standards or the content domains that have been studied, the meaning of an absolute grade is obscure.
Furthermore, no criterion-referenced grading method produces grades that are strictly absolute in meaning. Such grades are based on performance standards that nearly always have a normative basis. A "B writer" should be able to use correct referencing techniques, the teacher may say, but if most college students do not and cannot, the standard is likely to be lowered to reflect reality (the norm). Note that adjusting grades instead of modifying the standards would contribute to meaningless grades.

Fixed Percent Scale
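The restriction that a fixed percent scale places on each grading component is easy to quantify. The sketch below, assuming the 93% A cutoff and 85% B cutoff used as examples for this method (the other cutoffs of the scale are not needed for the point), finds the lowest whole-number raw score that reaches a percent cutoff on a component with a given number of points. Integer ceiling division avoids floating-point rounding surprises; the function name is hypothetical.

```python
def min_raw_score(cut_percent, max_points):
    """Smallest whole-number raw score whose percent is at least cut_percent."""
    # Ceiling of cut_percent * max_points / 100, using integer arithmetic only.
    return -(-cut_percent * max_points // 100)


for max_points in (10, 20, 50, 100):
    a_min = min_raw_score(93, max_points)
    b_min = min_raw_score(85, max_points)
    print(f"{max_points}-point component: A needs {a_min}+, B needs {b_min}-{a_min - 1}")
```

On a 20-point test an A requires 19 or more, as the discussion of this method notes; on a 10-point quiz only a perfect score earns an A and only a 9 earns a B. The fixed scale, rather than the content domain, ends up dictating how easy each component must be.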
Unfortunately, a percent score will be meaningless unless the domain of tasks, behaviors, or knowledge on which the assessment was based is defined explicitly. That is, a test score of 100% should mean that the student has complete or thorough attainment of the key elements of the area of knowledge that was sampled by the test. But if an assessment is developed in such a way that the underlying content domain is ill-defined or nebulous, the percent-correct scores from it will have no meaning beyond the specific tasks that make up the assessment. Scores of 80% on a math test and 75% on a speech say little about performance unless we know the difficulty of the domain of math problems and which important criteria were used to score the speech. In sum, percent scores cannot provide a reference to absolute performance standards unless the underlying knowledge domain is adequately described.

Another serious drawback of this grading method is that the percent-score ranges for each grade symbol are fixed for all grading components. For example, the fact that 93% is needed for an A places severe and unnecessary restrictions on the teacher when he or she is developing each assessment tool. If the teacher believes there should be some A grades, a 20-point test must be easy enough that some students will score 19 or higher; otherwise there will be no A grades. This circumstance creates two major problems for the teacher as the assessment developer. First, it requires that assessment tasks be chosen more for their anticipated easiness than for their content representativeness. As a result, there may be an overrepresentation of easy concepts and ideas, an overemphasis on facts and knowledge, and an underrepresentation of tasks that require higher order thinking skills. Second, the teacher may need to "fudge" on the domain definition to accommodate the fixed grading scale.

A further limitation of this method relates to the accuracy of the assessment information obtained.
Since the grade cutoff scores usually are located between the 60% and 100% points on the percent scale, most of the scale points (0-60) are of no value in describing different absolute levels of achievement. For example, if A and B performance must fall in the range of 85%-100%, the B range spans only eight score points (85-92), as does the A range (93-100). These are fairly narrow score ranges, especially considering that a 100-point scale is available for use. Because these ranges are narrow and fixed, they will contribute to fairly inaccurate grades when the scores of any single grading component are not very dependable: a small measurement error can move a score across a cutoff. If the grade ranges could be made larger when the scores of certain components are fairly inaccurate, then more accurate grades would probably result.

The fixed percent scale method usually produces grades that have little meaning in terms of content standards, and it often yields grades of questionable accuracy. The percent cutoffs for each grade are arbitrary and, thus, not defensible. Why should the cutoff for an A be 93, 92, or 90? Further, why shouldn't the A cutoff be 88% for a certain test, 91% for another, and 83% for a certain simulation exercise? Is there any reason why the same numerical standards must be applied to every grading component when those standards are arbitrary and void of absolute meaning?

Total Point Method
One of the difficulties of using this method is that a decision often has to be made about the maximum score on a project or test before the teacher has had ample time to think about the key ingredients of the assessment. Here's how this circumstance can contribute to poor assessment development practices: Suppose I need a 50-point test to fit my grading scheme, but I find as I build the test that I need 32 multiple-choice items to sample the content domain thoroughly. I find this unsatisfactory (or inconvenient) because 32 does not divide into 50 very nicely (each item would be worth 1.5625 points!). To make life simpler, I could drop 7 items and use a 25-item test with 2 points per item. If I did that, my point totals would be in fine shape, but my test would be an incomplete measure of the important unit objectives. The fact that I had to commit to 50 points prematurely dealt a serious blow to obtaining meaningful assessment results.

Another potential drawback of the total point method is the ease with which extra-credit points can be incorporated to beef up low point totals. This practice can distort the meaning of both the content domain and the final grade. When the extra tasks are challenging and relevant to current instruction, extra credit seems like a reasonable way to individualize instruction and motivate high-achieving students; in such cases, the outcome is likely to make high point totals even higher. But extra credit that simply allows students to compensate for low test scores or inadequate papers is not reasonable, especially if the extra work does not help them overcome demonstrated deficiencies. The point here is that this method of grading makes it convenient for teachers to allow extra-credit work of the latter form to compensate for low achievement. When that happens, the grades take on a new meaning because the relevant domain of knowledge and skills gets redefined by the nature of the extra-credit tasks.

Content-Based Method
Suppose you have prepared a 30-item test to measure the achievement of most of the objectives in a unit of instruction. Assuming that grades A through F will be assigned to test scores, you will need to develop a brief description of the performance level you expect students to reach for each of the five possible grades. For example, you might describe C expectations as "knows basic concepts and can do the most important skills; lacks some prerequisites for later learning."

Using descriptions like these, you can begin an item-by-item review of the test. For question No. 1, ask whether a student with only minimum achievement (D) should be able to answer it correctly. If so, record a D next to the item; if not, pose the same question for grade C achievement, then for B, and then for A, until the item has been classified. For items that the teacher believes most A students will not necessarily answer correctly, a symbol such as N can be used to indicate that no grade level applies. (For items worth more than a single point, you will need to decide the minimum number of points that students at each achievement level should be able to earn.)

After you have classified each item, the D-F cutoff score is found by counting the D symbols. Then the C-D cutoff is obtained by adding the numbers of D and C symbols. The B-C cutoff is the sum of the D, C, and B symbols, and the A-B cutoff is the sum of the D, C, B, and A symbols. To account for negative errors of measurement, you should lower each grade cutoff by one or two points. Adjusting for error at this stage of grading makes it unnecessary to review borderline cases later.

All grading methods involve subjectivity, and this one requires two main types of subjective decisions. The first type entails the development of explicit expectations for the achievers at each of the letter-grade levels. What is B achievement like and how is it different from C achievement?
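The cutoff arithmetic in the item-by-item procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: every item is worth one point, each item carries the single label D, C, B, A, or N assigned during the review, and the error allowance (one point here) is the teacher's choice. The item classification is made up for illustration, and the function names are hypothetical.

```python
from collections import Counter

# Grades in order from the lowest passing grade upward. Items labeled "N"
# (beyond what most A students are expected to answer) count toward no cutoff.
GRADE_ORDER = ["D", "C", "B", "A"]


def content_based_cutoffs(item_labels, error_allowance=1):
    """Return the minimum raw score for each grade (e.g., cutoffs["C"] is
    the C-D cutoff), assuming one point per item.

    A grade's cutoff is the number of items labeled with that grade or any
    lower grade, lowered by `error_allowance` points for measurement error.
    """
    counts = Counter(item_labels)
    cutoffs = {}
    running_total = 0
    for grade in GRADE_ORDER:
        running_total += counts[grade]
        cutoffs[grade] = max(running_total - error_allowance, 0)
    return cutoffs


def assign_grade(score, cutoffs):
    """Return the highest grade whose cutoff the score reaches, else F."""
    for grade in reversed(GRADE_ORDER):
        if score >= cutoffs[grade]:
            return grade
    return "F"


# Hypothetical labels for a 30-item test: 8 D, 9 C, 6 B, 4 A, and 3 N items.
labels = ["D"] * 8 + ["C"] * 9 + ["B"] * 6 + ["A"] * 4 + ["N"] * 3
cutoffs = content_based_cutoffs(labels, error_allowance=1)
print(cutoffs)                    # {'D': 7, 'C': 16, 'B': 22, 'A': 26}
print(assign_grade(24, cutoffs))  # B
```

The running total mirrors the rule that each cutoff is the count of symbols at that grade and every lower grade; N items never enter any cutoff, so a perfect score is not required for an A.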
Good teachers might disagree with one another about how to define these performance standards. The other subjective decision making occurs when items are reviewed to determine the grade category to which each one belongs. Again, good teachers may disagree about whether a "B student" should be able to answer a particular item correctly. Notice that neither type of judgment requires subjective decisions about individual students. There is no need to decide, for example, whether Jana is a C student or whether Matt could answer a certain question correctly. The judgments required here are about standards and about the particular tasks that students at each level should be expected to do.

Personal Grading Practices Evolve

Since both philosophies and instructional approaches change as curricula change, teachers need to be prepared to adjust their grading plans accordingly. With experience in assigning grades, reporting to students, and observing the impact of grading on learning, many teachers rethink their responses to the philosophical questions enumerated in the "Developing a Grading Philosophy" section. The meanings of the symbols, the characteristics to be judged, the components to include in a grade, and the method used for assigning grades are all issues of value that take on new importance or new meaning as teachers accumulate grading experience and observe the practices of colleagues. Grading practices also may change as a teacher's instructional approach changes. For example, a teacher who begins experimenting with cooperative learning strategies would likely depend more on group projects and presentations for assessment information. The nature of the grading components being used may need to change, as would any grading practices that foster competition among learners.
In short, a teacher's grading practices are likely to evolve slowly over time as his or her grading philosophy changes, as experience in grading accumulates, and as a base of grading data from several classes becomes available. As the nature of the curriculum changes and teachers fine-tune or modify their instructional approaches, the procedures outlined here can be reviewed to resolve inconsistencies between philosophy and practice.

Reference

Oosterhof, A.C. (1987). Obtaining intended weights when combining students' scores. Educational Measurement: Issues and Practice, 6(4), 29-37.

Annotated References

The references in this section cover a broad range of topics on grading, as do several other excellent introductory measurement texts. We have chosen to highlight some of the unique or particularly strong parts of these references as an aid to those who seek additional reading.

Carey, L.M. (1988). Measuring and evaluating school learning. Newton, MA: Allyn and Bacon. Chapter 13.
Ebel, R.L., & Frisbie, D.A. (1991). Essentials of educational measurement (5th edition). Englewood Cliffs, NJ: Prentice Hall. Chapter 15.
Hills, J.R. (1981). Measurement and evaluation in the classroom (2nd edition). Columbus, OH: Bell & Howell. Chapters 14-19.
Oosterhof, A.C. (1990). Classroom applications of educational measurement. Columbus, OH: Merrill. Chapters 21-22.
Note: This instructional module has been adapted from the original with permission by changing specifics to better fit the University of Washington. In most cases, we have not altered the article to fit the UW's 4.0 to 0.0 grading system due to the complexity involved. However, the underlying principles, applied here to A through F grades, also apply to UW's numerical grading system.
Last updated: 09/04/07