Towards a Unified Theory of Grading

Grading is a form of communication. The problem is that it’s a shorthand form of communication used by people who do not agree on (or even discuss) what the symbols mean. Worse, each recipient of these symbolic communications has their own rough idea of what they mean, based on their own experience and anecdotal evidence, and often interprets them much more broadly and personally than they are intended.

Students probably feel about grades much the same way that I feel about ballet: the work which goes into producing them is impressive, it’s supposed to be for my benefit, but the dancers seem to mean a great deal more than I’m getting. The gestural vocabulary seems arbitrary, the ultimate aims are unclear, and the standards of quality are ineffable. Obviously, some of this can be overcome with preparation: reading the program at a ballet is like reading the syllabus at the beginning of the semester, giving you a rough idea of what the point of the exercise is, what’s worth keeping an eye on, and what you’re supposed to get out of it. But even though students experience grades more often than they do ballet, the range of meanings and purposes and uses of grades makes them very hard to understand systematically, like trying to develop a sense of dance aesthetics without ever seeing two dances in the same genre.

To try to correct some of the inscrutability of grades, it would help immensely if we teachers could begin to articulate some general principles which we try to uphold publicly. [ed. — that’s awful, try again] In other words, what’s grading about? Why do we give grades, and how can we make grades more consistent and more effective communications? I’m sure others have trod this path before me, but I’m going to start with an attempt to articulate the purposes of grades to the various constituencies of the academy, then suggest how we can begin to unify some of the cleavages between these constituencies.

The fundamental constituencies of higher education are:

  • Teaching Faculty (though there are differences between adjunct and tenure-track, tenured and untenured, etc., I’m going to ignore those for the moment)
  • Students (student level and major matter, but not for the moment)
  • Administrators (including faculty “wearing their administrative cap” as chairs, deans, etc)
  • Employers (including graduate schools; anyone who might read a transcript or wonder about a former student’s academic performance).

The basic functions of grades are:

  • Rewards and Punishments for motivating work
  • Task-based Performance evaluation, an absolute judgment on the quality of presented work, often in the form of a test score or paper grade
  • Class or school normed Relative Evaluation, grades curved or adjusted to match local norms or to produce a distribution rather than a description
  • An evaluation of the personal work habits and character of the student, irrespective of the performance
  • A measure of the Potential of the student. Note how we talk about “A-students” and “B-students” and “C-students” etc., as though the grades were inherent in their persons.
  • Proof that the student and teacher did something over the semester.

Historians always get criticized for choosing “boring” narrative presentations over many “stimulating” alternatives, so let’s try to put these constituencies and purposes in a chart and see if that helps.

Table 1: Grading Constituencies and Purposes

Faculty Students Administrators Employers
Rewards / Punishments necessary definitely useful
Evaluate Performance primary not consistent some
Relative Evaluation often mostly yes at best
Personal Evaluation marginal effect strongly felt often usually
Measure Potential sometimes depends on relationship of course to career plans why else ask for a transcript?
Work Product required complicated definitely

Note: Cells left blank may be read as “irrelevant” or “couldn’t care less.”

What is immediately obvious from this table (aside from the fact that narrative presentation isn’t the worst thing in the world) is the disjunction between the intent of grades when given by faculty and the interpretation of grades by everyone else. Not that this is a surprise to almost anyone who has given grades. Since teachers are the ones giving grades, and are in a position to influence directly the thinking of students and administrators (not much we can do about employers for a generation or two), I think that the most important actor in this dynamic is the faculty, and they/we are the primary target of what follows.

Why do we give grades? The historical reasons are somewhat obscure: teachers have been ranking pupils and test-takers for millennia. So we have to untangle what we do today, rather than talk about “original intent.” What does it mean when we give a grade on an assignment, or at the end of a semester?

The primary purpose of grading is to measure “quality” (cf. Socrates, Pirsig), specifically the quality of a student’s performance. This is a pretty strong consensus definition of grading intent, I think. Not only do I think this is the right emphasis, I think that the pseudo-legal syllabus-as-contract concept is going to force us to be increasingly careful (rigid and detailed) about defining assignments and grading standards. One way to think of this is to consider the grader as an agent of karma: it’s not personal, really, it’s the natural function of the universe to assign merit or demerit to actions. Or, as Robert Ingersoll said, “In nature there are neither rewards nor punishments; there are only consequences.” So it should be with grading.

Grading may be absolute or relative: generally speaking, task-specific grades are more likely to be absolute, while semester-end grades are more likely to be relative and to include “intangible” elements like effort (sometimes folded into a “participation” grade) and improvement over time (a.k.a. Trending). Sometimes both absolute and relative grades are given, as on a test where the number of questions/points is given along with a curved letter grade. Relative grading is where a lot of the grade inflation comes from, particularly combined with standards-lowering pressures. Interestingly, many faculty who set clear absolute standards, even relatively low ones, find their overall grades declining. Of course this usually creates a backlash, enforced by student evaluations of faculty and administrative pressure, to shift back to relative grading. Absolute grading is often seen as imposing an outside standard (e.g., my use of a paper grading rubric developed at Harvard), a charge made more common by the frequency with which newly minted Ph.D.’s are hired by lower-tier institutions.

Grading should not be an evaluation of the student’s personality, moral character, or attractiveness. I know that a lot of students think it is, and there might indeed be unconscious or subtle shadings, but I also know that a lot of faculty work hard to ensure that their personal feelings or unacknowledged biases do not affect grading. Sometimes a result of kindness, as in a semester where personal affairs intervened in academic affairs, an end-of-semester grade ignores or undervalues the actual work performed in favor of consideration of the general abilities and future potential of the student [ed.- nice use of weak passive voice]. This may be unavoidable, even laudable, but should be very, very rare. So rare that students don’t think it ever happens.

One of the stickier bits of the Grading Knot is the way in which grades are used as motivating tools. That grades work as motivating rewards and punishments depends a great deal on students’ taking grades personally. A “D” motivates a student to do better only if they care that they got a “D” instead of a “C” or “B”; an “A” only creates a sense of accomplishment if the student feels a strong sense of connection to their work or to the grade as a measure of themselves. Students don’t just get grades, they feel graded, and only insofar as that is true are grades effective rewards and punishments. Now we can say that we don’t use grades in that way, but we do: for example, by including ‘preparation’ or ‘participation’ in a course grade, we raise the stakes for these activities-good-students-do-anyway and punish the students who don’t follow that model. And I can’t be the only person who feels like a deliverer of justice when giving a justly deserved bad grade to a bad paper, or a good grade to a student who has clearly worked on skills and content. We want to give something extra to those who work harder, and punish those who slack and slide. But we shouldn’t, and if we can articulate clear grading standards, we shouldn’t need to.

Another complication is the way in which grades serve as “proof of purchase” for courses taken, a nexus for the economic relationships of academia. Once a student registers for a course, there is usually a short window in which they must decide if the course will “count” or if they will withdraw; generally, students paying by the credit hour may get a full refund on their payments during this window. Many institutions permit withdrawal through the first half of the semester but include a “W” non-grade on the transcript; some permit later withdrawals, though some of those note particularly late retreats with a “WF” [withdrawn failing]. Assuming that a student does not withdraw, though, the course is paid for and a grade must be assigned at the end of the term. Filing grades is the last duty of faculty at the end of a term (oftentimes coming after graduation ceremonies). The grade is proof that the student took the class, as well as proof that the instructor did their job of evaluating student work, and it is crucial for the institution to “close the books” on the semester.

Whose grade is it? The student has paid for the course, but has only gotten the privilege of being graded; the instructor is, in some sense, a service provider, but the contract is with the institution rather than the student; the institution has a financial relationship with both student and instructor, but can make no promises to either about delivery of satisfactory goods. The student would like the highest grade possible, or at least one in line with their generally inflated sense of achievement (there’s research to back this up) and “needs.” The instructor has an implicit (remarkably, I don’t think I’ve ever seen a faculty handbook which actually addressed this explicitly) duty to evaluate all students’ work comprehensively and fairly. The instructor’s academic reputation also rests, to some extent, on the grades they give: generally low grades can enhance an image of rigor and quality, but the attendant low student evaluations can create doubts about teaching ability; generally high grades can create a cadre of very satisfied students, and don’t really hurt one’s reputation for quality unless word gets around; average grades, however achieved, never really redound to the instructor’s benefit unless paired with some other extraordinary achievement. The institution’s need for satisfied students and satisfactory rates of progress and graduation conflicts with its need for a reputation for rigor and quality product (i.e. students).

Grading is indeed a Gordian knot, insoluble by simple compromise. The desires and purposes and interpretations of grades are too disparate to integrate gently. Like Alexander, we must cut the knot instead of unraveling it. As I’ve suggested above, the only fundamentally satisfactory long-term solution is for faculty to develop (with input from the other constituencies) and articulate and apply consistent and (mostly) absolute standards for grades for assignments and for courses. If the consensus is strong and reasonable and clear, it will frame the discourse in such a way as to bring the other constituencies into both understanding and participation in the educational process in new ways.

Jonathan Dresner is Associate Professor of East Asian History at Pittsburg State University. He blogs at Frog In A Well: Japan, among other places. This was originally published at Education News in January 2005. Previous writings include “Grade Inflation… Why It’s A Nightmare”.

  1. David Doria says:

    One thing that has always bothered me is the granularity of grading scales. The point of the exercise of giving a grade should be an attempt to classify how well the student learned the material. It seems reasonable to classify this level of understanding into “not at all”, “not very much”, “ok”, “pretty well”, and “excellent”, which seem to correspond to the typical F, D, C, B, A. What does NOT make sense is to assign a number on a scale of 0-100. A grade of 67 seems to indicate somehow that 67 percent of the material was learned. This is, however, not at all the case. Rather, it means that the student answered 67% of the particular questions posed on this assignment correctly. It is extremely rare for faculty to ask exactly the right set of questions to determine if every concept was learned in a reasonable way, so this number seems just about meaningless. I have always been in favor of oral exams. I find it extremely easy, within a 5-minute conversation with a student, to classify their understanding of what they should have learned into one of the five categories described above. I guess at some level you’d have to buy into my “Teach the Why not the How” concept to realize that it doesn’t really matter if a student is able to produce the correct numerical value on an exam question, but it is EXTREMELY CRUCIAL that they understand the general “what is going on”. I understand that oral exams are not reasonable in large classes, and this would certainly need to be addressed in order to implement such a system on a large scale.

    Thanks for the stimulating post!

  2. cody.v says:

    great read, looking forward to taking your class this semester.

