This guest post is contributed by Kathryn McDermott and Lisa Keller. McDermott is Associate Professor of Education and Public Policy and Keller is Assistant Professor in the Research and Evaluation Methods Program, both at the University of Massachusetts, Amherst.
On March 29, the U.S. Department of Education announced that Delaware and Tennessee were the first two states to win funding in the “Race to the Top” grant competition. A key reason these two states won was their experience with “growth modeling” of student progress, as measured by standardized test scores, and their plans for incorporating the growth data into teacher evaluations. The Department of Education has $3.4 billion remaining in the Race to the Top fund, and other states are now scrutinizing reviewer feedback on their applications and studying Delaware’s and Tennessee’s successful applications as they try to win funds in the next round.
One of the Department’s priorities is to link teachers’ pay to their students’ performance; indeed, states with laws that forbid using student test scores in this way lost points in the Race to the Top competition. A few months ago, James pointed out some of the general flaws in the pay-for-performance logic; here, our goal is to raise awareness of some statistical issues that are specific to using test scores to evaluate teachers’ performance.
Using students’ test scores to evaluate their teachers’ performance is a core component of both Delaware’s and Tennessee’s Race to the Top applications. The logic seems unassailable: everybody knows that some teachers are more effective than others, and there should be some way of rewarding that effectiveness. Because students now take many more state-mandated tests than they used to, it seems reasonable to use those test scores to make the kinds of effectiveness judgments that are currently made informally, on less scientific grounds.
The problem is that even if you accept the assumption that standardized tests convey useful information about what students have learned (which we both do, in general), measuring the performance gains (or losses) of students in a particular classroom is far more complicated than subtracting the students’ September test scores from their June test scores and averaging the gains. We’re concentrating on the statistical issues here; test-based evaluation also faces other obvious challenges, such as how to evaluate teachers in grade levels or subjects that have no standardized tests.
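For concreteness, the naive calculation described above can be sketched in a few lines of Python: take each student’s June score minus their September score, then average those gains over the classroom. The function name and the scores below are hypothetical, chosen only for illustration; this is the overly simple approach being cautioned against, not a recommended method of evaluating teachers.

```python
from statistics import mean


def naive_average_gain(september_scores, june_scores):
    """Return the classroom's average June-minus-September gain.

    Hypothetical helper illustrating the naive approach only: it does
    no statistical modeling of any kind.
    """
    if len(september_scores) != len(june_scores):
        raise ValueError("each student needs both a September and a June score")
    # Per-student gain (a negative value would be a loss).
    gains = [june - sept for sept, june in zip(september_scores, june_scores)]
    # Simple unweighted average across students.
    return mean(gains)


# Hypothetical scores for a five-student classroom (illustrative only).
september = [410, 455, 390, 500, 470]
june = [430, 460, 420, 505, 490]
print(naive_average_gain(september, june))  # prints 16
```

The single number this returns is exactly the quantity the paragraph above says is too simple to stand on its own as a measure of a teacher’s effectiveness.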