By James Kwak
While the spreadsheet problems in Reinhart and Rogoff’s analysis are the most obvious mistake, they are not as economically significant as the two other issues identified by Herndon, Ash, and Pollin: country weighting (weighting average GDP growth for each country equally, rather than weighting country-year observations equally) and data exclusion (the exclusion of certain years of data for Australia, Canada, and New Zealand). According to Table 3 in Herndon et al., those two factors alone reduced average GDP growth in the high-debt category from 2.2% (as Herndon et al. measure it) to 0.3%.*
In their response, Reinhart and Rogoff say that some data was “excluded” because it wasn’t in their data set at the time they wrote the 2010 paper, and I see no reason not to believe them. But this just points out the fragility of their methodology. If digging four or five years further back in time for just three countries can have a major impact on their results, then how robust were their results to begin with? If the point is to find the true, underlying relationship between national debt and GDP growth, and a little more data can cause the numbers to jump around (mainly by switching New Zealand from a huge outlier to an ordinary country), then the point I take away is that we’re not even close to that true relationship.
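The New Zealand effect is easy to see with a toy calculation. This is a minimal sketch with made-up growth figures, not the actual Reinhart-Rogoff data: the point is only that, when a country’s average is computed over very few high-debt years, adding a handful of years can swing that average from outlier to ordinary.

```python
# Hypothetical sketch of the data-exclusion effect (illustrative
# numbers only, NOT the actual dataset). Each list holds one country's
# real GDP growth rates (%) for its years with debt above 90% of GDP.
nz_one_year = [-7.6]                   # only one high-debt year included
nz_more_years = [-7.6, 3.0, 4.0, 5.0]  # hypothetical additional years

def country_mean(years):
    """Average growth across a country's high-debt years."""
    return sum(years) / len(years)

# With one year, New Zealand looks like a disaster; with a few more
# years of data, its average turns positive and unremarkable.
print(country_mean(nz_one_year))    # -7.6
print(country_mean(nz_more_years))  # 1.1
```

Because each country’s mean later enters the headline average with equal weight, a swing like this in one thinly-sampled country propagates directly into the overall result.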
In their response, Reinhart and Rogoff also argue that it is correct to weight by country rather than by country-year. Their argument is basically that weighting by country-year would overweight Greece and Japan, which had many years with debt above 90% of GDP. Herndon et al. recognize this point:
“RR does not indicate or discuss the decision to weight equally by country rather than by country-year. In fact, possible within-country serially correlated relationships could support an argument that not every additional country-year contributes proportionally additional information. . . . But equal weighting by country gives a one-year episode as much weight as nearly two decades in the above 90 percent public debt/GDP range.”
Both weighting methods are flawed. (Country-year weighting is flawed only if there is serial correlation, which there probably is; if the U.K.’s nineteen years with debt greater than 90% of GDP were independent draws, then the U.K. should be weighted 19 times as much as a country with only one such year.) But this brings me back to the same point as above: if your results depend heavily on the choice of one defensible variable definition over another, at least equally defensible one, then they aren’t worth very much to begin with.
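The mechanics of the two schemes are worth making concrete. Here is a minimal sketch with made-up numbers (again, not the actual dataset): one country with nineteen high-debt years of steady growth, one country with a single bad high-debt year, mirroring the U.K./New Zealand contrast discussed above.

```python
# Illustrative comparison of country weighting vs. country-year
# weighting (hypothetical growth figures, NOT the actual data).
# Each entry: country -> list of real GDP growth rates (%) for its
# country-years with debt above 90% of GDP.
observations = {
    "U.K.": [2.4] * 19,  # 19 high-debt years, hypothetical 2.4% each
    "N.Z.": [-7.6],      # a single high-debt year, hypothetical -7.6%
}

# Country weighting: average within each country first, then average
# the country means -- a one-year episode counts as much as 19 years.
country_means = [sum(v) / len(v) for v in observations.values()]
country_weighted = sum(country_means) / len(country_means)

# Country-year weighting: pool every country-year observation equally.
all_years = [g for v in observations.values() for g in v]
year_weighted = sum(all_years) / len(all_years)

print(f"country-weighted:      {country_weighted:.2f}%")   # -2.60%
print(f"country-year-weighted: {year_weighted:.2f}%")      # 1.90%
```

With these invented numbers the two defensible definitions give answers on opposite sides of zero, which is exactly the fragility the paragraph above describes.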
Here’s another way to put it. Let’s concede the weighting point for the sake of argument. If Reinhart and Rogoff had not made any spreadsheet errors in their original paper—that is, if the only factors at issue were country weighting and data exclusion—they would have calculated average GDP growth in the high-debt category of 0.3%. If they then added the additional country-years as they expanded their data set, while sticking with their preferred weighting methodology, that figure would have jumped to 1.9%—and the 90% “cliff” would have completely vanished. (See Herndon et al., Table 3.) The point is that Reinhart and Rogoff’s choice to weight by country rather than by country-year makes their method extremely sensitive to the addition of new data.
The question to ask is this: If a method produces results that change drastically with the addition of a few more data points, are those results worth anything? The answer is no.
Update: Or, as Mark Thoma said (not necessarily about Reinhart and Rogoff directly):
“It’s even more disappointing to see researchers overlooking these well-known, obvious problems – for example the lack of precision and sensitivity to data errors that come with the reliance on just a few observations – to oversell their results.”