# A mistake with consequences?

There is an interesting new post up at KlimaZweibel about a paper by Smerdon et al. This is going to be all over everywhere very soon, so I may as well jump in.

The title, of course, is a snark at RC; see the article A Mistake with Repercussions which points out some errors in a Zorita and Von Storch paper (they got their model setup wrong). [I’ve just snarked them in their comments; it will be interesting to see if it stays]

In this case the problem is rather more arcane, but worth explaining, so let me do that first.

[Update: no, let me first point out that there is a response by Rutherford et al. which appears to say that they fixed all these problems ages ago.]

Suppose you have created a spiffy new method for reconstructing past climate from proxy data, intended to be used over the past 1000-2000 years or whenever. You can try to test that on real-world data, of course, but you run into an immediate problem: you don’t know in advance what the right answer is, so you’ll never know how good your method is.

The obvious solution to this problem is to use fake data (OMG I’ve used the word “fake”! Scandal! Pushing it a bit, I could even call this a “Trick”. Don’t tell the fools). One approach would be to use simple random data, but this then runs into another problem: the statistics of the random data won’t look anything like the real-world data. So, a far better solution is to take climate model data and use that as your testbed. In this case, you take a long integration of one or more climate models. This is great: the model looks something like the real world (though it doesn’t have to be desperately realistic, and it doesn’t have to have tracked the actual yearly or decadal rise and fall of real-world temperatures), and you now do know the Right Answer: viz, the mean annual temperature (or the hemispheric mean, or whatever it is you care about), since you have the full global model output and can trivially calculate it.

Then, you take the known locations of your proxies (and you can even include the changing number of locations over time, or model the effects of including more or less) and you interpolate the model data to these locations. And you can even add a carefully-calibrated amount of noise to the interpolated value, to mimic the proxy not providing a true temperature. And then you run this data through your method, and you then compare it to the Right Answer.
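For the concretely-minded, here is a minimal sketch of that recipe in Python. Everything in it is an assumption made for illustration: the "model" is just a random walk plus weather noise rather than real climate-model output, the function name `pseudoproxy_experiment` and its parameters are made up, and the "method" is a plain regression on the average of the proxies, which is not the RegEM method used in the papers under discussion.

```python
import numpy as np

def pseudoproxy_experiment(n_years=1000, n_lat=18, n_lon=36, n_proxies=30,
                           noise_sd=0.5, calib_years=150, seed=0):
    """Toy pseudoproxy test: returns (correlation, variance ratio) of the
    reconstruction against the known Right Answer over the verification period."""
    rng = np.random.default_rng(seed)

    # Stand-in for climate-model output, shape (year, lat, lon):
    # a shared slowly-varying signal plus independent local "weather".
    common = np.cumsum(rng.normal(0.0, 0.05, n_years))
    field = common[:, None, None] + rng.normal(0.0, 0.3, (n_years, n_lat, n_lon))

    # The Right Answer: the area-weighted (cos latitude) mean of the full field.
    lats = np.linspace(-85.0, 85.0, n_lat)
    weights = np.cos(np.deg2rad(lats))
    truth = (field.mean(axis=2) * weights).sum(axis=1) / weights.sum()

    # Sample the field at the proxy locations and add noise to mimic
    # proxies that do not record a true temperature.
    lat_idx = rng.integers(0, n_lat, n_proxies)
    lon_idx = rng.integers(0, n_lon, n_proxies)
    proxies = field[:, lat_idx, lon_idx]
    proxies = proxies + rng.normal(0.0, noise_sd, proxies.shape)

    # The "method" (hypothetical stand-in): regress the truth on the proxy
    # average over the final calib_years, then apply that fit to all years.
    design = np.column_stack([np.ones(n_years), proxies.mean(axis=1)])
    calib = slice(n_years - calib_years, n_years)
    coef, *_ = np.linalg.lstsq(design[calib], truth[calib], rcond=None)
    recon = design @ coef

    # Compare with the Right Answer outside the calibration window.
    verif = slice(0, n_years - calib_years)
    corr = np.corrcoef(recon[verif], truth[verif])[0, 1]
    var_ratio = recon[verif].var() / truth[verif].var()
    return corr, var_ratio
```

Calling `pseudoproxy_experiment()` gives you the two numbers you would actually look at: how well the reconstruction tracks the Right Answer, and how much of its variance it keeps.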

What Smerdon et al. say is that Mann et al. have made various errors in their handling of the model data used to test the reconstruction methods: they have got the smoothing wrong, or they have switched locations by 180 degrees. That doesn’t invalidate the methods (a point I’m sure will get lost in the blogonoise) but (if correct) would invalidate the testing.

Add1: so, the point about the test-method only needing to work on data that has some kind of bearing on the real world is the explanation for

> As Smerdon et al. correctly point out, this error does not impact the qualitative conclusions drawn from the results and described in Mann et al., 2007a (cf. Figure 1). The global field was still reasonably sampled, and the pseudoproxy locations, while not correct in longitude, are correct in latitude, and reasonably sample the field. It should also be noted that real proxy locations can vary considerably based on various inclusion/exclusion metrics that accept or reject proxies when building an actual proxy network. In fact, our network “D” in Mann et al., 2007a actually used random pseudoproxy locations.

in the R et al. reply. Even if you get all the locations wrong by 180 degrees longitude, your test of the method is still probably a reasonable test (note: that is, as a test of the method. Remember that is what all this is about) because the climate is moderately symmetrical wrt change of longitude (but isn’t wrt change of latitude, obviously).
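A toy illustration of the longitude point, assuming a made-up field that depends strongly on latitude and only weakly on longitude (the helper `shift_longitude_180` is hypothetical, and this says nothing about the actual model grids involved):

```python
import numpy as np

def shift_longitude_180(field):
    """Rotate a (lat, lon) grid by half the longitude axis, i.e. 180 degrees."""
    n_lon = field.shape[-1]
    return np.roll(field, n_lon // 2, axis=-1)

rng = np.random.default_rng(1)
lats = np.linspace(-87.5, 87.5, 36)

# Toy field: large equator-to-pole gradient, small longitudinal wiggles.
field = 30.0 * np.cos(np.deg2rad(lats))[:, None] + rng.normal(0.0, 1.0, (36, 72))
shifted = shift_longitude_180(field)

# Individual grid cells now hold different values...
values_differ = not np.allclose(field, shifted)
# ...but every latitude band still has exactly the same average.
zonal_mean_unchanged = np.allclose(field.mean(axis=1), shifted.mean(axis=1))
```

So a proxy sampled at the wrong longitude sees the wrong local value, but it is still a value from the right latitude band, which is why the mis-sampled network remains a plausible test network.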

Add2: there is more than a suggestion that there may be a certain amount of academic point-scoring going on here. Rutherford et al. conclude

> In summary, the issues raised by Smerdon et al. (2010), while factual, have no material impact on any of the key conclusions of Mann et al. (2007a). Additionally, they have no impact whatsoever on subsequent studies by us (Mann et al., 2009; Rutherford et al., 2010) where the technical errors they note did not occur, and which reach identical conclusions. In light of these considerations, we are puzzled as to why, given the minor impact the issues raised actually have, the matter wasn’t dealt with in the format of a comment/reply. Alternatively, had Smerdon et al. taken the more collegial route of bringing the issue directly to our attention, we would have acknowledged their contribution in a prompt corrigendum. We feel it unfortunate that neither of these two alternative courses of action were taken.

And indeed that last point has some force. If you find something wrong with someone’s paper, the polite course of action isn’t to rush into print saying “ha ha you’re wrong” but to raise it with the original authors. Of course if you do that then you don’t get an all-important publication point out of it (and a comment counts for less than an article, too). If you raise it with the authors and they blow you off then of course you can go into print.

Add3: probably more important than this is to look at fig 5(a) from Smerdon et al. What strikes you there is not the difference between the red and blue lines, but the difference between the red/blue lines and the black line, which is to say, neither the wrongly-sampled nor the correctly-sampled reconstruction is doing a good job of reconstructing the variance of the Right Answer. As Rutherford et al. say

> First, Mann et al., 2005 used the Regularized Expectation Maximization method with Ridge Regression (RegEM-Ridge) as a regularization method. RegEM-Ridge has been shown to suffer from a loss of variance when reconstructing the hemispheric mean (Zwiers and Lee, pers. comm., August 2006; Mann et al., 2007a,b; Smerdon and Kaplan, 2007) which is not the case with RegEM-TTLS (Truncated Total Least Squares). This led Mann et al., 2007 to use the TTLS implementation of RegEM. This being the case, we will confine our comments to Mann et al., 2007a. However, it is important that the reader recognize that Smerdon et al. (2010) used RegEM-Ridge and that their results shown in Figure 5(a) show the expected variance loss of a RegEM-Ridge reconstruction whereas RegEM-TTLS faithfully reconstructs the target series (Figure 1).

And indeed, if you look at their figure 1 you see that the shiny new method does a much better job.
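If “loss of variance” sounds mysterious, the generic effect is easy to see in a toy one-predictor ridge regression. To be clear about assumptions: this is plain ridge shrinkage on synthetic data, not RegEM, and the names `ridge_slope` and `variance_ratio` are invented for the sketch. It says nothing about how RegEM-TTLS behaves.

```python
import numpy as np

def ridge_slope(x, y, lam):
    """Slope of a one-predictor ridge regression on centred data."""
    x = x - x.mean()
    y = y - y.mean()
    return (x @ y) / (x @ x + lam)

rng = np.random.default_rng(2)
target = rng.normal(0.0, 1.0, 500)           # the series we want to recover
proxy = target + rng.normal(0.0, 1.0, 500)   # a noisy recorder of it

def variance_ratio(lam):
    """Variance of the reconstruction relative to the variance of the target."""
    slope = ridge_slope(proxy, target, lam)
    recon = slope * (proxy - proxy.mean())
    return recon.var() / target.var()

# Increasing the ridge penalty shrinks the slope, and with it the variance.
ratios = [variance_ratio(lam) for lam in (0.0, 100.0, 1000.0)]
```

Note that even with no penalty at all the ratio is below one here, because regressing on a noisy predictor already damps the fitted series; the ridge penalty then damps it further.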

Add4: at KZ, Eduardo said: “They also assert that the errors have been corrected in subsequent studies. And yet, Rutherford et al. continue to show the wrong NH mean temperature simulated by the ECHO-G model – compare figure 1a in the manuscript by Rutherford and Figure 5b in Smerdon et al 2010. It is obvious that these error have not be corrected.” To which I replied:

> Fair point. I asked about this, and the wrong figure was transcribed. Looking, the PDF response has been updated to show the correct figure. Both pix are in fig 1 of the Rutherford et al. reply to Smerdon (http://www.meteo.psu.edu/~mann/Mann/articles/articles.html); it looks like they transcribed the wrong one.

Poissonally I’d prefer it if people kept old versions around to compare to rather than updating; but then again, I don’t do that with the blog, cos the software won’t let me.

Add5: the transcription error is now confirmed by Mann at an RC comment.

## 27 thoughts on “A mistake with consequences?”

1. dhogaza says:

The response looks good …

2. carrot eating cockroach says:

You managed to write that many words without giving the term pseudoproxy?

Oh well, this probably will blow up the blogs for a couple months. For the theatrics of it, I actually hope Mann did something unambiguously incorrect here, so people can just say that, have it corrected, and move on.

[Then you will be happy, nearly. http://holocene.meteo.psu.edu/shared/articles/RMWAcomment_2010_jclim_smerdonetal.pdf says “Smerdon et al. (2010) describe two technical errors in the model grid data used in Mann et al. (2005, 2007a). They are correct in the discovery of these errors. At the same time, we feel that they have not adequately addressed the fact that both errors did not occur in subsequent publications and that the main conclusions of Mann et al., 2007a, which supercedes Mann et al., 2005, are not impacted.” So as near as I can tell the assertion is that yes the error is present but no it doesn’t matter any more -W]

3. carrot eater says:

Helps to at least skim the paper and comment before commenting. This ought not be a hullabaloo; it looks like it’s been worked out. Good on the authors for finding the errors.

4. dhogaza says:

“So as near as I can tell the assertion is that yes the error is present but no it doesn’t matter any more”

There were other errors, though, hopefully you’ll work through it.

Most interesting to me is that apparently they accidentally stumbled on a bug in one of the canned analysis routines they used (subsequently, they’ve worked around it, so this too doesn’t matter in any big-picture way).

5. PolyisTCOandbanned says:

I wonder why they made the error earlier and then did not make it later? Was it just a natural evolution for other reasons and they did not realize the first error? Or did they fail to correct something they knew was wrong (by corrigendum or by some explicit “we are using the 2007 paper to correct a wrong practice of the 2005 paper and this is exactly what it was”)?

I find Mike to be pretty darn opaque in correcting himself explicitly. Even when he does correct himself, it seems like he does it en passant and without really plainly, clearly doing so (but giving himself something to refer to if challenged…)

6. dhogaza says:

> I wonder why they made the error earlier and then did not make it later?

Well, if you read their response, it seems clear enough. I don’t find it opaque.

1. They published the first paper in 2005, using RegEM-Ridge as a regularization method.

2. They were informed of potential problems using RegEM-Ridge by Zwiers and Lee (via personal communication, i.e. informally) in 2006.

Note the two dates. The answer to your question should be obvious.

“RegEM-Ridge has been shown to suffer from a loss of variance when reconstructing the hemispheric mean”

I take this to mean that the issue wasn’t known in 2005, and reading between the lines, it probably took Zwiers and Lee a fair amount of analysis to demonstrate the problem. Obviously it wasn’t obvious beforehand, otherwise the obvious mistake would not have been made.

3. So they switched to another form of RegEM implementation, Truncated Total Least Squares, which has been shown not to suffer from the problem discovered with RegEM-Ridge, and published again in 2007.

I assume this is the error you’re discussing? There are others listed as well …

Apparently when all the errors are accounted and corrected (or worked around, in the case of the problem with the canned generic mapping tools – library, presumably- that they were using), there’s no significant change to the conclusions of the paper.

If it weren’t Mann, and if it weren’t The Hockey Stick, I doubt if such a fine-tooth comb would’ve been applied to the paper.

But it’s good that minor errors were found and fixed, because it strengthens the work in the end.

Compare this with McIntyre’s efforts …

Smerdon seems to be a straight shooter; in his publications list he links to Rutherford, Mann et al’s reply to an earlier paper of his regarding RegEM (the Smerdon 2008 paper Mann refers to in the rebuttal to the current paper being discussed here).

Apparently there’s some history here, though, and it’s not exactly friendly …

> we are puzzled as to why, given the minor impact the issues raised actually have, the matter wasn’t dealt with in the format of a comment/reply. Alternatively, had Smerdon et al. taken the more collegial route of bringing the issue directly to our attention, we would have acknowledged their contribution in a prompt corrigendum. We feel it unfortunate that neither of these two alternative courses of action were taken.

7. Pete Dunkelberg says:

So the correction is well behind the times. Science has already been self correcting and moved on. Let us do the same and not make red noise with vague personal remarks.

8. carrot eater says:

I think the complaint about being collegial was a bit out of place, in that venue.

[There is probably background here that you aren’t seeing -W]

I see WMC has taken note of Science of Doom. A marvelous site, isn’t it? Filled a certain niche.

I also enjoy that when I google stoat, the description is now, “Taking science by the throat… climate, rowing, and misc trailing stuff.”

[No idea what I meant by “trailing” so I’ve removed that -W]

9. Rattus Norvegicus says:

I think that McIntyre has spawned a bit of a cottage industry in pointing out non-critical problems in Mann’s studies. Maybe it’s an easy way to get publication points…

IIRC, the 2005 paper was comparing the performance of RegEM-Ridge with previous methods (MBH). I agree with the observation that earlier Mann methods did tend to suppress longer term variability, but of course it appears that Mann also agreed with this observation and then improved his methods. Isn’t this supposed to be how science works?

The fact that the Smirnoff paper was based on a method no longer used should have been a big hint to the editors and reviewers…

10. Steve Bloom says:

SoD is interesting, ce, although I do detect a bit of a lean toward the septics. Maybe it’s just rhetorical, but not long ago that seemed to explain Judy’s stance as well, so I wonder. OTOH SoD isn’t hesitating to get into the scientific weeds, in sharp contrast to Judy.

Expanding on that last point, on one level Judy’s failure to venture into the weeds is due to the material under discussion being entirely outside her specialty, but in that case she seems to be making what amounts to an argument from authority. Of course paleo in particular isn’t within Gavin’s expertise either, and yet he’s managed to become reasonably conversant with at least the HS-related aspects of it. Judy looks lame by contrast.

11. > the polite course of action isn’t to rush into print
> … but to raise it with the original authors. Of
> course if you do that then you don’t get an
> all-important publication point out of it
>(and a comment counts for less than an article, too)

Ya know, there are a lot of older papers out there that would serve as fodder for a publication-coup-mining operation. Think how fast the computers, statistics, and other technology have changed. There must be hundreds of cases where an ambitious but uncreative publication-hungry chap could go look for a change in methods in a series of papers starting with work published a decade or two ago.

Back a decade or two ago, nobody had the time and tools to reanalyze work — and everyone knew that later work would be done better in many ways (and the limits on words/pages/cites were far more severe).

Just think of how many little tweaks, tricks, and other improvements must have been quietly made over 20 years by scientists, that didn’t get explicitly and verbosely described. Why, new software was coming out at an amazing pace and likely did a better job, whether the researchers even knew what bugs got fixed or not.

Seems like those who can’t do new and interesting work will have a decade or two of opportunity to get publications by mining the old work for bits of unexplained improvement that can be reverse-engineered as hidden error and, in this new age of infinite space and time, published or blogged. There ya go, scientific career path.

12. Steve Bloom says:

That’s *Smerdon*, RN; more like an early Cretaceous herbivore than a brand of vodka. 🙂

Putting it as you do, and it looks entirely accurate so far as I can tell, one wonders how this paper ever got published. Is it yet more evidence that the journals (broadly speaking) bend over backwards to publish stuff critical of the HS?

13. Steve Bloom says:

Hank, looking at his pubs page he doesn’t appear all that disreputable (yet, noting that it’s early in his career). Also, it looks as if his primary motive is to flog his own method, but if so the focus on Mann’s abandoned method is even more strange.

14. Deech56 says:

Smirnoff? Party at R. norvegicus’s house!

15. Dikran Marsupial says:

“Pushing it a bit, I could even call this a “Trick” Don’t tell the fools).”

I recently found “trick” used to refer to some clever mathematical device in, of all places, a Terry Pratchett novel (Monstrous Regiment, page 252, “I believe I can see a number of other little mathematical, ahem, tricks to make the passage of information even swifter, but I’m sure these have already occurred to you.”). No comment to make on the substantive issue of the thread, but it does show, quite nicely, just how foolish it was to make a fuss about the use of the word “trick”, given that this usage was even known to authors of (highly correlated) “comic” novels ;o)

16. Rattus Norvegicus says:

Steve,

Smerdon does seem to be flogging his own method. It is interesting that back in 2007 it was pointed out in a comment by Mann on one of his papers, that the flaws he had found, while valid, had since been corrected. Therefore I really find it interesting that he is getting a second bite at the apple by making the same criticism of the same abandoned method. Of course this is actually like his fifth paper mining this vein. I think he’s found a way to get cheap publication points.

But he doesn’t seem to be too disreputable and actually does seem to be doing actual legitimate research into reconstruction techniques as opposed to McIntyre-esque “auditing”.

17. Chris Winter says:

“[No idea what I meant by “trailing” so I’ve removed that -W]”

OT: It seems perfectly clear to me that this means outdoor perambulation. Call it hiking, trekking, taking the trail, or going for a tramp (as an Irishman might), it all amounts to the same thing.

[In common parlance yes but it is a phrase I’m certain I would never use. I’m still struggling to think what it might be a typo for -W]

18. Martin Vermeer says:

> I assume this is the error you’re discussing? There are others listed as well …

Actually Dhogaza I don’t think so — the Ridge vs. TTLS issue is mentioned in the Comment, but the main issues discussed in the paper are 1) a 180 degree erroneous longitudinal rotation of the model outputs, and 2) the “fuzzing out” of the W. hemisphere by the GMT grid averaging software routine used.

Both errors are somewhat embarrassing; they would have been caught by making the geographical plots Smerdon makes here. Looks to me Rutherford et al. are playing a game of “but you are doing something silly too!”.

> Apparently there’s some history here, though, and it’s not exactly friendly …

You bet. Dhogaza did you notice that there isn’t even the customary “thank you for bringing this to our attention”? Not saints them, Mann and friends. Damn good scientists though.

20. dhogaza says:

> Actually Dhogaza I don’t think so — the Ridge vs. TTLS issue is mentioned in the Comment, but the main issues discussed in the paper are 1) a 180 degree erroneous longitudinal rotation of the model outputs, and 2) the “fuzzing out” of the W. hemisphere by the GMT grid averaging software routine used.

Instead of guessing I probably should’ve asked TCO to clearly state which problem he was talking about …

> Dhogaza did you notice that there isn’t even the customary “thank you for bringing this to our attention”?

I hadn’t, actually, but that’s a very good point. The note’s definitely … frosty!

21. On the whole, in a rational world, having the sort of information that Smerdon provides in the literature would be a good thing, because it would be sure to be picked up by citation databases such as Google Scholar and Web of Science. Anyone looking at the original paper would find the correction and the comment on the correction and have a good idea of what happened, rather than some opaque “we don’t do that crap anymore”.

Clearly a different kind of journal/comment server is called for, one that would be indexed by the citation databases

22. No criticism of Dr. Smerdon intended, I was extrapolating from William’s comment about the temptation to get a paper rather than a comment, thinking that there’s a misopportunity to notice.

Think about the pace of change in the tools scientists have used over the past few decades. (Anyone remember 8″ floppies, CP/M, the dot prompt in dBase II, or WordStar’s “Fatal Error F-27, Disk Full” message?)

There will be plenty of problems from the old tools and methods that can be found in older work, by comparing newer work and seeing what differs.

You know the argument in industry that improving a product can be risky, because making the thing _better_ can be read as an admission it had a problem in its early iteration? I’m saying, imagine that becoming a concern for scientists who know that progress is part of doing good science, who for a while left the old stuff behind — on old media, basically not possible to rerun or redo because of the pace of change in the tools.

People who love _doing_ science won’t find anything tempting about that as a lifestyle. Boring! People who care about _screwing_with_ science might be tempted to follow in the tracks of the lawyers using that approach.

23. PolyisTCOandbanned says:

Mike has a tendency to use rather complex and new methods for a rather tricky problem. The methods have not been fully explained (worse earlier) and have evolved over time. It’s good to be learning and testing new methods, but then there should be more emphasis on the details of the methods and testing them (even with known standards…yes he has done this lately, more), rather than emphasizing the result. I have seen crystallographers get into trouble using new methods on new structures, especially when the emphasis of the publication was the structure not the method.

24. Martin Vermeer says:

> I find Mike to be pretty darn opaque in correcting himself explicitly.

But TCO, as you see here, they will admit to real errors. What more do you expect? And when they don’t, have you considered that the reason may be that the “error” actually isn’t, or has been adequately considered — as with the Tiljander or r^2 nonsense?

And how fair do you consider it that Smerdon et al. fail to point out that these errors are limited to papers that have since been superseded? Which they very well knew. A collegially written corrigendum would have had all of that in one place. I would be “not amused” too.

25. The problem, such as it is, is that Mann is serial skimpy, something that should not be done with new methods. OTOH, he goes where none have gone before, which is better than OK. He needs a statistician to clean up afterwards.

26. Martin Vermeer says:

Eli, I am not sure that that is quite fair. I am only a simple geodesist (like old Gauss, and several of your Presidents), but I have no great difficulty following what he (and that al. guy) are doing. E.g., in his 2008 paper (like the two papers at issue here) he uses, pretty much vanilla, a method and code developed by Tapio Schneider and widely used elsewhere… if it is wrong, many more folks are in trouble 😉

All the errors so far that I have seen (exempting perhaps the 98/99 years) are not due to getting the stats wrong, but to dumb mistakes in the processing chain — and invariably without much practical consequence, the kind of “luck” that comes with a physics background…

He would need an army of e-bunnies to find these roaches, not the 1.5 grad slave that he has today. And not so much a highly paid statistician either.
