Repeatability of Large Computations

Some parts of the discussion of Oh dear, oh dear, oh dear: chaos, weather and climate confuses denialists have turned into discussions of (bit) reproducibility of GCM code. mt has a post on this at P3 which he linked to, and I commented there, but most of the comments continued here. So it’s worth splitting out into its own thread, I think. The comments on this issue on that thread are mostly mt against the world; I’m part of the world, but nonetheless I think it’s worth discussing.

What is the issue?

The issue (for those not familiar with it, which I think is many; I briefly googled this and the top hit for “bit reproducibility gcm” is my old post, so I suspect there isn’t much out there. Do put any useful links into comments. Because the internet is well known to be write-only, and no-one follows links, I’ll repeat and amplify what I said there) is “can large-scale computer runs be (exactly) reproduced?”. Without any great loss of generality we can restrict ourselves to climate model runs. Since we know these are based effectively on NWP-type code, and since we know from Lorenz’s work or before that weather is chaotic, reproducing a run exactly means that on every time step, every important variable must be identical down to the very last bit of precision. Which is to say it’s all-or-nothing: if it’s not reproducible at every timestep down to the least significant bit, then it completely diverges, weatherwise.

I think this can be divided into a hierarchy of cases:

The same code, on the same (single-processor) machine

Nowadays this is trivial: if you run the same code, you’ll get the same answer (with trivial caveats: if you’ve deliberately included “true” random numbers then it won’t reproduce; if you’ve added pseudo-random numbers from a known seed, then it will). Once upon a time this wasn’t true: it was possible for OSs to dump your code to disk at reduced precision and restore it without telling you; I don’t think that’s true any more.
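To make that caveat concrete, here’s a minimal sketch (Python, with an entirely made-up toy “model step”, nothing from any real GCM) of the difference between a seeded pseudo-random run and a “true”-random one:

```python
# Minimal sketch (not model code): a toy "model step" driven by a pseudo-random
# perturbation. With a fixed seed the run is bit-identical every time; with an
# unseeded (entropy-based) source it is not.
import random

def toy_run(seed=None, steps=1000):
    rng = random.Random(seed)          # seed=None draws from OS entropy
    x = 1.0
    for _ in range(steps):
        x += 1e-6 * (rng.random() - 0.5)
    return x

print(toy_run(seed=42) == toy_run(seed=42))   # True: bit-for-bit repeatable
print(toy_run() == toy_run())                 # almost certainly False
```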

The (scientifically) same code, on different configurations of multiple processors

This is the “bit reproducibility” I’m familiar with (or was, 5+ years ago). And ter be ‘onest, I’m only familiar with HadXM3 under MPP decomposition. Do let me know if I’m out of date. In this version your run is decomposed, essentially geographically, into N x M blocks and each processor gets a block (how big you can efficiently make N or M depends on the speed of your processor versus the speed of your interconnect; in the cases I recall on our little Beowulf cluster, N=1 and M=2 was best; at the Hadley Centre I think N = M = 4 was considered a fair trade-off between speed of completion of the run and efficiency).
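For illustration only, here’s a toy sketch of that kind of N x M geographic decomposition; the grid size, rank numbering and splitting scheme are invented, not HadXM3’s actual code:

```python
# Toy N x M decomposition of a lat x lon field: one block per "processor".
# Everything here (grid size, rank numbering) is illustrative only.
import numpy as np

def decompose(field, N, M):
    """Split a 2-D field into N x M rectangular blocks, returning (rank, block) pairs."""
    blocks = []
    for i, strip in enumerate(np.array_split(field, N, axis=0)):
        for j, block in enumerate(np.array_split(strip, M, axis=1)):
            blocks.append((i * M + j, block))
    return blocks

field = np.zeros((73, 96))             # a plausible lat x lon grid, for illustration
for rank, block in decompose(field, 4, 4):
    # each "processor" would time-step its own block, exchanging halo rows/columns
    pass
```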

Note that the decomposition is (always) on the same physical machine. It’s possible to conceive of a physically distributed system; indeed Mechoso et al. 1993 does just that. But AFAIK it’s a stupid idea and no-one does it; the network latency means your processors would block and the whole thing would be inefficient.

In this version, you need to start worrying about how your code behaves. Suppose you need a global variable, like surface temperature (this isn’t a great example, since in practice nothing depends on global surface temperature, but never mind). Then some processor, say P0, needs to gather from each of P0..Pn their average surface temperatures over their own blocks, and (area-)average the result. Of course you see immediately that, due to rounding error, this process isn’t bit-reproducible across different decompositions. Indeed, it isn’t necessarily even bit-reproducible across the same decomposition, if random delays mean that different processors put in their answers at different times; that would depend on exactly how you wrote your code. But note that all possible answers are scientifically equivalent. They differ only by rounding errors. It makes a difference to the future path of your computation which answer you take, but (as long as you don’t have actual bugs in your code or compiler) it makes no scientific difference.
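Here’s a minimal, made-up illustration of that rounding-error point: floating-point addition isn’t associative, so grouping the partial sums by block changes the trailing bits of the “global mean”, even though every answer is scientifically equivalent:

```python
# Toy numbers, not model code: why a global mean is not bit-reproducible across
# decompositions. Each "processor" sums its own block, then the partial sums are
# combined; different groupings round differently in the last bit(s).
import numpy as np

rng = np.random.default_rng(0)
temps = rng.normal(288.0, 15.0, size=192 * 145)   # a fake surface-temperature field

def mean_by_blocks(x, nblocks):
    partial = [b.sum() for b in np.array_split(x, nblocks)]   # per-"processor" sums
    return sum(partial) / x.size                              # combined by "P0"

print(f"{mean_by_blocks(temps, 2):.17g}")
print(f"{mean_by_blocks(temps, 16):.17g}")
print(mean_by_blocks(temps, 2) == mean_by_blocks(temps, 16))  # typically False
```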

Having this kind of bit-reproducibility is useful for a number of purposes. If you make a non-scientific change to the code, one which you are sure (in theory) doesn’t affect the computation – say, to the IO efficiency or something – then you can re-run and check this is really true. Or, if you have a bug that causes the model to crash, or behave unphysically, then you can run the code with extra debugging and isolate the problem; this is tricky if the code is non-reproducible and refuses to run down the same path a second time.
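A sketch of that first use, assuming the model writes some dump or restart file you can compare (the file names here are hypothetical): after a supposedly non-scientific change, the output should be bit-identical to a reference run.

```python
# Sketch of the bit-repro check described above. After a change that should not
# affect the computation (an IO tweak, say), hash the new run's output and compare
# it bit-for-bit against the reference run. File names below are hypothetical.
import hashlib

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def same_bits(path_a, path_b):
    return sha256_of(path_a) == sha256_of(path_b)

# e.g. same_bits("run_reference/atmos.dump", "run_modified/atmos.dump")
# True  -> the change really was non-scientific (at least for this run)
# False -> investigate before trusting the "harmless" change
```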

Obviously, if you make scientific changes to the code, it can’t be reproducible with code from before the change. Indeed, this is practically the definition of a scientific change: something designed to change the output.

The same code, with a different compiler, on the same machine. Or, what amounts to much the same, the same code with “the same” compiler, on a different machine

Not all machines follow the IEEE model (VAXes didn’t, and I’m pretty sure DEC Alphas didn’t either). Fairly obviously (without massive effort and slowdown from the compiler) you can’t expect bitwise-identical results if you change the hardware fundamentally. Nor would you expect identical results if you run the same code at 32 bit and 64 bit. But two different machines with the same processor, or with different processors nominally implementing the IEEE spec, ought to be able to produce the same answers. However, compiler optimisations inevitably sacrifice strict accuracy for speed, and two different compiler vendors will make different choices, so there’s no way you’ll get bit repro between different compilers at anything close to their full optimisation level. Which level you want to run at is a different matter; my recollection is that the Hadley folk did sacrifice a little speed for reproducibility, but on the same hardware.

Does it matter, scientifically?

In my view, no. Indeed, it’s perhaps best turned round: anything that does depend on exact bit-repro isn’t a scientific question.

Why bit-repro doesn’t really matter scientifically

When we’re running a GCM for climate purposes, we’re interested in the climate. Which is the statistics of weather. And a stable climate – which is a scientifically reliable result – means that you’ve averaged out the bit-repro problems. If you did the same run again, in a non-bit-repro manner, you’d get the same (e.g.) average surface temperature, plus or minus a small amount to be determined by the statistics of how long you’ve done the run for. Which may require a small amount of trickery if you’re doing a time-dependent run and are interested in the results in 2100, but never mind.

Similarly, if you’re doing an NWP run where you really do care about the actual trajectory and are trying to model the real weather, you still don’t care about bit-repro, because if errors down at the least-significant-bit level have expanded far enough to be showing measurable differences, then the inevitable errors in your initial conditions, which in any imaginable world are far, far larger, have expanded too.
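The previous two paragraphs can be illustrated with a toy stand-in for a GCM. Here’s a rough sketch using the Lorenz ’63 system with crude forward-Euler stepping (purely illustrative, not how any real model integrates): a perturbation near the last bit wrecks the trajectory (“weather”) but leaves the long-run statistics (“climate”) essentially unchanged.

```python
# Toy stand-in for the weather/climate distinction: Lorenz '63 with forward Euler.
# A perturbation near the last bit destroys the trajectory but not the statistics.
import numpy as np

def lorenz_z(x0, steps=200_000, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = x0
    zs = np.empty(steps)
    for i in range(steps):
        dx, dy, dz = sigma * (y - x), x * (rho - z) - y, x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        zs[i] = z
    return zs

a = lorenz_z((1.0, 1.0, 20.0))
b = lorenz_z((1.0 + 1e-15, 1.0, 20.0))    # initial difference near the last bit

print(abs(a[2000] - b[2000]))    # early on: still tiny
print(abs(a[-1] - b[-1]))        # later: typically of order the attractor size ("weather" lost)
print(a.mean(), b.mean())        # long-run means agree to within sampling noise ("climate" kept)
```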

Related to this is the issue people sometimes bring up of independent people being able to (bit?) reproduce the model, starting from just the scientific description in the papers. But this is a joke. You couldn’t get close. Certainly not to bit-repro. In the case of a very, very well documented GCM you might manage to get close to climate-reproducibility, but I rather doubt any current model comes up to this kind of documentation spec.

[Update: Jules, correctly, chides me for failing to mention GMD (the famous journal, Geoscientific Model Development), whose goal is what we call “scientific reproducibility”.]

Let’s look at some issues mt has raised

mt wrote There are good scientific reasons for bit-for-bit reproducibility but didn’t, in my view, provide convincing arguments. He provided a number of practical arguments, but that’s a different matter.

1. A computation made only a decade ago on the top performing machines is in practice impossible to repeat bit-for-bit on any machines being maintained today. I don’t think this is a scientific issue; it’s a practical one. But if we wanted to re-run, say, the Hansen ’88 runs that people talk about a lot, then we could run them today, on different hardware and with, say, HadXM3 instead. And we’d get different answers, in detail, and probably on the large scale too. But that difference would be a matter for studying differences between the models – an interesting subject in itself, but more a matter of computational science than atmospheric science. Though in the process you might discover what key differences in the coding choices lead to divergences, which might well teach you something about important processes in atmospheric physics.

2. What’s more, since climate models in particular have a very interesting sensitivity to initial conditions, it is very difficult to determine if a recomputation is actually a realization of the same system, or whether a bug has been introduced. Since this is talking about bugs, it’s computational, not scientific. Note that most computer code can be expected to have bugs somewhere; it would be astonishing if the GCM codes were entirely bug-free. Correcting those bugs would introduce non-bit-repro, but (unless the bugs are important) that wouldn’t much matter. So, to directly address one issue raised by The Recomputation Manifesto that mt points to: The result is inevitable: experimental results enter the literature which are just wrong. I don’t mean that the results don’t generalise. I mean that an algorithm which was claimed to do something just does not do that thing: for example, if the original implementation was bugged and was in fact a different algorithm. I don’t think that’s true; or rather, it fails to distinguish between trivial and important bugs. Important bugs are bugs, regardless of the bit-repro issue. Trivial bugs (ones that lead, like non-bit-repro, to models with the same climate) don’t really matter. TRM is very much a computational scientist’s viewpoint, not an atmospheric scientist’s.

3. Refactoring. Perhaps you want to rework some ugly code into elegant and maintainable form. It’s a lot easier to test that you’ve done this right if the new and old are bit-repro. But again, it’s coding, not science.

4. If you seek to extend an ensemble but the platform changes out from under you, you want to ensure that you are running the same dynamics. It is quite conceivable that you aren’t. There’s a notorious example of a version of the Intel Fortran compiler that makes a version of CCM produce an ice age, perhaps apocryphal, but the issue is serious enough to worry about. This comes closest to being a real issue, but my answer is the section “Why bit-repro doesn’t really matter scientifically”. If you port your model to a new platform, then you need to perform long control runs and check that it is (climatologically) identical; a minimal sketch of such a check follows below. It would certainly be naive to swap platform (platform here can be hardware, or compiler, or both) and just assume all was going to be well. If there is an Intel Fortran compiler that makes CCM produce an ice age, then that is a bug: either in the model, or the compiler, or some associated libraries. It’s not a bit-repro issue (obviously, because it produces a real and obvious climatological difference).
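For what it’s worth, here’s a hedged sketch of that kind of climatological check, with entirely invented numbers standing in for, say, annual-mean global temperatures from two long control runs (a real check would look at many fields and seasons, not one scalar, and would use scipy or similar for the statistics):

```python
# Sketch of a control-run comparison after a platform change. The numbers are
# fabricated for illustration: two sets of "annual means" with the same climate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
old_platform = rng.normal(287.6, 0.15, size=100)   # 100 years of annual means (fake)
new_platform = rng.normal(287.6, 0.15, size=100)   # same climate, different weather

t, p = stats.ttest_ind(old_platform, new_platform, equal_var=False)
print(f"p = {p:.2f}")   # large p: no evidence the climates differ
# A tiny p-value (across many fields, allowing for multiple testing) would suggest
# the port has changed the model's climate, i.e. a real bug, not mere non-bit-repro.
```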

Some issues that aren’t issues

A few things have come up, either here or in the original lamentable WUWT post, that are irrelevant. So we may as well mark them as such:

1. Moving to 32 / 64 / 128 bit precision. This makes no fundamental difference; it just shifts the size of the initial bit differences, but since this is weather / climate, any bit differences inevitably grow to macroscopic dimensions (see the sketch after this list).

2. Involving numerical analysis folk. I’ve seen it suggested that the fundamental problem is one with the algorithms, or with the way those are turned into code. Just as with point 1, this is fundamentally irrelevant to the bit-repro question. But, FWIW, the Hadley Centre (and, I assume, any other GCM builder worth their salt) have plenty of people who understand NA in depth.

3. These issues are new and exciting. No, these issues are old and well known. If not to you :-).

4. Climate is chaotic. No, weather is chaotic. Climate isn’t (probably).
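As promised in point 1, here is a toy demonstration that more precision only delays the divergence rather than removing it; the chaotic logistic map stands in for the model, and everything here is illustrative:

```python
# Toy illustration: a one-ulp perturbation in a chaotic map blows up in any
# precision; float64 just takes more iterations than float32 before it does,
# because there are more bits to "use up".
import numpy as np

def steps_to_diverge(dtype):
    x = dtype(0.3)
    y = np.nextafter(x, dtype(1.0))    # perturb by one unit in the last place
    r = dtype(3.9)                     # chaotic regime of the logistic map
    for i in range(1, 500):
        x, y = r * x * (1 - x), r * y * (1 - y)
        if abs(float(x) - float(y)) > 0.1:
            return i
    return None

print("float32 diverges after ~", steps_to_diverge(np.float32), "steps")
print("float64 diverges after ~", steps_to_diverge(np.float64), "steps")
```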

Some very very stupid or ignorant comments from WUWT

Presented (almost) without further analysis. If you think any of these are useful, you’re lost. But if you think any of these are sane and you’re actually interested in having it explained why they are hopelessly wrong, do please ask in the comments.

1. Ingvar Engelbrecht says: July 27, 2013 at 11:59 am I have been a programmer since 1968 and I am still working. I have been programming in many different areas including forecasting. If I have undestood this correctly this type of forecasting is architected so that forecastin day N is built on results obtained for day N – 1. If that is the case I would say that its meaningless.

2. Frank K. says: July 27, 2013 at 12:16 pm … “They follow patterns of synthetic weather”?? REALLY? Could you expand on that?? I have NEVER heard that one before…

3. DirkH says: July 27, 2013 at 12:21 pm … mathematical definition of chaos as used by chaos theory is that a system is chaotic IFF its simulation on a finite resolution iterative model…

4. ikh says: July 27, 2013 at 1:57 pm I am absolutely flabbergasted !!! This is a novice programming error. Not only that, but they did not even test their software for this very well known problem. Software Engineers avoid floating point numbers like the plague…

5. Pointman says: July 27, 2013 at 2:19 pm Non-linear complex systems such as climate are by their very nature chaotic… (to be fair, this is merely wrong, not stupid)

6. Jimmy Haigh says: July 27, 2013 at 3:25 pm… Are the rounding errors always made to the high side?

7. RoyFOMR says: July 27, 2013 at 3:25 pm… Thank you Anthony and all those who contribute (for better or for worse) to demonstrate the future of learning and enquiry.

8. ROM says: July 27, 2013 at 8:38 pm… And I may be wrong but through this whole post and particularly the very illuminating comments section nary a climate scientist or climate modeler was to be seen or heard from. (He’s missed Nick Stokes’ valuable comments; and of course AW has banned most people who know what they’re talking about)

9. PaulM says: July 28, 2013 at 2:57 am This error wouldn’t be possible outside of academia. In the real world it is important that the results are correct so we write lots of unit tests. (Speaking as a professional software engineer, I can assure you that this is drivel).

10. Mark says: July 28, 2013 at 4:48 am Dennis Ray Wingo says: Why in the bloody hell are they just figuring this out? (They aren’t. It’s been known for ages. The only people new to this are the Watties).

11. Mark Negovan says: July 28, 2013 at 6:03 am… THIS IS THE ACHILLES HEAL OF GCMs. (Sorry, was going to stop at 10, but couldn’t resist).

Refs

* Consistency of Floating-Point Results using the Intel® Compiler or Why doesn’t my application always give the same answer? Dr. Martyn J. Corden and David Kreitzer, Software Services Group, Intel Corporation

PRISM: any substance?

So the world is desperately excited by a programme called “PRISM”, and we learn that – shockingly – the NSA reads people’s emails. Can that possibly be true? Hard to believe, I realise, but stay with me.

The National Security Agency has obtained direct access to the systems of Google, Facebook, Apple and other US internet giants, according to a top secret document obtained by the Guardian

sez the Graun, and the WaPo says much the same (Update: care! See below). But Google says they’re wrong:

we have not joined any program that would give the U.S. government—or any other government—direct access to our servers. Indeed, the U.S. government does not have direct access or a “back door” to the information stored in our data centers.

Early Warning, who is usually sensible, says Google is lying. But I tend to trust Google, certainly more than I’d trust the Graun or WaPo to understand tech. EW’s belief that Google is lying appears to stem from the US Govt confirming the existence of PRISM: but it’s an awfully long way from “existence” to “details of the story are correct”. And indeed the US have said explicitly that details are wrong.

I can’t tell where the truth lies, but I suspect that the Graun has indulged in what Wiki would call “Original Research”, which is to say connecting the dots a bit further than the sources permit. This is the key slide, and the key words are “Collection directly from the servers of…”. Weeell, it’s only a powerpoint slide, hardly a careful analysis. It looks like the real meaning of “directly from the servers of” is actually “we put in requests, following the law, and they comply with that law by providing data”. Which is a very different thing to direct access. The former is known and boring (even if you don’t like it); the latter would be new. The Graun knows about the distinction and is definitely claiming the latter (they have to be, otherwise there is no story): Companies are legally obliged to comply with requests for users’ communications under US law, but the Prism program allows the intelligence services direct access to the companies’ servers.

Another thing that suggests strongly to me that this is only an analysis-of-received-data type operation is the price tag: $20M/y. That doesn’t sound like the kind of money to fund searching through all of even just Google’s vast hoards of data, let alone all the rest.

If you wanted a conspiracy theory, the one I’d offer would be that this is to deflect attention from the “Verizon revelation” about the phone records. You get people wildly excited about direct access, based on some ambiguous slides. That all turns out to be nonsense, and so people then start waving all the rest away.

[Update: According to Business Insider the WaPo has modified and weakened its story somewhat. It does indeed say “updated”, though not in what way. I did like BI’s “Many have questioned other aspects of the revelations, such as the amateurish appearance of the slides (though they are believable to those with government experience)”.]

[UUpdate: there is a US govt factsheet. Some of it is potentially weaselly Under Section 702 of FISA, the United States Government does not… – yeah, but what about things *not* done under section 702? However, it does make some direct positive statements PRISM is not an undisclosed collection or data mining program. It is an internal government computer system used to facilitate the government’s statutorily authorized collection of foreign intelligence information from electronic communication service providers… So it looks more and more to me as though either the US govt and Google are lying to us directly, or (far more likely) the Graun and WaPo are wrong.]

[UUUpdate: the Graun sez Technology giants struggle to maintain credibility over NSA Prism surveillance. The substance is the same: Graun makes claims, the companies say they’re wrong, and the Graun has no evidence. The institution that is leaking credibility is the Graun, not the companies.

And: just when you thought they couldn’t lose the plot any more, we have them calling this the biggest intelligence leak in the NSA’s history. That’s twaddle. So far, this is nothing: they have no substance.]

[UUUUpdate: at last, the dog that didn’t bark in the night speaks, though softly. Bruce Schneier, who I’d have hoped would be on top of this, has some stuff to say. He praises whistleblowers in general; I agree. But he only talks about PRISM in an afterword, and it’s pretty clear that he doesn’t know what is going on either. He praises Edward Snowden, but I think that is premature – some of the stuff the Graun has him saying makes him sound rather tin-foil-hat to me.]

[Late update: the Graun has now admitted that the original story was wrong, although to their discredit only by implication. They were not honest enough to publish an upfront correction – or, in other words, they are simply dishonest.

Kevin Drum points out that the Graun was misled by the words “direct access” in the original powerpoint – and makes the obvious point (that I’ve thought of, but not written down): why didn’t Snowden tell the Graun this? It’s hard to think of a reason that rebounds to his credit. The most obvious are (a) he’s clueless, or (b) he knew that, with that error corrected, the powerpoint was dull. It’s not possible that it was an oversight, since the Graun talked to him *after* the story was public, and this was a major point.

More: The Graun (or is it just Glenn Greenwald?) is claiming total accuracy and no backpedalling. Read his point (4). How odd.]

Much later: even though the “direct access” claim has been thoroughly refuted, the Graun is still peddling this crap on Friday 12 July 2013. Have they no shame?

Refs

* NSA admits it created internet so it could spy on it
* Google’s Real Secret Spy Program? Secure FTP

Book of the New Sun

[image: aldrin]

Gene Wolfe, Book of the New Sun:

The picture he was cleaning showed an armored figure standing in a desolate landscape. It had no weapon, but held a staff bearing a strange, stiff banner. The visor of this figure’s helmet was entirely of gold, without eye slits or ventilation; in its polished surface the deathly desert could be seen in reflection, and nothing more.

(I remembered this roughly, but the exact text is from here. The picture I nicked and cropped doesn’t match this description; I don’t know if there is one that does).

Ultimately, the Apollo programme was rather pointless, a dead end. It must have required great courage to trust in the lunar lander and return system. And the entire thing was of great grandeur, yes, and inspiring to many of course, and produced some unforgettable images. And text. But the sane consequence was robot exploration, and even that (e.g. Curiosity) lacks vision in a way (“What shall we do next?” “Oh, I dunno, how about we just dump something bigger down on Mars?” “I suppose it’ll have to do”). The path forwards must be making it self-sustaining, which I think points towards comet or asteroid mining or the like.

[Update: http://upload.wikimedia.org/wikipedia/commons/0/09/Apollo_14_Shepard.jpg might be the image that Wolfe had in mind, though it is too cluttered -W]

Strange days indeed

Congratulations to SpaceX, who have connected their Dragon to the ISS.

[image: dragon-msnbc]

[That’s a screen-grab, BTW, not a clickable video. Go to msnbc for video.]

That isn’t what I find so strange, though it is potentially the start of a big exciting Newe Worlde.

What was so strange, so bizarre, was the mixture of the real-time video from the ISS, with the Dragon capsule on the end of the robot arm and the world turning underneath it, oh so beautiful and delicate, all flung carelessly out onto the web for anyone who wanted to watch; with the stupid irritating Pringles advert I was forced to sit through for ten seconds before watching the video.

Refs

* The Lesson of SpaceX’s Dragon by David Appell

Imagine a World Without Free Knowledge?

[image: wiki-sopa]

“Learn more” is http://en.wikipedia.org/wiki/Wikipedia:SOPA_initiative/Learn_more: SOPA and PIPA are just indicators of a much broader problem. We are already seeing big media calling us names. In many jurisdictions around the world, we’re seeing the development of legislation that prioritizes overly-broad copyright enforcement laws, laws promoted by power players, over the preservation of individual civil liberties. We want the Internet to be free and open, everywhere, for everyone.

Refs

* Google
* Beeb
* Wikipedia blackout forces students to copy from printed ‘hardcopy websites’

Registration

I’ve just tried to turn on registration, to deal with spam. It is probably doomed. Please try to leave a comment on this post letting me know how it worked. If it just totally f*cks up, then email me (wmconnolley (at) gmail.com). If just-post fails, try previewing first.

OK, it is totally f*ck*d. Thanks for your emails. I’ll turn it all off now. My apologies.

I seem to have left approve-all-comments turned on. I’m going to leave that, for at least a bit.

Note that according to the settings, any “authenticated” commenter doesn’t need approving.

[Most amusing failure email: “For some reason it thinks I am from Finland… and I dont understand Finnish”. For those who said: I couldn’t see any way to register: yes, I noticed that, I assumed it would work, somehow, for you lot.]

Recycling old posts

* We don’t even know how many legs he’s got