If You Publish Enough Results, Eventually One Will be the BEST

A recent post at Bishop Hill drew my attention to the fact the Berkeley Earth temperature group (BEST) has published a document giving their results for the year 2014. They say:

The global surface temperature average (land and sea) for 2014 was nominally the warmest since the global instrumental record began in 1850; however, within the margin of error, it is tied with 2005 and 2010 and so we can’t be certain it set a new record.

Out of habit, I decided to verify this by checking their data. I went to their data page and clicked on "Summary Page" in their All Land category. There, I found this graph:


They have another reconstruction including ocean data. I haven't looked into that one yet since it is relatively new and isn't comparable to their previous results (and has never been published in any literature). Given those confounding factors I figured I'd just look at the land record as a simple check. As long as nothing seemed out of place, I'd stop there.

I found something out of place as soon as I opened the data file for the graph. I wanted to just plot the graph myself and verify it was accurate. When I did, I immediately saw the line:

This analysis was run on 12-Oct-2013 00:45:15

BEST just told everybody 2014 is tied for the hottest year in its record. It is not okay to do that when showing people work done in 2013. Results which end in September 2013 cannot back up conclusions about how hot 2014 was.
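
For anyone who wants to repeat this check, here is a minimal Python sketch. It assumes the results file carries a header line in exactly the form quoted above; the function names are my own for illustration, not anything BEST provides.

```python
import re
from datetime import datetime

# Assumed header format, taken from the lines quoted in this post:
#   "This analysis was run on 12-Oct-2013 00:45:15"
RUN_LINE = re.compile(
    r"This analysis was run on (\d{1,2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})"
)
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def analysis_run_date(lines):
    """Return the run timestamp from the first matching header line, or None."""
    for line in lines:
        match = RUN_LINE.search(line)
        if match and match.group(2).title() in MONTHS:
            day, mon, year, hh, mm, ss = match.groups()
            return datetime(int(year), MONTHS[mon.title()], int(day),
                            int(hh), int(mm), int(ss))
    return None


def covers_full_year(run_date, year):
    """A run can only include all of `year` if it happened after that year ended."""
    return run_date.year > year


header = ["This analysis was run on 12-Oct-2013 00:45:15"]
run_date = analysis_run_date(header)
print(run_date, covers_full_year(run_date, 2014))
```

With a real file, the open file object can be passed in place of `header`, since the function only iterates over lines.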

Perturbed, I went to their Findings Page, clicked on Summary of Finding and checked the graphs there. I recognized them as the same graphs BEST has had for their summary since 2012. About to give up, I saw a link under one of them that said, "More recent data." I checked the other images, and it turns out all four images link to the same two data files. Confusingly, neither data file actually contains the data shown in two of the four images. Regardless, the "More recent data" file contained the line:

This analysis was run on 21-Aug-2014 03:12:47

That is more recent than the data file I found on the summary page linked from the Data page. It's not clear to me why both of these files are available on the website at the same time in different locations. Confused, I went back to the Data page and found the newer results by clicking on the link directly below the one I had used before.

That means underneath "Summary Page" is a line which says "Monthly Average Temperature" that uses different results, and neither link goes to results which support BEST's claim 2014 is tied for the warmest year in their record. I find that incredibly baffling. I was under the impression when you publish results for people to see, you should publish the data used to come up with those results.

But what I find more baffling is BEST's decisions regarding what results files to link to. I don't understand posting images with links to data files that don't contain the data shown in the images. I don't understand linking to two different sets of results on the very same page with no warning, much less explanation, of the difference.

But what I really don't understand is why these different results files can change without any record being kept. You see, I happen to have kept copies of results files for BEST's global temperature record. I did it for convenience as I wanted to have the data on my computer. When I looked at the previous copies, I found I now have four different versions. Well, I actually have five. There were two versions of their global land record published on their website at the same time, both said to have been calculated at the same time. They're not identical, but they're similar enough I won't count them separately.

The other four are not so similar. To show this, I decided to create a .gif showing all four (with a 10 year running average applied):


The differences are significant. They are also largely limited to the past. One could easily think this means they aren't relevant to BEST's claims about record temperatures. One would be wrong. BEST, like all temperature reconstructions, picks a time period where it "aligns" its data. The effect is to make all of its data match as best it can in that period.

That inevitably makes the data match better in the calibration period than anywhere else. It also means changes in its data will tend to manifest more greatly outside the calibration period than inside it. Since BEST calibrates its data to a modern period, the past will necessarily vary more from version to version than the present will.

That gives the impression our certainty of recent temperatures is greater than it actually is. The reality is the choice of calibration period is largely arbitrary. Were a different one used, modern temperatures might have varied more from version to version.
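
A toy example makes the effect easier to see. The Python sketch below is not BEST's alignment method; it simply subtracts each series' mean over a chosen window, and every number in it (the two "versions", the drift, the windows) is made up for illustration.

```python
def align(series, start, end):
    """Subtract the mean over [start, end] so the series averages zero there."""
    window = [v for y, v in series.items() if start <= y <= end]
    offset = sum(window) / len(window)
    return {y: v - offset for y, v in series.items()}


def mean_abs_diff(a, b, years):
    """Average absolute disagreement between two series over the given years."""
    years = list(years)
    return sum(abs(a[y] - b[y]) for y in years) / len(years)


years = range(1850, 2011)
# Two made-up "versions": B differs from A by a constant offset plus a slow drift.
version_a = {y: 0.005 * (y - 1850) for y in years}
version_b = {y: version_a[y] + 0.3 + 0.002 * (y - 1850) for y in years}

# Hypothetical modern calibration window.
a = align(version_a, 1951, 1980)
b = align(version_b, 1951, 1980)
inside = mean_abs_diff(a, b, range(1951, 1981))
early = mean_abs_diff(a, b, range(1850, 1880))

# Same two versions, aligned on an early window instead.
a_alt = align(version_a, 1850, 1879)
b_alt = align(version_b, 1850, 1879)
modern_alt = mean_abs_diff(a_alt, b_alt, range(1981, 2011))

print(f"modern baseline: inside={inside:.3f}, 1850-1879={early:.3f}")
print(f"early baseline:  1981-2010={modern_alt:.3f}")
```

The underlying disagreement between the two versions is identical in both cases. Only the choice of window decides whether it shows up in the past or in the present.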

Incidentally, BEST claims:

Numerically, our best estimate for the global temperature of 2014 puts it slightly above (by 0.01 C) that of the next warmest year (2010) but by much less than the margin of uncertainty (0.05 C). Therefore it is impossible to conclude from our analysis which of 2014, 2010 or 2005 was actually the warmest year.

It's difficult to see why anyone would take uncertainty estimates calculated to the hundredth of a degree seriously given the differences in the various results BEST has published. BEST has never compared the various results it has published, but if it had, it'd find the different versions often don't lie within each other's uncertainty ranges in modern times.
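
The two comparisons involved here are simple enough to write down. This is a rough Python sketch; the 0.01 and 0.05 figures come from the BEST quote above, while the function names and the version-to-version numbers are illustrative assumptions rather than values from any BEST file.

```python
def record_is_resolved(lead, margin):
    """True only if the lead over the runner-up exceeds the margin of uncertainty."""
    return lead > margin


def within_each_others_range(a, ua, b, ub):
    """True if each estimate lies inside the other's +/- uncertainty interval."""
    return abs(a - b) <= min(ua, ub)


# Figures from the BEST statement: a 0.01 C lead against a 0.05 C margin.
print(record_is_resolved(0.01, 0.05))

# Made-up example: the same year in two versions, each quoted at +/- 0.05 C.
print(within_each_others_range(0.60, 0.05, 0.63, 0.05))
print(within_each_others_range(0.60, 0.05, 0.68, 0.05))
```

The second check is the one BEST has not published: running it for every year across every pair of versions would show how often one version falls outside another's stated range.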

(Individual graphs for each version can be found here, here, here and here.)

January 1st, 4:16 AM Edit: I've confirmed there is at least one more version of the BEST global land record which preceded any of the ones I've shown in this post. I'll have to see if I can dig up a copy of it somewhere.


  1. I pointed out on Lucia's blog in response to her comment that "2014 is the warmest year on record":

    Sou didn’t say that, and neither did Nick, but anyway, it’s a record for now. Remember we’ve seen before 0.02 differences get erased in the reconstructions as more data comes in.
    This isn’t like reading a single thermometer, it’s the output from a complex set of algorithms, whose value changes as we get more data, and whose value changes as the algorithm gets tweaked. So just because there’s a 0.02 difference right now, doesn’t mean the difference will stay positive.

    This is apropos of what you were talking about too. I went back and looked at my GISTEMP archives, and I was able to find a number of years of data. I've placed the archive of annual temperatures here.

    Variations of as much as 0.08° in the last decade are seen. So even as a "warmest year on record" the meme fails.

  2. It turns out I've been able to find five different versions online thanks to web archives. One is the preliminary results used by WFT, and the other four are the ones shown in this post. That gives me some hope that I may have copies of all their published global results. Sadly, there are hundreds of thousands of other results files I didn't copy and don't appear to have been archived. (BEST representative Steven Mosher suggests it was my responsibility to archive all these files since I'm concerned about this issue, but that's insane.)

    Anyway, I'll try to get links posted for them in a few hours. I've only had access to a phone for the last day or so, and collecting the links on it would be a huge chore.

  3. Brandon, to be fair, GISTEMP doesn't archive their results either. They also now block webarchive from even archiving the files.

    Transparent…. like a brick wall.

  4. Carrick, that's a weak defense though. The stated reason for BEST's creation was concerns about how the existing groups were getting their results, including the criticism they weren't open. BEST repeatedly promotes itself as open and transparent, wanting credit for allowing anyone to verify its work. Its front page even says:


    We continue to lower the barriers to entry into climate science by posting all our raw data and our analysis code online to provide an open platform for further analysis. We also post all our Berkeley Earth papers, memos, graphics and analysis code.

    I'm not sure how BEST believes it is "transparent" when it routinely changes its results without any sort of record. Heck, BEST made a major change to its (pre-Kriging) methodology a while back, and it didn't bother to make a note of it. Even worse, their website still described the old methodology. It was only after I repeatedly pointed out the contradictions in its papers/web site/public statements they updated the site to reflect the change in methodology. And in an example of full transparency, when they finally updated the site, they didn't bother to note there had been a change. They didn't even bother to inform me of it. I have a mention on their acknowledgments page, but I have no idea what for because nobody from BEST has ever acknowledged any points I made (at least, not in public or in communication with me), even after making multiple changes because of them. Actually, I take that back. Steven Mosher did acknowledge I was right when I pointed out one link on their site was broken.

    Anyway, I'm going to have to take a little longer to post links to the various versions than I thought. When going to grab the links for the versions I showed in this post, I realized there are more versions than I thought. When checking the files on my computer, I realized there was a different URL used for global land results I had forgotten about. I had one version from it stored on my computer, and I found two more in web archives. That puts the current list up to eight different versions (excluding one which was a near-identical copy of another).

    I'm going to do a little more checking to see if I can find any other versions. I'm also going to see about rewriting the code I used to plot the graphs. My laptop died due to the power cord coming unplugged (the thing is old, and the battery is dead) so I lost the previous work. It's not a big deal, but it adds some tedium.

    On the bright side, I remember there was a strange quirk in the ACF of the BEST series at one point. There has always been a clear seasonal pattern in the BEST record, but it hasn't always manifested in the same way. Whether it was present/absent in the temperatures/uncertainty has been somewhat inconsistent. With how many versions I've found, I might be able to track down when it changed (and in what ways).

  5. Oh, I meant to point out a potential issue in the BEST uncertainty calculations. BEST gets its uncertainty calculations by using a jackknife approach to see how much variance there is by using only subsets of their data. This, of course, wouldn't account for any systematic effects created by their methodology.

    There's a different issue though. There are a number of steps in the BEST methodology. It's not clear to me BEST reruns all of them when using its jackknife approach. It's possible BEST only reruns the Kriging step. If so, that's a serious problem. The steps where BEST calculates the climate field and the breakpoints are significant sources of uncertainty even if they don't introduce any systematic effects (they do).

    I'd like to assume they rerun the full methodology when calculating uncertainties, but their descriptions of their uncertainty calculations don't make it clear they do.
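
    To make the concern concrete, here is a minimal Python sketch of a textbook delete-a-group jackknife. It is not BEST's code, and BEST's exact scaling may differ; the function name and the sample values are hypothetical, and the default of eight groups simply mirrors the 1/8th figure mentioned later in this thread.

```python
import math


def grouped_jackknife(values, estimator, n_groups=8):
    """Delete-a-group jackknife.

    Drop one group at a time, rerun `estimator` on what is left, and turn the
    spread of those estimates into a standard error. Anything computed once,
    outside `estimator`, still carries information from the dropped group.
    """
    groups = [values[i::n_groups] for i in range(n_groups)]
    estimates = []
    for leave_out in range(n_groups):
        kept = [v for g, grp in enumerate(groups) if g != leave_out for v in grp]
        estimates.append(estimator(kept))
    mean_est = sum(estimates) / n_groups
    variance = (n_groups - 1) / n_groups * sum((e - mean_est) ** 2 for e in estimates)
    return mean_est, math.sqrt(variance)


def simple_mean(xs):
    return sum(xs) / len(xs)


# Hypothetical station anomalies, purely for illustration.
stations = [float(v) for v in range(16)]
estimate, std_err = grouped_jackknife(stations, simple_mean)
print(estimate, std_err)
```

    The design point is that the whole pipeline belongs inside `estimator`. If a step such as breakpoint detection is run once on all stations and only the later steps are repeated per subset, the dropped stations have not really been removed.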

  6. Slight correction to my previous post. There aren't quite as many versions as I thought (at least, not archived ones). I double counted a couple of the versions I used in this post because the post's listed date is the last month of the record. This made me miss some overlap, as I only looked at the date in each file listing when the analysis was run.

    With that corrected, I found six versions from the BEST website, each going up to the listed month:

    November 2011
    July 2012
    March 2013
    September 2013
    December 2013
    July 2014

    The first five are archived versions. The sixth is the current version. I assume it will change again at some point in the future. If so, it'll need to be replaced with a link to the archived version.

    There's also a version available via WoodForTrees. That one is a preliminary result, and I didn't look for it on BEST's website. It might be available somewhere in the same format as the other files. I don't know offhand.

    What I do know is this adds up to (at least) seven different versions of the BEST global land record, none of which are offered for examination by BEST so people can see how their results have changed. The closest BEST comes to acknowledging/explaining the differences between them is this statement:

    Berkeley Earth Analysis Change Log

    Since its publication in early 2013, the code base continues to be refined and improved. The code provides the best documentation of the changes that have been made. A change log summarizing the major changes will be available shortly.

    That statement has been on their website for over a year now (it was added mid-2013). I'm not sure what "shortly" means to BEST, but I'm pretty sure most people don't interpret it as, "In over a year."

    I'm also not sure how BEST thinks its "code provides the best documentation of the changes that have been made." It's not like they use some code repository which shows the changes they've made. At best, all a person can do is find the most recent version of the BEST code and try to compare it to any previous versions that might be available.

    As it stands, the BEST code repository (username "installer," password "temperature") says it is on "Revision 1153." I don't know what revision number they were on when they published their various papers. I certainly don't know what revision number they were on when they published each of the seven sets of results I listed above.

    If the SVN code repository is built nightly, I could probably back-calculate the revision number for any particular date, but even if I could do that, as far as I know there's no way to retrieve old versions of the code. How a person is supposed to use the code to document changes between versions is beyond me when we can't access the code used for those versions.

  7. It took me a few days to get through with this, but here's my analysis for BEST:


    These are annually averaged data. Each line represents a different year that was picked; for instance, 1995 is blue.

    What is shown is how the "global mean temperature on record" of that year varies with reconstruction date.

    The vertical axis is the temperature for a given year as a function of the year of the reconstruction minus the temperature of the most recent reconstruction (July 2014).

    So over time we are seeing a variation of about -0.03° to +0.04°C.
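
    The quantity on the vertical axis can be sketched in a few lines of Python. The function name is hypothetical and the numbers are placeholders, not values from any BEST file; the only assumption is that each version has been reduced to a year-to-annual-mean mapping.

```python
def revisions_vs_latest(versions, year):
    """Difference between each version's value for `year` and the latest version's.

    `versions` maps a version label to {year: annual mean}; the last entry
    inserted is treated as the most recent reconstruction.
    """
    labels = list(versions)
    reference = versions[labels[-1]][year]
    return {
        label: versions[label][year] - reference
        for label in labels
        if year in versions[label]
    }


# Placeholder numbers, not values from any BEST file.
versions = {
    "11/2011": {1995: 0.74, 2005: 0.98},
    "07/2012": {1995: 0.70, 2005: 0.93},
    "07/2014": {1995: 0.71, 2005: 0.95},
}
for label, delta in revisions_vs_latest(versions, 1995).items():
    print(f"{label}: {delta:+.2f}")
```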

    The ordering of the warmest year also changes over time. Here's the top five by year of reconstruction:

    11/2011 2010 2007 2005 1998 2002
    07/2012 2007 2010 2005 2002 1998
    03/2013 2007 2013 2005 2010 2002
    09/2013 2007 2005 2010 2013 2002
    12/2013 2007 2005 2010 2002 2013
    07/2014 2007 2005 2010 2013 2002

    As you can see there is considerable volatility to these numbers.
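
    For reference, a ranking like the one in the table can be produced along these lines. This is a Python sketch with made-up annual means; the point is only that revisions of a few hundredths of a degree are enough to reorder the top of the list.

```python
def warmest_years(annual_means, n=5):
    """Years ordered from warmest to coolest, truncated to the top `n`."""
    return sorted(annual_means, key=annual_means.get, reverse=True)[:n]


# Placeholder annual means for two hypothetical versions of the same record.
older = {1998: 0.88, 2002: 0.87, 2005: 0.95, 2007: 0.97, 2010: 0.99}
newer = {1998: 0.86, 2002: 0.88, 2005: 0.96, 2007: 1.00, 2010: 0.95}

print(warmest_years(older, 3))
print(warmest_years(newer, 3))
```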

  8. Carrick, that's interesting. I'm curious about something you didn't include. The baseline temperature used by BEST changes from version to version. I'm busy and out of the house for most of today so I can't check the numbers myself, but if I remember right, it changes by several tenths of a degree. That shouldn't affect intra-version records, but it would increase how much variance there is between versions' absolute temperatures.

    I'm curious how much we should care about that. I get baselines are usually ignored, but if your baseline can be off by two tenths of a degree or more, it's hard for me to understand how your uncertainty level can be as small as five hundredths of a degree. How can you know your intra-year variability better than your baseline?

  9. Oh, I should point out I probably won't be able to spend much time examining stuff involving BEST for the next week or two. I've slacked off way too much on writing the second part of my overview of the hockey stick controversy, and yesterday I realized I never wrote my submission to the IPCC about the issues I've discussed with its WGII report. I'm going to try to focus on those. Or at least, on the eBook. My self-imposed deadline for it is the end of this month!

    I think that's a problem which comes with all this being basically a hobby. Without actual deadlines, I find it too easy to slack off.

  10. I won't harass you with this then. I've got two papers that I've promised my boss. One of them will be a solo paper, which PopTech somehow thinks is evidence of transcendence. >.<

  11. Carrick, it's no problem either way. Discussing things is one thing. Digging through people's code can take some time. I'd like to examine the code BEST uses to calculate its uncertainty levels, but I don't want to get bogged down with it right now.

    Of course, if you need an excuse to work on other things, this could be a good one!

  12. Apparently blogger Anders wrote a post responding to an article someone wrote about temperature adjustments being bad or some such. I didn't look at it, and I don't really care, but it led to Shub Niggurath and Steven Mosher talking about BEST, and Mosher made an amazing remark:

    so its a fundamental misunderstanding to assert that temperatures from one location are SMEARED into another location.

    I can't tell if Mosher is just playing stupid, or if he really believes the comments he makes defending BEST.

  13. Upthread I said:

    There’s a different issue though. There are a number of steps in the BEST methodology. It’s not clear to me BEST reruns all of them when using its jackknife approach. It’s possible BEST only reruns the Kriging step. If so, that’s a serious problem. The steps where BEST calculates the climate field and the breakpoints are significant sources of uncertainty even if they don’t introduce any systematic effects (they do).

    I can now confirm the BEST code recalculates the climate field when using its jackknife approach. It does not, however, recalculate breakpoints. That means their jackknife calculations are performed on a data set which has effectively already undergone homogenization. That means they're not removing 1/8th of their stations (in each iteration) like they claim as those stations still affect the results by way of the breakpoint calculations.

    I have a lot more to say about this, but I'll probably wait until I write a post.
