An Interesting Update to BEST's Standards

A couple of weeks ago I suggested it is peculiar for groups like Berkeley Earth (BEST) to make claims about which years were or were not the hottest in the temperature record. Each time they update their results, values for previous months change, sometimes by more than the stated uncertainty in their results. My view was that uncertainty levels which can't accommodate the changes between versions of a data set should be taken with a grain of salt.

There's a more interesting issue though. That post showed there were at least four different sets of results given by BEST's calculations. In a later comment, I updated that number to seven. BEST hasn't archived a single one of those seven sets of results, and it hasn't done a single thing to allow anyone to compare one version to another.

Yesterday, I discovered this problem goes even further. It turns out there are currently results from three different sets of calculations published on the BEST website. They're all published alongside one another as though they represent the same thing.

It is strange to publish results from different calculations as though they all came from the same one. The basic idea of version control is to make clear which results go with which data and code. Failing that, you should at least make clear which results go with which other results. You shouldn't publish a mishmash of results from different calculations as though they're all from a single run.

BEST was created to address concerns people had raised about the global temperature record. One of those concerns was a lack of meaningful version control. I've long suggested BEST should address that concern. BEST has dismissed that idea. When I raised the issue just last month, BEST spokesperson Steven Mosher said:

If you want to keep a history of changes its simple. just do it.
you dont want to keep a history. I judge your intentions by your behavior. Your behavior says you dont care. I judge your actions, not your words.

Here we have a BEST spokesperson telling me it is my responsibility to keep track of the changes BEST makes. BEST creates tens of thousands of different files. It doesn't tell people when it is going to make changes to them. It doesn't even tell people when those files have changed. It would be an enormous task for a person outside of BEST to make a full archive of BEST's files, much less to keep track of when they change.
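To give a sense of scale: even the minimal version of this task — noticing that a published file has silently changed — requires keeping your own manifest of file hashes. Here is a rough sketch of what that bookkeeping would look like (the manifest layout and file names are invented for illustration):

```python
import hashlib
import json
from datetime import datetime, timezone

def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def check_for_changes(paths, manifest_path="manifest.json"):
    """Compare files against a stored manifest and return those that changed.

    Also rewrites the manifest with current hashes and a UTC timestamp,
    so the next run compares against this snapshot.
    """
    try:
        with open(manifest_path) as f:
            old = json.load(f)
    except FileNotFoundError:
        old = {}
    changed = []
    new = {}
    for path in paths:
        digest = sha256_of(path)
        new[path] = {
            "sha256": digest,
            "checked": datetime.now(timezone.utc).isoformat(),
        }
        if path in old and old[path]["sha256"] != digest:
            changed.append(path)
    with open(manifest_path, "w") as f:
        json.dump(new, f, indent=2)
    return changed
```

Run against a directory of downloaded files, this at least tells you *that* something changed. It says nothing about what changed or why, which is information only BEST could provide.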

But even if that weren't true, why should it be up to people not associated with BEST to do basic version control for BEST? Isn't that something BEST ought to be doing? It turns out, BEST thinks so. Or at least, it did three years ago. BEST prepared a slideshow for a presentation at the American Geophysical Union annual meeting, one of the largest science conferences in the world:

[Image: 2-3-BEST-AGU — slide from BEST's AGU presentation]

One of the slides in it shows BEST agreed with my view:

[Image: 2-3-BEST-Standards — slide listing BEST's standards]

Three years ago, BEST said it should "Subject the whole thing to version control so users can see how it changes over time." They thought users should be able to see how their methodology and results change over time. It was even part of their promotional strategy for their work.

Now, having never done anything to deliver on this idea, BEST members tell people if they want any sort of version control, they should do it themselves. It's remarkable how their standards have changed.

14 comments

  1. It is a sorry state of affairs. Well done for tracking it. I guess you will get the usual climate science brush off for trying to hold them to account. It is a shame. It should not be hard to be better than Jones and Mann.

  2. Diogenes, thanks. I've pretty much given up on holding BEST to any sort of meaningful standards. I hadn't expected to write any more posts about it because it seems such a foregone conclusion. I just happened to check their SVN and found changes had been made to which documents were available, poked around in there and saw this PowerPoint presentation. I couldn't resist after that. It's too remarkable that BEST began by planning to do exactly what I want it to do but now scoffs at anyone who suggests it.

    The sad thing is this has a lot of parallels with Cook et al. You may remember I showed when Dana Nuccitelli came up with the rating categories they used, he said they should do multiple comparisons between different categories to establish the strength of different consensus positions. Had they done so, they would have been clear about the distinction between humans contributing to global warming and humans being responsible for most of it. That was their plan when they began. It was only after they got their results they changed that plan. Even worse, they portrayed anyone who advanced the same sort of comparisons they had planned to use as dishonest.

    BEST planned to subject their work to version control. I say BEST should subject its work to version control, the same as they had planned to do at the start. BEST responded by calling me dishonest. BEST may not be as bad as Skeptical Science, but the parallel here is disturbing.

    On a different note, does anyone know if BEST ever tells people just how it estimates regional fields when comparing stations to determine their "empirical breakpoints"? I was surprised when I saw their code compares stations not by their correlations, but by the correlations of their first derivatives. I don't remember that ever being mentioned in their descriptions of their methodology. Did I overlook it, or did they just not bother to mention that part?

  3. If they are NOT doing version control, and your article suggests they are doing NONE, then their results can not be trusted at all. It actually means that they cannot, even internally, be sure which version of the data and calculations/programs was run at any given time. It means their tests are potentially wrong, their various production runs are potentially wrong, and every report is potentially wrong. And they will never know, nor will they be able to correct it!

    Version control is part of the Software Configuration Management function. Even Wiki defines it:
    "In software engineering, software configuration management (SCM)[1] is the task of tracking and controlling changes in the software,(let me add data in all its versions) part of the larger cross-discipline field of configuration management.[2] SCM practices include revision control and the establishment of baselines. If something goes wrong, SCM can determine what was changed and who changed it. If a configuration is working well, SCM can determine how to replicate it across many hosts." (My add)

    Without version control there is no system! They would not be able to ACCURATELY describe which version of the software was run against which version of the data. What they have is a set of uncontrolled, ever-changing implementations that are not reproducible from one day to the next.

  4. CoRev, to be fair, I can't speak to whether or not BEST actually does version control. All I can speak to is what they do publicly. We know BEST does things privately they don't tell anyone about (such as testing the effects of problems they don't even disclose the existence of). It could be version control is one of those things. There's no way to tell.

    If they do do version control privately, the fact they keep it private is a problem. If they don't do version control at all, that's an even bigger problem. Either way, there is a problem. The question is just how serious a problem it is.

  5. CoRev, they use Subversion for version control of the software, but Subversion does not retain the last-modified date, and they do not use version keywords that would let one figure out which version of the code a particular file comes from.

    You basically have to diff it against other commits to figure out what's changed, if anything. As they use it, you can get a copy of the latest version of the code, but there's little to no documentation of how it differs from other versions.

    As far as I can tell, they do not archive the data version or the results, which is the bigger problem, since the data that come from GHCN and other sources (even for a previous year) change as more stations either report or are converted into digital form (in the case of hand-written METAR reports).
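The diff-against-another-commit workflow described in the comment above can be sketched in a few lines even outside Subversion, assuming you have saved two checkouts of the same file. The station data and revision labels below are invented for illustration:

```python
import difflib

def diff_snapshots(old_text, new_text, old_label, new_label):
    """Produce a unified diff between two saved snapshots of one file."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )

# Two invented checkouts of the same (made-up) results file
old = "station,year,anomaly\nAMUNDSEN,1957,-0.12\n"
new = "station,year,anomaly\nAMUNDSEN,1957,-0.15\n"
print(diff_snapshots(old, new, "checkout-r100", "checkout-r101"))
```

This shows *what* changed between two snapshots you happen to have; the point of the comment stands, since without archived versions there is nothing to diff against.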

  6. Carrick, I'm beginning to believe they do almost no version control. I'm sure they think they are doing a good job of controlling their Software and data, but if we could look at their processes I am just as sure there is a lot of churning going on trying to rebuild lost data and S/W versions during testing. With the pressure to produce monthly in competition with the other reporting organizations, those last days must be chaotic. In the end, the first steps each month are probably to last months errors. Improvement is 2nd and 3rd priorities.

    Maybe Zeke and/or Mosher could refute or confirm.

  7. I don't think BEST reruns its calculations every month. I know they talked about how they'd like to at one point, but I don't think that ever materialized.

    As for their SVN, I probably ought to give BEST a bit more credit than I have. It is possible to figure out what changes have been made to their code thanks to their SVN. The problem arises with their data and results. The SVN doesn't archive BEST's results. There's no record of how often they've run their calculations or what the results have been when they did. I believe there are results corresponding to the results published in (some of?) their papers stored on the SVN, but the results they've published to their website are not stored. Similarly, the results they used in their recent report about whether or not 2014 was the hottest year have never been published anywhere, much less archived so people can compare them to previous results.

    On a related note, it's not clear to me they archive the data as used in their calculations. I believe their SVN archives all the "raw" data they begin with, but I can't find anything beyond that. In theory I think we could dig through BEST's SVN to tell what code and data were in use at any given point then try to rerun their entire process (including merging and filtering the raw data) in the hopes of replicating the results they didn't bother to archive. It would be an enormous task though, and we wouldn't be able to confirm our results since we'd have nothing to compare them to.

    It's not completely terrible, but it is pretty bad. For instance, suppose someone wants to check one little thing, like where the data they see for a station on the BEST website came from? They might have to download ten or more different data sets, all taken from different sources, and try to find that single station in each. Having done so, they'd then have to try to figure out how the different records in those files were combined to create what BEST uses. At this point, they might discover the results don't match. Looking at the data file, they might see the file was created a year ago. With that knowledge, they could dig through the BEST SVN to find commits for each of the ten data sets as of a particular date of last year. If they still couldn't get the results they found in the data file they found on BEST's website, they could dig through BEST's SVN to find any code commits which might have changed how the data was processed. After all that work, they might get fed up and decide to take a break. Two weeks later they might revisit the problem only to discover the data file they had been examining on the BEST website has vanished and been replaced by a new one because BEST did a stealth update to their website.

    Of course, this assumes they know as much about BEST as I do. If not, they might well start with the data page on the BEST website and see it has multiple intermediary data sets created while BEST processes the "raw" data. After downloading at least one 500MB file, and possibly two others (and as many as nine 200MB files), they'd likely spend a while figuring out how to parse those files to find the station records they are interested in. Having done so, they'd realize those files on the BEST website are (last I checked) over a year out of date.

    Thinking to process the data themselves, they might go to the Sources File page of the BEST website. After downloading more files there, they might discover they are looking at files two or more years out of date. It is only after all this trouble, caused by BEST not disclosing the dates associated with the files published on its Data page, that they'd likely try digging through the SVN as I described above.

    For those who are curious, some of this might be based on my personal experience. I spent some time trying to figure out some odd problems with BEST's published records for Antarctic stations. I've had three different BEST team members tell me there is no data for Antarctica before 1960 (1955 or 1950, depending on the comment and the person) even though that's clearly untrue. When I pointed out that was untrue, the story changed, and then they just stopped talking. Since they can't be bothered to even look at the data they make claims about, I thought I'd try to figure out what's going on with their Antarctic data. Thus far, all I've discovered are problems. One of the biggest ones is that there just doesn't seem to be an explanation for the step change in the BEST uncertainty levels. BEST says it's because there was no data in Antarctica before the step change, but that's just not true.

    But there are a variety of other issues. For instance, some stations are being duplicated. One appears to have been used three times. Another seems to have had part of a different station tacked onto it 30+ years before its record actually begins. For others, data in the BEST data set simply doesn't seem to have been used. And my absolute favorite one is several Antarctic stations have "empirical breakpoints" found in them by comparing them to their neighbors during time periods BEST says there is no Antarctic data. If there is no data in Antarctica before 1950, how are they calculating "empirical breakpoints" in Antarctica prior to 1950?!

  8. My last comment was supposed to say: "the first steps each month are probably (MAKE CORRECTIONS) to last months errors."

    Brandon, you just described why I think they do little version control. Clearly, they do not want replication of their process by outsiders. I'm not sure even they could replicate one production run to the next.

  9. CoRev, I'm still quite annoyed about the fact BEST went to the media to give its view on whether or not 2014 was the hottest year even though they haven't published anything to back up what they say. I think it's wrong for a group to try to get people to look at their results via the media while refusing to publish those results, much less the code/data used to generate them.

    It also shows the weakness of their version control. If and when they do publish results which cover all of 2014, will those results match what they gave out to the media? Who knows. And if somebody wanted to verify the numbers they gave out to the media? It probably couldn't be done. Nobody knows just when BEST did the calculations or with what data. And suppose BEST does publish them? Will it overwrite the results it currently has on its website? If so, will it tell anyone? Will they let anyone compare the new results to the old ones? Who knows.

    Here's a challenge I'd like to see someone complete. I'd like to see someone go to the BEST data page and recreate the data sets posted there, explaining the steps they took to do it. I don't think it could be done with the information currently provided by BEST. I'm not sure the people at BEST could do it. I'm not even sure the people at BEST could tell you what data was used to create each of the data sets.

    Maybe BEST could manage it. I don't know. What I do know is they changed the website to remove the dates listed with data sets. That means nobody currently looking at the site could even tell when the data sets were created. The best they could hope for is to download each one and look to see how far the data extends. If just figuring out when results were published is the first step necessary in replicating results, you know there's a problem.
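The "download each one and look to see how far the data extends" step from the comment above can at least be automated. A minimal sketch, assuming a made-up four-column station format (BEST's actual file layouts differ):

```python
def latest_record(lines):
    """Return the (year, month) of the most recent record in a file.

    Assumes whitespace-delimited rows of (station_id, year, month, value);
    this four-column layout is invented for illustration and does not
    match BEST's actual formats.
    """
    latest = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):  # skip comments and blanks
            continue
        _sid, year, month, _value = line.split()
        key = (int(year), int(month))
        if latest is None or key > latest:
            latest = key
    return latest

sample = [
    "% id    year  month  value",
    "12345   2013  11     0.42",
    "12345   2013  12     0.37",
    "67890   2014  6      0.51",
]
print(latest_record(sample))  # -> (2014, 6)
```

Even then, the most recent record only bounds when a file could have been created; it is a poor substitute for BEST simply publishing the dates.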

  10. Brandon:

    I don’t think BEST reruns its calculations every month. I know they talked about how they’d like to at one point, but I don’t think that ever materialized.

    I think they aren't updating partly because their code runs so slowly, which in turn is at least partly the result of using interpreted code, and probably partly due to the way the data are being stored on their system (ASCII versus binary format).

    If you're getting paid to write a software package, and you end up with code so slow that you can't do what you promised your funding agency you'd do with it, I'd say the onus is on the software writers to make the code deliver what they promised it would deliver.

    They've turned all of the problems that have been found into a giant excuse-making exercise... Data weren't available before then. How do you know the continent isn't warming and cooling at the same rate? Our code just runs too slowly to do this. Etc.
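The ASCII-versus-binary point in the comment above is easy to quantify in isolation. A small sketch with synthetic values (this illustrates the storage trade-off only, not BEST's actual pipeline; the parsing cost of text formats is usually the bigger penalty than raw size):

```python
import struct

# 10,000 synthetic anomaly values (invented data, for size comparison only)
values = [round(0.01 * i - 50.0, 2) for i in range(10_000)]

# ASCII: one fixed-width decimal number per line
ascii_bytes = "\n".join(f"{v:10.4f}" for v in values).encode()

# Binary: the same values packed as 8-byte IEEE-754 doubles
binary_bytes = struct.pack(f"{len(values)}d", *values)

print(f"ascii:  {len(ascii_bytes):,} bytes")
print(f"binary: {len(binary_bytes):,} bytes")
```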

  11. Carrick, that sounds about right. The only defense I can offer for BEST is that I don't know they actually promised their backers anything. They've made a point of saying their donors don't influence what they examine. It may be they got their funding with no strings attached. Or maybe they have met whatever strings were attached. I don't know, since I don't know what (if any) requirements were placed on their financing.

    Of course, that's just for promises to the backers. BEST certainly hasn't fulfilled a number of promises it has made to the public. The most obvious of these is that BEST hasn't been as open and transparent as it claims, intentionally hiding problems from its users.

  12. Mark Bofill, there isn't one. I suppose I should change that. I never really thought about it because I've given out my e-mail address a number of times on climate blogs so it's no secret (and really, it's just my name separated by a period at Gmail).

    I think I'll add it to my About page real quick and think about making a Contact button later on.
