Why I Don't Trust BEST

The Berkeley Earth surface temperature (BEST) project was supposed to be a great thing. It was supposed to resolve the concerns skeptics had raised about the modern temperature record. It was supposed to resolve not just technical issues skeptics had raised, but also basic concerns about openness and transparency.

Once upon a time, people managing the modern temperature records wouldn't even share basic information like which temperature stations they used. It was disgraceful, and it caused a lot of distrust. It was also one of the main reasons BEST was formed. BEST was supposed to help resolve the trust issues by being completely open and transparent. BEST has promoted its openness and transparency time and time again, and it's one of the most touted aspects of their project. The problem is, it's a lie.

Now, I have a number of outstanding issues with the BEST methodology. I discussed some of them in a post a few months ago. I'm not going to worry about them today though. Today, I'm only going to look at matters of openness and transparency. As a preface for this, I want to offer my favorite quote from Richard Muller, the head of the BEST team:

And if you don't like our results, my [advice] is to change the program, but be open and transparent about it. Let us know what you changed.

That is how science should work. BEST provides its data and code, and if people want to make changes to it, they can - they just need to tell people what those changes were. The thing is, that includes the BEST team. If BEST wants to make changes to its data and code, it needs to tell people just like anyone else would. It doesn't.

A few months back, I created this .gif to show four different versions of the BEST global record which have been published throughout the years:

[Animated .gif: four published versions of the BEST global temperature record]

None of those are archived by BEST. The differences between them have not been discussed or explained. They haven't even been acknowledged. BEST has simply updated its website, overwriting its old results and leaving no record.

But it turns out there were more than four different versions of BEST's global temperature record. A couple days after making that .gif, I discovered there were at least three more. Here is a .gif showing seven different ones I've managed to find (listed date being the most recent data point in the version):

[Animated .gif: seven published versions of the BEST global temperature record]
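For anyone who happens to have saved copies of BEST's data files over the years, reproducing a comparison like that takes only a few lines. Here's a rough sketch; the filenames are hypothetical, and the assumption that comment lines start with "%" and the first three columns are year, month, and anomaly is my guess about the downloaded text files, not anything BEST documents.

```python
# Rough sketch: overlay saved copies of BEST's global record to spot divergences.
# Filenames and the year/month/anomaly column layout are assumptions about
# locally saved copies of the data files, not anything BEST guarantees.
import matplotlib.pyplot as plt
import numpy as np

saved_versions = {
    "Nov 2011 run": "best_global_2011-11.txt",
    "Jul 2012 run": "best_global_2012-07.txt",
    "Jan 2015 run": "best_global_2015-01.txt",
}

for label, path in saved_versions.items():
    # Assumes comment lines start with '%' and the first three columns
    # are year, month, and temperature anomaly.
    data = np.loadtxt(path, comments="%", usecols=(0, 1, 2))
    years = data[:, 0] + (data[:, 1] - 0.5) / 12.0
    plt.plot(years, data[:, 2], label=label, linewidth=0.8)

plt.xlabel("Year")
plt.ylabel("Temperature anomaly (°C)")
plt.legend()
plt.title("Saved versions of the BEST global record")
plt.show()
```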

There might be more. BEST doesn't archive its results anywhere for people to see so it's impossible to know how many different versions there have been. Even worse, BEST doesn't update its results in anything resembling a consistent manner. For instance, BEST has a Summary of Findings page with a number of images like:

[Image: global temperature figure from BEST's Summary of Findings page]

If you look at the .gif above, you'll see that is the version going up to November 2011. The calculations for it were run in March of 2012, but the "More recent data" link beneath it points to results from calculations run in January of 2015. They've rerun the calculations behind that figure at least five times since they made it, but they've never updated it.

That's just an image though. More troubling is that data files aren't updated either. If you look for BEST's results for the United States, you'll find an informative little page with some helpful images. But if you look at the data file for the page, you'll find:

This analysis was run on 12-Oct-2013 00:45:15

So the global BEST results were updated in January of 2015, but regional results were last updated in October of 2013. Why aren't BEST's regional results updated along with its global results? That's not the question to ask. Maybe there's a reason. Maybe there's a reason global results were updated in January of 2014 and again in January of 2015 (and possibly other times) while regional ones were not.

Maybe. But what there is no reason for is BEST's failure to warn its users. Sure, a user might happen to notice the difference in datestamps across various files if they're careful, but how many people will manage that? Heck, how many people will even look at those data files? How many people will think, "They updated some results in 2015, but I should check to see if the results they publish alongside those were also updated"? None. If you publish a bunch of results on a website together, people are going to assume they all go together.

And this isn't just a matter of regional vs. global data files. You can also pull up data for individual cities. When you do, you find data files which say:

The Berkeley Earth analysis was run on 15-Nov-2013 19:55:48

So city data files were last updated in the middle of November of 2013 (yet for some reason have results for November of 2013), regional data files were last updated in October of 2013, and global results were last updated in January of 2015. BEST updates its data set every month. That could well mean none of these data files were calculated using the same data. Even worse, it is practically impossible to know which version of the BEST data set might have been used to produce any of these files.
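If you've downloaded copies of these files, a short script will pull out the run dates so the mismatch is obvious at a glance. The filenames below are hypothetical local copies; the only thing assumed about the files themselves is that they contain an "analysis was run on" line like the ones quoted above.

```python
# Minimal sketch: extract the "analysis was run on" datestamp from locally
# downloaded BEST data files so their run dates can be compared side by side.
# The file list here is hypothetical.
import re
from pathlib import Path

files = ["global_complete.txt", "united_states.txt", "chicago.txt"]
pattern = re.compile(r"analysis was run on\s+(\S+\s+\S+)", re.IGNORECASE)

for name in files:
    text = Path(name).read_text(errors="ignore")
    match = pattern.search(text)
    stamp = match.group(1) if match else "no datestamp found"
    print(f"{name}: {stamp}")
```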

And this isn't just a matter of a lack of sensible archiving/updating policies. It's not just a matter of making sure your published results go together or can be verified. Because BEST erases its old results whenever it updates these files, it gets to cover up its mistakes. I recently came across an old post by blogger Deep Climate where he showed a strange divergence between two versions of BEST's global results. A BEST team member, Zeke Hausfather, showed up to explain the difference:

Deep,

The primary divergence between the 2011 and 2012 Berkeley series was due to a bug in latitudinal weighting that was fixed.
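(For anyone unfamiliar with the term, latitudinal weighting just means weighting each grid cell by its area, which shrinks in proportion to the cosine of its latitude, when averaging gridded values into a global mean. Below is a minimal sketch of that standard calculation with made-up numbers; it is not BEST's code, only an illustration of what such a weighting does.)

```python
# Minimal sketch of area (cos-latitude) weighting for a global mean.
# The grid and anomaly values are made up for illustration; this is not
# BEST's actual code, just the standard weighting a gridded average needs.
import numpy as np

lats = np.arange(-87.5, 90, 5.0)            # grid-cell center latitudes
lons = np.arange(-177.5, 180, 5.0)          # grid-cell center longitudes
rng = np.random.default_rng(0)
anomalies = rng.normal(0.5, 1.0, (lats.size, lons.size))  # fake gridded anomalies

weights = np.cos(np.deg2rad(lats))          # cell area shrinks toward the poles
weights_2d = np.repeat(weights[:, None], lons.size, axis=1)

weighted_mean = np.average(anomalies, weights=weights_2d)
unweighted_mean = anomalies.mean()          # what broken weighting effectively gives

print(f"area-weighted global mean:   {weighted_mean:.3f}")
print(f"unweighted (incorrect) mean: {unweighted_mean:.3f}")
```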

I don't get along with Deep Climate, but I agree with the questions he asked in response:

So Berkeley Earth released a result in 2011, and then corrected a major bug with the July 2012 update, but did not advise anyone of a major discrepancy. When was BE going to release that fact? Never?

I agree even more strongly with something he updated his post to say:

[UPDATE 8/20: As Berkeley team member Zeke Hausfather notes in a comment below, the Berkeley 2011 series had a significant error due to incorrect latitudinal weighting. That problem resulted in an elevated trend compared to the corrected 2012 series, with a four-sigma difference in the 2000s decade. It also had a clearly unrealistic absolute average temperature a full 2 C lower than other Berkeley series, including the contemporaneous GHCN 2011 series. It is therefore somewhat surprising that the Berkeley Earth team did not realize that this series was problematic.

It is not yet clear when the error was discovered, but it seems it must have been earlier this year. At some point (I can’t recall exactly when), the Berkeley Earth data analysis page was updated with a corrected chart that showed a little less post-1980 warming, and the link to the underlying data was removed. That page is still available; it contains no indication or notice that the previous version had been corrected.

I’m inclined to give the Berkeley Earth team the benefit of the doubt and presume that the failure to disclose this error was not an intentional effort to mislead. However, it does point up a serious problem in Berkeley Earth’s science communication.

Prior to reading Deep Climate's post, I had never heard about this bug. I've spent a ton of time looking at the BEST website. I've read all of its published papers and memos, multiple times. I didn't see a word about this bug. That's because BEST didn't bother to tell anyone about it. BEST screwed up in its earlier version, changed its code to fix its mistake, overwrote the old results and just pretended nothing ever happened.

I thought it was bad enough that BEST has changed parts of its methodology without disclosing it (e.g. completely changing how it handles seasonality in its data), but this is just absurd. Short of finding a single post from several years ago on a relatively unknown blogger's website, there would be no way for anybody to know about this bug. And even if you did find out about the bug, good luck figuring out what effect it had - BEST doesn't provide its work/results so you can check.

To make matters worse, back in the middle of 2013, BEST changed one of its pages to add:

Berkeley Earth Analysis Change Log

Since its publication in early 2013, the code base continues to be refined and improved. The code provides the best documentation of the changes that have been made. A change log summarizing the major changes will be available shortly.

That was two years ago. For two years, BEST has been promising to provide a "change log summarizing the major changes" to its work as part of its effort to be completely transparent. It's obscene.

On top of that, four months ago, BEST published posts on several blogs saying the adjustments they make to their data don't have any significant effects (e.g. here). I disagree with that conclusion, strongly, but again, this post isn't about methodological issues. It's just about openness and transparency.

BEST published some results it got when it redid its calculations without adjusting data (primarily in the form of images). Naturally, some people wanted to look at the results of the tests to see for themselves. They couldn't. The results of the tests weren't available. BEST provided a few images, but that's it. BEST had decided to tell everybody what the answers were while not providing the material to allow people to verify those answers. Even worse, a couple months later, Richard Muller (who you'll remember is the head of the BEST project) gave an interview to get PR for BEST in which he said:

"Furthermore, because of the interest, we re-analyzed all the data with ZERO adjustments, just to see what we would get. These results have been made available online. What we found was that the conclusions we had previously drawn were unchanged. The data are available here

Anyone who followed that link would quickly see the data was not available there. It wasn't available anywhere. Muller, for whatever reason, falsely told the media BEST had made its results sans adjustments available to the public. Somehow his false claim then went unnoticed for a while until I saw it and spoke up. After I e-mailed BEST and contacted the author of the article, this note was added:

[ Link not currently working, BEST tech team are aware of the issue and we will update when we have more information]

It takes a huge stretch of the imagination to read, "Link not currently working" to mean, "The data was never publicly available like Muller claimed, and the link he provided isn't currently working because there was never any real link he could have provided." I don't know what BEST told the author of the article to get that note added, but what she said on Twitter should raise some concerns: she attributed the broken link to the site being migrated to WordPress.

BEST never made the data public. It's that simple. How in the world did anyone come up with the idea migrating to WordPress caused a link to break? I have no idea. What I do know is that tweet was written over a month ago. The article was updated over a month ago. That means BEST has known about the false statement in that article for over a month, and it still has not done a thing about it.


I have a bit of good news though. I've been in contact with BEST about this issue a bit, and I've been informed they aim to have their results sans adjustments made publicly available toward the end of this month. That's cool. I'm happy about it. I'd just like to know one thing. Why has it taken this long?

Actually, I'd like to know a second thing too. BEST is supposed to be completely open and transparent. The effect adjusting their data has on their results is obviously a matter of interest. Despite that, they didn't publish information on that for several years. While I'm obviously curious why they wouldn't publish such information from the start, what I'd like to know is... when were they going to publish it?

Seriously. Months ago, when they were writing posts about those results, I pointed out they hadn't released them. I was blown off. I said it wasn't right to expect people to believe what you say about data if you don't make the data available for them to check for themselves. They didn't care. The only time I heard anyone from BEST say they should release the material was after I pointed out BEST had falsely told the media they already had.

Is that a coincidence? Maybe. Or maybe, if I hadn't noticed Muller's falsehood, BEST would have kept not releasing the material while telling the media it already had. I don't know. What I do know is this isn't being open and transparent.

Unless failing to disclose significant bugs in your work so as to ensure people don't find out about them is being "transparent." And not publishing results of tests you perform is being "open." And falsely claiming to have published things you haven't published is being "honest."

I threw that last one in there because it still bugs me to no end BEST hasn't admitted the truth regarding Muller's false statement. After a month, the only conclusion I can reach is they're being dishonest.

7 comments

  1. An observation:
    BEST seems to promote itself as a library. Librarians keep careful records about their collections. BEST acts more like a bakery. Bakeries only care if customers buy the finished product.

  2. IIRC, BEST set out to address the issue of whether the historical temperature record was being biased by station quality, data adjustments, and Urban Heat Island effect.

    While BEST may be far from perfect, do they adequately answer that question?

  3. Russ R., BEST wasn't created just because of technical issues. Richard Muller has talked about how he wasn't sure if he could trust the mainstream results of climate science due to the behavior of various people. That's why he always stresses BEST being (supposedly) open and transparent.

    Regardless of that though, I'd say the answer to your question is no. On the issue of data adjustments, I mentioned this in my post, but BEST has never even disclosed the effect adjustments have on its results. I gave some more detail about that in an earlier post. Basically, what information BEST has provided shows its adjustments alter the total amount of warming it finds and greatly reduce the spatial resolution of its data set.

    Which ties into the issue of station quality/UHI. With the ridiculous reduction in spatial resolution caused by BEST's data adjustments, I'm not sure we can say anything about issues like that. BEST smears warming signals so much something like a quarter of the continental United States has a spurious warming trend added in (even though Muller pretty much denied anything like that ever happens with BEST's results in the interview I mentioned in this post). We're talking about spatial resolution so poor it can't pick up patterns that cover a land mass the size of Australia.

    On top of that, I'm not even sure BEST's approach to studying the UHI issue makes any sense. I talked about why I say this before, and Steven Mosher has said I'm completely wrong on the issue (calling me a liar in the process). However, he conveniently failed to give any information, details or even response to help people judge whether or not I am. What I said is taken directly from what a BEST team member, Zeke Hausfather, said, so it's hard to see why I'd be wrong.

    Assuming I'm not completely misunderstanding something, it appears BEST studied the UHI issue only after homogenizing its data. That is, it adjusted its data set to make urban and rural stations more alike, then it tested to see if it could find a difference between urban and rural stations.

    I could be wrong though. Mosher claims I'm lying about this issue, and he says the data for BEST's UHI testing is available on the BEST server, which would prove me wrong. I haven't been able to find that data, and Mosher hasn't been forthcoming with anything like a filename to help out, so I'm not sure on the subject right now.

    BEST's work has a significant number of undisclosed aspects and details, as well as what is probably at least one serious methodological problem. None of that is enough to say global warming might not be real or anything like that. The planet has certainly warmed. BEST just hasn't answered the questions it set out to answer, and its claimed levels of certainty in its results cannot possibly be justified. To be honest, I think BEST's current results are actually worse than what we had before. I'd trust GISS or HadCRUT's work more than BEST's.

  4. I thought that BEST had actually resolved the issues that it set out to address, and as such, I'd effectively removed "bias in the surface temperature record" from the list of climate stuff I'm skeptical about. From what you've written, it looks like I may have to add it back to the list.

  5. Russ R., I wouldn't expect too much from any biases in the temperature record. Quite a while back, I said the absolute maximum effect I could believe things like UHI had on the temperature record is 50%. That is, there has been, at a minimum, at least half as much warming as the current instrumental records say. I suspect the actual amount would be significantly greater, and I accept the possibility that problems in the instrumental record are actually reducing the amount of warming instead of increasing it. All in all, I think issues with the instrumental record may influence our estimates of sensitivity and whatnot, but I don't think they will overturn any major ideas.

    The biggest reason I find the issue important isn't actually the direct impacts it would have (though I do think it matters quite a bit for things like examining climate model validity). The main reason I find this issue important is it is pretty much the simplest task for the global warming movement. If they want people to be worried about global warming, I think the most fundamental thing to do is to come up with a good way to measure global warming. If they can't do that well, there's little reason to believe they can do other, more complicated, things well. That extends beyond the quality of their work as well; it goes to the integrity of their work.

    Let's suppose people doing all sorts of stupid and potentially dishonest things managed to get the "right" answer on the temperature record. Does that inspire confidence climate science will get the "right" answer on climate modeling? Emission scenario projections? Feedback estimations? Anything? Not to me. Even if none of the issues I have with BEST could change their results in a meaningful way, the fact BEST can behave in outright dishonest ways, while being promoted as completely open and transparent, means I can't have much faith in other work.

    It's sort of like the hockey stick fiasco again. The hockey stick wasn't important for the scientific debates around global warming. What makes it important to me is it showed the standards of the global warming movement. With the hockey stick, terrible and fraudulent work was promoted at the highest levels without any due diligence. BEST is like a much reduced version of that. BEST's work isn't terrible or fraudulent, but it is flawed and described dishonestly. In the same way, BEST isn't promoted anywhere near as highly as the hockey stick, but it is still fairly widely promoted by people who haven't done any due diligence on it.

    Seriously. How does nobody care BEST created a website to encourage people to look at the historical temperatures of any city or specific area of the planet when looking at temperatures of anywhere in Illinois is effectively the same as looking at temperatures of anywhere in Ohio? Or Nebraska? Or New York? Instead of increasing the precision of its results (on a spatial scale), BEST has greatly reduced it. Even if BEST does a good job of estimating the global temperatures, it's practically useless for estimating temperatures of any area or region. And nobody seems to be bothered by that.
