Misleading People about BEST

Last month I wrote a post saying:

BEST homogenized rural and non-rural stations then found little difference between the two. Rather than saying, “Oh, that’s what homogenization does,” it said, “Clearly, UHI isn’t a problem for our results.”

This weekend I came across a link on Twitter which makes things even worse. The link has a transcript of an interview with Richard Muller, the head of the BEST project. In it, he says things which are wrong, if not outright dishonest.

The example which stood out the most to me was:

Two more things. The urban heat island effect. That was something we studied I think in a clever and original way. [As opposed to] using all the stations, we could derive the temperature rise based only on rural stations. We got the same answer.

I don't know why Muller felt using only rural stations was "a clever and original way" of studying the urban heat island (UHI) effect. If you think there is a problem with data taken from urban areas, the obvious thing to do is look at the data for non-urban areas. There is nothing "clever" about that. That's why plenty of other people had already thought of doing the same thing, so it wasn't "original" either.

But that's a side issue. The main issue is that what Muller says here is simply wrong. As I pointed out in my previous post, BEST team member Zeke Hausfather recently said:

To be fair, the separate study that Berkeley did was on homogenized data, so a lack of detectable UHI mainly just indicates that it was effectively removed.

Berkeley Earth, or BEST, did its testing for the UHI effect on a homogenized data set. That homogenized data set was created by taking data from rural and non-rural areas and modifying them to be more like one another. That means BEST used non-rural data to modify its rural data, then used the resulting "rural" data to test for the UHI effect. And then BEST's team leader said:

we could derive the temperature rise based only on rural stations. We got the same answer

In what world are results "based only on rural stations" when those "rural stations" were modified using data from non-rural stations?

That's like me saying there is no income inequality between white and black people because after I homogenized the data for white and black people together, I couldn't find a difference between them. Then, having homogenized the data together, I can remove data from one group or the other without my results changing. With such a little trick, I can show income inequality has vanished!
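The same trick is easy to reproduce with made-up numbers. The Python sketch below (using NumPy) invents 20 "rural" stations warming at 0.01 per year and 20 "non-rural" stations warming at 0.03 per year. It then applies a deliberately crude, hypothetical homogenization step that removes each station's linear drift relative to the average of all 40 stations. This is not BEST's algorithm, and the station counts, warming rates and function names are assumptions for illustration only. It just shows the logic described above: once rural records have been adjusted toward a reference that includes non-rural stations, a "rural only" average can no longer reveal a difference.

```python
import numpy as np

YEARS = np.arange(50, dtype=float)

def trend(series):
    """Least-squares slope of a series against YEARS (degrees per year)."""
    return np.polyfit(YEARS, series, 1)[0]

def make_stations(n, warming_rate, rng):
    """n hypothetical stations sharing one linear warming rate, plus noise."""
    return [warming_rate * YEARS + rng.normal(0.0, 0.2, YEARS.size) for _ in range(n)]

def homogenize(stations):
    """Hypothetical, crude homogenization: remove each station's linear drift
    relative to the mean of ALL stations, rural and non-rural mixed together."""
    reference = np.mean(stations, axis=0)
    adjusted = []
    for s in stations:
        diff = s - reference
        # Fitted linear drift of this station relative to the shared reference.
        drift = np.polyval(np.polyfit(YEARS, diff, 1), YEARS)
        adjusted.append(s - drift)
    return adjusted

rng = np.random.default_rng(42)
rural = make_stations(20, 0.010, rng)
urban = make_stations(20, 0.030, rng)

raw_rural = np.mean([trend(s) for s in rural])
raw_all = np.mean([trend(s) for s in rural + urban])

adjusted = homogenize(rural + urban)
adj_rural = np.mean([trend(s) for s in adjusted[:20]])  # "rural only" after homogenization
adj_all = np.mean([trend(s) for s in adjusted])

print(f"raw:         rural {raw_rural:.4f}, all {raw_all:.4f}")
print(f"homogenized: rural {adj_rural:.4f}, all {adj_all:.4f}")
```

Before adjustment, the rural average trend is roughly half the all-station value. Afterwards, the "rural only" and all-station trends are identical, because every adjusted station inherits the trend of the shared reference.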

Or, you know, I could not be dishonest. There is no way anyone hearing Muller speak would realize what BEST had done. Anyone hearing BEST calculated results "based only on rural stations" would assume those results were "based only on rural stations." None of them would assume those results were "based only on rural stations which were modified by using non-rural stations." Despite this, Muller has the audacity to say:

I don't think anybody who has responded in the media so far has actually studied our work. We don't expect immediate agreement on such things. What we expect is that by being transparent, open and clear...

Muller's description of how BEST examined the UHI effect was not remotely "transparent, open and clear." Nobody listening to Muller could have possibly guessed what BEST actually did. Even worse, Muller then said:

- by having the data online and the computer programmes so people can see precisely what we did

But to this day, BEST has never published the code it used to test its methodology. That article was written over two years ago, and still, code and data for tests BEST performed are not available. In fact, just a few weeks ago, BEST member Steven Mosher argued against providing it. People wanted to see what tests were performed in order to verify BEST's work, and BEST's response was to refuse.

There are a number of other issues with that interview, but I'm too tired to comment on everything. I just wanted to make sure someone noted the misleading portrayal of BEST's work on UHI by BEST team leader Richard Muller.

It's pretty bad to talk about being open and clear while telling people things you know will mislead them.


  1. Brandon - I missed Mosher arguing against releasing the code. Where did that discussion occur?


  2. Ian, I'll get to that after I post some excerpts from a different exchange which I think sum up why I am a critic of BEST. A couple weeks ago I said:

    I’d wager 90% of criticisms of BEST stem, at least in part, from BEST not even attempting to make things clear. If you want people to trust your results, you shouldn’t just hand them code and data and say, “Here, spend a couple months examining it.” You should do simple things like:

    1) Explain what decisions go into your methodology.
    2) Explain what effect those decisions have.
    3) Explain why those decisions were made.

    BEST’s papers don’t do that. Neither do the appendices or methodological descriptions you’ve posted. The only way a person can figure out 1) is to examine the code. The only way a person can figure out 2) is to rerun the code for every issue. The only way a person can figure out 3) is to… well, they can’t. You guys haven’t explained the reasons for most of your decisions, and people can’t read your minds.

    If you guys have truly done the work to examine the issues like you say you have, all of those should be simple to do. It would take time, but anyone could do the writeups. I would be happy to. Heck, I’d have done it already if I had any way to.

    Instead, I’m stuck with questions like, “What is the impact of BEST’s homogenization on its results over its entire record” because you guys just don’t publish basic results of tests you’ve performed. You don’t even discuss them unless you get too much media pressure to ignore.

    Steven Mosher responded:

    I too would love to have that level of documentation for you.
    but you’ll have to live with what had to live with when I joined.
    It was better than what I got from hansen or anyone else, so I dont want to make perfection the enemy of the good.

    I don't think that's an acceptable response, so I said so:

    See, I can’t accept that. You set standards for what should be done. BEST hasn’t lived up to those standards. That’s bad. That’s bad even if other groups also failed to live up to those standards.

    And really, while other groups were initially far less up front with their methodology and data, nowadays it seems they’re better. Every time I’ve looked for an explanation of something GISS or HadCRUT does, I could find the answer. I could usually even find a clear description of what they did. I could usually even find at least some commentary on what effect it has, if not some results detailing it.

    With BEST, that’s not the case. I’ve read every paper and post BEST has published. There are still tons of details I don’t understand the reasoning behind. There are some aspects of the methodology I would never even have realized existed if not for examining the code. That’s bad. A person should be able to understand what was done and why by reading the documentation published along with the results.

    That pretty much sums things up for me. A set of standards was agreed upon for what should be done. BEST was created with the stated purpose of meeting those standards. BEST failed to meet those standards. Mosher, one of the people who pushed hardest for those standards, then joined the BEST team and defended its failure to meet them.

    Given that context, I'll provide one example of Mosher arguing against releasing code/data. During a thread about BEST's results over at Judith Curry's place, Mosher repeatedly insisted BEST had done tests which showed their work was fine, but the data, code and results for these tests have never been made available. Willis Eschenbach discussed this with a focus on one of the issues BEST supposedly tested for, saying:

    Steven, is there some part of your own motto “Free the data, free the code” that is unclear to you? In this thread you’ve stated many times that you’ve done this test or that test … sorry, but without the data and the code for your tests that’s just an advertisement, not science in any form.

    To date we have no idea what kind of tests you’ve done on the question of the saw-tooth problem. It’s clear that if you take a trendless saw-toothed wave and scalpel it at the drops, it gives an erroneous trend. So it is most assuredly a potential problem. How does this potential problem affect your reconstruction? Well … we don’t know.

    All we have to date is lots of claims from you that you’ve tested it and it’s not a problem, accompanied by a consistent refusal to release the code and data used in the tests … sorry, but in 2015 that doesn’t cut it.

    It may well be that it’s not a problem in your implementation and methods, Steven, and I’m not claiming that it is. However, it’s clear that it’s a potential problem, and so far you haven’t presented one single scrap of evidence that it’s not a real problem … and curiously, despite hiding your data and code and offering endless vague platitudes instead of test results, you’re more than willing to abuse me for continuing to ask for your data and code and test results. Doesn’t exactly inspire confidence …

    Science? I don’t think so … you don’t seem to get it. You seem to think that we should trust you. I’m sorry, but that ship has sailed, not just for trusting you, but for trusting any scientist. In 2015, without data and code it’s just advertising.

    My suggestion for how to end this endless back and forth is for you to write up and publish these tests that you’ve done. You have a blog, you can describe the tests you ran on the sawtooth question and provide your data and code so we can all see if there are bugs in the code or faults in the logic.

    Because simply claiming over and over that you’ve done the tests and found “mouse nuts” is … well … less than convincing …
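    Willis's sawtooth point can be checked with a toy example. The Python sketch below (using NumPy) builds a hypothetical, nearly trendless sawtooth record: it warms steadily, then drops back at regular intervals. It compares an ordinary least-squares trend over the whole record with a trend computed after cutting the record at each drop and letting every segment float to its own offset. The period, rise rate and function names are invented for illustration, the drops are assumed to be detected perfectly, and this is a generic version of the "scalpel" idea rather than BEST's implementation.

```python
import numpy as np

def make_sawtooth(n_periods=10, period=20, rise=0.05):
    """Hypothetical record that rises by `rise` per step, then drops back to its
    starting value every `period` steps. Over many periods it has almost no trend."""
    t = np.arange(n_periods * period, dtype=float)
    y = rise * (t % period)
    return t, y

def naive_trend(t, y):
    """Ordinary least-squares slope over the whole record."""
    slope, _intercept = np.polyfit(t, y, 1)
    return slope

def scalpel_trend(t, y, period):
    """Cut the record at every drop, then fit one common slope with a separate
    offset per segment (a fixed-effects regression on segment-demeaned data)."""
    segment = (t // period).astype(int)
    t_dm = np.empty_like(t)
    y_dm = np.empty_like(y)
    for s in np.unique(segment):
        mask = segment == s
        # Subtracting each segment's own mean throws away the drops between segments.
        t_dm[mask] = t[mask] - t[mask].mean()
        y_dm[mask] = y[mask] - y[mask].mean()
    return float(np.sum(t_dm * y_dm) / np.sum(t_dm ** 2))

t, y = make_sawtooth()
print(f"whole-record trend: {naive_trend(t, y):.4f} per step")
print(f"scalpelled trend:   {scalpel_trend(t, y, period=20):.4f} per step")
```

    The whole-record trend comes out close to zero, while the scalpelled estimate recovers the full within-segment rise rate of 0.05 per step. That is the potential problem Willis describes; whether it matters for BEST's actual method is exactly what published tests would settle.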

    Mosher's response was:

    You’ve made your personal animus against Muller and Rohde quite clear.

    read this for your answers.

    “When you ask a question in bad faith, you are essentially looking for a way to demean, degrade, or otherwise destroy your target. A good example of an obviously bad faith question is the perennial favorite “When did you stop beating your wife?” as it instantly casts doubt upon the person asked the question.

    Eschenbach's response sums it up:

    I see. In other words, why should you make the data and code available to me, when my aim is to try and find something wrong with it?

    Steven, I truly can’t believe this. You are channelling Phil Jones to a T. Your claim is that because you don’t like the way I asked, you are justified in not revealing your data, your code, and your test results.

    You don’t like the way I asked? You think I have “bad faith”?? Well, aren’t you a delicate flower. I didn’t say “Mother may I”, I didn’t phrase things the way you require, so you’re gonna take your ball and go home?

    I’m sorry, my friend, but revealing your data and code is not optional. It doesn’t depend on whether you think the person is asking in “good faith” or “bad faith”. It doesn’t matter what they plan to do with the data and code. Transparency is the heart and soul of science.

    If you are acting like an ethical scientist, you release your data and code when you release your results, without being asked. If you are acting like a not-so-good scientist, you release your data and code and results on request.

    And if you’re not acting like a scientist at all, you whine that the person is asking in “bad faith”, and on that ludicrous basis you refuse to reveal your data and code.

    I could provide some other examples, but I find that one to be the most striking. It's my understanding those two are friends, yet Mosher still refuses to provide the data and code behind results he wants people to accept on the strength of tests they cannot verify. And his excuse? That his friend is acting in bad faith.

    There are also some quotes which show BEST knows its uncertainty calculations are underestimated because of issues it doesn't tell anyone about. One such issue even introduces a temporal bias which artificially inflates past uncertainties while deflating present ones. BEST dismisses these issues as too small to matter, but its members then turn around and talk to the media about results which depend on how small their uncertainty levels are. In doing so, they say things they couldn't say if they were honest about the problems with their published uncertainty levels.

    I'm not dredging those quotes up just yet, though. Next week I'm planning to upload a post summarizing some issues which cause BEST's published uncertainty levels to be smaller than BEST knows they ought to be. I'll include the quotes then. Until then, cheers!

  3. Have you looked at Phil Jones's UHI paper? I haven't looked at the latest report, but previously I got the impression that they took his work on China UHI to be the last word and no consideration of UHI was needed.

  4. MikeN, I've looked at it. It's not a very good paper. I think Jones's work was given more credit than it deserved for a while, but that doesn't seem to be the case anymore. Nowadays a number of other papers seem more important. Of course, some of those papers are bad as well. There are some pretty good ones, however.

  5. "But to this day, BEST has never published the code it used to test its methodology. That article was written over two years ago, and still, code and data for tests BEST performed are not available. In fact, just a few weeks ago, BEST member Steven Mosher argued against providing it. People wanted to see what tests were performed in order to verify BEST’s work, and BEST’s response was to refuse."


    the dataset used for testing is in the SVN. its always been there.
    Even My alternative tests are there.

    For UHI. there were two data sets.

    Rural and non rural. rural is homogenized with rural. urban with urban

    The test that willis was asking for was my sawtooth test on an issue raised at WUWT. That work hasnt even been published yet.
    stop lying

  6. Steven Mosher, if your response amounts to little more than, "Nuh-uh," there's little point in making it. Nobody will be convinced by comments like, "stop lying." This is especially true if when pointing out my supposed lies you say:

    For UHI. there were two data sets.

    Rural and non rural. rural is homogenized with rural. urban with urban

    This directly contradicts what I understood your teammate Zeke Hausfather to be saying when he said:

    To be fair, the separate study that Berkeley did was on homogenized data, so a lack of detectable UHI mainly just indicates that it was effectively removed.

    His remark is what led me to the interpretation you now dispute. When I expressed this interpretation at Judith Curry's blog, Zeke responded to me without disputing it. I do not see how to square Zeke's description of what BEST did with yours. I certainly don't see why, if I had misinterpreted his remark and incorrectly described what BEST did, he would respond to me without saying anything to suggest my description was wrong.

    If you want to have an actual discussion, I'm game. However, if you can't elevate yourself above the level of, "Nuh-uh" and "You're a liar," I don't see a point. People can assume I'm a liar and everything you say is perfectly true and accurate, but I don't think many will. I think most people will find your comments unconvincing so long as you continue to insist on doing nothing to give anyone cause to believe them.

  7. So, can we take Mosher's continuing silence here as either an admission of error, or of lying himself?

  8. Anto, I wouldn't take it as anything more than Steven Mosher being an obnoxious brat hell-bent on avoiding anything resembling a reasonable discussion. That doesn't mean he's wrong. It just means one should be highly skeptical of anything he says.
