I Can't Believe it Worked

I'd like to apologize. I can't though. I don't feel bad about what I did. I intentionally deceived people, but I did it in a way people wouldn't be fooled (at least, not for long). You see, I knew what I said in this fairly visible post at Watts Up With That? was wrong. I knew what I said in a post here a few days ago was wrong. I said it anyway.

You see, for the last couple years I've tried to draw attention to a number of problems with the BEST temperature record. It didn't work. I couldn't get anyone at BEST to care, much less to acknowledge the problems. I won't rehash the whole history here. Suffice to say I've tried talking to BEST members Steven Mosher, Zeke Hausfather and Robert Rohde about some issues that are pretty much indisputable. Zeke was friendly but didn't know enough to answer, Rohde didn't respond and Mosher was... well, Mosher. I think it'd be best if I don't say more.

The point is I've been unable to get meaningful responses to simple points for over two years now. It's been said insanity is doing the same thing over and over expecting different results. That was on my mind when I discovered some new problems with the BEST temperature record. I considered writing those last two posts about BEST in an accurate way, but I knew I'd just get ignored if I did.

So I decided to play a trick. I knew BEST members would just ignore me if I pointed out these problems in a normal fashion. I also knew at least one BEST member, Mosher, would quickly respond to me if I made a bad argument he knew he could rebut. The solution was obvious. If I wanted a response to my valid points, all I needed was to make my valid points in a bad argument.

So that's what I did. I had three problems I wanted to highlight. First, BEST fails to recalculate its breakpoints during its jackknifing calculation, causing its calculated uncertainty to be too low. Second, BEST's choice of baseline period in its uncertainty calculations screws with those calculations. Third, BEST's results are unduly influenced by a handful of stations in Antarctica, including a number of stations whose records in the BEST data set have problems (such as one station appearing at least three times).

The third problem was responsible for a step change in BEST's uncertainty levels. I knew this because it was discussed in one of their papers which I had reread when examining the other two problems (which is what made me even look into the issue). I also knew that step change coincided fairly closely with the timing of one of the other problems. That gave me the idea of conflating the two issues.

I figured if I blamed this step change in uncertainty levels on the choice of baseline period in BEST's uncertainty calculations, I would look like a fool to the people at BEST. Thinking me a fool, they would consider me an easy target and would respond to say I was wrong about the effect of the problem. But to say I was wrong about the effect, BEST would have to say what the actual effect was.

It worked. The three BEST members I've tried talking to before quickly responded to tell me I was wrong. In doing so, they confirmed the problems I had found. The choice of baseline in uncertainty calculations was acknowledged to influence those calculations:

2) In the statistical calculation, the choice of a 1960-2010 baseline was done in part for a similar reason, the incomplete coverage prior to the 1950s starts to conflate coverage uncertainties with statistical uncertainties, which would result in double counting if a longer baseline was chosen. The comments are correct though that the use of a baseline (any baseline) may artificially reduce the size of the variance over the baseline period and increase the variance elsewhere. In our estimation, this effect represents about a +/- 10% perturbation to the apparent statistical uncertainties on the global land average. Again, this is completely separate from the large step-increase in uncertainty associated with the absence of Antarctic data.
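
To see why subtracting any baseline has this effect, here is a toy simulation. It is a minimal sketch under loud assumptions, not BEST's code: the series are independent white noise with unit variance (real temperature series are autocorrelated), and the function name, the number of series and the 20-point baseline are all made up for illustration. For noise like this the arithmetic is simple: with a baseline of n points, the variance of the anomalies is 1 - 1/n inside the baseline and 1 + 1/n outside it.

```python
import random
import statistics


def baseline_variance_demo(n_series=2000, n_years=60, baseline=range(40, 60), seed=1):
    """Return the mean across-series variance (inside, outside) the baseline window.

    Assumption: every series is independent white noise with variance 1,
    so any difference between the two numbers comes from the baseline step.
    """
    rng = random.Random(seed)
    anomalies = []
    for _ in range(n_series):
        series = [rng.gauss(0.0, 1.0) for _ in range(n_years)]
        # Subtract this series' own mean over the baseline window.
        base = statistics.fmean(series[t] for t in baseline)
        anomalies.append([x - base for x in series])

    # Variance across series, computed separately for each year.
    var_by_year = [
        statistics.pvariance([a[t] for a in anomalies]) for t in range(n_years)
    ]
    inside = statistics.fmean(var_by_year[t] for t in baseline)
    outside = statistics.fmean(
        var_by_year[t] for t in range(n_years) if t not in baseline
    )
    return inside, outside


inside, outside = baseline_variance_demo()
print("mean variance inside baseline: ", round(inside, 3))
print("mean variance outside baseline:", round(outside, 3))
```

With the defaults, the spread inside the baseline should come out a bit under 1 and the spread outside a bit over 1. The size of the effect in this toy says nothing about the +/- 10% figure quoted above, which depends on BEST's actual data and method.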

As far as I've been able to tell, BEST has never disclosed this issue before. It's unlikely they would have if I had continued raising issues in the normal fashion. It was only because I created my strawman that I was able to get BEST to acknowledge this problem. Similarly, for the issue of not rerunning its breakpoint calculations, BEST has now said:

3) With regards to homogenization, the comments are only partially correct. The step that estimates the timing of breakpoints is presently run only once using the full data set. However, estimating the size of an apparent biasing event is a more general part of our averaging code and gets done separately for each statistical sample. Hence the effect of uncertainties in the magnitude, but not the timing, of homogeneity adjustments is included in the overall statistical uncertainties.

I've always been told there are no adjustments to the BEST data so there wouldn't be any need to estimate the size of them, but now I'm told otherwise. I'm not sure what to make of that. What I do know, however, is BEST has confirmed my point that they do not estimate the uncertainty introduced by their homogenization.
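
To make the distinction concrete, here is a toy delete-a-group jackknife run two ways: once with break timing frozen from the full data set, and once with it re-estimated on every subsample. This is only a sketch. The synthetic stations, the crude break detector, the choice of eight groups and every function name are assumptions for illustration. None of it is BEST's code, and a toy like this cannot say how large the difference is for the real record.

```python
import random
import statistics


def make_stations(n_stations=16, n_years=40, seed=0):
    """Toy data: zero true signal, white noise, one step bias per station."""
    rng = random.Random(seed)
    stations = []
    for _ in range(n_stations):
        step_year = rng.randrange(10, n_years - 10)
        step_size = rng.gauss(0.0, 1.0)
        stations.append([
            rng.gauss(0.0, 0.5) + (step_size if t >= step_year else 0.0)
            for t in range(n_years)
        ])
    return stations


def average(stations):
    """Plain year-by-year mean across stations."""
    n_years = len(stations[0])
    return [statistics.fmean(s[t] for s in stations) for t in range(n_years)]


def find_break(series, reference):
    """Hypothetical detector: the split year with the largest mean shift
    in the station-minus-reference difference series."""
    diff = [s - r for s, r in zip(series, reference)]
    best_year, best_shift = 5, 0.0
    for year in range(5, len(diff) - 5):
        shift = abs(statistics.fmean(diff[year:]) - statistics.fmean(diff[:year]))
        if shift > best_shift:
            best_year, best_shift = year, shift
    return best_year


def detect_breaks(stations):
    """Estimate one break year per station against the average of the sample."""
    reference = average(stations)
    return [find_break(s, reference) for s in stations]


def estimate(stations, breaks):
    """Remove each station's step at its break year, average the stations,
    and return the late-half mean minus the early-half mean."""
    adjusted = []
    for series, year in zip(stations, breaks):
        shift = statistics.fmean(series[year:]) - statistics.fmean(series[:year])
        adjusted.append(
            [x - (shift if t >= year else 0.0) for t, x in enumerate(series)]
        )
    avg = average(adjusted)
    half = len(avg) // 2
    return statistics.fmean(avg[half:]) - statistics.fmean(avg[:half])


def jackknife_se(stations, n_groups=8, rerun_breaks=False):
    """Delete-a-group jackknife standard error of estimate()."""
    full_breaks = detect_breaks(stations)  # timing estimated once, on all data
    estimates = []
    for g in range(n_groups):
        keep = [i for i in range(len(stations)) if i % n_groups != g]
        subset = [stations[i] for i in keep]
        if rerun_breaks:
            breaks = detect_breaks(subset)            # timing redone per subsample
        else:
            breaks = [full_breaks[i] for i in keep]   # timing reused as-is
        estimates.append(estimate(subset, breaks))
    mean_est = statistics.fmean(estimates)
    variance = (n_groups - 1) / n_groups * sum((e - mean_est) ** 2 for e in estimates)
    return variance ** 0.5


stations = make_stations()
print("frozen break timing:", jackknife_se(stations, rerun_breaks=False))
print("rerun break timing: ", jackknife_se(stations, rerun_breaks=True))
```

Comparing the two numbers for a given data set shows how much of the spread comes from uncertainty in break timing, which the frozen version cannot see by construction.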

Finally, I got BEST members to raise the issue of the importance of data in Antarctica for me. This is the real source of the step-change I highlighted. Here are a couple comments from BEST members on it:

Prior to 1960 there is no data at all in one of the world’s continents, Antarctica, which significantly increases the uncertainty in the global reconstruction.

1) First off, the step-wise shift in uncertainty has nothing to do with normalization or data processing issues. The large increase in uncertainty prior to about 1950 is a simple consequence of the complete absence of weather stations in Antarctica prior to the 1950s.

It is trivially easy to see neither remark is true. I was able to quickly respond showing there was data prior to 1950 because I had spent time studying the BEST data for Antarctica prior to writing either of my posts. That's why I knew there are actually ten BEST stations available for Antarctica before 1955 (actually eleven, but one is a duplicate of another).

This let me raise a point I find amusing. Two of those stations get their early portions discarded by BEST because BEST found "empirical breakpoints" in them. Empirical breakpoints are found by comparing stations to their neighbors. That means BEST is simultaneously claiming there is not enough data to estimate temperatures in Antarctica before ~1960, but there is enough data to tell when stations measuring those temperatures are wrong!

So do I owe anyone an apology? I'm not sure. Regardless of my intentions, I did intentionally mislead readers. On the other hand, I misled readers in a way I knew would be easy to correct. I would have written a correction myself if nobody else had done so. That means there was really never any harm to my deception. This is unlike the deception BEST pulls when it releases reports showing things like:


Even though the responses to me show BEST knows there are problems with its uncertainty calculations which means it cannot possibly achieve the precision it claims to have, and that its "95% confidence limit" is actually lower than 95%. I know, "He did it first" isn't a good justification for anything, but if BEST is going to knowingly deceive people about its results, I can't feel bad about using tricks to get them to admit their knowledge of their deception.

Of course, I could be lying about all this. I might have just made a boneheaded mistake in my posts which correctly identified two undisclosed problems with the BEST methodology. I might have just inadvertently got BEST to admit it knows the precision it gives for its results is exaggerated. It might just be chance this is the only time BEST has acknowledged any of the many issues I've raised with its work.

That'd be a heck of a coincidence though.


  1. Thanks for your efforts on this. I also found a similar problem with the CI which I reported to them within a couple of weeks of finding out about the new temp metric:


    I believe the rescaling which occurs for each Jackknife series violates the statistical assumptions of the Jackknife calculation. Not sure though which way the CI would go, I just didn't like the un-vetted mathematical implications of it.

  2. Jeff Id, thanks for your comment. Unless I'm mistaken, BEST added a line to their paper to "address" the issue you raised. In it, they basically say non-linearity in their methodology means their confidence intervals aren't actually accurate. I believe they hand-wave it away by saying it should have a small effect, but they don't actually demonstrate the problem is unimportant. (I'm on my phone right now so I can't look up the exact quote.)

    I don't think BEST should be able to just dismiss problems as not mattering on nothing more than their word. Maybe they've done tests/math to figure out the impact of various issues with their methodology. If so, they should publish what they've done so people can know what impacts these issues have. They have a website. It would be easy to add a Caveats page which discusses the issues they know exist. Instead, BEST apparently just doesn't tell anybody about the problems they know exist in their work.

    By the way, it's interesting to note after a flurry of comments telling me I am wrong, the BEST team members stopped talking. That strongly suggests I was right about my trick. The only reason they responded to me was I made an obvious mistake. As soon as I showed I knew what I was talking about so they couldn't just make things up, they quit.

    Also interesting is in order to say I am wrong, three BEST team members have told us there is no data in Antarctica prior to 1960, 1955 or 1950, even though their website makes it obvious that's not true. When I pointed out they were just making that up, they shut up rather than admit their mistake or explain how it affects their criticism of me. Zeke at least tried to suggest BEST just doesn't use that data (which is a baffling claim), but that's the grand sum of how BEST addresses obvious errors they make.

  3. Brandon, it would be ironic if your origin comment turned out to be right. I don't have any opinion on this, as I've been trying to get my papers written while keeping on schedule in my research work.

    Jeff ID, yes I remember that too. I was underwhelmed with their response to your comments.

    It's interesting that Steven Mosher mentions testing code on the WUWT thread. Yes, part of verification is testing. But part of validation is making the testing procedures and results available for external scrutiny.

    I thought the concept behind BEST was to provide a more transparent fully tested analysis platform. If so, it needs some work.

  4. Carrick, could you clarify what you're referring to when you say my "origin comment"? I'm not sure if that's supposed to be "original" or if you're meaning origin in a mathematical sense. Regardless, I do understand what you mean about time issues. I'm disappointed to know Part 2 of my overview of the hockey stick debate is going to be delayed because of this. I just couldn't keep ignoring issues with BEST in light of people discussing what it shows about the adjustments to the Puerto Casado station.

    It’s interesting that Steven Mosher mentions testing code on the WUWT thread. Yes, part of verification is testing. But part of validation is making the testing procedures and results available for external scrutiny.

    I thought the concept behind BEST was to provide a more transparent fully tested analysis platform. If so, it needs some work.

    One of the most remarkable aspects of all this is Mosher has been a big proponent of making work open and verifiable, including full archives and changelogs so people can examine the entire workflow, not just the current results. Now that he's joined BEST, he disputes the need to even archive past results.

    He's even said it's okay to publish press releases and reports without making the underlying data/code available because you only need to publish those along with peer-reviewed papers. It's incredible. It's especially incredible since the BEST website says things like:

    We welcome the additional review of anonymous journal referees (and indeed, agree that this is still an important gateway to ensure that all papers have a minimum of reviews) -- but we feel that our work should be judged primarily on the statistical approach used and the transparency of our analysis -- not on the fact that we have been published in a peer reviewed journal.

    If they want the transparency of their work to be one of the two ways it is judged, I'd say they're asking for a bad grade.

  5. I'm still reading the other links Brandon provided. Thanks for those, it seems that Brandon has been uncovering some unusual BEST results for some time. I have read some of these articles in the past while browsing climate blogs but reading the links in order helps. Blogging is mostly reading, right? And it is mostly thankless as well.

    I do think Mosher is getting hammered a bit hard for his contribution to BEST, he often gets hammered these days. He is the most visible of the group in the blog climate spectrum, but others developed the breakpoint and CI methods; my understanding is that Mosher just signed on to help. It's kind of cool to get your name associated with a serious introduction to a new and 'more' open temp series. I think he was only involved in code though as his expertise probably doesn't extend to CI calculations or stats in general. On finding the problem I had identified with the CI calc, I really expected a quick and simple improvement or maybe some fun discussion with the authors, but instead I received silence from the main authors of the work. In fact, I don't recall any response at all from the main authors although Mosher tried to help. Steve has regularly answered my questions on these and other matters though and has helped with a data extraction code project for sea ice without any expectations on his part. I really don't have much contact with him by email, but objectively, he appears to be a bit sandwiched between the authors of the problems and the reasonable critiques.

    However, the wild intent of Brandon's recent posts led to a bit of a memory regarding temp series in general. I haven't told anyone the intent of what I had done before, but this happened just prior to Climategate being released.

    When CRUTEM was refusing to publish code and global temps were not transparent, I wrote a number of articles designed to deliberately increase the tone of the conversation. I had done the same for Mann and with Ian Joliffe at Tamino previously to reasonable effect. Just prior to climategate (days or weeks -without checking) I had written a couple of articles with some scientifically edgy comments about temp series in general that were starting to get some attention in blogland. I believe that these, and my regular presence at CA, encouraged some of the CRU reaction post-climategate in that code and data were finally released. I worked very quickly to verify the results of their release as I had worked so hard to encourage it and wrote blog instructions for replicating their temp-series results.

    The edgy articles were so scientifically close to the border that Lewandowsky picked up on it years later, and unwittingly or intentionally put my name in two separate psychology papers accusing me of various forms of cognitive dissonance (layman psych). He actually misread phrasing of what I had written but that was the point of my article at the time, to make people pay attention to the withholding of information that those particular scientists attempted to pull. It's rather a fun conversation at the office when we talk about my internationally inaugurated psychological problems. Like these articles, I didn't expect anywhere near as much reaction as I received... or really any result at all. After all, blogs are literally nothing but words placed at an empty and unadvertised location on the internet.

    Stay careful right, or perhaps you might get a call from the British secret service on Saturday morning about some wildly dramatic climate-induced situation.

    Muller received a very large compliment from Steve McIntyre some years back about his intelligence and general statistical ability. On discovery of the jackknife inconsistency, I had expected an immediate recognition of a minor problem followed by some interesting improvements to the methodology. It hasn't happened yet but someday people will either realize that the CI isn't mathematically accurate or as you have shown, trends aren't right, and perhaps BEST will vanish as a failed contribution to science.

  6. It would be nice if BEST improved from these critiques. They do have vastly more data incorporated and that is usually a good thing.

    An uncomplicated simple method provided alongside the breakpoint homogenization method with the same data would give me a lot of confidence in it.

  7. Jeff Id:

    I do think Mosher is getting hammered a bit hard for his contribution to BEST, he often gets hammered these days.

    Mosher gets "hammered" mostly because of his behavior. He is a terrible spokesperson for BEST. I've seen plenty of people with simple concerns about BEST's methodology trying to talk to Mosher get responses filled with hostility and insults that don't address their concerns. Even worse, his responses may well contain false information. I've seen him get basic facts about BEST wrong many times.

    It’s kind of cool to get your name associated with a serious introduction to a new and ‘more’ open temp series. I think he was only involved in code though as his expertise probably doesn’t extend to CI calculations or stats in general.

    Mosher wasn't even involved in the code. The code was written before he joined. I'm not sure just what his involvement in BEST is beyond being a spokesperson.

    Stay careful right, or perhaps you might get a call from the British secret service on Saturday morning about some wildly dramatic climate-induced situation.

    I don't know if I have to worry about that. My phone can't take international calls 😛

    Muller received a very large compliment from Steve McIntyre some years back about his intelligence and general statistical ability. On discovery of the jackknife inconsistency, I had expected an immediate recognition of a minor problem followed by some interesting improvements to the methodology. It hasn’t happened yet but someday people will either realize that the CI isn’t mathematically accurate or as you have shown, trends aren’t right, and perhaps BEST will vanish as a failed contribution to science.

    I don't think BEST will ever fix any of these issues of their own accord. There's no reason for them to. They've been accepted as the greatest temperature record out there without any critical examination. Improving their methodology would have no benefit as they've already created the platform from which they can pursue their own interests. That's why you'll see them writing things like a “Skeptic’s Guide to Climate Change” which may well be the dumbest thing I've ever seen written:


    The only way I see the BEST methodology ever improving is if someone outside BEST does it. That's a huge task though. I don't know who would do it. Just replicating their work is painful enough since it can take days to run BEST's uncertainty calculations. My computer crashed the last time I tried before I got a quarter the way through.

    I suppose I could ditch the uncertainty issues and only focus on the issues with trends, but even that is computationally intensive. I only have one computer at the moment, and it's a laptop I do a lot of things on. Sacrificing its runtime to try to work out what people have been paid to work out seems wasteful.

    The best idea I have right now is to decimate the BEST data set before testing the code. That might reduce the runtime enough to make testing more viable while letting me still get a better grasp of things. I have a couple other ideas I'd like to try as well, but I'm not familiar enough with Matlab to know how to implement some of them.

  8. If it would be insane to waste your laptop's time on further iteration of the computationally obvious, why would anybody want to waste theirs listening to you two obsess? If you want to refresh your perspective rather than reinforce your belief system, why not try reading a few hundred of the more competently reviewed journals in the field -- nothing beats bandwidth!

  9. Russell Seitz, I don't understand your comment. You ask if it would be "insane" to do something, but nobody here has said anything would be insane. You say the processing would be for the "computationally obvious," but there's nothing obvious about any of this. You claim we "two obsess," but there is no obsession required to have discussions like those that have been going on here. It gets worse after that.

    If you have a point to make, I don't know what it is.

  10. The damage that "scientists" like Seitz inflict upon science is far worse than their efforts to silence and marginalize skeptic blog authors. (Why else would a science professor at a supposedly prestigious educational institution engage in the petulant adolescent behaviors evidenced on his VV-knockoff of Anthony Watts's WUWT?)

    As with any deception, when the message cannot be attacked, Seitz attacks the messenger. Brandon and Jeff attack the data and the methods, ask difficult questions when serious problems are highlighted, and then get silence from the datakeepers.

    Seitz then attacks them. The evidence of this is in Seitz's (1) trying to label them as "insane" if they persist in digging around in the deepest layers of the raw datasets where the homogenization and infilling processes happen, and (2) attempting to suggest they do not read "competently reviewed" journals, because if they did they wouldn't be skeptical. That second one is precious in light of the corrupted peer-review process exposed by the ClimateGate emails. The problem is that scientists like Seitz sold out their skepticism for academic position and prestige as bestowed from politically motivated ideological enablers.

    It is precisely because of efforts by the likes of Russell Seitz that the public will lose confidence in the integrity of science. Simple temperature algorithm mistakes can be forgiven, but the outright, knowing deception that is now being ramped-up to even higher levels by the practitioners at GISS, NCDC, CRU will damage all the sciences for decades.

  11. Joel O’Bryan, I can't say I know anything about Russell Seitz. The name has a vaguely familiar ring to it, but I don't know why. If he really is any sort of scientist, he should be ashamed of his comment here. Heck, he should be ashamed of his comment regardless. He should just be doubly ashamed if he's tarnishing "science" with his behavior.

    the outright, knowing deception that is now being ramped-up to even higher levels by the practitioners at GISS, NCDC, CRU will damage all the sciences for decades.

    I don't know of any "outright, knowing deception" from any of the people you refer to. I think I remember Gavin Schmidt has a fairly high level at GISS, and I've called him dishonest before, but that was tied to other issues. For the rest, as far as I know, there is no intentional deception with their results. The people just don't really know much about what they're doing/put much effort into it.

    That said, I do think there may be some deception in the same way there is with BEST. As I've shown with this post, BEST knows there are issues with its work, but it doesn't disclose them. I highlighted two examples, but there are more. That is deceptive.

  12. Brandon,

    Mosher is always saying to free the code, so he can't refuse to free test results without forfeiting his ethical high ground. Before BEST came along, I believe he was mostly interested in UHI https://stevemosher.wordpress.com/

    People forget that the "best" is relative to all competitors and not perfection. Your little deception was merely a clever way to open biased eyes. No harm, no foul -- so long as you don't gloat about it. It is to be hoped the next BEST iteration will be better because of your efforts.

    I wonder why you didn't enlist Steve McIntyre's help to make your point after the initial rebuffs.

  13. Gary, I think it became clear to me Mosher forfeits the ethical high ground when he said BEST doesn't need to make code/data available unless it was used in a peer-reviewed paper. He said that because I complained about BEST publishing results without archiving the data/code that went with those results so people could verify them. He's since defended BEST's failure to archive old data/code, saying if I cared, I would do it myself. I'm not sure how he justifies that argument to himself.

    On the issue of UHI, Mosher and I were supposed to work together on a project for it. That never happened because of interpersonal issues, mostly involving Mosher being unable to read simple sentences and refusing to admit any mistakes. It was even said we should use an arbitrator to resolve things, but he wound up balking at that. If I remember correctly, I posted about it as my third post on this site... Ah, yeah. This is it. It's a shame the project never happened because the approach I proposed to testing for a UHI effect is (in my opinion) far stronger than any test used thus far. It was just impossible for the project to happen with Mosher constantly misrepresenting what I was saying.

    As for enlisting help, Steve McIntyre has a lot of other things he is interested in. I have no reason to believe he'd want to waste a lot of time and energy on examining BEST's work. I'm not sure anyone would want to.

  14. Brandon, in an honest world you wouldn't need to use deception to get people to admit the truth, however...

    You're my new go-to site and I wish you all the success in your continued work. You're a rock star, but not for the reason Tol thinks you are.

    My all-time favorite moment:

    Steven Mosher, I asked a question about a contradiction I found on the website. I then asked you to tell me if the website is wrong ... If that’s the case, I’ll do it for you. And if people have questions about inconsistencies in what they read in BEST’s material, rather than being an unhelpful prick, you can direct them to me.

    Keep it up, we're counting on you.


  15. Michael Schonewille, I'm flattered!

    And sorry about your comment landing in moderation. I don't like having to manually approve people's first comments, but it's the only way I can be sure spam doesn't go through right now. I'm thinking about adding a CAPTCHA for first time commenters instead, but I haven't gotten around to it yet.

    (The reason I point this out is I'm at a bar for a dart tournament right now so it's just luck I checked my admin panel on my phone.)

  16. Keep up the good work Brandon ... within reasonable bounds I strongly feel the ends justify the means.

  17. A. Scott, I don't believe the ends ever justify the means, but that's a philosophical issue that doesn't need to be resolved here. In this case, the deception is I played dumb and pretended to make a mistake a person could reasonably make. The primary effect was to make myself look bad. That doesn't harm anyone (but myself) so it's fine. I feel a little bad about the deception, but only the way I'd feel bad if I had genuinely made the mistake.

  18. Alrighty then. I found a plugin which lets me filter first-time commenters by using a CAPTCHA. I'm not a fan of the CAPTCHA it uses, but for not having to do any work myself, it's nice enough. From now on, new users shouldn't need to wait for their first comment to be approved. The only way a first-time commenter should land in moderation is if they fail the CAPTCHA.

    We'll see if this works out. I'm a little worried spambots might beat the CAPTCHA (it's pretty easy) and get comments through. If that happens, I'll look at using a more effective one.

  19. Thank you Brandon.

    Brandon, Jeff,

    Please, keep pursuing the data analyses... wherever it may take you. Let us (including the WUWT blogosphere) know what you find, with periodic updates, as you feel appropriate. (As you likely know, AW's WUWT gets a lot of web hits if you get him to publish your submission. If it's well written, factually-based, and hard-hitting, it can get a DrudgeReport link too, which gets Congressional staffers' notice.)

    From my point of view, and as they are US-taxpayer funded publicly-paid employees, there is no way GISS nor NCDC should be let off-the-hook for their multitude of failures and ethical lapses at data stewardship, regardless of what they think their White House science handler wants. The mere thought of a NASA-NOAA Inspector General office or Congressional oversight hearing should send a chill through the Gavin Schmidts if they are unethically manipulating the data they are entrusted with.

    BEST is more at liberty to do what they want, but since they do take US Govt grants it still must be ethical and above board. Keep them honest. As long as they know others are watching their lowest-level data algorithms and programs, they know they have to be careful.

  20. Joel O'Bryan, you're welcome. I don't know of any reason to think these people are handling the data in an unethical manner, but ethics and competence are two entirely different things. Also, handling the data in an ethical manner doesn't mean one discusses the data in an ethical manner. One can do ethical research yet not be ethical in what one says about that research. (Whatever the ethics of BEST's data handling, I think their refusal to disclose issues they know exist with their work is inexcusable.)

    Anyway, I just happened upon a strange issue involving BEST. A couple weeks ago I showed there have been (at least) seven different versions of the BEST record. It's not surprising new results get published as new data comes in, but none of these different versions were archived. I think that's inappropriate. I think it's incumbent upon people to make the various versions of their results available so people can compare them.

    Today I found out there's an additional issue. You may remember me looking at BEST's results and how they relate to the state of Illinois (BEST adds a significant amount of warming to Illinois). Since then, at least two new sets of results have been published by BEST. I thought it'd be interesting to see if things had changed at all. When I did, I found this in the file for Illinois's results:

    This analysis was run on 12-Oct-2013 00:45:15

    I then found this in a file for the global land record:

    This analysis was run on 12-Oct-2013 00:45:15

    And this in a file described as showing results for "Estimated Global Land-Surface TAVG based on the Complete Berkeley Dataset":

    This analysis was run on 21-Aug-2014 03:12:47

    Apparently all the regional results on the BEST website are based on data up to September 2013 while the global results are based on data up to July 2014. Not only is that a little weird, there's a "regional" result for the entire globe meaning there are currently two different sets of global results on the BEST website.

    To make matters even more strange, when I looked at files for individual stations in Illinois, I kept seeing:

    The Berkeley Earth analysis was run on 15-Nov-2013 19:55:48

    With results going up to November 2013. I think it's weird they published results for November 2013 for calculations done on November 15, 2013, but the more interesting part to me is this date is different than the other two dates I show above. That means BEST currently has results published for three different sets of calculations on its web site, all presented as though they're from the same set of calculations. Even worse, the data used for these three calculations isn't available.
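
    For anyone who wants to repeat this check, here is a minimal sketch of how the run dates could be pulled out of the file headers and grouped. The three timestamp strings are the ones quoted above; the labels in the dictionary are made-up names for illustration, not BEST's actual file names, and the regular expression assumes every header uses the same "analysis was run on" wording.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumes the header wording quoted above; %b parsing assumes an English locale.
RUN_STAMP = re.compile(
    r"analysis was run on (\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})"
)

def run_time(header_text):
    """Return the run timestamp found in a header line, or None if absent."""
    match = RUN_STAMP.search(header_text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")

# Hypothetical labels; the header strings are the ones quoted in this comment.
headers = {
    "illinois_regional": "This analysis was run on 12-Oct-2013 00:45:15",
    "global_land_regional": "This analysis was run on 12-Oct-2013 00:45:15",
    "global_land_complete": "This analysis was run on 21-Aug-2014 03:12:47",
    "illinois_station": "The Berkeley Earth analysis was run on 15-Nov-2013 19:55:48",
}

# Group the labels by the run they came from.
runs = defaultdict(list)
for label, text in headers.items():
    runs[run_time(text)].append(label)

for stamp in sorted(runs):
    print(stamp.isoformat(), runs[stamp])
```

    Each distinct key in runs is a separate set of calculations, so more than one key means the published files do not all come from the same run.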

    In regard to what it publishes on its website, BEST's station results don't line up with their regional or global results. Their regional results don't line up with their global results either. None of the results can be verified by examining the data sets used to generate them because those data sets are not made available. It's silly.

    What I want is for BEST to archive the data which goes with the results it publishes, whether those results are published in the peer-reviewed literature, on its website or in unreviewed reports. When I tried to get BEST to do so, I was straight up told they won't (by Steven Mosher).

  21. Brandon,

    I think Mosher worked on the code since initial publication. I don't think he authored any changes to the methodology though. Also, I'm not sure that BEST has been accepted in the literature very often (I still see more CRUTEM) but with its more complete dataset, it might be in time.

    I took the time to look up Seitz. He holds a fellowship at the Harvard Physics department and runs a childish looking blog on global warming. He is apparently an over-the-top advocate with literally zero scientific perspective on the subject of CO2 warming. From Lubos Motl's blog writing, my impression is that makes Seitz pretty normal fare for "top" Harvard university talent these days. Lubos may even know the guy. Seitz's type is common in that he assumes that you or I hold irrational perspectives on the subject that we don't, and that is why his comment is unintelligible. I could be more insulting but it is a waste of time talking with him.

    Eric Eich and Lewandowsky did the same to me when I first commented on their misrepresentations of my views. They simply told me that I was wrong and my views were what they said they were, not what I said they were. That sort of psycho-babble reminded me of the shrinks in horror stories who keep patients against their will, even though it is the shrink who is the crazy one. I will never understand the logical dysfunction of the extremist liberal mind.

  22. Jeff Id, I don't know what Mosher has done. I just know he said the code was written before he joined. I suppose he might have made changes since then, though the way he phrased it would rule that out (but then, Mosher is not one for clarity).

    As for Seitz, that sounds about right from the little I've seen of him. I visited the URL he submitted with his comment (though he mangled it), and it is remarkably bad. I don't know if this sort of mentality is more common with liberals than conservatives, but it is definitely prevalent to a disturbing degree in a number of fields dominated by liberalism (in the United States, at least). I've met my share.

    I have to wonder what it'd be like if conservatives dominated the fields liberals currently dominate in the United States. There'd probably be as many ridiculous mentalities, but I imagine they'd at least manifest in different ways. If nothing else, conservatives seem less likely to yell at me when we disagree. Maybe it's just me, but disagreeing with liberals has seemed to get me far more abuse than disagreeing with conservatives. Conservatives seem more prone to trying to change the subject/leave. Both shut down discussions just as much, but I do like not having to worry about having bottles thrown at me.

  23. Oh, I forgot there was one other thing. I keep seeing people respond to concerns about adjustments and such by saying things in the form of, "If you remove it, the answer doesn't change." I'm not convinced that's actually true, but it raises a new concern for me. If the answer doesn't change, why do it? Why should a group like BEST go through a ton of computationally expensive processing if it has no effect?

    I suspect the answer is the answer does change. I think the only way they can make these claims is to use a vague definition of "answer" a lot of people might not agree with. For instance, someone might say concerns with BEST's methodology don't matter because they don't affect BEST's answer with the unstated assumption the "answer" is the "global temperature record." Were that assumption actually stated, many people might disagree saying, "We care about more than just the global record."

    Or perhaps it's even worse. When the hockey stick was originally criticized, criticisms were routinely dismissed with the claim the issues raised don't matter for the results. That worked because the people defending the hockey stick would only consider individual criticisms separately from one another, never as a whole. That could be true here as well. You'll see people say things like, "Remove all X stations; the answer doesn't change" and, "Don't homogenize the data; the answer doesn't change." Even if they are right about those claims (and I don't know that they are), it doesn't establish neither problem matters. You cannot establish a set of problems doesn't matter by examining only one problem at a time.

  24. Brandon,

    I'm sure the answer does change with methodology, but I have built my own datasets from global temp and with this data, they won't change enough to really matter to me. The tweaks to the data on land temp are bigger than others believe IMO but they still will show warming. The tweaks to century scale ocean data are far bigger and more dominant but are also completely impossible to perform any effective quality control on. Not that the field hasn't done it anyway.

    I'm an engineer and am used to working with data of various quality, not sure of your background. A number is always an approximation -- even in accounting, right? So BEST has some really nice sized databases. Whether the data quality matters depends on the application you wish to use it for. I am convinced that surface temperature trends represent the data as it stands reasonably well, not that it isn't tweaked a bit. BEST could definitely be better in its simplicity of method, comparisons to simple methods with the same data, and with its totally bogus CI. The engineer in me really wants to see a simple raw average, or even better a method like this one:


    which really makes sense to me. It is hard to beat an average for accuracy.

    The hockey stick discussion is a different beast entirely. It contains actual scientific fraud on a large scale. It is an entire field of incompetence with an impact on a scale well beyond the imaginations of tweaked-up surface temperature trends. The jackknife CI problem jumped out at me with a similar clarity to high-noise proxy regression.

  25. As to the if-you-remove-it comment, Nick Stokes did a nice post on taking 60 global series repeatedly to similar effect. I do agree with your point though that it does matter, and apparently as you have shown, on a local scale temp trends can be reversed.

  26. I don't doubt the planet has warmed, but that's not the only issue that matters. Even the total amount of warming isn't the only issue which matters. Even if everyone would stipulate the planet has warmed, say, .8C in the last hundred years, that's not the end of the discussion. We still need to look at when it warmed and by how much. For instance, diagnosing the extent of natural variability requires us to examine the planet's temperatures. The better we know the details of how the planet's temperatures have changed, the better we can separate out the anthropogenic from natural components. That's true on both the temporal scale and the spatial dimensions. Put simply, you can't discern individual components of a series if the series is too distorted.

    This was the sticking point when Steven Mosher and I were supposed to work on a UHI project together. He only wanted to look at the total magnitude of temperature changes. I said we should look at the total possible effect of UHI. UHI is known to have different effects depending upon the weather. I suggested that could lead to discernible effects based upon seasonal patterns. For instance, a rainy summer would be expected to show less UHI effects than a dry one. That could potentially affect things like interannual variability. Suppose, for instance, 1998 was a dry year with weaker than normal wind patterns (at least in some areas). That could, in theory, result in 1998's temperature being higher than it would be without UHI. That seems like something worth knowing.

    When BEST came out, it was supposed to improve the quality of the modern temperature record. It was supposed to be a big step forward. That's why I cared about it. I didn't pay much attention to the modern record before that. You won't find many comments from me on the topic except regarding BEST. That's because, before BEST, I always felt the modern temperature record was broadly accurate but greatly lacking in the precision needed to answer a lot of questions. BEST then came out, and I assumed we'd have more information and be able to figure more things out. Instead, it seems BEST has provided practically no new information, and the result is we're still stuck with nothing more than, "Temperatures have risen by ~X."

    Which is fine. Global warming is real. Humans are contributing to it. We've caused different parts of the world to warm by different amounts. We can even estimate the total amount with some vague, abstractified numerical value which removes almost all the spatial and temporal information we have. We could do that ten years ago. And just like ten years ago, we still can't explain simple things like why there hasn't been any apparent warming for the last 15 or so years.

    If people are okay with that, so be it. I wouldn't be though. If I thought global warming were a problem requiring urgent, worldwide action, I'd be calling for much better work. If I thought global warming were the greatest threat to man and were in charge of fighting it, I'd fire any scientist who told me, "We can ignore all that because the result for this one abstract calculation doesn't change."

    The incompetence and malfeasance involved in the hockey stick controversy is far worse than what goes on with the modern temperature record, but what stands out to me the most in both is the apathy. There's no great quest for the truth. There's barely even any work put into any of this. If global warming is the problem I keep hearing it is, why isn't anyone treating it like a real problem? I've suggested far more serious effort be put into studying global warming than anyone else I've ever seen, including far greater budgets and the creation of new organizations with proper oversight for fundamental things like data management. As far as I know, I'm the only one who has. I've never seen a single climate scientist propose half the effort or finance for studying global warming as I would.

    And that's a large part of why I don't believe global warming is a particularly serious threat. If it were the threat we're told it is, the people currently promoting concerns for it would be completely unsuited to handling the problem. It's like Barney Fife telling me Mike Tyson is coming, but I should let him handle it.

  27. Somebody at BEST has posted a few figures on twitter, which I'm taking the liberty to reproduce here.

    Effect of homogenization on global temperature index

    The impact of homogenization on the recent global temperature index seems minimal.

    Effect of homogenization on spatial resolution

    No breakpoint adjustments and only metadata adjustments look to have similar spatial patterns. Adding empirical seems to produce substantial smearing. As I said on moyhu,

    It does not appear that the homogenization correction produces a substantial effect, and for BEST, only applying corrections for which there is metadata does not result in a substantial loss of resolution. I do not think the "empirical adjustments" that BEST has been experimenting with are justified, and I think this step should be removed from their monthly updated product, at least until the problems it introduces are fixed.

  28. Carrick, those are interesting figures. The first one pretty much confirms everything I've been saying about BEST's breakpoint calculations and spatial smearing. The second one is less obvious, but may demonstrate something interesting. You say:

    The impact of homogenization on the recent global temperature index seems minimal.

    But this is the expected result. As BEST explains, it calculates a baseline temperature field (for each month) for the globe using the 1961-2010 period. That will necessarily force its results to be more similar in the 1961-2010 period. That means the variance in BEST's results will necessarily manifest more in the past than in the present. In effect, we shouldn't expect homogenization to have a significant effect in the 1961-2010 period because the results of that period have been fixed in place. The 1961-2010 period is, in effect, already homogenized. I demonstrated this a while back when I showed BEST's results claim there isn't a single spot in the world which has cooled since 1960.
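
    A toy sketch of the centering effect I'm describing, using made-up numbers. It is not BEST's climatology calculation. It just takes five hypothetical series which differ only in their linear trend, subtracts each one's mean over 1961-2010, and prints how far apart the series are in selected years.

```python
# Toy illustration only: five made-up series that differ solely in linear trend.
YEARS = list(range(1850, 2011))
TRENDS = [0.000, 0.005, 0.010, 0.015, 0.020]  # degrees C per year, hypothetical
BASELINE = (1961, 2010)  # the baseline period named in this comment

def rebaseline(series, years, baseline):
    """Subtract the series' own mean over the baseline window."""
    start, end = baseline
    window = [v for yr, v in zip(years, series) if start <= yr <= end]
    offset = sum(window) / len(window)
    return [v - offset for v in series]

raw = [[trend * (yr - YEARS[0]) for yr in YEARS] for trend in TRENDS]
anomalies = [rebaseline(series, YEARS, BASELINE) for series in raw]

def spread(year):
    """Range (max minus min) across the re-baselined series in one year."""
    i = YEARS.index(year)
    values = [series[i] for series in anomalies]
    return max(values) - min(values)

for year in (1850, 1900, 1950, 1985, 2010):
    print(year, round(spread(year), 3))
```

    The series are forced to agree at the middle of the baseline window, so whatever disagreement exists between them shows up mostly in the years farthest from it. That is only the arithmetic of centering; how much of it applies to BEST's actual results is the question I'm raising.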

    If we agree BEST's baseline process pushes variance to the past, these results become quite interesting. Try mentally correcting for how BEST biases the variance in its results. Imagine if the differences shown in that graph were smoothed out over the entire period. The result is BEST's breakpoint calculations have a meaningful effect on their global trend calculations. The effect would likely be even worse if BEST had shown the results for their entire record, not just 1850 on. (Note how the series shown in the graph diverge more the farther back in time one goes.)

    Based on these two figures alone, I think one can make a strong case BEST's "empirical breakpoint" calculations not only introduce a significant amount of spatial smearing, but they also introduce a bias into BEST's global results which increases the apparent effect of global warming.

    Not only is that a pretty damning argument, it's also kind of baffling. Why should "empirical breakpoints" increase the global warming trend?

  29. Brandon, I think your argument about BEST pushing the variance out of the 1960-2010 window is an interesting one. It's my impression though, that for this period, homogenization doesn't have much of an effect in any case.

    The way you test for the effect of homogenization is on the trend, since that doesn't depend on the baseline. And it's pretty easy to see that the difference in trends among the methods, 1960-2010, is not likely to gain significance.

    I think the problem with the earlier data (pre 1950) is that we have such an incomplete coverage of the globe, that the difference in the rate that land surface warms and cools at different latitudes is introducing a scaling bias in what we are euphemistically (or optimistically in some cases) referring to as "global mean temperature".

    If you really want to examine the influence of the homogenization algorithms, you really need to restrict yourself to the band of latitudes for which there is sufficient coverage that the bias effect of incomplete coverage does not introduce a confounding effect.

    The other thing to keep in mind when comparing BEST to other reconstructions is that BEST is land only (so any effects you are looking at are diluted in the land+ocean series).

  30. Carrick:

    The way you test for the effect of homogenization is on the trend, since that doesn’t depend on the baseline. And it’s pretty easy to see that the difference in trends among the methods, 1960-2010, is not likely to gain significance.

    It's true the baseline of a series doesn't matter for the trend you calculate on it, but that's not what we're talking about. We're talking about the baselines used when doing the calculations to create the series. There is no inherent reason those baselines cannot affect the trend we calculate on the resulting series.

    Consider what happens when we have ten station records which cover different periods. The trend we get when combining these ten station records will depend, in part, on how we align them. Our choice of baseline could affect how they are aligned. As such, the choice of baseline could affect the trend we wind up with. This is exacerbated if the choice of baseline affects things like breakpoint/outlier detection.
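
    Here is a minimal sketch of that alignment effect with two hypothetical stations instead of ten. The numbers are invented, and the combining rule (subtract each station's baseline mean, then average whatever stations report in a year) is a plain anomaly method used as a stand-in, not BEST's method. Both stations must have data in the baseline window for it to work.

```python
def ols_slope(xs, ys):
    """Ordinary least squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def combine(stations, baseline):
    """Align each station on its baseline mean, then average per year."""
    start, end = baseline
    aligned = []
    for record in stations:
        window = [v for yr, v in record.items() if start <= yr <= end]
        offset = sum(window) / len(window)
        aligned.append({yr: v - offset for yr, v in record.items()})
    years = sorted(set().union(*[a.keys() for a in aligned]))
    combined = []
    for yr in years:
        values = [a[yr] for a in aligned if yr in a]
        combined.append(sum(values) / len(values))
    return years, combined

# Hypothetical stations: a long record warming slowly and a short record
# that only starts in 1950 and warms faster.
station_a = {yr: 0.01 * (yr - 1900) for yr in range(1900, 2000)}
station_b = {yr: 0.03 * (yr - 1950) + 5.0 for yr in range(1950, 2000)}

trend_early = ols_slope(*combine([station_a, station_b], (1950, 1959)))
trend_late = ols_slope(*combine([station_a, station_b], (1990, 1999)))
print(trend_early, trend_late)
```

    The input data is identical in both runs. Only the window used to line the two records up changes, and the trend of the combined series changes with it.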

    I think the problem with the earlier data (pre 1950) is that we have such an incomplete coverage of the globe, that the difference in the rate that land surface warms and cools at different latitudes is introducing a scaling bias in what we are euphemistically (or optimistically in some cases) referring to as “global mean temperature”.

    In theory, this shouldn't be a problem because BEST explicitly assumes the correlation structure of the planet's temperatures is constant. The problem with that theory is BEST's assumption is wrong.

    I don't think BEST's homogenization should affect any scaling bias in a significant way. BEST's homogenization is supposed to be nothing more than finding breakpoints where there is a data issue. Different parts of the globe warming at different rates isn't a data issue. It shouldn't be affected.

    That said, I don't doubt BEST's homogenization does what it shouldn't do. BEST's homogenization uses the 300 nearest neighboring stations, up to 2500km away. If my geography is right, that's ~20 degrees latitude, in either direction. That'd mean you can have information from a 40 degree latitude band used to estimate breakpoints for a single station.

    (It shouldn't be too much of a problem when samples are dense as these stations are weighted by distance, but when sampling is sparse, it could definitely cause issues. For instance, data from land stations in South America and Australia may be used to calculate breakpoints in Antarctica.)
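
    The geography is easy to check. A quick sketch, assuming a spherical Earth with a mean radius of 6371 km (my assumption, not a number from BEST):

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean radius, spherical Earth
SEARCH_RADIUS_KM = 2500.0  # maximum neighbor distance mentioned above

# Length of one degree of latitude along a meridian.
km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0

degrees_each_way = SEARCH_RADIUS_KM / km_per_degree
band_width = 2 * degrees_each_way

print(round(km_per_degree, 1), round(degrees_each_way, 1), round(band_width, 1))
```

    That works out to a bit over 22 degrees in either direction, so "~20 degrees" and a roughly 40 degree band are, if anything, slightly conservative.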

    If you really want to examine the influence of the homogenization algorithms, you really need to restrict yourself to the band of latitudes for which there is sufficient coverage that the bias effect of incomplete coverage does not introduce a confounding effect.

    Why? If BEST's homogenization is being affected by scaling bias issues, that is certainly relevant. We can't ignore scaling bias as an issue because BEST should have found a way to not have it cause problems. Whether or not scaling bias ought to affect the homogenization, what matters is if it does affect the homogenization.

    The other thing to keep in mind when comparing BEST to other reconstructions is that BEST is land only (so any effects you are looking at are diluted in the land+ocean series).

    Definitely. To me, what's important about this isn't how it affects our overall knowledge. What's important about this to me is it shows BEST's work, which many people praise and use, has a significant adjustment made to it, and this was not properly disclosed. A week ago, how many people could have told you BEST's homogenization changes the amount of warming it finds by .2C or more?

    Also, how many people realized BEST's homogenization can change its results by 20% based on data issues it has no evidence exist? There are certainly undocumented data issues which could potentially be addressed by adding breakpoints, but there isn't a shred of evidence that is what BEST is finding. In fact, there's a lot of evidence that's not what BEST is finding.

  31. Brandon, what you are saying is right: I was thinking of the bias as arising just due to scaling bias. But you can have bias introduced either by scaling bias or by a secular drift in the offset value of the temperature scale.

    If there is a secular offset drift (which we can easily picture happening) prior to circa 1950 in the empirical breakpoint method, this will still show up as a bias in the trend.

    I could imagine even in a linear processing algorithm having an error term proportional to the trend. Because there is a net trend in the data, this certainly raises the specter of an accumulation in error in the temperature scale offset value over time.

    Why? If BEST’s homogenization is being affected by scaling bias issues, that is certainly relevant. We can’t ignore scaling bias as an issue because BEST should have found a way to not have it cause problems. Whether or not scaling bias ought to affect the homogenization, what matters is if it does affect the homogenization.

    Because a change in geographic distribution can produce a direct scaling bias, and if you have a scaling bias that changes over time, that will produce the same exact effect as a drift in offset associated with the homogenization algorithm.

    If you want to test the homogenization effect directly, what you do is keep the geographical distribution the same, but reduce the number of stations. If it's a homogenization error, as you have fewer stations, the error should get larger, and what I'd at least expect to see is an error in trend that scales inversely with the number of stations.

    A lot of this is much easier to do with Monte Carlo data of course. If you know what the real temperature is doing, you can track offset error and scaling error directly. And it's easy to build in different scenarios (e.g., uniform rate of warming/cooling) and see how that influences the algorithms too.
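
    A rough sketch of what such a Monte Carlo test might look like. Everything in it is an assumption made for illustration: the true trend, the noise level, and the single uncorrected step change per station are invented, there is no geography, and no homogenization algorithm is run at all. It only shows the bookkeeping: because the true trend is known, the trend error can be measured directly as the number of stations changes.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def trial_trend_error(n_stations, rng, years,
                      true_slope=0.01, noise_sd=0.5, step_sd=0.5):
    """One synthetic world: return fitted trend minus the known true trend."""
    totals = [0.0] * len(years)
    for _ in range(n_stations):
        # Each station gets one uncorrected step change at a random year.
        break_index = rng.randrange(1, len(years))
        step = rng.gauss(0.0, step_sd)
        for i, yr in enumerate(years):
            value = true_slope * (yr - years[0]) + rng.gauss(0.0, noise_sd)
            if i >= break_index:
                value += step
            totals[i] += value
    mean_series = [total / n_stations for total in totals]
    return ols_slope(years, mean_series) - true_slope

def rms_trend_error(n_stations, n_trials=100, seed=0):
    """Root-mean-square trend error over many synthetic worlds."""
    rng = random.Random(seed)
    years = list(range(1900, 2000))
    errors = [trial_trend_error(n_stations, rng, years) for _ in range(n_trials)]
    return (sum(e * e for e in errors) / n_trials) ** 0.5

for n in (4, 16, 64):
    print(n, rms_trend_error(n))
```

    To turn this into the test described above, the simple average would be replaced by the homogenization step under study, and the trend error compared with and without it at each station count.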

  32. Carrick, there's an interesting development on this issue. On Twitter, Zeke and I discussed just what period was used to calculate the BEST climatology field. I pointed out the BEST gridded data is given with a Readme file which says the files' climatology fields were calculated over 1951-1980 while he indicated they use a 100 year period from 1900-2000. This came up at Judith Curry's site since BEST has a blog post on it now so I said:

    It’s also not clear to me the timing of the adjustments is as meaningful as this post makes it out to be. BEST estimates its climate field over a particular period (just which period is used has been a source of some confusion). It is possible the adjustments have a greater effect in the past, at least partially, because they are further from the baseline period.

    Steven Mosher responded by saying:

    WRONG. we used a 100 baseline period here. We studied the baseline period as I told you. It makes no difference.

    It's not clear to me how he thinks this contradicts what I said at all, but his entire comment is kind of insane. For instance, he never told me they studied this issue. What Mosher told me is they studied how their choice of baseline for combining the jackknifed series affects their results, and that it does have an effect.

    Regardless, the more substantial issue is what I brought up in my response:

    Also, what you say does nothing to demonstrate what I said is wrong. You say you used a 100 year baseline, but that doesn’t contradict what I said. In fact, it makes what you and Zeke say incredibly peculiar. You both make the point the effects of homogenization are small after 1900, but if you used a 100 year period, that means the non-baseline period ends ~1900.

    According to BEST, there is little effect of homogenization in their baseline period, but there is a much more significant effect outside the baseline period. And somehow, this is supposed to rebut the idea their choice of baseline period affects where the effect of homogenization manifests...?

    It's baffling, and so is pretty much everything else BEST said in response to me on that post. For instance, I disputed the idea the effect of homogenization is "very little" given it can change the results by ~20%. I also suggested their comments on the timing of the effect of homogenization are misleading because it might be determined (at least in part) by their choice of baseline period. Zeke responded:

    The impact since 1900 is on the order of 0.05 C, not particularly large. The magnitude of this adjustment is effectively the same if you use metadata + empirical breakpoints or metadata alone (see Figure 4).

    Which in no way responds to anything I said. Steven Mosher's responses to me were even worse, with him writing what is probably the most ridiculous comment I've ever seen him write, quite an accomplishment. The ultimate result is I had something like the fifth comment on this post by BEST, raised substantial issues in a clear and direct manner, and BEST team members responded by not addressing the issues I raised, misrepresenting what I said and then ignoring me.

    I don't know how to react. What are you supposed to do if a group like BEST happily responds with complete BS, in public, when you point out legitimate issues?
