Is BEST Really the Best?

I've discussed concerns I have about the Berkeley Earth Project's attempt (BEST) to construct a global temperature record here a couple times. One time I wrote a post discussing how, according to BEST, the state of Illinois is so terrible at measuring temperatures it needs to have its measurements warmed by about one degree per century.

Another time I highlighted the fact BEST's changes to individual stations are based upon perceived differences so small one can barely see them if they're plotted. As I said then, I think the way BEST changes these stations reflects nothing more than over-fitting of the data.

Today I'd like to highlight an issue which combines both of those problems.

For a bit of an introduction, I recently looked at the gridded data BEST published showing its estimates of average monthly temperatures across the globe (available on this page). After some playing around, I decided to extract the values given for the area where I live. I then did the same thing with another temperature record, NASA's GISS (available on this page).

The values aren't completely comparable as the GISS data set covers a shorter period. GISS also uses larger grid sizes. Still, I thought it'd be interesting to compare them. To keep things simple, I graphed data from both data sets only for the period of overlap. This is what I got (with a five year smooth):


I'm not sure anything more needs to be said. In case I'm wrong, and the issue isn't obvious, I'll plot the same graphs, but this time I'll include trend lines:


It's amazing. From month to month and year to year, GISS and BEST look nearly identical. The high frequency components of their graphs are indistinguishable. The only meaningful difference between the GISS and BEST estimates for my area is that BEST adds a huge warming trend.
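A minimal sketch of how a comparison like this could be made concrete, assuming each record is available as a mapping from (year, month) to a temperature anomaly. The function names (`overlap`, `trailing_mean`, `linear_trend`, `detrended`, `pearson`, `compare`) are hypothetical, and the two records below are synthetic stand-ins, not the GISS or BEST values. It restricts both records to their period of overlap, fits a straight line to each, and checks how closely the two agree once the lines are removed.

```python
import math
from statistics import mean

def overlap(series_a, series_b):
    """Values of both records restricted to the months they share, in time order."""
    keys = sorted(set(series_a) & set(series_b))
    return [series_a[k] for k in keys], [series_b[k] for k in keys]

def trailing_mean(values, window):
    """Simple moving average over `window` consecutive values."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]

def linear_trend(values):
    """Least-squares slope (per time step) and intercept of a straight-line fit."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    return slope, y_mean - slope * x_mean

def detrended(values):
    """Residuals left after subtracting the fitted straight line."""
    slope, intercept = linear_trend(values)
    return [v - (slope * i + intercept) for i, v in enumerate(values)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def compare(series_a, series_b):
    """Trend difference (b minus a, per month) and correlation of the detrended records."""
    a, b = overlap(series_a, series_b)
    slope_a, _ = linear_trend(a)
    slope_b, _ = linear_trend(b)
    return {
        "trend_difference": slope_b - slope_a,
        "detrended_correlation": pearson(detrended(a), detrended(b)),
    }

# Synthetic data (an assumption for illustration): same wiggles, one record
# is shorter, the other has an extra linear trend of 0.001 per month.
months = [(1900 + i // 12, i % 12 + 1) for i in range(720)]
wiggle = [math.sin(i / 3.0) + 0.5 * math.sin(i / 17.0) for i in range(720)]
record_a = {m: w for m, w in zip(months[120:], wiggle[120:])}
record_b = {m: w + 0.001 * i for i, (m, w) in enumerate(zip(months, wiggle))}

result = compare(record_a, record_b)
smooth_a = trailing_mean(overlap(record_a, record_b)[0], 60)  # five-year smooth
print(result)
```

A detrended correlation near one together with a non-zero trend difference is the pattern described above: matching high frequency behavior with a different long-term slope.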

To be clear, I don't think this means BEST is fraudulently adjusting the data. I'm not Steven Goddard. I suspect what's actually happening is BEST is smearing warming from other areas into mine. That is, warming in other states is causing Illinois to seem like it's warming far more than it actually is. That's not fraud. That's just low resolution in the estimates.
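To illustrate what smearing means here, the sketch below uses a plain distance-weighted average with a Gaussian weight. This is a simplified stand-in, not BEST's kriging procedure, and the station positions, trends, and length scales are all made up for the example. It shows how the estimate at a location with a small trend gets pulled toward the larger trends of distant stations once the weighting reaches far enough.

```python
import math

def gaussian_weight(distance_km, length_scale_km):
    """Weight that decays smoothly with distance (hypothetical choice of kernel)."""
    return math.exp(-0.5 * (distance_km / length_scale_km) ** 2)

def smoothed_trend(target, stations, length_scale_km):
    """Distance-weighted average of station trends at `target` (east_km, north_km)."""
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, trend in stations:
        w = gaussian_weight(math.hypot(x - tx, y - ty), length_scale_km)
        weighted_sum += w * trend
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical stations: (east_km, north_km, trend in degrees per century).
stations = [
    (0.0, 0.0, 0.2),     # local station with a small trend
    (0.0, 600.0, 1.5),   # distant stations with larger trends
    (0.0, 900.0, 1.8),
    (500.0, 0.0, 0.3),
]

narrow_estimate = smoothed_trend((0.0, 0.0), stations, 100.0)
broad_estimate = smoothed_trend((0.0, 0.0), stations, 1000.0)
print(narrow_estimate, broad_estimate)
```

With the short length scale the estimate stays close to the local station's trend. With the long one it lands well above it, even though nothing about the local station changed.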

But here's the thing. BEST is supposed to be the best temperature record. It has a website encouraging people to look at data on as fine a scale as individual cities. WHY?! If BEST can't come close to getting things right for the state of Illinois, why should anyone care what it says about the city of Springfield, Illinois?

At what scale does BEST stop giving imaginary results and start getting a right answer? It doesn't at the city level. It doesn't at the state level. What about at the regional level? Could it get temperatures right for something like, say, Southeast United States? Nope.

We can't pick areas much larger than that. The Southeast United States is about half the size of Australia. It's about a third the size of Europe. If BEST can't get it right, how could it get Australia or Europe right? And that's ignoring the fact the Southeast United States has far more temperature stations (per area) than either of those!

I get almost everybody seems to agree BEST gets things right at the global scale, but couldn't we all agree there's a problem if BEST can't come close to the right answer when looking at entire continents?


  1. Brandon, I've been trying to warn them that this spatial smearing would happen from the beginning.

    You just can't assume that the temperature correlation function used for interpolation is azimuthally invariant. It's a completely terrible assumption, and will end up taking regions that have larger temperature trends (e.g. from Canada) and smearing them into regions with lower trends.

    I would expect you to see a much larger correlation East-West than North-South. Their response seemed to be … well if you think this is important you should work on it.

    My response is: I don't think yet another temperature series is that important, especially if it's going to interfere with my day job, so I'm not going to work on it, but if I did, I'd certainly want it to be done correctly.
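One rough way to check the azimuthal invariance assumption raised in this comment is to compare how well station pairs correlate when they are separated mostly east-west versus mostly north-south. The sketch below is only an illustration: the function names are hypothetical, the stations are synthetic and built so that the signal changes with latitude only, and a real check would also compare pairs at similar distances.

```python
import math
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def directional_correlations(stations):
    """Mean pairwise correlation, grouped by the dominant direction of separation.

    `stations` is a list of (east_km, north_km, series) tuples.
    """
    groups = {"east_west": [], "north_south": []}
    for i in range(len(stations)):
        for j in range(i + 1, len(stations)):
            xi, yi, series_i = stations[i]
            xj, yj, series_j = stations[j]
            key = "east_west" if abs(xi - xj) >= abs(yi - yj) else "north_south"
            groups[key].append(pearson(series_i, series_j))
    return {k: (mean(v) if v else None) for k, v in groups.items()}

def synthetic_series(north_km, n_months=240):
    """Made-up signal whose phase depends only on the north coordinate."""
    phase = (north_km / 400.0) * (math.pi / 2)
    return [math.sin(t / 5.0 + phase) for t in range(n_months)]

# Hypothetical station layout, in km east and north of an arbitrary origin.
positions = [(0.0, 0.0), (300.0, 0.0), (0.0, 400.0), (300.0, 400.0)]
stations = [(x, y, synthetic_series(y)) for x, y in positions]

by_direction = directional_correlations(stations)
print(by_direction)
```

If the two group means differ substantially in real data, a single correlation function that depends only on distance would be a poor description of it.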

  2. Carrick, I obviously knew the smearing was a problem, but until I looked at the gridded data, I didn't realize how notable the effects could be. Comparing GISS and BEST for my area, there are basically no differences save the warming which BEST has smeared in. I've tested it for half a dozen different areas within the United States. It was the same thing in each. That makes it seem, at least within the United States, the BEST record is just GISS + incorrect low frequency biases. That is not what we should expect from a temperature record said to be the best.

    Of course, the real test comes from looking in other areas. It may turn out BEST is useful for space-time not as well-covered by other data sets. The spatial biases it introduces would be largely offset by being able to extend the record further back in time. They might even be offset by being able to extend the record to cover new areas (though with the size of the spatial biases I've seen, I'm skeptical). In that case, BEST will be better than other temperature records in some cases while being worse in other cases.

    By the way, I got some interesting news which reminded me of you. I need to e-mail you about it though because I'm not sure if the person who passed it on to me wants it publicly discussed just yet.

  3. Brandon, I actually anticipated it would be a significant problem, but I also think it is a very fixable problem.

    I was surprised you didn't blog the comment you made about the lack of proper versioning over on Judith's blog.

  4. Carrick, I meant to ages back, but it's wrapped up with another issue that's more complicated. Throughout the various BEST releases, the results have gone through a number of changes. There are three particular areas of change which stood out to me. Since all three arose from problems with the methodology, I wanted to write about them while discussing the lack of proper versioning.

    For instance, at one point there were seasonal cycles in temperature values which were supposed to have had those cycles removed. At some later point, those values no longer had those cycles. However, at that point, seasonal cycles suddenly appeared in the uncertainty values. Later, they were present in both the data and uncertainty values. I thought rather than just point out the changes, I'd be useful and track down what caused them. That's when I realized how bad the BEST process for releasing data/code/results was.

    I ought to just go ahead and write a post about the versioning issue, but before doing that, I'd need to go back over everything and make sure I know what data/code/results are currently available. I'd also need to find a bunch of pages on the Way Back Machine to show the incorrect descriptions of the BEST methodology and data they left up for months. None of that sounds appealing to me, and the topic is so dull. (I'll admit, I'm mostly in this for the lulz.)

    Besides, I'm back on the Cook et al data issue. It had gotten boring for a while, but I'm entertained by it again.

  5. "I get almost everybody seems to agree BEST gets things right at the global scale, ..."

    No. What measure indicates BEST gets it right on the global scale? (You've made a rather broad brush comment here.)

    "...there’s a problem if BEST can’t come close to the right answer when looking at entire continents?"

    Yes. One correlation function for both mountain and plain? One correlation function for the entire time period? Too know that ;O)

    "I get almost everybody seems to agree BEST gets things right at the global scale, but couldn’t we all agree there’s a problem if BEST can’t come close to the right answer when looking at entire continents?"

    Huh? The parts are wrong but the whole is OK? One had better demonstrate an exceedingly fortuitous cancellation of errors.

  6. mwgrant, it's a good point about the assumption of temporal invariance. I think a better approach would be EOF-based (like NCDC does), since you can eliminate both the assumptions of spatial and temporal invariance.

    As I commented in a different thread, NCDC does not appear to suffer from spatial smearing. They also use EOF, so that does appear to be a better methodology than kriging, at least as it's implemented by BEST.

  7. Yes!!!
    It’s amazing. From month to month and year to year, GISS and BEST look nearly identical. The high frequency components of their graphs are indistinguishable. The only meaningful differences between the GISS and BEST estimates for my area is BEST adds a huge warming trend.

    This is to be expected if the BEST scalpel process acts as a low-cut filter, leaving the most important low-frequency information on the cutting room floor. The high frequency data matches. The low-frequencies do not.

    From: Rasey: Dec. 13, 2012, 11:00 am, comment in WUWT: "Circular Logic not Worth a Millikelvin."

    My comment below takes the importance of low frequency in VPmK and focuses on BEST: Berkeley Earth and what to me appears to be minimally discussed wholesale decimation and counterfeiting of low frequency information happening within the BEST process.

    My summary argument remains unchanged after 20 months:
    1. The Natural climate and Global Warming (GW) signals are extremely low frequency, less than a cycle per decade.
    2. A fundamental theorem of Fourier analysis is that the frequency resolution is dw/2π = 1/(N*dt) Hz, where dt is the sample time and N*dt is the total length of the digitized signal.
    3. The GW climate signal, therefore, is found in the very lowest frequencies, low multiples of dw, which can only come from the longest time series.
    4. Any scalpel technique destroys the lowest frequencies in the original data.
    5. Suture techniques recreate long term digital signals from the short splices.

    6. Sutured signals have in them very low frequency data, low frequencies which could NOT exist in the splices. Therefore the low frequencies, the most important stuff for the climate analysis, must be derived totally from the suture and the surgeon wielding it. From where comes the low-frequency original data to control the results of the analysis ?

    The way BEST has treated low-frequency content in the data is not defensible. You cannot filter out low frequency content from 40,000 stations, then expect the high frequency content in a regional kriging to return trustworthy low-frequency signal.
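The arithmetic in item 2 of the summary argument above can be worked through directly. The sketch below only evaluates 1/(N*dt) for monthly data, using years as the time unit so the result is in cycles per year. The segment lengths are hypothetical examples, and the calculation says nothing by itself about how any particular method recombines short segments.

```python
def frequency_resolution(n_samples, dt_years):
    """Spacing between resolvable frequencies, 1 / (N * dt), in cycles per year."""
    return 1.0 / (n_samples * dt_years)

def record_length_years(n_samples, dt_years):
    """Total length of the record, N * dt, which is also the longest full cycle it can hold."""
    return n_samples * dt_years

MONTH = 1.0 / 12.0  # sample spacing in years for monthly data

# Hypothetical record lengths: a 10-year segment and a 100-year record.
ten_year = frequency_resolution(120, MONTH)
century = frequency_resolution(1200, MONTH)

print(ten_year, century)
```

A 10-year monthly segment resolves nothing below about 0.1 cycles per year, while a 100-year record reaches about 0.01 cycles per year, which is the distinction the argument rests on.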

  8. Stephen Rasey, I closed your link tag so it didn't turn several paragraphs into links. Carrick, you had an opening tag for a link but a closing tag for italics. Since your link tag didn't actually have a link, I switched it to an italics tag. I'm not sure if that's what you intended, but it's the best I could do.

    As for what you guys said in your comments, I agree with both of you.

  9. Thanks, for the tag repair, Brandon.

    More and more, I am coming to base my argument on the "Chain of Custody" of the important data.

    I believe that BEST, among others, are making extraordinary claims of:
    1. Global Temperatures have never been higher
    2. Urban Heat Island effects are minor and have been adjusted away
    3. Uncertainty is so good, we can extrapolate a few thermometers all the way back to 1750 with nothing more than 0.5 deg of uncertainty.

    "Extraordinary claims require Extraordinary evidence."
    It is they, the proponents of these claims that have the burden of proof, beyond reasonable doubt.
    As in a criminal trial, those with the burden of proof beyond reasonable doubt, must take care in the chain of custody of evidence. Failure in the Chain of Custody is a factor in raising doubt. The Defense is not obliged to fill in the gaps in a Chain of Custody of some evidence, it is sufficient to show THERE ARE gaps.

    Three years ago, long before BEST published their first results, I predicted that the low frequency information present in long temperature records would be left on the "cutting room floor." It is now high time for them to prove their Chain of Custody of original low frequency data. Their scalpel-sliced 10-year segments are incapable of possessing such low frequencies. But the end product has low frequency content. So where did it come from? Is it Live or is it Memorex? It is certainly not live. So is it Memorex or is it Counterfeit (an artifact that looks real, but is not)?

    Additional to the low-frequency argument, I am coming to believe that they have been lax in the Chain of Custody of uncertainties. This surprises me. They boast of having good statisticians who developed this scalpel approach. But where are their uncertainty data on their intermediate products, such as the regional grids?
