2011-11-29 01:11:20 Why do the Hadley surface temperatures disagree with other sources?
Kevin C



Note: This article contains original material which has not been peer-reviewed. However given the topicality of the question, it is presented here with the software so that readers may reproduce and critique the results.

Why do the Hadley surface temperatures disagree with other sources?

There are three frequently quoted global instrumental temperature records: HADCRUT from the Hadley centre, GISTEMP from NASA's GISS, and the NOAA index from NOAA/NCDC. All of them show the same long-term picture, but they occasionally differ in their short-term variations. This has become particularly apparent over the last decade, with a significant divergence in the 11-year trend from 2000 to 2010:
While 11 years is too short a period to draw any firm conclusions about changes in the temperature trend (as quantified by statistical significance tests), the low 11-year trend in the HADCRUT index has often been quoted by "skeptics" to suggest that global warming has stopped (often without regard for the changes in the laws of physics required for this to happen). The table immediately raises an interesting scientific question: why is the HADCRUT trend lower? This has bearing on a question of broader interest: is the Hadley index a better or worse indicator of recent temperature trends than the other datasets?

Land and Ocean

The global instrumental temperature records are composed of two parts: land temperatures, taken from weather station measurements of air temperature 2m above the ground, and sea surface temperatures (SSTs), measured by ships and buoys. The three indices use different but overlapping datasets for the land temperatures, and also different SSTs. While it is the global index which is of primary interest, in order to understand the differences in the records, we must examine the land and ocean temperatures individually. This article will focus on the land temperatures.

In addition to the three sources already mentioned above, there is a fourth source of land temperature data, the BEST project, which has recently produced a new land temperature record based on more data and a sophisticated statistical analysis. This provides a good baseline against which to compare the Hadley data.

Apples to Apples

In order to compare two land temperature records, we need to ensure that we are comparing equivalent data. The BEST and NOAA datasets provide an average temperature calculated over the land surface only. GISTEMP provides a 'dTs' product, which is not directly comparable to these, because it is an estimate of global temperature based on land temperatures alone. The temperatures from coastal stations are used to infer the temperature of the surrounding ocean. Since the oceans have been warming more slowly, this record shows less warming than the pure land-only products.

The same problem exists with the Hadley data: Hadley produce a land temperature record called CRUTEM3; however, this is a (rather more crude) estimate of global temperature from land stations only. The differences from a true land-only record are twofold:

  1. The global CRUTEM3 record is not an average by land area. Rather, the hemispheric averages are calculated separately, and then averaged together with equal weight. While a reasonable procedure for estimating global temperatures, this method prevents comparison to the BEST data, because it gives a much greater weight to individual weather stations in the Southern hemisphere.
  2. CRUTEM3 is constructed by averaging temperatures in 5x5deg grid boxes, and then calculating an area-weighted average over those grid boxes. The boxes are about 500km square at the equator. Any grid box with a weather station contributes equally to the average, as if it were completely occupied by land (even if 99% of the box is ocean). Coastal stations are thus often over-weighted, although the effect varies with the shape of the coastline.

To compare CRUTEM3 with the BEST dataset, both of these differences must be addressed. The first is addressed by averaging over the whole globe rather than individual hemispheres, the second by including a 'land mask' which weights each cell according to the proportion of land in that cell. The uncorrected and corrected CRUTEM3 records are compared to the BEST record in Figure 1.

Corrected CRU temperatures compared to BEST
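
The two corrections described above can be illustrated with a minimal sketch. This is not the code used to produce the figure; the cell layout (latitude, anomaly, land fraction) and the function names are assumptions made for illustration, and cell area on an equal-angle grid is taken as proportional to the cosine of latitude.

```python
import math

def cell_weight(lat_deg):
    """Relative area of an equal-angle grid cell, proportional to cos(latitude)."""
    return math.cos(math.radians(lat_deg))

def weighted_mean(cells, use_land_mask):
    """Area-weighted mean anomaly over cells that contain data.

    Each cell is a tuple (lat_deg, anomaly, land_fraction). Cells without
    stations are simply absent from the list.
    """
    num = den = 0.0
    for lat, anom, land in cells:
        # With the land mask, a cell counts only for the land it contains.
        w = cell_weight(lat) * (land if use_land_mask else 1.0)
        num += w * anom
        den += w
    return num / den

def hemispheric_mean(cells):
    """Average each hemisphere separately, then combine with equal weight."""
    north = [c for c in cells if c[0] > 0]
    south = [c for c in cells if c[0] < 0]
    return 0.5 * (weighted_mean(north, False) + weighted_mean(south, False))

def land_masked_global_mean(cells):
    """Single global average, each cell weighted by area times land fraction."""
    return weighted_mean(cells, True)

# Toy example (made-up numbers): one fully land northern cell and one
# southern cell that is 99% ocean.
cells = [(62.5, 1.0, 1.0), (-62.5, 3.0, 0.01)]
```

In this toy example the hemispheric method gives the nearly-all-ocean southern cell the same weight as the northern land cell, whereas the land-masked global average is dominated by the northern cell. The sketch assumes each hemisphere contains at least one cell with data.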

The graph makes it clear that the CRUTEM3 record is broadly comparable to the BEST record in terms of long-term averages, once a like-with-like comparison is made. However, if we look at the last decade, the records differ: the BEST record (along with NOAA and GISS) shows continued warming, whereas the CRUTEM3 record shows a significant drop in gradient, which some have interpreted as a 'pause' in warming. What is the cause of this difference?

In 2009 the ECMWF performed a study into the differences between their own temperature reanalysis and CRUTEM, which identified the most significant difference between these records as being due to the poor coverage of the CRUTEM3 data at high latitudes. This arises from two causes:

  •  CRUTEM3 uses a constant-angle (5deg x 5deg) grid, and thus isolated stations at high latitudes (where the grid boxes cover less area) have less influence than those at the equator.
  •  GISTEMP by contrast allows every station an equal area of influence, determined by a circle of radius 1200km about the station. This radius was determined by examining the data from areas with good station coverage to determine how far the readings from a station could be reliably extrapolated. The weight given to a station decreases with distance from the station.

To test this result, the GISTEMP area of influence method was implemented in the re-implemented CRUTEM3 code. The resulting temperature maps for 2000/01 and 2010/01 are shown in figure 2. Note that the area of influence method gives good coverage of the global land area, whereas the simple grid calculation leaves large areas which do not contribute to the resulting temperature estimate. This is equivalent to assuming that the average of the unsampled regions is the same as the average of the sampled regions: A dubious assumption given the distribution of unsampled regions.

Temperature fields before and after including station area-of-influence
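
The area-of-influence idea can be sketched as follows. This is an illustrative sketch, not the GISTEMP code or the code used for the maps: the linear taper of the weight from 1 at the station to 0 at 1200km is an assumption (the text above only says the weight decreases with distance), and the function names and station tuple layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
INFLUENCE_RADIUS_KM = 1200.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def station_weight(distance_km, radius_km=INFLUENCE_RADIUS_KM):
    """Assumed linear taper: 1 at the station, 0 at and beyond the radius."""
    return max(0.0, 1.0 - distance_km / radius_km)

def interpolate(lat, lon, stations):
    """Weighted mean anomaly at (lat, lon) from stations within range.

    Each station is a tuple (lat_deg, lon_deg, anomaly). Returns None when
    no station lies within the radius of influence.
    """
    num = den = 0.0
    for s_lat, s_lon, anom in stations:
        w = station_weight(great_circle_km(lat, lon, s_lat, s_lon))
        num += w * anom
        den += w
    return num / den if den > 0 else None
```

A grid point with no station inside the radius stays empty (None here), so it is still excluded from the average; the method reduces the unsampled area rather than eliminating it.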

A comparison of the temperature records using these methods (the CRUTEM3 land-masked record, and the corresponding record with the 1200km radius of influence) is shown along with the BEST data for the last 3 decades in figure 3. (The data have been aligned on a period around 1980 for ease of comparison.)

Temperature record before and after including station area of influence (60 month average)
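
The alignment and smoothing steps used for this kind of comparison can be sketched as below. The function names, the index-based reference window and the choice to return only full windows are assumptions for illustration; the exact alignment period used for the figure is not specified here.

```python
def align_to_baseline(series, start, end):
    """Subtract the mean over series[start:end] so the series averages zero there."""
    window = series[start:end]
    offset = sum(window) / len(window)
    return [x - offset for x in series]

def moving_average(series, window):
    """Mean of every full run of `window` consecutive values.

    The result has len(series) - window + 1 entries; with monthly data,
    window=60 corresponds to a 60 month average.
    """
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]
```

Aligning two records on the same reference window removes any constant offset between them, so that only differences in shape and trend remain visible.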

The result is slightly more complex than expected: The CRUTEM3 data, when coupled with the GISTEMP-style area of influence calculation, gives a result which is significantly closer to the BEST result over the past 30 years. The problem of poor sampling in the CRUTEM3 method leads to a larger inflation in temperatures around 2000, coupled with a smaller suppression of recent temperatures. Both of these errors contribute to an underestimation in the temperature trend over the last decade.

In other words the supposed 'pause' in warming over the last decade in the CRUTEM3 data is due to poor sampling, arising from a 'double whammy' of a large overestimation of temperatures at the beginning of the decade, and a smaller underestimation recently. This can be seen in the temperature maps shown in figure 2: In 2000 the missing regions (especially Africa and Antarctica) in the CRUTEM3 record are predominantly colder than normal, and thus CRUTEM3 overestimates the global temperatures during this period. However in 2010 the missing regions in the CRUTEM3 record are predominantly hotter, and thus CRUTEM3 underestimates the global temperatures.

The resulting land-only temperature trends from 2000/01 to 2010/03 (the last complete month in the BEST data) are as follows:

CRU data: CRU method + land mask       0.17 C/decade
CRU data: 1200km area of influence     0.31 C/decade
BEST                                   0.27 C/decade

The standard uncertainties on these trends are around 0.25 C/decade, and so the numbers tell us very little about the underlying trend. However they do highlight the fact that the apparent 'pause' in the CRUTEM3 trend arises from poor sampling.
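
For readers who want to check trends of this kind, a least-squares trend and its standard error can be computed as in the sketch below. This is a plain ordinary-least-squares fit with a white-noise standard error, which is an assumption: monthly temperature data are autocorrelated, so this naive error is likely to understate the true uncertainty, and it is not necessarily the method behind the figures quoted above.

```python
import math

def linear_trend(times, values):
    """Ordinary least-squares slope and its naive standard error.

    Returns (slope, stderr) in units of value per unit time. The standard
    error assumes independent residuals (no autocorrelation correction).
    """
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    variance = sum(r * r for r in residuals) / (n - 2)
    return slope, math.sqrt(variance / sxx)

# Synthetic check (made-up data): 120 monthly values rising at exactly
# 0.02 C/year, i.e. 0.2 C/decade.
times = [2000 + m / 12 for m in range(120)]
values = [0.02 * (t - 2000) for t in times]
slope, stderr = linear_trend(times, values)
```

With times in years, multiplying the slope and its standard error by 10 gives values in C/decade.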


  • When using a land mask and an appropriate area of influence for each station, the CRU data reproduces the features of the BEST record well over the past 30 years.
  • The supposed 'pause' in warming in the CRUTEM3 record is indeed associated with poor sampling, particularly at higher latitudes.
  • A large part of this effect is due to CRUTEM3 overestimating temperatures around 2000, in addition to a smaller underestimation of recent temperatures.
2011-11-29 01:20:26
Kevin C


Not very happy with this yet. Not even sure if it is really SkS material. Comments welcome.

2011-11-29 02:15:17


If this material could benefit from peer review, we should let that happen first. SkS is not a scientific journal.

2011-11-29 04:02:39 Comment
Robert Way


this might be interesting too


2011-11-29 09:59:03
Tom Curtis


1)  In your graphs, you always show BEST (-0.15C).  I assume this means you have subtracted 0.15C from the BEST data to bring the two records into alignment.  If so, this needs to be explained in the text, and a reason given.


2)  It is not clear to what extent the change in trend for CRU + Land Mask + 1200 km radius is due to the equal weighting of arctic and tropical stations, and to what extent it is due to eliminating areas with no coverage (which hence show a trend equal to the global land area average).  These are distinct effects and should be treated separately IMO.  This could be done by, as an intermediate step, using a 500 km x 500 km grid globally, or by using a 250 km radius extrapolation equivalent to the secondary GISS product.  You would then finish the analysis with the 1200 km radius extrapolation as currently done.


3)  I certainly consider this article suitable for SkS, particularly with the amendments above, but


4)  This is probably suitable material for peer reviewed literature.

If you wish to submit the material to a peer reviewed journal, the journal may expect some analysis of the various choices made beyond their mere effects, i.e., which methods are most justified methodologically.  I think that is a no-brainer except for the 1200 km extrapolation, which is superior but not without problems.  An analysis of extrapolation methods using BEST station data in extrapolated cells to cross check would be very interesting.  A journal may also expect a more comprehensive analysis, including SST analyzed separately, and then together with land in the global products.  If you are not interested in the latter (which I am sure involves a lot of work), you may want to consult with somebody in the field (Hansen, or Jones) about how best to approach it for publication.

Needless to say, if you wish to publish the article in the peer reviewed literature first, your publication on SkS should be delayed until the peer reviewed paper has been published.

2011-11-29 22:28:55
Kevin C


Thanks guys, those comments are really helpful. I'll write a longer response later, but the short story is this: I don't think the post is sufficiently compelling at the moment for a couple of reasons, and I've got a couple more ideas which might be stronger. I'll try working those up instead.

One other development: The BEST guys got back to me. They have also concluded that CRUTEM3 needs to be land masked to be comparable to BEST. So their comparison graph is going to change again. I can mock up what it is going to look like if it is of interest.

2011-11-30 00:49:49
Tom Curtis


Speaking for myself, very much of interest.

2011-11-30 03:00:37
Kevin C


Responding to previous points:

Neal: My concern precisely. SkS has posted non-peer-reviewed stuff in the past, e.g. from Tamino or Caerbannog's temperature recon. But it shouldn't be the norm, and it needs to be very compelling. I don't think this post reaches that bar for 2 reasons:

1. The graphics aren't compelling enough.

2. More importantly, it is HADCRUT which is interesting, not CRUTEM3.

Publishing is one route; Tom outlines some of the work needed to bring it up to publishable standard. In addition to his points, I'd need to do some forensics on the Hadley code to iron out the final small differences, and lots of analysis. I estimate 2 months full time work - I have ~2hrs/week, so it's not going to happen.

Robert: Thanks, I knew about the ECMWF study, and there's stuff in the most recent GISTEMP paper too, but I hadn't seen the RC stuff. It's very good. I don't think any of them have addressed the audience I was trying to write for though. The one thing no-one has tried is a 'fixed CRU'. (Actually, I think AR4 had an optimal averaging applied to the CRU data. I wish they'd publish that monthly!)

Tom: See notes to Neal above. On your questions:

1. Yes, the baseline stuff would need to be made explicit. I looked for the region with greatest similarity and aligned on that rather than a standard period. But it needs to be done objectively and explained.

2. Indeed, I calculated equal-area mask results too, but omitted them to keep the story simple. If I were going ahead I would attach a PDF with a more complete set of graphs and descriptions. See the figure below for a comparison.

3 and 4 dealt with above.


2011-11-30 03:06:38
Kevin C


Here is the BEST and NOAA data, plus the properly land-masked CRU. The CRU data is now on the BEST baseline (1951-1980), which is why it doesn't look quite the same as the figure in my first post.



This is 120 month moving average as per BEST. I forgot to delete the last two months, so ignore the little dip at the end of the BEST plot.