I've owed you guys another post about the recent Gergis et al paper for a little while now. I've been held back because I lost all the code I had written to examine it in a power outage, and I'm going to be out of town for the weekend. Fortunately, there is an interesting issue I can write about today. It came to my attention because the blogger Anders wrote this comment at his site:
Hope this doesn’t the “clean exit”, but I thought I would post this figure from Gergis et als SI. It compares the main reconstruction (black) with one in which there was no screening and all 51 proxies were used (red dash) and one with no screening and using all the 36 proxies in the reconstruction domain (green dot). Doesn’t appear to be wild differences, but am not sure how the non-screening reonstructions would influence the 2SE.
I had seen this figure before, but I've been hung up on trying to replicate the screening results for the paper (as well as Steve McIntyre's stated results for it), so I hadn't paid much attention to it. Anders drawing my attention to it led me down a winding and strange path.
Being familiar with the proxies used in this study, that figure seemed wrong. Naturally, I decided to try to replicate it. My first step was to read the caption for it:
Figure S1.5. Same as Figure S1.3 but comparing the PCR reconstruction from the main text (black with grey 2SE-shading; option #1 in Table 1.3) with the reconstruction using all available records (red dashed; option #6) and the reconstructions using all available records from within the reconstruction domain (green dotted, option #7). The blue dash-dotted line represents a simple average of all available records (after scaling each record to mean 0 and standard deviation 1 over the 1921-1990 period and adjusting the sign based on the correlation with the raw/undetrended instrumental target).
So all I had to do was scale each proxy and flip it so its correlation with temperatures was positive. There's some strangeness here because proxies can have a positive correlation with regional temperatures and a negative correlation with "local" temperatures or vice versa. Regardless, I was able to produce this image:
Which seemed to match the authors' results for their simple average well enough. I then overlaid the Option #1 line they show and got this:
The authors didn't mention they rescaled the lines they show in that figure, but if we do so ourselves, we find:
This matches the authors' results closely enough that I didn't want to worry any further. The uptick in the modern portion was guaranteed, since all the proxies were (if necessary) flipped to show rising temperatures in the modern period. I don't know how much of a bias that might introduce. It doesn't really matter though. The interesting part to me was that these results, which seemed wrong, were right.
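For anyone who wants to try the scale-and-flip step from the caption themselves, here is a minimal sketch in Python. It is not the authors' code. The function names, the use of the population standard deviation, and the way missing values are skipped are all my assumptions, and the numbers are made-up toy data, not anything from the Gergis et al data set.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def scale_and_orient(years, proxy, target, start=1921, end=1990):
    """Scale a proxy to mean 0, sd 1 over [start, end] and flip its sign
    if it correlates negatively with the target over that window.

    years  : list of years, aligned with proxy
    proxy  : list of values, None where data is missing
    target : dict mapping year -> instrumental temperature (hypothetical)
    """
    # Calibration pairs: years in the window with both proxy and target data.
    pairs = [(v, target[y]) for y, v in zip(years, proxy)
             if start <= y <= end and v is not None and y in target]
    pvals = [p for p, _ in pairs]
    tvals = [t for _, t in pairs]
    mu = mean(pvals)
    sd = pstdev(pvals)  # assumption: population sd; the paper does not say
    sign = 1.0 if pearson(pvals, tvals) >= 0 else -1.0
    return [None if v is None else sign * (v - mu) / sd for v in proxy]

# Toy data: a proxy that falls while the target rises, so it gets flipped.
years = [1900, 1987, 1988, 1989, 1990]
proxy = [9.0, 4.0, 3.0, 2.0, 1.0]
target = {1987: 0.1, 1988: 0.2, 1989: 0.3, 1990: 0.4}

scaled = scale_and_orient(years, proxy, target)
print(scaled)
```

Notice what happens to the 1900 value in the toy data. It sits well outside the calibration window, so after scaling it lands much further from 0 than any value inside the window.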
I spent a little time examining the data, and I soon realized why my expectations were off. You see, one proxy in the Gergis et al data set is Palmyra (scaled and flipped so it has a positive correlation):
This proxy has one of the strongest upticks for the modern period of all the proxies in the Gergis et al data set. It also has significant periods without any data. Missing data like this causes odd problems when taking averages. Readers might recall I've discussed this issue in relation to work by one Steven Goddard, in which he produced strange results for the modern temperature record because he used simple averages.
For a brief summary, suppose you have three series. One measures 1, 1, 1, 1, 1. Another measures 2, 2, 2, 2, 2. A third measures 3, 3, 3, 3, 3. They're all horizontal lines. You can compare them by taking a baseline value off each (subtract 1, 2 and 3 from each series respectively). You can simply average them. Either way, you'll find you get a perfectly horizontal line when you compare the three series.
But what if you're missing data? Suppose the third series was actually: 3, -, 3, 3, 3. Obviously, your data still shows a horizontal line. That's not what a simple average shows though. A simple average shows 2, 1.5, 2, 2, 2. That's one of the reasons you shouldn't use simple averages with missing data.
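Here is that toy example as a short Python sketch, so you can check the numbers yourself. The function names are just mine, and `None` stands in for the missing value. It computes the simple average and the average after subtracting each series' own baseline.

```python
def mean_ignoring_gaps(values):
    """Average of the values that are present (None marks missing data)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

def average_series(series_list):
    """Average several aligned series point by point, skipping gaps."""
    return [mean_ignoring_gaps(column) for column in zip(*series_list)]

def to_anomalies(series):
    """Subtract the series' own mean (its baseline) from each value."""
    baseline = mean_ignoring_gaps(series)
    return [None if v is None else v - baseline for v in series]

a = [1, 1, 1, 1, 1]
b = [2, 2, 2, 2, 2]
c = [3, None, 3, 3, 3]  # the third series with one missing value

simple = average_series([a, b, c])
anomaly = average_series([to_anomalies(s) for s in (a, b, c)])

print(simple)   # [2.0, 1.5, 2.0, 2.0, 2.0]
print(anomaly)  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The simple average dips at the gap even though none of the three series changed. The baseline-adjusted average stays flat, because each series contributes 0 wherever it has data.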
Problems like that make the result I'm trying to replicate rather strange, since not all proxies go as far back in time as the rest, but the effect is particularly noticeable with the Palmyra proxy. Because the proxy was scaled over the 1921-1990 instrumental period, it is guaranteed to have a mean of 0 (and a standard deviation of 1) over that period. That means earlier periods will tend to diverge more from 0 than later periods. That in turn means the absolute value, and thus the weight when taking a simple average, will be greater. (To understand, consider how the example above would change if the series of 1s was replaced with a series of 5s.)
The point of all this is Palmyra is a somewhat strange proxy to use for a reconstruction given its spotty coverage, but it is especially strange when taking a simple average. Naturally, we should check what happens if you don't include it:
Past temperatures are notably higher when taking a simple average if we don't include the Palmyra proxy. Recent temperatures are slightly lower as well. As a result, if we don't include Palmyra in our simple average, past temperatures are significantly higher than present ones. To make the effect of Palmyra clearer, this is the effect that adding it to the simple average has:
This isn't an earth shattering result, but I thought it was interesting. I also thought it'd give me a good reason to talk about a strange issue with this paper. Back when the original version of this paper was put online, a person noticed a number of proxies were assigned to the wrong location. He wrote:
The study is a “temperature reconstruction for the combined land and oceanic region of Australasia (0°S-50°S, 110°E-180°E)“. The study lists Palmyra Atoll as being at 6° S, 162° E, so within the study area. Wikipedia has the location at 5°52′ N, 162°06′ W, or over 2100Km (1300 miles) outside the study area. On a similar basis, Rarotunga in the Cook Islands (for which there are two separate coral proxy studies), is listed as being at 21° S, 160° E. Again well within the study area. Wikipedia has the location at 21° 14′ 0″ S, 159° 47′ 0″ W, or about 2000Km (1250 miles) outside the study area. The error has occurred due to a table with columns headed “Lon (°E)”, and “Lat (°S). Along with the two ice core studies from Vostok Station, Antarctica (Over 3100km, 1900 miles south of 50° S) there are 5 of the 27 proxies that are significantly outside the region.
Unfortunately, he missed an important detail about the paper. While it is true the original version (and the current one) claims to reconstruct temperatures for the "region of Australasia (0°S-50°S, 110°E-180°E)," the authors clearly noted all along:
Our temperature proxy network was drawn from a broader Australasian domain (90°E–140°W, 10°N–80°S)
So these proxies are within the domain used to find data to use. It's not clear to me how well a proxy could represent temperatures over a thousand miles away. Whatever the case, one thing is certain. At least three proxies used in this data set were mislocated, and all three passed screening based on correlation to "local" temperatures. For two of the proxies, the correlation to "local" temperatures was far greater than the correlation to regional temperatures.
Now, the two Rarotonga proxies were not used simultaneously. The authors explain that for each screening criterion they used, they kept only the one with the highest correlation. That means that while three proxies were mislocated, only two mislocated proxies were used in any case involving screening (both Rarotonga proxies were used in the simple average approach). That is still 2 out of 28 proxies.
It's possible there are even more, as I don't know that anybody has checked the listed locations for each proxy. Even if there aren't though, it's bad when a full 7% of the data you use for your results has the wrong location. Maybe these proxies would have passed the screening test if they had been placed in the right location, but how useful is "local" correlation screening if proxies are going to pass it against temperatures ~1500 miles away?