A question has been bugging me for a while. I'm hesitant to ask it because I feel I might be missing something incredibly obvious. However, after seeing the latest two posts at the blogger Anders's place, I feel I need to ask it. Please try not to be too harsh on me if it's as stupid as I worry it might be.
The latest post is a guest post by Zeke Hausfather, who begins:
Much of the confusion when comparing the different versions of NOAA’s ocean temperature dataset comes down to how the transition from ships to buoys in the dataset is handled. The root of the problem is that buoys and ships measure temperatures a bit differently. Ships take their temperature measurements in engine room intake valves, where water is pulled through the hull to cool the engine, while buoys take their temperature measurements from instruments sitting directly in the water. Unsurprisingly, ship engine rooms are warm; water measured in ship engine rooms tends to be around 0.1 degrees C warmer than water measured directly in the ocean. The figure below shows an illustrative example of what measurements from ships and buoys might look like over time:
Now, this approach of simply averaging together ships and buoys is problematic. Because there is an offset between the two, the resulting combined record shows much less warming than either the ships or the buoys would on their own. Recognizing that this introduced a bias into their results, NOAA updated their record in version 4 to adjust buoys up to the ship record, resulting in a combined record much more similar to a buoy-only or ship-only record:
Here we see that the combined record is nearly identical to both records, as the offset between ships and buoys has been removed. However, this new approach came under some criticism from folks who considered the buoy data more accurate than the ship data. Why, they asked, would NOAA adjust high quality buoys up to match lower-quality ship data, rather than the other way around? While climate scientists pointed out that this didn’t really matter, that you would end up with the same results if you adjusted buoys up to ships or ships down to buoys, critics persisted in making a big deal out of this. As a response, NOAA changed to adjusting ships down to match buoys in the upcoming version 5 of their dataset. When you adjust ships down to buoys in our illustrative example, you end up with something that looks like this:
The lines are identical, except that the y-axis is 0.1 C lower when ships are adjusted down to buoys. Because climate scientists work with temperature anomalies (e.g. change relative to some baseline period like 1961-1990), this has no effect on the resulting data. Indeed, the trend in the data (e.g. the amount of warming the world has experienced) is unchanged.
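For anyone who wants to poke at this illustrative example, here is a minimal Python sketch of it. Every number in it is made up for illustration and none comes from NOAA or from the charts: a 0.02 C per year trend, a 30-year span, a buoy share that rises linearly from 0 to 1, and the first ten years as the baseline period. The only value taken from the text is the 0.1 C ship-buoy offset.

```python
# Toy version of the ship/buoy example. All values except OFFSET are
# assumptions chosen for illustration only.
YEARS = 30
TRUE_TREND = 0.02   # deg C per year (assumed)
OFFSET = 0.1        # ships read this much warmer than buoys (from the text)
BASELINE_YEARS = 10  # assumed baseline period: the first ten years


def slope(ys):
    """Ordinary least-squares slope of ys against the index 0..n-1."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in enumerate(ys))
    den = sum((x - xbar) ** 2 for x in range(n))
    return num / den


def anomalies(ys):
    """Subtract the mean over the baseline period from every value."""
    base = sum(ys[:BASELINE_YEARS]) / BASELINE_YEARS
    return [y - base for y in ys]


buoys = [TRUE_TREND * t for t in range(YEARS)]
ships = [b + OFFSET for b in buoys]
# Fraction of observations coming from buoys in each year (assumed linear).
buoy_share = [t / (YEARS - 1) for t in range(YEARS)]


def combine(ship_series, buoy_series):
    """Average the two platforms, weighted by their share of observations."""
    return [(1 - w) * s + w * b
            for s, b, w in zip(ship_series, buoy_series, buoy_share)]


naive = combine(ships, buoys)                             # no adjustment
buoys_up = combine(ships, [b + OFFSET for b in buoys])    # buoys -> ships
ships_down = combine([s - OFFSET for s in ships], buoys)  # ships -> buoys

print("trend, unadjusted :", slope(naive))
print("trend, buoys up   :", slope(buoys_up))
print("trend, ships down :", slope(ships_down))
```

In this toy setup the unadjusted combination warms more slowly than either platform on its own, while the two adjusted versions differ only by a constant 0.1 C, so their anomalies (via `anomalies`) and their trends match.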
Now, I'm going to leave aside that the choice of which data set one uses as the target series can in fact change one's results despite what Zeke's over-simplification might tell you (note, his charts aren't the output of any actual analysis). The size of that effect should be small (at global levels), and it isn't really important to what's been bugging me.
You see, what's been bugging me is the very thing Zeke shows us - if two series have different average values over a period, you can remove that effect by adjusting their baseline values before combining them. With that in mind, consider that the post before Zeke's says:
The fundamental point is that it has become clear that there is a difference between the readings from ships and the readings from buoys. This discrepancy needs to be reconciled, but it doesn’t matter whether you adjust the ships to the buoys, or the buoys to the ships; ultimately anomalies will be computed. The data that is used will be relative to a baseline, so it doesn’t matter if you move one up, or the other down.
It then quotes a paper from nearly ten years ago:
Because ships tend to be biased warm relative to buoys and because of the increase in the number of buoys and the decrease in the number of ships, the merged in situ data without bias adjustment can have a cool bias relative to data with no ship–buoy bias. As buoys become more important to the in situ record, that bias can increase. Since the 1980s the SST in most areas has been warming. The increasing negative bias due to the increase in buoys tends to reduce this recent warming. This change in observations makes the in situ temperatures up to about 0.1°C cooler than they would be without bias. At present, methods for removing the ship–buoy bias are being developed and tested.
The requirement to make an adjustment because of a ship-buoy bias has, therefore, been known for almost 10 years. My understanding is that Karl et al. didn’t even actually make this adjustment, they simply included this new dataset in their analysis to compute global surface temperatures.
Based on these two posts, we can say there is a bias between the absolute temperatures measured by buoys and ships, a bias like this can be removed by re-baselining series before combining them, and it doesn't matter which series you make the change to. Given all that, I have to ask, why is there any "requirement to make an adjustment"?
I know, it sounds stupid, right? We just went over how there is a bias between the two data sets, so clearly, that bias should be addressed. It's a serious question though. To understand why I ask it, let's re-visit a post from a while back. In that post, I discussed how a person named Steven Goddard produced bogus results (which were then promoted on the floor of Congress) by ignoring how there can be biases between data series. For a simple explanation of one point:
To see the difference, suppose you and I were wanting to create a temperature record for the United States. Now suppose we only had temperatures for the area we live in. I live in Illinois, and let's say you live in... Florida. We both check the temperatures outside. I find it's 50; you find it's 70. We average that together and say the United States's average temperature is 60 degrees.
Obviously, that's not right. There are tons of areas we don't have data for. Our results aren't going to be very good. Still, they're the best we can do with what we have. Because of that, we keep repeating this process every day for the next year. But then, next year, you move to New Jersey.
Now, when you check the day's temperature, you find it is only 35 degrees. A year has passed, and since it is the same time of year as before, I again find it is 50 degrees outside. We average 50 and 35 together, and the result is 42.5 degrees. Last year we got 60 degrees; this year we get 42.5. Do we conclude the country has cooled by nearly 20 degrees?
Of course not. When you create a temperature record, you have to account for where the data comes from. One simple way of doing this is to use what are called "anomalies." Anomalies tell us how much a value has varied from some "normal" amount. If temperatures are usually ~50 degrees where I live, then today when temperatures are 50 degrees, the anomaly is 0. Tomorrow when the temperatures are 48 degrees, the anomaly will be -2.
It's easy to see how this would impact our approach. Instead of averaging 50 and 70 together when you lived in Florida, if we were both experiencing the "normal" temperature for our area, we'd both have an anomaly of 0. That'd give us the result of an average anomaly of 0.
Now, maybe that 35 degrees for New Jersey was a cold day. Maybe the "normal" temperature there would have been 40. In that case, the anomaly you would have measured is -5. Since 50 is still a "normal" day here in Illinois, my anomaly would be 0. Average those together, and the result is -2.5. That says temperatures are 2.5 degrees colder than they were before, a far smaller amount than the 17.5 degrees we got when we didn't use anomalies.
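The arithmetic in that example can be checked with a few lines of Python. The "normal" values are the ones used above: 50 for Illinois, 40 for New Jersey, and 70 for Florida, which rests on the example's assumption that the Florida reading was a normal day.

```python
# "Normal" temperature for each location, as used in the example above.
# Florida's 70 rests on the assumption that day was a normal one.
NORMALS = {"Illinois": 50.0, "Florida": 70.0, "New Jersey": 40.0}

year1 = {"Illinois": 50.0, "Florida": 70.0}      # before the move
year2 = {"Illinois": 50.0, "New Jersey": 35.0}   # after the move


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_absolute(readings):
    """Average the raw temperatures, ignoring where they came from."""
    return mean(readings.values())


def average_anomaly(readings):
    """Average each reading's departure from its own location's normal."""
    return mean(temp - NORMALS[place] for place, temp in readings.items())


absolute_change = average_absolute(year2) - average_absolute(year1)
anomaly_change = average_anomaly(year2) - average_anomaly(year1)

print("change using raw temperatures:", absolute_change)  # -17.5
print("change using anomalies:       ", anomaly_change)   # -2.5
```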
There is more to what I said back then, but the basic point is when you have many temperature series and want to combine them to see how temperatures change over time, you have to account for the fact their average values are different. That's true whether the data is temperature station data, ship data, buoy data or anything else. Just like Zeke and Anders explained, you have to account for the fact there are differences in the average value of the series you're combining.
This is a simple point. It shouldn't surprise anyone. As such, it shouldn't surprise anyone that the paper where the adjustment in question originated (as Anders indicates, it wasn't made by Karl et al.) says:
The ship and buoy SSTs that have passed QC were then converted into SSTAs by subtracting the SST climatology (1971–2000) at their in situ locations in monthly resolution. The ship SSTA was adjusted based on the NMAT comparators; buoy SSTA was adjusted by a mean difference of 0.12°C between ship and buoy observations (section 5). The ship and buoy SSTAs were merged and bin-averaged into monthly “superobservations” on a 2° × 2° grid.
This paper, Huang et al. (2015), is the source of the adjustment in question. The data set described in the paper was then used in another paper, Karl et al. (2015), which got a lot of attention. That is the paper causing all the commotion at places like Anders's blog.
The thing is, Huang et al. (2015) clearly says it converted its ship and buoy data into anomalies (SSTAs) by subtracting out a baseline value from each individual series. It was only after this re-baselining that the series were combined. This means all the data series were re-baselined over the same period, then a subset of them (buoy data series) were shifted up again to account for a difference in baselines...?
I feel like I must be missing something rather obvious. Here is Huang et al. (2015) discussing how they came up with this adjustment:
In addition to the ship SST bias adjustment, the drifting and moored buoy SSTs in ERSST.v4 are adjusted toward ship SSTs, which was not done in ERSST.v3b. Since 1980 the global marine observations have gone from a mix of roughly 10% buoys and 90% ship-based measurements to 90% buoys and 10% ship measurements (Kennedy et al. 2011). Several papers have highlighted, using a variety of methods, differences in the random biases, and a systematic difference between ship-based and buoy-based measurements, with buoy observations systematically cooler than ship observations (Reynolds et al. 2002, 2010; Kent et al. 2010; among others). Here the adjustment is determined by 1) calculating the collocated ship-buoy SST difference over the global ocean from 1982 to 2012, 2) calculating the global areal weighted average of ship-buoy SST difference, 3) applying a 12-month running filter to the global averaged ship-buoy SST difference, and 4) evaluating the mean difference and its STD of ship-buoy SSTs based on the data from 1990 to 2012 (the data are noisy before 1990 due to sparse buoy observations). The mean difference of ship-buoy data between 1990 and 2012 is 0.12°C with a STD of 0.04°C (all rounded to hundredths in precision).
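To make those four steps concrete, here is a rough Python sketch of them run on synthetic data. It is not the authors' code and uses no real observations. The latitude bands, the noise levels, the cosine-of-latitude area weights, the trailing (rather than centered) 12-month window and the use of a population standard deviation are all my assumptions. A 0.12 C offset is planted in the fake data so the procedure has something to recover.

```python
import math
import random
import statistics

rng = random.Random(0)          # fixed seed so the sketch is repeatable
LATS = [-60, -30, 0, 30, 60]    # hypothetical latitude bands
MONTHS = [(y, m) for y in range(1982, 2013) for m in range(1, 13)]
PLANTED_OFFSET = 0.12           # offset built into the synthetic data
WINDOW = 12                     # length of the running filter, in months


def synthetic_difference(year):
    """Fake collocated ship-minus-buoy difference; noisier before 1990."""
    noise = 0.3 if year < 1990 else 0.1
    return PLANTED_OFFSET + rng.gauss(0.0, noise)


# Step 1: collocated ship-buoy differences for every month and band.
diffs = {(ym, lat): synthetic_difference(ym[0])
         for ym in MONTHS for lat in LATS}

# Step 2: area-weighted global average for each month
# (cos(latitude) weights are an assumption).
weights = {lat: math.cos(math.radians(lat)) for lat in LATS}
total_weight = sum(weights.values())
global_diff = [
    sum(weights[lat] * diffs[(ym, lat)] for lat in LATS) / total_weight
    for ym in MONTHS
]

# Step 3: 12-month running mean, labelled by the last month in each window.
smoothed = [
    (MONTHS[i], sum(global_diff[i - WINDOW + 1:i + 1]) / WINDOW)
    for i in range(WINDOW - 1, len(MONTHS))
]

# Step 4: mean and standard deviation over 1990-2012 only.
late = [value for (year, _), value in smoothed if 1990 <= year <= 2012]
mean_diff = statistics.mean(late)
std_diff = statistics.pstdev(late)

print(f"mean ship-buoy difference 1990-2012: {mean_diff:.2f}")
print(f"standard deviation:                  {std_diff:.2f}")
```

Note that every quantity in this sketch is a difference of absolute temperatures. Nothing in it is converted to anomalies, which matches how the quoted description reads.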
Notice, this doesn't say a word about series converted to anomalies. You don't see "SSTA" in this description. This means the authors estimated the size of a bias in the data series they used as input, converted all those data series to anomalies, then made an adjustment to a subset of the anomaly series (the buoy series) based upon the bias they estimated between the series before converting any to anomalies.
Why? Why is that last step necessary? I get the idea a bias exists in the data before converting the data series to anomalies. What I don't get is why that means we need to adjust the series which have been converted to anomalies. Why doesn't converting all the series to anomalies put them on the same baseline? I thought that was the entire point of converting them to anomalies.
I feel like I must be missing something incredibly obvious since this was done by a group of scientists, was published in a scientific journal and has been seen and discussed by at least tens of thousands of people (many of whom disliked the work). This seems like such an obvious question I don't see how nobody else would have noticed it. I just can't figure out what I'm missing.
So can someone help me out? Can someone explain to me why this is a stupid question?