This May Be a Stupid Question

A question has been bugging me for a while. I'm hesitant to ask it because I feel I might be missing something incredibly obvious. However, after seeing the latest two posts at the blogger Anders's place, I feel I need to ask it. Please try not to be too harsh on me if it's as stupid as I worry it might be.

The latest post is a guest post by Zeke Hausfather, who begins:

Much of the confusion when comparing the different versions of NOAA’s ocean temperature dataset comes down to how the transition from ships to buoys in the dataset is handled. The root of the problem is that buoys and ships measure temperatures a bit differently. Ships take their temperature measurements in engine room intake valves, where water is pulled through the hull to cool the engine, while buoys take their temperature measurements from instruments sitting directly in the water. Unsurprisingly, ship engine rooms are warm; water measured in ship engine rooms tends to be around 0.1 degrees C warmer than water measured directly in the ocean. The figure below shows an illustrative example of what measurements from ships and buoys might look like over time:

Now, this approach of simply averaging together ships and buoys is problematic. Because there is an offset between the two, the resulting combined record shows much less warming than either the ships or the buoys would on their own. Recognizing that this introduced a bias into their results, NOAA updated their record in version 4 to adjust buoys up to the ship record, resulting in a combined record much more similar to a buoy-only or ship-only record:

Here we see that the combined record is nearly identical to both records, as the offset between ships and buoys has been removed. However, this new approach came under some criticism from folks who considered the buoy data more accurate than the ship data. Why, they asked, would NOAA adjust high quality buoys up to match lower-quality ship data, rather than the other way around? While climate scientists pointed out that this didn’t really matter, that you would end up with the same results if you adjusted buoys up to ships or ships down to buoys, critics persisted in making a big deal out of this. As a response, NOAA changed to adjusting ships down to match buoys in the upcoming version 5 of their dataset. When you adjust ships down to buoys in our illustrative example, you end up with something that looks like this:

The lines are identical, except that the y-axis is 0.1 C lower when ships are adjusted down to buoys. Because climate scientists work with temperature anomalies (e.g. change relative to some baseline period like 1961-1990), this has no effect on the resulting data. Indeed, the trend in the data (e.g. the amount of warming the world has experienced) is unchanged.

Now, I'm going to leave aside that the choice of which data set one uses as the target series can in fact change one's results, despite what Zeke's over-simplification might tell you (note, his charts aren't the output of any actual analysis). The size of that effect should be small (at global levels), and it isn't really important to what's been bugging me.
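In fact, Zeke's point about the offset is easy to verify with made-up numbers. Here's a quick sketch in R (the values are hypothetical, not taken from any data set):

    ships <- c(14.32, 14.35, 14.40, 14.38, 14.45)  # made-up absolute SSTs
    buoys <- ships - 0.1                           # same shape, 0.1 C cooler

    # Version 4 style: adjust buoys up to ships, then average the two.
    v4 <- (ships + (buoys + 0.1)) / 2
    # Version 5 style: adjust ships down to buoys, then average.
    v5 <- ((ships - 0.1) + buoys) / 2

    v4 - v5                                  # a constant 0.1 everywhere
    all.equal(v4 - mean(v4), v5 - mean(v5))  # TRUE: identical anomalies

Either choice shifts the whole record by a constant, so once you take anomalies the two versions are indistinguishable.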

You see, what's been bugging me is the very thing Zeke shows us - if two series have different average values over a period, you can remove that effect by adjusting their baseline values before combining them. With that in mind, consider that the post before Zeke's says:

The fundamental point is that it has become clear that there is a difference between the readings from ships and the readings from buoys. This discrepancy needs to be reconciled, but it doesn’t matter whether you adjust the ships to the buoys, or the buoys to the ships; ultimately anomalies will be computed. The data that is used will be relative to a baseline, so it doesn’t matter if you move one up, or the other down.

It then quotes a paper from nearly ten years ago:

Because ships tend to be biased warm relative to buoys and because of the increase in the number of buoys and the decrease in the number of ships, the merged in situ data without bias adjustment can have a cool bias relative to data with no ship–buoy bias. As buoys become more important to the in situ record, that bias can increase. Since the 1980s the SST in most areas has been warming. The increasing negative bias due to the increase in buoys tends to reduce this recent warming. This change in observations makes the in situ temperatures up to about 0.1°C cooler than they would be without bias. At present, methods for removing the ship–buoy bias are being developed and tested.

And says:

The requirement to make an adjustment because of a ship-buoy bias has, therefore, been known for almost 10 years. My understanding is that Karl et al. didn’t even actually make this adjustment, they simply included this new dataset in their analysis to compute global surface temperatures.

Based on these two posts, we can say there is a bias between the absolute temperatures measured by buoys and ships, a bias like this can be removed by re-baselining series before combining them, and it doesn't matter which series you make the change to. Given all that, I have to ask, why is there any "requirement to make an adjustment"?

I know, it sounds stupid, right? We just went over how there is a bias between the two data sets, so clearly, that bias should be addressed. It's a serious question though. To understand why I ask it, let's revisit a post from a while back. In that post, I discussed how a person named Steven Goddard produced bogus results (which were then promoted on the floor of Congress) by ignoring how there can be biases between data series. For a simple explanation of one point:

To see the difference, suppose you and I were wanting to create a temperature record for the United States. Now suppose we only had temperatures for the area we live in. I live in Illinois, and let's say you live in... Florida. We both check the temperatures outside. I find it's 50; you find it's 70. We average that together and say the United States's average temperature is 60 degrees.

Obviously, that's not right. There are tons of areas we don't have data for. Our results aren't going to be very good. Still, they're the best we can do with what we have. Because of that, we keep repeating this process every day for the next year. But then, next year, you move to New Jersey.

Now, when you check the day's temperature, you find it is only 35 degrees. A year has passed, and since it is the same time of year as before, I again find it is 50 degrees outside. We average 50 and 35 together, and the result is 42.5 degrees. Last year we got 60 degrees; this year we get 42.5. Do we conclude the country has cooled by nearly 20 degrees?

Of course not. When you create a temperature record, you have to account for where the data comes from. One simple way of doing this is to use what are called "anomalies." Anomalies tell us how much a value has varied from some "normal" amount. If temperatures are usually ~50 degrees where I live, then today when temperatures are 50 degrees, the anomaly is 0. Tomorrow when the temperatures are 48 degrees, the anomaly will be -2.

It's easy to see how this would impact our approach. Instead of averaging 50 and 70 together when you lived in Florida, if we were both experiencing the "normal" temperature for our area, we'd both have an anomaly of 0. That'd give us the result of an average anomaly of 0.

Now, maybe that 35 degrees for New Jersey was a cold day. Maybe the "normal" temperature there would have been 40. In that case, the anomaly you would have measured is -5. Since 50 is still a "normal" day here in Illinois, my anomaly would be 0. Average those together, and the result is -2.5. That says temperatures are 2.5 degrees colder than they were before, a far smaller amount than the 17.5 degrees we got when we didn't use anomalies.
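If it helps to see that arithmetic spelled out, here's the same example as a small R sketch (using the made-up numbers from above):

    # Raw readings: Illinois both years; Florida in year 1, New Jersey in year 2.
    illinois <- c(year1 = 50, year2 = 50)
    partner  <- c(year1 = 70, year2 = 35)

    # Naive approach: average the raw readings each year.
    (illinois + partner) / 2  # 60.0, then 42.5 -- an apparent 17.5 degree drop

    # Anomaly approach: subtract each location's "normal" temperature first.
    normals <- c(year1 = 70, year2 = 40)         # FL normal, then NJ normal
    ((illinois - 50) + (partner - normals)) / 2  # 0, then -2.5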

There is more to what I said back then, but the basic point is that when you have many temperature series and want to combine them to see how temperatures change over time, you have to account for the fact that their average values are different. That's true whether the data is temperature station data, ship data, buoy data or anything else. Just like Zeke and Anders explained, you have to account for the fact that there are differences in the average value of the series you're combining.

This is a simple point. It shouldn't surprise anyone. As such, it shouldn't surprise anyone that the paper where the adjustment in question originated (as Anders indicates, it wasn't made by Karl et al.) says:

The ship and buoy SSTs that have passed QC were then converted into SSTAs by subtracting the SST climatology (1971–2000) at their in situ locations in monthly resolution. The ship SSTA was adjusted based on the NMAT comparators; buoy SSTA was adjusted by a mean difference of 0.12°C between ship and buoy observations (section 5). The ship and buoy SSTAs were merged and bin-averaged into monthly “superobservations” on a 2° × 2° grid.

This paper, Huang et al. (2015), is the source of the adjustment in question. The data set described in the paper was then used in another paper, Karl et al. (2015), which got a lot of attention. That is the paper causing all the commotion at places like Anders's blog.

The thing is, Huang et al. (2015) clearly says it converted its ship and buoy data into anomalies (SSTAs) by subtracting out a baseline value from each individual series. It was only after this re-baselining that the series were combined. This means all the data series were re-baselined over the same period, then a subset of them (the buoy series) was shifted up again to account for a difference in baselines...?

I feel like I must be missing something rather obvious. Here is Huang et al. (2015) discussing how they came up with this adjustment:

In addition to the ship SST bias adjustment, the drifting and moored buoy SSTs in ERSST.v4 are adjusted toward ship SSTs, which was not done in ERSST.v3b. Since 1980 the global marine observations have gone from a mix of roughly 10% buoys and 90% ship-based measurements to 90% buoys and 10% ship measurements (Kennedy et al. 2011). Several papers have highlighted, using a variety of methods, differences in the random biases, and a systematic difference between ship-based and buoy-based measurements, with buoy observations systematically cooler than ship observations (Reynolds et al. 2002, 2010; Kent et al. 2010; among others). Here the adjustment is determined by 1) calculating the collocated ship-buoy SST difference over the global ocean from 1982 to 2012, 2) calculating the global areal weighted average of ship-buoy SST difference, 3) applying a 12-month running filter to the global averaged ship-buoy SST difference, and 4) evaluating the mean difference and its STD of ship-buoy SSTs based on the data from 1990 to 2012 (the data are noisy before 1990 due to sparse buoy observations). The mean difference of ship-buoy data between 1990 and 2012 is 0.12°C with a STD of 0.04°C (all rounded to hundredths in precision).
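As best I can tell, that recipe amounts to something like the following rough R sketch. To be clear, this is just my reading of the four steps, run on made-up data with equal area weights and a placeholder noise model; none of it comes from the paper's actual code:

    set.seed(7)
    months  <- seq(as.Date("1982-01-01"), as.Date("2012-12-01"), by = "month")
    n_cells <- 100

    # Stand-in for step 1: hypothetical collocated ship-buoy differences per
    # grid cell, with a true offset of 0.12 C and noisier data before 1990.
    noise_sd  <- ifelse(months < as.Date("1990-01-01"), 0.3, 0.1)
    diff_grid <- 0.12 +
      matrix(rnorm(length(months) * n_cells), nrow = length(months)) * noise_sd

    # Step 2: global average of the difference (equal weights for simplicity).
    global_diff <- rowMeans(diff_grid)

    # Step 3: 12-month running filter.
    smoothed <- stats::filter(global_diff, rep(1 / 12, 12), sides = 2)

    # Step 4: mean difference and its STD over 1990-2012 only.
    keep <- months >= as.Date("1990-01-01")
    c(mean = mean(smoothed[keep], na.rm = TRUE),
      sd   = sd(smoothed[keep], na.rm = TRUE))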

Notice, the quoted description doesn't say a word about series converted to anomalies. You don't see "SSTA" anywhere in it. This means the authors estimated the size of a bias in the data series they used as input, converted all those data series to anomalies, then made an adjustment to a subset of the anomaly series (the buoy series) based upon the bias they estimated between the series before converting any to anomalies.

Why? Why is that last step necessary? I get the idea a bias exists in the data before converting the data series to anomalies. What I don't get is why does that mean we need to adjust the series which have been converted to anomalies? Why doesn't converting all the series to anomalies put them on the same baseline? I thought that was the entire point of converting them to anomalies.
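To show where my confusion comes from, here's what I had pictured happening, sketched in R with made-up numbers. If each series is re-baselined against its own mean, a constant offset simply disappears:

    set.seed(42)
    n    <- 360                                      # 30 years of monthly data
    buoy <- 0.01 * (1:n) / 12 + rnorm(n, sd = 0.05)  # a small trend plus noise
    ship <- buoy + 0.12                              # ships a constant 0.12 C warmer

    # Anomalize each series against its own baseline mean.
    buoy_anom <- buoy - mean(buoy)
    ship_anom <- ship - mean(ship)

    max(abs(ship_anom - buoy_anom))  # essentially 0: the offset is gone

If that were what happened, no further ship-buoy adjustment would seem necessary, which is exactly why the extra step confuses me.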

I feel like I must be missing something incredibly obvious since this was done by a group of scientists, was published in a scientific journal and has been seen and discussed by at least tens of thousands of people (many of whom disliked the work). This seems like such an obvious question I don't see how nobody else would have noticed it. I just can't figure out what I'm missing.

So can someone help me out? Can someone explain to me why this is a stupid question?

32 comments

  1. I need to think about that a bit, but it seems a valid question, which may be elucidated by a better understanding of how the anomalies are actually calculated (pre, post and sans adjustment).

    However, something else about this whole exercise strikes me as weirdly wrong from the get-go.

    - The buoys are expected to be more 'absolutely correct' in their real data (more consistent, better precision, etc.).
    - The number of buoys has been steadily increasing (10% -> 90% of the data set) whilst the ships as a proportion of the data set are steadily decreasing.
    - It is conceivable that in the near future there may be no ships at all, or at least their proportion of the data set will become vanishingly small.

    Given that scenario, regardless of the method and validity of the corrections, what would possess someone to adjust the better and growing data set to match the lower-quality and proportionally shrinking (and possibly time-limited) data set?

    Since I consider that to be a fundamental flaw in the reasoning behind the entire process, I have tended to discount much of the temperature data manipulations as highly mystical hocus pocus, with very clever maths but little real actual value.

  2. The ship and buoy SSTs that have passed QC were then converted into SSTAs by subtracting the SST climatology (1971–2000) at their in situ locations in monthly resolution.

    Okay, I'm not completely sure about this, but I think the buoy data does not extend far enough back to produce a 1971-2000 baseline from their data alone. Therefore the SSTAs are determined from a common 1971-2000 baseline for their locations (i.e., ship measurements and buoy measurements would have the same baseline for the same location). Therefore, the SSTAs will still be biased. That's my understanding, but Zeke can probably clarify.

    I guess that expecting you to avoid things like this is a bit much to ask?

  3. Peter Green, I imagine the authors never intended this to be a final approach used for the rest of time. Once enough time had passed and buoy data became far more dominant, I'm sure they would have expected to switch the approach. I get that. I just think it would have been more sensible to either outright state that the decision has no effect, as Zeke appears to be indicating, or display the effect so people could see how large/small it is.

    The latter was actually done by John Kennedy and his group to demonstrate the effect their choice of target series had on their results. I respect that. I also think it was helpful as it demonstrated the choice had little effect on their global results but was quite meaningful in some regional results. Not showing people that would have been wrong.

    (Which isn't to say Huang et al. did anything wrong. The difference in their methodology might ensure their results wouldn't show the same effect. Still, the fact they didn't even attempt to clearly show/state what the effect is annoys me.)

  4. Anders:

    Okay, I'm not completely sure about this, but I think the buoy data does not extend far enough back to produce a 1971-2000 baseline from their data alone. Therefore the SSTAs are determined from a common 1971-2000 baseline for their locations (i.e., ship measurements and buoy measurements would have the same baseline for the same location). Therefore, the SSTAs will still be biased. That's my understanding, but Zeke can probably clarify.

    That's an interesting idea, but there's no mention of it in Huang et al. I also don't think it could work, as the authors say they calculated the ship-buoy bias based upon the data prior to taking anomalies. If there were an issue with insufficient data in the climatological period introducing a bias in how the ship/buoy data gets re-baselined, you would have to examine the anomalies to quantify it. Under this idea, the bias in the un-anomalized data wouldn't be the same as the bias in the anomalized data.

    I guess that expecting you to avoid things like this is a bit much to ask?

    I hope you'll forgive me if I don't bother to look at whatever past gripe you might be talking about. I wrote a post to ask a question. I can't imagine pursuing whatever tangent you have in mind would help me find an answer to it.

  5. Brandon,
    I think they did calculate the bias before taking anomalies. They then calculated the anomalies using a common baseline, which means the bias is still present. Then they removed the bias.

    I hope you'll forgive me if I don't bother to look at whatever past gripe you might be talking about. I wrote a post to ask a question. I can't imagine pursuing whatever tangent you have in mind would help me find an answer to it.

    Not really a tangent. Just a suggestion that if you think you might want to ask someone a question in future, don't go around calling them a dick. Each to their own, of course.

  6. Anders:

    I think they did calculate the bias before taking anomalies. They then calculated the anomalies using a common baseline, which means the bias is still present. Then they removed the bias.

    A bias might still be present in the scenario you describe, but it would not be the same in the anomalized data set as in the non-anomalized data set. The anomalization process would necessarily remove part of the bias present in the non-anomalized data.

    Not really a tangent. Just a suggestion that if you think you might want to ask someone a question in future, don't go around calling them a dick. Each to their own, of course.

    I didn't ask you a question though. I asked a question of everybody. That I might have called someone a dick in the past doesn't mean I will find it more difficult to ask open questions. If someone I've said negative things about at some point in my life chooses not to answer an open question I ask, so be it.

    But this is definitely a tangent unrelated to the question I asked. As such, I'm not going to pursue it any further. You can talk about it more if you want.

  7. I have no doubt they would need to switch, I guess I would just be happier with doing it right and doing it once. I understand that isn't always possible, but in my eyes this particular series presents no apparent benefit from doing it one way vs the other, so why not just do it in the way that stands the test of time?

    Anyway, that's just one of my personal foibles, and I do want to consider your original question a bit more.

  8. Sorry, I also have a little trouble with this sentence from Zeke above:

    "Because there is an offset between the two, the resulting combined record shows much less warming than either the ships or the buoys would on their own".

    How can combining the two end up with a lesser result than either?

  9. Peter Green, without investigating what (if any) impact the decision would have on the results, I can't be sure that one choice would be "better." I can certainly see reasons to use the ship data as the target series though. Depending on the ratios of data, it's not difficult for me to believe the ship data might provide a more stable target. It would depend on how much data of each type was available and where.

    On a practical level, I can also see why one might believe inverting the choice wouldn't have prevented criticisms. I wouldn't be surprised if Skeptics would have complained about adjusting ship data prior to 1980 since we don't have any buoy data in that period.

    The other way may have been better, but I can't say I know that to be true at the moment. As it stands, I'm content not to worry about which choice was made. I'd like to know the exact effect the choice had, but beyond that, I doubt I'll pay any attention to the issue. I mean, it's already been announced they're inverting the decision for the next version of the data set.

    Anyway, that's just one of my personal foibles, and I do want to consider your original question a bit more.

    For what it's worth, there are often little things in people's work that bug me for not being optimal. I get it. I just don't happen to find this one interesting or annoying enough to think about much.

  10. Peter Green:

    How can combining the two end up with a lesser result than either?

    Both data series have the same trend, but they have different baselines. Obviously, if we combine them properly, the result should have the same trend as either series. If we don't account for the different baselines, it won't. It won't because there will be a step-wise change in the average value of the combined series when one of the input series begins.

    It's the same basic issue as the station drop-out issue I discussed in the post about using anomalies for temperature station data, quoted in this post. It won't happen if all the data series cover the same periods of time. It happens because one series ends (or begins) at a different point than the other, meaning the influence of its baseline doesn't remain consistent throughout the entire period.
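    Here's a quick R sketch of the effect with illustrative numbers: two series with the same trend, where the one with the 0.1 C cooler baseline only begins partway through the record:

        years <- 1980:2010
        ship  <- 0.02 * (years - 1980)  # 0.02 C/yr trend on the ship baseline
        buoy  <- ship - 0.1             # same trend, baseline 0.1 C cooler
        buoy[years < 1995] <- NA        # no buoy data before 1995

        # Naive combination: average whatever is available each year.
        combined <- rowMeans(cbind(ship, buoy), na.rm = TRUE)

        # A 0.05 C step down appears in 1995 when buoys enter the average, so
        # the fitted trend is lower than the 0.02 C/yr of either input series.
        coef(lm(combined ~ years))["years"]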

  11. I don't get that this would be a step change so much as a gradual bias shift over time, which I am quite happy to see resolved.

    However, since I started my working life as an Instrument Fitter, performing gauge calibration and dealing with measurement, I am used to seeing operators read an analogue gauge with 5 PSI lines, calibrated to ±3% FSD (100 PSI, so ±3 PSI at any point), down to 0.25 PSI. I see the same with temperature and many other such things.

    I am the classic case of a technician who makes something work being quite scornful of those who use it without really understanding the limits of the technology in question.

  12. Peter Green, in the real world, you'd be right. Zeke's charts weren't for the real world though. In his charts, there were only two series being combined. That's why there was a step change.

    I think some people will take Zeke's charts as indicating ship data is combined, buoy data is combined, and then the two resulting series are combined. I wish he had been clearer about what actually happens. I know he includes a note saying the graphs are for illustrative purposes only, but I think a number of his readers will fail to understand the point of the graphs. If you're going to use an overly simplified explanation, I think you need to be clear about how much you are simplifying things.

  13. A bias might still be present in the scenario you describe, but it would not be the same in the anomalized data set as in the non-anomalized data set. The anomalization process would necessarily remove part of the bias present in the non-anomalized data.

    Not if the baseline values are exactly the same, as I think they are. If each dataset is relative to the same baseline values (or same climatology) then the same bias will be present even after you've produced the anomalies. I don't know if this is exactly what happens, but it is my understanding.

    I didn't ask you a question though. I asked a question of everybody. That I might have called someone a dick in the past doesn't mean I will find it more difficult to ask open questions.

    You included me in a tweet that asked a question. All I'm suggesting is that you might want to consider not doing something like that if you've been going around calling one of the people you included a dick. Alternatively, don't go around calling other people dicks, but maybe you find it difficult to not do so.

  14. Anders:

    Not if the baseline values are exactly the same, as I think they are. If each dataset is relative to the same baseline values (or same climatology) then the same bias will be present even after you've produced the anomalies. I don't know if this is exactly what happens, but it is my understanding.

    I'm having difficulty understanding what you mean. To clarify what I've said to you, your theory is there is a bias between ship and buoy data over 1990-2012 which is being adjusted for. The climatological period used to create the anomaly series for this paper is 1971-2000. That means approximately one third of the period used to create the shared baseline for ship and buoy data makes up nearly half of the period this adjustment is estimated over.

    Given that overlap in period, taking anomalies for the ship and buoy data would necessarily remove at least part of the bias one might find in any 1990-2012 ship-buoy comparison. This is particularly true since we are told the ship-buoy bias is a constant bias, affecting only absolute temperatures, not trends. There is simply no way such a bias could manifest in the 1990-2012 period yet not manifest, at least in part, in the 1990-2000 segment of the 1971-2000 climatological period used by the authors to create their anomaly series.

    Could you clarify what portion of this you disagree with?

    You included me in a tweet that asked a question. All I'm suggesting is that you might want to consider not doing something like that if you've been going around calling one of the people you included a dick. Alternatively, don't go around calling other people dicks, but maybe you find it difficult to not do so.

    It is interesting to know you block me on Twitter yet read my tweets. However, I think it would be helpful if you learned mentioning a person in a tweet does not mean you are asking them a question even if that tweet has a question in it.

  15. As a matter of fairness, I should mention I edited my last comment to remove a couple of extra words left over from when I rearranged some text. I would like to offer all readers the option to edit their comments for a short period (say, 15 minutes) after submission, but I haven't found a plugin I like which would enable that.

    Hopefully I can find such a plugin before too long. In the meantime, I hope people can forgive me for cheating a bit and using my admin powers to fix a small error or two.

  16. Is it even valid to try and combine the different instrument records at all? It worries me that the difference between the two instrument records has been reduced to a single characteristic - a temperature offset. Even if this is valid, surely it's a property of the existing data and thus not guaranteed to be immutable (as new data are added)?

    Ross McKitrick did a piece on Karl shortly after it was first published:

    http://www.rossmckitrick.com/uploads/4/8/0/8/4808045/mckitrick_comments_on_karl2015_r1.pdf

  17. Hi Brandon,

    I tried to loop in Peter Thorne on twitter since he is a coauthor and would know for sure, but my reading of Huang et al suggests that, unlike on land, ocean instruments (both ships and drifting buoys) move, and don't stay in a spot with a consistent climatology. Thus, unlike on land, anomalies are calculated with respect to the grid cell average of absolute measurements rather than on a per-instrument basis. In this case, both ships and buoys in a grid cell in a given month would have a constant 1971-2000 mean value of that grid cell subtracted from them. This would in no way eliminate the offset between ships and buoys (it just subtracts the same value from each, removing the seasonal cycle), so an additional correction for the offset is needed.

    If anomalies could be calculated separately for each buoy and each ship as is done with stationary land-based stations, and then those anomalies could be averaged in the gridcell, a separate ship-buoy offset correction would not be needed.

  18. To clarify what I've said to you, your theory is there is a bias between ship and buoy data over 1990-2012 which is being adjusted for. The climatological period used to create the anomaly series for this paper is 1971-2000. That means approximately one third of the period used to create the shared baseline for ship and buoy data makes up nearly half of the period this adjustment is estimated over.

    As I understand it (and as - I think - Zeke is saying) the anomalies are calculated relative to a climatology (baseline) that is the same for ships and for buoys. Therefore when you compute these anomalies, the bias is still there because you've essentially subtracted the same number from the ship data and the buoy data. That the baseline period overlaps the period when buoys started being used does not change this. It would only make a difference if the baseline for the buoys was different to the baseline for ships for the same location. I don't think it is, therefore you still need to correct for the bias.

    It is interesting to know you block me on Twitter yet read my tweets.

    Not that interesting. I noticed because someone else responded, which I did get to see in my timeline. Hence I checked to see what you were saying and largely reminded myself as to why I had blocked you in the first place. I had considered unblocking you but since it seemed likely that you would call me a dick, I didn't bother.

  19. Zeke Hausfather:

    I tried to loop in Peter Thorne on twitter since he is a coauthor and would know for sure, but my reading of Huang et al suggests that, unlike on land, ocean instruments (both ships and drifting buoys) move, and don't stay in a spot with a consistent climatology. Thus, unlike on land, anomalies are calculated with respect to the grid cell average of absolute measurements rather than on a per-instrument basis. In this case, both ships and buoys in a grid cell in a given month would have a constant 1971-2000 mean value of that grid cell subtracted from them. This would in no way eliminate the offset between ships and buoys (it just subtracts the same value from each, removing the seasonal cycle), so an additional correction for the offset is needed.

    Thanks for suggesting this possibility. After reading it, I spent more time looking through Huang et al. and found it says:

    In ERSST.v3b, SSTA was calculated by subtracting the monthly climatology between 1971 and 2000 after full SSTs are bin-averaged to the 2° × 2° grid. This can result in an inaccurate SSTA in data-sparse areas in higher-latitude oceans due to coarse latitudinal resolution, since the SSTA may be partially impacted by the climatological SST if SST observations are not representative of the grid box average. Following Reynolds and Smith (1994) and Kennedy et al. (2011), SSTAs are now initially calculated at in situ locations by subtracting SST climatology interpolated to the in situ locations, and then the in situ SSTAs are bin-averaged to the monthly 2° × 2° grid.

    Which I believe may support what you suggest. If I understand this correctly, the climatology they subtract is a gridded field interpolated to each observation's location, not something calculated per instrument, so a ship and a buoy at the same spot would have the same baseline value subtracted. I'm not sure what to think about that offhand. Not taking per-series anomalies for the data prior to combining it feels weird to me, but I can't say anything more than that for the moment.

    If anomalies could be calculated separately for each buoy and each ship as is done with stationary land-based stations, and then those anomalies could be averaged in the gridcell, a separate ship-buoy offset correction would not be needed.

    That's what I thought. It's good to know my understanding of the concept was right, at least. It looks like I may have just misunderstood what the authors said they had done.
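    For my own benefit, here's the mirror image of the sketch in my post, again in R with made-up numbers. When one shared climatological value is subtracted from both instrument types, the offset survives the anomalizing:

        set.seed(1)
        n    <- 240
        buoy <- 0.01 * (1:n) / 12 + rnorm(n, sd = 0.05)  # trend plus noise
        ship <- buoy + 0.12                              # constant ship-buoy offset

        # Grid-cell style: one shared climatology subtracted from both types.
        climatology <- mean(c(ship, buoy))
        ship_ssta   <- ship - climatology
        buoy_ssta   <- buoy - climatology

        mean(ship_ssta - buoy_ssta)  # still 0.12, so an explicit ship-buoy
                                     # adjustment is needed to remove it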

  20. I have a question for Zeke or Peter: why didn't the ERI/bucket/buoy instrumentation issue get handled the same way the MMTS transition was handled, with investigations calibrating the instrumentation side by side? This is a billion- (or trillion-) dollar issue. Why not have a boat travel next to a ship doing a bucket reading while the ship takes its ERI reading? Rinse and repeat (no pun). Then do the same around buoys when they are recording. A couple of papers, e.g. Matthews (2013), kind of did this sort of investigation, apparently in response to Thompson (2008). Matthews found that slight ERI warming was largely offset by the 7 meter+ average depth (cooler) sampling. Also, Matthews found that canvas buckets had a cool bias of ~0.2C due to the average observation time of 2 min and the wet canvas evaporative cooling rate of 0.1C/min. BTW, this was less cooling than found in Folland and Parker (1995).

    I asked Peter Thorne on his blog post about Matthews (2013) and he replied he thought they had cited the Matthews papers in Huang (2015). But I checked and there is no mention.

    It's not like the NMAT2 data is pristine. Huang points out there are adjustments for ship deck heights increasing over the years. And, of course, air temperatures are less stable than water temperatures. NMAT2 is only taken at night, so it represents a proxy for Tmin, while SST is Tavg. So if the diurnal temperature range changes over the century, then we must rely on unvalidated models to make another adjustment. Was that even done, BTW?

    Thanks, Ron

  21. JonA:

    Is it even valid to try and combine the different instrument records at all? It worries me that the difference between the two instrument records has been reduced to a single characteristic - a temperature offset. Even if this is valid, surely it's a property of the existing data and thus not guaranteed to be immutable (as new data are added)?

    I think the different types of data will have different properties and thus the combined result will not have the same properties as the input data. I think that's fine though. People who want to study the properties lost in the combination process can use the individual data sets. People who use the combined data set won't be able to observe certain properties, but there are properties they can observe. It's just a matter of understanding what the combined data means.

    That said, it's always bugged me that anomalies are used to combine temperature values. A change in temperature for an area from 10C to 11C does not involve the same amount of energy as a change from 20C to 21C. That means even at the most basic level, creating a surface temperature record means combining unlike things. The output is not "temperature." Global temperature records are ultimately a proxy for some value that isn't rigorously defined.

    That's not just nitpicking either. The lack of a rigorous definition for what the surface temperature record is supposed to measure is part of why we have so many issues with re-defining it. For instance, adding data from the Arctic has changed a number of results. Were earlier versions without Arctic data not measurements of "global" temperatures? Without a clear definition, there's no way to say.

    Anders:

    As I understand it (and as - I think - Zeke is saying) the anomalies are calculated relative to a climatology (baseline) that is the same for ships and for buoys. Therefore when you compute these anomalies, the bias is still there because you've essentially subtracted the same number from the ship data and the buoy data. That the baseline period overlaps the period when buoys started being used does not change this. It would only make a difference if the baseline for the buoys was different to the baseline for ships for the same location. I don't think it is, therefore you still need to correct for the bias.

    Thanks. I didn't understand you had meant a single baseline value was subtracted from each grid cell. I'm used to anomalies being created before data series are combined, rather than after. I hadn't even considered that might not be what's done here. Combining series without converting them to a common baseline first seems unusual to me. Perhaps looking into it more will reveal the logic to it. I guess the best idea would be to load up the data and play with it myself.

  22. My last comment raises a question for me. Does anyone happen to know if the data/code used for the Huang or Karl paper is posted online? I know the underlying data is from ICOADS, but the papers used a specific version of it and I'm not aware of any archive of past ICOADS versions. It'd be helpful if I could be sure I was looking at the same data as them. Similarly, if the code is available, I can use that to make sure I understand what's being done.

  23. Brandon, I had asked Peter Thorne for the data last week and he kindly replied with this link.

    I like your idea of anomalies as a clearer analysis and homogenization solution. One could ask why they did not segregate the data by observing instrument and do a separate series on the internal anomalies of each. They then could combine them by fitting the baselines in the overlapping periods. If nothing else, this could have been done as a quality check. Also, changes in instrumentation within each class, like insulated and uninsulated buckets, engine room readings versus intake pipe sensors, etc., would have to be accounted for too.

  24. Hi Ron,

    They do have side-by-side data from ships and buoys. That's where the ~0.1 C offset comes from. ERSSTv5 will be updating this approach to use a dynamic offset calculated each month rather than a static offset for the full period. Both ERSSTv4 and v5 also assign much more weight (on average ~7x) to collocated buoy measurements over ship measurements when a gridcell has both, under the (likely correct) assumption that buoy data is much more homogeneous than ship data.

    Brandon,

    If you think about it a bit, there really isn't any good alternative to calculating anomalies on the gridcell level, since having a ship-specific anomaly wouldn't make any sense given the large distances they traverse (and the fact that they aren't even always at sea).

  25. Ron Graf, thanks. It doesn't appear any code is posted there, but the data is definitely useful. I just wish my internet connection was better so I wouldn't have such a long wait on downloads. Combine that with the time working out the quirks in the data sets, and this will probably take a little while. It doesn't help that Readme files like:

    http://www1.ncdc.noaa.gov/pub/data/cmb/ersst/v4/ascii/Readme

    are not the ideal way to provide "[i]nformation on the format of" your data. People shouldn't have to read FORTRAN to figure out the format of your data files. I'm fine with digging through code when I'm examining code, but descriptions should use words. Or at least a language that isn't as annoying to read as FORTRAN >.<

  26. Here's an odd quirk for you. For land data, it looks like they provide their input and output data. For the ocean data, it doesn't look like they provide their input data. I'm not sure why they'd treat the two differently.

    Zeke:

    If you think about it a bit, there really isn't any good alternative to calculating anomalies on the gridcell level, since having a ship-specific anomaly wouldn't make any sense given the large distances they traverse (and the fact that they aren't even always at sea).

    I get there is some sense to it, but if you don't account for different baselines before combining your data, all sorts of problems can crop up. We see this with how station drop-outs can influence the land record. It would seem to me combining data for grid cells and then taking anomalies could introduce similar problems. At a minimum, I would have expected them to remove the ship-buoy bias before combining the ship and buoy data to create their anomalies. After all, if you say there is a bias in the baselines of the two data sets, why combine them before removing the bias?

    It's not just this one bias either. Huang et al. say:

    The ship and buoy SSTs that have passed QC were then converted into SSTAs by subtracting the SST climatology (1971–2000) at their in situ locations in monthly resolution. The ship SSTA was adjusted based on the NMAT comparators; buoy SSTA was adjusted by a mean difference of 0.12°C between ship and buoy observations (section 5).

    That means they combine their data twice. First, they combine it to create a climatological baseline for each grid cell. They then remove this baseline, correct for multiple biases, and combine the data again. That seems very unintuitive.

  27. FWIW -

    Just wanted to post a comment as a "lurker" (on this thread). I can't come anywhere close to following the technical analysis, so I try to parse the main chunks of the exchange.

    As such, I find the "dick" talk completely irrelevant/useless. And although I can see the technical discussion being uselessly personalized in other ways beyond the "dick" talk, I can also see a reasonable level of exchange between participants from various "camps" of a sort that, to this observer at least, is an extremely rare occurrence. As a non-technically capable "lurker," that is useful for me in trying to achieve a larger goal: teasing out identity-protection/-aggression from technical exchange.

    To whatever extent [my observations as a non-technically capable lurker] might matter to the participants (which I understand would very likely be not to any extent at all), it might be useful for the discussants to see whether they can find any characteristics that distinguish this exchange from other public exchanges they've engaged in within the climate-o-sphere.

  28. I just remembered the ICOADS data set is ~30GB. That's both a pain on my internet connection and a potential explanation for why it wasn't archived with the other data for this paper. It may just have been too much data. It's still a shame though. Huang et al. used version 2.5 of the ICOADS data set. It's since undergone a major update to version 3.0. That makes any attempt to examine their results difficult.

    I don't know if that change will affect the results in any significant way, but it would definitely be better to be able to work with the same data set the authors used. Perhaps they can make it available if asked.

  29. I just found out the ICOADS update to 3.0 changed its file format from IMMA0 to IMMA1. I haven't tracked down what changes that entails, but it might mean any code one uses on the current data set won't work on the data set used by Huang and Karl et al without modification. It also means the code I have from the last time I looked at this data set may not work without modification.

    I'm starting to really regret considering seeing if I can figure out what effect these baseline/bias issues have.

  30. Another change I discovered is that the old ICOADS directory at the NCDC FTP server:

    ftp://ftp.ncdc.noaa.gov/pub/data/icoads/

    is gone. The data can be retrieved from the NCEI FTP server instead:

    https://www.ncei.noaa.gov/data/marine/icoads3.0/

    I don't know when that change happened. I don't think it matters, except that people who haven't looked into the data in a while may need to update their links. Another option I hadn't seen before is the NCAR site for the ICOADS data. I really like it. Its metadata viewer is especially wonderful.

    Anyway, I've started downloading the data and I'll be loading it into R after that. Given the amount of data involved, I expect this to take quite a while. In the meantime, maybe I can find out the logic behind Huang et al.'s approach of removing biases after taking anomalies.

  31. Zeke, thanks for your visit and replies to our questions. I don't know what the limit is but I have a couple more.

    You say in your paper (congratulations by the way) that ships mostly took observations only once per day. Are you aware of any tackling of TOBS bias in ERI caused by changes in the time of observation during WWII or over the years?

    Was there a different time of day or frequency for bucket observations versus ERI? I realize the Argo buoys take repeated readings while on the surface.

    Are there any metrics for the determined bias between Argo and insulated buckets, canvas buckets, wooden buckets, engine room reservoirs, intake pipe thermometers, ship hull sensors, etc.? Or was there no breakdown that detailed?

  32. After thinking about this a bit, I think the issue is context.

    Whilst I get that the underlying _trend_ doesn't change (at least if you merge the data properly), the absolute values of the anomalies do change. Since all the hype is about reaching a specific level of average temperature anomaly above a particular baseline (1.5 degrees above pre-industrial, as one example), the fact that the displayed anomalies change is an issue.

    Add to that the relative weighting and quality of the disparate data sets over time, and the fact that they are actually measuring different things in different places, and it's really no wonder people get agitated about it.
