In response to my last post in this series, a user (Dr. C) asked me an insightful question. This made me happy because I had intentionally left the issue open for my next post, and that showed some people noticed.
You'll remember that post discussed sensitivity testing by Michael Mann which showed his results were not as robust as he had claimed. He stored those results in a directory (infamously) titled CENSORED. Dr. C asked:
Apart from that, how does the censored directory play into this at all? According to Mann, the dataset on which the HS depends was one from Jacoby & D’Arrigo. But the CENSORED directory only included datasets from Graybill & Woodhouse.
This question is on the money. The directory Michael Mann referred to showed the results of removing twenty series from his data set. However, those twenty do not include any series by Jacoby & D'Arrigo. As I explained to Dr. C:
As you note, the “one set of tree ring records” is from Jacoby and D’Arrigo. If you remember, my original list included an item about the (misuse of the) Gaspe series. The Gaspe series is from Jacoby and D’Arrigo. That is the series Mann referred to in that quote.
I intend to write my next post (#2.1) on the Gaspe series, and I plan to delve into this matter more in it. But basically, Mann’s test found if he removed the 19 Graybill and 1 Woodhouse series, he could still get a hockey stick thanks to the Gaspe series. That’s why it was “of critical importance in establishing the reliability of the reconstruction,” not the results of the reconstruction.
In other words, if Mann removed 20 series, he could still get a hockey stick because he included a 21st series named Gaspe. I want to discuss that Gaspe series, but before I do, it's important we understand how Mann et al handled their data.
To begin, Mann et al had 415 data series. Many of these series were from similar areas. To better extract a signal, series from similar areas were combined via a method known as principal component analysis (we'll discuss how Mann implemented it incorrectly in a later post) which reduced the total number of series used to 112. Of these, 31 were created via PCA. The other 81 series were used on their own.
Now then, some of these 112 series extended further back in time than others. To address this, Mann et al broke their temperature reconstruction into periods, using series for whatever periods they covered. If a series didn't extend back to the beginning of a period, it wasn't used in that period.
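The inclusion rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as I have stated it, not Mann et al's actual code. The series names and start years below are made-up placeholders, and `eligible_series` is a hypothetical helper.

```python
def eligible_series(start_years, period_start):
    """Return the names of series that reach back to the start of a period."""
    return sorted(
        name for name, first_year in start_years.items()
        if first_year <= period_start
    )


# Made-up placeholder series: name -> first year with data.
start_years = {
    "hypothetical_a": 1350,
    "hypothetical_b": 1400,
    "hypothetical_c": 1404,
    "hypothetical_d": 1500,
}

# A series starting in 1404 misses a period that begins in 1400,
# but qualifies for a later period that begins in 1450.
used_from_1400 = eligible_series(start_years, 1400)
used_from_1450 = eligible_series(start_years, 1450)
print(used_from_1400)
print(used_from_1450)
```

The point of the sketch is that the cutoff is all-or-nothing: a series missing even a few years at the start of a period is left out of that period entirely.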
Of the 212 North American tree ring series, 70 covered the 1400-1450 period. They were combined via PCA into two series. Another 20 series were used along with those two (including one other created via PCA), for 22 series in total. Would you be impressed to hear Mann's hockey stick depended entirely upon 21/70 = 30% of the data used in 2/22 = 9% of the series?
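If you want to check the arithmetic yourself, here is a quick sketch. The counts are the ones given in this post. The variable names are mine.

```python
# Counts as stated in the post.
network_series = 112      # series left after PCA
pca_series = 31           # of those, created via PCA
standalone_series = 81    # of those, used on their own

# The two groups should account for the whole network.
assert pca_series + standalone_series == network_series

# 1400-1450 period: two North American PCA series plus 20 others.
step_series = 2 + 20

share_of_trees = 21 / 70            # fraction of the 70 tree ring series
share_of_series = 2 / step_series   # fraction of the series in this period

print(f"{share_of_trees:.0%} of the data, {share_of_series:.0%} of the series")
```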
What if I told you it's worse than that? You see, the Gaspe series doesn't actually extend back to 1400. It only extends back to 1404. As such, it was part of the 212 North American tree ring series (as cana036), but it shouldn't have affected the 1400-1450 period at all. It shouldn't have been able to "save" the hockey stick.
So how did it, you ask? Simple. Mann cheated. If you look at the list of the 22 series used in the 1400-1450 period, you'll see one of the series is named treeline11.dat. Here's a graph of the two (black - cana036, red - treeline11.dat):
As you can see, they're both Gaspe. That means Mann used Gaspe twice. But if you look closely, you can see it is worse than that. The treeline11.dat line has a bit of flatness at its start that is absent from the cana036 version. You can verify that by checking the data file, which shows exactly four years of extra data were added to treeline11.dat. That's the exact amount needed to extend the series to 1400, and so the exact amount needed for it to be included in the 1400-1450 period.
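To make the "flatness" concrete, here is a minimal Python sketch of what that kind of extension looks like and how you could spot it. It assumes the extra years simply repeat the first available value, which is what a flat start suggests. The numbers are made up for illustration and are not the real Gaspe values. `pad_back` and `leading_flat_years` are hypothetical helpers, not anything from Mann's code.

```python
def pad_back(series, new_start):
    """Extend a {year: value} series back to new_start by repeating its first value."""
    first_year = min(series)
    padded = dict(series)
    for year in range(new_start, first_year):
        padded[year] = series[first_year]
    return padded


def leading_flat_years(series):
    """Count the years at the start of a series that just repeat the following year."""
    years = sorted(series)
    count = 0
    for earlier, later in zip(years, years[1:]):
        if series[earlier] != series[later]:
            break
        count += 1
    return count


# Made-up values for a series that begins in 1404.
archived = {1404: 0.8, 1405: 1.1, 1406: 0.9, 1407: 1.3}

# Padding back to 1400 adds the years 1400, 1401, 1402 and 1403.
padded = pad_back(archived, 1400)

print(min(archived), leading_flat_years(archived))
print(min(padded), leading_flat_years(padded))
```

Comparing the two versions this way shows both things at once: the padded copy now starts in 1400, and its first four values are identical to the value for 1404.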
Mann found if he removed 20 series, he could still get a hockey stick because of one other series. That series was an arbitrarily extended (no other series was extended like it) version of a series already included in his data set. The extension of that series was originally undisclosed, acknowledged only after Mann was forced by his critics to publish a corrigendum saying:
For one of the 12 ‘Northern Treeline’ records of Jacoby et al. used in ref. 1 (the ‘St Anne River’ series), the values used for AD 1400–03 were equal to the value for the first available year (AD 1404).
Even after it was acknowledged, no explanation was provided for the extension, and there has never been an acknowledgment of the fact the series was used twice.
Why was the series used twice? We don't know. Why was one use of the series arbitrarily extended back in time? We don't know. Why did these two unexplained decisions result in saving the hockey stick for the sensitivity testing Mann did? We don't know.
Was it fraud? I don't know. What I do know is that this is a heck of a lot of coincidences, all of which led to results Mann liked. One could certainly be excused for believing they weren't coincidences.
And for the record, 21/415 = 5%. That's how much of his data Mann says you need to remove to get rid of the hockey stick.