Failure to Replicate

Things went well yesterday, but the painkillers are making my head a little fuzzy. As such, I figured it'd be a good time to write up something I probably should have written up a while back. You see, over a month ago Steve McIntyre wrote this about the recent Gergis et al paper:

Gergis et al 2016 stated that they screened proxies according to significance of the correlation to local gridcell temperature. Law Dome d18O not only had a significant correlation to local temperature, but had a higher t-statistic (and correlation) to local instrumental temperature than:
24 of the 28 proxies retained by Gergis in her screened network;
either of the other two long proxies (Mt Read, Oroko Swamp tree ring chronologies);
Nonetheless, the Law Dome d18O series was excluded from the Gergis et al network. Gergis effected her exclusion of Law Dome not because of deficient temperature correlation, but through an additional arbitrary screening criterion, which excluded Law Dome d18O, but no other proxy in the screened network.

This was a serious accusation he and I had actually discussed in e-mails before he wrote that post. As I told him in those e-mails, I couldn't find a way to replicate his results. I asked him to confirm the data he was using matched what I was using, but that didn't happen. When he wrote the post, I asked again. I asked again later via e-mail, again without success.

Mind you, McIntyre never said, "No," and I think he does intend to do this eventually. I tried to be patient, but given the seriousness of McIntyre's accusations and how they appear to be completely wrong, I think waiting over a month is more than sufficient.

The central claim of McIntyre's post was:

For Law Dome d18O over 1931-1990 for the central gridcell at lag zero i.e. without any Gergian data mining or data torture, using the HadCRUT3v version on archive, I obtained a detrended correlation of 0.529, with a t-statistic of 4.71 (for 37 degrees of freedom after allowing for autocorrelation using the prescribed technique). This was one of the highest t-statistics in the entire network, higher than 24 of 28 proxies selected into the screened network and higher than both long proxies included in the network. It also met any plausible criterion of statistical significance.

This was what I found impossible to replicate. I was dismayed when McIntyre posted this despite being aware I had been unable to replicate his results, particularly as his post goes on to say:

So how (and why) did Gergis screen out Law Dome?
Gergis excluded Law Dome through the following, seemingly innocent, additional screening criterion:

This comparison [of detrended proxy to detrended instrumental data] was only performed for cells containing at least 50 years of data between 1921 and 1990.

Because a t-statistic test already allows for the number of observations in determining significance, there isn’t any need for this restriction. The choice of 50 years excluded Law Dome d18O, but did not impact any other proxy included in the G16 screened network. Gergis did not provide any justification or explanation for the choice of 50 years (as opposed to 60 years or 35 years), nor am I aware of any principle that would justify this particular choice.

And says this in its conclusion:

The Law Dome d18O series has a stronger statistical relationship to gridcell temperature than 24 of 28 “passing” proxies or either of the long tree ring series used as long proxies in Gergis et al 2016. It was excluded from the Gergis et al network based on a additional arbitrary screening criterion that excluded Law Dome without impacting any other proxies in the screened network. It is not known whether Gergis et al intentionally added the additional screening criterion in order to exclude Law Dome or whether the criterion had been added without fully understanding the ramifications, with the exclusion of Law Dome being merely a happy coincidence. In either case, the exclusion is not robust. And because the Gergis et al 2016 reconstruction (R28) is based on only two proxies in its early portion, neither are its various reconstructions. The impact will be particularly felt on the R2 and R3 reconstructions, which have only two and three proxies respectively.
Just another day of data torture by the paleoclimate “community”.

These are serious claims. Personally, I wouldn't be comfortable making them in the face of a person having cast doubt on the underlying calculations. I would first make some minimum effort to figure out why our results didn't match. For whatever reason, that didn't happen.

I discussed this a couple weeks ago over at The Blackboard, beginning with this comment where I asked a user if they could reproduce McIntyre's results. Another user, HaroldW, chimed in and we had a fruitful discussion. In an early comment he wrote:

For local temperature, I’m using the HadCRUT3v gridcell with NW corner at -65S,110E. Temperatures are present from Jan 1957 on. So I don’t agree with your contention that there are more than 50 years of data in 1921-90. Assigning the temperature average for the 1957-58 summer (SOND’57,JF’58) to 1958 — per McIntyre’s comment — I get a correlation with the Law Dome proxy slightly above 0.54, not quite McIntyre’s stated 0.529.

This comment, and a later comment:

A quick update: I included all years with at least partial winter observations, and the 1931-90 correlation of (detrended) local temperatures and Law Dome d18O now stands at r=0.530, very close to McIntyre’s reported 0.529. The difference probably is due to my using a HadCRUT3v dataset from mid 2012. (Saved when I was looking at Gergis et al. 2012.)

made it seem like McIntyre's results might have been correct, suggesting I was just doing something wrong. To try to resolve things, I posted some untidy code to let people reproduce my results. I'll repeat it here.

The first step is to read the data. I’m assuming you already have the NCDF file from the link in my comment above in your working directory.

library(ncdf4)  # nc_open() and ncvar_get() come from the ncdf4 package

HAD = nc_open("HadCRUT3v.nc")
t = ncvar_get(HAD, "temperature")
lon = ncvar_get(HAD, "longitude")
lat = ncvar_get(HAD, "latitude")

This gives you the full, gridded temperature data set in the variable t. It has three dimensions, with the first being longitude, the second being latitude and the third being time. We need to figure out which grid cell to use, so we figure the center of the grid cell for the proxy should be at longitude 112.5 and latitude -67.5. This gives:

lo = which(lon == 112.5)
la = which(lat == -67.5)
dump = ts(t[lo,la,], freq=12, start=1850)

The first two lines there figure out the grid cell to use. I’ve started using that approach to make sure I don’t make an error when manually looking up coordinates. The third line extracts the data for that grid cell and puts it in a time series so it’s easy to examine.
I know there should be a simple way to extract the winter data from here, but I couldn’t think of it so I wrote a quick loop:

scratch = NULL
for (i in 1:70){
scratch[i] = mean(dump2[c(9:14) + i*12 - 12], na.rm=TRUE)  # Sep-Feb block for winter i
}
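For what it's worth, one vectorized alternative would be the following (a sketch only; scratch2 and the other names are mine, and it assumes the same dump2 series defined in the addendum further down):

idx = rep(9:14, 70) + rep(0:69, each=6)*12  # Sep-Feb positions for each winter
winters = matrix(dump2[idx], nrow=6)        # one column of six months per winter
scratch2 = colMeans(winters, na.rm=TRUE)    # should match scratch above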

Perhaps someone more experienced with R can tell me how terrible this is and provide a better solution; the vectorized sketch above is one candidate. Regardless, this leaves you with 70 winter averages for the grid cell in question. You can then download the proxy table from Gergis et al or, as I prefer, just copy it to your clipboard and read it in. Either way, you can then run a correlation test on the unscreened data:

raw = read.table("clipboard", sep=";", header=TRUE)
cor.test(scratch, raw[923:992,3], na.rm=TRUE)

923 and 992 correspond to the years 1921 and 1990. It would be tidier to assign these to variables, but since I was testing different lags, this was quicker. Definitely something which can be improved, but it gives the results:

data: scratch and raw[923:992, 3]
t = 2.3298, df = 35, p-value = 0.02571
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.04810524 0.61715980
sample estimates:
cor
0.366413

We then perform a linear fit on these two series and rerun the correlation test on their residuals to produce a detrended correlation:

lm_t = lm(scratch ~ c(1:70))         # linear trend of the winter temperatures
lm_p = lm(raw[923:992,3] ~ c(1:70))  # linear trend of the proxy over 1921-1990
cor.test(lm_t$residuals, lm_p$residuals[as.numeric(names(lm_t$residuals))], na.rm=TRUE)

Because of missing values in the instrumental temperature record, we have to subset the proxy residuals to only include matching years. The code as.numeric(names(lm_t$residuals)) produces a numerical index with the appropriate subset. The results this produces are:

data: lm_t$residuals and lm_p$residuals[as.numeric(names(lm_t$residuals))]
t = 2.3475, df = 35, p-value = 0.02468
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.05088891 0.61888447
sample estimates:
cor
0.3688263
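As an aside, here is a toy illustration (with made-up values) of why indexing by the residuals' names works: lm drops rows with missing values, and the names of the residual vector record which positions survived.

x = c(2.1, NA, 1.8, 2.4)          # toy series with one missing value
fit = lm(x ~ seq_along(x))        # lm silently drops the NA row
names(fit$residuals)              # "1" "3" "4": the positions that were kept
as.numeric(names(fit$residuals))  # 1 3 4, usable as a numeric index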

I know this code isn’t turnkey or pretty, but it should produce the same results for anyone. If it’s needed, I can tidy up the code and include parts to handle the data downloads. I’m hoping someone will just spot an error in this somewhere though.

Shortly after posting the code I realized I had left out one line and added:

The dump variable holds the data for the grid cell as a time series. We only want to look at correlations over the 1921-1990 period though, so I create the dump2 variable:
dump2 = window(dump, 1920, 1991)
The reason it says 1920 instead of 1921 is that September, October, November and December are assigned to the next year, so 1921's winter includes four months measured in 1920. Obviously, without that line the code won't work, as you won't have a dump2 variable to take the averages from. I really should create a tidied script with more descriptive names.
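In that spirit, a tidied version of the whole pipeline might look something like this (a sketch with my own variable names, assuming the same file and proxy column; I haven't verified it against the archived data):

library(ncdf4)

# gridded HadCRUT3v temperatures
had = nc_open("HadCRUT3v.nc")
temp = ncvar_get(had, "temperature")
lon = ncvar_get(had, "longitude")
lat = ncvar_get(had, "latitude")

# monthly series for the Law Dome gridcell, windowed so each winter's
# Sep-Dec months (which belong to the following year) are included
cell = ts(temp[which(lon == 112.5), which(lat == -67.5), ], freq=12, start=1850)
monthly = window(cell, 1920, 1991)

# Sep-Feb averages for 1921-1990
winters = sapply(1:70, function(i) mean(monthly[9:14 + (i-1)*12], na.rm=TRUE))

# detrended correlation against the proxy (column 3, rows 923:992)
proxies = read.table("clipboard", sep=";", header=TRUE)
res_t = lm(winters ~ seq_along(winters))$residuals
res_p = lm(proxies[923:992, 3] ~ seq_along(winters))$residuals
cor.test(res_t, res_p[as.numeric(names(res_t))])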

HaroldW and I were then able to reconcile our results, with him realizing the problem had been in the data file he was using:

Haven’t tried your R code yet, but I re-ran my Excel spreadsheet with the latest HadCRUT3v file. (https://crudata.uea.ac.uk/cru/data/crutem3/hadcrut3v.zip) I obtained a detrended correlation of 0.369581, close to yours. [Perhaps a difference between Excel’s & R’s regression.] So your script is probably accurate, if not pretty. 🙂
Then I went back to look at what I had done earlier, and realized that I had been using HadCRUT3, not HadCRUT3v. Apparently there’s a significant difference between the two datasets for this gridcell. Among others: as you noted, HadCRUT3v lacks values for May-Sept inclusive, while HadCRUT3 does provide values for those months.

As this shows, you don't need to create tidy, turnkey code to let people check your work. Indeed, simply looking at the data we ran our tests on was enough to tell we were using different data sets. HaroldW refers to the fact that the HadCRUT3v data set is missing values for May through September, something I originally pointed out to Steve McIntyre. If the data set he was using wasn't missing those values, the problem should have been readily apparent.

I don't want to editorialize on this, and I cannot be certain McIntyre used HadCRUT3 instead of the HadCRUT3v data set Gergis et al said they used. I do, however, want to point out I find it troubling that my discussion of this paper was held up for over a month because these results were impossible to replicate, even as dozens of other people freely discussed them.

It seems a poor state of affairs when serious and false accusations that are easy to check won't get resolved for months. I won't guarantee these accusations are false, but it seems obvious that if not for me, nobody would have bothered to try to find out if they were.

For those who want to check the data themselves, you can find both HadCRUT3 and HadCRUT3v at the first link below, with the proxy data Gergis et al used at the second link (you'll need to use the raw proxies file to examine the Law Dome proxy):

https://crudata.uea.ac.uk/cru/data/crutem3/
http://www1.ncdc.noaa.gov/pub/data/paleo/reconstructions/gergis2016/

As a final note, I should point out I haven't been able to replicate Gergis et al's results either.

22 comments

  1. For a simple explanation of this issue, Steve McIntyre said Gergis et al used an arbitrary standard which excluded only a single proxy, a proxy which had a strong correlation to temperatures for its area and an inconvenient signal for the authors of the paper (high MWP values). This post shows that when you use the correct data, that doesn't appear to be the case. And as HaroldW pointed out, using a different (but similarly named) data set gives results like those McIntyre posted. The implication is McIntyre made rather serious claims based upon his mistakenly using the wrong data set.

    By the way, when I accounted for autocorrelation, the Law Dome proxy ceased to pass the correlation tests. It's possible I did the calculations incorrectly, but if my results are right, this proxy wouldn't have been used regardless of the requirement there be 50 years of data in the screening period.
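    For anyone who wants to experiment, here is a minimal sketch of that adjustment, assuming the Bretherton et al 1999 effective-sample-size formula mentioned in a later comment (eff_df is a hypothetical helper of mine, not code from the paper):

    # N_eff = N * (1 - r1x*r1y) / (1 + r1x*r1y), where r1x and r1y are the
    # lag-1 autocorrelations of the two series; subtract 2 for the test's df
    eff_df = function(x, y) {
      ok = complete.cases(x, y)
      x = x[ok]; y = y[ok]; n = length(x)
      r1x = cor(x[-1], x[-n])
      r1y = cor(y[-1], y[-n])
      n * (1 - r1x * r1y) / (1 + r1x * r1y) - 2
    }

    With fewer effective degrees of freedom, the same correlation produces a smaller t-statistic (t = r * sqrt(df) / sqrt(1 - r^2)), which is how a proxy that passes an unadjusted test can fail an adjusted one.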

    Either way, this proxy simply doesn't appear to have the strong correlation to local temperatures McIntyre claims it has if you use the correct data set.

  2. Not commenting specifically on your work above (I don't know R well enough), but in starting to look into the data sources, I came across this line in the abstract section of the readme-gergis2016.txt:

    "The reconstructed twentieth-century warming cannot be explained by natural variability alone using GISS-E2-R."

    I wonder what explanations for twentieth-century warming could be arrived at when not using a model. Isn't this then yet another case of the model being seen as the source of all truth?

  3. If I were Steve Mc I would be recruiting Brandon S and Kenneth F and others to take up his craft on reconstructions at CA. He gives Nic Lewis the reins for climate sensitivity and land record issues now. I hope he contacts you. Keep trying.

  4. I try to be careful in my analyses and suspect that differences may come from datasets or missing data. One of the reasons why I support code documentation is to resolve this sort of dispute. At the time of your initial request, I told you that I was getting ready to go to Europe for a week and that my computer had crashed and that I needed to reconfirm my script. I forgot about your request when I returned. Sorry about that, but you could also have reminded me, as I think that I have a very good track record of responding to requests and ensuring that results are documented.

    In this case, as noted above, I got sandbagged by a computer crash, and need to crosscheck the version on my computer against my results. I still need to do this before responding.

    When I got back from Europe, I made a resolution to finish a lengthy submission on Lewandowsky on which I'd been working off and on for a couple of years, but which I hadn't finished. Because I get tired quickly these days, I put other issues on the back burner and apologize for that. I would like to work on this undisturbed for a little while longer and would prefer not to revisit Law Dome for a couple of weeks if everyone doesn't mind waiting a while longer. I do not believe that there are any material issues with my Law Dome analysis, but will undertake to quadruple check, together with the relevant code.

  5. Steve McIntyre, while you may have preferred I remind you of this issue, the reality is I raised these same concerns well before you wrote your post. Your claim to have been preparing for a trip at the time of my "initial request" is wrong. My "initial request" came before you even wrote your post. You chose not to address it or make any effort to reconcile our results prior to making strong claims in public. As a side note, I'll point out this post clearly referred to me raising this issue before you published your post. I am not sure why you are seemingly unaware of the very simple history of my remarks on this issue.

    It is true I could have reminded you of the issue after your trip, raising it with you for the fourth time, but given you chose to make this issue public without attempting to reconcile our results, I don't think you have any room to complain. I would have much preferred we resolve this issue before you wrote your blog post, as then we could have refrained from having false claims made in public which would mislead readers. You may prefer not to revisit the Law Dome issue and believe what you wrote was fine, but what you wrote was, at a minimum, inaccurate. Depending on the specifics of the dataset used, it may have even been wrong in its entirety insofar as it deals with the Gergis et al paper (without reflecting or remarking on anything said about the IPCC in the post).

    Your central claim was that the fifty year requirement imposed by Gergis et al arbitrarily excluded the Law Dome proxy, a proxy which had a stronger correlation to local temperatures than almost any proxy Gergis et al included. In reality, it seems the Law Dome proxy could justly be excluded even if one lifts that fifty year requirement. Moreover, the Law Dome proxy does not appear to have the strong correlation to local temperatures you claim it has when one uses the data set* the authors say they used.

    If we're being honest, I probably should have written about this the moment I saw your post. It was only because I respect you that I raised my concerns three different times before writing this post. I'm sure I could have raised the issue even more times, but at a certain point, delaying pointing out what appears to be an error becomes inappropriate.

    *In name, if not in specific version. As I've said from the beginning, it is inappropriate for Gergis et al to archive their proxy data but not the instrumental data they used.

  6. I re-ran the code and once again got a correlation of 0.529 for Law Dome to summer temperatures, making a summer average temperature of available summer months. I'm not sure what you did, but I think that you're jumping the gun in assuming that my conclusions about Law Dome are "false".

    For autocorrelation, Gergis et al used a formula from Bretherton et al 1999, which was new to me and which I implemented for the first time in this calculation. Re-running the code, I noticed an issue in my implementation of the Bretherton formula and the t-values are a little different. Tweaking this, I got a slightly lower t-value of 3.652 for Law Dome, a little lower than reported in my post, but not changing the conclusions of my post.

    I'll post up my code after making it turnkey.

  7. Code is available at http://www.climateaudit.info/scripts/multiproxy/gergis_2016/gergis_cor_final.txt showing steps in calculation.

    Script executes turnkey with following command:
    source("http://www.climateaudit.info/scripts/multiproxy/gergis_2016/gergis_cor_final.txt")

    you get a list of proxies in order of decreasing correlation (absolute value) to the central gridcell:
    # id lag_calc rcalc tcalc G16
    # 4 X1091_Palmyra_d18O_floating 0 -0.766 -8.597 1
    # 30 X1076_Fiji_1F_SrCa 0 -0.705 -7.201 1
    # 16 X607_Fiji_AB_d18O 0 -0.646 -6.201 1
    # 21 X850_New_Caledonia_d18O -1 -0.536 -4.006 1
    # 2 X588_Law_Dome_18O_new 0 0.529 3.652 0
    # 8 X897_New_Fenwick_Composite_Signalfree_2 1 0.525 4.296 1
    # 32 X600_Fiji_1F_d18O 0 -0.507 -4.434 0
    # 22 X937_Stewart_Island_HABI_composite_Signalfree_2 0 0.502 4.041 1
    # 9 X885_URW_newz063_Signalfree_2 0 -0.490 -3.933 1
    # 13 X667_All.Celery.Top.west.corr_Signalfree_2 -1 0.440 3.726 1

  8. Steve McIntyre, I have no doubt if you rerun code on the HadCRUT3 data set you will continue to get the same results you got before when running code on the HadCRUT3 data set. That just won't address the fact the authors said they used the HadCRUT3v data set, a different data set than the HadCRUT3 data set you ran your code on.

    While I appreciate you posting your code, and you are certainly more adept at writing code for problems such as these, your code proves you used the wrong data set. It clearly says:

    #loc="http://www.metoffice.gov.uk/hadobs/hadcrut3/data/HadCRUT3.nc"
    ...
    #upload to http://www.climateaudit.info/data/multiproxy/gergis_2016/instr.tab

    That is the URL for the HadCRUT3 data set as can be verified via this page. The URL for the HadCRUT3v data set can also be found on that page, and it is:

    https://crudata.uea.ac.uk/cru/data/crutem3/HadCRUT3v.nc

    I thought I had made this clear enough in my e-mails to you and in this post, but the central concern I have had all along is that we were using different data sets. Your code proves my concern was correct, and you were using the wrong data set. Hopefully I have made this point clear enough now. In case I have not, I'll repeat: Your results are based upon using the wrong data set.

    I'll go ahead and rerun your code on the correct data set and see what the results are.

    As a final note, your code isn't actually turnkey, as you have several destination files hardcoded with a specific directory structure that won't be compatible with everyone's computer. It's an easy thing for a user to fix. The simplest solution is to delete the directory structure of the pathnames and simply use your working directory instead. You also have a call to a pairwise.complete.obs file which isn't created or downloaded by the code, causing it to break. I don't think those are important issues, but I thought I should let you know about them.

  9. Sven, indeed he did. It also appears he failed to understand that the choice of data set was a central issue of my comments, as he didn't comment on that issue at all even though his code proves I was correct.

    In any event, I believe if he had used the correct data set, his results would have been quite different. I'll be using his code to check this belief once I finish some work around the house. It looks like it is going to rain again soon so I want to get things done while I can.

  10. Oh, quick note. I just let the code run after fixing the pathnames, and the pairwise.complete.obs issue didn't cause it to break. I'm not certain things worked properly, but I wanted to put it out there that maybe there's a reason that portion of the code would work when my first impression was it wouldn't.

    More in an hour or so though.

  11. Alright, so I ran Steve McIntyre's code and got the same results as those posted in his comment above. His results match what I obtain when using code I wrote on the HadCRUT3 data set (as opposed to the correct HadCRUT3v data set I had been examining). I'm confident the code is accurate. Given that, I reran the code with the HadCRUT3v data set the authors said they used instead of the HadCRUT3 data set McIntyre used. Here are the results:

    # 4 X1091_Palmyra_d18O_floating 0 -0.776 -8.834 1
    # 30 X1076_Fiji_1F_SrCa 0 -0.700 -7.054 1
    # 16 X607_Fiji_AB_d18O 0 -0.640 -6.069 1
    # 26 X585_Vostok_d18O -1 0.545 3.418 0
    # 8 X897_New_Fenwick_Composite_Signalfree_2 1 0.530 4.291 1
    # 22 X937_Stewart_Island_HABI_composite_Signalfree_2 0 0.521 4.156 1
    # 32 X600_Fiji_1F_d18O 0 -0.507 -4.422 0
    # 9 X885_URW_newz063_Signalfree_2 0 -0.480 -3.758 1
    # 21 X850_New_Caledonia_d18O -1 -0.472 -3.163 1
    # 13 X667_All.Celery.Top.west.corr_Signalfree_2 -1 0.445 3.783 1
    # 3 X1075_Oroko_TT_recon 0 0.427 3.566 1
    # 25 X869_Rarotonga_d18O 0 -0.404 -3.136 1
    # 12 X879_TAK_newz062_Signalfree_2 0 -0.394 -2.386 0
    # 6 X829_Law_Dome_Accumulation 0 0.384 2.306 0
    # 45 X1017_Rarotonga.3R_d18O 0 -0.384 -2.888 1
    # 24 X857_Rarotonga_SrCa 0 -0.380 -2.973 0
    # 29 X913_Savusavu_d18O 0 -0.375 -2.577 1
    # 39 X923_Maiana_d18O 0 -0.371 -2.910 0
    # 7 X656_CTP_East_BLT_RFR_Signalfree_2 -1 0.370 3.040 1
    # 2 X588_Law_Dome_18O_new 0 0.368 2.331 0

    When placed in order of decreasing (absolute) correlation, the Law Dome proxy Steve McIntyre highlighted drops to 20th, as opposed to fifth as in his list. Additionally, the t-score drops from the 4.71 McIntyre listed in his post, or the 3.65 he has since updated it to, to the much lower value of 2.331. This means McIntyre's claims, like Law Dome having a higher correlation to local temperatures than "24 of the 28 proxies retained by Gergis in her screened network," "a higher t-statistic than 24 of 28 proxies retained by Gergis" and "a higher correlation and t-statistic than either of the other two long proxies (Mt Read, Oroko Swamp tree ring chronologies)," are false.

    Additionally, McIntyre's code includes any year so long as there is at least a single month of data in it. If one chooses to instead use a slightly more restrictive requirement, that there be at least two months of data, his results change. If you require all six months of summer data be present (remember, Gergis et al were attempting to reconstruct summer temperatures), then you can't even get a result as you'll find there isn't enough data in any year.
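    To make that concrete, here is a minimal sketch of such a requirement (summer_mean and min_months are my names, not anything from McIntyre's script or the paper):

    # Average the six summer months only when at least min_months of them
    # are present; otherwise treat the year as missing.
    summer_mean = function(months, min_months = 2) {
      if (sum(!is.na(months)) >= min_months) mean(months, na.rm=TRUE) else NA
    }

    Setting min_months = 6 reproduces the no-result case described above, since this HadCRUT3v gridcell never has all six summer months present in any year of the screening period.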

    The authors didn't say what, if any, criteria they used for handling missing data. That is bad. However, it also means we cannot know if McIntyre's code matches what they did. It is quite possible the Law Dome proxy McIntyre made a fuss about wouldn't have been used regardless of the 50 year requirement the authors imposed simply because it wouldn't have passed the correlation screening.

    It's currently impossible to know much with certainty since different versions of HadCRUT3v exist, and we don't know which the authors used. They may have used a slightly older version than the one currently available. Given that and how one's decision of how to handle missing data affects one's results, it is inappropriate for McIntyre to conclude his post:

    The Law Dome d18O series has a stronger correlation to gridcell temperature than 24 of 28 “passing” proxies or either of the long tree ring series used as long proxies in Gergis et al 2016. It was excluded from the Gergis et al network based on a additional arbitrary screening criterion that excluded Law Dome without impacting any other proxies in the screened network. It is not known whether Gergis et al intentionally added the additional screening criterion in order to exclude Law Dome or whether the criterion had been added without fully understanding the ramifications, with the exclusion of Law Dome being merely a happy coincidence. In either case, the exclusion is not robust. And because the Gergis et al 2016 reconstruction (R28) is based on only two proxies in its early portion, neither are its various reconstructions. The impact will be particularly felt on the R2 and R3 reconstructions, which have only two and three proxies respectively.
    Just another day of data torture by the paleoclimate “community”.

    Had McIntyre used the correct data set (HadCRUT3v) instead of the incorrect one (HadCRUT3), I am confident he never would have written that. Hopefully it will not be long before he recognizes his mistake and corrects it.

    Incidentally, this further emphasizes that authors should archive the data used in their study and provide sufficient detail to replicate their results. Not knowing what version of HadCRUT3v the authors used or how they handled missing data is a silly obstacle for people who claim to be doing science.

  12. Brandon, I'm impressed by your work and your catching that Gergis used HadCRUT3v. I have a few questions.

    1) When you say H3v is the "correct" data set what makes it the correct set? Would Gergis have been wrong to use H3? If so, why? If not, then is this not yet another post hoc selection that could be used to find the "correct" conclusions? Gergis, after all, took years to try different combinations of analysis strategies.

    2) Why is H3v so different than H3? They are from the same raw data.

    3) If a proxy can correlate significantly with one of the data sets but not the other does this tell us anything about the confidence of correlation stats?

  13. Ron Graf, sadly, it didn't take any real effort to figure out there was something wrong with the results Steve McIntyre posted. Figuring out the specific issue, that McIntyre had used HadCRUT3 when Gergis et al used HadCRUT3v, was not as easy, but credit for that goes to HaroldW. I had suspected the issue might be in the data being used, but I had no idea on the specifics.

    1) When you say H3v is the "correct" data set what makes it the correct set? Would Gergis have been wrong to use H3? If so, why? If not, then is this not yet another post hoc selection that could be used to find the "correct" conclusions? Gergis, after all, took years to try different combinations of analysis strategies.

    When I say "correct," I only mean that in a limited sense: McIntyre was discussing what effect choices Gergis et al made had on their work, so he needed to use the same data set they used (to whatever extent possible). If Gergis et al had used HadCRUT3 instead of HadCRUT3v, I'd say the "correct" data set to use would be HadCRUT3.

    2) Why is H3v so different than H3? They are from the same raw data.

    Along with your previous question, I believe the answer lies in the purpose of the variance adjustments applied to the HadCRUT3v data set. A variance adjusted data set was created because, as the amount of temperature data available changes over time, the variance in the resulting temperature series changes as well. That can cause problems for some types of analyses.

    At first glance, I believe variance adjusted gridded data is more appropriate if one is aiming for linear analyses like linear detrending and correlation testing. The reason is that without the variance adjustment, any linear model you create will be influenced by the artificial effects of things like station drop outs. The effect could be non-trivial.

    As for these specific differences, I suspect a larger issue might be that the HadCRUT3v data set has a lot of missing values for this particular grid cell, to the point I believe every summer is missing at least one month of data. I'm not sure what causes that. It might be a glitch, a difference in the amount of data the methodologies require, or something else.

    3) If a proxy can correlate significantly with one of the data sets but not the other does this tell us anything about the confidence of correlation stats?

    This is a point I raised in my early e-mails to McIntyre on this issue. I initially used the HadCRUT4 data set, and I got notably different results than he did. I suggested that cast doubt on the validity of the approach. Correlating against grid cells seems fraught with problems to me, and I don't think it is a viable approach. I think to do the work well one would either need to directly use station records themselves or create temperature fields specifically tailored for the area of interest.

    But honestly, the paper is trash. Issues like that are real and I think fundamental to millennial paleoclimate reconstructions. There are just bigger concerns with this paper. At a minimum though, I would say repeating one's correlation screening against something like GISS's record should be required. That would give at least some perspective on how meaningful any correlations one finds might be.

  14. I guess this is in the same area as your recent post on BEST, but isn't HadCRUT3v another 'homogenized' data set? I skimmed through the uncertainty estimates paper (Brohan et al 2005). It appears that the model is validated by adding 'random noise' to a GCM run.

  15. JonA, all of these data sets are homogenized, in one manner or another. The difference between HadCRUT3 and HadCRUT3v doesn't really involve that, though. To understand the difference, imagine if there were a portion of the globe where average temperatures hadn't changed at all over the years. In 1900, we had one temperature station there, but by 2000, we had 20.

    The number of stations changing over time wouldn't change the fact that the trend for the area would be 0. What would change is the variance. In 1900, temperature anomalies for the area might fluctuate from -5 to 5 because a single station is bound to have quite a bit of noise in its record. In 2000, that fluctuation would be much smaller, perhaps only -1 to 1, due to the increase in the number of stations. If you plotted temperatures for the area, the early part of the record would look very different from the recent part because of the change in variance. That difference would be caused only by the number of stations though, not any physical properties of the area.
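    A toy simulation of that effect (purely illustrative; this is not the actual variance-adjustment method):

    set.seed(1)
    # 100 years of anomalies with no trend; every station has noise sd = 2
    one_station = rnorm(100, sd = 2)
    twenty_stations = rowMeans(matrix(rnorm(100 * 20, sd = 2), nrow = 100))
    var(one_station)      # roughly 4
    var(twenty_stations)  # roughly 0.2, i.e. 4/20

    The area's true climate is identical in both cases; only the sampling changed, and that sampling artifact is what the variance adjustment is meant to remove.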

    For some purposes, that might be okay. For other purposes, it might not be. The choice of which data set to use depends on what your purpose is.

  16. Might have to reconsider incompetence as the reason Mann made false statements in reply to McIntyre's comment on Mann 2008 about upside down usage.

  17. "Might have to reconsider incompetence as the reason Mann made false statements in reply to McIntyre's comment on Mann 2008 about upside down usage."

    Weep...
    Steve Mc makes a mistake..therefore everything else is suspect..????
    Show it... like Brandon does....or off to the corner with the pointy hat you go..

  18. Mike Williams, that doesn't seem a reasonable response. He was right to point out there is some similarity here. This post makes it clear enough what error it was highlighting. Despite that, Steve McIntyre came here and argued the post was wrong, without even looking or considering that error. Instead, he simply repeated the same thing he had done before, making the exact same error he had made the first time.

    I don't think anyone (here) would claim McIntyre is incompetent, yet he dismissed this error even though he looked at his own code which showed it clearly existed. If McIntyre can do that without incompetence or dishonesty being the reason, then perhaps it is not incompetence or dishonesty that caused Michael Mann to dismiss his criticisms on the Tiljander issue.

    My impression is in both cases the criticisms were dismissed by someone who didn't even bother to read them before writing his response.
