Consensus Chart Craziness - Part 4

We've been discussing a strange chart from a recent paper published by John Cook of Skeptical Science and many others. The chart ostensibly shows that the consensus on global warming increases with a person's expertise. To "prove" this claim, Cook et al assigned "expertise" levels to a variety of "consensus estimates" they took from various papers. You can see the results in the chart below, to which I've added lines to show each category:

[Image: 4_13_scaling_example]

As you can see, the "consensus estimates" are all plotted one next to the other, with no regard for how the categories are spaced. The result is that Category 4 doesn't appear in the chart at all, while Category 5 covers more than half of it. This creates a vastly distorted impression of the results.
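To make the distortion concrete, here is a minimal sketch (in Python with matplotlib, using made-up consensus values rather than the paper's actual numbers) of the difference between plotting each estimate one next to the other and plotting it at its assigned expertise category:

```python
import matplotlib.pyplot as plt

# Made-up (expertise category, consensus %) pairs, for illustration only.
estimates = [(1, 75), (2, 82), (3, 85), (3, 88), (5, 91),
             (5, 94), (5, 95), (5, 96), (5, 97), (5, 97)]
categories = [e for e, _ in estimates]
consensus = [c for _, c in estimates]

fig, (ax_equal, ax_scaled) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left: each estimate gets its own x position, so a category with many
# estimates swallows the axis and a category with none simply vanishes.
ax_equal.scatter(range(len(estimates)), consensus)
ax_equal.set_title("Estimates plotted one next to the other")
ax_equal.set_xlabel("Estimate index")
ax_equal.set_ylabel("Consensus estimate (%)")

# Right: x is the expertise category itself, so every category occupies
# the same width whether it holds zero estimates or six.
ax_scaled.scatter(categories, consensus)
ax_scaled.set_xticks(range(1, 6))
ax_scaled.set_title("Estimates plotted at their category")
ax_scaled.set_xlabel("Expertise category")

plt.tight_layout()
plt.show()
```

With these made-up numbers, the left panel gives Category 5 more than half the horizontal space and leaves Category 4 nowhere to be seen, while the right panel gives every category the same width no matter how many estimates fall in it.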

But while that is a damning problem in and of itself, there is much more wrong with the chart and the corresponding paper. One of the key issues we've been looking at in this series is how Cook et al arbitrarily chose which results from the studies they examined to report and which ones to leave out. Today I'd like to discuss one of the most severe cases of this. It deals with the paper Verheggen et al (2014).

This is how the new Cook et al (2016) paper reported the results for Verheggen et al (2014):

[Image: 5_3_Verheggen_Reported]

That is a single line. Their chart, which has 16 data points on it, devotes only one point to this paper's results, even though other papers get as many as three. This is peculiar, as Verheggen et al report their results in great detail. For instance, here is a chart showing responses as a whole:

[Image: 5_3_Verheggen1]

And here are the responses broken down by type of person who gave the answer:

[Image: 5_3_Verheggen2]

And here is some of the same information again, in tabular form:

[Image: 5_3_Verheggen3]

That is a wealth of information. According to the authors of Verheggen et al, it shows:

Consistent with other research, we found that, as the level of expertise in climate science grew, so too did the level of agreement on anthropogenic causation.

This is exactly what Cook et al (2016) seeks to prove. Why then does Cook et al (2016) not include this data? The only results it includes are the answers given by people who "Published more than 10 climate-related papers (self-reported)." Why is that the one group whose results get reported? Why would a person who has published 10 papers be excluded while a person who has published 11 papers is included?

It gets even worse when we look at the Expertise category assignments. These values were arbitrarily assigned by Cook et al, and no guidelines are given for how they were chosen. As our last post showed, climate scientists who've published on climate science recently but published more on other topics were put in Expertise Category 3 for the Stenhouse et al (2014) survey.

Some of those people likely published 10 or more papers on climate science. After all, in five years (the period of time covered), a scientist could easily have written over 20 papers. If 10 were on climate science and 15 on something else, they'd be rated as Expertise Category 3 for Stenhouse et al (2014) yet fall in Expertise Category 5 for Verheggen et al (2014). It gets even worse when you realize Stenhouse et al (2014) judged people's expertise by papers published in the last five years, while Verheggen et al judged it by how many papers they had published in their entire lifetimes.

Cook et al (2016) offer no explanation for why they only published results for one subgroup of the Verheggen et al (2014) data set, or for why some data was excluded while other data was not. They claim:

We examine the available studies and conclude that the finding of 97% consensus in published climate research is robust and consistent with other surveys of climate scientists and peer-reviewed studies.

Yet they intentionally exclude a great deal of data from many of "the available studies" they claim to examine. And when they display whatever results they did report, they make no effort to ensure the "expertise" categories they use are consistent or coherent. It appears all they're doing is picking out results that are convenient for them, assigning arbitrary "expertise" values to what they picked, and then displaying the results in a heavily skewed manner which gives far more visual weight to the most favorable results, when a fair depiction:

[Image: 4_13_scaling_proper]

would look very different.

6 comments

  1. Brandon -
    Please forgive me if this is off in the weeds, because I haven't read either Cook et al. (2016) or Verheggen et al. (2014). But the snippet of the C16 tabular results shows "V14Q3" (presumably question 3 in the survey), and your table & chart are labelled Q1 and Q1a. So not the same thing, perhaps?

  2. HaroldW, good catch. I didn't notice the Q3 in the code there. The two questions are largely the same. They ask after the same thing, with Question 3 just being qualitative whereas Question 1 was quantitative. The results are almost the same if you exclude the people who didn't take a position. I'm not sure why I used Question 1 instead of Question 3, but the point remains the same, as the same sort of charts and tables exist for both questions (here are the tabular results for both questions).

    In fact, this raises an additional question. Why did Cook et al (2016) use the results from Question 3 instead of the results from Question 1? I would think a meta-study would need some sort of clear criteria for deciding which data to use and which to exclude, yet here the decisions all seem completely arbitrary. Why did they use Question 3 instead of Question 1? Why did they exclude ~1,000 responses that didn't fit in the one subset they reported? For that matter, why did they pick that one subset, grouping people who had published 11-30 climate-related papers with ones who had published 31+ while ignoring everyone else?

    I have no idea what the answers to those questions are, but I do know if Cook et al (2016) had used all the available data, they would never have been able to create that nice, pretty picture to use for PR. Plus, they would have actually had to do a meaningful amount of work. The amount of work put into this paper is not impressive.

  3. I should probably read the paper(s) but...
    What is the definition of the "consensus" (vertical) axis on that [C16] chart? Or does each point plotted refer to a different "consensus"? I got the impression from one of the other posts that in this paper they tried to be consistent about discussing the question of whether anthropogenic effects represent more than half of the observed temperature increase. Q1 addresses that issue, but Q3 doesn't.

  4. As strange as it might seem, there was no single definition of the "consensus" they examined. Compare these three quotes from Cook et al (2016) for a demonstration:

    Among the 4014 abstracts stating a position on human-caused global warming, 97.1% were judged as having implicitly or explicitly endorsed the consensus. In addition, the study authors were invited to rate their own papers, based on the contents of the full paper, not just the abstract. Amongst 1381 papers self-rated by their authors as stating a position on human-caused global warming, 97.2% endorsed the consensus.

    Thus, Stenhouse et al (2014) concluded that '93% of actively publishing climate scientists indicated they are convinced that humans have contributed to global warming.'

    By combining published scientific papers and public statements, Anderegg et al determined that 97%–98% of the 200 most-published climate scientists endorsed the IPCC conclusions on AGW.

    For the first of these, Cook et al (2013) is said to have shown various things "endorsed the consensus." No definition is provided. For the second of these, the consensus is defined as "humans have contributed to global warming." For the third of these, it is said people "endorsed the IPCC conclusions on AGW." Of course, the IPCC concludes a great number of things about global warming, so this is wrong as stated. What it actually refers to is a single statement by the IPCC, which Cook et al (2016) gives as:

    human influence has been the dominant cause of the observed warming since the mid-20th century

    Only, that quote comes from the IPCC Fifth Assessment Report, cited for the year 2014. The paper which supposedly shows a consensus endorsing that position was cited for the year 2010. It dealt with what was said by the Fourth Assessment Report. The underlying paper explains:

    We defined CE researchers as those who signed statements broadly agreeing with or directly endorsing the primary tenets of the IPCC Fourth Assessment Report that it is “very likely” that anthropogenic greenhouse gases have been responsible for “most” of the “unequivocal” warming of the Earth’s average global temperature in the second half of the 20th century

    This becomes even more problematic when one realizes Cook et al (2016) cuts off part of the quote it does provide, as even the Fifth Assessment Report only said "It is extremely likely" that position is true. The Fourth Assessment Report said it was "very likely." So Cook et al (2016) changes the definition from referring to one report to a different report with different certainty levels, and then drops all mention of uncertainty in the statement.

    I can't imagine how any decent researcher would do this sort of thing, but I'm also not sure it mattered. The underlying paper in this case didn't survey anyone. Its authors just collected papers from people and decided which were "broadly agreeing" with the IPCC position. As a result, their "consensus" is the product of arbitrary ratings, ones they haven't published for anyone to verify. So, you know, maybe Cook et al (2016) screwing up what their "consensus" position is isn't that big a deal.

  5. By the way, the lack of consistency in definition of the "consensus" is being treated as a good thing by some. Consider:

    It is almost as if Richard doesn’t understand that several different groups have conducted independent surveys using slightly different questions, and yet produce consillient results. It’s called “structural uncertainty” Richard, there is no one true question for such a survey and showing that the results are similar for variations on the question posed shows that the consensus is “in the high 90s”.

    This is even alluded to by one of the authors, Ken Rice (more commonly known as the blogger Anders):

    And, Richard is ignoring that we discussed the various structual uncertainties in our paper.

    The authors made no effort to compare results for similar questions. For some papers, such as the Stenhouse and Verheggen ones, they had data indicating views on different issues. Rather than compare like to like, Cook et al (2016) simply chose some results to use and excluded others, without making any effort to pick results which covered similar concepts. They then threw this random mess all together and now want credit for that showing "the various structural uncertainties."

    That's the exact opposite of the truth. Had they wished to examine the various structural uncertainties, they would have compared the results of similar questions to one another and looked at how results change as the questions asked changed. They would have said something like, "For issue A, papers found X, Y and Z levels of consensus. For issue B, papers found J, K and L levels of consensus."

    That's how real research into the issue would be done. What they did was... I don't even know a term for it.

  6. Brandon: Exactly right.

    In both the Doran and the Carlton surveys, two questions were asked: Has it warmed? Who's to blame? (paraphrased)

    For Doran, the consensus is P(Q2=humans | Q1 = yes).

    For Carlton, the consensus is P(Q1 = yes) x P(Q2 = humans | Q1 = yes).

    Doran also reports P(Q1 = yes), so comparing like with like is straightforward.
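
    To illustrate the difference numerically, here is a small worked example (with hypothetical response counts, not the actual survey data) showing how the two definitions produce different "consensus" figures:

    ```python
    # Hypothetical response counts, for illustration only.
    n_total = 100       # respondents who answered Q1 ("Has it warmed?")
    n_q1_yes = 90       # answered "yes" to Q1
    n_q2_humans = 81    # of those, attributed the warming to humans on Q2

    p_q1_yes = n_q1_yes / n_total               # P(Q1 = yes) = 0.90
    p_q2_given_q1 = n_q2_humans / n_q1_yes      # P(Q2 = humans | Q1 = yes) = 0.90

    doran_style = p_q2_given_q1                 # conditional only: 90%
    carlton_style = p_q1_yes * p_q2_given_q1    # joint probability: 81%

    print(f"Doran-style consensus:   {doran_style:.0%}")
    print(f"Carlton-style consensus: {carlton_style:.0%}")
    ```

    With these made-up counts, the same set of answers yields a 90% "consensus" under the Doran-style definition and an 81% "consensus" under the Carlton-style definition, which is why comparing the two requires knowing P(Q1 = yes).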
