Consensus Chart Craziness - Part 1

There's a new paper out claiming to find a "consensus on [the] consensus" on global warming. It concludes:

We have shown that the scientific consensus on AGW is robust, with a range of 90%–100% depending on the exact question, timing and sampling methodology. This is supported by multiple independent studies despite variations in the study timing, definition of consensus, or differences in methodology including surveys of scientists, analyses of literature or of citation networks.

Its one and only figure is used to demonstrate the claim:

Figure 1 demonstrates that consensus estimates are highly sensitive to the expertise of the sampled group. An accurate estimate of scientific consensus reflects the level of agreement among experts in climate science; that is, scientists publishing peer-reviewed research on climate change. As shown in table 1, low estimates of consensus arise from samples that include non-experts such as scientists (or non-scientists) who are not actively publishing climate research, while samples of experts are consistent in showing overwhelming consensus.

If you've followed the discussion about this paper so far, you may have seen my recent post discussing this chart:


In which I explained:

Look at the x-axis. See how it says "Expertise"? Tell me, what scale do you think that's on?

You're wrong. It doesn't matter what your answer might have been; it's wrong. It's wrong because there is no scale for the x-axis on this chart.

Seriously. This is what the authors of the paper had to say about the chart:

Figure 1 uses Bayesian credible intervals to visualise the degree of confidence of each consensus estimate (largely a function of the sample size). The coloring refers to the density of the Bayesian posterior, with anything that isn’t gray representing the 99% credible interval around the estimated proportions (using a Jeffreys prior). Expertise for each consensus estimate was assigned qualitatively, using ordinal values from 1 to 5. Only consensus estimates obtained over the last 10 years are included.

For today, let's ignore the part about the "coloring" and "credible intervals." Let's just focus on the part where it says the expertise values were "assigned qualitatively." What that means is there was no rigorous method behind how these values were assigned. The authors just went with whatever felt right. That's why no rubric or guideline for the expertise rankings was ever published.

Kind of weird, right? Well, that's not too important. What is important is... there are five categories. Look at the chart. Where are they?

I then showed what the chart would look like if you labeled the various categories in it:


One category (5) covers more than half the chart's range while another category (4) doesn't even appear on the chart. Any claim that "consensus estimates are highly sensitive to the expertise of the sampled group" based on this chart is heavily biased by the authors' decision to present their data in a misleading way. Had they simply shown their data by category, they would have gotten a chart like this:


Which doesn't make for anywhere near as compelling an image, and it wouldn't allow the authors to create graphics like this one which they use to promote their conclusions:

By choosing not to label the values on their x-axis, and by placing every point next to the last rather than grouping the data by category, the authors of this paper were able to create the visual impression of a relationship between expertise level and the size of the consensus estimate.

That alone should be damning, but it turns out there are many other problems with this chart as well. To highlight them, I am going to run a little mini-series of posts under the title of this one. The series will demonstrate how data used in this chart has been cherry-picked, adjusted, and in one case seemingly pulled out of thin air.

Because this post is already running long, I'll close it out with one of the more peculiar aspects of this chart. It's a mystery I cannot unravel. This paper says:

Carlton et al (2015) adapted questions from Doran and Zimmerman (2009) to survey 698 biophysical scientists across various disciplines, finding that 91.9% of them agreed that (1) mean global temperatures have generally risen compared with pre-1800s levels and that (2) human activity is a significant contributing factor in changing mean global temperatures. Among the 306 who indicated that 'the majority of my research concerns climate change or the impacts of climate change', there was 96.7% consensus on the existence of AGW.

Both Table 1 in the paper and Table S1 in the Supplementary Material confirm these numbers, though the SM gives 91.8% instead of 91.9% as the consensus for the 698 people surveyed. I presume that was a typo. Regardless, what's peculiar is all three of these locations say there were 306 people "who indicated that 'the majority of my research concerns climate change or the impacts of climate change.'" 306 of 698 people is ~43.8%. Given that, why would Carlton et al (2015) show:


5.50% is far smaller than 43.8%. In fact, 5.50% of 698 is only 38. That is, 38 people said "the majority of my research concerns climate change or the impacts of climate change." Yet somehow, this is what Cook et al (2016) show:


How the authors went from 38 (5.50%) to 306 (43.8%) is a mystery. I thought perhaps it was a typo with an extra 0 slipping in, but 36/698 is 5.16%, not 5.50%. I thought perhaps the number had been inadvertently switched with another, but neither 43.8% nor 306 shows up as a value in the Carlton et al (2015) paper.
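The arithmetic behind the mismatch is easy to verify. Here is a minimal sketch using only the figures quoted above from the two papers:

```python
n_surveyed = 698  # total respondents in Carlton et al (2015)

# Cook et al (2016) report 306 people in the "majority of my research
# concerns climate change" category, which as a share of 698 would be:
print(round(100 * 306 / n_surveyed, 1))  # 43.8 (percent)

# Carlton et al (2015)'s own breakdown gives 5.50% for that category,
# which corresponds to roughly 38 people, not 306:
print(round(0.0550 * n_surveyed, 2))  # 38.39

# Nor does a dropped-zero typo (306 -> 36) rescue the numbers:
print(round(100 * 36 / n_surveyed, 2))  # 5.16 (percent), not 5.50
```

There is no combination of these published figures that turns 5.50% of 698 into 306 people.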

There's an additional question to consider as well. This one is less mysterious, but look at the Cook et al (2016) table again. There were 698 people in the Carlton et al (2015) study. If 306 fell into one category, why would the other Cook et al (2016) category show 698 people? Are they double counting some people, allowing them to fit in both categories?

I'll have more on that in the next post in this series. Stay tuned.


  1. Brandon,
    I didn't see any Supplemental Information associated with Carlton et al. (2015), so presumably they didn't publish a spreadsheet with all the responses.
    I was bothered by the combination of the survey response number (N=698) and the helpfully very precise breakdown of the research area response (5.50% / 42.45% / 50.04%). As you wrote, 5.50% of 698 is *about* 38; in fact it's 38.39. One naturally expects to see an integer here. Fiddling with the number of responses N, I inferred from the percentages that the number of responses to this question (Q25) was N=636, with response distribution (35 / 270 / 331 ).
    Not particularly noteworthy, but the possibility of creating 306 from a typo seemed a little more likely if the correct value were 36; however, 35 seems to be the correct answer.

  2. HaroldW, that is an interesting observation. I think you got something wrong on your response distribution though. 50.04% of 636 is only 318, not 331. I think you probably got it because 331 + 270 + 35 = 636, but 50.04% + 42.45% + 5.50% is only 97.99%. That means some people didn't respond to the question.

    Which creates a problem, as 318/636 = 50% exactly, not 50.04%. In fact, I'm not sure of any set of integers which would give exactly 50.04% with a denominator anywhere near 698. That's one of the things I planned to look at for my next post.

    By the way, you should really check out Figure 4 of the Carlton paper. The rescaling in it is quite strange.

  3. I think a civil email to ATTP might be in order.
    I know it's more fun on the web, but give it a try.

  4. I was actually planning on e-mailing the author of the Carlton paper and John Cook. Carlton is an author on this paper and the originator of the data, so that makes him an obvious choice. Cook is the lead author of this paper, so that makes him an obvious choice as well. I don't see any particular reason I would contact Anders, unless I was perhaps going to contact all the authors (which I see no reason to do).

    Before I do that though, I want to take some time and try to figure out as much as I can on my own. I don't want to contact authors with questions if I can find the answer to them with some simple research or discussion with people. Plus, I want to figure out what all I would want to ask about. I think it'd be better to spend a week or two creating a list of questions/concerns/issues and contact authors with them all at once than to send an e-mail each time I see something new.

    Plus this way, if people spot errors/misunderstandings on my part, I won't have to bother the authors with them.

  5. Brandon (2:16 am) -
    My fault -- the 50.04% in the previous post is a typo. Looking at the results of Q25, the correct fraction responding "3" ("None of my research...") is 52.04%. Fortunately I only mis-typed it in the comment, not in the spreadsheet I used, so the math is correct...but apologies for misleading you.

    As the percentages added up to 100%, I concluded that non-respondents were excluded when computing percentages. [Well, technically the numbers add up to 99.99%, but the difference must be due to rounding. If it were due to including non-respondents in the denominator, the deviation from 100% would have to exceed 1/698, much more than the missing 0.01%]

    P.S. I wrote a brief (5 lines?) Matlab script yesterday which showed that only N=636 (of all N up to and including 698) admits a solution in integers with the stated percentages rounded to two decimal places. A little more reliable than "fiddling". Well, assuming that Carlton rounded as Matlab does.
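    For readers without Matlab, the search HaroldW describes can be reproduced with a short Python script. This is a reconstruction, not his original code; it uses the corrected percentages (5.50% / 42.45% / 52.04%) and Python's default rounding:

```python
TARGETS = (5.50, 42.45, 52.04)  # Carlton et al (2015)'s Q25 breakdown

def rounds_to(k, n, pct):
    """True if k out of n, as a percentage, rounds to pct at two decimals."""
    return round(100 * k / n, 2) == pct

# Try every possible response count N up to the 698 survey respondents,
# looking for an integer breakdown (a, b, c) matching all three percentages.
solutions = []
for n in range(1, 699):
    for a in range(n + 1):
        if not rounds_to(a, n, TARGETS[0]):
            continue
        for b in range(n - a + 1):
            if rounds_to(b, n, TARGETS[1]):
                c = n - a - b
                if rounds_to(c, n, TARGETS[2]):
                    solutions.append((n, a, b, c))

print(solutions)  # only N=636 fits, with breakdown (35, 270, 331)
```

    (None of the three target percentages can fall exactly on a rounding boundary for any denominator this small, so the choice of tie-breaking rule doesn't affect the result here.)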

  6. Brandon -
    Regarding Figure 4, which you commended to me...I assume you find it odd that the responses to the trustworthiness question (Q20), which are on a 1-5 scale, are plotted against a 1-7 axis. Well, yes, that is odd. Saved a figure, though, or at least a panel.

    One thing which surprised the heck out of me is that the nearby text states "The average response to 'Compared to my field, climate science is a mature science' was 4.78 out of 7, indicating slight agreement." I would in no way describe climate science as mature, especially when compared to, say, physics. The responses to Q19 are given in the appendix as:
    1. Strongly agree 17.10%
    2. Moderately agree 33.74%
    3. Slightly agree 12.67%
    4. Undecided 8.09%
    5. Slightly disagree 11.15%
    6. Moderately disagree 10.38%
    7. Strongly disagree 6.87%
    So while the conclusion correctly describes the leaning of this tabulation, the authors apparently inverted the response order, such that a larger numeric value indicates more agreement. Same with Q20. Nothing wrong with that, but I didn't see it mentioned in the writeup. And I'm still surprised at the result.

    Oh, that's cool. I don't know how I missed that 50.04% was just a typo. That explains it. I'd say your results are right then. I was going to write a script like yours to see if I could figure out a possible set of respondent results (and may still do so), but I'm pretty sure there isn't another combination that will match the results reported in the paper. With fewer than 700 people surveyed, I don't think there should be many combinations that match to the hundredth of a percent. I'm not sure there would be any.

    There is actually math which would let us tell, but I'm only vaguely familiar with how it works. I think it's an interesting field, as I'm always amazed at the patterns you can find in digit sequences. It'd be a silly diversion to look into since a brute force script is far more efficient in this case, but... math is cool. At a minimum, we can test if the 0.01% difference is a rounding error. The smallest difference a missing response could cause is 1/698, or 0.14%. That obviously couldn't explain the numbers adding up to 99.99%.

    I don't think there's really any need for consideration like that though. I think it's pretty clear your results are correct.

    And yup, for Figure 4 I get that they wanted to show all the results on the same chart, so they needed to address the difference in scale, but the result looks weird. They rescaled 1-5 to 1-7, and the resulting chart has no values below 3. That's not wrong; it just looks weird.

    As for the maturity of climate science, I think that question is likely to mislead people because it isn't asking whether climate science is mature in absolute terms, only whether it is mature in comparison to the respondent's own field. That means people who think their field isn't mature and think the same about climate science could give the same answer as people who think both their field and climate science are mature.

    What's more interesting is if people took the comparison as meaning equally mature, anyone who felt climate science was more mature than their field would have to disagree with the statement. They'd also have to disagree if they felt climate science were less mature. That means disagreeing with the statement could mean more than one thing.

    But honestly, I think people were biased to answer the question positively because, in general, scientists don't want to speak ill of other scientists and their work. That might explain why chemists and physicists gave pretty much completely neutral results. If no group of respondents gives a negative answer on the whole, it seems likely the scale is skewed.

  9. For (Carlton's) Figure 4, it appears that they added 1 to the survey results for Q20, so a 1-5 scale is plotted within 2-6. This has the advantage of converting the median response -- "3. About equally trustworthy" -- to a 4, which matches the center of the range for Qs 18 & 19, which are on a 1-7 scale. In their discussion, they seem to have left it as 1-5, reporting a mean of 2.69, although it's plotted at 3.69 in Figure 4. As the Q20 responses are already in an order where the more positive perception of climate science is represented with higher numeric values, they didn't adjust the order.

    For Qs 18 & 19, they mapped the 1-7 responses into 7-1, inverting the order so that the most positive views are represented with higher numeric values. The text also uses that mapping.

  10. Carlton's Q18 -- "Climate science is a credible science" is an interesting question to me. The response was overwhelmingly (almost 80%) "Strongly agree". I don't know how to interpret the statement, though. Climate science certainly incorporates sound physical principles used in atmospheric science. That gives it a certain amount of credibility in my mind. The use of these principles in GCMs results in qualitative matching of phenomena such as persistent ocean currents, prevailing winds, Hadley circulation, seasonal cycles, etc. All of that is to the good. However, there are significant lacunae, natural to an unsettled science, in that the various GCMs disagree in many significant respects. One example is the mean surface temperature, arguably the most basic metric, in which GCMs differ over a range of several K. From one perspective, that's pretty good -- a few K out of ~300 -- but it's a red flag (in my mind, at least) that we don't yet know all of the guiding principles. [By comparison, if you asked 20 astronomers to calculate the times of the next solar eclipses, I'd guess that they would differ by no more than seconds over intervals of years.]

    And the leading question of climate science, climate sensitivity, is known only within a factor of 3 (using 1.5 to 4.5 K). More interesting questions -- rainfall effects, polar ice sheet evolution, biological responses -- remain less known. When I see climate science papers saying that they predict such-and-such over the next century, I find them not credible, because when the basic principles are known only to a "kinda sorta" level, and a simulation is run which integrates those equations over a long span, my experience tells me that those results aren't reliable. Whole-society predictions -- the stuff of IPCC WG2 -- don't even qualify in my mind as "kinda sorta" right.

    So how would I respond to Q18? I think I would say "moderately disagree". But it's a bit tricky, and I don't know if all persons would interpret the statement in the same way.

    P.S. From 6:33 comment: "I don't know how I missed that 50.04% was just a typo." I think it was observant of you to notice that (as written) there was an easy cross-check, and it failed. Hence, *something* about my numbers was wrong. Not up to you to identify/fix the problem. And you were correct, there is no way to get 50.04% from a rational number with denominator up to 698; aside from 50.00% (easy!), the closest on the other side is 50.07%, e.g. 349/697.

    I wrote earlier that one caveat to my result was "assuming that Carlton rounded as Matlab does." I tweaked the Matlab script to try rounding down, which gave solutions for N=563 and N=636, and rounding up, which gave no solution.
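    The claim that no denominator up to 698 can yield 50.04% can also be checked by brute force. A minimal Python sketch (a reconstruction, not the Matlab script mentioned above):

```python
# Collect every percentage achievable as round(100*p/q, 2)
# for denominators q up to the 698 survey respondents.
achievable = set()
for q in range(1, 699):
    for p in range(q + 1):
        achievable.add(round(100 * p / q, 2))

print(50.04 in achievable)                      # False: 50.04% is unreachable
print(min(v for v in achievable if v > 50.00))  # 50.07, e.g. 349/697
```

    This agrees with the observation above: the smallest fraction above 1/2 with denominator at most 698 is 349/697 = 50.0717...%, which rounds to 50.07%.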

    One big thing to remember is that the topics we hear about in the global warming debate only make up a portion of what the field of climate science covers. Being credible/mature in the eyes of scientists doesn't require having all the answers. There are plenty of unanswered questions in all sorts of fields that are considered mature and credible. Climate science has been around for over a century, and it does contain a wealth of knowledge. People could reasonably think it credible and mature even if it doesn't have answers to some of the questions in the global warming debate.

    As for 50.04%, it was a good catch, but I've looked at the table in the Carlton paper enough times that I could have remembered the right value. Failing that, all I had to do was scroll up to see it. I included it in the post, after all.

    On rounding, I can't see any reason for rounding down. I suppose it's good to consider the possibility though. Either way, the number given in Cook et al (2016) seems clearly wrong.
