2012-02-14 12:51:51 - Proposed methodology
John Cook


There's been a long discussion on how we define AGW, and some talk about the methodology we follow in rating papers. With the email harvesting winding up, it's time to move to Phase 2b - rating the papers. Here is a summary of where we're at, how we plan to rate the papers, and the process we'll follow. Now is the time to have your say on how we approach this - once this Titanic gets moving, it will be hard to change direction (hmm, bad metaphor), so best to start in the right direction from the outset. So feedback welcome!

UPDATE 17 Feb: Have updated the methodology based on feedback, marked in red.

Rating paper by category and endorsement level

We will be giving each paper two ratings - category and endorsement level. Here are the available options:

Category

  1. Mitigation
  2. Paleoclimate
  3. Impacts
  4. Methods
  5. Opinion
  6. Not related to climate

Endorsement Level

AGW = humans are causing global warming

  1. Explicitly endorses and quantifies AGW as causing 50+% of the observed warming
  2. Explicitly endorses AGW but does not quantify or minimise it
  3. Implicitly endorses AGW without minimising it (by definition does not quantify)
  4. Neutral
  5. Implicitly minimises/rejects AGW (e.g. says the sun is playing a big role)
  6. Explicitly minimises/rejects AGW but does not quantify it
  7. Explicitly minimises/rejects AGW, quantifying the human contribution as less than 50%

Some notes:

  • Categories are exclusive - you can only assign one category to each paper.
  • Endorsement levels are discrete - there's no such thing as a 1.5, halfway between two levels. It's one or the other.
  • We won't include "from 1950" in Category 1 (matching it to the IPCC) because papers are very rarely that specific in the abstract.

Our original intent was to replicate Oreskes 2004, who used six categories: endorse consensus, reject consensus, mitigation, paleoclimate, impacts, methods. We are adding extra levels of information to this same methodology. This gives us the option of collapsing our results down to Naomi's format if we so desire, while also capturing lots of other interesting information. For example, Oreskes cites percentages of implicit and explicit endorsement but doesn't actually measure them; she just assumes the figures. We're going to quantify the amount.
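As a rough illustration, the two ratings could be collapsed back to Oreskes' single category list along these lines. This mapping is just my sketch of one plausible collapse rule (explicit endorsement/rejection trumping topic), not an agreed part of the methodology:

```python
def collapse_to_oreskes(category, endorsement):
    """Collapse our 2-D rating (category 1-6, endorsement level 1-7)
    to Oreskes 2004's single category list.
    Assumption: explicit endorsement/rejection takes precedence
    over the topic category."""
    if endorsement in (1, 2):          # explicit endorsements
        return "endorse consensus"
    if endorsement in (6, 7):          # explicit rejections
        return "reject consensus"
    topics = {1: "mitigation", 2: "paleoclimate",
              3: "impacts", 4: "methods"}
    return topics.get(category, "other")
```

Under a rule like this, a mitigation paper that explicitly endorses the consensus collapses to "endorse consensus" - exactly the information loss the two-dimensional scheme avoids.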

Guidelines and clarification on rating papers

Here are some more specific guidelines on how to categorise:


  • Methods: examines technical aspects of measurement/modelling. If a paper describes methods but no actual results, assign it to methods. If it goes on to results, then assign it to whatever the results are relevant to (eg - impacts/mitigation/paleoclimate)
  • Mitigation: explores ways to reduce CO2 emissions or sequester CO2 from the atmosphere
  • Paleoclimate: examines climate in periods predating the instrumental period (eg - around 1750)
  • Impacts: Papers on the effect of climate change or rising CO2 on the environment, ecosystems or humanity
  • Not Related To Climate: non-mitigation/impacts papers that contain no actual climate science. Eg - social science papers on education/communication/historical analysis.

Level of Endorsement

1. Explicit Endorsement of AGW with quantification

  • Mention that human activity is a dominant influence or has caused most of recent climate change (>50%). Endorsing the IPCC without explicitly quantifying doesn't count as explicit endorsement - that would be implicit.

2. Explicit Endorsement of AGW without quantification

  • Mention of 'anthropogenic global warming' or 'anthropogenic climate change' as a given fact. Mention of increased CO2 leading to higher temperatures, without 'anthropogenic' or any reference to human influence/activity, relegates the paper to 'implicit endorsement'.

3. Implicit Endorsement of AGW

  • Mitigation papers that examine GHG emission reduction or carbon sequestration
  • Climate modelling papers that talk about emission scenarios and subsequent warming or other climate impacts from increased CO2 in the abstract implicitly endorse that GHGs cause warming
  • Paleoclimate papers that link CO2 to climate change
  • Papers about climate policy (specifically mitigation of GHG emissions) unless they restrict their focus to non-GHG issues like CFC emissions in which case they're neutral
  • Modelling of increased CO2 effect on regional temperature - not explicitly saying global warming but implying warming from CO2
  • Endorsement of IPCC findings is usually an implicit endorsement. (updated this so it's more than just reference to IPCC but actual endorsement of IPCC)

4. Neutral

  • If a paper merely mentions 'global climate change' or 'global warming', this isn't sufficient to imply anthropogenic global warming
  • Mitigation papers talking about non-GHG pollutants are not about AGW
  • Research into the direct effect of CO2 on plant growth without including the warming effect of CO2
  • Anthropogenic impact studies about direct human influence like urban heat island and land use changes (eg - not about GHG emissions)
  • Research into metrics of climate change (surface temperature, sea level rise) without mention of causation (eg - GHGs)

5. Implicit Rejection of AGW

  • Discusses other natural causes as being dominant influences of recent climate change without explicitly mentioning AGW

6. Explicit Rejection of AGW without quantification

  • explicitly rejects or minimises anthropogenic warming without putting a figure on it.

7. Explicit Rejection of AGW with quantification

  • explicitly rejects or minimises anthropogenic warming with a specific figure

Rating Process

The goal is for every paper (12,272 in total) to receive at least two ratings, so that our results confirm each other. We will rate the papers blind, not knowing anyone else's ratings, until we obtain the required number of ratings. Papers will be rated on the paper's title and abstract. Each person rates a given paper only once (no double dipping). If we get through the ratings quickly enough, we may extend to 3 or 4 ratings per paper (I'm guessing that's unlikely).

You will rate papers by selecting from two drop-downs for each paper. Papers will be served in random order from among those with the fewest ratings so far. There will also be a small text box for entering any notes about the paper. You will also be able to see all papers you've already rated, in case you made a mistake and want to go back and edit.
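The serving logic described above (random among the least-rated papers, excluding ones you've already rated) might look roughly like this; the names and data structures are illustrative, not the actual TCP code:

```python
import random

def next_paper(papers, rating_counts, already_rated_by_user):
    """Pick the next paper to show a rater.

    papers: list of paper ids
    rating_counts: dict paper_id -> number of ratings so far
    already_rated_by_user: set of paper ids this rater has done
    """
    # Exclude papers this rater has already rated (no double dipping)
    candidates = [p for p in papers if p not in already_rated_by_user]
    if not candidates:
        return None
    # Prioritise papers with the fewest ratings so far...
    fewest = min(rating_counts.get(p, 0) for p in candidates)
    pool = [p for p in candidates if rating_counts.get(p, 0) == fewest]
    # ...and pick randomly among those, so raters stay blind to
    # what anyone else has seen or rated
    return random.choice(pool)
```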

We will have 2 months to get through the ratings (so I will have a deadline figure in the forum header to let us know how we're going). That's a goal of roughly 200 papers (about 400 individual ratings) per day. Note - I can get through around 100 ratings in an hour, so a few hours per day spread across the entire SkS team sounds eminently doable. If we get through it quicker, we get more time for analysis and writing up the paper. The reason for the deadline is so the paper is submitted for peer review in time to have a chance of getting into IPCC AR5.
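For what it's worth, the arithmetic behind the deadline works out like this (two ratings per paper over roughly 60 days); the ~200/day figure is papers, and the team-wide rating count is about double that:

```python
papers = 12272          # total papers to rate
ratings_per_paper = 2   # at least two blind ratings each
days = 60               # roughly two months

papers_per_day = papers / days                        # ~205 papers a day
ratings_per_day = papers_per_day * ratings_per_paper  # ~409 ratings a day
print(round(papers_per_day), round(ratings_per_day))
```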

Emailing Scientists

For this plan, I was thinking I would write some code that generates an automatic email for every paper we've collected emails for and sends it off. Preference will go to the lead author, but if there's no email for the lead author, the code will go down the pecking order until it finds one. The email will contain an explanation of our survey and a link to a form on SkS where the scientists can rate their own paper. Assuming we send off 5000 emails with a 10% response rate, we might get ratings for ~500 papers. That will be a lot of useful data. This can happen while we're doing our rating and will be invisible to those doing the rating. The poll will restrict itself to asking the scientists to rate their own papers, not ask for additional opinions about climate change.
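The pecking-order logic for choosing whom to email could be as simple as this; the data shape is hypothetical:

```python
def pick_recipient(author_emails):
    """Pick whom to email for one paper.

    author_emails: list of (author_name, email_or_None) tuples,
    ordered with the lead author first. Returns the first author
    with a known email, or None if we have no email at all.
    """
    for name, email in author_emails:
        if email:  # go down the pecking order until we find one
            return name, email
    return None
```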

Quality Control

Once we've finished rating, we quality control our ratings in two ways.

  1. You can see all the papers where someone has rated differently from you, compare your ratings to theirs, and, if you still think your rating is correct, discuss it with the other rater
  2. If you both still agree to disagree, the paper is flagged and a third person rates it. The three (or more) raters can then discuss the paper and hopefully reach a consensus
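Finding the papers that need this treatment is just a matter of grouping ratings by paper and flagging any paper whose raters disagree; a minimal sketch, with illustrative structures:

```python
from collections import defaultdict

def flag_disagreements(ratings):
    """ratings: list of (paper_id, rater, endorsement_level) tuples.
    Returns the set of paper ids where raters disagree."""
    by_paper = defaultdict(set)
    for paper, rater, level in ratings:
        by_paper[paper].add(level)
    # A paper with more than one distinct level needs discussion
    return {p for p, levels in by_paper.items() if len(levels) > 1}
```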

We also compare our results to the ratings provided by the scientists who authored the papers. My thinking here is that we don't use the scientists' ratings to change our own. Instead, we follow our process, then compare our results to the scientists' as a measure of how accurate our method is. That would be an interesting analysis. (Interesting note - I can pinpoint which SkSers differ from the scientists the most, so we can make our own 'wall of shame' :-))
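The comparison to scientist self-ratings (including the per-SkSer breakdown) is a simple aggregation; a sketch, with illustrative data structures:

```python
from collections import defaultdict

def rater_deviation(sks_ratings, scientist_ratings):
    """Mean absolute difference between each SkS rater's
    endorsement levels and the authors' own self-ratings.

    sks_ratings: list of (paper_id, rater, level) tuples
    scientist_ratings: dict paper_id -> level
    Returns dict rater -> mean absolute deviation.
    """
    diffs = defaultdict(list)
    for paper, rater, level in sks_ratings:
        if paper in scientist_ratings:  # only author-rated papers count
            diffs[rater].append(abs(level - scientist_ratings[paper]))
    return {r: sum(d) / len(d) for r, d in diffs.items()}
```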

Tom Curtis' 3 stage idea for additional quality control

Tom has an idea for an extra level of quality control: as well as our two ratings based on title/abstract and the scientists' ratings, we also rate a subset of papers based on title/abstract/introduction/conclusion (or perhaps the full paper). These papers could be selected from those where we obtained a scientist rating, so we would have three different approaches to rating the same paper. Personally, it seems like overkill to me. The quality control issue is mitigated to some degree by the fact that we will publish our results transparently online and encourage people to check for themselves - perhaps even with an interactive feature making it easy for them to do so. But what do others think of this idea?

Follow-up Study

Note - rating papers on title and abstract alone is, I admit, not a comprehensive approach. You really need to read the full paper to properly categorise it; that's just not practical. My prediction is we'll underestimate the number of endorsements, because a lot of papers will probably endorse AGW in the full paper but not bother to mention it in the abstract. But we don't need to stress out about that. Instead, I suggest that at the end, when we have x% endorsements, y% neutrals and z% rejections, we pick out a small random sample of neutrals and read the full papers to see how many endorse/reject. Or use Tom Curtis' 3-stage idea to get a sense of this. That will give us a rough guesstimate of the *actual* number of endorsements/rejections. That will be the basis for Phase 3, where we publicly crowd-source going through the neutral papers to find a more accurate figure.
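The neutral-sample guesstimate described above amounts to sampling the neutral pile, reading the sampled full papers, and scaling up; roughly (the function and its inputs are illustrative only):

```python
import random

def estimate_hidden_endorsements(neutral_ids, full_paper_endorses,
                                 sample_size, seed=0):
    """Sample the 'neutral' papers, rate the sampled full papers,
    and extrapolate how many actually endorse AGW.

    full_paper_endorses: callable paper_id -> True if the full
    paper endorses AGW (this is the human full-paper reading step).
    """
    rng = random.Random(seed)
    sample = rng.sample(neutral_ids, min(sample_size, len(neutral_ids)))
    if not sample:
        return 0.0
    endorsing = sum(1 for p in sample if full_paper_endorses(p))
    fraction = endorsing / len(sample)
    # Rough guesstimate of endorsements hiding in the neutral pile
    return fraction * len(neutral_ids)
```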

I'm hoping to start coding all this very soon so anyone involved in TCP (31 of you helped harvest emails), read this through carefully and post your comments so we can develop as rigorous and robust a methodology as possible.

2012-02-14 15:11:09 - A few questions for clarification
Tom Curtis


1)  Will categories be exclusive?  I am specifically thinking of papers like Mann et al 2008, which is certainly about paleoclimate in that it presents two reconstructions of temperatures over the last 2000 years, but is explicitly a discussion of the merits of different methodologies of reconstruction.  I would be inclined to classify it as 2&4 (or perhaps 2/4, so long as that is understood not to be a fraction).  The disadvantage of allowing non-exclusive categories is that you multiply the number of effective categories, which also multiplies the number of potential disagreements in classification.  The advantage is that you make it easier to resolve disagreements by not forcing potentially arbitrary conclusions.  It also makes it possible to report partial agreement with authors.  I.e., if Mann were to classify Mann et al 2008 as 4, and we classified it as 2/4, then we could report the partial agreement instead of the (less accurate) report of a disagreement had we been forced to choose between 2 and 4, and chosen 4.  (Of course, arbitrary choices would tend to average out, so that is not a huge advantage.)


2)  Are endorsement levels discrete?  Again I am thinking of borderline cases which we may be inclined to classify as 1.5 rather than as a 1 or a 2.  Allowing non-discrete endorsement levels again allows easier resolution of differences, and allows reporting the mean level of disagreement between author and SkS ratings as a numerical value with appropriate statistics, which may be more informative than a simple agrees/disagrees classification.  Perhaps I should reverse that: the numerical reporting of the level of disagreement is recommended in any event, and it allows non-discrete endorsement levels, which then makes resolving issues of classification easier.

3)  What is the timeline?  The question relates to my idea for additional quality control.  If the timeline is tight, then the rating by whole paper must be near concurrent with the rating by abstract.  That would mean we need a large sample (600 papers at minimum) in order to ensure a significant number in each category, and a significant number of author-rated papers in the sample.  On the other hand, if the timeline allows for whole-paper ratings to occur during April, we can rate a smaller number of papers randomly selected from each category, and from author overlaps.  I would recommend that 50% of papers rated on those terms be selected from those with author ratings, and that from among author-rated (and non-author-rated) papers, a number be selected from each category in proportion to that category's share of all author-rated (non-author-rated) papers.  By doing so we could reduce full-paper ratings to about 200 or so.  Dikran, as our resident statistical genius, can advise us on the minimum number needed to get statistically useful results.  Clearly the third layer of quality control becomes less advisable if we are on a tight timeline, as it involves more effort on our behalf (although I would still think it desirable).
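Tom's proportional selection is essentially stratified sampling; a sketch of the allocation step (the names and the largest-remainder rounding rule are my assumptions):

```python
def stratified_allocation(category_counts, total_sample):
    """Allocate a full-paper-rating sample across categories in
    proportion to each category's share of the population.

    category_counts: dict category -> number of papers
    Returns dict category -> sample size, using largest-remainder
    rounding so the allocations sum exactly to total_sample.
    """
    population = sum(category_counts.values())
    raw = {c: total_sample * n / population
           for c, n in category_counts.items()}
    alloc = {c: int(r) for c, r in raw.items()}
    # Hand leftover slots to the largest fractional remainders
    leftover = total_sample - sum(alloc.values())
    for c in sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True):
        if leftover <= 0:
            break
        alloc[c] += 1
        leftover -= 1
    return alloc
```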

2012-02-14 15:31:14 - Answering Tom's questions
John Cook


1) All the category options are exclusive. In Phase 1, we found many papers overlapped different categories. Eg - a methods paper about paleoclimate, or a methods paper that modelled impacts. So the approach we took was this: if a paper was about methods but also relevant to paleoclimate, impacts or mitigation, then we would default to paleoclimate, impacts or mitigation if the paper obtained results relevant to those categories. Eg - a paper might be about modelling climate impacts, and the whole paper was solely about developing the model without actually obtaining any results. That would be Methods. But if it developed methods and then went on to calculate specific impacts from the model, then it was Impacts. I think the most efficient approach is to keep the categories discrete, only one category per paper (let's not complicate things too much, the categorisation is blowing out as it is), and set firm rules for how to categorise cases that could go either way.

An interesting side-note: Naomi Oreskes only had one dimension of categorisation, and "endorsement" and "rejection" were categories. That meant that any mitigation paper that explicitly endorsed the consensus was labelled an Endorsement paper. We felt we were losing too much information by keeping endorsement/rejection in the same category list as the other categories, and moved them into their own categorisation, making it a two-dimensional rating process. This is another way we're expanding on Oreskes 2004, as well as extending the time period and broadening the Web of Science search from "global climate change" papers to "global climate change" and "global warming" papers.

2) I'd like to keep endorsement levels discrete. The boundaries between "explicit endorsement with quantification" and "explicit endorsement without quantification" is fairly discrete. The only grey area is the boundary between implicit endorsement and neutral or between implicit rejection and neutral. Again, I recommend for the sake of simplicity that we set firm rules for the more difficult to rate papers and if in doubt, opt for neutral.

3) Rating 600 full papers sounds like overkill to me - you don't need that many for statistical significance. But the full paper rating can happen at the same time as the abstract rating. Potentially, what we could do is email the scientists early in the process then as scientist ratings come in, those papers are flagged for SkSers to go look at the full paper.

2012-02-14 16:31:03
Dana Nuccitelli

Sounds good to me.  I may be biased since I came up with the 7 categories, but I think this is a much more robust approach than previous efforts.  Of course, Oreskes probably didn't have a whole team at her disposal!  I think we'll end up collecting a lot of very valuable information with this approach.

I agree that 2 ratings per paper is a good start, and if we tear through the papers quickly, bump it up to 3 or 4.  Given the speed of the email collecting, that may be entirely feasible, but we can play it by ear.

Tom's idea would give us some useful information too about how much we're missing just by looking at title/abstract.

So overall, thumbs-up from me.

2012-02-14 18:00:34
Ari Jokimäki


Like I indicated in the other thread, the categories still aren't clear to me. Furthermore, the way category 1 is worded here differs from the wording presented in the earlier discussion. There it was said that category 1 holds all papers that agree with the IPCC or the mainstream consensus (which is taken as endorsement of over 50%), but here just the ones that quantify it (however, I think category 1 would be better if it contained only those papers that actually quantify it).

I think we shouldn't go ahead until this has been cleared up.

2012-02-14 18:41:19
John Mason


I wonder about the 50%+ figure of Endorsement Level 1, in terms of what the deniers will make of it. A denier way of twisting this would be "only half of a tiny temperature rise". It needs wording in a bombproof way.

Don't give the buggers an inch to play with!

Cheers - John

2012-02-14 20:24:44
Paul D


Sounds good to me.

I would also ask the scientists to give an opinion on their current thinking about climate change and the science.

2012-02-14 23:24:15 - Polling scientists about science
John Cook

I suggest keeping the scientist poll restricted to the papers. Lean and mean. For what it's worth, the other survey I'm working on, which we're also using the 5000 emails for, will ask those kinds of questions.

John M, the >50% comes from the IPCC definition: "most of global warming..."

2012-02-15 00:01:01
Tom Curtis


John Cook, shouldn't Endorsement level 1 indicate >50% of the temperature increase since 1950, to bring it completely in line with the IPCC?

2012-02-15 06:30:21
Dana Nuccitelli

Ari - John doesn't think we should include the IPCC and similar endorsements in Category 1.  I'm on the fence - I still think we should include them just to be technically correct, although that's debatable.  John argues it's not explicit unless they use the words 'majority' or '>50%' or something, and so endorsing IPCC isn't explicitly saying >50%.  I would still argue that explicitly endorsing a document with a >50% position is equivalent to explicitly endorsing AGW >50%, but I could see it either way.

Did IPCC say majority since 1950 or 1850?  You're probably right that it's 1950.  I don't think we should limit ourselves to an even more narrow category #1 though, it's already going to be tough for papers to fit in there.  I would suggest any paper saying AGW is responsible for most or a majority or something similar should go in the category, even if not explicit about the time frame.

2012-02-15 13:47:58 - Dana, this goes against everything I've experienced and believe in (DAWAAR) but I'm going to have to disagree with you for once
John Cook


The wording of our category definitions is important. So it's not just about whether the paper quantifies the human contribution, but also that it explicitly says so. If you explicitly endorse A and A explicitly endorses B, then you are only implicitly endorsing B. You only explicitly endorse B if you mention B specifically. That's the very definition of explicit:


Stated clearly and in detail, leaving no room for confusion or doubt.

If the abstract mentions no details about quantification, then it's not explicit - it's implicit. Note - we don't need to be "padding" any of our categories. An implicit endorsement is an endorsement just as an explicit quantified endorsement is. In the end, we will probably collapse our categories into endorsement vs rejection for the take-home message.

Tom, yes, should be 1950. Oreskes 2004 citing IPCC 3rd Assessment Report:

In its most recent assessment, IPCC states unequivocally that the consensus of scientific opinion is that Earth’s climate is being affected by human activities: “Human activities … are modifying the concentration of atmospheric constituents … that absorb or scatter radiant energy. … [M]ost of the observed warming over the last 50 years is likely to have been due to the increase in greenhouse gas concentrations” [p. 21 in (4)].

Here is what the IPCC said in 2007, after Oreskes 2004:

Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations.[12] This is an advance since the TAR’s conclusion that “most of the observed warming over the last 50 years is likely to have been due to the increase in greenhouse gas concentrations”.

So we follow Oreskes/IPCC's lead and the explicit with quantification definition will be "since 1950"

2012-02-15 16:02:37
Dana Nuccitelli

I'm kind of concerned about that because requiring them to explicitly quantify AGW is difficult enough, but also requiring that they be specific about post-1950?  Now you're really narrowing down the category.

Besides, the human contribution over periods that start before 1950 is less than the contribution since 1950.  So, for example, if they say AGW >50% over the past century, they're also saying AGW >50% since 1950.  I understand wanting to be consistent with the IPCC definition, but I think that's taking it a bit far.  And while technically Oreskes used that IPCC definition, she didn't really stick to it.

2012-02-15 16:16:21 - Okay, good point about 1950
John Cook


None of them are going to specify the time frame, but a number of papers do say "dominant cause" or "most of global warming".

So Dana, have I swayed you re "explicit endorse of IPCC" = "implicit endorse of AGW"?

2012-02-15 16:32:45
Dana Nuccitelli

No, not really :-)

But as I said in the other thread, it's not a big sticking point, and putting the IPCC endorsement type papers in category 2 or 3 will make that category stronger.  So I'm willing to compromise on that point.  Category 3 you think?  I guess technically that's correct.  Hopefully most papers that mention IPCC will also mention AGW so we can at least put them in #2.

2012-02-15 16:59:04
Ari Jokimäki


I disagree on the post-1950 issue. I think paleoclimate papers that determine that CO2 affects climate with high climate sensitivity should count as endorsements of AGW (whether explicit or implicit is another question), even if they don't say anything about current warming.

The opening post says that abstract and title are used for ratings. I think it should be emphasised that we shouldn't peek at the full texts in any case, because that would distort the rating process and introduce bias, as the opening post mentions.

Why are you setting a timetable for this? Surely we should do this carefully and take all the time we need to do it properly. Rushing it will just result in sloppy work. I also doubt that we will get anywhere near your 100 papers per hour on average.

I have one idea to suggest before we go on. It would be good if there were a place for making a note while rating certain papers - just for saying that this paper was difficult to assign to this or that category, or even highlighting what it was that made the paper difficult to rate. Having these notes would help very much when writing the discussion of uncertainties in the paper. It could also help the analysis of results.

2012-02-15 19:12:26
Tom Curtis


Dana, explicitly endorsing AGW warming >50% since 1950 is a less restrictive category rather than a more restrictive one.  As you point out, if they endorse AGW warming >50% over the last 100 years, they thereby endorse >50% over the last 60 years.  But a paper that quantifies AGW warming as >30% over the last century and >50% since 1950 would, under the current criterion, count as not explicitly endorsing the IPCC position (which would be an incorrect classification).

With regard to Ari's point about paleoclimate, perhaps we should broaden the category still further by stating it as:


1)  Explicitly endorses and quantifies AGW as 50+% cause of the observed warming since 1950*, and/or explicitly asserts a climate sensitivity x2CO2 greater than 2.

*  Papers explicitly endorsing and quantifying AGW as 50+% of the cause of observed warming over any period which includes the period since 1950 shall be counted as explicitly endorsing 50+% of warming being caused by AGW since 1950, unless they explicitly contradict that claim.

2012-02-16 03:20:42
Dana Nuccitelli

I don't agree on the paleoclimate papers.  You can infer that high sensitivity means large AGW contribution, but that's your inference, not the conclusions of the authors.  Thus by definition it's not an explicit endorsement (same reason my IPCC endorsement suggestion got taken out of category #1).  Also paleoclimate sensitivity isn't necessarily the same as current sensitivity, so that's a further leap from the conclusions of the scientists themselves.

Adding climate sensitivity to the AGW definition is an interesting idea, but could over-complicate things, i.e. with the paleoclimate sensitivity issue.

Tom - adding 1950 would still be more restrictive because hardly any papers would be explicit about that.  Remember we're basing this on just abstracts.  The less stringent your requirement, the more papers you'll capture.  Besides, without the 1950 requirement, your example would still be captured because it says AGW >50% over some timeframe.

2012-02-16 04:14:17
Ari Jokimäki


It's not just about inferring from high sensitivity, but inferring from the paper's mention of high sensitivity and GHGs affecting climate. So I'm not suggesting adding only climate sensitivity there, but a combination of high sensitivity and a GHG effect, which is pretty much the same as AGW. But as some interpretation is involved, these kinds of papers might be best considered implicit endorsements (they are much better implicit endorsements than some mitigation papers, in my opinion).

2012-02-16 05:53:16
Dana Nuccitelli

Agreed - I think John is right that we need to make sure we only put papers that explicitly say AGW >50% in category 1.  We shouldn't have to make any inferences.  Implicit endorsement (category 3) for high sensitivity papers is an accurate categorization.

2012-02-16 11:17:06 - Why am I setting a timetable?
John Cook


Well, I didn't want to get into this just yet, but you had to ask, didn't you, Ari? The reason for the deadline is that I want our paper submitted to a journal by a certain date, to ensure it qualifies for AR5. It's a long shot, a very unlikely long shot, but it can't hurt to give ourselves every chance. The reason for 2 months of ratings is that it still leaves us plenty of time for quality control, and if we get the rating done in less time, we have even more time for quality control. I think that gives us plenty of time - if SkSers are as motivated as they were in harvesting emails, we should polish off the workload way faster than that.

Agree re making notes. Will add a small field.

Tom, what Dana meant by 1950 being restrictive is that for a paper to explicitly endorse "most of warming since 1950 is man-made", it would have to explicitly mention the time frame. Very few papers go into that much detail in the abstract. So that would eliminate all the papers saying "humans are causing most of the warming" or "human activity is the dominant forcing" that don't happen to mention a timeframe.

Ari, a paleoclimate paper about climate sensitivity doesn't necessarily have to endorse AGW. Technically speaking, sensitivity is about feedbacks, not about CO2 warming. So it's possible that a paper could be looking at past climate changes to measure feedback/sensitivity without any mention of CO2. However, if a paper looking at past climate change mentions that GHGs affect climate, that is considered an implicit endorsement.

Am going to update the top post based on feedback here, with changes marked in red.

2012-02-16 16:37:33
Dana Nuccitelli

John, I think it would help to add "does not quantify or minimize AGW" to category 2 and 3, just to make the distinction between the categories clear.  Otherwise if you get a paper that endorses but also minimizes AGW, technically it could go in multiple categories.  Don't want to confuse people or have category overlap.

2012-02-16 16:58:28
Tom Curtis


dana and John, if you look at my full recommended condition, as quoted below, you will see that it enables us to include as explicitly endorsing the consensus any paper that explicitly endorses any of the following:


1)  AGW warming >50% since 1950;


2)  AGW warming >50% for any period including the period 1950-2010 provided they do not explicitly exclude AGW >50% over the period 1950 to present.


3)  Any paper that explicitly endorses x2CO2 > 2 C, provided it does not explicitly indicate that warming since 1950 has been predominantly not from AGW, and does not explicitly indicate that the CO2 increase over the 20th century has been primarily of non-anthropogenic origin.


I personally think the provisos for (2) and (3) will not exclude any papers from endorsement level (1), but the provisos need to be made just in case.

In contrast, endorsement level (1) as currently worded would require us to categorize any paper that claims AGW warming since 1950 > 50%, but AGW warming since 1750 < 50%, as category 7.  It would also require us to categorize a paper that argues from paleo data that the equilibrium climate sensitivity is 5 degrees C as endorsement level (3), i.e. implicitly endorsing AGW.  I personally feel that both would be misclassifications, but that represents a problem with the current wording.

Once again, my recommended wording is:


1)  Explicitly endorses and quantifies AGW as 50+% cause of the observed warming since 1950*, and/or explicitly asserts a climate sensitivity x2CO2 greater than 2.

*  Papers explicitly endorsing and quantifying AGW as 50+% of the cause of observed warming over any period which includes the period since 1950 shall be counted as explicitly endorsing 50+% of warming being caused by AGW since 1950, unless they explicitly contradict that claim.

The footnote is important and should be included explicitly in the paper, instructions to reviewers, and of course in the survey of authors.  Possibly it should be included as a subclause of the main text, but I do not feel that is necessary.

2012-02-17 05:46:17
Ari Jokimäki


"Ari, a paleoclimate paper about climate sensitivity doesn't necessarily have to endorse AGW."

Yep, that's why I suggested that a mention of GHG effect should also be present in the paper in question.

2012-02-17 13:24:34 - Have updated the official methodology at the top of this thread
John Cook


All updates marked in red. Am itching to start coding but have another look and post additional thoughts.

Tom, I'm just not sold on your much more complicated definitions, especially considering the current definitions are more complicated than I would've liked already.

Firstly, the climate sensitivity definition is covered by 3. Implicit endorsement. If A is high climate sensitivity and B is AGW and A implies B, then explicitly endorsing A only implicitly endorses B. That's the whole point of explicitness - they need to say the words "humans are causing GW" (or a variation).

Second, if our definition "since 1950" actually means "any period including from 1950", then why bother adding all that complicating text?

2012-02-17 14:36:26
Tom Curtis


Fair enough, John.  Your paper, your call!

2012-02-17 16:35:12A note re timing
John Cook


Next week, I hope to do the coding so we can get started on rating the papers.

But I probably won't get around to doing the coding for the emailing of scientists until March because of a big deadline I have to make by the end of February (I probably shouldn't even be working on coding the paper rating but I'm just really keen to get that process started asap). But as the paper rating is expected to go for several months, there's no urgency on emailing the scientists - that can happen any time over the next few months. In fact, just the fact that we send out that survey will let the cat out of the bag about TCP, particularly if skeptic scientists are on the mailing list, so I might be inclined to push that towards the end of the project.

Thought - if we have more than one scientist's email for a single paper, should we email them both? It might be interesting to see if scientists rate the same paper differently. Hmm, the mind boggles at the possibilities...

Another thought - getting the scientist's opinion makes Tom's idea of doing an additional survey of the full paper essential - so we can compare abstract rating vs scientist rating vs full paper rating. One would expect a closer correlation between scientist rating vs full paper rating than between scientist rating vs abstract rating. But we'll need the data to confirm.

Also, I wonder how long it takes scientists to respond. Even just plotting a distribution function of how long it takes scientists to respond to the survey will be a useful statistic for future reference. Goodness, there'll be so much fun data to play with! :-)
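The response-time statistic mentioned above could be computed along these lines; the timestamps are invented for illustration, assuming each survey send and reply gets logged:

```python
from datetime import datetime

# Hypothetical log of (sent, replied) timestamps for each scientist surveyed.
responses = [
    (datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 1, 15, 30)),
    (datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 4, 10, 0)),
    (datetime(2012, 3, 1, 9, 0), datetime(2012, 3, 12, 8, 45)),
]

# Response delay in days for each reply.
delays = [(replied - sent).total_seconds() / 86400 for sent, replied in responses]

# Simple distribution summary: median delay and the share replying within a week.
delays.sort()
median_delay = delays[len(delays) // 2]
within_week = sum(1 for d in delays if d <= 7) / len(delays)
print(f"median delay: {median_delay:.1f} days, replied within a week: {within_week:.0%}")
```

With real data one would plot the full distribution rather than two summary numbers, but the bookkeeping is the same.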

2012-02-17 16:37:41
Dana Nuccitelli

Wouldn't hurt to try multiple scientists on a given paper if we have multiple emails.  It increases the odds of getting a response too.

2012-02-17 20:09:48
Ari Jokimäki


If a paper for example measures how much AGW contributed to recent warming, what is the category of the paper? It is not methods as it gives results. It is also not mitigation, paleoclimate, or impacts. It clearly relates to climate so it's not unrelated either. It seems to me that the most important category is missing, i.e. climate science (we touched this issue already some time ago, I think).

2012-02-17 21:07:00
John Cook

It would be methods. It only falls into the other three (mitigation, paleo, impacts) if the results pertain to one of those areas. If not, then methods, which is effectively the "everything else" category.
2012-02-17 22:47:01
Ari Jokimäki


Why is that? It seems to me that the "methods" category will not be very useful when analysing results. If it were genuinely a methods category (for methods papers only), then it would contain meaningful information. With this category scheme you can't easily get a subsample of the most relevant papers. I suggest the following categories:

Contemporary climate change (the most relevant papers)

General climate science

Not related to climate

2012-02-18 22:26:24
John Cook


Now, I know it was Dana and I who argued for a more complicated "endorsement level" system while you argued for a simpler one.

In this case, however, I must argue for a simpler "category" system. There has to be a balance between complexity/depth of information and simplicity/ease of use. What decides which end of the spectrum you lean towards is bang for your buck. Adding those extra categories doesn't, to me, add information I find particularly interesting. We started this project intending to replicate Oreskes 2004, but the focus has gradually turned towards measuring the level of endorsement. The quantification of endorsement vs rejection will be the key result of this paper, hence the bang for buck from adding complexity to the endorsement options is high.

I'm lukewarm about adding these extra categories for two reasons. Firstly, we're adding complexity for what is, frankly, the least interesting part of the paper. I don't really care how the different categories (paleo, impacts, etc) evolve over time - it is largely of academic interest.

Secondly, the new categories are a little fuzzy. You talked about adding more ambiguity every time you add an extra category. This seems to add a lot of extra fuzzy areas between categories, making it harder to choose.

The only reason I can see to add extra categories is to provide information for your analysis of consensus across different journals. But I would've thought the number of "not related to climate" papers vs the other papers would provide the info you need for that analysis.

2012-02-18 23:26:59
Ari Jokimäki


I agree that extra categories add more fuzziness, but in this case the categories are not that difficult to determine (especially if we drop "contemporary climate change" and add only "general climate science"). At least change the name of the methods category - it is really misleading currently. Perhaps something like "miscellaneous climate-related".

2012-02-19 01:20:22Methods = Miscellaneous
John Cook


My understanding of Naomi's methodology is anything not falling under Impacts, Paleoclimate or Mitigation fell by default into Methods. So for all practical purposes, Methods = Miscellaneous

2012-02-19 11:07:58
Dana Nuccitelli

So if an abstract talks about global warming, and maybe the impacts of continued warming, without ever saying anything about it being anthropogenic, is that implicit or neutral?

2012-02-19 11:42:25
John Cook


If in doubt, neutral.

The one thing I realised when doing the initial paper rating was that this isn't about trying to plump the endorsement numbers, looking really hard for reasons to count as many papers as possible as endorsements. It's inevitable that we underestimate the number of endorsements by working off the abstract. If it turns out that the scientist ratings and the full-paper ratings find a higher level of endorsement, that will be one of the take-home messages of our paper - there's a strengthening consensus, and that's from an underestimate!

2012-02-19 17:52:07
Ari Jokimäki


I doubt that using only the abstract biases the sample in that direction. If there are doubts on the issue, they are more likely to be presented in the full text, not the abstract. I think there are a lot of borderline rejections that will end up as neutral because of that. So I think there are biases in each direction.

By the way, the work has not yet been done, so we don't know what the take-home message will be.

2012-02-21 15:16:02
Sarah Green

I'm confused by:

"Papers about climate policy (specifically mitigation of GHG emissions) unless they restrict their focus to non-GHG issues like CFC emissions in which case they're neutral" (Implicit)

What about papers that discuss the GHG effects of CFCs? I'd call that explicit, since all CFCs are anthropogenic.

2012-02-21 15:27:13CFCs and other non-GHGs
John Cook


Sarah, this is a good question and exactly what Ari and I have been discussing. The definition of AGW is "humans are causing global warming". The lion's share of that warming is caused by CO2. So is acknowledging that CFCs are a greenhouse gas, without discussing CO2, equivalent to saying "humans are causing global warming"? To me it's a bit of a stretch, but more importantly, can you defend that assertion when our paper gets blasted by the denialosphere? I don't think it's a very defensible position. I think it's better to underestimate than overestimate.

2012-02-22 15:38:23future warming = implicit
Dana Nuccitelli

I'm seeing a lot of papers that talk about predicted global warming, or the impacts of forthcoming global warming - wording like that.  I think those are implicit endorsements, because predictions of future global warming are based on AGW theory.

These are often biological papers, and the authors are pretty clearly deferring to the AGW consensus and then examining how the climate change will impact their area of expertise.  To me that's a pretty clear implicit endorsement of AGW, agreed?

2012-02-22 16:11:34I agonise over this one too
John Cook


When a paper says "future global warming", you *know* what they're talking about. It's not like global warming is expected to happen naturally (to my knowledge, every "it's natural" argument I've heard has predicted imminent cooling). Is that assumption defensible though? Will it open us to "they're padding the endorsement numbers by assuming any mention of future warming means AGW"? I welcome additional thoughts from others on this one.

Note - we are not past the point of no return, and the methodology isn't set in concrete even though we are 1,000 papers into the rating. Note that each paper gets rated twice, so this kind of discussion will help us sharpen up and clarify our methodology; when we get to the second rating for each paper, we will be more consistent, and any early inconsistencies will be identified when they fail to match the second rating.
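The double-rating check described here boils down to comparing the two passes and flagging disagreements; a minimal sketch, with illustrative data structures rather than the actual TCP database:

```python
# Illustrative ratings from each pass: paper id -> (category, endorsement level).
first_pass = {101: (2, 3), 102: (4, 4), 103: (3, 2)}
second_pass = {101: (2, 3), 102: (4, 3), 103: (3, 2)}

# Papers whose two ratings don't match need discussion and reconciliation.
disputed = {
    paper_id
    for paper_id in first_pass
    if first_pass[paper_id] != second_pass.get(paper_id)
}
print(sorted(disputed))  # paper ids to revisit in the quality-control stage
```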

2012-02-22 16:15:38
Dana Nuccitelli

There are a few 'minimizing AGW' papers that predict continued warming (e.g. Scafetta).  But honestly, if you're not a climate scientist, you're not going to even be aware of those few minimizing papers.  You're going to defer to the consensus and see what the implications are for your field of study.  Or, if you did believe the minimizers were correct, you would say so and then examine the implications of minimal warming for your field of study.

I think we're on pretty safe ground to call impacts papers that refer to future warming 'implicit endorsements'.

2012-02-22 16:39:31Here is an example
John Cook


An abstract starts with this:

With recent predictions for global climate warming, the question arises as to how changes in temperature influence the dynamics of populations in natural communities.

I dunno, Dana, this is right in the middle of the fuzzy area between implicit endorsement and neutral. I'm on a knife edge. However, the fact that there exist papers that predict future warming and yet minimize AGW has me leaning towards dropping them into neutral. Otherwise we open ourselves to the criticism "according to your definition, Scafetta's paper is an endorsement, hence your result is invalid". However, I'd like to hear some others weigh in on this issue.

For the record, however, in this particular abstract, they went on to say:

Some implications of temperature increases expected under current global warming scenarios in pond systems are discussed. 

Which I take to mean warming under IPCC emission scenarios, which pushes it into implicit endorsement.

2012-02-22 16:45:32
Dana Nuccitelli

Assuming they then went on to examine the influence of future warming on natural community populations, I'd call that an implicit endorsement.  Recent warming predictions are based on AGW, with very few exceptions.

Scafetta isn't an endorsement because it minimizes AGW.

I agree input from others on this issue would be helpful.

2012-02-22 16:49:58
John Cook


Well, in the absence of other input, I'll keep the echo chamber going by suggesting this possible guideline that takes Scafetta into account:

Paper is implicit endorsement if it refers to predictions of future global warming without minimising the role of AGW
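If it helps to see how the proposed guideline behaves, it can be written as an explicit rule. This is only a sketch: the boolean flags are judgments a rater would make by reading the abstract, and the numeric levels follow the endorsement scale at the top of the thread:

```python
def rate_future_warming_paper(predicts_future_warming: bool,
                              minimises_agw: bool) -> int:
    """Proposed guideline: implicit endorsement (3) if the abstract refers
    to predictions of future global warming without minimising AGW;
    otherwise implicit minimisation (5) or neutral (4)."""
    if predicts_future_warming and not minimises_agw:
        return 3  # implicit endorsement
    if minimises_agw:
        return 5  # implicitly minimises/rejects AGW
    return 4      # neutral

# A Scafetta-style paper predicts continued warming but minimises AGW,
# so under this rule it is not counted as an endorsement:
print(rate_future_warming_paper(True, True))
```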


2012-02-22 16:53:34
Glenn Tamblyn


To throw a different perspective into this discussion: if this were medical research, these very discussions would be breaking much of the randomised double-blind nature of the methodology. Such a study would be trashed as having a poor design. Arguably JC shouldn't even be participating in rating papers - that is part of the double-blind design.

Perhaps a better approach is this: John has published guidelines for what the different categories mean, and now each reviewer should interpret those guidelines as they see them, without further discussion. That is part of the randomisation - different individuals will have somewhat differing biases, including about what the guidelines mean, and by having many reviewers randomly looking at different papers, that creates the randomisation and hopefully cancels out the biases. Discussions such as this break the central nature of the methodology being used.

2012-02-22 16:56:21
Ari Jokimäki


In my opinion papers discussing future warming without mentioning GHGs or human influence are neutral. Also John's last example would be neutral in my book. Without reference we don't know what scenario they mean. We are doing scientific study here so let's try not to introduce too many assumptions to this already subjective situation. Let's err on the side of caution.

2012-02-22 17:01:43Double blind survey
John Cook


Glenn, you know more about these matters than I do but the end goal of our study is to produce the most accurate assessment of the level of consensus rather than a measure of the opinions of SkSers. I don't see myself as an experimenter doing a survey of SkSers' opinions but as part of a team helping assess the papers. So isn't some refinement of the guidelines to add clarity an appropriate measure to ensure a more accurate result?

Note: I'm open to being educated on these matters.

2012-02-22 17:24:26
Glenn Tamblyn


John, the most accurate assessment includes minimisation of the impact of the reviewers' biases. And since the collection of people selected to be reviewers - us, truly - has not been randomly selected, reviewer bias is an intrinsic issue, since the ratings have a degree of subjectivity. Let's be honest: we all have a bias towards wanting the research to show what we hope. This is not to diss the project or anyone here - we are all acting diligently and with integrity. But the most diligent person still has biases; it's unavoidable.

Critics of this study will look at the methodology and look for weaknesses in it. And I don't just mean deniers. Honest academics who might use these results will still look at its methods when evaluating the value of the study - that is what we are always arguing with respect to the climate scientists.

Since this study has the potential to produce important results, making the method as robust as possible against criticism is important - we saw how deniers tried to diss Oreskes' results. And since deciding which categories apply is a subjective judgement, bias is an aspect of the study. It must not only be scrupulous, it must be seen to be.

While it was reasonable to discuss what the criteria meant prior to starting the ratings, discussing the criteria and how to apply them now could be seen as modifying the study while it is in progress.

So I think the Cone of Silence should now descend while the ratings are done. Cheer each other on as far as the count is concerned, but don't discuss ratings at all. If a reviewer finds an abstract too hard to classify, skip it; those ones can be dealt with at a later stage.

2012-02-24 10:16:55Making the method as robust as possible
John Cook


I agree, we need to make our study as robust and immune to criticism as possible. A key question here is what people will criticise about this study: the process we followed, or the results? I would argue that the criticisms will be of the form "hey, they defined XXX as implicit endorsement but YYY paper doesn't really fit that, they're plumping the numbers". So sharpening up exactly what we do and don't count as endorsements is of paramount importance.

Now we can either do that now or do it in the quality control stage, after every paper has been rated twice and we check each other's results. From a process point of view, at some point, we're going to have to start discussing definitions and specific papers and decide collectively over the trickier ones. But doesn't it make more sense to have that discussion now rather than after we've done all the rating?

This issue needs to be resolved one way or the other sooner rather than later, as these discussions are happening in other threads as we speak. So do we or do we not consider modifying the guidelines now? I think yes, because we're going to have to modify them later during the quality-control period anyway - more efficient to do it now. Any other thoughts?

2012-02-24 11:05:41
Dana Nuccitelli

Well, if we settle it now then there will be less work later.  If we don't settle it now, there will be more disagreements, and we'll have to sort out the proper categorizations later anyway.

I'd say it's better to get the methodology sorted now.  Especially if it's sorted in the direction I want ;-)

Ultimately we're still going to have these disagreements about implicit vs. neutral, because it's a gray area.  The porn method ("I know it when I see it") inevitably comes into play to some degree.  Papers talking about the impacts of future warming 'feel' like endorsements, and it's pretty damn hard to argue otherwise.

2012-02-24 18:46:17
Ari Jokimäki


It is very easy for me to argue otherwise, and I have done so already above. I'll just add that it is not unheard of for scientists to study a subject without fully endorsing all the premises behind it. They might just find a subject interesting to study and then note that it also relates to the global warming issue, even if in their hearts they were deniers. We have to stick to what the abstract says. If we want their opinion, we should ask them (by sending loads of e-mails to scientists, for example ;) ), not guess. Furthermore, if there is a gray area, neutral is the side of caution. The issue with mitigation papers is also in the gray area, in my view, for the same reason as the impact papers.

I also noticed a chance to nit-pick John's post, so I'll take it. I don't think the key question here should be what critics will say, but what is correct from a scientific point of view. However, if you think critics' feedback is important, these issues with impacts and mitigation are exactly the sort of thing they will note, because they look an awful lot like they are designed to increase the endorsement statistics. I would even go so far as to say that a referee should demand we take out the rule of default implicit endorsement for mitigation papers before accepting the paper, because I think it is a clear methodological flaw. Sorry if this sounds harsh, but I think we have to air our thoughts on these issues. Remember always that it's the issues that argue, not the people. :)

2012-02-24 20:41:58


Following Glenn, the most important methodological weakness I see is the selection of the reviewers. The second is the potential bias from the large number of neutral ratings.
I can't see any workaround for the former. The latter requires some thought and can be dealt with later.

2012-02-25 00:09:39Lots of comments to respond to
John Cook


A few points here. First addressing Ari's comments:

  1. Agree that our study focuses on what the abstract says. We don't speculate on the authors' personal opinions but only go on the words in the title and abstract. At this point, I don't even plan to ask the scientists what their opinion is about AGW, just ask them to rate their paper (unless we deem that important data to collect - but sending the emails is a month or two away so there's plenty of time to decide that).
  2. Also agree that if in doubt, rate neutral.
  3. Impact papers don't necessarily mean implicit endorsement - they usually refer to impacts from climate change, but unless that's linked to carbon emissions, you can't assume they implicitly endorse AGW. The guidelines don't say anything about impact papers being implicit endorsements.
  4. Re your nitpick, that's a fair enough comment. Ideally, "being immune to criticism" should be synonymous with "what is correct from a scientific point of view". Eg - I want our results to be both as accurate as possible and as bullet-proof as possible - which I would hope are the same thing.
  5. Now, moving on to the most serious of Ari's points. I think that if a paper is about "mitigating carbon emissions", then it's a safe assumption that the paper is implicitly assuming carbon emissions cause global warming, unless there's something in the abstract that indicates otherwise. But the fact that Ari thinks differently is quite a serious difference of opinion. I'd be interested in hearing other thoughts on this matter.

Now addressing Riccardo's comments:

  1. Re the selection of the reviewers, there are two ways that we address this potential bias. First, we compare our results to the self-ratings from the scientists who wrote the papers. If there is an SkS bias in our ratings, it will be apparent by our endorsement levels being higher than the scientists' self-ratings. If we do get that result, that will be a cause for concern. If it turns out we underestimate the level of endorsement, that will strengthen our result. So awaiting the scientists' ratings will be a nervous moment, for sure! :-)
  2. More on the comparison between scientist ratings and our abstract ratings. There can be some really interesting analysis here. Not just comparing the overall level of consensus. We can also compare over the different range of endorsement levels. Eg - if we just compare the papers that we rated neutral, how did the scientists rate those. I'm anticipating a greater discrepancy over the neutral part of the spectrum but more consistent results at the extremes. Also interesting will be to see how the discrepancy changes over different categories. Will we underestimate the level of consensus for mitigation or impacts papers? It's going to be fascinating but probably also nerve wracking when the data first comes in, to see how we compare.
  3. More on the selection of the reviewers. I do have another idea on how we can do a further check on our ratings, not from a scientific point of view but from a communication/outreach kind of way. Disclaimer - this is just a germ of an idea at this point. After our results are published, we have an interactive feature on the website where the public can rate a random selection of papers themselves. As many as they like. Then they can compare their ratings with ours. Maybe I'll get fancy shmancy and figure out a way to plot their results versus ours. It's a way for users to experience the rating process themselves and makes our results transparent and interactive. Could be interesting. I might even turn that feature into a social science experiment and see whether level of interaction correlates with agreement with the results (would be fascinating if there is polarization with greater interaction means more agreement from progressives and more skepticism from conservatives).
  4. As for the large number of neutral, our abstract methodology does mean there'll likely be a bias towards a larger number of neutral ratings. This means it's expected that the scientist self-ratings will have a higher consensus rating than our result. Our rating of full papers will also have a higher consensus rating than our abstract rating. If these two results come to pass, then that will also strengthen our result, show that we're underestimating the level of consensus. And it gives us somewhere to go for Phase 3.
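The per-level comparison described in points 2 and 4 amounts to cross-tabulating our abstract ratings against the scientists' self-ratings; a minimal sketch with made-up data (the real paired ratings don't exist yet):

```python
from collections import Counter

# Hypothetical paired ratings: (our abstract rating, author's self-rating),
# using endorsement levels 1-7 from the scheme at the top of the thread.
pairs = [(4, 2), (4, 3), (4, 4), (2, 2), (3, 2), (7, 7)]

# Cross-tabulate: how did authors rate the papers we called neutral (4)?
confusion = Counter(pairs)
neutral_breakdown = {self_r: n for (ours, self_r), n in confusion.items() if ours == 4}
print(neutral_breakdown)

# Share of our 'neutral' papers that the authors rated as endorsements (levels 1-3):
neutral_total = sum(neutral_breakdown.values())
moved_to_endorse = sum(n for lvl, n in neutral_breakdown.items() if lvl <= 3)
print(moved_to_endorse / neutral_total)
```

The same tabulation restricted by category would show whether the underestimate is larger for, say, mitigation or impacts papers.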

Lastly, everyone but Dana is dodging the gorilla in the room - the issue we need a consensus on asap:

  • Do we update the Guidelines now based on discussion of contentious papers?
  • Or do we rate based on the original guidelines and only discuss clarifications after all the ratings have been done?

As I said before, sometime during this process, we're going to have to confront contentious types of papers and come to a consensus on how we rate them. Either now or after we've done 24,000 ratings. Seeing as it's something that has to be done sooner or later, it seems logical to me that we do it now. Dana agrees. The last I heard from Glenn, he disagrees. Any other thoughts?

2012-02-25 01:12:32
Ari Jokimäki


I think updating guidelines now would be better.

"I think that if a paper is about "mitigating carbon emissions", then it's a safe assumption that the paper is implicitly assuming carbon emissions cause global warming, unless there's something in the abstract that indicates otherwise."

What I said about the impact papers applies here too: the scientists might just study something they find interesting that happens to relate to global warming. There's no need for them to agree that global warming is anthropogenic. Furthermore, this strays into territory where you are guessing the opinions of the authors, not going by what the abstract says. I already pointed out a paper where the authors said it's debatable, even though it was a mitigation paper dealing with GHG emissions. I have seen another where the authors wrote that it is generally believed that GHGs cause global warming, which is a very neutral way of saying it. There have also been several papers discussing mitigation issues where global warming is only one minor aspect of the whole issue. Another twist is the papers that discuss GHG emissions because there's policy pressure to reduce them, not because the authors think they cause global warming.

2012-02-25 04:34:03
Dana Nuccitelli

Despite my comments about implicit vs. neutral, I've rated a lot of impacts and mitigation papers as neutral.  The reason is that it depends on the wording.  For example,

  • if a paper says "the planet is warming, here are some impacts", that's neutral, because it's not talking about future warming or attributing the warming to anything in particular.
  • If a paper says "global warming might continue, here are some possible impacts", that is an example of what Ari argued in this comment, in which case I agree it's neutral.  They're just looking at a hypothetical situation.
  • If a paper says "future climate change will have these impacts", that to me is clearly an implicit endorsement.  They are endorsing that the planet will continue to warm, which is a position based on the AGW theory.

Similarly for mitigation: if they don't say why they're trying to mitigate CO2 emissions, I don't think we should assume it's an AGW endorsement.  Maybe they're concerned about acidification, or just want to make money or something.  Most mitigation papers make some statement about global warming that puts them into an endorsement category, though.

I still agree we should sort out this issue now rather than later.  We want to remove as much subjectivity as possible, and right now this disagreement is introducing subjectivity.

2012-02-25 06:01:46


No doubt a grain of salt is needed whatever rule may apply. Otherwise a computer could do the job, much faster.

2012-02-27 13:18:55


If there is now a consensus on the consensus project rules / categories can we please have a tidy version as a 'sticky'.

It would be easier for noobs like me who have to keep reading up on the categories just to be sure of doing things right.


2012-02-27 16:13:47Sticky thread now added
John Cook



2012-02-27 18:48:06
Ari Jokimäki


Now that the mitigation paper rating rules have changed, how can we revisit the mitigation papers that were rated before now (at least my ratings will be different, as I rated them according to the old rules)?

2012-02-27 22:19:35Rating existing papers
John Cook


I will set up a "My Ratings" page but not for at least a day or two as I'm going to Sydney tomorrow for the myth busting evening and am madly preparing slides to cover as many myths as possible.

Of course, at the end, when we compare each other's ratings, there'll be plenty of revisiting then too. I imagine/hope we will have worked out any kinks by the time we reach the halfway point of 12,000 papers, so that when we all rate papers for the second time, we will be more confident about what we're doing.

2012-02-27 22:24:30
Ari Jokimäki


That's good, but will there be a way to see which ratings were done before this date (is there a timestamp associated with each rating, or are they just stored in the order they were rated)?
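If each rating does carry a timestamp, pulling out the mitigation ratings made under the old rules is a simple filter; a sketch with hypothetical records and field names (category 1 = Mitigation in the scheme at the top of the thread):

```python
from datetime import datetime

# Hypothetical rating records: (paper_id, category, rated_at).
ratings = [
    (201, 1, datetime(2012, 2, 20)),
    (202, 3, datetime(2012, 2, 22)),
    (203, 1, datetime(2012, 2, 28)),
]

RULE_CHANGE = datetime(2012, 2, 27)  # assumed date the mitigation rules changed

# Mitigation papers rated under the old rules, due for a revisit.
to_revisit = [pid for pid, cat, ts in ratings if cat == 1 and ts < RULE_CHANGE]
print(to_revisit)
```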