My last post has received a fair amount of pushback on Twitter despite nobody seeming to be able to find any error in it (save for some typos, which I am glad to have fixed). There have been claims that I was wrong to accuse anyone of dishonesty, but in terms of objective matters, the only issue I can find where anyone has said I am wrong arises where I said:

Now, any curious reader would naturally wonder, "Why is Zeke talking about what Hansen's model could have projected from 1958 to 2018 when the model was made/used in 1988?" Zeke doesn't offer an answer. In fact, he doesn't even bother to refer to the issue, not even obliquely.

Zeke tells people we shouldn't judge Hansen's model by what he guessed would happen in the future because Hansen had to try to predict things about human behavior which he couldn't possibly hope to predict, but then, Zeke silently changes the topic. Instead of looking at what Hansen's model might say about the future (as of 1988), Zeke says we ought to look at what Hansen's model says would happen from 1958 to 2018. That's a huge change in topic, one Zeke not only fails to discuss but completely glosses over.

Is it possible Zeke doesn't understand the difference between judging a person's projections about the future on what happens in the future and judging those projections on what happened in the past? No. That'd be stupid. Zeke clearly knows better than that. He knows full well that including 30 years of what a model says about what has already happened along with 30 years of what will happen is misleading. He does it anyway.

I can find no defense for claiming it is okay to judge the skill of a model designed to project changes for the future by using 60 years of data, 30 of which were available at the time the model was made. A model was made to try to glean insight about the future, looking at decades of data from the past, and some people 30 years later are saying it is okay to judge that model by comparing its results to data where half of that data was used in creating the model.

That claim is... well, bizarre. I was going to offer a humorous demonstration of why by creating a fake model which would generate results which claim the planet would start cooling after 1988. That obviously didn't happen, yet my fake model would pass the very same tests I criticized with flying colors. My thought was that'd be a funny way of showing how nonsensical these "tests" are.

But then I decided I didn't want to. This issue isn't one I want to joke about. The joke version of this post would have been way more fun, but it also would have been less insightful.

To start, let's go over a couple of quick details. For the tests we'll be looking at today, nobody is interested in whether or not a model can actually predict what will happen in the future. Everyone agrees the famous 1988 model James Hansen came up with did a bad job of that.

People have offered various reasons for that. A Skeptical Science article says:

If we take into account the lower atmospheric greenhouse gas increases, we can compare the observed versus projected global temperature warming rates, as shown in the Advanced version of this rebuttal. To accurately predict the global warming of the past 22 years, Hansen's climate model would have needed a climate sensitivity of about 3.4°C for a doubling of atmospheric CO2. This is within the likely range of climate sensitivity values listed as 2-4.5°C by the IPCC for a doubling of CO2. It is even a bit higher than the most likely value currently widely accepted as 3°C.

In short, the main reason Hansen's 1988 warming projections were too high is that he used a climate model with a high climate sensitivity. His results are actually evidence that the true climate sensitivity parameter is within the range accepted by the IPCC.

According to this article, Hansen misjudged how much human activity there would be in the future in terms of things like burning fossil fuels, but more importantly, his model had a climate sensitivity which was too high. The article says the correct sensitivity with this data (given this particular test) would be 3.4C. Hansen's model had a sensitivity of 4.2C.

This shows two key factors at play. To predict what will happen to the climate in the future due to human activity, we need to predict what that human activity will be and how much the planet's temperatures will change in response. You might get one right and the other wrong. If so, your final results will be wrong. This means when judging a model in this way, we can't just say it got the right/wrong results. We have to consider why it did.

My post yesterday came about after a person on Twitter named Zeke Hausfather told everyone Hansen's results were wrong solely because he misjudged human activity, and that if we fixed that error, Hansen's results would have been spot on. Specifically:

A better test of Hansen's model is to see if the relationship between radiative forcing and temperature matches observations. This removes all the uncertainty involved in predicting future emissions, and just tests the accuracy of the physical model. 4/8

— Zeke Hausfather (@hausfath) June 22, 2018

On this more meaningful metric, Hansen's projections across all his scenarios are very similar to observations; the amount of warming we've seen (~0.45C per w/m^2) is nearly identical to Hansen's projections (0.44C to 0.48C per w/m^2) between 1958 and 2017: 5/8 pic.twitter.com/E7nXyeIA9z

— Zeke Hausfather (@hausfath) June 22, 2018

I take issue with this "test" for a number of reasons, but let's finish examining his case before going into that. Zeke went on to say:

This is even more clear if we compare the trends (the linear relationship between forcing and temperature) between Hansen's model and observations: 6/8 pic.twitter.com/U1TgFZosHi

— Zeke Hausfather (@hausfath) June 22, 2018

In an informative RealClimate post today, @ClimateOfGavin estimates how temperatures would have changed over time if Hansen's model used our current best estimate of observed radiative forcing: https://t.co/0fYJxvfp2d 7/8 pic.twitter.com/waBeoypy0v

— Zeke Hausfather (@hausfath) June 22, 2018

The image Zeke offers here shows an incredible match between the trend lines of each of the scenarios Hansen's model considered and the trend line with actual results. The relationship between the change in radiative forcing (basically, human activity) and the change in temperatures projected by Hansen is nearly identical to the observed relationship. That indicates his model was awesome.

Only, when you look into it, nothing about that seems true. First, there's a small oddity where the chart has labels saying "since 1960" yet Zeke's tweets said the results were since 1958. What period was actually used? I don't know. One thing I do know is predicting a relationship between two variables over the period 1958-2018 in 1988 would be far, far easier than predicting the same relationship over the period 1989-2018. That's because predicting results you already have is quite easy to do.

Additionally, I know the lines shown here are not "trend lines" in any meaningful sense. As I wrote in my last post:

The lines Zeke shows are meaningless. What he did was create two linear models over data and generate the slope of the line created by those models. That line was in the form Ax + B, with A being the slope and B being some constant value added to it (thus setting the baseline for the model).

Each model had its own A and B. Zeke decided he wanted to show what each model's A value was. However, he knew showing a single numerical value for each A would be uninteresting to most. So instead, he decided not to show what A's value was. Instead, he decided to create a graph with a line whose slope was A for each model. That is, to compare single numerical values, Zeke decided to show a graph whose results relied upon multiple parameters... and pretend that was the same thing. The most immediate impact is showing lines like this makes the visual impact far greater than just showing numerical values (e.g. .44 and .48 seem nowhere near as similar as lines in that graph). Another impact, however, is the B parameter in Zeke's models. He simply ignored it. He calculated that parameter, then he... just threw it away because accurately showing his results would not create as compelling an image as what he could create via deceit.

There is more to say about that, but the point is the "trend lines" shown in this figure are really just single numerical values given as parameters by linear models. Zeke has, for whatever reason, chosen not to show them as single numerical values. Instead, he shows them as lines whose length is fixed by the range of the underlying data. Why? I have no idea. If I want to compare the speed of two cars, I say one is going 40 MPH while the other is going 50 MPH. I don't create graphs with lines whose slopes are those values.
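For readers unfamiliar with R's lm(), here is a minimal sketch of the point about the two parameters. The data are made up purely for illustration (they are not Hansen's or Zeke's numbers), and the "slope-only" line is just one assumed way of drawing a line from A alone; it is not necessarily how Zeke anchored his lines.

```r
# Made-up, exactly linear data purely for illustration (not real forcing data).
x <- c(0, 0.5, 1.0, 1.5, 2.0)         # hypothetical forcing, W/m^2
y <- 0.1 + 0.45 * x                   # hypothetical temperature, degrees C

fit <- lm(y ~ x)
A <- unname(coef(fit)["x"])           # slope: degrees C per W/m^2
B <- unname(coef(fit)["(Intercept)"]) # intercept: the baseline offset

full_line  <- B + A * x               # line using both fitted parameters
slope_only <- A * x                   # assumed slope-only line, B discarded

print(c(A = A, B = B))
print(full_line - slope_only)         # every point differs by B (up to rounding)
```

The fit hands back two numbers, not one. Drawing a line from A alone means making a choice about B, and that choice is invisible in the resulting picture.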

But... whatever. Zeke chose to use this approach, so let's roll with it. An obvious question is what Zeke's approach would give if our verification tests for a model didn't include data which was used in making the model. Zeke provided an answer, which included these two graphs:

One chart says "since 1990," but it's not clear if that is really 1990, as the other chart says "since 1960" while Zeke himself said the results were for 1958 on. I won't try to guess what the right years are. What I will do is note Zeke says "whether or not you test Hansen's model against obs from when his runs begin (1958) or when they were published (1988) does not meaningfully change the results. Both are reasonable choices..."

I have an incredibly difficult time reconciling that claim with what I actually see. To consider it more clearly, let's extend those "trend lines" to better show what the relationships they represent are:

In one chart, Zeke shows four lines which are virtually identical. In this chart, we see lines with significant visual differences. It may be that Zeke does not consider such differences "meaningful," but I do. I am sure at least some other people would as well.

Even people who don't think the visual differences are "meaningful" are likely to agree the differences are meaningful when they consider the numerical values. As a rough eyeball estimate, the red line (Scenario A) has a sensitivity of ~0.4C to every 1 w/m^2. The green line (Scenario C) has a sensitivity of ~0.8C to 1 w/m^2. That's twice as large.

I can't understand how Zeke concludes there is no meaningful difference between having a difference of ~10% and having a difference of ~100%. The fact that including the 1958-1988 period in one's test shrinks that difference by an order of magnitude seems to show exactly why you shouldn't include data used in creating a model in your verification tests of that model.
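The arithmetic behind that comparison is simple enough to write out. A minimal R sketch, assuming the ~10% and ~100% figures mean the relative difference between the largest and smallest sensitivity, using Zeke's 0.44C to 0.48C range and the eyeballed ~0.4C and ~0.8C values from the extended lines:

```r
# Relative difference between the largest and smallest sensitivity values
spread <- function(s) max(s) / min(s) - 1

zeke_range <- c(0.44, 0.48)  # Zeke's quoted range, C per W/m^2, 1958-2017
eyeballed  <- c(0.4, 0.8)    # rough eyeball estimates, Scenarios A and C

round(100 * spread(zeke_range))  # roughly 9 percent
round(100 * spread(eyeballed))   # 100 percent
```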

Let's redo this ourselves. NASA GISS published data files for Hansen's 1988 paper listing the radiative forcing and temperature values used in and generated by his three scenarios. To begin, I read the two tables into variables forcings and temps. I then create two plots, with colors chosen to match Zeke's for each scenario, giving these two charts:
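A minimal R sketch of that reading and plotting step, for anyone who wants to follow along. The file names are hypothetical placeholders, not the actual GISS file names, and the sketch assumes each table has a year column followed by one column each for Scenarios A, B and C, which is the layout the column indices in the lm() calls further down imply. Red for A and green for C follow Zeke's charts; blue for Scenario B is an assumption.

```r
# Read one whitespace-delimited table: Year, then Scenarios A, B and C.
read_scenarios <- function(path) {
  tab <- read.table(path, header = TRUE)
  stopifnot(ncol(tab) == 4)   # year plus three scenario columns
  tab
}

# Plot the three scenario columns against year.
# Red = Scenario A, green = Scenario C; blue for Scenario B is assumed.
plot_scenarios <- function(tab, ylab) {
  cols <- c("red", "blue", "green")
  matplot(tab[, 1], as.matrix(tab[, 2:4]), type = "l", lty = 1, col = cols,
          xlab = "Year", ylab = ylab)
  legend("topleft", legend = c("Scenario A", "Scenario B", "Scenario C"),
         col = cols, lty = 1)
}

# Hypothetical file names; substitute the actual GISS data files.
# forcings <- read_scenarios("forcings.txt")
# temps    <- read_scenarios("temps.txt")
# plot_scenarios(forcings, "Radiative forcing (W/m^2)")
# plot_scenarios(temps, "Temperature anomaly (C)")
```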

Those seem to match Hansen's published figures, though the data files don't contain observed data (as such observations didn't exist at the time). This means we won't be able to check Zeke's black lines, but we should be able to look into the results he provides for Scenarios A, B and C. Doing so is as simple as performing basic linear fits. First, we'll do so over the full 60-year period 1958-2017:

> lm(temps[1:60,2]~forcings[1:60,2])

Call:
lm(formula = temps[1:60, 2] ~ forcings[1:60, 2])

Coefficients:
      (Intercept)  forcings[1:60, 2]
          -0.1043             0.4198

> lm(temps[1:60,3]~forcings[1:60,3])

Call:
lm(formula = temps[1:60, 3] ~ forcings[1:60, 3])

Coefficients:
      (Intercept)  forcings[1:60, 3]
          -0.1566             0.4708

> lm(temps[1:60,4]~forcings[1:60,4])

Call:
lm(formula = temps[1:60, 4] ~ forcings[1:60, 4])

Coefficients:
      (Intercept)  forcings[1:60, 4]
          -0.1463             0.4567

Zeke said, "Hansen's projections (0.44C to 0.48C per w/m^2) between 1958 and 2017," which is close, but not quite the same as what I get. The results I get are 0.42C for Scenario A, 0.47C for Scenario B and 0.46C for Scenario C. Let's try doing the same thing, but only looking at data from 1988 on (perhaps 1989 should be used as the starting point, but I like having 30 years for the round number):

> lm(temps[31:60,2]~forcings[31:60,2])

Call:
lm(formula = temps[31:60, 2] ~ forcings[31:60, 2])

Coefficients:
       (Intercept)  forcings[31:60, 2]
           0.02081             0.37580

> lm(temps[31:60,3]~forcings[31:60,3])

Call:
lm(formula = temps[31:60, 3] ~ forcings[31:60, 3])

Coefficients:
       (Intercept)  forcings[31:60, 3]
           -0.3581              0.5839

> lm(temps[31:60,4]~forcings[31:60,4])

Call:
lm(formula = temps[31:60, 4] ~ forcings[31:60, 4])

Coefficients:
       (Intercept)  forcings[31:60, 4]
           -0.6806              0.8684
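Both sets of fits can be produced with one small helper instead of six hand-typed calls. A minimal sketch, assuming temps and forcings are laid out as in the calls above (row 1 is 1958, column 1 is the year, columns 2 to 4 are Scenarios A to C); the function name scenario_slopes is my own invention.

```r
# Slope of temperature on forcing for each scenario over a block of rows.
# Assumes column 1 is the year and columns 2-4 are Scenarios A, B and C.
scenario_slopes <- function(temps, forcings, rows) {
  slopes <- sapply(2:4, function(j) {
    fit <- lm(temps[rows, j] ~ forcings[rows, j])
    unname(coef(fit)[2])   # second coefficient is the slope (C per W/m^2)
  })
  setNames(slopes, c("A", "B", "C"))
}

# scenario_slopes(temps, forcings, 1:60)   # 1958-2017
# scenario_slopes(temps, forcings, 31:60)  # 1988-2017
```

Under that layout row 31 is 1988, so the second call covers the same 30-year window as the fits above. Note the helper returns only the slopes; the intercepts are computed by lm() and then dropped, which is exactly the information a slope-only comparison leaves out.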

The ratios in this case are 0.38C (per w/m^2) for Scenario A, 0.58C for Scenario B and 0.87C for Scenario C. That matches Zeke's results to a large extent, with the sensitivity for Scenario C being more than double the sensitivity for Scenario A. But when we plot the lines given by these models:

We see the lines are even more divergent than Zeke showed. The reason for this can be found in what I said yesterday, quoted above:

The lines Zeke shows are meaningless. What he did was create two linear models over data and generate the slope of the line created by those models. That line was in the form Ax + B, with A being the slope and B being some constant value added to it (thus setting the baseline for the model).

Each model had its own A and B. Zeke decided he wanted to show what each model's A value was. However, he knew showing a single numerical value for each A would be uninteresting to most. So instead, he decided not to show what A's value was. Instead, he decided to create a graph with a line whose slope was A for each model. That is, to compare single numerical values, Zeke decided to show a graph whose results relied upon multiple parameters... and pretend that was the same thing. The most immediate impact is showing lines like this makes the visual impact far greater than just showing numerical values (e.g. .44 and .48 seem nowhere near as similar as lines in that graph). Another impact, however, is the B parameter in Zeke's models. He simply ignored it. He calculated that parameter, then he... just threw it away because accurately showing his results would not create as compelling an image as what he could create via deceit.
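To see why the intercepts matter, here is a minimal R sketch that evaluates the full post-1988 fitted lines (intercept plus slope, taken from the lm() output above) at two forcing levels, next to the same lines with the intercept thrown away. The levels 1 and 2 W/m^2 are arbitrary illustrative choices, not values taken from the data files.

```r
# Post-1988 fits from the lm() output above
post_1988 <- data.frame(
  scenario  = c("A", "B", "C"),
  intercept = c(0.02081, -0.3581, -0.6806),
  slope     = c(0.37580,  0.5839,  0.8684)
)

forcing <- c(1, 2)  # arbitrary illustrative forcing levels, W/m^2

# Full fitted line: intercept + slope * forcing (rows = scenarios)
full_fit <- sapply(forcing, function(f) post_1988$intercept + post_1988$slope * f)

# The same lines with the intercept discarded
slope_only <- sapply(forcing, function(f) post_1988$slope * f)

labels <- list(post_1988$scenario, paste0(forcing, " W/m^2"))
dimnames(full_fit) <- labels
dimnames(slope_only) <- labels
print(round(full_fit, 3))
print(round(slope_only, 3))
```

With the intercepts included, Scenario A sits highest at 1 W/m^2 and Scenario C lowest, while at 2 W/m^2 the order flips. The slope-only version keeps Scenario C highest at both levels.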

Why did Zeke choose to ignore one of his model parameters while claiming to present "trend lines," aligning the series to maximize their apparent agreement? I don't know.

Why did Zeke choose to promote the near-perfect verification results he got for Hansen's model when half of the data used in that test was also used in creating the model? I don't know.

Why did Zeke say going from having one's climate sensitivity values be almost identical to having one be more than twice as large as another isn't meaningful? I don't know.

What I do know is every single decision Zeke made downplayed the uncertainties/errors in Hansen's model, and he was completely aware they would each do so.

There are a ton of other questions, concerns and problems with Zeke's defense of Hansen's model. I'm not going to try to cover all of them. I'm not here to try to say Hansen's model was "good" or "bad." I'm here to say when climate communicators tweet out statements like:

Hansen's projections across all his scenarios are very similar to observations; the amount of warming we've seen (~0.45C per w/m^2) is nearly identical to Hansen's projections (0.44C to 0.48C per w/m^2) between 1958 and 2017:

And that's considered normal and acceptable despite charts like the last one I created, it isn't difficult to see why climate communicators constantly run into problems with people mistrusting them. If climate communicators want to gain the trust of more people, the first thing they should do is be up front and clear about uncertainties, questions and errors. As long as they don't, people will rightly mistrust them.

There's about 200 odd comments on this subject over at ATTP - if your will to live is strong enough. People (again, on all sides) seem to have very low standards if something confirms what they already believe. If it supports a particular POV, no matter how tendentious or even wishfully interpreted, it becomes part of a 'consilience' of 'evidence'.

I was banned from that site years ago, even though I never did anything there which was remotely out of line. Anders just didn't like that I implied he was being dishonest in an exchange we had here (he was being dishonest), so he banned me from his site.

But yeah, people's standards are not the same for things they like and things they dislike. See my latest post for an example. Or the post I hope to write today. Not sure I'll be able to get it done though. I broke my finger yesterday. Typing is a bit rough now.