A Mathematical Interlude

So if you've read my last few posts, you can probably tell I could use a bit of a break. I was considering taking a few days off. I probably should. I just don't know if I could bring myself to. So instead, I'm going to try doing something different. I'm going to talk about math.

You might not know it, but I love math. I think math is beautiful. One of my biggest regrets in life is I never got much formal training in math. Even so, I think I still understand it in a more fundamental way than the average person. That's because to me, math is more philosophical than practical. I like the logical structure it imposes on arguments. I like how it makes you think in a rigorous manner.

So today I'm going to discuss math. It's about some work by Richard Tol, and if you've followed my blog, you'll know there is history there. That's not important though. The data errors previously found in Tol's work aren't important either. All that matters is the implicit argument found in one of his papers: the less data you have, the more certain you are of your results.

That argument is obviously absurd. We could just laugh it off and move on. That'd be a bad idea though. Silly ideas often get disguised with verbose terminology and complicated equations. Once that happens, it stops being easy to tell they're silly. Then you can't just laugh them off. Then you have to be able to explain what makes them wrong.

To begin with, Tol uses this data:


More or less. The data might not match up exactly because the data set has been changed more than half a dozen times due to people finding errors in it. That's pretty remarkable given the values are supposed to just be taken straight out of papers. As in, Tol was supposed to have read ~20 papers, looked at numbers in them, copied them down and plotted them on a graph.

Regardless, any differences between this version and the one used in the paper we're looking at should be irrelevant for our purposes. What matters is we have ~20 data points plotted on an x-axis ranging from 0 to 5, almost all of which have y-values below 0. Only one point is above 0 by any meaningful amount, and that is at 1 on the x-axis. Suppose you wanted to determine what relation this data indicated between the variables for the x and y axes. What would you do?

One option would be to draw a line through the data. Tol did this. The first time he did it, it was with an earlier version of this data set. That got him this image:


Later on, after he corrected some errors in the data and added new data in, he tried drawing a new line. He got this one:


That's a pretty different looking image. The reason is when you only have ~20 data points, it's hard to tell what relationship there is between two variables. You pretty much just have to guess. Tol guessed there was a quadratic relationship. That is, he guessed: f(x) = ax^2 + bx. But the relationship could have just been parabolic: f(x) = ax^2. Or linear: f(x) = ax. Or cubic: f(x) = ax^3 + bx^2 + cx.

All of these are "models" he's "fitting" to the data. That is, he's guessing at a relationship the two variables might have to one another, and he's then finding out what numbers would best fit that relationship. There are an infinite number he could try, and none of them are "right." It's just a matter of which give results that seem to be reasonable.
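As a rough illustration of what "fitting" means here, the sketch below fits three of the through-the-origin forms mentioned above by ordinary least squares and compares what each one predicts at the same x values. The data points are invented for the example (they are not Tol's data), and the function names are just labels chosen for this sketch.

```python
# Invented points for illustration only; these are NOT Tol's data.
xs = [1.0, 2.0, 2.5, 3.0, 3.0, 5.0]
ys = [1.5, -0.5, -1.5, -1.0, -2.5, -6.0]


def fit_linear(xs, ys):
    """Least-squares fit of f(x) = a*x; returns the fitted function."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: a * x


def fit_parabolic(xs, ys):
    """Least-squares fit of f(x) = a*x^2; returns the fitted function."""
    a = sum(x**2 * y for x, y in zip(xs, ys)) / sum(x**4 for x in xs)
    return lambda x: a * x**2


def fit_quadratic(xs, ys):
    """Least-squares fit of f(x) = a*x^2 + b*x via the 2x2 normal equations."""
    s2 = sum(x**2 for x in xs)
    s3 = sum(x**3 for x in xs)
    s4 = sum(x**4 for x in xs)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x**2 * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    return lambda x: a * x**2 + b * x


# Compare what each guessed form says inside and outside the data range.
for name, fit in [("linear", fit_linear),
                  ("parabolic", fit_parabolic),
                  ("quadratic", fit_quadratic)]:
    model = fit(xs, ys)
    print(f"{name:10s} f(2) = {model(2.0):6.2f}   f(6) = {model(6.0):6.2f}")
```

Each form is a different guess about the shape of the relationship. With this few points, the data alone can't say which guess is the right one.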

So with that in mind, it's obvious a model that changes greatly when a small amount of data changes is not a very good model. In order to have much confidence in one's work, it should be robust to the removal of small amounts of data. Since Tol's earlier model wasn't, finding a new model seems wise. The work I want to discuss today does just that. The new model is a fairly complicated one. I'm not going to discuss it.
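One simple way to check that kind of robustness is a leave-one-out test: refit the model once for every data point, each time with that point removed, and see how much the answer moves. The sketch below does this for the quadratic form f(x) = ax^2 + bx. The points are invented for the example (they are not Tol's data), and `leave_one_out` is a name made up for this sketch.

```python
def fit_quadratic(points):
    """Least-squares coefficients (a, b) for f(x) = a*x^2 + b*x."""
    s2 = sum(x**2 for x, _ in points)
    s3 = sum(x**3 for x, _ in points)
    s4 = sum(x**4 for x, _ in points)
    t1 = sum(x * y for x, y in points)
    t2 = sum(x**2 * y for x, y in points)
    det = s4 * s2 - s3 * s3
    return (t2 * s2 - t1 * s3) / det, (s4 * t1 - s3 * t2) / det


def leave_one_out(points, x_eval):
    """Predictions at x_eval from refits that each drop one point."""
    predictions = []
    for i in range(len(points)):
        a, b = fit_quadratic(points[:i] + points[i + 1:])
        predictions.append(a * x_eval**2 + b * x_eval)
    return predictions


# Invented points for illustration only; these are NOT Tol's data.
points = [(1.0, 1.5), (2.0, -0.5), (2.5, -1.5),
          (3.0, -1.0), (3.0, -2.5), (5.0, -6.0)]
preds = leave_one_out(points, x_eval=4.0)
print("spread of predictions at x = 4:", max(preds) - min(preds))
```

If dropping any single point barely changes the prediction, the fit is robust in the sense described above. If the spread is large compared to the values themselves, a handful of points is driving the result.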

Confused? Don't be. The model itself isn't important (which might be why Tol keeps changing it from paper to paper). What is important is the output of the model. I'll show you that in a minute. Before I do though, I need to explain one thing real quick. Tol splits his data up into four groups labeled AR2, AR3, AR4 and AR5. Those refer to the Second, Third, Fourth and Fifth IPCC Assessment Reports. Each group includes only those papers published before that particular report. You can see the breakdown in this table:


So AR2 includes the first four data points, AR3 includes the first 9 data points, AR4 includes the first 15 data points, and AR5 includes all 21 data points. With that in mind, here is the first set of outputs from Tol's model:


Ah-hah, you say. There is more data, but the bands shrink. That means I was wrong, right? The bands represent uncertainty levels, right? Well, let's see. Tol says:

Figure 1 shows the restricted Nadaraya-Watson kernel regression and its 95% confidence interval for the studies published before the Second Assessment Report, the Third, the Fourth and the Fifth, respectively. Before AR4, estimates of the impact of were limited to warming of 2.5°C and 3.0°C. The kernel regression is therefore valid only for a limited range of climatic changes. This range shrinks between AR2 and AR3 as the number of observations rises from 4 to 9, and the standard deviations shrink accordingly.

So yeah, that would seem to contradict what I said. As more data was added between AR2 and AR3, the confidence interval shrank, meaning we grew more certain of our results. That's exactly what we would hope would happen.
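For anyone unfamiliar with the term in that quote, a plain Nadaraya-Watson kernel regression estimates the curve at a point x as a weighted average of the observed y values, with weights that fall off with distance from x. The sketch below is the textbook version with a Gaussian kernel. What the "restricted" variant in the paper changes isn't covered here, so this should not be read as Tol's model. The bandwidth `h` and the data points are made up.

```python
import math


def nadaraya_watson(x, xs, ys, h):
    """Textbook Nadaraya-Watson estimate at x, Gaussian kernel, bandwidth h."""
    weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


# Invented points for illustration only; these are NOT Tol's data.
xs = [1.0, 2.0, 2.5, 3.0, 3.0, 5.0]
ys = [1.5, -0.5, -1.5, -1.0, -2.5, -6.0]

for x in (1.0, 2.5, 4.0):
    print(x, round(nadaraya_watson(x, xs, ys, h=0.75), 2))
```

Because the estimate at each x leans on whichever points happen to be nearby, a region with only one or two observations is effectively decided by those observations.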

But let me ask you something, do you think it's just chance Tol singled out the jump from AR2 to AR3? What about the jump from AR3 to AR4? How about the one from AR4 to AR5? Do you think he just wanted to save space and felt they weren't important?

It'd be easy enough to check. Let's go ahead. Here's the output for AR4 and AR5:


Well would you look at that. The confidence intervals for AR5 are huge. They are way larger than the ones for AR4. They're way larger than the ones for AR3 or AR2 too. How is that? AR5 had the most data. How could it be that having more data decreased our certainty? More importantly, how could having less data increase our certainty? How could knowing less mean being more sure of what we know?

It's simple. It's because of how Tol defined uncertainty. A common way to define uncertainty is to look at variance. The more different your data points are from one another, the more uncertain you are of your results. It's a common sense approach. If all of your data points are very different from one another, you obviously can't be too certain of any results you draw from them.
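To make the spread-based idea concrete, here is a minimal sketch using the standard error of the mean, which is the sample standard deviation divided by the square root of the number of points. The values are invented for the example (they are not Tol's data), and this is a generic textbook measure, not the calculation used in the paper.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Invented values for illustration only; these are NOT Tol's data.
tight = [-1.0, -1.5, -2.0, -1.2, -1.8]
with_outlier = tight + [-10.0]  # one more observation, far from the rest

print("n =", len(tight), "standard error =", round(standard_error(tight), 2))
print("n =", len(with_outlier), "standard error =", round(standard_error(with_outlier), 2))
```

Adding data normally shrinks the standard error, since n goes up. Here the one far-away value raises the spread faster than the extra observation lowers it, so the measured uncertainty grows even though there is more data.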

The problem is the inverse is not inherently true. We all know data can be biased. If your data is biased, then constantly getting the same results doesn't mean those results are accurate. But that's not the (only) problem here. The problem here is, you only have 21 data points!

The values on the x-axis range from 1 to 5.4. The values on the y-axis range from -11.5 to 2.3. With all the possible combinations you could have between those, 21 data points is nowhere near enough. It's even worse if you then start splitting the data up into four groups.

Now, the primary reason for the increase in uncertainty in the AR5 chart is the one data point at -11.5. That point is a clear outlier. It actually shouldn't be in the data set, as it is given in PPP GDP instead of nominal GDP.* Still, if a single outlier can cause such a dramatic problem for your model, it's clear your model has serious problems.

Also, if one negative outlier can cause serious problems, it's likely one positive outlier could cause serious problems too. As I pointed out before, there is only one data point that is notably above 0. That one is introduced in the AR4 group, and as we see, that is when the model output becomes positive in the early portions. It would appear the model is highly sensitive to both major outliers.

In fact, you can see the model is so sensitive to outliers that its uncertainty increases when they are added, primarily because they are added.

Normally, if one data point is very unlike the rest, you assume the outlier is more likely to be wrong and give more weight to the rest of the data. This model does the opposite. This model assumes the outlier is more likely to be right and gives it more weight.
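One common way to give more weight to the rest of the data is to use a summary that a single outlier can't drag around, such as the median. The sketch below compares how far the mean and the median move when one extreme value is added. The values are invented for the example (they are not Tol's data), and this comparison is not part of Tol's analysis.

```python
import statistics

# Invented values for illustration only; these are NOT Tol's data.
typical = [-1.0, -1.5, -2.0, -1.2, -1.8]
with_outlier = typical + [-10.0]

for label, values in [("without outlier", typical), ("with outlier", with_outlier)]:
    print(label,
          "mean =", round(statistics.mean(values), 2),
          "median =", round(statistics.median(values), 2))
```

The mean treats the extreme value as just as trustworthy as every other point, so it shifts a lot. The median mostly ignores it, which is the "outlier is probably wrong" behavior described above.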

*It turns out this error was corrected in Tol's subsequent paper. Given I pointed the error out last year, it appears Tol corrected the error because of me. I'm not sure if he gave me credit. I am sure he introduced at least one other data error into that paper though. It's incredible really. This is at least the tenth revision of the data set I've seen now.


  1. >>the less data you have, the more certain you are of your results.

    >That argument is obviously absurd.

    No. If you have one watch, you know precisely what time it is. If you have two, you do not.

  2. There is another aspect to calculating uncertainty bands. The data points are not independent. In a tight field the authors should be familiar with all the previous papers. This is particularly true of William Nordhaus, who was an author on about a third of the papers. In economics one would hope that later papers would build upon, or undermine the previous conclusions.
    There are also problems in creating a damage function that has meaningful empirical content. An easy one to understand is that economic impact is inversely related to time. For instance, the more rapid the rate of sea level rise, the more costly it is to mitigate. More fundamentally, the greater the uncertainty, the less able economic actors are to predict, and therefore the less able they are to adapt to minimize the economic impact. Allied to this, the damage functions range from second to sixth degree equations, relating economic impact costs to temperature rise. We have had up to one degree of human-caused warming so far and are projecting ahead to five degrees or more of warming. The economic costs, particularly at a global level, are going to be fanciful.

  3. Oh, it's way worse than that. There are about a hundred different reasons the confidence intervals in the paper are bogus. I wouldn't care to try to list them all. One of the more bizarre ones is the choice of data used for the papers is arbitrary. Seriously, Richard Tol changed what data he used for the aggregation calculations he performed for a number of papers. More than once.

    How do you account for something like that? What if I told you he used 2095 economic data for calculations performed for a 1996 paper? What do those results even mean?

  4. MikeN, no that's not right about the watches. You only know what time it is from a single watch if you already know that the watch shows the right time. But what if you don't know if it shows the right time? All else equal, the more watches you check the better your estimate of the time will be on average.

  5. Brandon
    The figures mean very little. The "data" for 2095 is just a model estimate. It should not be confused with real world estimates. I have a fundamental philosophical problem with the models over such a long period. Take economic forecast models. These are far more complex than these welfare models; have much more "real world" empirical content; and are usually country-by-country. For even a year ahead if they get economic growth within 0.5% (i.e. to forecast 2.5%-3.5% when growth is 3%) it is usually seen as a triumph. The problem with these models is that they tend to perform well when there is steady growth. That is when the growth rate is the same year-on-year. The forecast models do little better than the dumb prediction that next year's growth will be the same as last year's. The paradox is that these models do worst when you most need them. That is when there is a big structural change, such as the Credit Crunch of 2008. The reason is that the underlying empirical parameters of the model change.
    If I understand correctly, the biggest costs of climate change are not through measurable variables like temperature rise and sea level rise. Rather, it is that weather patterns will become more volatile (e.g. sharper swings from heavy rainfall to drought) and there being more extreme weather events. In the short-term this will be further compounded by passing through tipping points above two degrees of warming where weather systems will suddenly get worse. The major welfare impact will be massive shifts to something far removed from the present.
    Of course, trying to model the average welfare impact of global warming a century from now when there are net benefits forecast in some areas (e.g. Northern Europe) is hugely problematic. Also the shape of the cost function is hugely influenced by where in the 2-6 degrees of warming forecast by the IPCC for this century the actual value lies. In total I am not surprised that in each version of Tol's graph there are significant changes. In fact I would have expected changes of greater magnitude.

  6. There would have been greater changes to Richard Tol's results if his models had anything to do with the underlying papers' models. They don't though. All he did is take single point estimates from some papers and discard all other information from those papers. He did that even though some of the work he took point estimates from estimated damage curves.

    That is, people estimated the amount of economic damage global warming would have for all amounts of temperature change for given ranges. Tol looked at their work, took their estimates for single points, and put them in a table. He then looked at the work of other people who only estimated the amount of economic damage global warming would have for specific amounts of warmings and put them in the same table. In the end, he wound up with 14 data points the first time. When he updated the work, he wound up with 17. Later he updated it to 21. I think he's up to 24 now.

    But he could get 24 data points from a single paper! He could get a continuous damage function, giving an infinite number of data points (albeit, over a finite range) from a single paper! He just... doesn't. He doesn't even explain why he doesn't!

    That's what's so baffling about Tol's work. It's not just that he constantly introduces so many baffling errors into his data (stay tuned for a new story on this). It's that he doesn't even try to make use of information that's readily available. It's like he's approaching the problem in the worst way imaginable.
