I am not perfect. Nobody is. I get that. I have made plenty of stupid mistakes in my life. Even so, I don't understand how some mistakes get made. It's even more confusing when those mistakes get included in a publication. For that to happen, the author needs to make the mistake and not notice the mistake while drafting the publication. Then any editors and reviewers have to not notice the mistake. Then usually, readers have to not notice the mistake either.
For some errors, I understand how that might happen. For others, I don't. Today's post is going to discuss an example where several inter-related errors in a publication caused the paper to have dramatically different results than it would have had otherwise.
As a warning, there is a lot of backstory to this topic, and I won't be covering it. The publication in question is "The Economic Impacts of Climate Change," by professor Richard Tol. It was published earlier this year, but similar papers by Tol have been published over the last decade, as have other publications using his work, such as a major report by the Intergovernmental Panel on Climate Change (IPCC), a United Nations body established to produce reports on climate change. These various publications have had numerous errors in them, and there has been a long saga of those errors being pointed out and then changed or not really corrected. It's all quite messy, and it is made worse by things like the fact Tol used his position as a lead author of an IPCC chapter to get his work prominently included after the last round of external review had happened, meaning reviewers weren't given the opportunity to point out many errors (some of which have since been corrected by the IPCC).
You can read about some of the errors and history here. For today's post though, I'm just going to focus on one aspect. Namely, Tol's central premise for the last decade has been moderate amounts of global warming are beneficial. He first published this conclusion with this chart:
He did follow-up work expanding upon this, but after a few years, errors in his work were discovered, as detailed in the article linked to above. After various back-and-forths, Tol was forced to publish a new version of that chart:
This version contradicted Tol's premise, showing no benefits from global warming. Over the next few years, Tol attempted to find ways to get his original conclusion back, experimenting with various ways to model the data. The latest effort was to use a "piecewise linear function" as his model. Basically, instead of trying to draw a curved line through his data like before, he just draws two straight lines that connect to one another:
Now, there is no physical basis for this model. There is no reason to think the impacts of global warming will follow one trend for a time then reverse themselves in an instant. Tol had offered physical reasons for his previous model, but for this one, he simply says things like (from his recent paper):
Alternative specifications of the impact function are possible, but the piecewise linear model of figure 1 is by far the best fit.
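For readers who haven't met the term, here is a minimal sketch of what a piecewise linear function looks like in code. The parameter names and the numbers are hypothetical, chosen only for illustration, and I'm assuming the first segment starts at 0,0. None of this is taken from Tol's paper or data file.

```python
def piecewise_linear(t, slope_before, slope_after, kink):
    """Two straight segments joined at `kink`; the first starts at (0, 0)."""
    if t <= kink:
        return slope_before * t
    # Continue from the value reached at the kink with the second slope.
    return slope_before * kink + slope_after * (t - kink)


# Hypothetical parameters: rises until 1 degree of warming, then falls.
for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(t, piecewise_linear(t, 2.0, -3.0, 1.0))
```

The two segments meet at the kink, so the line is continuous, but its slope changes in an instant at that point.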
One might question the merits of using a model with no physical basis simply because it can fit the data. That's a matter for another day though. For today, I just want to examine the errors in the calculations Tol uses to claim this model provides the best fit, and further, that it "is by far the best fit." We can see his results in Appendix B of his supplementary materials:
This table lists various models people have proposed for studying this problem. The first column shows the model. The last column shows the percent probability Tol assigns each model (if you add up the values, you get 100%). That column shows Tol assigns a "Relative likelihood" of 82% to his new model, 16% to his previous model, and 1.8% to the next most likely model. Whatever one thinks of this method for choosing the model, it is the one Tol went with. But Tol messed it up.
You can find Tol's errors by examining his data file, though it is so messy and poorly organized I wouldn't recommend trying. There are tons of things which aren't labeled, calculations with no description of what they are/do, and everything is all over the place. That's why it's easy to miss things. For instance, a natural question to consider is why would the model T^2 provide a significantly worse fit than T + T^2? Tol says his previous model, T + T^2, has a 16% probability while an alternative model, T^2, has a probability of just 1.8%. If you understand how functions work for parabolas, that should seem kind of weird.
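One way to probe that question is to fit both models to the same data and compare how far each one misses. The sketch below does that with ordinary least squares and no constant term. The data points are made up for illustration and the function names are my own. This is not Tol's data or his spreadsheet logic.

```python
def fit_quadratic_only(ts, ys):
    """Least-squares b for y = b*T^2 (no constant, no linear term)."""
    return sum(y * t**2 for t, y in zip(ts, ys)) / sum(t**4 for t in ts)


def fit_linear_plus_quadratic(ts, ys):
    """Least-squares (a, b) for y = a*T + b*T^2 via the 2x2 normal equations."""
    s2 = sum(t**2 for t in ts)
    s3 = sum(t**3 for t in ts)
    s4 = sum(t**4 for t in ts)
    r1 = sum(y * t for t, y in zip(ts, ys))
    r2 = sum(y * t**2 for t, y in zip(ts, ys))
    det = s2 * s4 - s3 * s3
    a = (r1 * s4 - r2 * s3) / det
    b = (s2 * r2 - s3 * r1) / det
    return a, b


def sse(ts, ys, predict):
    """Sum of squared residuals for a prediction function."""
    return sum((y - predict(t)) ** 2 for t, y in zip(ts, ys))


# Hypothetical (warming, impact) pairs, for illustration only.
ts = [1.0, 2.0, 2.5, 3.0, 5.0]
ys = [0.5, -0.2, -1.0, -2.0, -6.0]

b_only = fit_quadratic_only(ts, ys)
a, b = fit_linear_plus_quadratic(ts, ys)

sse_quad = sse(ts, ys, lambda t: b_only * t**2)
sse_full = sse(ts, ys, lambda t: a * t + b * t**2)
print("T^2 only:", sse_quad)
print("T + T^2: ", sse_full)
```

The T + T^2 model can never fit worse than T^2 alone, since setting the linear coefficient to zero recovers the simpler model. How big the gap is depends entirely on the data.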
If you poke around in the data file a bit, you can find a column for the T^2 model used to come up with this result. It multiplies each data point Tol uses by the coefficient of his model as such:
=M$75*B1 =M$75*B2 =M$75*B3 ...
This seems unremarkable unless you think to check what grid cell M75 in his data sheet is. You might not think to since it should be the coefficient of the model, but if you do, you'll see:
As it happens, M75 is not the coefficient value. That value is found in I75. M75 is the "Lower 95%," or the lowest value one would get given the uncertainty range for the model. The upper limit is found in N75. In numerical terms, the range is -0.24 to -0.15, with -0.19 being the best estimate. That is the same -0.19 we see in the table from Tol's supplementary material. He just failed to use it when calculating the likelihood values seen in that fourth column. Instead, he used -0.24.
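To make the effect of swapping in the lower bound concrete, here is a minimal sketch. It fits a T^2 model to made-up data, then scores the model once with the fitted coefficient and once with a coefficient shifted down by 0.05, the same gap as between -0.19 and -0.24. The data and names are hypothetical. Only the size of the shift comes from the numbers above.

```python
def fit_quadratic_only(ts, ys):
    """Least-squares b for y = b*T^2 with no constant term."""
    return sum(y * t**2 for t, y in zip(ts, ys)) / sum(t**4 for t in ts)


def sse_for_coefficient(ts, ys, b):
    """Sum of squared residuals when predicting y with b*T^2."""
    return sum((y - b * t**2) ** 2 for t, y in zip(ts, ys))


# Hypothetical (warming, impact) pairs, for illustration only.
ts = [1.0, 2.0, 2.5, 3.0, 5.0]
ys = [0.1, -0.9, -1.0, -1.9, -4.6]

best = fit_quadratic_only(ts, ys)
shifted = best - 0.05  # stands in for using a "Lower 95%" cell by mistake

sse_best = sse_for_coefficient(ts, ys, best)
sse_shifted = sse_for_coefficient(ts, ys, shifted)
print("fitted coefficient: ", best, "SSE:", sse_best)
print("shifted coefficient:", shifted, "SSE:", sse_shifted)
```

The fitted coefficient is by definition the one that minimizes the squared error, so any other value, including either end of the confidence interval, scores worse.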
Obviously, if you do a best fit with a model then use different values than what the model suggests, you will get worse results. Since all the models in Tol's list have their likelihood add up to 100%, this makes Tol's previous and current models look better, as can be seen by this comparison:
This isn't an enormous shift, but it is interesting to see how such a simple error of using the wrong data value can affect Tol's comparisons. This error may not have a large impact, but what about other errors? Remember how I asked why changing your model from T^2 to T + T^2 would make such a large difference? We can see now the answer is, "Because Tol messed up his calculations."
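The knock-on effect on the other models comes purely from the percentages being forced to add up to 100%. A minimal sketch of that normalization step, using invented weights rather than anything from Tol's table, and making no claim about how he computes the underlying likelihoods:

```python
def relative_shares(weights):
    """Scale raw weights so they sum to 100 (percent)."""
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


# Hypothetical raw weights, for illustration only.
correct = {"piecewise": 8.0, "T + T^2": 1.5, "T^2": 0.5}

# Same weights, but one model is wrongly penalized by a calculation error.
penalized = dict(correct)
penalized["T^2"] = 0.2

shares_correct = relative_shares(correct)
shares_penalized = relative_shares(penalized)
for name in correct:
    print(name, round(shares_correct[name], 2), round(shares_penalized[name], 2))
```

Nothing about the other two models changed, yet both of their shares go up once the third model is marked down.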
With that in mind, why does changing from T + T^2 to T^2 + T^6 or T^2 + T^7 make such a large difference? Tol says those models are so unlikely to be correct their probabilities are 0.00000000309023070179547% and 0.00000000209622969674906%. How could they be that unlikely when they are so similar to the model Tol used in his previous paper? It doesn't make sense, especially not with the coefficients Tol gives for the T^6 and T^7 terms being so small (-.00016 and -.00026 respectively). Again, let's try looking into Tol's spreadsheet. If these models are as incredibly improbable as Tol reports, that should show up in the stats for those models:
It's fine if you don't know what that all means. All you need to know is Multiple R, R Square and Adjusted R Square are measures of skill being reported for the model. They are basically percent scores, meaning the closer they are to 1 (100%), the better the model performed. These scores indicate the models in question performed very well. I couldn't find similar reporting for Tol's piecewise model in his data, but here are the reported scores for his old model:
As you can see, these scores say the two models Tol rates as being terrible outperform his old model by a sizable margin. How is that possible? To try to find out, we can examine Tol's formulas. We see in one column:
=I$114*A1+I$115*A1^6 =I$114*A2+I$115*A2^6 =I$114*A3+I$115*A3^6 ...
And in another:
=I$94*A1+I$95*A1^7 =I$94*A2+I$95*A2^7 =I$94*A3+I$95*A3^7 ...
Notice anything unusual? It's true Tol used the I column instead of the M column this time, meaning he did use the values as coefficients. And we can see one column multiplies one coefficient by T^6 while the other multiplies one coefficient by T^7, as expected. But look at the first part of these formulas: I94*A1.
Remember what the models were? They were T^2 + T^6 and T^2 + T^7. Tol's formulas use A1 + A1^6 and A1 + A1^7. Obviously, that's not right. They should be A1^2 + A1^6 and A1^2 + A1^7. Tol simply failed to square the first temperature term in each model when looking at how well they fit the data.
Like before, if you use the wrong data when calculating how well a model performs, it will seem to perform worse than it should. Tol messed up the formula for the models with the best scores, and as a result, concluded they had the worst scores. Had he simply looked at the statistics he reported for these models, he'd have realized they had the best scores.
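Here is a minimal sketch of that reasoning. It assumes the coefficients really were fit to the data being scored, which is the assumption the argument above rests on. The data are generated from a T^2 + T^6 curve, and the first coefficient is made up. Only the -0.00016 is borrowed from the value quoted earlier.

```python
C1 = -0.2       # hypothetical coefficient on T^2
C2 = -0.00016   # coefficient on T^6, as quoted above


def model_correct(t):
    """The intended specification: C1*T^2 + C2*T^6."""
    return C1 * t**2 + C2 * t**6


def model_mistyped(t):
    """The spreadsheet-style slip: first term uses T instead of T^2."""
    return C1 * t + C2 * t**6


def sse(ts, ys, predict):
    """Sum of squared residuals for a prediction function."""
    return sum((y - predict(t)) ** 2 for t, y in zip(ts, ys))


# Hypothetical warming values; impacts generated from the correct model.
ts = [1.0, 2.0, 2.5, 3.0, 5.0]
ys = [model_correct(t) for t in ts]

sse_correct = sse(ts, ys, model_correct)
sse_mistyped = sse(ts, ys, model_mistyped)
print("correct formula SSE: ", sse_correct)
print("mistyped formula SSE:", sse_mistyped)
```

With coefficients that match the data, the mistyped formula can only hurt the score. The two formulas agree at T = 1 and drift further apart as T grows.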
Now, confession time. When I first started working on this post, this is where I stopped. I saw the reported statistics for these two models, found this error and foolishly stopped there. I thought, "Wow, Tol made these awesome models look bad by messing up their formula." I didn't bother to redo the probability calculations because with the results Tol reported for these models, there was no way the models could be anything but the best of the bunch.
That made sense. It was nice and simple. But while it made sense, I wanted to try to make sure I verified everything I was putting in this post so I went ahead and made the corrections to Tol's formulas to get the model specifications right. This is what I got:
I was dumbfounded by this. I fixed an error in Tol's calculations for these two models, and according to Tol, that made them perform worse. That shouldn't be possible. Errors in your data should not make your results better. Yet here, fixing an error made the results worse. That is difficult to understand so I went looking for an answer. After a while, I looked at a second set of model specifications Tol provided with two of his data points (representing the greatest amounts of warming) excluded:
It is no surprise there'd be some differences between this table and the previous one since two data points were excluded. Some of the differences are surprising though. For instance, in this table, the relative likelihoods of the two models in question are given as 0.04% and 0.02%, roughly ten million times higher than in the other table. That is weird. I took a quick look at that section of Tol's data file, and I found the same errors as before. Applying the same corrections, we get:
Here, the changes have the expected effect of improving the relative likelihood for Tol's table. They're still not as likely as we'd expect given the reported stats on the models for the full data set though. To consider why, we can look at those reported stats for these models when fit to Tol's data set with two data points excluded:
They are nowhere near as good this time around. That large a difference suggests Tol's reports for these models in the other case are wrong. Plotting the data confirms this. The model coefficients Tol provides in his supplementary paper are for a model that doesn't come close to fitting the data. It turns out the reason the relative likelihoods Tol reported for these two models are so far off is that he generated the fits for the models on an entirely wrong data set. I don't know what data he used, but it wasn't the same data he used for the rest of his models.
This isn't the end of the problems either. If you look at the tables Tol provided for these models, you may notice most of them do not have a constant value added to them. That's because Tol forces all model fits to go through the point 0,0 to represent 0 warming having had 0 economic impact. But for two models, he doesn't do this. Those models are the exp(T) and exp(exp(T)) models.
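For anyone unsure what forcing a fit through 0,0 means in practice, here is a minimal sketch using a plain straight-line model. It compares a fit with no constant term to one that is allowed a constant. The data and function names are hypothetical and have nothing to do with Tol's file.

```python
def fit_through_origin(ts, ys):
    """Least-squares slope for y = a*T, which always passes through (0, 0)."""
    return sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)


def fit_with_intercept(ts, ys):
    """Least-squares (intercept, slope) for y = c + a*T."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    sxx = sum((t - mean_t) ** 2 for t in ts)
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


# Hypothetical (warming, impact) pairs, for illustration only.
ts = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 0.5, -1.0, -3.0]

slope_origin = fit_through_origin(ts, ys)
intercept, slope_free = fit_with_intercept(ts, ys)

print("through origin: impact at 0 warming =", slope_origin * 0.0)
print("with constant:  impact at 0 warming =", intercept)
```

The constrained fit predicts zero impact at zero warming by construction, while the unconstrained fit is free to predict something else. The two choices can give quite different curves from the same data.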
I don't know why Tol treats those two models differently. What I do know is Tol doesn't provide any calculations or statistics for those models, and they are wrong. Whether you force them to intersect 0,0 like the other models or not, no best fit for them will give the results Tol gives. And unlike some of the other errors, nobody could tell this looking at Tol's data files because he doesn't show or explain how he got the results he got. He managed to get the best fit calculations for four models wrong, and there's nothing in his data to indicate how.
What does all this mean? I'm not sure. Initially I trusted the detailed statistics Tol provided for models, assuming he only made mistakes in typing the formulas he used with their output. If that were the case, fixing the errors in his formulas would have caused Tol's method for choosing which of these models to use to pick one of the two models he reported an R Squared value of .97 for. These models would have shown no benefits from global warming at any point.
It turns out things aren't quite that bad. It is still nonsensical to use one model over another without regard for any physical meaning the models might have just because the numbers "fit better" in one case. And the calculations Tol does to determine the "relative likelihood" of his models are... let's just say, not how you're supposed to do it. And the entire idea of trying to fit a line through this data is silly for about thirty different reasons. But... I don't know.
I've got to be honest. I was enthused about this post because when I saw those model statistics Tol reported, I thought him flubbing the formulas in his data sheet meant he completely inverted his results on which of these models to use. The fact those results are entirely wrong took the wind out of the post. I can't find any central theme or point to tie this all together with.
Still, I guess this was worthwhile. I mean, look at what we've seen. Tol used the wrong value as his coefficient for one model. He mistyped the formulas for two more models, forgetting to square one of the terms in them. He somehow managed to mess up the best fit calculations for four different models, reporting false statistics for two of them. And in an alternative case for one of those models, he also managed to report the incorrect value he had come up with inaccurately, inverting its sign.
How does a person mess all that up? And how does work of this quality pass any sort of review? It's incredible. What's even more incredible is this is all about one table in Tol's supplementary material, a throwaway section offered as nothing more than an excuse for Tol to pick the one model he was able to come up with which would let him continue to say moderate global warming is beneficial, a claim he can base only upon a single paper he himself wrote 15 years ago.
Oh, and none of this is new. I just checked. Back in 2015, I wrote a post about how Tol surreptitiously changed a paper he wrote to correct a data error I had discovered, without disclosing the change had been made. In reviewing it and the materials for it, I've realized all the errors I discussed today were present in that paper as well. That means Tol has managed to use these erroneous formulas/model calculations for years without it getting noticed.
I guess I might be a bit to blame for that. I don't know why I didn't notice any of this back then.