2010-10-12 11:15:24Statistical Feedback being asked for- Comparing Temp to CO2.AMO
Robert Way

robert_way19@hotmail...
137.122.14.20
Hello All.

I was playing around with some data and chose to run a multivariate regression model comparing temperatures (1883-2009, from the GISS land/ocean index) to the AMO and CO2. I took out every third year and ran the regression on the remaining data (every 1st and 2nd year) to get the coefficients to plug into the multiple regression formula y = Z + X1J1 + X2J2, which I then used to predict the values for the 3rd years. I then repeated the same process to predict the 1st and 2nd years: exclude them, run the model again on the remaining years, and predict as above.
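
For anyone following along, here is a minimal sketch of the every-third-year hold-out described above. It is not the spreadsheet or script actually used for the chart: the three series are synthetic stand-ins (not the GISS, AMO or CO2 data), the "true" coefficients are made up, and `fit_ols` is a hypothetical helper that solves y = Z + X1J1 + X2J2 by ordinary least squares.

```python
import math
import random


def fit_ols(x1, x2, y):
    """Least-squares fit of y = z + j1*x1 + j2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Augmented matrix [X'X | X'y]: 3 rows by 4 columns.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Synthetic stand-ins (assumption): NOT the real GISS, AMO or CO2 series.
rng = random.Random(0)
years = list(range(1883, 2010))
co2 = [290 + 0.005 * (yr - 1883) ** 2 for yr in years]
amo = [0.2 * math.sin(2 * math.pi * (yr - 1883) / 65) for yr in years]
temp = [-3.0 + 0.01 * c + 0.5 * a + rng.gauss(0, 0.05)
        for c, a in zip(co2, amo)]

# Hold out every third year in turn, train on the other two thirds,
# and predict only the held-out years.
predicted = [None] * len(years)
for k in range(3):
    test = [i for i in range(len(years)) if i % 3 == k]
    train = [i for i in range(len(years)) if i % 3 != k]
    z, j1, j2 = fit_ols([co2[i] for i in train],
                        [amo[i] for i in train],
                        [temp[i] for i in train])
    for i in test:
        predicted[i] = z + j1 * co2[i] + j2 * amo[i]

# Fraction of variance explained by the held-out predictions.
mean_t = sum(temp) / len(temp)
ss_res = sum((t - p) ** 2 for t, p in zip(temp, predicted))
ss_tot = sum((t - mean_t) ** 2 for t in temp)
r2 = 1.0 - ss_res / ss_tot
print(f"hold-out R^2 on synthetic data: {r2:.3f}")
```

Each year ends up predicted exactly once, by a model that never saw that year, although its immediate neighbours were in the training set.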

Anyways I end up with a predicted temperature chart and I compare it to the observed temperatures below.

I was wondering if someone could think of a better way of doing the prediction, because the result obviously explains far too much variance (90%, P>99%) to be right, right? Perhaps a better way from a scientist like Ned?




http://www.skepticalscience.com/pics/Temp-AMO-CO2.bmp
2010-10-13 07:58:42
Riccardo

riccardoreitano@tiscali...
93.147.82.141
I'm not sure I understood what you did. Why did you slice the dataset? Couldn't you run the regression just once on the whole dataset?
2010-10-13 10:50:34Slightly off-topic paper
John Cook

john@skepticalscience...
124.186.160.198

Robert, this is off-topic (PDO not AMO) but just read this abstract, thought you might find it interesting:

Investigating the possibility of a human component in various pacific decadal oscillation indices
The pacific decadal oscillation (PDO) is a mode of natural decadal climate variability, typically defined as the principal component of North Pacific sea surface temperature (SST) anomalies. To remove any global warming signal present in the data, the traditional definition specifies that monthly-mean, global-average SST anomalies are subtracted from the local anomalies. Differences in the warming rates over the globe and the PDO region may therefore be aliased into the PDO index. Here, we examine the possibility of a human component in the PDO, considering three different definitions. The implications of these definitions are explored using SSTs from both observations and simulations of historical and future climate, all projected onto (definition-dependent) observed PDO patterns. In the twenty first century scenarios, a systematic anthropogenic component is found in all three PDO indices. Under the first definition—in which no warming signal is removed—this component is so large that it is also statistically detectable in the observed PDO. Using the second/traditional definition, this component is also large, and arises primarily from the differential warming rates predicted in the North Pacific and over global oceans. Removing the spatial average SST signal in the PDO region (in the third definition) partially solves this problem, but a human signal persists because the predicted pattern of SST response to human forcing projects strongly onto the PDO pattern. This illustrates the importance of separating internally-generated and externally-forced components in the PDO, and suggests that caution should be exercised in using PDO indices for statistical removal of “natural variability” effects from observational datasets.

BTW, I note with interest that your prediction underestimates temperatures around 1940 which suggests to me the observations are skewed by too-warm ocean measurements (still waiting for CRU to adjust for that).

 

2010-10-14 00:45:37comment
Robert Way

robert_way19@hotmail...
174.115.188.128

Riccardo,

If you ran it on the whole dataset then you would be using your training period as your prediction period also. From my understanding of multivariate regression you should have a training period which you use to predict the remainder of the data.

John,
That sounds like an interesting paper, I might have to read that soon. With respect to my prediction, I wouldn't get carried away with assessing Hadley's accuracy based upon a very simple model by a young student haha

2010-10-14 21:34:25
Riccardo

riccardoreitano@tiscali...
192.84.150.209

Robert, but your approach uses the same time period for both training and prediction, just with different data points. My guess is that this is the reason why you get such a high predictive ability.

2010-10-15 01:13:41Response
Robert Way

robert_way19@hotmail...
137.122.14.20
Hey Riccardo,
That is essentially the part I have been worried about. What is a preferable way to do this? I am currently training on the first two years and predicting the third using the coefficients I get from the multivariate regression of the training years. I then repeat this process to predict the 1st and 2nd years using the same sort of methodology. What is an optimal way? Just train on a chunk of data (say 1880-1920) and then try to predict 1920-2010?
2010-10-15 05:44:46
Riccardo

riccardoreitano@tiscali...
188.152.84.246

Robert

If you are looking for the predictive ability of the model, you should train the model on the first period and then predict the rest of the dataset. This is typically done with complex models that include all the possible factors affecting the dependent variable (e.g. GCMs).

If instead you want to assess the ability of the model to explain (reproduce) the behaviour of the dependent variable, you should simply regress over the full time range.

How much of the variance is explained by the AMO-CO2 model, the point of your last remark, is properly assessed with the latter.
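
The two options above can be sketched side by side. This is only an illustration under stated assumptions: the series are synthetic stand-ins (not the GISS, AMO or CO2 data), `SPLIT_YEAR` is an arbitrary cut-off chosen for the example, and `fit_ols` and `r_squared` are hypothetical helpers, not anything used in the thread.

```python
import math
import random


def fit_ols(x1, x2, y):
    """Least-squares fit of y = z + j1*x1 + j2*x2 via the normal equations."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Augmented matrix [X'X | X'y]: 3 rows by 4 columns.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(3):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def r_squared(obs, pred):
    """Fraction of the variance in obs that pred accounts for."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-ins (assumption): NOT the real GISS, AMO or CO2 series.
rng = random.Random(0)
years = list(range(1883, 2010))
co2 = [290 + 0.005 * (yr - 1883) ** 2 for yr in years]
amo = [0.2 * math.sin(2 * math.pi * (yr - 1883) / 65) for yr in years]
temp = [-3.0 + 0.01 * c + 0.5 * a + rng.gauss(0, 0.05)
        for c, a in zip(co2, amo)]

SPLIT_YEAR = 1950  # hypothetical cut-off between training and prediction
early = [i for i, yr in enumerate(years) if yr < SPLIT_YEAR]
late = [i for i, yr in enumerate(years) if yr >= SPLIT_YEAR]

# Option 1 (predictive ability): train on the first period only,
# then predict the later years the model has never seen.
z, j1, j2 = fit_ols([co2[i] for i in early],
                    [amo[i] for i in early],
                    [temp[i] for i in early])
pred_late = [z + j1 * co2[i] + j2 * amo[i] for i in late]
r2_holdout = r_squared([temp[i] for i in late], pred_late)

# Option 2 (explanatory ability): regress over the full time range
# and measure how much variance the fitted values reproduce.
z, j1, j2 = fit_ols(co2, amo, temp)
fitted = [z + j1 * c + j2 * a for c, a in zip(co2, amo)]
r2_full = r_squared(temp, fitted)

print(f"train < {SPLIT_YEAR}, predict the rest: R^2 = {r2_holdout:.3f}")
print(f"full-range fit: R^2 = {r2_full:.3f}")
```

The first number answers "can the model forecast?", the second "how much of the record does it reproduce?". The hold-out figure can come out much lower, or even negative, when the later period lies outside the range of the training data.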

2010-10-15 09:58:44I was..
Robert Way

robert_way19@hotmail...
137.122.14.20
Hey Riccardo,
I was told by a climatologist I know not to regress over the full time range, because the prediction I output would then incorporate the training period. This stuff is complicated. It was suggested to me that if I was trying to reproduce the temperatures, I should train and predict in the manner I previously mentioned. But I would love to find an optimal method. Also, the regression over the full period produces a similar result.

Does this mean I have solved that whole global warming thing with a simple regression model? :P