A Modest Cloud Cover Study (part 5)
Today I ramp up my use of indicator variables to further improve my modelling of the ICOADS v3 monthly cloud cover dataset.
Back in part 4 we discovered that a simple intervention approach to the wonky ICOADS data for 1945 – 1956 gave superior prediction results to a model built around the CRU TS4.08 dataset as an independent variable in an ARIMA time series tray bake. Today I’m going to take this approach a step further to see if it generates something better still.
I consider this fiddling to be important because the venerable ICOADS v3 dataset cannot be used for time series analysis as it stands, owing to the peculiar bias observed during the post-war era (1945 – 1956). Iron out this wrinkle and you suddenly get a juicy series dating all the way back to 1853 from which to say something about cloud cover. There is some evidence (see here, here and here) that changing cloud cover is affecting the Earth’s albedo, and it is this that is altering the energy budget, resulting in recent periods of warming.
I can’t see this hypothesis going down well amongst alarmists, globalists, and those experts with a vested interest in promulgating the carbon dioxide story, but I’m one of those old-school scientists who’d rather take a look-see, then have a think, a cuppa, and a chat (ideally with biscuits). Hence my interest in fixing the ICOADS v3 long-series dataset. I’m hoping everybody has been following this article series and understands the methodology, for I’m about to dive straight into revealing my best ICOADS predictive model to date…
My Best Model To Date!
OK, so I’m trying to model the ICOADS v3 monthly cloud cover dataset for the UK for the period January 1855 to December 2024 using ARIMA. As stated before we can get a time series to predict future values of itself providing there is some sort of periodic structure to the data - we all know that this coming winter is going to be cooler than it is today because of the periodic nature of the seasons in the northern hemisphere; put some numbers to this and there you have it!
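To put a rough number on that idea of a series predicting itself, we can look at how strongly each month correlates with the same month a year earlier. Here’s a minimal sketch on a synthetic seasonal series (NumPy assumed; the real okta data would slot in the same way):

```python
import numpy as np

rng = np.random.default_rng(7)
months = np.arange(600)  # 50 years of monthly data

# Synthetic stand-in: an annual cycle plus noise, mimicking seasonal cloud cover
y = 5 + np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.3, months.size)

# Correlation of the series with itself shifted by 12 months: a strongly
# seasonal series largely predicts its own future at this lag
lag12_corr = np.corrcoef(y[:-12], y[12:])[0, 1]
print(f"lag-12 autocorrelation: {lag12_corr:.2f}")
```

Put some numbers to the seasons and there you have it, indeed.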
Up to now I have marked the entire wonky period 1945 – 1956 using a single binary (0,1) indicator variable and submitted this as a single independent (predictor) variable. Whilst this has yielded a half-decent result, the method makes the rather bold assumption that every year of 1945 – 1956 was equally wonky. Chances are they were not, which is probably why we see a ramping of okta values over this period. What we can do to refine this crude approach is permit an estimate of wonkiness for each year by submitting an array of binary indicators, one per year, and see what that brings.
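For anyone wanting to cook this up themselves, building one indicator column per calendar year is short work in pandas. A minimal sketch (the column names and the short date span are my own illustrative choices):

```python
import pandas as pd

# Monthly date index spanning the wonky era plus a margin either side
# (the full study period is 1855-2024; a shorter span here for illustration)
idx = pd.date_range("1944-01", "1958-12", freq="MS")

# One binary (0,1) indicator column per calendar year of 1945-1956
dummies = pd.DataFrame(
    {f"wonky_{yr}": (idx.year == yr).astype(int) for yr in range(1945, 1957)},
    index=idx,
)

# Each column flags the twelve months of its year and nothing else
print(dummies.sum())
```

These columns then go into the model as the exogenous (predictor) matrix.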
Good Job, Good Job!
Let us wade into the deep end with the horrid tables that define all things…
We start with a model structure that is defined as ARIMA(1,1,1)(1,0,1). This pretty much ticks all the boxes in that we have a non-seasonal autoregressive component (p=1), non-seasonal differencing (d=1), a non-seasonal moving average component (q=1), a seasonal autoregressive component (P=1), and a seasonal moving average component (Q=1). Seasonality in cloud cover is to be expected so it’s nice to see two components picking up on this. The two non-seasonal components are interesting in that they’re telling us that cloud cover in one month can be predicted by what was happening in the month before; this makes sense to me. Also interesting is the differencing aspect (d=1) which suggests there is an underlying long-term trend.
A stationary R-squared of 0.509 is pretty good going considering we are trying to predict something that is well tricky to predict, and it’s nice to see eleven of the binary indicators sucked into the model.
The big table of model parameters gives us a feel of just how wonky each year was, with the gold medal going to 1949 for an associated coefficient of -2.183 okta (p<0.001). Beyond this point things slowly pick up, with 1956 down by only 0.606 okta (p<0.001). The year 1945, as a whole, fails to make the grade as a binary predictor because wonky values only set in during November and December. However, if we eyeball the table of outliers we can see November 1945 being flagged up as a transient with an associated coefficient of -2.464 (p<0.001).
That final table of outliers flags up some interesting aspects in that cloudy weather appeared to be a thing back in May 1917, August 1918 and September 1918, with May 1918 being particularly fine. I wonder how much of this is down to genuinely variable weather and how much down to the war effort affecting observation. The same ambiguity applies to December 1940, February 1941, April 1942, and May 1942. One finding I can personally vouch for is the lack of cloud cover for August 1976 – it was scorchio back then, and make no mistake! It is curious that extremes in cloud cover are not observed in recent years given we’re supposedly living through a terrible climate catastrophe: you’d think more clouds would be a feature of man-made global warming!
The Pudding
Let us have a look at what the ICOADS monthly series looks like after appropriate corrections:
Now that is rather lovely even if I say so myself!
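Mechanically, the correction amounts to removing each year’s estimated bias from its twelve months. A minimal sketch using the two offsets quoted above for 1949 and 1956 (the flat raw values here are stand-ins, not real okta data):

```python
import pandas as pd

# Estimated yearly offsets from the model (okta); only the two values
# quoted in the text are used, the other wonky years would join them
offsets = {1949: -2.183, 1956: -0.606}

idx = pd.date_range("1948-01", "1957-12", freq="MS")
raw = pd.Series(5.0, index=idx)  # stand-in for the raw ICOADS okta values

# Subtract each year's estimated bias from its months (a negative bias
# means the corrected value goes up)
bias = pd.Series([offsets.get(ts.year, 0.0) for ts in idx], index=idx)
corrected = raw - bias
print(corrected.loc["1949"].iloc[0])  # 5.0 - (-2.183) = 7.183
```

Years outside the wonky era pass through untouched.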
My eyeballs suggest greater variability way, way back but, of course, this may arise from methodological differences. In fact, to be totally honest, we can’t really be sure that the slight positive trend we are seeing is down to a genuine increase in cloud cover, for it could easily be a reflection of methodological changes over time. Unfortunately this unpleasant realisation is true for pretty much all climate-related datasets and yet everybody goes around ignoring the fact, including 97% of experts (see what I did there?).
Statistically speaking, I can run a linear regression and declare a statistically significant positive trend of +0.614 okta per century (p<0.001), but we have no idea whether this minuscule change is genuine or artefact! We might also have a look at the LOESS function (orange line) and ask whether a linear regression is a sensible method to apply to a time series that is a bit bendy. In my book it ain’t, but that won’t stop alarmists!
We might also split the data into pre- and post-WWII periods and ask what happens to the trends. When we do this we get a statistically significant positive trend of +0.776 okta per century (p<0.001) for the period 1855 – 1939 and a statistically insignificant negative trend of -0.051 okta per century (p=0.368) for the period 1940 – 2024. It’s called settled science™ but I’m not quite sure what tea experts are drinking these days.
Given that methodology likely settled into a robust routine after WWII I’m favouring zero change in cloud cover, as estimated by the ICOADS v3 dataset for the UK region. With that tray bake baked I fancy having a look at seasonal variation…
Kettle On!