Have you seen Connolly and Connolly’s papers on screen temps across China, the USA and Ireland? Also, Bill Johnston at Bomwatch.com.au provides a good look at the effect of station site changes on rainfall and max temps.
Not yet!
How do you deal with the "---" missing data entries?
And when the weather gets lousy and keeps me indoors I might have a go at batch scooping the big data with Python or something.
I start with a decent cafetière of Colombian and treat myself to four squares of flapjack before grumbling about the crazy coding they've adopted! If I'm importing the CSV directly into SPSS it sets them as system missing. Importing into Excel is the pain and for this I use the find & replace function, which rips through them like a hot knife through butter. If you want a copy of my efforts for all 37 stations in xls format I'm happy to pop you a link, though you'd need to update the file when the August data becomes available.
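For anyone who'd rather not lean on find & replace, here is a minimal Python sketch of the same idea: treat the missing-data marker as "no value" at import time. It assumes the marker is "---" (or an empty cell), and the column names in the sample are made up for illustration, so check both against the real files.

```python
import csv
import io

# Assumed missing-data markers; confirm against the actual files.
MISSING_MARKERS = {"---", ""}

def to_number(cell):
    """Return a float, or None when the cell holds a missing-data marker."""
    if cell is None:
        return None
    text = cell.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)

def read_rows(handle):
    """Return one dict per CSV row, with every cell passed through to_number."""
    return [
        {name: to_number(value) for name, value in row.items()}
        for row in csv.DictReader(handle)
    ]

# Hypothetical two-row sample with one missing tmax value.
sample = io.StringIO("year,month,tmax,tmin\n1959,1,4.5,-1.9\n1959,2,---,0.3\n")
rows = read_rows(sample)
```

Missing cells come back as None rather than zero, so they can't sneak into an average by accident.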
Thanks - I'll do some cross-checking in due course.
Could probably write a script to download all the MIDAS data in one go, assuming there's some logic to their naming convention.
Wowie! Scripting is deffo not my thing despite being a Fortran 77 bod back in the day. I've not had a glimpse of their ftp server as yet so can't comment on convention and coding.
Haha - I started with Fortran 77 too. I was looking into pulling the MIDAS data down with VBA or Python, when I went poking around the CEDA site and found the “bulk data” options. Basically FTP/FileZilla or WGET. Both need extra access keys to be added to your CEDA account. Despite using FileZilla extensively in a previous life, I just couldn’t get it to connect, so tried WGET. Once the really ugly access key was factored into the WGET command it just worked. Left it running overnight and ended up with about 2GB of data. However, I know the process has dropped at least one file, so it ain’t faultless. Now the fun starts. Huge amounts of missing data, incompatible manual and automatic readings etc.
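One way to catch dropped files after an overnight run is to compare what landed on disk against a list of what was expected. This is only a sketch: the file names are invented, and it assumes you can get a list of expected names from somewhere (a directory listing, say), which the thread doesn't confirm.

```python
import tempfile
from pathlib import Path

def missing_files(expected_names, download_dir):
    """Return expected names that are absent or zero-length in download_dir."""
    root = Path(download_dir)
    absent = []
    for name in expected_names:
        target = root / name
        if not target.is_file() or target.stat().st_size == 0:
            absent.append(name)
    return absent

# Demo with made-up file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "station_a_1959.csv").write_text("header\n")
    (Path(tmp) / "station_a_1960.csv").write_text("")  # simulates a truncated download
    expected = ["station_a_1959.csv", "station_a_1960.csv", "station_a_1961.csv"]
    dropped = missing_files(expected, tmp)
```

Zero-length files are counted as dropped too, since an interrupted transfer can leave an empty file behind.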
Fantastic effort! I really do despair sometimes over the quality of meteorological data. I’ve been downloading daily wind speed from Irish stations via ECA&D/KNMI and it’s a good job I’ve eyeballed a load of raw record charts before turning the handle. Sure makes you wonder what quality control the Met Office/Hadley Centre employ. This bit of fun will be published over the next few weeks.
I have been in correspondence with both the MO and CEDA about missing data. CEDA’s stance is that they are a “receiving archive” and have no control over what is sent. They also have a complex QC procedure which I haven’t got to the bottom of, but it doesn’t include noticing that a whole year or month is missing. The MO’s response was basically “dunno - try our archive people”. The rather helpful archivist sent me a PDF of an original Metform3208 for a missing month of data and referred me to similar scans online. Currently trying to audit / reconcile their “monthly averages” page… which is when all these discrepancies emerged. E.g. treblicated data when they move from DLY3208 to AWSDLY, with dates not directly comparable due to “throwback”. Bit of a challenge! I am trying to get my “ingestion” process to handle all these wrinkles. Currently I can pull in an entire county’s data in one easy move, separate partial data from clean data, and summarise it all. If I can automate the treblication, we’re nearly there. This is all in Excel (Power Query) btw. Oh and we have a problem with old data - Excel can’t do dates before 1900. I should write it all up I guess.
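Spotting a whole missing month is the sort of check that is easy to script. A rough Python sketch, using an invented three-observation record: it lists every calendar month between the first and last observation that has no readings at all. Python's standard date type handles years well before 1900, so the old data is no obstacle here.

```python
from datetime import date

def missing_months(dates):
    """Return (year, month) pairs with no observations between the first and last date."""
    seen = {(d.year, d.month) for d in dates}
    if not seen:
        return []
    (year, month), last = min(seen), max(seen)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append((year, month))
        # Step to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return gaps

# Hypothetical record: February 1899 has no observations at all.
record = [date(1898, 12, 31), date(1899, 1, 15), date(1899, 3, 1)]
gaps = missing_months(record)
```

This only flags months that are entirely absent; partial months would need a count of days per month on top.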
Crikey Dave, that’s a mammoth job! Responses are pretty much as I had expected. Yep, the 1900 Excel limit is a right pain in the Aga - I resort to using an index that permits exchange with SPSS’ date format. This works fine unless you wish to go back further than 15 October 1582.
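A date index of this kind might be sketched in Python as below. It assumes the index is a count of seconds from midnight on 14 October 1582, which is how SPSS stores dates internally; whether that matches the exact index used above is a guess, and the function names are made up.

```python
from datetime import date, timedelta

# SPSS stores dates as seconds since midnight, 14 October 1582.
SPSS_EPOCH = date(1582, 10, 14)
SECONDS_PER_DAY = 86400

def to_spss_seconds(d):
    """Convert a calendar date to an SPSS-style count of seconds."""
    return (d - SPSS_EPOCH).days * SECONDS_PER_DAY

def from_spss_seconds(seconds):
    """Convert an SPSS-style count of seconds back to a calendar date."""
    return SPSS_EPOCH + timedelta(days=seconds // SECONDS_PER_DAY)

# Round trip for a pre-1900 date that Excel cannot represent.
original = date(1899, 12, 31)
restored = from_spss_seconds(to_spss_seconds(original))
```

Stored as a plain number, the index passes through Excel untouched and can be turned back into a date at the other end.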
I've written up the start of this little exercise here: https://open.substack.com/pub/davesdata/p/met-office-temperature-data-cawood
One of the issues with many temperature data-sets is the corrections that have been applied to the quoted values. The 'very handy resource' that you link to does not state whether the values are 'raw' or 'corrected'. Does the Met Office publish the data anywhere such that we can see and compare what was originally recorded versus what appears in their table?
Absolutely so. As far as I am aware these are raw readings but you can check them against data held in MIDAS...
https://catalogue.ceda.ac.uk/uuid/dbd451271eb04662beade68da43546e1
Back in August I checked my own temp data against those for a station a few miles from me, a summary of which can be found in this newsletter...
https://jdeeclimate.substack.com/p/my-garden-part-3
Hi John
I have actually scooped the whole of the MIDAS bunch of CSV files and am working my way through "ingesting" it into a usable format in XL ... Working with Ray Sanders (off the Tallbloke blog). Need some advice on ARIMA and Fourier tools and all the good stuff that you do. The data is pretty funky in places!
My goodness that is fabulous! As it so happens I jotted 'ARIMA article' in my black book for Private Passion only yesterday in order to pass on hints and tips gathered over the years.