Hello, I wrote this code to fetch monthly weather observations for Helsinki for the period 1960 to 2020 and save them to a local file using the pickle package. The data comes from the API provided by the Finnish Meteorological Institute.

import datetime
import requests
import lxml.etree as etree
from pprint import pprint
import pickle
import pandas as pd
import numpy as np

class FmiApi:
    """
    a minimalistic wrapper for the Finnish Meteorological Institute API    
    """
    def __init__(self):
        self.base_url = 'http://opendata.fmi.fi/wfs'
        self.fixed_params = {
                'service': 'WFS',
                'version':'2.0.0',
                }
        self.namespaces ={
                'wfs':'http://www.opengis.net/wfs/2.0',
                'gml':'http://www.opengis.net/gml/3.2',
                'BsWfs':'http://xml.fmi.fi/schema/wfs/2.0'
                }                   

    def get_monthly_obs(self, place, year, month, maxlocations=5):                
        """
        get monthly simple observation

        in:
            place [str]
            year [int]
            month [int]
            maxlocations [int] (optional)   

        out:
            dictionary with tuple of locations and times keys 
            and dictionary of parameter names and values as values
            tmon <=> temperature
            rrmon <=> rainfall
        """
        sdate = str(datetime.date(year,month,1))
        parms = {
            'request': 'getFeature',
            'storedquery_id': 'fmi::observations::weather::monthly::simple',
            'place':place.lower(),
            'maxlocations':'%d'%maxlocations,           
            'starttime': sdate + 'T00:00:00',
            'endtime': sdate + 'T00:00:00',
            }
        parms.update(self.fixed_params)                
        try:                                      
            resp = requests.get(self.base_url, params=parms, stream=True)
        except:
            raise Exception('request failed')
        if resp.status_code != 200: raise Exception('FMI returned status code %d'%resp.status_code)        
        resp.raw.decode_content=True 
        try:               
            root = etree.parse(resp.raw)
        except:
            raise Exception('parsing failed')          
        members = root.xpath('//wfs:member', namespaces=self.namespaces)                            
        weather = {}
        for member in members:                
            ppos = member.xpath('.//gml:pos', namespaces=self.namespaces)[0].text.strip()
            ppos = tuple([float(x) for x in ppos.split()])
            ptime = member.xpath('.//BsWfs:Time', namespaces=self.namespaces)[0].text.strip()
            if not ppos in weather:
                weather[ppos] = {}         
            pname = member.xpath('.//BsWfs:ParameterName', namespaces=self.namespaces)[0].text.strip()
            pvalue = member.xpath('.//BsWfs:ParameterValue', namespaces=self.namespaces)[0].text.strip()                                                      
            weather[ppos][pname] = (pvalue, ptime)       
        return weather

def test():
    api = FmiApi()
    weather = api.get_monthly_obs('kuusamo', 1985, 1)
    pprint(weather)

try:
    with open('wdata.pkl', 'rb') as f:
        data = pickle.load(f)

except:
    # no usable local file yet: download every month once, then save the result
    api = FmiApi()
    weathers = []
    for year in range(1960, 2021):
        for month in range(1, 13):
            w = api.get_monthly_obs('oulu', year, month)

            weathers.append(w)
            pprint(w)

    with open('wdata.pkl', 'wb') as f:
        pickle.dump(weathers, f)

Now I want to use the local file to plot the monthly average temperature for the years 1960 to 2020. I wrote this code, but it doesn't print the average temperature.

def pest():



    df_weath = pd.read_csv('wdata.pkl', parse_dates=[0], infer_datetime_format=True)   
    df_weath.sort_values('Date', inplace=True, ignore_index=True)
    df_weath['Date'] = df_weath['Date'].dt.date   #convert to datetime objects


    #input()



    api = FmiApi()  #instantiate the api
    params = {            
                'place': u'helsinki',                       
                'maxlocations': '5',
                }
    d0 = df_weath['date'].values[0]
    d1 = df_weath['date'].values[-1]
    n_days = (d1 - d0).days + 1  #number of days between d0 and d1
    wdata = []
    for i_day in range(n_days):  
            date = d0 + datetime.timedelta(days=i_day)
            params['starttime'] = str(date) + 'T00:00:00'
            params['endtime'] = str(date) + 'T00:00:00'
            try:
                print('requesting weather data for %s'%str(date))
                weather = api.get_daily_obs(params)                                
            except:                
                print('getting weather failed, skipping')
                continue
            wdata.append(weather)


     #move weather data to a pandas dataframe (calculate avg over valid observations)
    date = []
    temps=[]
    variables = ['tday'] 

    for wobs in wdata:        
        avgs = {}
        for pos, obs in wobs.items():                   
            for var, xx in obs.items():                
                if not var in variables: continue

                date = datetime.date(1990,6,15)            
                if xx[0] != 'NaN':
                    val = float(xx[0])                    
                else:
                    val = None
                if not var in avgs: avgs[var] = []  
                if val != None: avgs[var].append(val)       
        vals = []
        for var in variables:  #calculate the average when available
            if len(avgs[var]) > 0:
                vals.append(sum(avgs[var])/len(avgs[var]))
            else:
                vals.append(None)
        wdata.append(temps)
        wdata.append(date)      

Can you help me find what I am doing wrong? Or do you know an easier way to plot the monthly average temperature?
Thank you.

Is the function pest() in a different source file, or the same one, and in either case, where does it get called?

I copied the source code into two separate files (since the FmiApi download only needs to run once), and when running pest() I got the following error and traceback:

Traceback (most recent call last):
  File "/home/schol-r-lea/Documents/Programming/Projects/Quick Tests/weather_pest.py", line 66, in <module>
    pest()
  File "/home/schol-r-lea/Documents/Programming/Projects/Quick Tests/weather_pest.py", line 11, in pest
    df_weath = pd.read_csv('wdata.pkl', parse_dates=[0], infer_datetime_format=True)   
  File "/usr/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/usr/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 69, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas/_libs/parsers.pyx", line 542, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 642, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 1917, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

This indicates a problem reading the pickled data file. You are trying to read the pickled (binary) file as if it were a CSV text file, which is presumably the source of the problem. You presumably want to use pd.read_pickle() rather than pd.read_csv().
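To illustrate the difference, here is a minimal sketch using only the standard library. The file name and the sample record are made up for the example. A pickle written with protocol 2 or later starts with the byte 0x80, which is the byte the UTF-8 decoder rejects in the traceback above.

```python
import os
import pickle
import tempfile

# hypothetical sample in the same shape as one entry of the saved list
sample = [{(64.93503, 25.3392): {'tmon': ('-11.6', '1960-01-01T00:00:00Z')}}]

# write it to a throwaway file (the name is made up for this example)
path = os.path.join(tempfile.mkdtemp(), 'wdata_example.pkl')
with open(path, 'wb') as f:
    pickle.dump(sample, f)

# the file is binary: its first byte is the pickle protocol marker
with open(path, 'rb') as f:
    first_byte = f.read(1)
print(first_byte)

# reading it as UTF-8 text, which is what a CSV reader does, fails
try:
    with open(path, encoding='utf-8') as f:
        f.read()
    text_ok = True
except UnicodeDecodeError as e:
    text_ok = False
    print('not a text file:', e)

# unpickling gives back the original list
with open(path, 'rb') as f:
    data = pickle.load(f)
print(data == sample)
```

pd.read_pickle(path) should give back the same list here, since it unpickles whatever object was stored.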

Ok, thank you. I used pd.read_pickle().
Is the code I wrote correct to plot the monthly average temperature for the years 1960 to 2020?
To complete my code I have to plot a smoothed curve of the average temperatures using Lowess regression, but I can't do that if I don't know whether the code I've already written is correct.

I'm still stuck on handling the DataFrame you are reading in. As I understand it, pd.read_pickle() returns a list:

[{(64.93503, 25.3392): {'rrmon': ('31.2', '1960-01-01T00:00:00Z'),
                        'tmon': ('-11.6', '1960-01-01T00:00:00Z')}, 
  (65.03371, 25.47957): {'rrmon': ('50.6', '1960-01-01T00:00:00Z'),
                        'tmon': ('-11.7', '1960-01-01T00:00:00Z')}}, 
 {(64.93503, 25.3392): {'rrmon': ('15.4', '1960-02-01T00:00:00Z'),
                        'tmon': ('-12.3', '1960-02-01T00:00:00Z')},
  (65.03371, 25.47957): {'rrmon': ('17.3', '1960-02-01T00:00:00Z'),
                        'tmon': ('-11.9', '1960-02-01T00:00:00Z')}}, 
 ...
 {(65.01967, 24.72758): {'rrmon': ('41.3', '2020-12-01T00:00:00Z'),
                        'tmon': ('-1.5', '2020-12-01T00:00:00Z')},
  (64.93503, 25.3392): {'rrmon': ('NaN', '2020-12-01T00:00:00Z'),
                        'tmon': ('-1.7', '2020-12-01T00:00:00Z')},
  (64.68421, 25.08919): {'rrmon': ('43.9', '2020-12-01T00:00:00Z'),
                        'tmon': ('-1.6', '2020-12-01T00:00:00Z')},
  (65.0064, 25.39321): {'rrmon': ('NaN', '2020-12-01T00:00:00Z'),
                        'tmon': ('-1.5', '2020-12-01T00:00:00Z')},
  (64.93698, 25.37299): {'rrmon': ('51.6', '2020-12-01T00:00:00Z'),
                        'tmon': ('-2.0', '2020-12-01T00:00:00Z')}}]

This then needs to be converted to a DataFrame like so:

df_weath = pd.DataFrame(pd.read_pickle('wdata.pkl'))

However, when I do this, the resulting DataFrame does not actually have a column 'Date' (or 'date' - case is significant).

                                   (64.93503, 25.3392)  ...                               (64.68421, 25.08919)
0    {'rrmon': ('31.2', '1960-01-01T00:00:00Z'), 't...  ...                                                NaN
1    {'rrmon': ('15.4', '1960-02-01T00:00:00Z'), 't...  ...                                                NaN
2    {'rrmon': ('6.7', '1960-03-01T00:00:00Z'), 'tm...  ...                                                NaN
3    {'rrmon': ('22.5', '1960-04-01T00:00:00Z'), 't...  ...                                                NaN
4    {'rrmon': ('19.7', '1960-05-01T00:00:00Z'), 't...  ...                                                NaN
..                                                 ...  ...                                                ...
727  {'rrmon': ('NaN', '2020-08-01T00:00:00Z'), 'tm...  ...  {'rrmon': ('30.2', '2020-08-01T00:00:00Z'), 't...
728  {'rrmon': ('NaN', '2020-09-01T00:00:00Z'), 'tm...  ...  {'rrmon': ('94.9', '2020-09-01T00:00:00Z'), 't...
729  {'rrmon': ('NaN', '2020-10-01T00:00:00Z'), 'tm...  ...  {'rrmon': ('91.8', '2020-10-01T00:00:00Z'), 't...
730  {'rrmon': ('NaN', '2020-11-01T00:00:00Z'), 'tm...  ...  {'rrmon': ('86.8', '2020-11-01T00:00:00Z'), 't...
731  {'rrmon': ('NaN', '2020-12-01T00:00:00Z'), 'tm...  ...  {'rrmon': ('43.9', '2020-12-01T00:00:00Z'), 't...

[732 rows x 12 columns]

I'm not sure how you would re-label the columns so as to have an explicit 'Date' column.
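One possible way to get an explicit date column is to flatten the nested list into one row per date and station before building the DataFrame. The sketch below assumes the structure shown in the printout above. `flatten` and `monthly_mean` are hypothetical helper names, the two sample entries are copied from that printout, and the position tuple is assumed to be (latitude, longitude).

```python
import datetime

# two entries copied from the printed list above
weathers = [
    {(64.93503, 25.3392): {'rrmon': ('31.2', '1960-01-01T00:00:00Z'),
                           'tmon': ('-11.6', '1960-01-01T00:00:00Z')},
     (65.03371, 25.47957): {'rrmon': ('50.6', '1960-01-01T00:00:00Z'),
                            'tmon': ('-11.7', '1960-01-01T00:00:00Z')}},
    {(64.93503, 25.3392): {'rrmon': ('15.4', '1960-02-01T00:00:00Z'),
                           'tmon': ('-12.3', '1960-02-01T00:00:00Z')},
     (65.03371, 25.47957): {'rrmon': ('17.3', '1960-02-01T00:00:00Z'),
                            'tmon': ('-11.9', '1960-02-01T00:00:00Z')}},
]

def flatten(weathers, var='tmon'):
    """Return one dict per (date, station); the value is a float or None."""
    rows = []
    for month in weathers:
        for pos, obs in month.items():
            if var not in obs:
                continue
            value, stamp = obs[var]
            rows.append({
                'date': datetime.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ').date(),
                'lat': pos[0],   # assumed order: latitude first
                'lon': pos[1],
                var: None if value == 'NaN' else float(value),
            })
    return rows

def monthly_mean(rows, var='tmon'):
    """Average the valid values of var over all stations, per date."""
    by_date = {}
    for row in rows:
        if row[var] is not None:
            by_date.setdefault(row['date'], []).append(row[var])
    return {d: sum(v) / len(v) for d, v in sorted(by_date.items())}

rows = flatten(weathers)
means = monthly_mean(rows)
for d, t in means.items():
    print(d, round(t, 2))
```

Passing `rows` to pd.DataFrame() then gives a frame with a real 'date' column that can be sorted or grouped.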

Yeah, there is not an explicit column for Date, but I think it prints the monthly average temperature. Do you know how I can plot a smoothed curve of the average temperatures using Lowess regression where y would be the temperature and x the time?

I'm afraid not, no.

To elaborate on what I was saying earlier, I have tried running the code, and came up with problems. I was able to get the un-pickled list converted to a DataFrame as described above, but the next line:

    df_weath.sort_values('Date', inplace=True, ignore_index=True)

raises a KeyError exception:

Traceback (most recent call last):
  File "/home/schol-r-lea/Documents/Programming/Projects/Quick Tests/weather_pest.py", line 68, in <module>
    pest()
  File "/home/schol-r-lea/Documents/Programming/Projects/Quick Tests/weather_pest.py", line 14, in pest
    df_weath.sort_values('Date', inplace=True, ignore_index=True)
  File "/usr/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/pandas/core/frame.py", line 6259, in sort_values
    k = self._get_label_or_level_values(by, axis=axis)
  File "/usr/lib/python3.9/site-packages/pandas/core/generic.py", line 1779, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'Date'

I would expect that you need to address this before proceeding with the actual plotting.

EDIT: Looking at it again, I think the real problem is in how FmiApi is packing the data in the first place. It would be better to build a DataFrame from the start. I'll see what I can do to figure out what you can do, but I'll need to read up on Pandas before I can do that.

Please ignore the last comment in the previous post, I misunderstood what the pest() function is actually doing.

Ok, thank you. I will try to change my code.

I'll post the code I've been working on myself, but I am not sure if it really is going where you want it to. One thing I want to ask about is, would it be better to read the entire dataset for the period being studied rather than repeatedly requesting a large number of days individually?

import datetime
import requests
import lxml.etree as etree
from pprint import pprint
import pickle
import pandas as pd
import numpy as np

class FmiApi:
    """
    a minimalistic wrapper for the Finnish Meteorological Institute API
    """
    base_url = 'http://opendata.fmi.fi/wfs'
    fixed_params = {
        'service': 'WFS',
        'version':'2.0.0',
    }
    namespaces = {
        'wfs':'http://www.opengis.net/wfs/2.0',
        'gml':'http://www.opengis.net/gml/3.2',
        'BsWfs':'http://xml.fmi.fi/schema/wfs/2.0'
    }

    def __init__(self):
        pass

    @classmethod
    def get_daily_obs(cls, place, year, month, day, maxlocations=5):
        """
        get daily simple observation

        in:
            place [str]
            year [int]
            month [int]
            day [int]
            maxlocations [int] (optional)

        out:
            dictionary with tuple of locations and times keys
            and dictionary of parameter names and values as values
            tmon <=> temperature
            rrmon <=> rainfall
        """
        sdate = str(datetime.date(year,month,day)) # current date as a string

        # parameters for the API request
        parms = {
            'request': 'getFeature',
            'storedquery_id': 'fmi::observations::weather::monthly::simple',
            'place':place.lower(),
            'maxlocations':'%d'%maxlocations,
            'starttime': sdate + 'T00:00:00',
            'endtime': sdate + 'T00:00:00',
            }
        parms.update(FmiApi.fixed_params)   # add the fixed parameters to the parameter list

        # perform the API request
        try:
            resp = requests.get(FmiApi.base_url, params=parms, stream=True)
        except:
            raise Exception('request failed')

        # check the request's status code for errors
        if resp.status_code != 200: raise Exception('FMI returned status code %d'%resp.status_code)

        # parse the raw XML data into an element tree object, exit on failure 
        resp.raw.decode_content=True
        try:
            root = etree.parse(resp.raw)
        except:
            raise Exception('parsing failed')

        # destructure the XML data to get the individual data elements
        members = root.xpath('//wfs:member', namespaces=FmiApi.namespaces)
        weather = {}
        for member in members:
            ppos = member.xpath('.//gml:pos', namespaces=FmiApi.namespaces)[0].text.strip()
            ppos = tuple([float(x) for x in ppos.split()])
            ptime = member.xpath('.//BsWfs:Time', namespaces=FmiApi.namespaces)[0].text.strip()
            if not 'date' in weather.keys():
                weather['date'] = sdate
            if not ppos in weather:
                weather[ppos] = {}
            pname = member.xpath('.//BsWfs:ParameterName', namespaces=FmiApi.namespaces)[0].text.strip()
            pvalue = member.xpath('.//BsWfs:ParameterValue', namespaces=FmiApi.namespaces)[0].text.strip()
            weather[ppos][pname] = (pvalue, ptime)
        return weather

    @classmethod
    def get_monthly_obs(cls, place, year, month, maxlocations=5):
        """
        get monthly simple observation

        in:
            place [str]
            year [int]
            month [int]
            maxlocations [int] (optional)

        out:
            dictionary with tuple of locations and times keys
            and dictionary of parameter names and values as values
            tmon <=> temperature
            rrmon <=> rainfall
        """
        return FmiApi.get_daily_obs(place, year, month, 1, maxlocations)



def pest():
    """
        Generate a chart of the weather in the selected regions over a period of years.
    """
    df_weath = pd.DataFrame(pd.read_pickle('wdata.pkl'))
    df_weath.sort_values('date', inplace=True, ignore_index=True)
    for i in range(len(df_weath)):
        df_weath['date'].values[i] = datetime.datetime.fromisoformat(df_weath['date'].values[i])   #convert to datetime objects

    #input()

    wdata = []
    try:
        with open('wdata_daily.pkl', 'rb') as f:
            wdata=pickle.load(f)

    except:
        d0 = df_weath['date'].values[0]
        d1 = df_weath['date'].values[-1]
        n_days = (d1 - d0).days + 1  #number of days between d0 and d1
        for i_day in range(n_days):
            date = d0 + datetime.timedelta(days=i_day)
            try:
                print('requesting weather data for %s'%str(date))
                weather = FmiApi.get_daily_obs(u'helsinki', date.year, date.month, date.day, 5)
            except Exception as e:
                print(e)
                print('getting weather failed, skipping')
                continue
            wdata.append(weather)

        # check and save once, after all days have been requested
        if len(wdata) == 0: raise Exception('No valid weather data')

        with open('wdata_daily.pkl', 'wb') as f:
            pickle.dump(wdata, f)

    pprint(wdata)

    # collect one averaged value per observation set (average over valid stations)
    dates = []
    temps = []
    variables = ['tday']

    for wobs in wdata:
        avgs = {var: [] for var in variables}
        for pos, obs in wobs.items():
            if pos == 'date': continue   # the 'date' entry holds a string, not observations
            for var, xx in obs.items():
                if not var in variables: continue
                if xx[0] != 'NaN':
                    avgs[var].append(float(xx[0]))
        vals = []
        for var in variables:  #calculate the average when available
            if len(avgs[var]) > 0:
                vals.append(sum(avgs[var])/len(avgs[var]))
            else:
                vals.append(None)
        # append to the result lists, not to wdata, which is being iterated over
        dates.append(wobs.get('date'))
        temps.append(vals[0])


def test():
    api = FmiApi()
    weather = api.get_monthly_obs('kuusamo', 1985, 1)
    pprint(weather)



if __name__ ==  "__main__":
    try:
        with open('wdata.pkl', 'rb') as f:
            data=pickle.load(f)

    except:
        weathers = []
        for year in range(1960,2021):
            for month in range(1,13):
                w = FmiApi.get_monthly_obs('oulu', year, month )

                weathers.append(w)
                pprint(w)

        # save once, after all months have been downloaded
        with open('wdata.pkl', 'wb') as f:
            pickle.dump(weathers, f)

    pest()

I've done some more experiments, and one of the things I've learned is that FMI only has records for the first day of each month, for some reason. This code will get all of the entries for a period, and when I run it there are only entries for the first of the month. Note that this returns a somewhat different format than that returned by the existing get_monthly_obs() function.

    @classmethod 
    def get_period_obs(cls, place, start_date, end_date, maxlocations=5):
        """
        get multiple observations over a period of dates.

        in:
            place [str]
            start_date [datetime]
            end_date [datetime]
            maxlocations [int] (optional)

        out:
            dictionary with date of the observation,  
            tuple of locations and times keys,
            and dictionary of parameter names and values as strs
            tmon <=> temperature
            rrmon <=> rainfall
        """
        # parameters for the API request
        parms = {
            'request': 'getFeature',
            'storedquery_id': 'fmi::observations::weather::monthly::simple',
            'place':place.lower(),
            'maxlocations':'%d'%maxlocations,
            'starttime': start_date.strftime('%Y-%m-%dT00:00:00'),
            'endtime': end_date.strftime('%Y-%m-%dT00:00:00'),
            }
        parms.update(FmiApi.fixed_params)   # add the fixed parameters to the parameter list       

        # perform the API request
        try:
            resp = requests.get(FmiApi.base_url, params=parms, stream=True)
        except:
            raise Exception('request failed')

        # check the request's status code for errors
        if resp.status_code != 200: raise Exception('FMI returned status code %d'%resp.status_code)

        # parse the raw XML data into an element tree object, exit on failure 
        resp.raw.decode_content=True
        try:
            root = etree.parse(resp.raw)
        except:
            raise Exception('parsing failed')

        # destructure the XML data to get the individual data elements
        members = root.xpath('//wfs:member', namespaces=FmiApi.namespaces)
        weather = {}
        for member in members:
            ptime = member.xpath('.//BsWfs:Time', namespaces=FmiApi.namespaces)[0].text.strip()
            if not ptime in weather:
                weather[ptime] = {}
            ppos = member.xpath('.//gml:pos', namespaces=FmiApi.namespaces)[0].text.strip()
            ppos = tuple([float(x) for x in ppos.split()])
            if not ppos in weather[ptime]:
                weather[ptime][ppos] = {}
            pname = member.xpath('.//BsWfs:ParameterName', namespaces=FmiApi.namespaces)[0].text.strip()
            pvalue = member.xpath('.//BsWfs:ParameterValue', namespaces=FmiApi.namespaces)[0].text.strip()
            weather[ptime][ppos][pname] = pvalue
        return weather

I tested this with the following driver code:

if __name__ ==  "__main__":
    wobs = FmiApi.get_period_obs('oulu', datetime.datetime(2019, 1, 1), datetime.datetime(2021,1,1))
    pprint(wobs)
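Since get_period_obs() keys its result by time first, the monthly average over stations can be computed directly from it. The following is only a sketch: `mean_series` is a hypothetical helper, and the sample values are invented to match the shape described in the docstring, not real FMI data.

```python
import datetime

# invented sample in the shape returned by get_period_obs():
# {time string: {(lat, lon): {parameter name: value string}}}
wobs = {
    '2019-01-01T00:00:00Z': {
        (64.93503, 25.3392): {'rrmon': 'NaN', 'tmon': '-10.0'},
        (64.68421, 25.08919): {'rrmon': '30.0', 'tmon': '-9.0'},
    },
    '2019-02-01T00:00:00Z': {
        (64.93503, 25.3392): {'rrmon': 'NaN', 'tmon': 'NaN'},
        (64.68421, 25.08919): {'rrmon': '20.0', 'tmon': '-6.5'},
    },
}

def mean_series(wobs, var='tmon'):
    """Return (dates, means): the average of var over stations for each time."""
    dates, means = [], []
    for stamp in sorted(wobs):   # ISO timestamps sort chronologically as strings
        values = [float(obs[var]) for obs in wobs[stamp].values()
                  if var in obs and obs[var] != 'NaN']
        if not values:
            continue             # no valid observation for this month
        dates.append(datetime.datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ').date())
        means.append(sum(values) / len(values))
    return dates, means

dates, temps = mean_series(wobs)
print(list(zip(dates, temps)))
```

The two lists can be used as the x and y values of a plot, or turned into a DataFrame with pd.DataFrame({'date': dates, 'tmon': temps}).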

Sorry to keep serial posting, but I also noticed the following line:

            'storedquery_id': 'fmi::observations::weather::monthly::simple',

Presumably, there is a form of this parameter which can specify a daily query rather than a monthly one. I don't know enough about the FMI API to determine what that query would be, however, nor where I would find that information.
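One place to look is the service itself: WFS 2.0 defines ListStoredQueries and DescribeStoredQueries operations, so the server should be able to list the stored query ids it supports. The sketch below only builds the request URLs and does not contact the server. `stored_query_url` is a hypothetical helper, and the daily query id is a guess by analogy with the monthly one that would need to be checked against the list.

```python
from urllib.parse import urlencode

BASE_URL = 'http://opendata.fmi.fi/wfs'   # same endpoint as in FmiApi

def stored_query_url(request, **extra):
    """Build a WFS 2.0 request URL; extra keyword arguments become query parameters."""
    params = {'service': 'WFS', 'version': '2.0.0', 'request': request}
    params.update(extra)
    return BASE_URL + '?' + urlencode(params)

# lists the ids of all stored queries the server offers
list_url = stored_query_url('listStoredQueries')

# describes one stored query and its parameters
describe_url = stored_query_url(
    'describeStoredQueries',
    # assumed id, by analogy with '...::monthly::simple'; verify it in the list first
    storedquery_id='fmi::observations::weather::daily::simple',
)

print(list_url)
print(describe_url)
```

Opening the first URL in a browser, or fetching it with requests.get(), should return an XML document in which the available ids can be looked up.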

Thank you so much! Your code will help me for sure! I'm still trying to find more information, but it is not easy.

I'll be honest with you, I know very little about statistical analysis and plotting, though I did help someone in this forum with a related plotting issue before.

I did find this article on LOWESS Regression in Python which could help with this aspect of it.

In addition to NumPy and Pandas, that tutorial uses a library called statsmodels, which has an existing implementation of the LOWESS algorithm.

It also uses the Plotly library for rendering the graphs. I was expecting it to use Matplotlib, which is widely used for this sort of problem, but presumably there is some advantage to using Plotly; I am not especially familiar with either of them, so I can't say. Since I was going to ask how you intended to render the plots, I feel this might be something you need.
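To give an idea of what the smoothing does, here is a simplified sketch of locally weighted linear regression in plain Python. It is not the statsmodels implementation: it uses tricube weights and a local straight-line fit but leaves out the robustness iterations of full LOWESS. `lowess_simple` is a hypothetical name, and the temperature series is synthetic.

```python
import math

def lowess_simple(x, y, frac=0.3):
    """Locally weighted linear regression with tricube weights (no robustness passes)."""
    n = len(x)
    k = max(2, int(math.ceil(frac * n)))   # number of neighbours used per local fit
    smoothed = []
    for x0 in x:
        # the distance to the k-th nearest point sets the local bandwidth
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1.0
        # tricube weights: 1 at x0, falling to 0 at distance h and beyond
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        # weighted least squares line through the neighbourhood
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

# synthetic monthly temperatures: seasonal cycle plus a slow warming trend
months = list(range(120))
temps = [5 + 10 * math.sin(2 * math.pi * m / 12) + 0.01 * m for m in months]

trend = lowess_simple(months, temps, frac=0.5)
print([round(t, 2) for t in trend[:5]])
```

With real data, x would be the time (for example date.toordinal() of each month) and y the average temperature. The smoothed values can then be drawn on top of the raw series with whichever plotting library you choose.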

Thank you again!
