
xweights’ example notebook

This notebook shows some examples of how to use xweights. We want to calculate time series of spatial averages.

1) Read input data

In the first step, read the input data from disk and create a dataset dictionary. The values of the dataset dictionary are the xarray Datasets read from the input; the keys denote the corresponding dataset names. If you have enough computational resources, you can also use large intake-esm catalogue files as input.

[ ]:
import xarray as xr

import xweights as xw
[ ]:
netcdffile = (
    "tas_EUR-11_MIROC-MIROC5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon_200601-201012.nc"
)
[ ]:
ds = xr.open_dataset(netcdffile)
dataset_dict = {"tas_EUR-11_MIROC-MIROC5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon": ds}
[ ]:
# Grab the first (and only) entry from the dataset dictionary.
name, ds = next(iter(dataset_dict.items()))
ds

2) Get some xweights information

xweights contains some pre-defined regions. Each region is represented by a geopandas GeoDataFrame with several geometries. The pre-defined regions are the following:

[ ]:
xw.which_regions()
[ ]:
xw.which_subregions("states")
[ ]:
xw.which_subregions("counties")
[ ]:
xw.which_subregions("counties_merged")
[ ]:
xw.which_subregions("prudence")
[ ]:
xw.which_subregions("ipcc")

3) Choose a region

Now, let’s focus on the German ‘Bundesländer’. We read the data in as a geopandas GeoDataFrame.

[ ]:
bdl = xw.get_region("states")
bdl.plot()

Of course, you can select single geometries. Let’s take Schleswig-Holstein, Hamburg, Bremen and Lower Saxony.

[ ]:
bdl_sub = xw.get_region(
    "states",
    name=["01_Schleswig-Holstein", "02_Hamburg", "03_Niedersachsen", "04_Bremen"],
)
bdl_sub.plot()

Now, we combine those four geometries into one new geometry called ‘NorthSeaCoast’.

[ ]:
nsc = xw.get_region(
    "states",
    name=["01_Schleswig-Holstein", "02_Hamburg", "03_Niedersachsen", "04_Bremen"],
    merge=["all", "NothSeaCoast"],
)
nsc.plot()

You can also read your own shapefile as a region file. In that case, you have to specify a column name, which tells xweights how to distinguish the geometries in the shapefile. The example shapefile contains geometries of Neusiedl am See, a small town in Austria.

[ ]:
shpfile = "Seewinkel_map/Seewinkel.shp"
[ ]:
custom = xw.get_region(shpfile, column="VA")
custom.plot()

Here again, you can merge all these small geometries.

[ ]:
custom = xw.get_region(shpfile, column="VA", merge="VA")
custom.plot()

4) Calculate a time series of spatial averages

Now, we can calculate a time series of spatial averages using an xarray Dataset and a geopandas GeoDataFrame.

Here for all ‘Bundesländer’:

[ ]:
out_bdl = xw.spatial_averaging(ds, bdl)
out_bdl
[ ]:
out_bdl["tas"].plot(col="geom", col_wrap=4)
[ ]:
out_nsc = xw.spatial_averaging(ds, nsc)
out_nsc
[ ]:
out_nsc.tas.plot()

Ok, that’s not very exciting, I know :/

5) Consider land points only

With xesmf it is very easy to consider land points only. You just have to add a land-sea mask to the xarray Dataset; xesmf automatically applies the mask while calculating spatial averages.

[ ]:
lsmfile = "sftlf_EUR-11_MIROC-MIROC5_rcp85_r0i0p0_CLMcom-CCLM4-8-17_v1_fx.nc"
lsm = xr.open_dataset(lsmfile)
lsm["sftlf"].plot()

We need to normalize the land-sea mask to the values 0 and 1.

[ ]:
# Set land points (sftlf > 0) to 1 and sea points to 0.
lsm_norm = xr.where(lsm["sftlf"] > 0, 1, 0)
lsm_norm.plot()

Now, let’s add this mask to our dataset and calculate spatial averages for the PRUDENCE regions.

[ ]:
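# xesmf recognizes a Dataset variable named "mask" and applies it
# automatically when computing the spatial averages.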
ds_lsm = ds.copy()
ds_lsm["mask"] = lsm_norm
[ ]:
prudence = xw.get_region("prudence")
[ ]:
out_prudence = xw.spatial_averaging(ds, prudence)
out_prudence_lsm = xw.spatial_averaging(ds_lsm, prudence)
out_prudence_lsm
[ ]:
out_prudence["tas"].plot(col="geom", col_wrap=4)
[ ]:
out_prudence_lsm["tas"].plot(col="geom", col_wrap=4)
[ ]:
(out_prudence["tas"] - out_prudence_lsm["tas"]).plot(col="geom", col_wrap=4)

6) Save results to a pandas DataFrame

We can simplify the above steps by using the function xw.compute_weighted_means_ds. The result is written to a pandas DataFrame.

[ ]:
df = xw.compute_weighted_means_ds(ds, region="prudence")
df

Let’s use some parameters to adjust the DataFrame.

[ ]:
df = xw.compute_weighted_means_ds(
    ds,
    region="prudence",
    ds_name="tas_EUR-11_MIROC-MIROC5_rcp85_r1i1p1_CLMcom-CCLM4-8-17_v1_mon",
    column_names=[
        "institute_id",
        "driving_model_id",
        "experiment_id",
        "driving_model_ensemble_member",
        "model_id",
        "rcm_version_id",
        "standard_name",
        "units",
    ],
)
df

Now, with a pandas DataFrame, feel free to do lots of statistics :)

Set the parameter output to a file name to save the DataFrame to disk.
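
For example, a minimal sketch (the file name, including the CSV extension, is a hypothetical choice):

[ ]:
# Hypothetical file name; the `output` parameter saves the DataFrame to disk.
df = xw.compute_weighted_means_ds(
    ds,
    region="prudence",
    output="prudence_means.csv",
)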

7) All in one

Besides xw.compute_weighted_means_ds, there is another function called xw.compute_weighted_means. It combines all of the above steps into one.

[ ]:
df = xw.compute_weighted_means(
    dataset_dict,
    region="prudence",
    column_names=[
        "institute_id",
        "driving_model_id",
        "experiment_id",
        "driving_model_ensemble_member",
        "model_id",
        "rcm_version_id",
        "standard_name",
        "units",
    ],
)
df

As mentioned above, if you have enough computational resources, you can use an intake-esm catalogue file as input instead of a single netCDF file.
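
The following is only a sketch; the catalogue file name is hypothetical: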

[ ]:
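# Sketch only: "/path/to/catalog.json" stands in for a real intake-esm
# catalogue file, passed in place of the dataset dictionary.
df = xw.compute_weighted_means(
    "/path/to/catalog.json",
    region="prudence",
)
df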