The PastaStore YAML interface 

This notebook shows how Pastas models can be built from YAML files using PastaStore.

Contents

Why YAML?
An example YAML file
The PastaStore.yaml interface
More examples

import os
from io import StringIO

import pastas as ps
import yaml

import pastastore as pst

# Set pastas log messages to ERROR level
ps.set_log_level("ERROR")

Why YAML?

According to the official webpage, “YAML is a human-friendly data serialization language for all programming languages”. The file structure is similar to JSON (nested dictionaries) and therefore similar to the storage format for pastas Models, i.e. .pas-files.

So why develop a method for reading and writing pastas models to and from YAML files? The human-readable file structure, combined with the tools in pastastore, allows users to quickly build pastas Models using a mini-language, without having to explicitly program each line of code. When users work with many models with different model structures, YAML files provide a simple and convenient way to organize this work without searching through many lines of code.

Whether it is useful to “program” the models in YAML or in normal Python/pastas code depends on the application or project. This feature was developed to give users an extra option that combines human-readable files with useful tools from the pastastore to quickly develop pastas models.

An example YAML file 

A YAML file is a text file that uses Python-style indentation to indicate nesting. The following shows the structure of a YAML file that defines a pastas model.

# comments are allowed, this is a pastas Model:

my_first_model:                   # model name
  oseries: head_nb1               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas StressModel
      prec: prec1                 # name of precipitation stress, obtained from pastastore
      evap: evap1                 # name of evaporation stress, obtained from pastastore
      recharge: Linear            # pastas recharge type
      rfunc: Exponential          # response function

Reading this file converts it into a nested dictionary, as shown below. This dictionary can be used to (re-)construct pastas models, as is shown in the next sections.

yaml_file = """
# comments are allowed, this is a pastas Model:

my_first_model:                   # model name
  oseries: head_nb1               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas StressModel
      prec: prec1                 # name of precipitation stress, from pastastore
      evap: evap1                 # name of evaporation stress, from pastastore
      recharge: Linear            # pastas recharge type
      rfunc: Exponential          # response function
"""

# load the file
d = yaml.load(StringIO(yaml_file), Loader=yaml.Loader)

# view the resulting dictionary
d

{'my_first_model': {'oseries': 'head_nb1',
  'stressmodels': {'recharge': {'class': 'RechargeModel',
    'prec': 'prec1',
    'evap': 'evap1',
    'recharge': 'Linear',
    'rfunc': 'Exponential'}}}}
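
Because the parsed result is an ordinary nested dictionary, it can be inspected with plain Python before any model is built. The sketch below is illustrative only: `summarize_models` is a hypothetical helper, not part of pastastore, and the dictionary is copied from the output above.

```python
def summarize_models(d):
    """Return one (model, oseries, stressmodel, class) tuple per stressmodel."""
    rows = []
    for model_name, model_def in d.items():
        stressmodels = model_def.get("stressmodels") or {}
        for sm_name, sm_def in stressmodels.items():
            # an empty YAML entry (e.g. "recharge:") parses to None
            sm_class = (sm_def or {}).get("class")
            rows.append((model_name, model_def["oseries"], sm_name, sm_class))
    return rows


# dictionary as shown in the output above
d = {
    "my_first_model": {
        "oseries": "head_nb1",
        "stressmodels": {
            "recharge": {
                "class": "RechargeModel",
                "prec": "prec1",
                "evap": "evap1",
                "recharge": "Linear",
                "rfunc": "Exponential",
            }
        },
    }
}

for row in summarize_models(d):
    print(row)
```

Each top-level key is a model name, and each key under stressmodels is the name of one stressmodel in that model.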

The PastaStore.yaml interface 

The logic for reading and writing YAML files is accessed through the PastaStore.yaml interface. To showcase it, we first need a PastaStore filled with some data. Load the example dataset from pastastore (included since version 0.8.0). Note that this data is only available if the pastastore repository was cloned, not if pastastore was installed with pip.

from pastastore.datasets import example_pastastore  # noqa: E402

pstore = example_pastastore()
pstore

<PastaStore> example: 
 - <DictConnector> 'my_db': 5 oseries, 15 stresses, 0 models

Let’s check which oseries are available:

pstore.oseries

	x	y
name
oseries1	165000.0	424000.0
oseries2	164000.0	423000.0
oseries3	165554.0	422685.0
head_nb5	200000.0	450000.0
head_mw	85850.0	383362.0

Building model(s) from a YAML file 

Note that pstore.yaml.load() accepts either a path to a YAML file or a YAML-formatted string, like the one defined below, as its first argument.

my_first_yaml = """
my_first_model:                   # model name
  oseries: oseries1               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas StressModel
      prec: prec1                 # name of precipitation stress, from pastastore
      evap: evap1                 # name of evaporation stress, from pastastore
      recharge: Linear            # pastas recharge type
      rfunc: Exponential          # response function
"""

ml = pstore.yaml.load(my_first_yaml)[0]  # returns a list, so get the first entry
ml

Building model 'my_first_model' for oseries 'oseries1'
| parsing stressmodel: 'recharge'

Model(oseries=oseries1, name=my_first_model, constant=True, noisemodel=False)

ml.solve(report=False)
_ = ml.plots.results()

[Figure: results plot of my_first_model]
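
As noted above, pstore.yaml.load() accepts either a file path or a YAML-formatted string. How pastastore tells the two apart is not shown in this notebook. The sketch below is one way such a dispatch could work, using only the standard library; `read_yaml_text` is a hypothetical helper and not the pastastore implementation.

```python
import os
import tempfile


def read_yaml_text(source):
    """Return YAML text from `source`, which is a file path or a YAML string."""
    # assumption: a path is a single line that points to an existing file
    if "\n" not in source and os.path.isfile(source):
        with open(source, "r", encoding="utf-8") as f:
            return f.read()
    return source


yaml_string = "my_model:\n  oseries: oseries1\n"

# write the same text to a temporary file to exercise the path branch
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "my_model.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(yaml_string)
    from_file = read_yaml_text(path)

from_string = read_yaml_text(yaml_string)
print(from_file == from_string)
```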

A YAML file can contain multiple models:

my_multi_model_yaml = """
my_first_model:                   # model name
  oseries: oseries1               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas StressModel
      prec: prec1                 # name of precipitation stress, from pastastore
      evap: evap1                 # name of evaporation stress, from pastastore
      recharge: Linear            # pastas recharge type
      rfunc: Exponential          # response function
      
my_second_model:                  # model name
  oseries: oseries1               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas StressModel
      prec: prec1                 # name of precipitation stress, from pastastore
      evap: evap1                 # name of evaporation stress, from pastastore
      recharge: FlexModel         # pastas recharge type
      rfunc: Exponential          # response function
"""

models = pstore.yaml.load(my_multi_model_yaml)
models

Building model 'my_first_model' for oseries 'oseries1'
| parsing stressmodel: 'recharge'
Building model 'my_second_model' for oseries 'oseries1'
| parsing stressmodel: 'recharge'

[Model(oseries=oseries1, name=my_first_model, constant=True, noisemodel=False),
 Model(oseries=oseries1, name=my_second_model, constant=True, noisemodel=False)]

Note that these models are not automatically added to the PastaStore. They are only created. To store them use PastaStore.add_model.

for ml in models:
    pstore.add_model(ml)

pstore

<PastaStore> example: 
 - <DictConnector> 'my_db': 5 oseries, 15 stresses, 2 models

Writing model(s) to a YAML file 

Writing an existing model to a YAML file is done with PastaStore.yaml.export_model(). The resulting YAML file contains much more information than the examples above, because all model information is stored in the file, similar to saving a model as a .pas-file with ml.to_file(). This file can serve as a useful template for writing your own YAML files.

pstore.yaml.export_model(ml)

The YAML file can be simplified with the minimal_yaml keyword argument.

Warning: Using the `minimal_yaml=True` option can lead to a different model than the one being exported as certain important model settings might have been removed in the resulting YAML file. Use with care!

ml.name = ml.name + "_minimal"
pstore.yaml.export_model(ml, minimal_yaml=True)

Additionally, the use_nearest option fills in "nearest <n> <kind>" instead of the names of the time series, filling in <n> and <kind> where possible. This option is only used when minimal_yaml=True.

Warning: This option does not (yet) check whether the time series are actually nearest, it simply fills in "nearest" for all stresses and fills in "kind" where possible.

ml.name = ml.name + "_nearest"
pstore.yaml.export_model(ml, minimal_yaml=True, use_nearest=True)

The models can also be written to a single YAML file using PastaStore.yaml.export_models(). The split=False keyword argument forces all models to be written to the same file.

pstore.yaml.export_models(models=models, split=False)

“Nearest” options for time series

The YAML file format introduces some features that leverage tools in PastaStore. Instead of explicitly naming the time series to use for a particular stressmodel, you can use the nearest option. Note that this requires the metadata of the time series in the PastaStore to be properly defined, with x and y coordinates for all time series.

First let’s revisit the first example YAML file, but this time use the “nearest” option to select the precipitation and evaporation time series. After nearest, the kind identifier is supplied to tell the PastaStore which types of stresses to consider when looking for the nearest one.

nearest_yaml = """
my_first_model:                   # model name
  oseries: oseries1               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas StressModel
      prec: nearest prec          # nearest stress with kind="prec", from pastastore
      evap: nearest evap          # nearest stress with kind="evap", from pastastore
      recharge: Linear            # pastas recharge type
      rfunc: Exponential          # response function
"""

ml = pstore.yaml.load(nearest_yaml)[0]  # returns a list, so get the first entry
ml

Building model 'my_first_model' for oseries 'oseries1'
| parsing stressmodel: 'recharge'
  | using nearest stress with kind='prec': 'prec1'
  | using nearest stress with kind='evap': 'evap1'

Model(oseries=oseries1, name=my_first_model, constant=True, noisemodel=False)
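
Conceptually, the nearest option selects the stress of the requested kind with the smallest distance to the oseries location, which is why x and y coordinates are required. The sketch below illustrates that idea with plain Python. It does not reproduce the PastaStore implementation, `nearest_stress` is a hypothetical helper, and the stress coordinates are made up for illustration; only the oseries1 location is taken from the table above.

```python
import math

# oseries location taken from the pstore.oseries table above
oseries_xy = {"oseries1": (165000.0, 424000.0)}

# hypothetical stress metadata: name -> (x, y, kind)
stresses = {
    "prec1": (165050.0, 424050.0, "prec"),
    "prec2": (164010.0, 423000.0, "prec"),
    "evap1": (164500.0, 424000.0, "evap"),
    "evap2": (164000.0, 423030.0, "evap"),
}


def nearest_stress(oseries_name, kind):
    """Return the name of the stress of type `kind` closest to the oseries."""
    x0, y0 = oseries_xy[oseries_name]
    # Euclidean distance to every stress of the requested kind
    distances = {
        name: math.hypot(x - x0, y - y0)
        for name, (x, y, k) in stresses.items()
        if k == kind
    }
    if not distances:
        raise ValueError(f"no stresses with kind={kind!r}")
    return min(distances, key=distances.get)


print(nearest_stress("oseries1", "prec"))
print(nearest_stress("oseries1", "evap"))
```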

The nearest option is parsed depending on the type of stressmodel. Generally, the form is nearest <kind>, but for the RechargeModel, providing just nearest assumes kind="prec" for the precipitation and kind="evap" for the evaporation.

For WellModel, the number of nearest stresses can be passed as well, e.g. nearest <n> <kind>.

The following examples illustrate this:

full_nearest_yaml = """
nearest_model_1:                  # model name
  oseries: head_nb5               # head time series name, obtained from pastastore
  stressmodels:                   # stressmodels dictionary
    recharge:                     # name of the recharge stressmodel
      class: RechargeModel        # type of pastas stressmodel
      prec: nearest               # nearest stress with kind="prec", from pastastore
      evap: nearest evap          # nearest stress with kind="evap", from pastastore
      recharge: Linear            # pastas recharge type
      rfunc: Exponential          # response function
    river:                        # name for river stressmodel
      class: StressModel          # type of pastas stressmodel
      stress: nearest riv         # nearest stress with kind="riv", from pastastore
      rfunc: One                  # response function
      settings: level             # time series settings
      
      
nearest_model_2:
  oseries: head_mw
  stressmodels:                   
    recharge:                     
      class: RechargeModel  
      prec: nearest prec               
      evap: nearest          
      recharge: Linear            
      rfunc: Exponential          
    wells:                        
      class: WellModel    
      stress: nearest 2 well         
      rfunc: HantushWellModel
      up: False
"""

pstore.oseries

	x	y
name
oseries1	165000.0	424000.0
oseries2	164000.0	423000.0
oseries3	165554.0	422685.0
head_nb5	200000.0	450000.0
head_mw	85850.0	383362.0

models = pstore.yaml.load(full_nearest_yaml)

Building model 'nearest_model_1' for oseries 'head_nb5'
| parsing stressmodel: 'recharge'
  | using nearest stress with kind='prec': 'prec_nb5'
  | using nearest stress with kind='evap': 'evap_nb5'
| parsing stressmodel: 'river'
  | using nearest stress with kind='riv': riv_nb5
Building model 'nearest_model_2' for oseries 'head_mw'
| parsing stressmodel: 'recharge'
  | using nearest stress with kind='prec': 'prec_mw'
  | using nearest stress with kind='evap': 'evap_mw'
| parsing stressmodel: 'wells'
  | using 2 nearest stress(es) with kind='well': ['extraction_2' 'extraction_3']

ml = models[0]  # get the first model from the list

ml.solve(report=False)
_ = ml.plots.results()

[Figure: results plot of nearest_model_1, showing the simulated head, the contributions and step responses of the recharge (prec_nb5, evap_nb5) and river (riv_nb5) stressmodels, and the model parameters]

Defaults 

The PastaStore YAML interface adds some additional defaults. These defaults allow the user to provide only certain information in a YAML file in order to construct a model. They are based on commonly used options, but are not necessarily appropriate in all situations, and it is highly recommended to try different models with different options. The defaults are implemented to facilitate building models and should not be treated as sacred.

The YAML interface mostly uses the Pastas defaults, but adds some additional logic for stressmodels. Whenever a default implemented in the YAML interface is applied, this is logged to the console.

RechargeModel:
- If the stressmodel name is one of “rch”, “rech”, “recharge”, or “rechargemodel”, assume the stressmodel type is RechargeModel.
- If no “prec” or “evap” keys are provided for RechargeModel, use the “nearest” option.
- Default rfunc for RechargeModel is “Exponential”.
- prec: accepts nearest or nearest <kind>. If only nearest is provided, stresses in the PastaStore must be labelled with kind="prec".
- evap: accepts nearest or nearest <kind>. If only nearest is provided, stresses in the PastaStore must be labelled with kind="evap".

StressModel:
- If no “class” key is contained in the stressmodel dictionary, assume the stressmodel type is StressModel.
- Default rfunc for StressModel is “Gamma”.
- stress: accepts nearest or nearest <kind>. If only nearest is provided, whichever stress is nearest is used.

WellModel:
- Default rfunc for WellModel is “HantushWellModel”.
- If “up” is not provided, assume up=False, i.e. a positive discharge time series indicates pumping.
- stress: accepts nearest, nearest <n> and nearest <n> <kind>, where n is the number of wells to add. If kind is not passed, stresses must be labelled with kind="well" in the PastaStore. If n is not passed, n=1 is assumed.
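
The accepted forms of the nearest option listed above can be summarized in a small parser. This is a minimal sketch of the grammar as described in this notebook, not the code used by pastastore; `parse_nearest` and its parameters `default_kind` and `allow_n` are hypothetical names.

```python
def parse_nearest(value, default_kind=None, allow_n=False):
    """Parse 'nearest [<n>] [<kind>]' into a (n, kind) tuple."""
    parts = value.split()
    if not parts or parts[0] != "nearest":
        raise ValueError(f"not a 'nearest' option: {value!r}")
    rest = parts[1:]
    n = 1  # default number of stresses
    # an optional leading integer gives the number of stresses (WellModel only)
    if allow_n and rest and rest[0].isdigit():
        n = int(rest.pop(0))
    if len(rest) > 1:
        raise ValueError(f"too many items in {value!r}")
    # fall back to the default kind when none is given
    kind = rest[0] if rest else default_kind
    return n, kind


# RechargeModel: kind defaults to "prec" or "evap"
print(parse_nearest("nearest", default_kind="prec"))
print(parse_nearest("nearest evap", default_kind="evap"))
# StressModel: without a kind, any stress may be used (kind is None)
print(parse_nearest("nearest riv"))
print(parse_nearest("nearest"))
# WellModel: optional number of wells, kind defaults to "well"
print(parse_nearest("nearest 2 well", default_kind="well", allow_n=True))
print(parse_nearest("nearest", default_kind="well", allow_n=True))
```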

This is the shortest possible YAML file for a model with recharge, that makes use of all of the defaults for RechargeModel:

minimal_yaml = """
ml_minimal:
  oseries: oseries2
  stressmodels:
    recharge:
"""

Note that the YAML load method recognizes the stressmodel name “recharge” and assumes the type of stressmodel should be RechargeModel. Since no other information is provided, the following defaults are applied:

- prec -> nearest stress with kind="prec"
- evap -> nearest stress with kind="evap"
- recharge -> Linear
- rfunc -> Exponential

ml = pstore.yaml.load(minimal_yaml)[0]  # returns a list, so get the first entry

Building model 'ml_minimal' for oseries 'oseries2'
| parsing stressmodel: 'recharge'
| no StressModel type provided, using 'RechargeModel' based on stressmodel name.
  | using nearest stress with kind='prec': 'prec2'
  | using nearest stress with kind='evap': 'evap2'
  | no 'rfunc' provided, using 'Exponential'
  | no 'recharge' type provided, using 'Linear'
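
The log messages above follow the defaults listed earlier. As a rough illustration of how such defaults could be filled in, the sketch below completes a stressmodel dictionary using only the rules stated in this notebook. `apply_defaults` is a hypothetical helper, not the pastastore implementation, and the order of the checks (name-based RechargeModel detection first, StressModel as fallback) is an assumption.

```python
RECHARGE_NAMES = {"rch", "rech", "recharge", "rechargemodel"}
DEFAULT_RFUNC = {
    "RechargeModel": "Exponential",
    "StressModel": "Gamma",
    "WellModel": "HantushWellModel",
}


def apply_defaults(name, sm):
    """Return a copy of the stressmodel dict with the listed defaults filled in."""
    sm = dict(sm or {})  # an empty YAML entry such as "recharge:" parses to None
    if "class" not in sm:
        # assumption: the name check comes first, StressModel is the fallback
        sm["class"] = "RechargeModel" if name in RECHARGE_NAMES else "StressModel"
    if sm["class"] in DEFAULT_RFUNC:
        sm.setdefault("rfunc", DEFAULT_RFUNC[sm["class"]])
    if sm["class"] == "RechargeModel":
        sm.setdefault("prec", "nearest")
        sm.setdefault("evap", "nearest")
        sm.setdefault("recharge", "Linear")
    elif sm["class"] == "WellModel":
        sm.setdefault("up", False)
    return sm


# the empty "recharge:" entry from the minimal YAML file
print(apply_defaults("recharge", None))
# a WellModel entry that only specifies the stress
print(apply_defaults("wells", {"class": "WellModel", "stress": "nearest 2 well"}))
```

Values that are given explicitly in the YAML file are left untouched; only missing keys are filled in.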

ml.solve(report=False)
_ = ml.plots.results()

[Figure: results plot of ml_minimal]

More examples 

yaml_examples = """
# Pastas YAML example file
# ------------------------

# 1. Explicitly provide oseries, stress names, rfunc and
#    recharge type.

ml_explicit:
  settings:
    freq: D
  oseries: oseries1
  stressmodels:
    recharge:
      class: RechargeModel
      prec: prec1
      evap: evap1
      rfunc: Exponential
      recharge: Linear

# 2. Provide oseries and stress names, but use defaults for
#    other settings:

ml_stresses:
  oseries: oseries1
  stressmodels:
    recharge:
      prec: prec1
      evap: evap1

# 3. Use "nearest" to obtain nearest precipitation and evaporation
#    time series. Requires x, y data to be present in oseries and
#    stresses metadata.

ml_nearest:
  oseries: oseries1
  stressmodels:
    recharge:
      prec: nearest prec
      evap: nearest
"""

models = pstore.yaml.load(yaml_examples)  # returns a list
models

Building model 'ml_explicit' for oseries 'oseries1'
| parsing stressmodel: 'recharge'
Building model 'ml_stresses' for oseries 'oseries1'
| parsing stressmodel: 'recharge'
| no StressModel type provided, using 'RechargeModel' based on stressmodel name.
  | no 'rfunc' provided, using 'Exponential'
  | no 'recharge' type provided, using 'Linear'
Building model 'ml_nearest' for oseries 'oseries1'
| parsing stressmodel: 'recharge'
| no StressModel type provided, using 'RechargeModel' based on stressmodel name.
  | using nearest stress with kind='prec': 'prec1'
  | using nearest stress with kind='evap': 'evap1'
  | no 'rfunc' provided, using 'Exponential'
  | no 'recharge' type provided, using 'Linear'

[Model(oseries=oseries1, name=ml_explicit, constant=True, noisemodel=False),
 Model(oseries=oseries1, name=ml_stresses, constant=True, noisemodel=False),
 Model(oseries=oseries1, name=ml_nearest, constant=True, noisemodel=False)]

The first and last models are identical except for their names, as shown in the comparison below. The second model is also the same, but is not included in the comparison.

pst.util.compare_models(
    models[0], models[-1], detailed_comparison=True, style_output=True
)

	model 0	model 1	comparison
name:	ml_explicit	ml_nearest	False
- settings: tmin	None	None	True
- settings: tmax	None	None	True
- settings: freq	D	D	True
- settings: warmup	3650 days 00:00:00	3650 days 00:00:00	True
- settings: time_offset	0 days 00:00:00	0 days 00:00:00	True
- settings: noise	False	False	True
- settings: solver	None	None	True
- settings: fit_constant	True	True	True
- settings: freq_obs	None	None	True
oseries: series_original	True	True	True
oseries: series	True	True	True
stressmodel: 'recharge'	recharge	recharge	True
- rfunc	Exponential	Exponential	True
- time series: 'prec1'	prec1	prec1	True
- prec1 settings: freq	D	D	True
- prec1 settings: sample_up	bfill	bfill	True
- prec1 settings: sample_down	mean	mean	True
- prec1 settings: fill_nan	0.000000	0.000000	True
- prec1 settings: fill_before	mean	mean	True
- prec1 settings: fill_after	mean	mean	True
- prec1 settings: tmin	2010-01-01 00:00:00	2010-01-01 00:00:00	True
- prec1 settings: tmax	2015-12-31 00:00:00	2015-12-31 00:00:00	True
- prec1 settings: time_offset	0 days 00:00:00	0 days 00:00:00	True
- prec1: series_original	True	True	True
- prec1: series	True	True	True
- time series: 'evap1'	evap1	evap1	True
- evap1 settings: freq	D	D	True
- evap1 settings: sample_up	bfill	bfill	True
- evap1 settings: sample_down	mean	mean	True
- evap1 settings: fill_nan	interpolate	interpolate	True
- evap1 settings: fill_before	mean	mean	True
- evap1 settings: fill_after	mean	mean	True
- evap1 settings: tmin	2010-01-01 00:00:00	2010-01-01 00:00:00	True
- evap1 settings: tmax	2015-12-31 00:00:00	2015-12-31 00:00:00	True
- evap1 settings: time_offset	0 days 00:00:00	0 days 00:00:00	True
- evap1: series_original	True	True	True
- evap1: series	True	True	True
param: recharge_A (init)	211.567577	211.567577	True
param: recharge_A (opt)	nan	nan	True
param: recharge_a (init)	10.000000	10.000000	True
param: recharge_a (opt)	nan	nan	True
param: recharge_f (init)	-1.000000	-1.000000	True
param: recharge_f (opt)	nan	nan	True
param: constant_d (init)	27.927937	27.927937	True
param: constant_d (opt)	nan	nan	True

Clean up the written YAML files.

for f in [fi for fi in os.listdir(".") if fi.endswith(".yaml")]:
    os.remove(f)
