Managing time series and Pastas models with a database

This notebook shows how Pastas time series and models can be managed and stored on disk.

Content

1. The Connector object
   1.1 PasConnector
   1.2 Database structure
2. Initializing a PastaStore object
3. Managing time series
   3.1 Adding oseries and stresses
   3.2 Accessing time series and metadata
   3.3 Deleting oseries and stresses
   3.4 Overview of oseries and stresses
4. Managing Pastas models
   4.1 Creating a model
   4.2 Storing a model
   4.3 Loading a model
   4.4 Overview of models
   4.5 Deleting models
5. Bulk operations
6. Deleting databases

1. The Connector object

This section shows how to initialize a connection to a new database (connecting to an existing database works the same way).

Import pastastore and some other modules:

[1]:

import pastastore as pst
import os
import pandas as pd
import pastas as ps

ps.show_versions()

Python version: 3.9.7
NumPy version: 1.21.2
Pandas version: 1.5.2
SciPy version: 1.10.0
Matplotlib version: 3.6.1
Numba version: 0.55.1
LMfit version: 1.0.3
Latexify version: Not Installed
Pastas version: 1.0.0b

1.1 PasConnector

The PasConnector requires the path to a directory and a name for the connector. Data is stored in JSON files in the given directory.

[2]:

path = "./pas"
name = "my_first_connector"

Initialize the PasConnector object:

[3]:

conn = pst.PasConnector(name, path)

PasConnector: library oseries created in /home/david/Github/pastastore/examples/notebooks/pas/oseries
PasConnector: library stresses created in /home/david/Github/pastastore/examples/notebooks/pas/stresses
PasConnector: library models created in /home/david/Github/pastastore/examples/notebooks/pas/models
PasConnector: library oseries_models created in /home/david/Github/pastastore/examples/notebooks/pas/oseries_models

Let’s take a look at conn. This shows us how many oseries, stresses and models are contained in the store:

[4]:

conn

[4]:

<PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models

1.2 Database structure

The database/store contains four libraries or collections. Each of these contains specific data related to the project. The four libraries are:

- oseries
- stresses
- models
- oseries_models

These libraries are usually not meant to be accessed directly, but they can be accessed through the internal method conn._get_library():

[5]:

# using the PasConnector
conn._get_library("stresses")

[5]:

'/home/david/Github/pastastore/examples/notebooks/pas/stresses'

The library handles are not generally used directly but internally they manage the reading, writing and deleting of data from the database/store. In the case of the PasConnector, the library is just a path to a directory.
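Because a PasConnector library is just a directory, its contents can be inspected with standard file tools. The sketch below is not part of pastastore: `library_overview` is a hypothetical helper, and it assumes only the `<path>/<library>` layout shown in the connector output above. It counts directory entries without assuming anything about file names or about how many files are written per time series or model. The demonstration runs on a temporary directory containing a made-up file, so it does not touch a real database.

```python
import tempfile
from pathlib import Path

# The four library names listed above.
LIBRARIES = ("oseries", "stresses", "models", "oseries_models")


def library_overview(root):
    """Count the entries in each library directory below `root`.

    Hypothetical helper; a missing library directory is reported as None.
    """
    root = Path(root)
    overview = {}
    for lib in LIBRARIES:
        libdir = root / lib
        overview[lib] = len(list(libdir.iterdir())) if libdir.is_dir() else None
    return overview


# Demonstration on a throwaway directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    for lib in LIBRARIES:
        (Path(tmp) / lib).mkdir()
    # "example.json" is a made-up file name, used only to have something to count.
    (Path(tmp) / "stresses" / "example.json").write_text("{}")
    demo = library_overview(tmp)

print(demo)
```

The entry count is not necessarily the number of time series or models; for that, rely on the counts reported by the connector itself.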

2. Initializing a PastaStore object

The PastaStore object is used to process and use the data in the database. The connector objects only manage the reading, writing and deleting of data; the PastaStore provides the methods that actually work with that data.

In order to access the data the PastaStore object must be initialized with a Connector object.

[6]:

pstore = pst.PastaStore("my_first_project", conn)

Let’s take a look at the object:

[7]:

pstore

[7]:

<PastaStore> my_first_project:
 - <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models

The connector object is accessible through pstore.conn, so all of the methods defined in the connector objects can be accessed through e.g. pstore.conn.<method>. The most common methods are also registered on the store object for easier access. The following statements are equivalent.

[8]:

pstore.conn.get_oseries

[8]:

<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>

[9]:

pstore.get_oseries

[9]:

<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
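The two outputs above show the same bound method, reached by two routes. One way such forwarding can be set up is sketched below. `TinyConnector` and `TinyStore` are hypothetical stand-ins written for this illustration; they are not pastastore classes, and pastastore may register its methods differently.

```python
class TinyConnector:
    """Hypothetical stand-in for a connector; keeps series in a dict."""

    def __init__(self):
        self._oseries = {}

    def add_oseries(self, series, name):
        self._oseries[name] = series

    def get_oseries(self, name):
        return self._oseries[name]


class TinyStore:
    """Hypothetical stand-in for a store that forwards selected methods."""

    _forwarded = ("add_oseries", "get_oseries")

    def __init__(self, name, conn):
        self.name = name
        self.conn = conn
        # Expose the connector's bound methods directly on the store.
        for method_name in self._forwarded:
            setattr(self, method_name, getattr(conn, method_name))


store = TinyStore("demo", TinyConnector())
store.add_oseries([27.61, 27.73], "oseries1")

# Both routes refer to the same method bound to the same connector.
print(store.get_oseries == store.conn.get_oseries)
print(store.conn.get_oseries("oseries1"))
```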

3. Managing time series

This section explains how time series can be added, retrieved or deleted from the database. We’ll be using the PastaStore instance we created before.

3.1 Adding oseries and stresses

Let’s read some data to put into the database as an oseries. The data we are using is in the tests/data directory.

[10]:

datadir = "../../tests/data/"  # relative path to data directory
oseries1 = pd.read_csv(
    os.path.join(datadir, "head_nb1.csv"), index_col=0, parse_dates=True
)
oseries1.head()

[10]:

	head
date
1985-11-14	27.61
1985-11-28	27.73
1985-12-14	27.91
1985-12-28	28.13
1986-01-13	28.32

Add the time series to the oseries library using pstore.add_oseries. Metadata can optionally be provided as a dictionary. In this example a dictionary with x and y coordinates is passed as metadata, which is convenient later for automatically creating Pastas models.

[11]:

pstore.add_oseries(oseries1, "oseries1", metadata={"x": 100300, "y": 400400})

The series was added to the oseries library. Let’s confirm by looking at the store object:

[12]:

pstore

[12]:

<PastaStore> my_first_project:
 - <PasConnector> 'my_first_connector': 1 oseries, 0 stresses, 0 models

Stresses can be added similarly using pstore.add_stress. The only thing to keep in mind when adding stresses is to pass the kind argument so that different types of stresses (e.g. precipitation or evaporation) can be distinguished. The code below reads the precipitation and evaporation csv-files and adds them to our project:

[13]:

# prec
s = pd.read_csv(os.path.join(datadir, "rain_nb1.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "prec1", kind="prec", metadata={"x": 100300, "y": 400400})

# evap
s = pd.read_csv(os.path.join(datadir, "evap_nb1.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "evap1", kind="evap", metadata={"x": 100300, "y": 400400})

[14]:

pstore

[14]:

<PastaStore> my_first_project:
 - <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models

3.2 Accessing time series and metadata

Time series can be accessed through pstore.get_oseries() or pstore.get_stresses(). These methods accept just a name or a list of names. In the latter case a dictionary of dataframes is returned.

[15]:

ts = pstore.get_oseries("oseries1")
ts.head()

[15]:

	oseries1
1985-11-14	27.61
1985-11-28	27.73
1985-12-14	27.91
1985-12-28	28.13
1986-01-13	28.32

Using a list of names:

[16]:

stresses = pstore.get_stresses(["prec1", "evap1"])
stresses

[16]:

{'prec1':              prec1
 1980-01-01  0.0033
 1980-01-02  0.0025
 1980-01-03  0.0003
 1980-01-04  0.0075
 1980-01-05  0.0080
 ...            ...
 2016-10-27  0.0000
 2016-10-28  0.0000
 2016-10-29  0.0003
 2016-10-30  0.0000
 2016-10-31  0.0000

 [13454 rows x 1 columns],
 'evap1':              evap1
 1980-01-01  0.0002
 1980-01-02  0.0003
 1980-01-03  0.0002
 1980-01-04  0.0001
 1980-01-05  0.0001
 ...            ...
 2016-11-18  0.0004
 2016-11-19  0.0003
 2016-11-20  0.0005
 2016-11-21  0.0003
 2016-11-22  0.0005

 [13476 rows x 1 columns]}

The metadata of a time series can be accessed through pstore.get_metadata(). Provide the library and the name to load the metadata for an oseries…

[17]:

meta = pstore.get_metadata("oseries", "oseries1")
meta

[17]:

	x	y
name
oseries1	100300	400400

or for multiple stresses:

[18]:

meta = pstore.get_metadata("stresses", ["prec1", "evap1"])
meta

[18]:

	x	y	kind
name
prec1	100300.0	400400.0	prec
evap1	100300.0	400400.0	evap

3.3 Deleting oseries and stresses

Deleting time series can be done using pstore.del_oseries or pstore.del_stress. These functions accept a single name or list of names of time series to delete.
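The sketch below shows both call forms, a single name and a list of names. `delete_series` is a hypothetical convenience wrapper, not a pastastore function; it only calls the `del_oseries` and `del_stress` methods named above. `RecordingStore` is a stand-in that records the calls, included only so the example runs without a database.

```python
def delete_series(pstore, oseries=None, stresses=None):
    """Delete time series from a PastaStore-like object.

    `oseries` and `stresses` may each be a single name, a list of names,
    or None to leave that library untouched.
    """
    if oseries is not None:
        pstore.del_oseries(oseries)
    if stresses is not None:
        pstore.del_stress(stresses)


class RecordingStore:
    """Stand-in that records calls instead of deleting anything."""

    def __init__(self):
        self.calls = []

    def del_oseries(self, names):
        self.calls.append(("del_oseries", names))

    def del_stress(self, names):
        self.calls.append(("del_stress", names))


demo = RecordingStore()
delete_series(demo, oseries="oseries1", stresses=["prec1", "evap1"])
print(demo.calls)
```

With the store from this notebook the equivalent call would be `delete_series(pstore, oseries="oseries1")`. It is not executed here because later sections still use `oseries1`.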

3.4 Overview of oseries and stresses

An overview of the oseries and stresses is available through pstore.oseries and pstore.stresses. These are dataframes containing the metadata of all the time series. These dataframes are cached for performance. The cache is cleared when a time series is added or modified in the database.

[19]:

pstore.oseries

[19]:

	x	y
name
oseries1	100300	400400

[20]:

pstore.stresses

[20]:

	x	y	kind
name
evap1	100300.0	400400.0	evap
prec1	100300.0	400400.0	prec
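The caching behaviour described in this section follows a common pattern: build the overview on first access, reuse it afterwards, and discard it whenever the underlying data changes. The sketch below illustrates that pattern only. `MetadataOverview` is a hypothetical class and uses a sorted list where pastastore returns a DataFrame; how pastastore implements its cache is not shown here.

```python
class MetadataOverview:
    """Hypothetical container that caches an overview of its metadata."""

    def __init__(self):
        self._metadata = {}
        self._cache = None
        self.rebuilds = 0  # counts how often the overview is recomputed

    def add(self, name, metadata):
        self._metadata[name] = dict(metadata)
        self._cache = None  # any change invalidates the cached overview

    @property
    def overview(self):
        if self._cache is None:
            self.rebuilds += 1
            # Sorted by name, as a cheap stand-in for building a DataFrame.
            self._cache = sorted(self._metadata.items())
        return self._cache


store = MetadataOverview()
store.add("prec1", {"x": 100300, "y": 400400, "kind": "prec"})
store.overview  # first access builds the overview
store.overview  # second access is served from the cache
store.add("evap1", {"x": 100300, "y": 400400, "kind": "evap"})
names = [name for name, _ in store.overview]  # rebuilt after the change
print(names, store.rebuilds)
```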

4. Managing Pastas models

This section shows how Pastas models can be created, stored, and loaded from the database.

4.1 Creating a model

Creating a new model is straightforward using pstore.create_model(). The add_recharge keyword argument allows the user to choose (default is True) whether recharge is automatically added to the model using the nearest precipitation and evaporation stations in the stresses library. Note that the x,y-coordinates of the stresses and oseries must be set in the metadata for each time series in order for this to work.

[21]:

ml = pstore.create_model("oseries1", add_recharge=True)
ml

[21]:

Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
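To make the role of the x,y-coordinates concrete, the sketch below picks the nearest stress of a given kind for each oseries. It assumes straight-line distance in the x,y-plane; the exact rule pastastore applies may differ. `nearest_stress` is a hypothetical helper, and the coordinates are the metadata values used in this notebook (the second set of series is added in section 5).

```python
import math

# Coordinates taken from the metadata used in this notebook.
oseries_xy = {"oseries1": (100300, 400400), "oseries2": (100000, 400000)}
stresses = {
    "prec1": {"x": 100300, "y": 400400, "kind": "prec"},
    "evap1": {"x": 100300, "y": 400400, "kind": "evap"},
    "prec2": {"x": 100000, "y": 400000, "kind": "prec"},
    "evap2": {"x": 100000, "y": 400000, "kind": "evap"},
}


def nearest_stress(xy, stresses, kind):
    """Return the name of the stress of `kind` closest to the point `xy`."""
    x, y = xy
    candidates = {n: m for n, m in stresses.items() if m["kind"] == kind}
    if not candidates:
        raise ValueError(f"no stresses of kind {kind!r} available")
    return min(
        candidates,
        key=lambda n: math.hypot(candidates[n]["x"] - x, candidates[n]["y"] - y),
    )


for name, xy in oseries_xy.items():
    prec = nearest_stress(xy, stresses, "prec")
    evap = nearest_stress(xy, stresses, "evap")
    print(name, prec, evap)
```

Without coordinates in the metadata there is nothing to compute a distance from, which is why they must be set for every time series.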

4.2 Storing a model

The model that was created in the previous step is not automatically stored in the models library. Use pstore.add_model() to store the model. If a model with the same name already exists in the library, an Exception is raised. Use overwrite=True to replace the stored model.

Note: The model is stored without the time series. It is assumed the time series are already stored in the oseries or stresses libraries, making it redundant to store these again. When adding the model, the stored copy of the time series is compared to the version in the model to ensure these are the same. If not, an error is raised and the model cannot be stored. These validation options can be overridden, but that is only recommended for advanced users.

[22]:

ml.solve()

Fit report oseries1                 Fit Statistics
==================================================
nfev    22                     EVP           92.91
nobs    644                    R2             0.93
noise   True                   RMSE           0.11
tmin    1985-11-14 00:00:00    AIC        -3257.53
tmax    2015-06-28 00:00:00    BIC        -3235.20
freq    D                      Obj            2.02
warmup  3650 days 00:00:00     ___
solver  LeastSquares           Interp.          No

Parameters (5 optimized)
==================================================
                optimal   stderr     initial  vary
recharge_A   686.247135   ±5.30%  215.674528  True
recharge_a   159.386040   ±5.01%   10.000000  True
recharge_f    -1.305359   ±4.04%   -1.000000  True
constant_d    27.920134   ±0.21%   27.900078  True
noise_alpha   49.911868  ±11.86%   15.000000  True

[23]:

pstore.add_model(ml)

[24]:

pstore.models

[24]:

['oseries1']

[25]:

pstore.oseries_models

[25]:

{'oseries1': ['oseries1']}

4.3 Loading a model

Loading a stored model is simple using pstore.get_models("<name>") or using the key-value interface for models: pstore.models["<name>"].

The model is stored as a dictionary (see ml.to_dict()) without the time series data. The time series in the model are picked up based on the names of those series from the respective libraries (oseries or stresses).

[26]:

ml2 = pstore.get_models("oseries1")
ml2

[26]:

Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
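The idea of storing a model without its time series, and picking them up again by name, can be sketched with plain dictionaries. Everything below is hypothetical: the dictionary layout is invented for this illustration and is not the layout produced by `ml.to_dict()`, and `dump_model` and `load_model` are not pastastore functions.

```python
import json

# Stand-ins for the oseries and stresses libraries (values shortened).
libraries = {
    "oseries": {"oseries1": [27.61, 27.73, 27.91]},
    "stresses": {
        "prec1": [0.0033, 0.0025, 0.0003],
        "evap1": [0.0002, 0.0003, 0.0002],
    },
}


def dump_model(model):
    """Serialise a model dict, keeping only the names of its time series."""
    slim = {
        "name": model["name"],
        "oseries": model["oseries"]["name"],
        "stresses": [s["name"] for s in model["stresses"]],
        "parameters": model["parameters"],
    }
    return json.dumps(slim)


def load_model(text, libraries):
    """Rebuild a model dict, looking the time series up by name."""
    slim = json.loads(text)
    return {
        "name": slim["name"],
        "oseries": {
            "name": slim["oseries"],
            "series": libraries["oseries"][slim["oseries"]],
        },
        "stresses": [
            {"name": n, "series": libraries["stresses"][n]}
            for n in slim["stresses"]
        ],
        "parameters": slim["parameters"],
    }


model = {
    "name": "oseries1",
    "oseries": {"name": "oseries1", "series": libraries["oseries"]["oseries1"]},
    "stresses": [
        {"name": "prec1", "series": libraries["stresses"]["prec1"]},
        {"name": "evap1", "series": libraries["stresses"]["evap1"]},
    ],
    "parameters": {"recharge_A": 686.247135, "constant_d": 27.920134},
}

stored = dump_model(model)
restored = load_model(stored, libraries)
print(restored == model)
```

A consequence of this design is that a stored model can only be loaded while the time series it refers to are still present in their libraries.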

4.4 Overview of models

An overview of the models is available through pstore.models which lists the names of all the models:

[27]:

pstore.models

[27]:

['oseries1']

4.5 Deleting models

Deleting the model is done with pstore.del_models:

[28]:

pstore.del_models("oseries1")

Checking to see if it was indeed deleted:

[29]:

pstore

[29]:

<PastaStore> my_first_project:
 - <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models

[30]:

pstore.models

[30]:

[]

[31]:

pstore.oseries_models

[31]:

{}

5. Bulk operations

The following bulk operations are available:

- create_models_bulk: create models for all or a selection of oseries in the database
- solve_models: solve all or a selection of models in the database
- model_results: get results for all or a selection of models in the database. Requires the art_tools module!

Let’s add some more data to the PastaStore to show how the bulk operations work.

[32]:

# oseries 2
o = pd.read_csv(os.path.join(datadir, "obs.csv"), index_col=0, parse_dates=True)
pstore.add_oseries(o, "oseries2", metadata={"x": 100000, "y": 400000})

# prec 2
s = pd.read_csv(os.path.join(datadir, "rain.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "prec2", kind="prec", metadata={"x": 100000, "y": 400000})

# evap 2
s = pd.read_csv(os.path.join(datadir, "evap.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "evap2", kind="evap", metadata={"x": 100000, "y": 400000})

Let’s take a look at our PastaStore:

[33]:

pstore

[33]:

<PastaStore> my_first_project:
 - <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 0 models

Let’s try using the bulk methods on our database. The pstore.create_models_bulk() method creates models for all or a selection of oseries in the database. Options include:

- selecting specific oseries to create models for
- automatically adding recharge based on the nearest precipitation and evaporation stresses
- solving the models
- storing the models in the models library

Note: when using the progressbar, for a prettier result the pastas log level can be set to ERROR using: ps.set_log_level("ERROR") or ps.logger.setLevel("ERROR").

[34]:

# to suppress most of the log messages
ps.logger.setLevel("ERROR")

[35]:

errors = pstore.create_models_bulk()

Bulk creation models: 100%|██████████| 2/2 [00:00<00:00, 12.09it/s]
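The call above returns an `errors` object. A common pattern behind such a return value is to collect failures per item instead of stopping at the first one. The sketch below shows that pattern in isolation. `run_bulk` and `fake_create_model` are hypothetical, and the error message is invented; whether `create_models_bulk` works this way internally, and what exactly it returns, is an assumption to check against the pastastore documentation.

```python
def run_bulk(names, func, ignore_errors=True):
    """Apply `func` to every name, collecting results and failures separately."""
    results, errors = {}, {}
    for name in names:
        try:
            results[name] = func(name)
        except Exception as exc:
            if not ignore_errors:
                raise
            errors[name] = str(exc)
    return results, errors


def fake_create_model(name):
    """Stand-in for model creation that fails for one made-up series."""
    if name == "oseries3":
        raise ValueError("no precipitation series found")
    return f"Model({name})"


results, errors = run_bulk(["oseries1", "oseries2", "oseries3"], fake_create_model)
print(results)
print(errors)
```

Passing `ignore_errors=False` re-raises the first failure, which is the stricter of the two behaviours.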

To solve all or a selection of models use pstore.solve_models(). Options for this method include:

- selecting models to solve
- storing the results in the models library
- raising an error (or not) when solving fails
- printing solve reports

[36]:

pstore

[36]:

<PastaStore> my_first_project:
 - <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 2 models

[37]:

pstore.solve_models(store_result=True, report=False)

Solving models: 100%|██████████| 2/2 [00:01<00:00,  1.73it/s]

Obtaining the model parameters and statistics is easy with pstore.get_parameters() and pstore.get_statistics(). Results can be obtained for all or a selection of models. The results are returned as DataFrames.

[38]:

params = pstore.get_parameters()
params

[38]:

	recharge_A	recharge_a	recharge_f	constant_d	noise_alpha
oseries2	607.978079	154.752928	-1.423119	28.094025	66.035700
oseries1	686.247135	159.386040	-1.305359	27.920134	49.911868

[39]:

stats = pstore.get_statistics(["evp", "rmse"])
stats

[39]:

	evp	rmse
oseries2	88.661133	0.124792
oseries1	92.913581	0.114420

6. Deleting databases

The pastastore.util submodule contains functions for deleting database contents:

[40]:

pst.util.delete_pastastore(pstore)

Deleting PasConnector database: 'my_first_connector' ...  Done!
