Managing time series and Pastas models with a database

This notebook shows how Pastas time series and models can be managed and stored on disk.

Content

  1. The Connector object

    1. PasConnector

    2. Database structure

  2. Initializing a PastaStore

  3. Managing time series

    1. Adding oseries and stresses

    2. Accessing time series and metadata

    3. Deleting oseries and stresses

    4. Overview of oseries and stresses

  4. Managing Pastas models

    1. Creating a model

    2. Storing a model

    3. Loading a model

    4. Overview of models

    5. Deleting models

  5. Bulk operations

  6. Deleting databases


1. The Connector object

This section shows how to initialize a connection to a new database (connecting to an existing database works the same way).

Import pastastore and some other modules:

from pathlib import Path

import pandas as pd
import pastas as ps

import pastastore as pst

pst.show_versions()
Pastastore version : 1.12.0

Python version     : 3.13.11
Pandas version     : 2.3.3
Matplotlib version : 3.10.8
Pastas version     : 1.12.0
PyYAML version     : 6.0.3

1.1 PasConnector

The PasConnector requires the path to a directory and a name for the connector. Data is stored in JSON files in the given directory.

path = "./pas"
name = "my_first_connector"

Initialize the PasConnector object:

conn = pst.PasConnector(name, path)
PasConnector: library 'oseries' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/oseries'
PasConnector: library 'stresses' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/stresses'
PasConnector: library 'models' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/models'
PasConnector: library 'oseries_models' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/oseries_models'
PasConnector: library 'stresses_models' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/stresses_models'

Let’s take a look at conn. This shows us how many oseries, stresses and models are contained in the store:

conn
<PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models

1.2 Database structure

The database/store contains five libraries or collections, as listed in the output above. Each of these contains specific data related to the project. The five libraries are:

  • oseries

  • stresses

  • models

  • oseries_models

  • stresses_models

These libraries are usually not meant to be accessed directly, but they can be accessed through the internal method conn._get_library():

# using the PasConnector
conn._get_library("stresses")
PosixPath('/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/stresses')

The library handles are not generally used directly but internally they manage the reading, writing and deleting of data from the database/store. In the case of the PasConnector, the library is just a path to a directory.
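
As a rough illustration of what a directory-backed library looks like on disk, the sketch below builds a throwaway directory with one subdirectory per library and writes a single JSON file into one of them. The helper names (make_toy_store, list_items), the file name and the JSON content are hypothetical and chosen only for illustration; this is not pastastore's actual storage format.

```python
import json
import tempfile
from pathlib import Path

# Library names taken from the PasConnector output above.
LIBRARIES = ["oseries", "stresses", "models", "oseries_models", "stresses_models"]


def make_toy_store(root: Path, name: str) -> dict[str, Path]:
    """Create one subdirectory per library and return the library paths."""
    libs = {}
    for lib in LIBRARIES:
        path = root / name / lib
        path.mkdir(parents=True, exist_ok=True)
        libs[lib] = path
    return libs


def list_items(libpath: Path) -> list[str]:
    """Return the names of all JSON files in a library directory."""
    return sorted(p.stem for p in libpath.glob("*.json"))


with tempfile.TemporaryDirectory() as tmp:
    libs = make_toy_store(Path(tmp), "my_first_connector")
    # Hypothetical file content; the real format may differ.
    (libs["stresses"] / "prec1.json").write_text(json.dumps({"kind": "prec"}))
    stress_names = list_items(libs["stresses"])
    oseries_names = list_items(libs["oseries"])
```

In this toy version, a library "handle" is just a Path, which mirrors the PosixPath returned by conn._get_library("stresses") above.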


2. Initializing a PastaStore object

The PastaStore object is used to process and use the data in the database. The connector objects only manage the reading, writing and deleting of data. The PastaStore provides the methods that actually work with that data.

In order to access the data the PastaStore object must be initialized with a Connector object.

pstore = pst.PastaStore(conn, name="my_first_project")

Let’s take a look at the object:

pstore
<PastaStore> my_first_project: 
 - <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models

The connector object is accessible through pstore.conn, so all of the methods defined on the connector can be accessed through e.g. pstore.conn.<method>. The most common methods are also registered on the store object for easier access. The following statements are equivalent:

pstore.conn.get_oseries
<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
pstore.get_oseries
<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
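
One way such a registration could work is sketched below with two stand-in classes. ToyConnector and ToyStore are hypothetical names, and binding methods with setattr is only one possible approach; how pastastore registers connector methods internally is not shown in this notebook.

```python
class ToyConnector:
    """Stand-in for a connector that owns the data access methods."""

    def __init__(self):
        self._oseries = {"oseries1": [27.61, 27.73]}

    def get_oseries(self, name):
        return self._oseries[name]


class ToyStore:
    """Stand-in for a store that exposes selected connector methods."""

    # Methods made available directly on the store (illustrative selection).
    _registered = ("get_oseries",)

    def __init__(self, conn):
        self.conn = conn
        for method in self._registered:
            # Attach the connector's bound method to the store instance.
            setattr(self, method, getattr(conn, method))


store = ToyStore(ToyConnector())
same_method = store.get_oseries == store.conn.get_oseries
values = store.get_oseries("oseries1")
```

Both attribute lookups resolve to a method bound to the connector, which matches the identical "bound method BaseConnector.get_oseries" representations printed above.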

3. Managing time series

This section explains how time series can be added, retrieved or deleted from the database. We’ll be using the PastaStore instance we created before.

3.1 Adding oseries and stresses

Let’s read some data to put into the database as an oseries. The data we are using is in the tests/data directory.

datadir = Path("../../tests/data/")  # relative path to data directory
oseries1 = pd.read_csv(datadir / "head_nb1.csv", index_col=0, parse_dates=True)
oseries1.head()
head
date
1985-11-14 27.61
1985-11-28 27.73
1985-12-14 27.91
1985-12-28 28.13
1986-01-13 28.32

Add the time series to the oseries library using pstore.add_oseries. Metadata can optionally be provided as a dictionary. In this example a dictionary with x and y coordinates is passed as metadata, which is convenient later for automatically creating Pastas models.

pstore.add_oseries(oseries1, "oseries1", metadata={"x": 100300, "y": 400400})

The series was added to the oseries library. Let’s confirm by looking at the store object:

pstore
<PastaStore> my_first_project: 
 - <PasConnector> 'my_first_connector': 1 oseries, 0 stresses, 0 models

Stresses can be added similarly using pstore.add_stress. The only thing to keep in mind when adding stresses is to pass the kind argument so that different types of stresses (e.g. precipitation or evaporation) can be distinguished. The code below reads the precipitation and evaporation csv-files and adds them to our project:

# prec
s = pd.read_csv(datadir / "rain_nb1.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "prec1", kind="prec", metadata={"x": 100300, "y": 400400})

# evap
s = pd.read_csv(datadir / "evap_nb1.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "evap1", kind="evap", metadata={"x": 100300, "y": 400400})
pstore
<PastaStore> my_first_project: 
 - <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models

3.2 Accessing time series and metadata

Time series can be accessed through pstore.get_oseries() or pstore.get_stresses(). These methods accept a single name or a list of names. In the latter case a dictionary of time series is returned, keyed by name.

ts = pstore.get_oseries("oseries1")
ts.head()
1985-11-14    27.61
1985-11-28    27.73
1985-12-14    27.91
1985-12-28    28.13
1986-01-13    28.32
Name: oseries1, dtype: float64

Using a list of names:

stresses = pstore.get_stresses(["prec1", "evap1"])
stresses
{'prec1': 1980-01-01    0.0033
 1980-01-02    0.0025
 1980-01-03    0.0003
 1980-01-04    0.0075
 1980-01-05    0.0080
                ...  
 2016-10-27    0.0000
 2016-10-28    0.0000
 2016-10-29    0.0003
 2016-10-30    0.0000
 2016-10-31    0.0000
 Name: prec1, Length: 13454, dtype: float64,
 'evap1': 1980-01-01    0.0002
 1980-01-02    0.0003
 1980-01-03    0.0002
 1980-01-04    0.0001
 1980-01-05    0.0001
                ...  
 2016-11-18    0.0004
 2016-11-19    0.0003
 2016-11-20    0.0005
 2016-11-21    0.0003
 2016-11-22    0.0005
 Name: evap1, Length: 13476, dtype: float64}

The metadata of a time series can be accessed through pstore.get_metadata(). Provide the library and the name to load the metadata for an oseries…

meta = pstore.get_metadata("oseries", "oseries1")
meta
x y
name
oseries1 100300 400400

or for multiple stresses:

meta = pstore.get_metadata("stresses", ["prec1", "evap1"])
meta
x y kind
name
prec1 100300.0 400400.0 prec
evap1 100300.0 400400.0 evap

3.3 Deleting oseries and stresses

Deleting time series can be done using pstore.del_oseries or pstore.del_stress. These functions accept a single name or list of names of time series to delete.
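
The "single name or list of names" convention can be sketched with a plain dictionary standing in for a library. The function delete_items is hypothetical and only illustrates the argument handling; it does not call pastastore.

```python
def delete_items(library: dict, names) -> int:
    """Delete one name or a list of names from a dict-based toy library."""
    if isinstance(names, str):
        names = [names]  # treat a single name as a one-element list
    for name in names:
        del library[name]  # raises KeyError if the name is not present
    return len(names)


# Placeholder values stand in for the stored time series.
toy_stresses = {"prec1": "series", "evap1": "series", "prec2": "series"}
n_single = delete_items(toy_stresses, "prec2")
n_multi = delete_items(toy_stresses, ["prec1", "evap1"])
```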

3.4 Overview of oseries and stresses

An overview of the oseries and stresses is available through pstore.oseries and pstore.stresses. These are dataframes containing the metadata of all the time series. These dataframes are cached for performance. The cache is cleared when a time series is added or modified in the database.

pstore.oseries
x y
name
oseries1 100300 400400
pstore.stresses
x y kind
name
prec1 100300.0 400400.0 prec
evap1 100300.0 400400.0 evap
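
The caching behaviour described above can be illustrated with a small stand-in class. ToyLibrary is hypothetical and uses a sorted list instead of a DataFrame; it only sketches the idea of rebuilding an overview lazily and clearing the cache whenever the contents change.

```python
class ToyLibrary:
    """Dict-backed library with a cached metadata overview."""

    def __init__(self):
        self._meta = {}
        self._overview_cache = None
        self.n_rebuilds = 0  # counts how often the overview is rebuilt

    def add(self, name, metadata):
        self._meta[name] = dict(metadata)
        self._overview_cache = None  # invalidate: contents changed

    @property
    def overview(self):
        if self._overview_cache is None:
            # Rebuild only when the cache was cleared.
            self._overview_cache = sorted(self._meta.items())
            self.n_rebuilds += 1
        return self._overview_cache


lib = ToyLibrary()
lib.add("oseries1", {"x": 100300, "y": 400400})
first = lib.overview
second = lib.overview  # served from the cache
lib.add("oseries2", {"x": 100000, "y": 400000})
third = lib.overview  # rebuilt after the add
```

Repeated access returns the cached object, and only the second add triggers another rebuild.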

4. Managing Pastas models

This section shows how Pastas models can be created, stored, and loaded from the database.

4.1 Creating a model

Creating a new model is straightforward using pstore.create_model(). The add_recharge keyword argument allows the user to choose (default is True) whether recharge is automatically added to the model using the nearest precipitation and evaporation stations in the stresses library. Note that the x,y-coordinates of the stresses and oseries must be set in the metadata for each time series in order for this to work.

ml = pstore.create_model("oseries1", add_recharge=True)
ml
Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False)
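
To make the role of the x,y-coordinates concrete, the sketch below picks the nearest precipitation and evaporation series for an oseries from a metadata dictionary. The function nearest_stress is hypothetical and the straight-line (Euclidean) distance is an assumption; the notebook does not state how pastastore measures distance. The coordinates are borrowed from the series used in this notebook.

```python
import math

oseries_xy = {"oseries1": (100300, 400400)}
stresses = {
    "prec1": {"x": 100300, "y": 400400, "kind": "prec"},
    "evap1": {"x": 100300, "y": 400400, "kind": "evap"},
    "prec2": {"x": 100000, "y": 400000, "kind": "prec"},
    "evap2": {"x": 100000, "y": 400000, "kind": "evap"},
}


def nearest_stress(xy, stresses, kind):
    """Return the name of the closest stress of the given kind."""
    candidates = {n: m for n, m in stresses.items() if m["kind"] == kind}
    return min(
        candidates,
        key=lambda n: math.hypot(
            candidates[n]["x"] - xy[0], candidates[n]["y"] - xy[1]
        ),
    )


prec = nearest_stress(oseries_xy["oseries1"], stresses, "prec")
evap = nearest_stress(oseries_xy["oseries1"], stresses, "evap")
```

Without coordinates in the metadata there is nothing to compute a distance from, which is why the x and y values must be set for both the oseries and the stresses.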

4.2 Storing a model

The model that was created in the previous step is not automatically stored in the models library. Use pstore.add_model() to store the model. If a model with the same name already exists, an Exception is raised to warn the user that the model is already in the library. Use overwrite=True to add the model anyway.

Note: The model is stored without the time series. It is assumed the time series are already stored in the oseries or stresses libraries, making it redundant to store these again. When adding the model, the stored copy of the time series is compared to the version in the model to ensure these are the same. If not, an error is raised and the model cannot be stored. These validation options can be overridden, but that is only recommended for advanced users.

ml.solve()
Fit report oseries1                 Fit Statistics
==================================================
nfev     20                     EVP          93.19
nobs     644                    R2            0.93
noise    False                  RMSE          0.11
tmin     1985-11-14 00:00:00    AICc      -2809.63
tmax     2015-06-28 00:00:00    BIC       -2791.83
freq     D                      Obj           4.05
freq_obs None                   ___               
warmup   3650 days 00:00:00     Interp.         No
solver   LeastSquares           weights        Yes

Parameters (4 optimized)
==================================================
               optimal     initial  vary
recharge_A  631.256911  215.674528  True
recharge_a  165.320923   10.000000  True
recharge_f   -1.466316   -1.000000  True
constant_d   28.081353   27.900078  True
pstore.add_model(ml)
pstore.models
<ModelAccessor> 1 model(s): 
['oseries1']
pstore.model_names
['oseries1']
pstore.oseries_models
{'oseries1': ['oseries1']}
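
The two checks described in this section (no silent overwrite, and time series in the model must match the stored copy) can be sketched as follows. ToyModelLibrary, its method signature and the use of ValueError are hypothetical; the actual exception types and validation options in pastastore may differ.

```python
class ToyModelLibrary:
    """Toy models library that checks series and guards against overwrites."""

    def __init__(self, stored_series):
        self.stored_series = stored_series  # name -> values in the "database"
        self.models = {}

    def add_model(self, name, series_in_model, overwrite=False):
        if name in self.models and not overwrite:
            raise ValueError(f"model '{name}' already exists, use overwrite=True")
        for sname, values in series_in_model.items():
            # The series used by the model must equal the stored copy.
            if self.stored_series.get(sname) != values:
                raise ValueError(f"series '{sname}' differs from stored copy")
        # Keep only the series names, not the values.
        self.models[name] = {"series": sorted(series_in_model)}


lib = ToyModelLibrary({"oseries1": [27.61, 27.73], "prec1": [0.0033, 0.0025]})
model_series = {"oseries1": [27.61, 27.73], "prec1": [0.0033, 0.0025]}
lib.add_model("oseries1", model_series)

try:
    lib.add_model("oseries1", model_series)  # second add without overwrite
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

lib.add_model("oseries1", model_series, overwrite=True)  # allowed

try:
    lib.add_model("other", {"oseries1": [0.0, 0.0]})  # modified series
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```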

4.3 Loading a model

Loading a stored model is simple using pstore.get_models("<name>") or using the key-value interface for models: pstore.models["<name>"].

The model is stored as a dictionary (see ml.to_dict()) without the time series data. The time series in the model are picked up based on the names of those series from the respective libraries (oseries or stresses).

ml2 = pstore.get_models("oseries1")
ml2
Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False)
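
The idea of picking up time series by name when loading can be sketched with plain dictionaries. The layout of stored_model and the function load_model are hypothetical; the real model dictionary (see ml.to_dict()) contains considerably more information.

```python
# Toy "libraries": time series are stored once, by name.
libraries = {
    "oseries": {"oseries1": [27.61, 27.73, 27.91]},
    "stresses": {"prec1": [0.0033, 0.0025], "evap1": [0.0002, 0.0003]},
}

# Toy stored model: references series by name only (hypothetical layout).
stored_model = {
    "name": "oseries1",
    "oseries": "oseries1",
    "stresses": ["prec1", "evap1"],
}


def load_model(stored, libraries):
    """Return a copy of the stored model with series data re-attached."""
    model = dict(stored)
    model["oseries"] = {
        "name": stored["oseries"],
        "values": libraries["oseries"][stored["oseries"]],
    }
    model["stresses"] = {
        name: libraries["stresses"][name] for name in stored["stresses"]
    }
    return model


ml_toy = load_model(stored_model, libraries)
```

Because only names are stored with the model, the time series data exists in one place, and the stored model itself is left unchanged by loading.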

4.4 Overview of models

An overview of the models is available through pstore.models which lists the names of all the models:

pstore.models
<ModelAccessor> 1 model(s): 
['oseries1']

4.5 Deleting models

Deleting the model is done with pstore.del_models:

pstore.del_models("oseries1")
Deleted 1 model(s) from database.

Checking to see if it was indeed deleted:

pstore
<PastaStore> my_first_project: 
 - <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models
pstore.models
<ModelAccessor> 0 model(s): 
[]
pstore.model_names
[]
pstore.oseries_models
{}

5. Bulk operations

The following bulk operations are available:

  • create_models_bulk: create models for all or a selection of oseries in the database

  • solve_models: solve all or a selection of models in the database

  • model_results: get results for all or a selection of models in the database. Requires the art_tools module!

Let’s add some more data to the pastastore to show how the bulk operations work.

# oseries 2
o = pd.read_csv(datadir / "obs.csv", index_col=0, parse_dates=True)
pstore.add_oseries(o, "oseries2", metadata={"x": 100000, "y": 400000})

# prec 2
s = pd.read_csv(datadir / "rain.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "prec2", kind="prec", metadata={"x": 100000, "y": 400000})

# evap 2
s = pd.read_csv(datadir / "evap.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "evap2", kind="evap", metadata={"x": 100000, "y": 400000})
The Time Series 'oseries2' has nan-values. Pastas will use the fill_nan settings to fill up the nan-values.
The Time Series 'oseries2' has nan-values. Pastas will use the fill_nan settings to fill up the nan-values.

Let’s take a look at our PastaStore:

pstore
<PastaStore> my_first_project: 
 - <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 0 models

Let’s try using the bulk methods on our database. The pstore.create_models_bulk() method allows the user to create models for all or a selection of oseries in the database. Options include:

  • selecting specific oseries to create models for

  • automatically adding recharge based on nearest precipitation and evaporation stresses

  • solving the models

  • storing the models in the models library

Note: when using the progressbar, for a prettier result the pastas log level can be set to ERROR using: ps.set_log_level("ERROR") or ps.logger.setLevel("ERROR").

# to suppress most of the log messages
ps.logger.setLevel("ERROR")
errors = pstore.create_models_bulk()
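
A bulk operation that collects failures instead of stopping at the first one could look roughly like the sketch below. The function create_models_bulk_toy, its ignore_errors argument and the build helper are hypothetical; this notebook does not specify what pstore.create_models_bulk() returns beyond assigning the result to errors.

```python
def create_models_bulk_toy(names, build, ignore_errors=True):
    """Build one model per name; collect failures instead of stopping."""
    models, errors = {}, {}
    for name in names:
        try:
            models[name] = build(name)
        except Exception as exc:
            if not ignore_errors:
                raise
            errors[name] = str(exc)  # remember why this name failed
    return models, errors


def build(name):
    # Hypothetical builder: fails for a series without coordinates.
    if name == "no_xy":
        raise ValueError("missing x,y metadata")
    return f"Model({name})"


models, errors = create_models_bulk_toy(["oseries1", "oseries2", "no_xy"], build)
```

Collecting errors per name lets a long run finish for the series that do work, while the failures can be inspected afterwards.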

To solve all or a selection of models use pstore.solve_models(). Options for this method include:

  • selecting models to solve

  • storing results in the models library

  • raising an error (or not) when solving fails

  • printing solve reports

pstore
<PastaStore> my_first_project: 
 - <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 2 models
pstore.solve_models(report=False)

Obtaining the model parameters and statistics is easy with pstore.get_parameters() and pstore.get_statistics(). Results can be obtained for all or a selection of models. The results are returned as DataFrames.

params = pstore.get_parameters()
params
recharge_A recharge_a recharge_f constant_d
oseries2 429.424496 147.226264 -1.913980 28.383626
oseries1 631.256911 165.320923 -1.466316 28.081353
stats = pstore.get_statistics(["evp", "rmse"])
stats
evp rmse
oseries2 90.448729 0.114373
oseries1 93.188297 0.112180
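
For readers unfamiliar with the two statistics requested above, the sketch below computes them by hand for a few made-up values. It assumes the commonly used definitions: RMSE as the root of the mean squared residual, and EVP as the percentage of the observation variance explained by the simulation, clipped at zero. Pastas may apply weights or other details that are not reproduced here.

```python
import math


def rmse(residuals):
    """Root mean squared error of the residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def evp(observations, residuals):
    """Explained variance percentage, clipped at zero (assumed definition)."""
    ratio = variance(residuals) / variance(observations)
    return max(0.0, 100.0 * (1.0 - ratio))


obs = [27.6, 27.7, 27.9, 28.1, 28.3]  # made-up heads
sim = [27.7, 27.7, 27.8, 28.1, 28.4]  # made-up simulation
res = [o - s for o, s in zip(obs, sim)]

rmse_value = rmse(res)
evp_value = evp(obs, res)
```

Under these definitions RMSE is expressed in the units of the observations, while EVP ranges from 0 to 100.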

6. Deleting databases

The pastastore.util submodule contains functions for deleting database contents:

pst.util.delete_pastastore(pstore)
Deleting PasConnector database: 'my_first_connector' ... 
Done!