Managing time series and Pastas models with a database
This notebook shows how Pastas time series and models can be managed and stored on disk.
1. The Connector object
This section shows how to initialize a connection to a new database (connecting to an existing database works the same way).
Import pastastore and some other modules:
from pathlib import Path
import pandas as pd
import pastas as ps
import pastastore as pst
pst.show_versions()
Pastastore version : 1.12.0
Python version : 3.13.11
Pandas version : 2.3.3
Matplotlib version : 3.10.8
Pastas version : 1.12.0
PyYAML version : 6.0.3
1.1 PasConnector
The PasConnector requires the path to a directory and a name for the connector. Data is stored in JSON files in the given directory.
path = "./pas"
name = "my_first_connector"
Initialize the PasConnector object:
conn = pst.PasConnector(name, path)
PasConnector: library 'oseries' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/oseries'
PasConnector: library 'stresses' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/stresses'
PasConnector: library 'models' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/models'
PasConnector: library 'oseries_models' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/oseries_models'
PasConnector: library 'stresses_models' created in '/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/stresses_models'
Let’s take a look at conn. This shows us how many oseries, stresses and models are contained in the store:
conn
<PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models
1.2 Database structure
The database/store contains five libraries or collections, as listed in the output printed when the connector was created. Each of these contains specific data related to the project. The five libraries are:
oseries
stresses
models
oseries_models
stresses_models
These libraries are usually not meant to be accessed directly, but they can be accessed through the internal method conn._get_library():
# using the PasConnector
conn._get_library("stresses")
PosixPath('/home/david/github/pastastore/examples/notebooks/pas/my_first_connector/stresses')
The library handles are not generally used directly but internally they manage the reading, writing and deleting of data from the database/store. In the case of the PasConnector, the library is just a path to a directory.
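To make the idea of a directory-backed library concrete, here is a minimal sketch that uses only the Python standard library. It is not pastastore's implementation: the file layout and the helper names (`create_libraries`, `write_item`, `count_items`) are hypothetical. It only illustrates one subdirectory per library with one JSON file per item.

```python
import json
import tempfile
from pathlib import Path

# Library names as reported when the PasConnector was created above.
LIBRARIES = ["oseries", "stresses", "models", "oseries_models", "stresses_models"]


def create_libraries(root, name):
    """Create one subdirectory per library and return the handles (paths)."""
    handles = {}
    for lib in LIBRARIES:
        libpath = Path(root) / name / lib
        libpath.mkdir(parents=True, exist_ok=True)
        handles[lib] = libpath
    return handles


def write_item(handles, lib, item_name, data):
    """Store one item as a JSON file inside the library directory."""
    fpath = handles[lib] / f"{item_name}.json"
    fpath.write_text(json.dumps(data))
    return fpath


def count_items(handles, lib):
    """Count the JSON files in a library directory."""
    return len(list(handles[lib].glob("*.json")))


# Use a temporary directory so the sketch leaves nothing behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    handles = create_libraries(tmp, "toy_connector")
    write_item(handles, "stresses", "prec1", {"kind": "prec", "x": 100300, "y": 400400})
    n_stresses = count_items(handles, "stresses")
    n_oseries = count_items(handles, "oseries")
    stored = json.loads((handles["stresses"] / "prec1.json").read_text())
```

In this toy version a library handle is nothing more than a `Path`, which mirrors what `conn._get_library("stresses")` returns for the PasConnector.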
2. Initializing a PastaStore object
The PastaStore object is used to process and use the data in the database. The
connector objects only manage the reading, writing and deleting of data. The
PastaStore contains the methods that actually do something with
that data.
In order to access the data the PastaStore object must be initialized with a
Connector object.
pstore = pst.PastaStore(conn, name="my_first_project")
Let’s take a look at the object:
pstore
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models
The connector object is accessible through pstore.conn, so all of the methods defined in the connector objects can be accessed through e.g. pstore.conn.<method>. The most common methods are also registered on the store object for easier access. The following two statements are equivalent:
pstore.conn.get_oseries
<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
pstore.get_oseries
<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
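One way such a registration could work is sketched below with toy classes. `ToyConnector` and `ToyStore` are hypothetical stand-ins and the mechanism shown (binding connector methods as attributes on the store) is an assumption for illustration, not necessarily how pastastore does it.

```python
class ToyConnector:
    """Stand-in for a connector: it only reads and writes data."""

    def __init__(self):
        self._oseries = {"oseries1": [27.61, 27.73]}

    def get_oseries(self, name):
        return self._oseries[name]


class ToyStore:
    """Stand-in for a store that exposes selected connector methods."""

    _registered = ["get_oseries"]

    def __init__(self, conn):
        self.conn = conn
        # Attach the connector's bound methods to the store itself.
        for method_name in self._registered:
            setattr(self, method_name, getattr(conn, method_name))


store = ToyStore(ToyConnector())

# Both names refer to the same bound method of the connector.
same_method = store.get_oseries == store.conn.get_oseries
values = store.get_oseries("oseries1")
```

This matches what the two outputs above show: both attributes resolve to a bound method of the connector, not of the store.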
3. Managing time series
This section explains how time series can be added, retrieved or deleted from the database. We’ll be using the PastaStore instance we created before.
3.1 Adding oseries and stresses
Let’s read some data to put into the database as an oseries. The data we are using is in the tests/data directory.
datadir = Path("../../tests/data/") # relative path to data directory
oseries1 = pd.read_csv(datadir / "head_nb1.csv", index_col=0, parse_dates=True)
oseries1.head()
| head | |
|---|---|
| date | |
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |
Add the time series to the oseries library using pstore.add_oseries. Metadata can optionally be provided as a dictionary. In this example a dictionary with x and y coordinates is passed as metadata, which is convenient later for automatically creating Pastas models.
pstore.add_oseries(oseries1, "oseries1", metadata={"x": 100300, "y": 400400})
The series was added to the oseries library. Let’s confirm by looking at the store object:
pstore
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 1 oseries, 0 stresses, 0 models
Stresses can be added similarly using pstore.add_stress. The only thing to keep in mind when adding stresses is to pass the kind argument so that different types of stresses (e.g. precipitation or evaporation) can be distinguished. The code below reads the precipitation and evaporation csv-files and adds them to our project:
# prec
s = pd.read_csv(datadir / "rain_nb1.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "prec1", kind="prec", metadata={"x": 100300, "y": 400400})
# evap
s = pd.read_csv(datadir / "evap_nb1.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "evap1", kind="evap", metadata={"x": 100300, "y": 400400})
pstore
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models
3.2 Accessing time series and metadata
Time series can be accessed through pstore.get_oseries() or pstore.get_stresses(). These methods accept a single name or a list of names. In the latter case a dictionary of time series, keyed by name, is returned.
ts = pstore.get_oseries("oseries1")
ts.head()
1985-11-14 27.61
1985-11-28 27.73
1985-12-14 27.91
1985-12-28 28.13
1986-01-13 28.32
Name: oseries1, dtype: float64
Using a list of names:
stresses = pstore.get_stresses(["prec1", "evap1"])
stresses
{'prec1': 1980-01-01 0.0033
1980-01-02 0.0025
1980-01-03 0.0003
1980-01-04 0.0075
1980-01-05 0.0080
...
2016-10-27 0.0000
2016-10-28 0.0000
2016-10-29 0.0003
2016-10-30 0.0000
2016-10-31 0.0000
Name: prec1, Length: 13454, dtype: float64,
'evap1': 1980-01-01 0.0002
1980-01-02 0.0003
1980-01-03 0.0002
1980-01-04 0.0001
1980-01-05 0.0001
...
2016-11-18 0.0004
2016-11-19 0.0003
2016-11-20 0.0005
2016-11-21 0.0003
2016-11-22 0.0005
Name: evap1, Length: 13476, dtype: float64}
The metadata of a time series can be accessed through pstore.get_metadata(). Provide the library and the name to load the metadata for an oseries…
meta = pstore.get_metadata("oseries", "oseries1")
meta
| x | y | |
|---|---|---|
| name | ||
| oseries1 | 100300 | 400400 |
or for multiple stresses:
meta = pstore.get_metadata("stresses", ["prec1", "evap1"])
meta
| x | y | kind | |
|---|---|---|---|
| name | |||
| prec1 | 100300.0 | 400400.0 | prec |
| evap1 | 100300.0 | 400400.0 | evap |
3.3 Deleting oseries and stresses
Deleting time series can be done using pstore.del_oseries or pstore.del_stress. These functions accept a single name or list of names of time series to delete.
3.4 Overview of oseries and stresses
An overview of the oseries and stresses is available through pstore.oseries and pstore.stresses. These are dataframes containing the metadata of all the time series. These dataframes are cached for performance. The cache is cleared when a time series is added or modified in the database.
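The caching behaviour described above can be illustrated with a small toy class. `ToyLibrary` and its attributes are hypothetical, and the overview is a plain dictionary instead of a DataFrame; the sketch only shows the pattern of building an overview once and discarding it when data is written.

```python
class ToyLibrary:
    """Toy metadata library with a cached overview."""

    def __init__(self):
        self._metadata = {}
        self._overview_cache = None
        self.n_builds = 0  # counts how often the overview is rebuilt

    def add(self, name, metadata):
        self._metadata[name] = dict(metadata)
        # Writing invalidates the cached overview.
        self._overview_cache = None

    @property
    def overview(self):
        if self._overview_cache is None:
            self.n_builds += 1
            self._overview_cache = {
                name: dict(meta) for name, meta in sorted(self._metadata.items())
            }
        return self._overview_cache


lib = ToyLibrary()
lib.add("oseries1", {"x": 100300, "y": 400400})

first = lib.overview   # built on first access
second = lib.overview  # served from the cache
builds_before = lib.n_builds

lib.add("oseries2", {"x": 100000, "y": 400000})  # clears the cache
third = lib.overview   # rebuilt, now with both series
builds_after = lib.n_builds
```

Repeated reads are cheap because only the first access after a write rebuilds the overview.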
pstore.oseries
| x | y | |
|---|---|---|
| name | ||
| oseries1 | 100300 | 400400 |
pstore.stresses
| x | y | kind | |
|---|---|---|---|
| name | |||
| prec1 | 100300.0 | 400400.0 | prec |
| evap1 | 100300.0 | 400400.0 | evap |
4. Managing Pastas models
This section shows how Pastas models can be created, stored, and loaded from the database.
4.1 Creating a model
Creating a new model is straightforward using pstore.create_model(). The
add_recharge keyword argument allows the user to choose (default is True)
whether recharge is automatically added to the model using the nearest
precipitation and evaporation stations in the stresses library. Note that the
x,y-coordinates of the stresses and oseries must be set in the metadata for
each time series in order for this to work.
ml = pstore.create_model("oseries1", add_recharge=True)
ml
Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False)
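The reason the coordinates matter can be shown with a small sketch of a nearest-station lookup. The function `nearest_stress` is hypothetical and uses a plain Euclidean distance, which is an assumption about how "nearest" is determined. The coordinates are the metadata values used in this notebook (the second pair of stresses is added later, in the bulk operations section).

```python
import math

# Metadata of the stresses: coordinates and kind.
stresses = {
    "prec1": {"x": 100300, "y": 400400, "kind": "prec"},
    "evap1": {"x": 100300, "y": 400400, "kind": "evap"},
    "prec2": {"x": 100000, "y": 400000, "kind": "prec"},
    "evap2": {"x": 100000, "y": 400000, "kind": "evap"},
}


def nearest_stress(x, y, stresses, kind):
    """Return the name of the stress of the given kind closest to (x, y)."""
    candidates = {n: m for n, m in stresses.items() if m["kind"] == kind}
    if not candidates:
        raise ValueError(f"no stresses of kind '{kind}' available")
    return min(
        candidates,
        key=lambda n: math.hypot(candidates[n]["x"] - x, candidates[n]["y"] - y),
    )


# oseries1 is located at (100300, 400400).
prec_for_oseries1 = nearest_stress(100300, 400400, stresses, "prec")
evap_for_oseries1 = nearest_stress(100300, 400400, stresses, "evap")

# A made-up location that lies closer to the second pair of stations.
prec_for_other = nearest_stress(100050, 400100, stresses, "prec")
```

Without x and y in the metadata there is nothing to compute a distance from, and without the kind there is no way to tell precipitation from evaporation.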
4.2 Storing a model
The model that was created in the previous step is not automatically stored in
the models library. Use pstore.add_model() to store the model. If a model with
the same name already exists, an Exception is raised to warn the user that the
model is already in the library. Use overwrite=True to add the model anyway.
ml.solve()
Fit report oseries1 Fit Statistics
==================================================
nfev 20 EVP 93.19
nobs 644 R2 0.93
noise False RMSE 0.11
tmin 1985-11-14 00:00:00 AICc -2809.63
tmax 2015-06-28 00:00:00 BIC -2791.83
freq D Obj 4.05
freq_obs None ___
warmup 3650 days 00:00:00 Interp. No
solver LeastSquares weights Yes
Parameters (4 optimized)
==================================================
optimal initial vary
recharge_A 631.256911 215.674528 True
recharge_a 165.320923 10.000000 True
recharge_f -1.466316 -1.000000 True
constant_d 28.081353 27.900078 True
pstore.add_model(ml)
pstore.models
<ModelAccessor> 1 model(s):
['oseries1']
pstore.model_names
['oseries1']
pstore.oseries_models
{'oseries1': ['oseries1']}
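The overwrite check and the link between an oseries and its models can be sketched with a toy library. `ToyModelLibrary` and `ModelExistsError` are hypothetical names used only for illustration; the exception type that pastastore actually raises is not assumed here.

```python
class ModelExistsError(Exception):
    """Hypothetical exception type used only in this sketch."""


class ToyModelLibrary:
    def __init__(self):
        self.models = {}          # model name -> stored model dictionary
        self.oseries_models = {}  # oseries name -> list of model names

    def add_model(self, model, overwrite=False):
        name = model["name"]
        if name in self.models and not overwrite:
            raise ModelExistsError(
                f"Model '{name}' already in library, use overwrite=True."
            )
        self.models[name] = model
        # Keep track of which models belong to which oseries.
        linked = self.oseries_models.setdefault(model["oseries"], [])
        if name not in linked:
            linked.append(name)


lib = ToyModelLibrary()
lib.add_model({"name": "oseries1", "oseries": "oseries1", "solved": False})

# Adding the same name again without overwrite is refused.
try:
    lib.add_model({"name": "oseries1", "oseries": "oseries1", "solved": True})
    raised = False
except ModelExistsError:
    raised = True

# With overwrite=True the stored model is replaced.
lib.add_model({"name": "oseries1", "oseries": "oseries1", "solved": True}, overwrite=True)
```

The `oseries_models` mapping in the sketch has the same shape as the `pstore.oseries_models` output above.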
4.3 Loading a model
Loading a stored model is simple using pstore.get_models("<name>") or using
the key-value interface for models: pstore.models["<name>"].
The model is stored as a dictionary (see ml.to_dict()) without the time
series data. The time series in the model are picked up based on the names of
those series from the respective libraries (oseries or stresses).
ml2 = pstore.get_models("oseries1")
ml2
Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=False)
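The idea of storing a model without its time series and looking the series up by name when loading can be sketched as follows. The dictionary layout and the function `load_model` are hypothetical and much simpler than the real output of `ml.to_dict()`; the values are the first few entries of the series shown earlier.

```python
import json

# Toy libraries holding the time series by name.
libraries = {
    "oseries": {"oseries1": [27.61, 27.73, 27.91]},
    "stresses": {
        "prec1": [0.0033, 0.0025, 0.0003],
        "evap1": [0.0002, 0.0003, 0.0002],
    },
}

# What is stored for the model: names only, no time series data.
stored_model = {"name": "oseries1", "oseries": "oseries1", "stresses": ["prec1", "evap1"]}
as_json = json.dumps(stored_model)


def load_model(stored, libraries):
    """Rebuild a model dictionary by looking up each series by name."""
    return {
        "name": stored["name"],
        "oseries": libraries["oseries"][stored["oseries"]],
        "stresses": {n: libraries["stresses"][n] for n in stored["stresses"]},
    }


loaded = load_model(json.loads(as_json), libraries)
```

A consequence of this design is that a loaded model reflects the series currently in the libraries, so a series that was modified after the model was stored changes what is loaded.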
4.4 Overview of models
An overview of the models is available through pstore.models which lists the names of all the models:
pstore.models
<ModelAccessor> 1 model(s):
['oseries1']
4.5 Deleting models
Deleting the model is done with pstore.del_models:
pstore.del_models("oseries1")
Deleted 1 model(s) from database.
Checking to see if it was indeed deleted:
pstore
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models
pstore.models
<ModelAccessor> 0 model(s):
[]
pstore.model_names
[]
pstore.oseries_models
{}
5. Bulk operations
The following bulk operations are available:
create_models: create models for all or a selection of oseries in database
solve_models: solve all or selection of models in database
model_results: get results for all or selection of models in database. Requires the art_tools module!
Let’s add some more data to the pastastore to show how the bulk operations work.
# oseries 2
o = pd.read_csv(datadir / "obs.csv", index_col=0, parse_dates=True)
pstore.add_oseries(o, "oseries2", metadata={"x": 100000, "y": 400000})
# prec 2
s = pd.read_csv(datadir / "rain.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "prec2", kind="prec", metadata={"x": 100000, "y": 400000})
# evap 2
s = pd.read_csv(datadir / "evap.csv", index_col=0, parse_dates=True)
pstore.add_stress(s, "evap2", kind="evap", metadata={"x": 100000, "y": 400000})
The Time Series 'oseries2' has nan-values. Pastas will use the fill_nan settings to fill up the nan-values.
The Time Series 'oseries2' has nan-values. Pastas will use the fill_nan settings to fill up the nan-values.
Let’s take a look at our PastaStore:
pstore
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 0 models
Let’s try using the bulk methods on our database. The pstore.create_models_bulk()
method allows the user to create models for all or a selection of oseries in the
database. Options include:
selecting specific oseries to create models for
automatically adding recharge based on nearest precipitation and evaporation stresses
solving the models
storing the models in the models library
Note: when using the progressbar, for a prettier result the pastas log
level can be set to ERROR using: ps.set_log_level("ERROR") or
ps.logger.setLevel("ERROR").
# to suppress most of the log messages
ps.logger.setLevel("ERROR")
errors = pstore.create_models_bulk()
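The call above returns an object named errors. A common pattern for bulk operations, sketched below, is to collect failures per name instead of stopping at the first one. The function `create_models_sketch`, its `ignore_errors` parameter and the toy `toy_create` function are hypothetical; the parameters and return value of the real method may differ.

```python
def create_models_sketch(names, create_func, ignore_errors=True):
    """Apply create_func to every name and collect failures per name."""
    models, errors = {}, {}
    for name in names:
        try:
            models[name] = create_func(name)
        except Exception as exc:
            if not ignore_errors:
                raise
            errors[name] = exc
    return models, errors


def toy_create(name):
    """Toy model factory that fails for one made-up series."""
    if name == "bad_series":
        raise ValueError("no stresses available")
    return {"name": name}


names = ["oseries1", "bad_series", "oseries2"]

# Failures are recorded and the loop continues.
models, errors = create_models_sketch(names, toy_create)

# With ignore_errors=False the first failure stops the loop.
try:
    create_models_sketch(names, toy_create, ignore_errors=False)
    stopped_early = False
except ValueError:
    stopped_early = True
```

Inspecting the collected errors afterwards shows which series need attention without rerunning the whole batch.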
To solve all or a selection of models use pstore.solve_models(). Options for this method include:
selecting models to solve
storing the results in the models library
raising an error (or not) when solving fails
printing solve reports
pstore
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 2 models
pstore.solve_models(report=False)
Obtaining the model parameters and statistics is easy with
pstore.get_parameters() and pstore.get_statistics(). Results can be
obtained for all or a selection of models. The results are returned as
DataFrames.
params = pstore.get_parameters()
params
| recharge_A | recharge_a | recharge_f | constant_d | |
|---|---|---|---|---|
| oseries2 | 429.424496 | 147.226264 | -1.913980 | 28.383626 |
| oseries1 | 631.256911 | 165.320923 | -1.466316 | 28.081353 |
stats = pstore.get_statistics(["evp", "rmse"])
stats
| evp | rmse | |
|---|---|---|
| oseries2 | 90.448729 | 0.114373 |
| oseries1 | 93.188297 | 0.112180 |
6. Deleting databases
The pastastore.util submodule contains functions for deleting database contents:
pst.util.delete_pastastore(pstore)
Deleting PasConnector database: 'my_first_connector' ...
Done!