Managing time series and Pastas models with a database
This notebook shows how Pastas time series and models can be managed and stored on disk.
Content
1. The Connector object
   - PasConnector
   - Database structure
2. Initializing a PastaStore
3. Managing time series
   - Adding oseries and stresses
   - Accessing time series and metadata
   - Deleting oseries and stresses
   - Overview of oseries and stresses
4. Managing Pastas models
   - Creating a model
   - Storing a model
   - Loading a model
   - Overview of models
   - Deleting models
5. Bulk operations
6. Deleting databases
1. The Connector object
This section shows how to initialize a connection to a new database (connecting to an existing database works the same way).
Import pastastore and some other modules:
[1]:
import pastastore as pst
import os
import pandas as pd
import pastas as ps
ps.show_versions()
Python version: 3.9.7
NumPy version: 1.21.2
Pandas version: 1.5.2
SciPy version: 1.10.0
Matplotlib version: 3.6.1
Numba version: 0.55.1
LMfit version: 1.0.3
Latexify version: Not Installed
Pastas version: 1.0.0b
1.1 PasConnector
The PasConnector requires the path to a directory and a name for the connector. Data is stored in JSON files in the given directory.
[2]:
path = "./pas"
name = "my_first_connector"
Initialize the PasConnector object:
[3]:
conn = pst.PasConnector(name, path)
PasConnector: library oseries created in /home/david/Github/pastastore/examples/notebooks/pas/oseries
PasConnector: library stresses created in /home/david/Github/pastastore/examples/notebooks/pas/stresses
PasConnector: library models created in /home/david/Github/pastastore/examples/notebooks/pas/models
PasConnector: library oseries_models created in /home/david/Github/pastastore/examples/notebooks/pas/oseries_models
Let’s take a look at conn. This shows how many oseries, stresses and models are contained in the store:
[4]:
conn
[4]:
<PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models
1.2 Database structure
The database/store contains four libraries (or collections). Each of these contains specific data related to the project. The four libraries are:
- oseries
- stresses
- models
- oseries_models
These libraries are usually not meant to be accessed directly, but they can be accessed through the internal method conn._get_library():
[5]:
# using the PasConnector
conn._get_library("stresses")
[5]:
'/home/david/Github/pastastore/examples/notebooks/pas/stresses'
The library handles are generally not used directly; internally they manage the reading, writing and deleting of data in the database/store. In the case of the PasConnector, the library is just a path to a directory.
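To make the directory-based layout concrete, the sketch below builds a comparable structure using only the standard library. It does not call pastastore, and the file name example.json and its contents are hypothetical placeholders; the files actually written by the PasConnector may be named and structured differently.

```python
import json
import tempfile
from pathlib import Path

# The four library names used by the store (taken from the text above).
LIBRARIES = ["oseries", "stresses", "models", "oseries_models"]

# Work in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "pas"

    # One sub-directory per library.
    for lib in LIBRARIES:
        (root / lib).mkdir(parents=True)

    # Hypothetical item: a JSON file stored inside the "stresses" library.
    item = root / "stresses" / "example.json"
    item.write_text(json.dumps({"kind": "prec", "x": 100300, "y": 400400}))

    # Collect results before the temporary directory is removed.
    created = sorted(p.name for p in root.iterdir())
    loaded = json.loads(item.read_text())

print(created)
print(loaded["kind"])
```

Reading, writing and deleting an item then reduces to ordinary file operations inside the matching sub-directory.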
2. Initializing a PastaStore object
The PastaStore object is used to process and use the data in the database. The connector objects only manage the reading, writing and deleting of data. The PastaStore contains the methods that actually do something with that data.
In order to access the data, the PastaStore object must be initialized with a Connector object.
[6]:
pstore = pst.PastaStore("my_first_project", conn)
Let’s take a look at the object:
[7]:
pstore
[7]:
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models
The connector object is accessible through pstore.conn, so all of the methods defined in the connector objects can be accessed through e.g. pstore.conn.<method>. The most common methods are also registered under the pstore object for easier access. The following statements are equivalent.
[8]:
pstore.conn.get_oseries
[8]:
<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
[9]:
pstore.get_oseries
[9]:
<bound method BaseConnector.get_oseries of <PasConnector> 'my_first_connector': 0 oseries, 0 stresses, 0 models>
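The idea of registering connector methods on the store can be sketched in plain Python. The classes Connector and Store below are hypothetical stand-ins written for illustration; they are not the pastastore classes, and the way pastastore registers its methods internally may differ.

```python
class Connector:
    """Hypothetical stand-in for a connector; keeps series in a dict."""

    def __init__(self):
        self._oseries = {}

    def add_oseries(self, series, name):
        self._oseries[name] = series

    def get_oseries(self, name):
        return self._oseries[name]


class Store:
    """Hypothetical store that re-exposes selected connector methods."""

    # Names of connector methods that should be reachable from the store.
    _registered = ("add_oseries", "get_oseries")

    def __init__(self, conn):
        self.conn = conn
        for method_name in self._registered:
            # Attach the connector's bound method as an attribute of the store.
            setattr(self, method_name, getattr(conn, method_name))


store = Store(Connector())
store.add_oseries([27.61, 27.73], "oseries1")

# Both routes reach the same bound method on the connector.
same_method = store.get_oseries == store.conn.get_oseries
print(same_method)
print(store.conn.get_oseries("oseries1"))
```

Because the store attribute is the connector's own bound method, its representation still mentions the connector, which matches the two outputs shown above.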
3. Managing time series
This section explains how time series can be added, retrieved or deleted from the database. We’ll be using the PastaStore instance we created before.
3.1 Adding oseries and stresses
Let’s read some data to put into the database as an oseries. The data we are using is in the tests/data directory.
[10]:
datadir = "../../tests/data/" # relative path to data directory
oseries1 = pd.read_csv(
os.path.join(datadir, "head_nb1.csv"), index_col=0, parse_dates=True
)
oseries1.head()
[10]:
| date | head |
|---|---|
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |
Add the time series to the oseries library using pstore.add_oseries. Metadata can optionally be provided as a dictionary. In this example a dictionary with x and y coordinates is passed as metadata, which is convenient later for automatically creating Pastas models.
[11]:
pstore.add_oseries(oseries1, "oseries1", metadata={"x": 100300, "y": 400400})
The series was added to the oseries library. Let’s confirm by looking at the pstore object:
[12]:
pstore
[12]:
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 1 oseries, 0 stresses, 0 models
Stresses can be added similarly using pstore.add_stress. The only thing to keep in mind when adding stresses is to pass the kind argument so that different types of stresses (e.g. precipitation or evaporation) can be distinguished. The code below reads the precipitation and evaporation CSV files and adds them to our project:
[13]:
# prec
s = pd.read_csv(os.path.join(datadir, "rain_nb1.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "prec1", kind="prec", metadata={"x": 100300, "y": 400400})
# evap
s = pd.read_csv(os.path.join(datadir, "evap_nb1.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "evap1", kind="evap", metadata={"x": 100300, "y": 400400})
[14]:
pstore
[14]:
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models
3.2 Accessing time series and metadata
Time series can be accessed through pstore.get_oseries() or pstore.get_stresses(). These methods accept a single name or a list of names. In the latter case a dictionary of DataFrames is returned.
[15]:
ts = pstore.get_oseries("oseries1")
ts.head()
[15]:
|  | oseries1 |
|---|---|
| 1985-11-14 | 27.61 |
| 1985-11-28 | 27.73 |
| 1985-12-14 | 27.91 |
| 1985-12-28 | 28.13 |
| 1986-01-13 | 28.32 |
Using a list of names:
[16]:
stresses = pstore.get_stresses(["prec1", "evap1"])
stresses
[16]:
{'prec1': prec1
1980-01-01 0.0033
1980-01-02 0.0025
1980-01-03 0.0003
1980-01-04 0.0075
1980-01-05 0.0080
... ...
2016-10-27 0.0000
2016-10-28 0.0000
2016-10-29 0.0003
2016-10-30 0.0000
2016-10-31 0.0000
[13454 rows x 1 columns],
'evap1': evap1
1980-01-01 0.0002
1980-01-02 0.0003
1980-01-03 0.0002
1980-01-04 0.0001
1980-01-05 0.0001
... ...
2016-11-18 0.0004
2016-11-19 0.0003
2016-11-20 0.0005
2016-11-21 0.0003
2016-11-22 0.0005
[13476 rows x 1 columns]}
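The calling convention shown above (one name returns one object, a list of names returns a dictionary keyed by name) can be sketched as follows. The function get_series and the plain-list library are hypothetical and only illustrate the pattern; the values are the first two entries of the series printed above.

```python
def get_series(library, names):
    """Return one item for a single name, or a dict for a list of names."""
    if isinstance(names, str):
        return library[names]
    return {name: library[name] for name in names}


# Plain lists stand in for the DataFrames held by the real store.
library = {"prec1": [0.0033, 0.0025], "evap1": [0.0002, 0.0003]}

single = get_series(library, "prec1")
several = get_series(library, ["prec1", "evap1"])

print(single)
print(list(several))
```

Checking for a string first matters, because a string is itself iterable and would otherwise be treated as a list of one-character names.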
The metadata of a time series can be accessed through pstore.get_metadata(). Provide the library and the name to load the metadata for an oseries…
[17]:
meta = pstore.get_metadata("oseries", "oseries1")
meta
[17]:
| name | x | y |
|---|---|---|
| oseries1 | 100300 | 400400 |
or for multiple stresses:
[18]:
meta = pstore.get_metadata("stresses", ["prec1", "evap1"])
meta
[18]:
| name | x | y | kind |
|---|---|---|---|
| prec1 | 100300.0 | 400400.0 | prec |
| evap1 | 100300.0 | 400400.0 | evap |
3.3 Deleting oseries and stresses
Deleting time series can be done using pstore.del_oseries or pstore.del_stress. These functions accept a single name or a list of names of time series to delete.
3.4 Overview of oseries and stresses
An overview of the oseries and stresses is available through pstore.oseries and pstore.stresses. These are DataFrames containing the metadata of all the time series. These DataFrames are cached for performance. The cache is cleared when a time series is added or modified in the database.
[19]:
pstore.oseries
[19]:
| name | x | y |
|---|---|---|
| oseries1 | 100300 | 400400 |
[20]:
pstore.stresses
[20]:
| name | x | y | kind |
|---|---|---|---|
| evap1 | 100300.0 | 400400.0 | evap |
| prec1 | 100300.0 | 400400.0 | prec |
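The caching behaviour of these overviews can be illustrated with a small sketch. The class MetadataOverview is hypothetical and uses a list of dictionaries instead of a DataFrame; it only shows the general pattern of building an overview once and clearing it when data changes, not how pastastore implements its cache.

```python
class MetadataOverview:
    """Hypothetical container that caches an overview of its metadata."""

    def __init__(self):
        self._metadata = {}  # name -> metadata dict
        self._cache = None   # cached overview, or None when stale
        self.n_builds = 0    # counts how often the overview was rebuilt

    def add(self, name, metadata):
        self._metadata[name] = dict(metadata)
        self._cache = None  # adding or modifying data clears the cache

    @property
    def overview(self):
        if self._cache is None:
            # Rebuild the overview only when no valid cache is available.
            self._cache = [
                {"name": name, **meta}
                for name, meta in sorted(self._metadata.items())
            ]
            self.n_builds += 1
        return self._cache


store = MetadataOverview()
store.add("oseries1", {"x": 100300, "y": 400400})
first = store.overview
second = store.overview  # served from the cache, no rebuild
builds_before = store.n_builds

store.add("oseries2", {"x": 100000, "y": 400000})
third = store.overview  # cache was cleared, so the overview is rebuilt
builds_after = store.n_builds

print(builds_before, builds_after)
```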
4. Managing Pastas models
This section shows how Pastas models can be created, stored, and loaded from the database.
4.1 Creating a model
Creating a new model is straightforward using pstore.create_model(). The add_recharge keyword argument allows the user to choose (default is True) whether recharge is automatically added to the model using the nearest precipitation and evaporation stations in the stresses library. Note that the x and y coordinates of the stresses and oseries must be set in the metadata for each time series in order for this to work.
[21]:
ml = pstore.create_model("oseries1", add_recharge=True)
ml
[21]:
Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
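A minimal sketch of how a nearest station could be selected from the metadata is given below. It assumes straight-line (Euclidean) distance between the x and y coordinates, which is an assumption made for illustration and not necessarily the criterion used by pastastore. The function nearest_stress is hypothetical; the coordinates are the ones used in this notebook, including the second set of stresses that is added in the bulk operations section.

```python
import math

# Metadata as used in this notebook (x, y coordinates and stress kind).
oseries_meta = {
    "oseries1": {"x": 100300, "y": 400400},
    "oseries2": {"x": 100000, "y": 400000},
}
stresses_meta = {
    "prec1": {"x": 100300, "y": 400400, "kind": "prec"},
    "evap1": {"x": 100300, "y": 400400, "kind": "evap"},
    "prec2": {"x": 100000, "y": 400000, "kind": "prec"},
    "evap2": {"x": 100000, "y": 400000, "kind": "evap"},
}


def nearest_stress(target, stresses, kind):
    """Return the name of the closest stress of the requested kind."""
    candidates = {n: m for n, m in stresses.items() if m["kind"] == kind}
    return min(
        candidates,
        key=lambda n: math.hypot(
            candidates[n]["x"] - target["x"], candidates[n]["y"] - target["y"]
        ),
    )


for name, meta in oseries_meta.items():
    prec = nearest_stress(meta, stresses_meta, "prec")
    evap = nearest_stress(meta, stresses_meta, "evap")
    print(name, prec, evap)
```

The sketch also shows why the kind argument and the coordinates are both needed: the kind narrows the candidates, and the coordinates rank them.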
4.2 Storing a model
The model that was created in the previous step is not automatically stored in the models library. Use pstore.add_model() to store the model. If the model already exists, an Exception is raised warning the user that the model is already in the library. Use overwrite=True to add the model anyway.
Note: The model is stored without the time series. It is assumed the time series are already stored in the oseries or stresses libraries, making it redundant to store these again. When adding the model, the stored copy of the time series is compared to the version in the model to ensure these are the same. If not, an error is raised and the model cannot be stored. These validation options can be overridden, but that is only recommended for advanced users.
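The validation step described in the note can be sketched with plain Python containers. The function add_model below is a hypothetical illustration, not the pastastore method of the same name: it compares the series carried by the model with the stored copy and then keeps only a reference to the series name.

```python
def add_model(models, oseries_library, model):
    """Store a model by reference after checking its series against the library."""
    name = model["oseries_name"]
    if oseries_library[name] != model["oseries_values"]:
        raise ValueError(f"Series '{name}' in the model differs from the stored copy.")
    # Keep only the series name and the parameters, not the data itself.
    models[model["name"]] = {"oseries": name, "parameters": model["parameters"]}


oseries_library = {"oseries1": [27.61, 27.73, 27.91]}
models = {}

good = {
    "name": "oseries1",
    "oseries_name": "oseries1",
    "oseries_values": [27.61, 27.73, 27.91],
    "parameters": {"constant_d": 27.920134},
}
add_model(models, oseries_library, good)

# A model whose series no longer matches the stored copy is rejected.
bad = dict(good, name="altered", oseries_values=[27.61, 27.73, 99.0])
try:
    add_model(models, oseries_library, bad)
    rejected = False
except ValueError:
    rejected = True

print(sorted(models), rejected)
```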
[22]:
ml.solve()
Fit report oseries1 Fit Statistics
==================================================
nfev 22 EVP 92.91
nobs 644 R2 0.93
noise True RMSE 0.11
tmin 1985-11-14 00:00:00 AIC -3257.53
tmax 2015-06-28 00:00:00 BIC -3235.20
freq D Obj 2.02
warmup 3650 days 00:00:00 ___
solver LeastSquares Interp. No
Parameters (5 optimized)
==================================================
optimal stderr initial vary
recharge_A 686.247135 ±5.30% 215.674528 True
recharge_a 159.386040 ±5.01% 10.000000 True
recharge_f -1.305359 ±4.04% -1.000000 True
constant_d 27.920134 ±0.21% 27.900078 True
noise_alpha 49.911868 ±11.86% 15.000000 True
[23]:
pstore.add_model(ml)
[24]:
pstore.models
[24]:
['oseries1']
[25]:
pstore.oseries_models
[25]:
{'oseries1': ['oseries1']}
4.3 Loading a model
Loading a stored model is simple using pstore.get_models("<name>") or using the key-value interface for models: pstore.models["<name>"].
The model is stored as a dictionary (see ml.to_dict()) without the time series data. The time series in the model are picked up from the respective libraries (oseries or stresses) based on their names.
[26]:
ml2 = pstore.get_models("oseries1")
ml2
[26]:
Model(oseries=oseries1, name=oseries1, constant=True, noisemodel=True)
4.4 Overview of models
An overview of the models is available through pstore.models, which lists the names of all the models:
[27]:
pstore.models
[27]:
['oseries1']
4.5 Deleting models
Deleting the model is done with pstore.del_models:
[28]:
pstore.del_models("oseries1")
Checking to see if it was indeed deleted:
[29]:
pstore
[29]:
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 1 oseries, 2 stresses, 0 models
[30]:
pstore.models
[30]:
[]
[31]:
pstore.oseries_models
[31]:
{}
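The outputs above show that oseries_models maps each oseries name to the names of the models built on it, and that the entry disappears once its last model is deleted. A minimal sketch of such bookkeeping is given below; the functions link_model and unlink_model are hypothetical and do not reflect how pastastore maintains this library internally.

```python
def link_model(oseries_models, oseries_name, model_name):
    """Register a model name under the oseries it belongs to."""
    names = oseries_models.setdefault(oseries_name, [])
    if model_name not in names:
        names.append(model_name)


def unlink_model(oseries_models, oseries_name, model_name):
    """Remove a model name; drop the oseries entry when no models remain."""
    names = oseries_models.get(oseries_name, [])
    if model_name in names:
        names.remove(model_name)
    if not names:
        oseries_models.pop(oseries_name, None)


oseries_models = {}

link_model(oseries_models, "oseries1", "oseries1")
after_add = {k: list(v) for k, v in oseries_models.items()}

unlink_model(oseries_models, "oseries1", "oseries1")
after_delete = dict(oseries_models)

print(after_add)
print(after_delete)
```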
5. Bulk operations
The following bulk operations are available:
- create_models_bulk: create models for all or a selection of oseries in the database
- solve_models: solve all or a selection of models in the database
- model_results: get results for all or a selection of models in the database (requires the art_tools module)
Let’s add some more data to the PastaStore to show how the bulk operations work.
[32]:
# oseries 2
o = pd.read_csv(os.path.join(datadir, "obs.csv"), index_col=0, parse_dates=True)
pstore.add_oseries(o, "oseries2", metadata={"x": 100000, "y": 400000})
# prec 2
s = pd.read_csv(os.path.join(datadir, "rain.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "prec2", kind="prec", metadata={"x": 100000, "y": 400000})
# evap 2
s = pd.read_csv(os.path.join(datadir, "evap.csv"), index_col=0, parse_dates=True)
pstore.add_stress(s, "evap2", kind="evap", metadata={"x": 100000, "y": 400000})
Let’s take a look at our PastaStore:
[33]:
pstore
[33]:
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 0 models
Let’s try using the bulk methods on our database. The pstore.create_models_bulk() method allows the user to create models for all or a selection of oseries in the database. Options include:
- selecting specific oseries to create models for
- automatically adding recharge based on the nearest precipitation and evaporation stresses
- solving the models
- storing the models in the models library
Note: when using the progress bar, the Pastas log level can be set to ERROR for a prettier result, using ps.set_log_level("ERROR") or ps.logger.setLevel("ERROR").
[34]:
# to suppress most of the log messages
ps.logger.setLevel("ERROR")
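The same effect can be illustrated with the standard-library logging module alone. The sketch assumes that the Pastas logger is an ordinary logger registered under the name "pastas"; that name is an assumption and is not confirmed by this notebook.

```python
import logging

# Assumption: the Pastas logger is a standard logger named "pastas".
logger = logging.getLogger("pastas")
logger.setLevel("ERROR")

# Messages below ERROR are now filtered out, errors still pass.
warnings_enabled = logger.isEnabledFor(logging.WARNING)
errors_enabled = logger.isEnabledFor(logging.ERROR)

print(warnings_enabled, errors_enabled)
```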
[35]:
errors = pstore.create_models_bulk()
Bulk creation models: 100%|██████████| 2/2 [00:00<00:00, 12.09it/s]
To solve all or a selection of models, use pstore.solve_models(). Options for this method include:
- selecting models to solve
- storing the results in the models library
- raising an error (or not) when solving fails
- printing solve reports
[36]:
pstore
[36]:
<PastaStore> my_first_project:
- <PasConnector> 'my_first_connector': 2 oseries, 4 stresses, 2 models
[37]:
pstore.solve_models(store_result=True, report=False)
Solving models: 100%|██████████| 2/2 [00:01<00:00, 1.73it/s]
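The option to raise an error or to continue when a model fails can be sketched as a loop that collects failures. Everything in this sketch is hypothetical: solve_bulk, the ignore_errors flag, the toy solver and the model names "a" and "b" are illustrative only and are not part of the pastastore API.

```python
def solve_bulk(models, solve, ignore_errors=True):
    """Apply solve to every model; collect failures instead of stopping."""
    results, errors = {}, {}
    for name, model in models.items():
        try:
            results[name] = solve(model)
        except Exception as exc:
            if not ignore_errors:
                raise  # stop at the first failure
            errors[name] = str(exc)
    return results, errors


def toy_solve(series):
    """Toy 'solver': the mean of the series; fails on empty input."""
    if not series:
        raise ValueError("no observations")
    return sum(series) / len(series)


models = {"a": [1.0, 2.0, 3.0], "b": []}
results, errors = solve_bulk(models, toy_solve)

print(results)
print(errors)
```

Collecting the failures in a dictionary keeps a long bulk run going while still making it clear afterwards which models need attention.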
Obtaining the model parameters and statistics is easy with pstore.get_parameters() and pstore.get_statistics(). Results can be obtained for all or a selection of models. The results are returned as DataFrames.
[38]:
params = pstore.get_parameters()
params
[38]:
|  | recharge_A | recharge_a | recharge_f | constant_d | noise_alpha |
|---|---|---|---|---|---|
| oseries2 | 607.978079 | 154.752928 | -1.423119 | 28.094025 | 66.035700 |
| oseries1 | 686.247135 | 159.386040 | -1.305359 | 27.920134 | 49.911868 |
[39]:
stats = pstore.get_statistics(["evp", "rmse"])
stats
[39]:
|  | evp | rmse |
|---|---|---|
| oseries2 | 88.661133 | 0.124792 |
| oseries1 | 92.913581 | 0.114420 |
6. Deleting databases
The pastastore.util submodule contains functions for deleting database contents:
[40]:
pst.util.delete_pastastore(pstore)
Deleting PasConnector database: 'my_first_connector' ... Done!