{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Parallel processing with Pastastore\n", "\n", "This notebook shows parallel processing capabilities of `PastaStore`.\n", "\n", "\n", "
\n", "\n", "Note \n", "\n", "Parallel processing is platform dependent and may not\n", "always work. The current implementation works well for Linux users, though this\n", "will likely change with Python 3.14 and higher. For Windows users, parallel\n", "solving does not work when called directly from Jupyter Notebooks or IPython.\n", "To use parallel solving on Windows, the following code should be used in a\n", "Python file. \n", "\n", "
\n", "\n", "```python\n", "from multiprocessing import freeze_support\n", "\n", "if __name__ == \"__main__\":\n", " freeze_support()\n", " pstore.apply(\"models\", some_func, parallel=True)\n", "```" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pastastore version : 1.12.0\n", "\n", "Python version : 3.13.11\n", "Pandas version : 2.3.3\n", "Matplotlib version : 3.10.8\n", "Pastas version : 1.12.0\n", "PyYAML version : 6.0.3\n", "\n" ] } ], "source": [ "import pastas as ps\n", "\n", "import pastastore as pst\n", "from pastastore.datasets import example_pastastore\n", "\n", "ps.set_log_level(\"ERROR\") # silence Pastas logger for this notebook\n", "pst.get_color_logger(\"INFO\", \"pastastore\")\n", "pst.show_versions()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example pastastore\n", "\n", "Load some example data, create models and solve them to showcase parallel processing." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mPasConnector: library 'oseries' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/oseries'\u001b[0m\n", "\u001b[32mPasConnector: library 'stresses' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/stresses'\u001b[0m\n", "\u001b[32mPasConnector: library 'models' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/models'\u001b[0m\n", "\u001b[32mPasConnector: library 'oseries_models' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/oseries_models'\u001b[0m\n", "\u001b[32mPasConnector: library 'stresses_models' created in '/home/david/github/pastastore/examples/notebooks/temp/my_connector/stresses_models'\u001b[0m\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "bed50aeb459b4fec8f1762a7ece2a9fd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Bulk creation models: 0%| | 0/5 [00:00 float:\n", " \"\"\"Compute the R-squared value of a Pastas model.\"\"\"\n", " ml = pstore.get_models(model_name)\n", " return ml.stats.rsq()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can apply this function to all models in the pastastore using `pstore.apply()`. \n", "By default this function is run sequentially. " ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "04e56cf5b3114507b23ea192b623b61b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing rsq: 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rsqmae
head_nb50.4381290.318361
oseries20.9318830.087067
head_mw0.1593180.631517
oseries10.9044870.091329
oseries30.0304680.106254
\n", "" ], "text/plain": [ " rsq mae\n", "head_nb5 0.438129 0.318361\n", "oseries2 0.931883 0.087067\n", "head_mw 0.159318 0.631517\n", "oseries1 0.904487 0.091329\n", "oseries3 0.030468 0.106254" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.get_statistics([\"rsq\", \"mae\"])" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rsqmae
_get_statistics
head_nb50.4381290.318361
oseries20.9318830.087067
head_mw0.1593180.631517
oseries10.9044870.091329
oseries30.0304680.106254
\n", "
" ], "text/plain": [ " rsq mae\n", "_get_statistics \n", "head_nb5 0.438129 0.318361\n", "oseries2 0.931883 0.087067\n", "head_mw 0.159318 0.631517\n", "oseries1 0.904487 0.091329\n", "oseries3 0.030468 0.106254" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.get_statistics([\"rsq\", \"mae\"], parallel=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute prediction intervals\n", "\n", "Let's try using a more complex function and passing that to apply to use\n", "parallel processing. In this case we want to compute the prediction interval,\n", "and pass along the $\\alpha$ value via the keyword arguments." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "def prediction_interval(model_name, **kwargs):\n", " \"\"\"Compute the prediction interval for a Pastas model.\"\"\"\n", " ml = pstore.get_models(model_name)\n", " return ml.solver.prediction_interval(**kwargs)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8cb46f7fdcf04aa9b2b541fdea020318", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing prediction_interval: 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
head_nb5oseries2head_mwoseries1oseries3
0.0250.9750.0250.9750.0250.9750.0250.9750.0250.975
1960-04-29NaNNaNNaNNaN6.3002049.406311NaNNaNNaNNaN
1960-04-30NaNNaNNaNNaN6.3401139.291056NaNNaNNaNNaN
1960-05-01NaNNaNNaNNaN6.1979639.459711NaNNaNNaNNaN
1960-05-02NaNNaNNaNNaN6.2564969.436107NaNNaNNaNNaN
1960-05-03NaNNaNNaNNaN6.1612019.336737NaNNaNNaNNaN
.................................
2020-01-177.9613499.682071NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-187.9757189.576222NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-197.9143769.647849NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-207.9146639.710748NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-217.9470209.636971NaNNaNNaNNaNNaNNaNNaNNaN
\n", "

21817 rows × 10 columns

\n", "" ], "text/plain": [ " head_nb5 oseries2 head_mw oseries1 \\\n", " 0.025 0.975 0.025 0.975 0.025 0.975 0.025 \n", "1960-04-29 NaN NaN NaN NaN 6.300204 9.406311 NaN \n", "1960-04-30 NaN NaN NaN NaN 6.340113 9.291056 NaN \n", "1960-05-01 NaN NaN NaN NaN 6.197963 9.459711 NaN \n", "1960-05-02 NaN NaN NaN NaN 6.256496 9.436107 NaN \n", "1960-05-03 NaN NaN NaN NaN 6.161201 9.336737 NaN \n", "... ... ... ... ... ... ... ... \n", "2020-01-17 7.961349 9.682071 NaN NaN NaN NaN NaN \n", "2020-01-18 7.975718 9.576222 NaN NaN NaN NaN NaN \n", "2020-01-19 7.914376 9.647849 NaN NaN NaN NaN NaN \n", "2020-01-20 7.914663 9.710748 NaN NaN NaN NaN NaN \n", "2020-01-21 7.947020 9.636971 NaN NaN NaN NaN NaN \n", "\n", " oseries3 \n", " 0.975 0.025 0.975 \n", "1960-04-29 NaN NaN NaN \n", "1960-04-30 NaN NaN NaN \n", "1960-05-01 NaN NaN NaN \n", "1960-05-02 NaN NaN NaN \n", "1960-05-03 NaN NaN NaN \n", "... ... ... ... \n", "2020-01-17 NaN NaN NaN \n", "2020-01-18 NaN NaN NaN \n", "2020-01-19 NaN NaN NaN \n", "2020-01-20 NaN NaN NaN \n", "2020-01-21 NaN NaN NaN \n", "\n", "[21817 rows x 10 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.apply(\"models\", prediction_interval, kwargs={\"alpha\": 0.05})" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "af6ae1e2cc6e4e1198087958115b31ea", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing prediction_interval (parallel): 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
head_nb5oseries2head_mwoseries1oseries3
0.0250.9750.0250.9750.0250.9750.0250.9750.0250.975
1960-04-29NaNNaNNaNNaN6.3187639.625481NaNNaNNaNNaN
1960-04-30NaNNaNNaNNaN6.1573549.441837NaNNaNNaNNaN
1960-05-01NaNNaNNaNNaN6.1785719.421986NaNNaNNaNNaN
1960-05-02NaNNaNNaNNaN6.2433359.478925NaNNaNNaNNaN
1960-05-03NaNNaNNaNNaN6.2377749.313179NaNNaNNaNNaN
.................................
2020-01-177.8603939.630678NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-187.8749689.671061NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-197.9130599.650184NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-207.9562319.667919NaNNaNNaNNaNNaNNaNNaNNaN
2020-01-217.9270839.712015NaNNaNNaNNaNNaNNaNNaNNaN
\n", "

21817 rows × 10 columns

\n", "" ], "text/plain": [ " head_nb5 oseries2 head_mw oseries1 \\\n", " 0.025 0.975 0.025 0.975 0.025 0.975 0.025 \n", "1960-04-29 NaN NaN NaN NaN 6.318763 9.625481 NaN \n", "1960-04-30 NaN NaN NaN NaN 6.157354 9.441837 NaN \n", "1960-05-01 NaN NaN NaN NaN 6.178571 9.421986 NaN \n", "1960-05-02 NaN NaN NaN NaN 6.243335 9.478925 NaN \n", "1960-05-03 NaN NaN NaN NaN 6.237774 9.313179 NaN \n", "... ... ... ... ... ... ... ... \n", "2020-01-17 7.860393 9.630678 NaN NaN NaN NaN NaN \n", "2020-01-18 7.874968 9.671061 NaN NaN NaN NaN NaN \n", "2020-01-19 7.913059 9.650184 NaN NaN NaN NaN NaN \n", "2020-01-20 7.956231 9.667919 NaN NaN NaN NaN NaN \n", "2020-01-21 7.927083 9.712015 NaN NaN NaN NaN NaN \n", "\n", " oseries3 \n", " 0.975 0.025 0.975 \n", "1960-04-29 NaN NaN NaN \n", "1960-04-30 NaN NaN NaN \n", "1960-05-01 NaN NaN NaN \n", "1960-05-02 NaN NaN NaN \n", "1960-05-03 NaN NaN NaN \n", "... ... ... ... \n", "2020-01-17 NaN NaN NaN \n", "2020-01-18 NaN NaN NaN \n", "2020-01-19 NaN NaN NaN \n", "2020-01-20 NaN NaN NaN \n", "2020-01-21 NaN NaN NaN \n", "\n", "[21817 rows x 10 columns]" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.apply(\"models\", prediction_interval, kwargs={\"alpha\": 0.05}, parallel=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get signatures\n", "\n", "The function `pstore.get_signatures` does not explicitly support parallel processing but can be used in combination with `pstore.apply`" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "signatures = [\n", " \"cv_period_mean\",\n", " \"cv_date_min\",\n", " \"cv_date_max\",\n", " \"cv_fall_rate\",\n", " \"cv_rise_rate\",\n", "]" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
head_nb5oseries2head_mwoseries1oseries3
cv_period_mean0.0618790.0151990.1450620.0130660.029168
cv_date_min0.2460210.1286360.2546270.1458841.394852
cv_date_max1.2624250.7229451.0839290.3003280.444442
cv_fall_rate-1.136450-0.722718-1.430200-0.744797-1.032837
cv_rise_rate1.2594500.8366781.0972570.8629810.931181
\n", "
" ], "text/plain": [ " head_nb5 oseries2 head_mw oseries1 oseries3\n", "cv_period_mean 0.061879 0.015199 0.145062 0.013066 0.029168\n", "cv_date_min 0.246021 0.128636 0.254627 0.145884 1.394852\n", "cv_date_max 1.262425 0.722945 1.083929 0.300328 0.444442\n", "cv_fall_rate -1.136450 -0.722718 -1.430200 -0.744797 -1.032837\n", "cv_rise_rate 1.259450 0.836678 1.097257 0.862981 0.931181" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.get_signatures(signatures=signatures)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "674d9be95e7e41d4b698cdbb8266b2c5", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing get_signatures (parallel): 0%| | 0/5 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
get_signatureshead_nb5oseries2head_mwoseries1oseries3
cv_period_mean0.0618790.0151990.1450620.0130660.029168
cv_date_min0.2460210.1286360.2546270.1458841.394852
cv_date_max1.2624250.7229451.0839290.3003280.444442
cv_fall_rate-1.136450-0.722718-1.430200-0.744797-1.032837
cv_rise_rate1.2594500.8366781.0972570.8629810.931181
\n", "" ], "text/plain": [ "get_signatures head_nb5 oseries2 head_mw oseries1 oseries3\n", "cv_period_mean 0.061879 0.015199 0.145062 0.013066 0.029168\n", "cv_date_min 0.246021 0.128636 0.254627 0.145884 1.394852\n", "cv_date_max 1.262425 0.722945 1.083929 0.300328 0.444442\n", "cv_fall_rate -1.136450 -0.722718 -1.430200 -0.744797 -1.032837\n", "cv_rise_rate 1.259450 0.836678 1.097257 0.862981 0.931181" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.apply(\n", " \"oseries\", pstore.get_signatures, kwargs={\"signatures\": signatures}, parallel=True\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load models\n", "\n", "Load models in parallel." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "95e57bc33e1645beab06a844183063d3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing get_models: 0%| | 0/5 [00:00\n", "Note\n", "\n", "This section is mostly for the developer so he doesn't forget why and how \n", "delayed updating of the model links was implemented.\n", "\n", "\n", "We want to build and solve our time series models in 2 steps, first without a noise\n", "model and then with a noise model, and then store the result. We empty the models\n", "library to start from scratch." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mEmptied library models in my_connector: \u001b[0m\n", "\u001b[32mEmptied library oseries_models in my_connector: \u001b[0m\n", "\u001b[32mEmptied library stresses_models in my_connector: \u001b[0m\n" ] } ], "source": [ "pstore.empty_library(\"models\", prompt=False, progressbar=False)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "def two_step_solve(name):\n", " \"\"\"Solve a Pastas model in two steps: first without noise model, then with.\"\"\"\n", " ml = pstore.create_model(name)\n", " ml.solve(report=False)\n", " ml.add_noisemodel(ps.ArNoiseModel())\n", " ml.solve(initial=False, report=False)\n", " pstore.add_model(ml, overwrite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the first example we apply the function in parallel. A separate recomputation is\n", "performed after the parallel apply to update the links between the time series names\n", "and the models.\n", "\n", "The `conn._added_models` keeps track of added models so that the model links can be\n", "updated after all models have been added. In parallel mode, the child processes do not\n", "have access to this variable in the main thread, meaning it is not updated." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstore.conn._added_models" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a0c23ad5e26b4d9cb9533da49443e4ce", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing two_step_solve (parallel): 0%| | 0/5 [00:00\u001b[0m\n", "\u001b[32mEmptied library oseries_models in my_connector: \u001b[0m\n", "\u001b[32mEmptied library stresses_models in my_connector: \u001b[0m\n" ] } ], "source": [ "pstore.empty_library(\"models\", prompt=False, progressbar=False)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fbaf40a90eb147f7a87bc1f07f4dbb32", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing two_step_solve: 0%| | 0/5 [00:00