PEtab-dev · dilpath · Feb 13, 2025 · Feb 13, 2025 · Feb 13, 2025 · Feb 13, 2025
diff --git a/doc/examples.rst b/doc/examples.rst
@@ -1,12 +1,19 @@
 Examples
 ========
 
-Various example notebooks.
+There are currently notebooks aimed at users and developers.
+
+Users that are working with PEtab problems should consult the model selection documentation of the calibration tool they wish to use. In this case, there is no need to use the PEtab Select package directly to perform model selection. After model selection, results can be analyzed with PEtab Select. See the API documentation for :mod:`petab_select.analyze`, or the "Visualization gallery" notebook.
+
+Users who wish to apply model selection methods using a calibration tool that doesn't support PEtab can refer to the "Model selection with non-SBML models" notebook for a demonstration using statsmodels.
+
+Developers wishing to implement model selection in their tool via PEtab Select can consult the notebooks on workflows ("Example usage with the CLI" and "Example usage with Python 3") or "FAMoS in PEtab Select".
 
 .. toctree::
    :maxdepth: 1
 
-   examples/example_cli_famos.ipynb
    examples/visualization.ipynb
+   examples/other_model_types.ipynb
+   examples/example_cli_famos.ipynb
    examples/workflow_cli.ipynb
    examples/workflow_python.ipynb
diff --git a/doc/examples/README.md b/doc/examples/README.md
@@ -1 +1 @@
-These notebooks need to be run to see the output. Pre-computed output can be viewed in the documentation, at https://petab-select.readthedocs.io/en/stable/examples.html
+These notebooks need to be run to see the output. Pre-computed output and further information about the notebooks can be viewed in the documentation, at https://petab-select.readthedocs.io/en/stable/examples.html
diff --git a/doc/examples/other_model_types.ipynb b/doc/examples/other_model_types.ipynb
@@ -0,0 +1,300 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2e9d5661-b193-4eb1-905c-9b54387933c1",
+   "metadata": {},
+   "source": [
+    "# Model selection with non-SBML models\n",
+    "\n",
+    "PEtab Select is currently designed for use with PEtab v1, which only supports SBML for model specification. However, the SBML model is irrelevant to PEtab Select, which only reads the parameter table of a model's PEtab problem to determine the estimated parameters (e.g. to accurately compute criteria values).\n",
+    "\n",
+    "In this notebook, we demonstrate model selection with a modeling framework that currently does not support PEtab problems (specifically, [statsmodels](https://www.statsmodels.org/stable/index.html)). Since statsmodels is a Python package, we use the Python interface of PEtab Select. Non-Python frameworks can be used with the PEtab Select command-line interface instead. Synthetic data was generated from the model\n",
+    "$$y(t) = 20t + 10t^2 + 5t^3 + \\epsilon$$\n",
+    "where $\\epsilon$ is drawn from the standard normal distribution and represents measurement noise. The hypothetical scenario is that you are provided $(t,y)$ data pairs and asked to identify the powers of $t$ that are evidenced by the data (i.e., that were used to generate the data). Here, we only consider linear regression models involving terms up to $t^5$, i.e., the superset model is\n",
+    "$$y(t) = \\sum_{i=0}^5 k_i t^i,$$\n",
+    "where $k_i$ is the $i$th coefficient in the linear regression model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b2a20814-e11d-4d65-9987-b84a50abf59d",
+   "metadata": {},
+   "source": [
+    "## \"True\" model\n",
+    "\n",
+    "We start by loading the data and demonstrating model calibration of the true model using statsmodels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6889e520-e13f-4b1c-8dc4-43226e37d165",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Based on the statsmodels \"Ordinary Least Squares\" example\n",
+    "# https://www.statsmodels.org/stable/examples/notebooks/generated/ols.html\n",
+    "\n",
+    "import numpy as np\n",
+    "import petab\n",
+    "import statsmodels.api as sm\n",
+    "from petab import MEASUREMENT, TIME\n",
+    "\n",
+    "import petab_select\n",
+    "from petab_select.constants import (\n",
+    "    CANDIDATE_SPACE,\n",
+    "    ESTIMATE,\n",
+    "    TERMINATE,\n",
+    "    UNCALIBRATED_MODELS,\n",
+    "    Criterion,\n",
+    ")\n",
+    "\n",
+    "# Data was generated with the formula (see next cell to regenerate data)\n",
+    "# y(t) = 20*t + 10*t^2 + 5*t^3\n",
+    "df = petab.get_measurement_df(\n",
+    "    \"other_model_types_problem/petab/measurements.tsv\"\n",
+    ")\n",
+    "t = df[TIME].to_numpy()\n",
+    "y = df[MEASUREMENT].to_numpy()\n",
+    "\n",
+    "# Show statsmodels fit for the \"true\" model\n",
+    "true_T = np.column_stack([t**1, t**2, t**3])\n",
+    "print(sm.OLS(y, true_T).fit().summary())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d14b4c55-2494-4fdc-a45e-e2983e6bf838",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Uncomment to regenerate data\n",
+    "\n",
+    "# import numpy as np\n",
+    "# rng = np.random.default_rng(seed=0)\n",
+    "\n",
+    "# # Measurement time points\n",
+    "# t = np.linspace(0, 10, 101)\n",
+    "# # The first three powers of x are used to generate the data\n",
+    "# T = np.column_stack([t**1, t**2, t**3])\n",
+    "# K = np.array([20, 10, 5])\n",
+    "# noise = rng.normal(size=t.size)\n",
+    "# y = np.dot(T, K) + noise\n",
+    "\n",
+    "# # Write PEtab measurement table\n",
+    "# lines = [\"observableId\\tsimulationConditionId\\ttime\\tmeasurement\"]\n",
+    "# for t_, y_ in zip(t, y, strict=True):\n",
+    "#     lines.append(f\"y\\tdefault\\t{round(t_,1)}\\t{round(y_,5)}\")\n",
+    "# content = \"\\n\".join(lines)\n",
+    "# with open(\"other_model_types_problem/petab/measurements.tsv\", \"w\") as f:\n",
+    "#     f.write(content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c566b1b7-d01b-4964-b7f6-31cb9f4e4acc",
+   "metadata": {},
+   "source": [
+    "## Model selection\n",
+    "\n",
+    "To perform model selection, we define three methods to:\n",
+    "\n",
+    "1. calibrate a single model\n",
+    "\n",
+    "    This is where we convert a PEtab Select model into a statsmodels model, fit the model with statsmodels, then save the log-likelihood value in the PEtab Select model.\n",
+    "\n",
+    "2. perform a single iteration of a model selection method involving calibration of multiple models\n",
+    "\n",
+    "    This is generic code that executes the required PEtab Select commands.\n",
+    "\n",
+    "3. perform a full model selection run, involving all iterations of a model selection method\n",
+    "\n",
+    "    This loads the PEtab Select problem from disk and performs all iterations of a model selection method.\n",
+    "\n",
+    "In this case, the PEtab Select problem is setup to perform backward selection ([see the specification files for this example](https://github.com/PEtab-dev/petab_select/tree/main/doc/examples/other_model_types_problem/petab_select))."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "89fb34cb-3b78-4b89-b58f-78e3f90414c1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def calibrate_model(model: petab_select.Model, t=t, y=y):\n",
+    "    # Convert the PEtab Select model into a statsmodel model\n",
+    "    powers = [\n",
+    "        int(parameter_id[1:])\n",
+    "        for parameter_id, parameter_value in model.parameters.items()\n",
+    "        if parameter_value == ESTIMATE\n",
+    "    ]\n",
+    "    if not powers:\n",
+    "        return\n",
+    "    T = np.column_stack([t**i for i in powers])\n",
+    "    sm_model = sm.OLS(y, T)\n",
+    "\n",
+    "    # Calibrate\n",
+    "    results = sm_model.fit()\n",
+    "\n",
+    "    # Save the log-likelihood\n",
+    "    model.set_criterion(criterion=Criterion.LLH, value=results.llf)\n",
+    "\n",
+    "\n",
+    "def perform_iteration(\n",
+    "    problem: petab_select.Problem, candidate_space: petab_select.CandidateSpace\n",
+    "):\n",
+    "    # Start the iteration, request models according to the model selection method.\n",
+    "    iteration = petab_select.ui.start_iteration(\n",
+    "        problem=problem, candidate_space=candidate_space\n",
+    "    )\n",
+    "\n",
+    "    # Calibrate each model.\n",
+    "    models = iteration[UNCALIBRATED_MODELS]\n",
+    "    for model in models:\n",
+    "        if all(\n",
+    "            parameter_value != ESTIMATE\n",
+    "            for parameter_value in model.parameters.values()\n",
+    "        ):\n",
+    "            del models[model.hash]\n",
+    "            continue\n",
+    "        calibrate_model(model=model)\n",
+    "        # Compute the criterion based on the log-likelihood computed by statsmodels.\n",
+    "        model.compute_criterion(criterion=problem.criterion)\n",
+    "\n",
+    "    # End the iteration and return the results.\n",
+    "    return petab_select.ui.end_iteration(\n",
+    "        problem=problem,\n",
+    "        candidate_space=iteration[CANDIDATE_SPACE],\n",
+    "        calibrated_models=models,\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "def run_model_selection():\n",
+    "    # Load the PEtab Select problem.\n",
+    "    problem = petab_select.Problem.from_yaml(\n",
+    "        \"other_model_types_problem/petab_select/problem.yaml\"\n",
+    "    )\n",
+    "    candidate_space = None\n",
+    "\n",
+    "    # Perform all iterations of the model selection method until the termination signal is emitted.\n",
+    "    while True:\n",
+    "        iteration_results = perform_iteration(\n",
+    "            problem=problem, candidate_space=candidate_space\n",
+    "        )\n",
+    "        if iteration_results[TERMINATE]:\n",
+    "            break\n",
+    "        candidate_space = iteration_results[CANDIDATE_SPACE]\n",
+    "\n",
+    "    # Return the problem, which contains the calibrated models.\n",
+    "    return problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "93ffcec5-3143-4906-87a8-acf60c541ce4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Run the model selection method.\n",
+    "problem = run_model_selection()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a93ab68f-7831-40bd-82b0-d365cca8008c",
+   "metadata": {},
+   "source": [
+    "## Analysis\n",
+    "\n",
+    "As when using a PEtab-compatible calibration tool, the results from calibration with statsmodels can also be analyzed with PEtab Select.\n",
+    "\n",
+    "Firstly, we get the best model identified during model selection. It matches the \"true\" model, with the same AIC as computed by statsmodels at the top of this notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ba3a58af-8b13-4525-812a-9c7708292bc7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "best_model = petab_select.analyze.get_best(\n",
+    "    models=problem.state.models, criterion=problem.criterion\n",
+    ")\n",
+    "print(f\"\"\"\n",
+    "Best model information\n",
+    "----------------------\n",
+    "PEtab Select model hash: {best_model.hash}\n",
+    "Selected powers of t: {[int(parameter_id[1:]) for parameter_id, parameter_value in best_model.parameters.items() if parameter_value == ESTIMATE]}\n",
+    "Log-Likelihood: {best_model.get_criterion(criterion=Criterion.LLH)}\n",
+    "AIC: {best_model.get_criterion(criterion=Criterion.AIC)}\n",
+    "\"\"\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "92857455-d8be-4bde-adc6-e6ab8de27748",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Plot AIC of all models that were calibrated.\n",
+    "\n",
+    "import petab_select.plot\n",
+    "\n",
+    "plot_data = petab_select.plot.PlotData(\n",
+    "    models=problem.state.models, criterion=problem.criterion\n",
+    ")\n",
+    "\n",
+    "petab_select.plot.scatter_criterion_vs_n_estimated(plot_data=plot_data);"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4c0a4f6f-9099-4e22-a589-80cebaf466cc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Plot the history of model selection iterations.\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "draw_networkx_kwargs = {\n",
+    "    \"arrowstyle\": \"-|>\",\n",
+    "    \"node_shape\": \"s\",\n",
+    "    \"edgecolors\": \"k\",\n",
+    "}\n",
+    "fig, ax = plt.subplots(figsize=(20, 20))\n",
+    "petab_select.plot.graph_iteration_layers(\n",
+    "    plot_data=plot_data,\n",
+    "    draw_networkx_kwargs=draw_networkx_kwargs,\n",
+    "    ax=ax,\n",
+    ");"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/doc/examples/other_model_types_problem/petab/conditions.tsv b/doc/examples/other_model_types_problem/petab/conditions.tsv
@@ -0,0 +1,2 @@
+conditionId
+default
diff --git a/doc/examples/other_model_types_problem/petab/dummy.xml b/doc/examples/other_model_types_problem/petab/dummy.xml
@@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<sbml xmlns="http://www.sbml.org/sbml/level3/version2/core" level="3" version="2">
+<model/>
+</sbml>
Original file line number	Diff line number	Diff line change
		@@ -1 +1 @@
		These notebooks need to be run to see the output. Pre-computed output can be viewed in the documentation, at https://petab-select.readthedocs.io/en/stable/examples.html
		These notebooks need to be run to see the output. Pre-computed output and further information about the notebooks can be viewed in the documentation, at https://petab-select.readthedocs.io/en/stable/examples.html