Add troubleshooting doc (#368)

CliMA · Apr 25, 2024 · 1b7489e · 1b7489e
1 parent 8cdeeab
commit 1b7489e
Showing 5 changed files with 36 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -36,3 +36,8 @@ If you use the examples or code, please cite our article at JOSS in your publish
 ### Requirements
 Julia version 1.6+ 
 
+### Getting Started 
+![eki-getting-started](https://github.com/CliMA/EnsembleKalmanProcesses.jl/assets/45243236/e083ab8c-4f93-432f-9ad5-97aff22764ad)
+<!---
+# Link to Miro for editing photo (ask haakon for access): https://miro.com/app/board/uXjVNm_1teY=/?share_link_id=329380184889  
+-->
diff --git a/docs/make.jl b/docs/make.jl
@@ -84,6 +84,7 @@ pages = [
     "Inflation" => "inflation.md",
     "Parallelism and HPC" => "parallel_hpc.md",
     "Observations" => "observations.md",
+    "Troubleshooting" => "troubleshooting.md",
     "API" => api,
     "Contributing" => "contributing.md",
     "Glossary" => "glossary.md",

diff --git a/docs/src/inflation.md b/docs/src/inflation.md
@@ -1,4 +1,4 @@
-# Inflation 
+# [Inflation](@id inflation)
 Inflation is an approach that slows down collapse in ensemble Kalman methods.
 Two distinct forms of inflation are implemented in this package. Both involve perturbing the ensemble members following the standard update rule of the chosen Kalman process.
 Multiplicative inflation expands ensemble members away from their mean in a

diff --git a/docs/src/localization.md b/docs/src/localization.md
@@ -1,4 +1,4 @@
-# Localization and Sampling Error Correction (SEC)
+# [Localization and Sampling Error Correction (SEC)](@id localization)
 
 Ensemble Kalman inversion (EKI) seeks to find an optimal parameter vector ``\theta \in \mathbb{R}^p`` by minimizing the mismatch between some data ``y \in \mathbb{R}^d`` and the forward model output ``\mathcal{G}(\theta) \in \mathbb{R}^d``. Instead of relying on derivatives of the map ``\mathcal{G}`` with respect to ``\theta`` to find the optimum, EKI leverages sample covariances ``\mathrm{Cov}(\theta, \mathcal{G})`` and  ``\mathrm{Cov}(\mathcal{G}, \mathcal{G})`` diagnosed from an ensemble of ``J`` particles,
 

diff --git a/docs/src/troubleshooting.md b/docs/src/troubleshooting.md
@@ -0,0 +1,28 @@
+# Troubleshooting and Workflow Tips
+
+## High failure rate
+
+While some EKI variants include failure handlers, excessively high failure rates (e.g., > 80%) can cause inversions to get stuck in local minima or fail to converge. To address this:
+
+- **Stabilize the Forward Model**: Ensure the forward model remains stable for small parameter perturbations in offline tests. If the forward model is unstable for most of the parameter space, it is challenging to explore it with a calibration method.
+- **Adjust Priors**: Reduce the uncertainty in priors. Priors with large variances can lead to forward evaluations that deviate significantly from the known prior means, increasing the likelihood of failures.
+- **Increase Ensemble Size**: Without [localization](@ref localization) or other methods that break the subspace property, the ensemble size should generally exceed the number of parameters being optimized. The ensemble size needs to be large enough to ensure a sufficient number of successful runs, given the failure rate.
+- **Consider a Preconditioner**: While not currently a native feature in EKP, consider using a preconditioning method to find stable parameter pairs before the first iteration. A preconditioner, applied in each ensemble member, repeatedly draws from the prior distribution until a stable parameter pair is achieved. The successful parameter pairs serve as the parameter values for the first iteration. Depending on the stability of the forward model, this may require as many as 5-10 retries per ensemble member.
+- **Implement Parameter Inflation**: High failure rates in the initial iterations can lead to rapid collapse of ensemble members. Prevent the ensemble from collapsing prematurely by adding parameter inflation. For more, see [inflation](@ref inflation).
+
+## Loss does not converge or final fits are inadequate
+
+If the loss decreases too slowly or diverges, or if the final fits appear inadequate:
+
+- **Check Observation Noise in Data Space**: Ensure that noise estimates are realistic and consistent across variables with different dimensions and variability characteristics. Observation noise that is unrealistically large for a given variable or data point may prevent convergence to solutions that closely fit the data. Base noise estimates on empirical data or domain knowledge, and try reducing the noise if the other suggestions in this section don't work. Overly large noise is especially common when using ``\sigma^2 I`` as artificial noise. Even if ``u`` appears incorrect, it is advisable to plot ``G(u)`` against ``y \pm 2\sigma`` to determine whether the forward map output lies within the noise level. If it does, further convergence cannot be achieved without reducing the noise or altering the loss function.
+- **Check for Failures**: Refer to the suggestions for handling a high failure rate.
+- **Adjust the Artificial Timestep**: For indirect learning problems involving neural networks, larger timesteps [O(10)] are generally more effective, and variable timesteppers (e.g., `DataMisfitController()`) tend to yield the best results. For scheduler options, see the [scheduler](@ref learning-rate-schedulers) docs.
+- **If Batching, Increase Batch Size**: While not natively implemented within the EKP package, users employing mini-batching (using subsamples of the full dataset in each EKI iteration) should consider modifying the batch size. If the loss is too noisy and convergence is slow, consider increasing the batch size. See [inflation](@ref inflation) if using mini-batching with inflation. 
+- **Reevaluate Loss Function**: Consider exploring alternative loss functions with different variables.
+- **Structural Model Errors**: If these troubleshooting tips do not work, the remaining discrepancies might indicate structural errors in the model that prevent it from matching the data, which could lead to trade-offs in parameter estimation. Modifications to the underlying forward model may be needed.
+
+## I have a model ``\Psi``, but how do I design my forward map ``G``?
+- Ensure prior means are chosen appropriately, and that any hard [constraints](@ref parameter-distributions) (e.g., parameter values must be positive) are enforced.
+- Start with a perfect model experiment, where an attempt is made to recover known parameter values in ``\Psi`` through calibration, to learn which outputs from ``\Psi`` are sensitive to the parameters.
+- For time-evolving systems, consider aggregating data through spatial or temporal averaging, rather than using the full timeseries. 
+- Find out which observational data are available for the problem at hand, and what observational noise estimates are provided for the measuring instruments.
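
Note on the "Consider a Preconditioner" tip in docs/src/troubleshooting.md: the redraw-until-stable idea could look roughly like the sketch below. This is not part of the commit and not an EKP API. The toy forward model, the finite-output stability check, the standard-normal prior, and all function names are hypothetical stand-ins chosen for illustration, and the parameters × ensemble layout of the result is an assumption.

```julia
using Random

# Hypothetical stand-in for an unstable forward model: it "fails" (returns NaN)
# whenever the first parameter is negative. Replace with the real model.
toy_forward_model(θ) = θ[1] < 0 ? fill(NaN, 2) : [θ[1]^2, sum(θ)]

# Assumed failure criterion: a run is stable if every output is finite.
is_stable(g) = all(isfinite, g)

"""
Draw one ensemble member from the prior, redrawing until the forward model is
stable or `max_retries` redraws have been used. Returns `(θ, g, n_draws)`.
"""
function draw_stable_member(rng, sample_prior, forward_model; max_retries = 10)
    θ = sample_prior(rng)
    g = forward_model(θ)
    n_draws = 1
    while !is_stable(g) && n_draws <= max_retries
        θ = sample_prior(rng)
        g = forward_model(θ)
        n_draws += 1
    end
    return θ, g, n_draws
end

rng = MersenneTwister(1234)
sample_prior(rng) = randn(rng, 3)   # hypothetical prior: 3 standard-normal parameters
N_ens = 20

members = [draw_stable_member(rng, sample_prior, toy_forward_model) for _ in 1:N_ens]

# Columns are ensemble members (assumed parameters × ensemble layout).
initial_params = reduce(hcat, first.(members))
n_unstable = count(m -> !is_stable(m[2]), members)
println("unstable members after preconditioning: ", n_unstable, " of ", N_ens)
```

Members that are still unstable after the retry budget are returned as-is, so a failure handler would still be needed downstream. Capping the retries keeps the cost bounded when the forward model is unstable over most of the prior.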