Feat/strided dataset for torch and regression models #2624

madtoinou · 2024-12-19T14:49:55Z

Checklist before merging this PR:

Mentioned all issues that this PR fixes or addresses.
Summarized the updates of this PR under Summary.
Added an entry under Unreleased in the Changelog.

Fixes #2621, fixes #1064, fixes #940, fixes #1487.

Summary

Added the stride argument to the datasets of the torch models and to the tabularization methods.
Updated the tests accordingly.
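
To make the effect of a stride concrete, here is a minimal sketch in plain Python. It is not the darts implementation: the helper name strided_samples is hypothetical, and counting window positions from the start of the series is an assumption made only to keep the example short.

```python
def strided_samples(series, input_chunk_length, output_chunk_length, stride=1):
    """Return (past, future) window pairs, keeping one window every `stride` steps.

    Hypothetical helper for illustration only; not part of darts.
    """
    window = input_chunk_length + output_chunk_length
    samples = []
    # Assumption: window positions are counted from the start of the series.
    for start in range(0, len(series) - window + 1, stride):
        past = series[start : start + input_chunk_length]
        future = series[start + input_chunk_length : start + window]
        samples.append((past, future))
    return samples


series = list(range(10))
dense = strided_samples(series, input_chunk_length=3, output_chunk_length=2)
sparse = strided_samples(series, input_chunk_length=3, output_chunk_length=2, stride=3)
```

With 10 time steps and a total window of 5, the dense call yields 6 samples and the strided call yields 2, which is the kind of training set reduction the changelog entry describes.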

Other Information

As discussed offline, using limit_train_batches with max_samples_per_ts=None should make it possible to obtain uniformly selected training samples with torch-based models. Changing how the samples are retained when max_samples_per_ts != None is therefore not a priority and will be tackled in a separate PR.
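
The difference between the two selection strategies can be sketched as follows. This is plain Python rather than darts or Lightning code, and both helper names are hypothetical. It assumes that a per-series cap retains the most recent samples, while limiting the number of batches drawn from a shuffled loader behaves roughly like a uniform random subset.

```python
import random


def most_recent(samples, max_samples):
    """Keep only the last `max_samples` samples (assumed cap behaviour)."""
    return samples[-max_samples:]


def uniform_subset(samples, n_keep, seed=0):
    """Keep `n_keep` samples drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(samples, n_keep))


sample_ids = list(range(10))
recent = most_recent(sample_ids, 3)
uniform = uniform_subset(sample_ids, 3)
```

The first helper always returns the tail of the series, whereas the second can pick samples from anywhere in it.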

codecov · 2024-12-19T15:15:08Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.19%. Comparing base (b441192) to head (c375832).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2624      +/-   ##
==========================================
- Coverage   94.24%   94.19%   -0.05%     
==========================================
  Files         141      141              
  Lines       15466    15475       +9     
==========================================
+ Hits        14576    14577       +1     
- Misses        890      898       +8


dennisbader

Thanks @madtoinou, this looks very good already 🚀

I left some suggestions, mainly regarding:

add support to TorchForecastingModel.fit()
we should extract samples starting from the end of the series in both cases (TFM already does that, RegressionModel not yet)
add support to HorizonBasedDataset
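
The point about extracting samples from the end of the series can be illustrated with a small sketch. The function window_starts is hypothetical and only computes the start indices of the windows; it is not how TorchForecastingModel or RegressionModel are implemented.

```python
def window_starts(n_steps, window, stride, from_end=True):
    """Start indices of the extracted windows, in ascending order.

    Hypothetical helper: with from_end=True the most recent window is always
    kept and earlier windows are taken every `stride` steps going backwards.
    """
    last_start = n_steps - window
    if last_start < 0:
        return []  # the series is too short for a single window
    if from_end:
        return sorted(range(last_start, -1, -stride))
    return list(range(0, last_start + 1, stride))


from_start = window_starts(n_steps=10, window=5, stride=4, from_end=False)
from_end = window_starts(n_steps=10, window=5, stride=4, from_end=True)
```

Here anchoring at the start gives the windows beginning at 0 and 4 and drops the most recent one (start 5), while anchoring at the end gives 1 and 5, so the latest observations are always part of the training set.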

dennisbader · 2024-12-31T11:52:18Z

darts/models/forecasting/regression_model.py

+        stride
+            The number of time steps between consecutive entries.


We could try to make this a bit more informative (here and everywhere else it's documented).

Let's not add another term like "entries" (we already have samples, observations, ...).

Make it clearer that we create the training set by extracting samples from the series with a stride.

Does it take them from the start or from the end?

Maybe mention that this should be used with caution with predict, historical forecasts, ...: one should apply it only in the same strided scenario.

darts/utils/data/shifted_dataset.py

dennisbader · 2024-12-31T12:28:51Z

darts/utils/data/tabularization.py

@@ -576,6 +585,7 @@ def create_lagged_prediction_data(
    check_inputs: bool = True,
    use_moving_windows: bool = True,
    concatenate: bool = True,
+    stride: int = 1,


should this be supported for prediction?

I don't think that it should be supported in predict() as it might cause a lot of issues but I need it in create_lagged_prediction_data() for the optimized historical forecast with auto-regression.

Ahh like roll_size for the torch models @madtoinou? If so then I would use that name here as well :)

No no, here, I need to be able to generate a strided prediction dataset for historical forecasts in order to predict all the horizons at once. The rolling is going to happen in another function.
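
As a rough illustration of the idea in this thread, the sketch below builds a strided matrix of lagged features so that a model can produce one forecast per row in a single call. It does not use create_lagged_prediction_data; the helper lagged_prediction_rows, the alignment from the end of the series, and the mean-of-lags "model" are all assumptions made for the example.

```python
def lagged_prediction_rows(series, n_lags, stride=1):
    """One row of `n_lags` consecutive values per forecast start point.

    Hypothetical helper: rows are taken every `stride` steps, counting back
    from the end of the series (an assumption of this sketch).
    """
    last_start = len(series) - n_lags
    starts = sorted(range(last_start, -1, -stride))
    return [series[s : s + n_lags] for s in starts]


series = list(range(8))
rows = lagged_prediction_rows(series, n_lags=3, stride=2)

# Toy stand-in for a fitted model: predict the mean of the lags, for all rows at once.
predictions = [sum(row) / len(row) for row in rows]
```

Each row corresponds to one historical forecast start, so the whole strided set can be predicted in one batch before any rolling is applied.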

darts/models/forecasting/regression_model.py

darts/utils/data/tabularization.py

dennisbader · 2024-12-31T13:37:02Z

CHANGELOG.md

@@ -12,6 +12,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
 **Improved**

 - New model: `StatsForecastAutoTBATS`. This model offers the [AutoTBATS](https://nixtlaverse.nixtla.io/statsforecast/src/core/models.html#autotbats) model from Nixtla's `statsforecasts` library. [#2611](https://github.com/unit8co/darts/pull/2611) by [He Weilin](https://github.com/cnhwl).
+- Added a `stride` argument to the `Dataset` classes (torch-based models) and the fitting methods of the `RegressionModels` to reduce the size of the training set or apply elaborate training approaches. [#2624](https://github.com/unit8co/darts/pull/2529) by [Antoine Madrona](https://github.com/madtoinou)


I mention somewhere below that it would be good to also add stride to TorchForecastingModel.predict(). Once that is done, we should update the changelog here.

darts/tests/datasets/test_datasets.py

darts/tests/utils/tabularization/test_create_lagged_training_data.py

madtoinou added 6 commits December 18, 2024 14:26

feat: adding stride to shifted_dataset

aaaa234

feat: adding stride to the sequential dataset

dec1636

feat: add striding to tabularization

56b53ed

feat: add stride to RegressioModel fit API

9847469

fix: bug in striding implementation

d055824

feat: updating the tabularization tests

67fd550

madtoinou requested a review from dennisbader as a code owner December 19, 2024 14:49

fix: bug

d615552

madtoinou and others added 6 commits December 19, 2024 16:58

Merge branch 'master' into feat/strided_dataset

1162c2c

feat: adding test for stride in torch datasets

9ff31a4

update changelog

69fbc47

fix: missing test and small bug

911b8c5

Merge branch 'master' into feat/strided_dataset

7c3e5a2

update changelog

e8ac91d

dennisbader requested changes Dec 31, 2024

madtoinou and others added 8 commits December 31, 2024 15:02

Merge branch 'master' into feat/strided_dataset

c375832

doc: update some docstrings

198edd5

feat: stride is now applied from the end of the series

a38b9f5

feat: added stride support to the horizon based dataset

c546991

feat: updated unit tests

dd8ab50

feat: adding support of stride in the fit method of torch models

2fc77e0

fix: update the test so that the stride is applied from the end

1a7e42d

Merge branch 'master' into feat/strided_dataset

ad6bd31


dennisbader left a comment

dennisbader Dec 31, 2024

dennisbader Dec 31, 2024

madtoinou Jan 6, 2025

dennisbader Jan 6, 2025

madtoinou Jan 7, 2025

dennisbader Dec 31, 2024
