
[BaseMultivariate] OutOfMemoryError on GPU #1202

Open

Antoine-Schwartz opened this issue Nov 15, 2024 · 6 comments

@Antoine-Schwartz
What happened + What you expected to happen

Hello Nixtla community,

I suspect a bug, or at least a sampling-optimization problem, specifically with BaseMultivariate models!

First, the number of additional columns in df (exogenous variables) has a strong impact on memory demand, even when the model does not use them (see the TSMixer example below). I think this problem also exists for univariate models, but the impact grows much more steeply for multivariate ones.

Secondly, even if you keep only the necessary columns, it's still hard to fit the samples into GPU memory during training when you have tens of thousands of series.
I know that multivariate models scale badly with the number of series, but it seems feasible if I refer to the TSMixer paper: experiments on M5 data (30,490 series with static features) with an NVIDIA Tesla V100 GPU.
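
For context, a rough back-of-envelope estimate (my own assumption about the layout, not the actual neuralforecast internals): if each training step materializes a float32 window tensor of roughly (n_series, n_channels, n_windows, window_len), then with 30,000 weekly series of length 208, input_size=104 and h=52, every extra column costs close to a gigabyte per materialized batch:

# Order-of-magnitude estimate only; the real dataloader layout may differ.
n_series = 30_000
input_size, h = 104, 52
series_len = 208

window_len = input_size + h                  # 156
n_windows = series_len - window_len + 1      # 53 windows per series
bytes_per_channel = n_series * n_windows * window_len * 4  # float32

for n_extra_cols in range(5):
    n_channels = 2 + n_extra_cols            # e.g. target + mask + extra columns
    gib = n_channels * bytes_per_channel / 2**30
    print(f"{n_extra_cols} extra columns -> ~{gib:.1f} GiB per materialized window tensor")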

Versions / Dependencies

neuralforecast==1.7.5
Databricks Runtime: 14.3 LTS ML
GPU: g5.8xlarge

Reproduction script

import gc
import time

import torch
from neuralforecast import NeuralForecast
from neuralforecast.models import TSMixer
from utilsforecast.data import generate_series

n_series = 30000

# Add columns to df (static features) until memory crash
for n_static_features in range(0, 10):

    print(f"====== Nb of static feat: {n_static_features} ======")

    df = generate_series(
        n_series=n_series,
        freq="W",
        min_length=208,
        max_length=208,
        equal_ends=True,
        n_static_features=n_static_features,
    )

    model = TSMixer(
        h=52,
        input_size=104,
        n_series=n_series, 
        max_steps=60,
        val_check_steps=60,
        enable_model_summary=False,
    )
    nf = NeuralForecast(models=[model], freq='W')

    nf.fit(df=df)

    del df, model, nf
    gc.collect()
    torch.cuda.empty_cache()
    time.sleep(60)

Crash starting at 4 static features:
OutOfMemoryError: CUDA out of memory. Tried to allocate 10.88 GiB. GPU 0 has a total capacity of 21.99 GiB of which 10.24 GiB is free. Process 31599 has 11.74 GiB memory in use. Of the allocated memory 949.35 MiB is allocated by PyTorch, and 10.52 GiB is reserved by PyTorch but unallocated.

[screenshot: GPU memory usage over the test iterations]

Issue Severity

High: It blocks me from completing my task.

@elephaint
Contributor

The following code trains just fine using at most 14 GB of GPU memory on my RTX 3090, which means it should also run on any V100.

You should avoid looping through tryouts: even with all the gc and empty_cache shenanigans, it's nearly impossible to properly clear the memory (one way to isolate each run in its own process is sketched after the code below).

from neuralforecast import NeuralForecast
from neuralforecast.models import TSMixer
from utilsforecast.data import generate_series
import torch

n_series = 30490
n_static_features = 10

df = generate_series(
    n_series=n_series,
    freq="W",
    min_length=208,
    max_length=208,
    equal_ends=True,
    n_static_features=0,
)

model = TSMixer(
    h=52,
    input_size=104,
    n_series=n_series, 
    max_steps=60,
    val_check_steps=60,
    enable_model_summary=False,
)
nf = NeuralForecast(models=[model], freq='W')

nf.fit(df=df)
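
On the looping point: a minimal sketch (my own, not part of neuralforecast) of running each tryout in its own process, so that all CUDA memory is released when the child process exits:

import multiprocessing as mp

def run_trial(n_static_features: int) -> None:
    # Import inside the child so CUDA state lives and dies with the process.
    from neuralforecast import NeuralForecast
    from neuralforecast.models import TSMixer
    from utilsforecast.data import generate_series

    df = generate_series(
        n_series=30_000,
        freq="W",
        min_length=208,
        max_length=208,
        equal_ends=True,
        n_static_features=n_static_features,
    )
    model = TSMixer(h=52, input_size=104, n_series=30_000, max_steps=60)
    nf = NeuralForecast(models=[model], freq="W")
    nf.fit(df=df)

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # fresh interpreter per trial, no inherited CUDA context
    for n in range(10):
        p = ctx.Process(target=run_trial, args=(n,))
        p.start()
        p.join()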

@Antoine-Schwartz
Author

Hello, thanks @elephaint for your answer.

In your example you still have n_static_features=0 in the generate_series call. Does it really run within 14 GB on your RTX 3090 with 10 static features?

You're right that the gc + empty_cache combo isn't perfect, but it does allow you to get very close to 0 memory usage between tests, as shown by the graph above :)

@elephaint
Contributor

elephaint commented Nov 19, 2024

Argh, I was stupid. TSMixer doesn't even support exogenous variables, so you're starting from the wrong model. You should use TSMixerx and pass the exogenous variables in the correct way.

The following runs with a very low mem cost on my GPU:

from neuralforecast import NeuralForecast
from neuralforecast.models import TSMixerx
from utilsforecast.data import generate_series

n_series = 30490
n_static_features = 10

df = generate_series(
    n_series=n_series,
    freq="W",
    min_length=208,
    max_length=208,
    equal_ends=True,
    n_static_features=n_static_features,
)

static_df = df.groupby("unique_id", as_index=False).first().drop(columns=["y", "ds"])

model = TSMixerx(
    h=52,
    input_size=104,
    n_series=n_series, 
    max_steps=5,
    val_check_steps=60,
    enable_model_summary=False,
    stat_exog_list=[f"static_{i}" for i in range(n_static_features)],
)
nf = NeuralForecast(models=[model], freq='W')

nf.fit(df=df, static_df=static_df)

This should close the issue; re-open if required.

@Antoine-Schwartz
Author

Hello again @elephaint, sorry, perhaps I didn't express myself clearly enough in my first message.

  1. I'm well aware that TSMixer (without the x) doesn't support covariates; however, my example above was meant to demonstrate that even though the static columns are not used, they still increase the total memory required during training (a possible workaround is sketched after this list).

  2. I re-ran your code with TSMixerx, and no surprise: OutOfMemoryError: CUDA out of memory. Tried to allocate 22.12 GiB. GPU 0 has a total capacity of 21.99 GiB of which 20.57 GiB is free.
    Could this be a set-up problem with Databricks GPUs, or more generally with the NVIDIA A10G?
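
A possible user-side workaround for point 1 (my own sketch, not an official fix): drop the columns the model won't use before calling fit, so they never reach the dataset builder. For a plain TSMixer that is just the id/time/target triplet:

# Hypothetical mitigation: keep only the columns the model actually consumes.
used_cols = ["unique_id", "ds", "y"]
nf.fit(df=df[used_cols])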

Thanks in advance!

@jmoralez
Member

I think these are indeed two issues.

  1. I think we currently build the dataset with every column in the dataframe without checking whether they're actually used by the models; we can probably be smarter here and filter the dataframe first to keep only the features that will actually be used.
  2. Does the error happen here?

     windows = windows[:, :, final_condition, :]

     The windows are a view, so they don't consume memory until we materialize them; we could try keeping them as a view and only materialize the sample that is taken below (a self-contained sketch of this idea follows):

     # Sample windows
     n_windows = windows.shape[2]
     if self.batch_size is not None:
         w_idxs = np.random.choice(
             n_windows,
             size=self.batch_size,
             replace=(n_windows < self.batch_size),
         )
         windows = windows[:, :, w_idxs, :]
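
For illustration, a self-contained sketch of that idea (toy shapes and a hypothetical final_condition mask, not the actual neuralforecast code): draw the window indices from the valid set first, and only then materialize the selected windows, so the full boolean-masked copy is never created:

import numpy as np
import torch

# Toy stand-in for the window tensor: (n_series, n_channels, n_windows, window_len).
windows = torch.randn(64, 3, 500, 156)
final_condition = torch.rand(500) > 0.2   # hypothetical validity mask over windows
batch_size = 128

# Instead of materializing windows[:, :, final_condition, :] (a copy of every
# valid window), sample from the valid indices first...
valid_idxs = torch.nonzero(final_condition, as_tuple=True)[0].numpy()
w_idxs = np.random.choice(
    valid_idxs,
    size=batch_size,
    replace=(len(valid_idxs) < batch_size),
)

# ...and only materialize the sampled windows.
sampled = windows[:, :, torch.as_tensor(w_idxs), :]
print(sampled.shape)  # torch.Size([64, 3, 128, 156])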

@Antoine-Schwartz
Author

Antoine-Schwartz commented Nov 20, 2024

Exactly @jmoralez. For the first issue, that's what I suspected; for the second, it is indeed at that line that the error happens.
However, I don't understand how the code can pass on a 14 GB GPU.

I'll try running it on other types of graphics cards...

Edit: Same issue with an NVIDIA T4 on Google Colab
