
Discussion of methods alternative to CNNs (e.g. RNNs or transformers) #5

modenesi opened this issue Jan 28, 2025 · 6 comments

@modenesi

Hi all,

I'm not used to using issues on GitHub, but I love the idea and I want to use them more. Feel free to give me feedback anytime.

I just want to start an issue to discuss whether or not we want to implement alternatives to CNNs so that our ML model can take inputs of any size. It might be that the current joint CNN solution you both have is enough to tackle the issue. I'll try to learn a bit more about it.

ALTERNATIVE SOLUTIONS:
It might also be interesting to try alternative models, especially if the cost of running them is low. I wonder if we can hand our current code to ChatGPT and ask it to write a second model directly suited for time series. It might even capture temporal aspects better than CNNs. Some options:

  • RNNs (Recurrent Neural Nets, not to be confused with Recursive Neural Nets), e.g. LSTM or GRU
  • Transformers

I need to check, but I think we want to train these models with very long vectors, using padding and masking. We can talk more about it during our meeting today.

I read that RNNs and transformers are great for modeling temporal dependencies, while CNNs with global pooling are more efficient at capturing broad patterns in the time sequence.
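
To make that contrast concrete, here is a rough sketch of the two architectures using the keras R package (the layer sizes, kernel size, and 200-day padded length are made up for illustration; this is not a proposal for our final model):

```r
library(keras)

# CNN with global pooling: local filters summarize broad patterns over the
# whole padded window, largely ignoring exact timing.
cnn_model <- keras_model_sequential() %>%
  layer_conv_1d(filters = 16, kernel_size = 7, activation = "relu",
                padding = "same", input_shape = c(200, 1)) %>%
  layer_global_average_pooling_1d() %>%
  layer_dense(units = 1)

# Simple RNN: the hidden state is updated day by day, so the ordering of
# observations (temporal dependence) is modeled explicitly.
rnn_model <- keras_model_sequential() %>%
  layer_masking(mask_value = 0, input_shape = c(200, 1)) %>%
  layer_simple_rnn(units = 16) %>%
  layer_dense(units = 1)
```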

@modenesi (Author)

RNNs are like regular neural nets, but with a "hidden state" that keeps track of dependencies between observations over time. I like this simple explanation of it:
[Image: a simple diagram illustrating the RNN hidden state]
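
For intuition, here is a toy base-R version of that update, with made-up scalar weights and data:

```r
# The hidden state h carries information from one time step to the next.
set.seed(1)
x   <- rnorm(10)            # a univariate series with 10 time steps
W_h <- 0.5; W_x <- 0.8; b <- 0.1
h   <- 0                    # initial hidden state
for (t in seq_along(x)) {
  h <- tanh(W_h * h + W_x * x[t] + b)   # h_t depends on h_{t-1} and x_t
}
h  # the final hidden state summarizes the whole series
```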

@modenesi (Author)

modenesi commented Jan 30, 2025

So we would be able to incorporate time-invariant features (such as population size) into the RNN, which is great news!

Roughly, it would update the hidden state as:
$h_t = f(W_h \cdot h_{t-1} + W_x \cdot x_{t} + W_c \cdot C + b)$
where $C$ accounts for the time-invariant variables.

It is simple to implement:

  1. TIME SERIES VECTOR: get the time series data, pad it to e.g. 200 days, and mask it
  2. REPEAT TIME-INVARIANT: create 200 rows of the repeated time-invariant features (no masking or padding here)
  3. FINAL INPUT: concatenate the previous two datasets; this will be the input to the model (see the sketch below)
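
A rough base-R sketch of these three steps for a single series (the series length, padding length, and population value are all made up):

```r
set.seed(1)
max_len   <- 200
incidence <- rpois(137, lambda = 20)   # observed series, shorter than max_len
pop_size  <- 1e6                       # time-invariant feature

# 1. TIME SERIES VECTOR: pad with 0 up to max_len (0 is the mask value;
#    if real counts can be 0, a different mask value may be needed)
padded <- c(incidence, rep(0, max_len - length(incidence)))

# 2. REPEAT TIME-INVARIANT: one copy per time step, no padding or masking
static_repeated <- rep(pop_size, max_len)

# 3. FINAL INPUT: a max_len x 2 matrix per series, stacked into an array of
#    shape (n_series, max_len, 2) before feeding the model
x <- cbind(series = padded, pop = static_repeated)
dim(x)  # 200 x 2
```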

@gvegayon: What would be the time-invariant variables we'd like to include?

@modenesi (Author)

Also, after reading about the options and thinking about our problem, I think that an LSTM (which is a fancy RNN) might be overkill, given that our time series isn't that long and it is univariate. In fact, we might even have problems with overfitting the data.

My suggestion:

  1. Try a simple RNN first, adding the time-invariant variables as described above
  2. Also try a Gated Recurrent Unit (GRU), which is an RNN with a bit more structure for time dependencies, but not as complex as an LSTM, and compare it to the simple RNN (see the sketch below). If there are significant accuracy gains over the simple RNN, then consider training an LSTM. If accuracy is similar, stick with the simple RNN.
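
To make the comparison in point 2 concrete, here is a minimal keras-for-R sketch with both recurrent layers on the same input shape (illustrative sizes only, not a tuned architecture):

```r
library(keras)

# Same input shape for both: 200 padded days, 2 features (series + repeated
# time-invariant feature), as in the steps described earlier.
inp <- layer_input(shape = c(200, 2))

rnn_head <- inp %>% layer_simple_rnn(units = 32) %>% layer_dense(units = 1)
gru_head <- inp %>% layer_gru(units = 32) %>% layer_dense(units = 1)

simple_model <- keras_model(inp, rnn_head)
gru_model    <- keras_model(inp, gru_head)

# Train both on the same data and compare validation error; only consider an
# LSTM (layer_lstm) if the GRU clearly beats the simple RNN.
```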

Happy to talk about it more next time we meet.

@sima-njf (Collaborator)

Thank you so much, Bernardo!
That is an excellent explanation of LSTMs. I am now working on splitting the data, so I am still in the first steps. Once I get that part running faster, I will implement your ideas on it.

@gvegayon (Member)

> What would be the time-invariant variables we'd like to include?

That's a great question, @modenesi. For the moment, the only one I can think of is population size. We could add other things, but generally that would make the model less usable. For instance, we could add Rt and generation interval estimates, but that information is not always available.

@modenesi (Author)

I asked ChatGPT for a specific architecture for a simple RNN in R, with one time-invariant and one time-varying variable:

```r
library(keras)
library(tensorflow)

# Define Temporal Input (Variable-Length Time-Series Data)
temporal_input <- layer_input(shape = c(90, 1), name = "temporal_input")  # Max 90 days

# Apply Masking to Ignore Padded Timesteps
masked_temporal_input <- temporal_input %>%
  layer_masking(mask_value = 0.0)  # Ignore 0 values (padding)

# Define Time-Invariant Feature Input
time_invariant_input <- layer_input(shape = c(1), name = "time_invariant_input")

# Repeat Time-Invariant Feature Across Timesteps
time_invariant_repeated <- time_invariant_input %>%
  layer_repeat_vector(90)  # Repeat for max length = 90 days

# Concatenate Temporal & Time-Invariant Features
combined_input <- layer_concatenate(list(masked_temporal_input, time_invariant_repeated))

# RNN Layer (Handles Variable-Length Inputs)
rnn_output <- combined_input %>%
  layer_simple_rnn(units = 32, activation = "tanh",
                   kernel_regularizer = regularizer_l2(0.001)) %>%
  layer_dropout(rate = 0.2)  # Dropout for Regularization

# Final Output Layer
final_output <- rnn_output %>%
  layer_dense(units = 1, activation = "linear", name = "output")  # Regression output

# Define Model
model <- keras_model(inputs = list(temporal_input, time_invariant_input),
                     outputs = final_output)

# Compile Model
model %>% compile(
  optimizer = optimizer_adam(),
  loss = "mse"
)

# Print Model Summary
summary(model)
```
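
For completeness, here is a hedged sketch of how this model could be fit, using randomly generated placeholder data shaped to match the two inputs (not our real data):

```r
# Hypothetical usage of the model defined above.
n <- 500
x_temporal <- array(runif(n * 90), dim = c(n, 90, 1))  # padded to 90 days, 0 = masked
x_static   <- matrix(runif(n), ncol = 1)                # e.g. scaled population size
y          <- runif(n)                                  # regression target

model %>% fit(
  x = list(temporal_input = x_temporal, time_invariant_input = x_static),
  y = y,
  epochs = 20,
  batch_size = 32,
  validation_split = 0.2
)
```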
