Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
nnRNN - NeurIPS 2019
expRNN code taken from here
EURNN tests based on code taken from here
Copy task hyperparameters (see the `copytask.py` options below):

| Model | Hidden Size | Optimizer | LR | Orth. LR | δ | T decay | Recurrent init |
|-------|-------------|-----------|----|----------|---|---------|----------------|
| RNN | 128 | RMSprop α=0.9 | 0.001 | | | | Glorot Normal |
| RNN-orth | 128 | RMSprop α=0.99 | 0.0002 | | | | Random orth |
| EURNN | 128 | RMSprop α=0.5 | 0.001 | | | | |
| EURNN | 256 | RMSprop α=0.5 | 0.001 | | | | |
| expRNN | 128 | RMSprop α=0.99 | 0.001 | 0.0001 | | | Henaff |
| expRNN | 176 | RMSprop α=0.99 | 0.001 | 0.0001 | | | Henaff |
| nnRNN | 128 | RMSprop α=0.99 | 0.0005 | 10⁻⁶ | 0.0001 | 10⁻⁶ | Cayley |
sMNIST hyperparameters (see the `sMNIST.py` options below):

| Model | Hidden Size | Optimizer | LR | Orth. LR | δ | T decay | Recurrent init |
|-------|-------------|-----------|----|----------|---|---------|----------------|
| RNN | 512 | RMSprop α=0.9 | 0.0001 | | | | Glorot Normal |
| RNN-orth | 512 | RMSprop α=0.99 | 5×10⁻⁵ | | | | Random orth |
| EURNN | 512 | RMSprop α=0.9 | 0.0001 | | | | |
| EURNN | 1024 | RMSprop α=0.9 | 0.0001 | | | | |
| expRNN | 512 | RMSprop α=0.99 | 0.0005 | 5×10⁻⁵ | | | Cayley |
| expRNN | 722 | RMSprop α=0.99 | 0.0005 | 5×10⁻⁵ | | | Cayley |
| nnRNN | 512 | RMSprop α=0.99 | 0.0002 | 2×10⁻⁵ | 0.1 | 0.0001 | Cayley |
| LSTM | 512 | RMSprop α=0.99 | 0.0005 | | | | Glorot Normal |
| LSTM | 257 | RMSprop α=0.9 | 0.0005 | | | | Glorot Normal |
```
python copytask.py [args]
```

Options (an example invocation follows the list):
- net-type : type of RNN to use in test
- nhid : number of hidden units
- cuda : use CUDA
- T : length of the delay between the input sequence and recall
- labels : number of labels in output and input, maximum 8
- c-length : sequence length
- onehot : use one-hot encoding for labels and inputs
- vari : use variable sequence lengths
- random-seed : random seed for experiment
- batch : batch size
- lr : learning rate for optimizer
- lr_orth : learning rate for orthogonal optimizer
- alpha : alpha value for optimizer (always RMSprop)
- rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
- iinit : input weight matrix initialization, options: [xavier, kaiming]
- nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
- alam : strength of the regularization penalty (δ in the paper)
- Tdecay : weight decay on upper triangular matrix values
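For example, a copy task run matching the nnRNN row of the table above might look like the sketch below. This is an assumption-laden example, not a command taken from the repo: it assumes the option names listed above map to standard argparse-style `--flags` with `--cuda` and `--onehot` as switches, and the `--T`, `--c-length`, `--batch`, and nonlinearity values are illustrative choices rather than values from the tables.

```
# Sketch of an nnRNN copy-task run using the hyperparameters from the
# table above. Assumptions: argparse-style --flags; --cuda and --onehot
# are store_true switches; --T, --c-length, --batch, and --nonlin are
# illustrative choices, not values taken from the tables.
python copytask.py --net-type nnRNN --nhid 128 --cuda \
    --T 200 --c-length 10 --labels 8 --onehot \
    --batch 10 \
    --lr 0.0005 --lr_orth 1e-6 --alpha 0.99 \
    --rinit cayley --iinit kaiming --nonlin modrelu \
    --alam 0.0001 --Tdecay 1e-6
```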
```
python sMNIST.py [args]
```

Options (an example invocation follows the list):
- net-type : type of RNN to use in test
- nhid : number of hidden units
- epochs : number of epochs
- cuda : use CUDA
- permute : permute the order of the input
- random-seed : random seed for the experiment (the permutation order uses its own independent seed)
- batch : batch size
- lr : learning rate for optimizer
- lr_orth : learning rate for orthogonal optimizer
- alpha : alpha value for optimizer (always RMSprop)
- rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
- iinit : input weight matrix initialization, options: [xavier, kaiming]
- nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
- alam : strength of the regularization penalty (δ in the paper)
- Tdecay : weight decay on upper triangular matrix values
- save_freq : frequency in epochs to save data and network
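Similarly, a permuted sMNIST run matching the nnRNN row of its table might look like the sketch below, under the same assumptions; `--epochs`, `--batch`, `--save_freq`, and the nonlinearity are illustrative, not values from the tables.

```
# Sketch of an nnRNN permuted-sMNIST run using the hyperparameters from
# the table above. Assumptions: argparse-style --flags; --cuda and
# --permute are store_true switches; --epochs, --batch, --save_freq,
# and --nonlin are illustrative choices, not values from the tables.
python sMNIST.py --net-type nnRNN --nhid 512 --cuda --permute \
    --epochs 100 --batch 100 \
    --lr 0.0002 --lr_orth 2e-5 --alpha 0.99 \
    --rinit cayley --iinit kaiming --nonlin modrelu \
    --alam 0.1 --Tdecay 0.0001 --save_freq 10
```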