RNNs have promise for time series because of their memory.
RNN-RBM with Hessian-free optimization?
What input causes hidden unit i to be maximally active?
\[ a_i = f\left( \sum_j w_{ij}^{(1)} x_j + b_i^{(1)} \right) \]
(subject to a norm constraint on the input, otherwise the activation is unbounded)
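A minimal sketch of the standard closed-form answer, assuming a monotonic f and a unit-norm constraint on the input (this is the usual visualization argument, not anything from the codebase):

```python
import numpy as np

# For a_i = f(w_i . x + b_i) with monotonic f, the pre-activation w_i . x
# is maximized over ||x|| <= 1 by the normalized weight vector.
rng = np.random.default_rng(0)
w_i = rng.normal(size=20)            # incoming weights of hidden unit i
x_star = w_i / np.linalg.norm(w_i)   # the maximally activating input
```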
- number of hidden units (nhidden)
- types of time series: frequency-rich, stationarity
- can it be insensitive to the window size? robustness in general
- sleep stage classification
- astronomy
- Ted Dunning's EKG example
- smaller data was faster on the CPU; is GPU shuffling the overhead?
- the YAML config makes things complicated for a beginner; it's supposed to make your life easier, but it's one more thing to learn. It's weird.
coursera lec 15 “modeling real-valued data with rbm”
http://arxiv.org/abs/1406.2572 ("Identifying and attacking the saddle point problem in high-dimensional non-convex optimization")
- thanks to Ted Dunning, Kirk Borne, Jay Kaufold
- we want to be more general with RNNs, as opposed to HMMs and linear dynamical systems
- learn about batches
- can it train on sequences of multiple lengths?
- how does it go through training?
So the Network object is a base object for Regressor and Autoencoder, because it carries the cost function.
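A sketch of that pattern (illustrative, not theanets' actual code): the base class owns the cost hook, and each subclass defines its own cost.

```python
class Network:
    """Base object: training machinery lives here and calls self.cost()."""
    def cost(self, outputs, targets):
        raise NotImplementedError

class Regressor(Network):
    def cost(self, outputs, targets):
        return ((outputs - targets) ** 2).mean()   # squared prediction error

class Autoencoder(Network):
    def cost(self, outputs, inputs):
        return ((outputs - inputs) ** 2).mean()    # reconstruction error
```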
- see if it uses the latest hf optimizer
only run setup_enc/decoder once
- check if dataset can do varying length sequences
- what does train_batches do? It's added in seqdataset.
- I want to predict on just one sequence; see how I can copy params from the trained model
- training and validation generator datasets still not working
- why does the dataset break the time axis when using a callable?
- what happens when the callable is exhausted?
- how would classification work? Because it's sequence → (one class); see the sketch after this list
- also check the log output of climate; it says logistic but I'm not sure
- can I modify the code to run a 'monitor' as a separate process?
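On the classification question above: one common scheme is to run the RNN over the whole sequence and read the class out of the final hidden state only. A minimal numpy sketch (illustrative names and shapes, not the library's API):

```python
import numpy as np

def rnn_classify(seq, W_in, W_h, W_out):
    """Consume a whole sequence, then classify from the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in seq:                          # seq: iterable of input vectors
        h = np.tanh(W_in @ x + W_h @ h)    # standard recurrent update
    return np.argmax(W_out @ h)            # one class for the whole sequence

# Toy usage: 3 classes, 5-dim inputs, 8 hidden units.
rng = np.random.default_rng(0)
seq = rng.normal(size=(50, 5))
label = rnn_classify(seq, rng.normal(size=(8, 5)),
                     rng.normal(size=(8, 8)), rng.normal(size=(3, 8)))
```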
- Nando de Freitas intro: https://www.youtube.com/watch?v=4vGiHC35j9s
- mathematicalmonk intro: https://www.youtube.com/watch?v=vU6AiEYED9E
- http://nlpers.blogspot.com/2014/10/hyperparameter-search-bayesian.html
- "A Tutorial on Bayesian Optimization of Expensive Cost Functions" (Brochu et al.)
- multivariate Gaussian theorem: going from the joint to the conditional... why?
- capture similarity of points with a ‘kernel’
does hyperopt handle stochastic objectives?
- Why? Because we can integrate to get the marginals.
- http://www.robots.ox.ac.uk/~mebden/reports/GPtutorial.pdf (Ebden's GP tutorial)
- a better video, by far: http://videolectures.net/gpip06_mackay_gpb/ (MacKay); it makes use of a computer demo. If you Google it, you'll get Nando's lecture instead.
Coming from science and engineering, this is tricky (thanks, engstats.com). It took my analytical, function-oriented mind a while to get around to appealing to statistics for "functions".
- REALLY understand joint, conditional, and marginal distributions in this more abstract sense. How to learn on your own (I've been out of classes for a while):
- find intro papers, and videos
- build a toy. use alpha blending
- conditionals overview: http://stats.stackexchange.com/questions/30588/deriving-the-conditional-distributions-of-a-multivariate-normal-distribution
- REALLY understand the differences between marginals, conditionals, joints, and expected value in this more abstract context
- does gptools add noise?
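To make the joint → conditional step concrete, here's a minimal numpy sketch of GP regression with an RBF kernel (toy data; not gptools):

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

x_train = np.array([-2.0, 0.0, 1.5])          # observed inputs
y_train = np.sin(x_train)                     # observed outputs
x_test = np.linspace(-3, 3, 50)               # where we want predictions

K = rbf_kernel(x_train, x_train) + 1e-6 * np.eye(3)   # jitter for stability
K_s = rbf_kernel(x_train, x_test)
K_ss = rbf_kernel(x_test, x_test)

# The multivariate-Gaussian conditioning theorem, applied to the GP joint:
K_inv = np.linalg.inv(K)
mu = K_s.T @ K_inv @ y_train                  # posterior (conditional) mean
cov = K_ss - K_s.T @ K_inv @ K_s              # posterior covariance
```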
Need to consider the temporal aspect. The literature is filled with techniques suited to specific application domains! I want to review the neural net literature on anomaly detection.
- http://www.idsia.ch/~juergen/rnn.html
- hyper-opt for a 'db' of priors
- hierarchy AND recurrent aspects
- HF optimization: http://pillowlab.wordpress.com/2013/06/11/lab-meeting-6102013-hessian-free-optimization/
- great ideas for data examples in anomaly detection of time series
- simple explanation of RNNs: http://www.willamette.edu/~gorr/classes/cs449/rnn1.html
- video explaining RNNs: http://techtalks.tv/talks/on-the-difficulty-of-training-recurrent-neural-networks/58134/ (basins of attraction)
- LSTM tutorial: http://techtalks.tv/talks/on-the-difficulty-of-training-recurrent-neural-networks/58134/
- the exploding and vanishing gradients problem: RNNs are hard to train on long-term dependencies
Four ways to train RNNs (plain backpropagation through time makes the weights blow up or diminish, and has a local-minimum problem):
- long short-term memory (LSTM)
- Hessian-free optimization: can deal with small gradients
- echo state networks
- good initialization with momentum
There's a comparison in the HF paper.
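For the blow-up case, a common fix is norm-based gradient clipping (the generic recipe from the Pascanu et al. talk linked above; a sketch, not any particular library's implementation):

```python
import numpy as np

def clip_gradient(grad, threshold=1.0):
    """Rescale the gradient vector if its norm exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0])        # norm 5
print(clip_gradient(g))         # rescaled down to norm 1
```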
Talk about the hierarchy of time series in training and analysis. I don't care about interpretability that much, but it may come for free. We need to 'model' the time series... what if we don't want human involvement in modeling? Comparison with other techniques: in the review paper, most research is on training (RNNs are difficult to train) rather than on analysis. Now go through the different techniques.
Specify a mathematical context for novelty detection.
Recurrent Neural Networks http://www.cs.bham.ac.uk/~jxb/INC/l12.pdf
Does a better predictor make a better novelty detector? Can you use a neural net to understand the data-generating process?
What?:
- architecture
- training
The cool thing about time series is that they're easy to visualize: 'watch' the prediction as the model trains by superimposing it on the data. Maybe we don't need to reach the lowest possible minimum. Can a neural net show regularity progressively?
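A toy sketch of the 'watch it train' idea; the convergence step here is a placeholder for a real model's per-epoch predictions:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 4 * np.pi, 200)
y = np.sin(t)                       # the observed series
pred = np.zeros_like(y)             # stand-in for the model's prediction

plt.ion()
fig, ax = plt.subplots()
ax.plot(t, y, label='observed')
pred_line, = ax.plot(t, pred, label='predicted')
ax.legend()

for epoch in range(50):
    pred += 0.1 * (y - pred)        # placeholder for one training epoch
    pred_line.set_ydata(pred)       # superimpose the latest prediction
    plt.pause(0.05)                 # redraw: 'watch' the fit improve
```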
RNNs display and memorize temporal structure.
future work:
- multivariate time series
- LSTM
Human intuition => less computation; no human => lots of computation.
Transfer learning: if another time series is similar, then the work used to train the first one could help train the next?
If you just do a sliding window, it definitely won't remember anything beyond the size of the window, so it requires user intervention; a sketch of that setup is below.
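For concreteness, a minimal sketch of the sliding-window setup, with the window size as the user-chosen knob:

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (input window, next value) training pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# Each row of X holds `window` consecutive values; y is the value right after.
# Nothing older than `window` steps can influence a prediction.
X, y = make_windows(np.sin(np.linspace(0, 20, 500)), window=30)
```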
Next task: read "On the difficulty of training recurrent neural networks".
Are long-term effects captured if they fall outside the BPTT horizon?
Ilya Sutskever goes through the RNN algorithms and makes the case, for me at least, that the other algorithms need specific tweaks.
It's important to capture unknown lags.
Motivations: I have lots of time series and I don't know their characteristics. I want robustness.
How do I feed the net?
so is sgd doing backprop?
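They're two halves of one step: backprop computes the gradient of the cost, and SGD uses that gradient to update the weights. A one-parameter toy:

```python
# Fit y = w * x by SGD; the 'backprop' here is just the chain-rule
# gradient of the squared error with respect to w.
w, lr = 0.0, 0.1
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # pairs with y = 2 * x
for epoch in range(20):
    for x, y in data:
        grad = 2 * (w * x - y) * x    # backprop: d/dw of (w*x - y)^2
        w -= lr * grad                # SGD: step against the gradient
print(w)                              # approaches 2.0
```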
Is time series segmentation like anomaly detection?
Google 'time series segmentation anomaly detection' to put things in perspective.
Why is a generative model better for anomaly detection?
Understand vanishing gradients in 'Advances in Training RNNs'.
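A back-of-the-envelope view of why that happens: through T steps of BPTT, the gradient scales like the recurrent weight (a scalar here) raised to the T-th power, so it either dies out or blows up:

```python
# Gradient through T steps of a linear scalar RNN scales like w ** T.
for w in (0.9, 1.1):
    print(w, ['%.3g' % (w ** t) for t in (10, 20, 30, 40, 50)])
# 0.9 -> 0.349, 0.122, 0.0424, ... (vanishing)
# 1.1 -> 2.59, 6.73, 17.4, ...     (exploding)
```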
The state of the art comes from the CS, data mining, and stats communities.
Now go through GroundHog and the associated paper.
- How do neural nets fit in statistical machine translation?
- feature extraction
- continuous-space representation
- truly data-driven: requires minimal domain knowledge
From http://www.nehalemlabs.net/prototype/blog/2013/10/10/implementing-a-recurrent-neural-network-in-python/: "You could use a simple feedforward neural network and feed all dimensions as inputs. But this comes with downsides: first, the dimensionality of your first layer will be huge, leading to more parameters to learn; second, if the data really is dynamic, it could be the case that inputs will have different dimensionality, and there is no elegant way to handle that."
- theanets trainingRNNs/ might have clipping
- http://andrew.gibiansky.com/blog/machine-learning/speech-recognition-neural-networks/: "We may want to predict things other than the things in the range of the nonlinearity, so instead we do not apply the nonlinearity" (i.e., on the output layer)
- check the recurrent code in theanets: how exactly does it train? How does it build the training sequences?
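A minimal sketch of that linear-output idea (illustrative names, not theanets internals): hidden layers squash, the regression output stays linear, so predictions aren't confined to tanh's (-1, 1) range.

```python
import numpy as np

def forward(x, W_h, b_h, W_out, b_out):
    h = np.tanh(x @ W_h + b_h)      # hidden layer: squashing nonlinearity
    return h @ W_out + b_out        # output layer: linear, unbounded range

rng = np.random.default_rng(0)
y = forward(rng.normal(size=4), rng.normal(size=(4, 8)), np.zeros(8),
            rng.normal(size=(8, 1)), np.zeros(1))    # can exceed (-1, 1)
```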
For production: find a way to reduce the time series size while keeping the important features. Reduce, extrapolate back to higher definition, and measure the error.
Should I just give it various distortions of the full time series? Prediction or autoencoder? It seems to predict better with long sequences; it can't fill in short ones. Plot error vs. sequence length. Make 'epoch' a hyperparameter! Maybe also optimize the window length.
machine learning andrew ng
Cool uses: compression.
- mcomp package
- benchmark dataset for time series anomaly detection: http://yahoolabs.tumblr.com/post/114590420346/a-benchmark-dataset-for-time-series-anomaly
- http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/ (three layers of optimization)
- future practical issue: reduce the fidelity of the time series without losing features
- future: multivariate
- perfect reproduction: the identity-function problem
- problem: do we assume the validation set has no anomalies?
- argument for multiple LSTM layers: https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenTerm1201415/sak2.pdf
- they beat me to it!!! https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2015-56.pdf
- different LSTM variants are about the same: http://colah.github.io/posts/2015-08-Understanding-LSTMs/