diff --git a/docs/en/week14/practicum14.sbv b/docs/en/week14/practicum14.sbv index 179935366..cb8a594c5 100644 --- a/docs/en/week14/practicum14.sbv +++ b/docs/en/week14/practicum14.sbv @@ -1,5679 +1,2837 @@ 0:00:00.320,0:00:07.759 -so today last lesson um - -0:00:04.160,0:00:09.599 -yeah i'm smiling but i'm sad uh +so today last lesson um yeah i'm smiling but i'm sad uh 0:00:07.759,0:00:12.559 -i wanted to talk about energy-based - -0:00:09.599,0:00:14.400 -models and how to train them +i wanted to talk about energy-based models and how to train them 0:00:12.559,0:00:16.080 -but i think i need to prepare like for a - -0:00:14.400,0:00:18.960 -month before that +but i think i need to prepare like for a month before that 0:00:16.080,0:00:20.640 -so actually uh if you are still - -0:00:18.960,0:00:22.000 -interested in this summer you will be +so actually uh if you are still interested in this summer you will be 0:00:20.640,0:00:24.720 -able to - -0:00:22.000,0:00:25.760 -get a tutorial on energy-based models uh +able to get a tutorial on energy-based models uh 0:00:24.720,0:00:28.880 -we are writing a paper - -0:00:25.760,0:00:30.560 -with jan together and so we actually i'm +we are writing a paper with jan together and so we actually i'm 0:00:28.880,0:00:32.559 -planning to get this paper written as - -0:00:30.560,0:00:33.600 -like part it's going to be math and then +planning to get this paper written as like part it's going to be math and then 0:00:32.559,0:00:35.040 -part is going to be actually the - -0:00:33.600,0:00:39.120 -implementation +part is going to be actually the implementation 0:00:35.040,0:00:41.840 -such that you can actually execute uh - -0:00:39.120,0:00:42.559 -the paper basically and you can get you +such that you can actually execute uh the paper basically and you can get you 0:00:41.840,0:00:44.399 -know - -0:00:42.559,0:00:47.600 -a better understanding of what's going +know a better understanding of what's going 0:00:44.399,0:00:50.480 -on 
um - -0:00:47.600,0:00:51.680 -yeah so yeah that's gonna come out maybe +on um yeah so yeah that's gonna come out maybe 0:00:50.480,0:00:54.480 -in a month uh - -0:00:51.680,0:00:56.079 -we we i have to do pretty a pretty good +in a month uh we we i have to do pretty a pretty good 0:00:54.480,0:00:59.440 -job there - -0:00:56.079,0:01:01.199 -um so and maybe uh if the +job there um so and maybe uh if the 0:00:59.440,0:01:02.960 -maybe we can even have a additional - -0:01:01.199,0:01:05.040 -class later on +maybe we can even have a additional class later on 0:01:02.960,0:01:06.799 -if you're interested and you know i'm - -0:01:05.040,0:01:10.000 -always here uh +if you're interested and you know i'm always here uh 0:01:06.799,0:01:11.280 -up for uh teaching you so again if - -0:01:10.000,0:01:12.240 -you're interested in this energy-based +up for uh teaching you so again if you're interested in this energy-based 0:01:11.280,0:01:14.400 -model later on - -0:01:12.240,0:01:15.439 -like outside the course and whatever we +model later on like outside the course and whatever we 0:01:14.400,0:01:18.640 -can again meet - -0:01:15.439,0:01:19.360 -and uh record and and pretend it's +can again meet and uh record and and pretend it's 0:01:18.640,0:01:22.560 -actually - -0:01:19.360,0:01:24.400 -one more class okay so yeah i i didn't +actually one more class okay so yeah i i didn't 0:01:22.560,0:01:26.560 -manage to do it for today - -0:01:24.400,0:01:28.240 -so today we're going to be covering um +manage to do it for today so today we're going to be covering um 0:01:26.560,0:01:31.520 -if i get to finish two - -0:01:28.240,0:01:35.280 -topics um we never +if i get to finish two topics um we never 0:01:31.520,0:01:36.960 -talked about them uh too much before uh - -0:01:35.280,0:01:38.880 -because they are more machine learning +talked about them uh too much before uh because they are more machine learning 0:01:36.960,0:01:41.119 -related but nevertheless - 
-0:01:38.880,0:01:42.240 -we care also in deep learning and the +related but nevertheless we care also in deep learning and the 0:01:41.119,0:01:45.360 -topic of today - -0:01:42.240,0:01:47.200 -is regularization overfitting and +topic of today is regularization overfitting and 0:01:45.360,0:01:50.479 -regularization let me start - -0:01:47.200,0:01:53.600 -sharing the screen so again this is my +regularization let me start sharing the screen so again this is my 0:01:50.479,0:01:56.000 -as usual perspective of - -0:01:53.600,0:01:57.280 -the topic uh it's not usually the +as usual perspective of the topic uh it's not usually the 0:01:56.000,0:02:00.320 -mainstream but you know - -0:01:57.280,0:02:02.560 -it's what you get since it's my +mainstream but you know it's what you get since it's my 0:02:00.320,0:02:04.079 -view and i'm your educator your - -0:02:02.560,0:02:06.479 -instructor today +view and i'm your educator your instructor today 0:02:04.079,0:02:08.879 -so overfitting and regularization - -0:02:06.479,0:02:11.039 -connection between them right so +so overfitting and regularization connection between them right so 0:02:08.879,0:02:12.160 -those are two different topics those are - -0:02:11.039,0:02:12.560 -two different things but they are of +those are two different topics those are two different things but they are of 0:02:12.160,0:02:16.480 -course - -0:02:12.560,0:02:16.959 -connected so i start with this drawing +course connected so i start with this drawing 0:02:16.480,0:02:19.360 -here - -0:02:16.959,0:02:20.720 -uh someone told me it's not intuitive +here uh someone told me it's not intuitive 0:02:19.360,0:02:23.760 -but again - -0:02:20.720,0:02:27.040 -for me so there you get it +but again for me so there you get it 0:02:23.760,0:02:30.000 -uh here i'm showing you in the - -0:02:27.040,0:02:30.720 -uh with the pink box the data complexity +uh here i'm showing you in the uh with the pink box the data complexity 0:02:30.000,0:02:35.200 -okay so 
- -0:02:30.720,0:02:37.920 -those dots are sampled from my +okay so those dots are sampled from my 0:02:35.200,0:02:38.879 -samples from from my training data set - -0:02:37.920,0:02:41.280 -and +samples from from my training data set and 0:02:38.879,0:02:42.800 -then i tried to fit their three - -0:02:41.280,0:02:46.800 -different models okay +then i tried to fit their three different models okay 0:02:42.800,0:02:50.160 -so in the first case that is - -0:02:46.800,0:02:52.480 -basically the model complexity is below +so in the first case that is basically the model complexity is below 0:02:50.160,0:02:54.480 -is under is - -0:02:52.480,0:02:55.519 -you know it's smaller than the data +is under is you know it's smaller than the data 0:02:54.480,0:02:57.599 -complexity - -0:02:55.519,0:02:59.120 -and therefore you have some phenomenon +complexity and therefore you have some phenomenon 0:02:57.599,0:03:00.239 -called under fitting right because you - -0:02:59.120,0:03:02.879 -try to fit +called under fitting right because you try to fit 0:03:00.239,0:03:06.480 -uh what looks like a parabola with a - -0:03:02.879,0:03:08.400 -straight line and therefore you're +uh what looks like a parabola with a straight line and therefore you're 0:03:06.480,0:03:09.920 -not you're not going you're not doing a - -0:03:08.400,0:03:12.560 -good job right +not you're not going you're not doing a good job right 0:03:09.920,0:03:14.239 -then what happened next here we actually - -0:03:12.560,0:03:17.040 -have the right fitting +then what happened next here we actually have the right fitting 0:03:14.239,0:03:17.599 -in this case the model complexity - -0:03:17.040,0:03:21.120 -matches +in this case the model complexity matches 0:03:17.599,0:03:22.560 -the data complexity right um and so - -0:03:21.120,0:03:24.480 -in this case what's the difference with +the data complexity right um and so in this case what's the difference with 0:03:22.560,0:03:27.280 -the previous case uh - 
-0:03:24.480,0:03:28.000 -in this case you have zero error right +the previous case uh in this case you have zero error right 0:03:27.280,0:03:31.280 -so your - -0:03:28.000,0:03:32.000 -your model exactly matches the training +so your your model exactly matches the training 0:03:31.280,0:03:36.239 -points those - -0:03:32.000,0:03:36.560 -points finally we have overfitting where +points those points finally we have overfitting where 0:03:36.239,0:03:39.920 -the - -0:03:36.560,0:03:43.200 -model complexity is actually +the model complexity is actually 0:03:39.920,0:03:44.799 -greater than the data complexity in this - -0:03:43.200,0:03:48.159 -case +greater than the data complexity in this case 0:03:44.799,0:03:51.519 -the model doesn't choose a parabola - -0:03:48.159,0:03:54.959 -because why question for your +the model doesn't choose a parabola because why question for your 0:03:51.519,0:03:58.239 -audience live my own live audience why - -0:03:54.959,0:04:00.400 -is this model like +audience live my own live audience why is this model like 0:03:58.239,0:04:03.040 -wiggly in this case why is not a - -0:04:00.400,0:04:05.200 -parabola +wiggly in this case why is not a parabola 0:04:03.040,0:04:06.720 -and you're supposed to type in the chat - -0:04:05.200,0:04:08.879 -because otherwise +and you're supposed to type in the chat because otherwise 0:04:06.720,0:04:10.080 -i don't know if you're following so my - -0:04:08.879,0:04:12.959 -question is in +i don't know if you're following so my question is in 0:04:10.080,0:04:15.280 -the last case my data my model - -0:04:12.959,0:04:19.280 -complexity is superior than the +the last case my data my model complexity is superior than the 0:04:15.280,0:04:21.120 -it's larger than the data complexity and - -0:04:19.280,0:04:22.800 -although those points look like they +it's larger than the data complexity and although those points look like they 0:04:21.120,0:04:26.320 -belong to a parabola - -0:04:22.800,0:04:28.560 -my 
model decides to get that spiky guy +belong to a parabola my model decides to get that spiky guy 0:04:26.320,0:04:30.479 -like spiky peak on the left and you know - -0:04:28.560,0:04:35.759 -some weird stuff +like spiky peak on the left and you know some weird stuff 0:04:30.479,0:04:37.440 -model doesn't learn but memorizes um - -0:04:35.759,0:04:39.759 -overfitting but sure sure it's written +model doesn't learn but memorizes um overfitting but sure sure it's written 0:04:37.440,0:04:41.600 -they're overfitting but why - -0:04:39.759,0:04:44.240 -if those points are coming from a +they're overfitting but why if those points are coming from a 0:04:41.600,0:04:45.600 -parabola i would expect even a very - -0:04:44.240,0:04:47.520 -larger model +parabola i would expect even a very larger model 0:04:45.600,0:04:48.720 -would make like a very nice parabola - -0:04:47.520,0:04:51.440 -right +would make like a very nice parabola right 0:04:48.720,0:04:52.240 -you're privately writing to me don't - -0:04:51.440,0:04:56.479 -private +you're privately writing to me don't private 0:04:52.240,0:04:59.600 -private right um - -0:04:56.479,0:05:02.160 -so if +private right um so if 0:04:59.600,0:05:04.000 -and this is a big if right if my points - -0:05:02.160,0:05:06.800 -my training points come from +and this is a big if right if my points my training points come from 0:05:04.000,0:05:09.520 -an actual parabola even the overfitting - -0:05:06.800,0:05:12.240 -model would be making a perfect parabola +an actual parabola even the overfitting model would be making a perfect parabola 0:05:09.520,0:05:12.720 -the point here is that there is some - -0:05:12.240,0:05:14.800 -noise +the point here is that there is some noise 0:05:12.720,0:05:17.680 -right there is always some noise and - -0:05:14.800,0:05:20.960 -therefore the model that perfectly +right there is always some noise and therefore the model that perfectly 0:05:17.680,0:05:23.280 -goes through every training point - 
-0:05:20.960,0:05:24.400 -will be like that it's going to be like +goes through every training point will be like that it's going to be like 0:05:23.280,0:05:26.960 -crazy because - -0:05:24.400,0:05:28.479 -all those points don't exactly live on +crazy because all those points don't exactly live on 0:05:26.960,0:05:31.360 -the parabola but they are - -0:05:28.479,0:05:32.479 -slightly offset and in order to be +the parabola but they are slightly offset and in order to be 0:05:31.360,0:05:34.960 -perfectly - -0:05:32.479,0:05:36.400 -uh going through them you're gonna have +perfectly uh going through them you're gonna have 0:05:34.960,0:05:37.120 -you know the mother is gonna have to try - -0:05:36.400,0:05:39.680 -to +you know the mother is gonna have to try to 0:05:37.120,0:05:40.960 -come up with some funky function okay - -0:05:39.680,0:05:44.080 -does it make sense +come up with some funky function okay does it make sense 0:05:40.960,0:05:47.120 -so the point is that without noise - -0:05:44.080,0:05:49.759 -this would be just a perfect parabola +so the point is that without noise this would be just a perfect parabola 0:05:47.120,0:05:53.840 -so someone would say okay maybe we - -0:05:49.759,0:05:53.840 -should use the right fitting right +so someone would say okay maybe we should use the right fitting right 0:05:54.080,0:05:57.759 -in machine learning maybe we are doing - -0:05:56.319,0:06:00.479 -deep learning +in machine learning maybe we are doing deep learning 0:05:57.759,0:06:01.840 -and it's not quite the case right - -0:06:00.479,0:06:03.759 -fitting +and it's not quite the case right fitting 0:06:01.840,0:06:05.199 -it's it's definitely not the case - -0:06:03.759,0:06:08.639 -actually our models +it's it's definitely not the case actually our models 0:06:05.199,0:06:10.960 -are so so so so powerful that they - -0:06:08.639,0:06:12.000 -even managed to learn noise like there +are so so so so powerful that they even managed to learn noise like there 
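[Editor's note: the noisy-parabola story above can be sketched numerically. This is an illustrative reconstruction, not from the lecture: the same noisy samples of a parabola are fitted with an underpowered, a matched, and an over-parameterized polynomial. All sizes and the noise scale are made up for the sketch.]

```python
import numpy as np

rng = np.random.default_rng(0)

# 11 training points sampled from a parabola, plus a little noise
x = np.linspace(-1, 1, 11)
y = x**2 + rng.normal(scale=0.05, size=x.shape)

def train_error(degree):
    """Mean squared training error of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

err_under = train_error(1)   # a straight line cannot follow the parabola: underfitting
err_right = train_error(2)   # matches the data-generating parabola, but not the noise
err_over  = train_error(10)  # 11 points, degree 10: interpolates every point, noise included
```

The degree-10 fit drives the training error to essentially zero precisely because it bends through the noisy offsets — the "funky function" from the lecture — while the degree-2 fit keeps a small residual equal to the noise it refuses to memorize.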
0:06:10.960,0:06:14.319 -was a paper there - -0:06:12.000,0:06:15.520 -where they were showing that you can +was a paper there where they were showing that you can 0:06:14.319,0:06:18.720 -label imagenet - -0:06:15.520,0:06:19.280 -with random labels you can get a network +label imagenet with random labels you can get a network 0:06:18.720,0:06:21.759 -to - -0:06:19.280,0:06:22.319 -you know perfectly memorize every label +to you know perfectly memorize every label 0:06:21.759,0:06:25.039 -uh - -0:06:22.319,0:06:25.759 -for each of these samples so you can +uh for each of these samples so you can 0:06:25.039,0:06:28.080 -clearly - -0:06:25.759,0:06:30.400 -tell that these the models we are using +clearly tell that these the models we are using 0:06:28.080,0:06:32.080 -are absolutely over parameterized and - -0:06:30.400,0:06:35.520 -therefore means that +are absolutely over parameterized and therefore means that 0:06:32.080,0:06:38.720 -you have way more power than - -0:06:35.520,0:06:40.240 -the uh you know then then it's necessary +you have way more power than the uh you know then then it's necessary 0:06:38.720,0:06:44.080 -in order to learn - -0:06:40.240,0:06:47.759 -the structure of the data nevertheless +in order to learn the structure of the data nevertheless 0:06:44.080,0:06:50.560 -we actually need that hmm - -0:06:47.759,0:06:52.960 -so let's figure out what's going on okay +we actually need that hmm so let's figure out what's going on okay 0:06:50.560,0:06:52.960 -um - -0:06:53.199,0:06:57.120 -oh actually maybe you know the answer +um oh actually maybe you know the answer 0:06:54.639,0:06:58.960 -right so what is the point - -0:06:57.120,0:07:00.400 -why do we want to go in very very high +right so what is the point why do we want to go in very very high 0:06:58.960,0:07:03.840 -dimensional space - -0:07:00.400,0:07:06.479 -i told you a few times right because +dimensional space i told you a few times right because 0:07:03.840,0:07:06.479 -who answers 
- -0:07:07.199,0:07:14.960 -come on it's the last class answer me +who answers come on it's the last class answer me 0:07:12.000,0:07:15.199 -why do we want to go in very to expand - -0:07:14.960,0:07:17.680 -the +why do we want to go in very to expand the 0:07:15.199,0:07:19.039 -the data distribution yeah optimization - -0:07:17.680,0:07:20.960 -is easier yeah fantastic +the data distribution yeah optimization is easier yeah fantastic 0:07:19.039,0:07:23.199 -that's the point right whenever we go in - -0:07:20.960,0:07:25.120 -a hype over parameterized space +that's the point right whenever we go in a hype over parameterized space 0:07:23.199,0:07:27.280 -everything is very easy to move around - -0:07:25.120,0:07:29.039 -right and therefore we always +everything is very easy to move around right and therefore we always 0:07:27.280,0:07:30.319 -would like to put ourselves in the - -0:07:29.039,0:07:32.400 -overfitting +would like to put ourselves in the overfitting 0:07:30.319,0:07:35.120 -scenarios with our networks because it's - -0:07:32.400,0:07:37.919 -the training is going to be easier +scenarios with our networks because it's the training is going to be easier 0:07:35.120,0:07:38.960 -nevertheless what's the problem now well - -0:07:37.919,0:07:41.280 -the problem is they +nevertheless what's the problem now well the problem is they 0:07:38.960,0:07:42.800 -they're going to be like they wiggle - -0:07:41.280,0:07:46.080 -like crazy +they're going to be like they wiggle like crazy 0:07:42.800,0:07:48.400 -um another another thing um - -0:07:46.080,0:07:49.199 -so this is point number one point number +um another another thing um so this is point number one point number 0:07:48.400,0:07:52.400 -two - -0:07:49.199,0:07:55.680 -why would you think you actually +two why would you think you actually 0:07:52.400,0:07:59.840 -have to overfit - -0:07:55.680,0:07:59.840 -when writing your script +have to overfit when writing your script 0:08:01.199,0:08:05.840 
-second question i know interactive - -0:08:04.400,0:08:09.520 -question today +second question i know interactive question today 0:08:05.840,0:08:13.840 -actually sure there is some trend - -0:08:09.520,0:08:15.120 -you can model okay maybe +actually sure there is some trend you can model okay maybe 0:08:13.840,0:08:17.919 -maybe it's in the right direction but - -0:08:15.120,0:08:21.199 -it's too complicated as an answer +maybe it's in the right direction but it's too complicated as an answer 0:08:17.919,0:08:23.199 -um so are you experts - -0:08:21.199,0:08:24.879 -you're a network trainer you should be +um so are you experts you're a network trainer you should be 0:08:23.199,0:08:28.560 -right because you've been - -0:08:24.879,0:08:31.680 -following these lessons for a bit but um +right because you've been following these lessons for a bit but um 0:08:28.560,0:08:32.479 -at the beginning okay try to answer this - -0:08:31.680,0:08:36.159 -question +at the beginning okay try to answer this question 0:08:32.479,0:08:36.159 -so why would you like to overfit - -0:08:36.479,0:08:42.159 -i even tell you one one bit more i would +so why would you like to overfit i even tell you one one bit more i would 0:08:39.760,0:08:43.039 -always i do always start training my - -0:08:42.159,0:08:45.680 -network on +always i do always start training my network on 0:08:43.039,0:08:45.680 -one batch - -0:08:46.880,0:08:50.959 -if the model has capabilities so this is +one batch if the model has capabilities so this is 0:08:49.519,0:08:54.399 -the number one rule - -0:08:50.959,0:08:56.640 -to debug machine learning code okay +the number one rule to debug machine learning code okay 0:08:54.399,0:08:57.920 -you would like to see whether you [ __ ] - -0:08:56.640,0:09:00.959 -up in your +you would like to see whether you [ __ ] up in your 0:08:57.920,0:09:03.440 -model creation okay so first thing - -0:09:00.959,0:09:04.480 -you can just get a batch of the correct +model creation okay 
so first thing you can just get a batch of the correct 0:09:03.440,0:09:07.120 -size - -0:09:04.480,0:09:07.600 -uh even with random noise right even you +size uh even with random noise right even you 0:09:07.120,0:09:10.080 -know - -0:09:07.600,0:09:11.440 -torch dot trend something with random +know torch dot trend something with random 0:09:10.080,0:09:13.440 -labels - -0:09:11.440,0:09:15.360 -and then you would like to go over a few +labels and then you would like to go over a few 0:09:13.440,0:09:17.120 -epochs with one batch - -0:09:15.360,0:09:19.200 -with random crap which could be the +epochs with one batch with random crap which could be the 0:09:17.120,0:09:21.680 -first batch of your data set or whatever - -0:09:19.200,0:09:22.320 -just to prove that your model can learn +first batch of your data set or whatever just to prove that your model can learn 0:09:21.680,0:09:25.279 -okay - -0:09:22.320,0:09:25.760 -you can easily make some tiny mistakes +okay you can easily make some tiny mistakes 0:09:25.279,0:09:28.080 -uh - -0:09:25.760,0:09:29.360 -like i made a few times like doing the +uh like i made a few times like doing the 0:09:28.080,0:09:34.560 -zero - -0:09:29.360,0:09:34.560 -zero grad uh after the backward +zero zero grad uh after the backward 0:09:35.279,0:09:39.200 -yeah i know it happens and nothing - -0:09:37.519,0:09:40.800 -happens nothing learns okay so you +yeah i know it happens and nothing happens nothing learns okay so you 0:09:39.200,0:09:43.440 -always want to see - -0:09:40.800,0:09:44.080 -that your model model can learn right +always want to see that your model model can learn right 0:09:43.440,0:09:46.000 -then if - -0:09:44.080,0:09:48.080 -you can memorize yeah fantastic we are +then if you can memorize yeah fantastic we are 0:09:46.000,0:09:50.880 -going to be now learning how to - -0:09:48.080,0:09:51.680 -uh improve performance of a model that +going to be now learning how to uh improve performance of a model that 
0:09:50.880,0:09:54.640 -memorizes - -0:09:51.680,0:09:55.920 -uh its own data okay so two reasons +memorizes uh its own data okay so two reasons 0:09:54.640,0:09:57.120 -right first one we said over - -0:09:55.920,0:10:00.000 -parameterize +right first one we said over parameterize 0:09:57.120,0:10:01.920 -uh models are easy to train because the - -0:10:00.000,0:10:04.959 -landscape is much smoother +uh models are easy to train because the landscape is much smoother 0:10:01.920,0:10:06.720 -us and you know if you have a - -0:10:04.959,0:10:08.399 -over parameterized model you're gonna +us and you know if you have a over parameterized model you're gonna 0:10:06.720,0:10:10.000 -have you can - -0:10:08.399,0:10:11.839 -uh ideally start with different +have you can uh ideally start with different 0:10:10.000,0:10:14.240 -initializations so you get initial - -0:10:11.839,0:10:15.360 -points in the parameter space and then +initializations so you get initial points in the parameter space and then 0:10:14.240,0:10:17.120 -whenever you train - -0:10:15.360,0:10:18.640 -these different models all of them will +whenever you train these different models all of them will 0:10:17.120,0:10:21.760 -converge to a different - -0:10:18.640,0:10:25.120 -position because you can +converge to a different position because you can 0:10:21.760,0:10:28.000 -think about like a same model you can - -0:10:25.120,0:10:29.360 -permute all the weights you're gonna get +think about like a same model you can permute all the weights you're gonna get 0:10:28.000,0:10:30.000 -i mean you permeate the weights per - -0:10:29.360,0:10:33.040 -layer +i mean you permeate the weights per layer 0:10:30.000,0:10:35.279 -you can still get the same uh model - -0:10:33.040,0:10:37.440 -at the end so they are comparable in +you can still get the same uh model at the end so they are comparable in 0:10:35.279,0:10:40.000 -terms of the function approximator - -0:10:37.440,0:10:41.680 -you are building nevertheless 
in the +terms of the function approximator you are building nevertheless in the 0:10:40.000,0:10:42.800 -parameter space they are not the same - -0:10:41.680,0:10:45.519 -right so in the function +parameter space they are not the same right so in the function 0:10:42.800,0:10:46.880 -space they are exactly equivalent models - -0:10:45.519,0:10:48.800 -in the parameter space they are +space they are exactly equivalent models in the parameter space they are 0:10:46.880,0:10:51.760 -absolutely different models - -0:10:48.800,0:10:53.519 -nevertheless they will converge to +absolutely different models nevertheless they will converge to 0:10:51.760,0:10:55.440 -equivalently - -0:10:53.519,0:10:58.560 -equivalent models as in they will +equivalently equivalent models as in they will 0:10:55.440,0:11:01.760 -perform equivalently equivalently - -0:10:58.560,0:11:05.040 -good right are you following right am i +perform equivalently equivalently good right are you following right am i 0:11:01.760,0:11:06.880 -talking about weird stuff today but uh - -0:11:05.040,0:11:08.079 -i guess this counts a bit also from +talking about weird stuff today but uh i guess this counts a bit also from 0:11:06.880,0:11:10.160 -joanne's class - -0:11:08.079,0:11:11.120 -where we talk about parameter space and +joanne's class where we talk about parameter space and 0:11:10.160,0:11:13.120 -functional - -0:11:11.120,0:11:14.880 -uh functional space it's so so cool in +functional uh functional space it's so so cool in 0:11:13.120,0:11:17.600 -that class i think next year - -0:11:14.880,0:11:18.240 -i will try to put it online as well okay +that class i think next year i will try to put it online as well okay 0:11:17.600,0:11:19.680 -okay so - -0:11:18.240,0:11:21.680 -first point over pardon me over +okay so first point over pardon me over 0:11:19.680,0:11:24.079 -parameterization helps with training - -0:11:21.680,0:11:25.120 -second point or over parameterization +parameterization helps with 
training second point or over parameterization 0:11:24.079,0:11:28.320 -helps you with - -0:11:25.120,0:11:31.279 -math debugging can you repeat the point +helps you with math debugging can you repeat the point 0:11:28.320,0:11:33.680 -about function and parameter space yeah - -0:11:31.279,0:11:35.279 -so if you have a neural net and you +about function and parameter space yeah so if you have a neural net and you 0:11:33.680,0:11:38.320 -permute the rows - -0:11:35.279,0:11:38.880 -in your matrices right and then you +permute the rows in your matrices right and then you 0:11:38.320,0:11:42.399 -permute - -0:11:38.880,0:11:46.000 -the uh the column of the +permute the uh the column of the 0:11:42.399,0:11:47.600 -uh the next layer you can basically - -0:11:46.000,0:11:48.800 -you know you can reorganize the weight +uh the next layer you can basically you know you can reorganize the weight 0:11:47.600,0:11:49.839 -so you can get always the same - -0:11:48.800,0:11:51.360 -performance right +so you can get always the same performance right 0:11:49.839,0:11:53.200 -so if you have the first matrix you have - -0:11:51.360,0:11:54.079 -first element of the hidden layer equal +so if you have the first matrix you have first element of the hidden layer equal 0:11:53.200,0:11:55.920 -some number - -0:11:54.079,0:11:58.240 -let's say the hidden layer has size of +some number let's say the hidden layer has size of 0:11:55.920,0:11:59.200 -two right so you have a matrix with two - -0:11:58.240,0:12:01.040 -rows +two right so you have a matrix with two rows 0:11:59.200,0:12:03.600 -and so you can swap the the rows you're - -0:12:01.040,0:12:06.320 -gonna get a hidden layer that is flipped +and so you can swap the the rows you're gonna get a hidden layer that is flipped 0:12:03.600,0:12:07.120 -and then the last the next next weight - -0:12:06.320,0:12:10.399 -matrix +and then the last the next next weight matrix 0:12:07.120,0:12:13.839 -you can flip the um - 
-0:12:10.399,0:12:17.519 -columns i guess uh and you would get +you can flip the um columns i guess uh and you would get 0:12:13.839,0:12:19.519 -exactly the same network the same - -0:12:17.519,0:12:20.959 -you would sorry you would get exactly +exactly the same network the same you would sorry you would get exactly 0:12:19.519,0:12:22.560 -the same function - -0:12:20.959,0:12:24.079 -it's gonna give you exactly the same +the same function it's gonna give you exactly the same 0:12:22.560,0:12:26.399 -number as an output - -0:12:24.079,0:12:27.519 -although the parameters the the +number as an output although the parameters the the 0:12:26.399,0:12:29.440 -parameters are actually - -0:12:27.519,0:12:30.720 -different right because you swap them so +parameters are actually different right because you swap them so 0:12:29.440,0:12:36.240 -the same parameter - -0:12:30.720,0:12:37.680 -w11 is going to be w21 right so they are +the same parameter w11 is going to be w21 right so they are 0:12:36.240,0:12:39.200 -different so in the parameter space - -0:12:37.680,0:12:41.360 -these are different models so there are +different so in the parameter space these are different models so there are 0:12:39.200,0:12:42.560 -one point is here in the parameter space - -0:12:41.360,0:12:44.720 -another point is here +one point is here in the parameter space another point is here 0:12:42.560,0:12:46.880 -nevertheless the mapping from the - -0:12:44.720,0:12:49.120 -parameter space to the functional space +nevertheless the mapping from the parameter space to the functional space 0:12:46.880,0:12:50.639 -both of them both these two initial - -0:12:49.120,0:12:53.120 -those two configuration +both of them both these two initial those two configuration 0:12:50.639,0:12:54.079 -will map to the same function right - -0:12:53.120,0:12:55.920 -because the +will map to the same function right because the 0:12:54.079,0:12:58.079 -function connects the input to the - -0:12:55.920,0:13:00.160 
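[Editor's note: the row/column permutation argument being made here can be checked numerically. A minimal sketch with a hidden layer of size two, as in the example — shapes and values are illustrative. Swapping the rows of the first weight matrix and the columns of the second gives a different point in parameter space but exactly the same function.]

```python
import numpy as np

rng = np.random.default_rng(0)

# two-layer net: h = relu(W1 @ x), out = W2 @ h, hidden size 2
W1 = rng.normal(size=(2, 3))   # rows of W1  <-> hidden units
W2 = rng.normal(size=(1, 2))   # columns of W2 <-> hidden units

def forward(A, B, v):
    h = np.maximum(A @ v, 0.0)  # relu is elementwise, so it commutes with permutations
    return B @ h

# permute the hidden units: swap the rows of W1 and the columns of W2
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
W1p = P @ W1        # swapped rows
W2p = W2 @ P.T      # swapped columns (P.T is P's inverse)

x_in = rng.normal(size=3)
# forward(W1, W2, x_in) and forward(W1p, W2p, x_in) agree exactly,
# even though (W1, W2) and (W1p, W2p) are different parameter vectors
```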
-output and they're going to be the same +function connects the input to the output and they're going to be the same 0:12:58.079,0:13:01.200 -even if you do this permutation of the - -0:13:00.160,0:13:05.440 -rows and then and +even if you do this permutation of the rows and then and 0:13:01.200,0:13:08.800 -of the columns right makes sense - -0:13:05.440,0:13:12.560 -so if if we +of the columns right makes sense so if if we 0:13:08.800,0:13:12.560 -if the space of parameters - -0:13:12.720,0:13:17.120 -if the space for parameter space is very +if the space of parameters if the space for parameter space is very 0:13:14.880,0:13:20.320 -big for a given data set can we say - -0:13:17.120,0:13:21.680 -that the model is very uncertain about +big for a given data set can we say that the model is very uncertain about 0:13:20.320,0:13:23.360 -its prediction okay we are going to be - -0:13:21.680,0:13:25.120 -talking about uncertainty in a bit so +its prediction okay we are going to be talking about uncertainty in a bit so 0:13:23.360,0:13:27.839 -i'll address that in a bit - -0:13:25.120,0:13:28.480 -all right so we always start with the +i'll address that in a bit all right so we always start with the 0:13:27.839,0:13:31.600 -third - -0:13:28.480,0:13:33.600 -uh column here with overfitting uh i +third uh column here with overfitting uh i 0:13:31.600,0:13:35.920 -always want to have a model that is over - -0:13:33.600,0:13:37.760 -parameterized because it's easy to learn +always want to have a model that is over parameterized because it's easy to learn 0:13:35.920,0:13:39.120 -and also it's going to be powerful in - -0:13:37.760,0:13:40.160 -terms +and also it's going to be powerful in terms 0:13:39.120,0:13:42.240 -in the sense that it's going to be - -0:13:40.160,0:13:45.600 -learning more than what we +in the sense that it's going to be learning more than what we 0:13:42.240,0:13:47.600 -expect um and so - -0:13:45.600,0:13:48.720 -how do we deal with these 
overfitting +expect um and so how do we deal with these overfitting 0:13:47.600,0:13:51.199 -how do we - -0:13:48.720,0:13:52.800 -improve now the validation or tasting +how do we improve now the validation or tasting 0:13:51.199,0:13:55.120 -performances right so - -0:13:52.800,0:13:56.560 -we we said that overfitting means uh we +performances right so we we said that overfitting means uh we 0:13:55.120,0:13:57.839 -didn't say we're gonna see that next - -0:13:56.560,0:13:59.920 -slide but +didn't say we're gonna see that next slide but 0:13:57.839,0:14:01.040 -here we see how to fight this kind of - -0:13:59.920,0:14:03.279 -you know overfitting +here we see how to fight this kind of you know overfitting 0:14:01.040,0:14:04.560 -so we start from the right hand side - -0:14:03.279,0:14:06.639 -where we introduce +so we start from the right hand side where we introduce 0:14:04.560,0:14:08.320 -this weak regularizer so there is no - -0:14:06.639,0:14:11.680 -regularization +this weak regularizer so there is no regularization 0:14:08.320,0:14:12.399 -therefore the last plot the sixth plot - -0:14:11.680,0:14:15.440 -here +therefore the last plot the sixth plot here 0:14:12.399,0:14:18.560 -is the same as my third plot okay - -0:14:15.440,0:14:21.600 -then i keep uh adding some +is the same as my third plot okay then i keep uh adding some 0:14:18.560,0:14:23.360 -medium regularizer and so i - -0:14:21.600,0:14:24.959 -i like to think about this as you know +medium regularizer and so i i like to think about this as you know 0:14:23.360,0:14:29.120 -smoothing edges right so my - -0:14:24.959,0:14:31.279 -square gets around edges +smoothing edges right so my square gets around edges 0:14:29.120,0:14:34.399 -and you can tell now that this second - -0:14:31.279,0:14:37.440 -plot here is different from my second +and you can tell now that this second plot here is different from my second 0:14:34.399,0:14:39.279 -window here right so the the - -0:14:37.440,0:14:41.120 
-medium regularization is different from +window here right so the the medium regularization is different from 0:14:39.279,0:14:43.199 -this just right fitting - -0:14:41.120,0:14:44.800 -as you can see there are some you know +this just right fitting as you can see there are some you know 0:14:43.199,0:14:48.079 -corners here - -0:14:44.800,0:14:48.959 -finally if you crank up this medicine +corners here finally if you crank up this medicine 0:14:48.079,0:14:50.480 -this kind of - -0:14:48.959,0:14:52.320 -you know it's like a drug you you're +this kind of you know it's like a drug you you're 0:14:50.480,0:14:55.519 -drugging you're hitting you're - -0:14:52.320,0:14:57.760 -poisoning your model for to restrict the +drugging you're hitting you're poisoning your model for to restrict the 0:14:55.519,0:14:59.199 -it's it's power then you get like a very - -0:14:57.760,0:15:01.440 -strong regularizer +it's it's power then you get like a very strong regularizer 0:14:59.199,0:15:02.720 -which gives you the the circular one - -0:15:01.440,0:15:05.440 -that's this this is my +which gives you the the circular one that's this this is my 0:15:02.720,0:15:06.480 -mental image anyhow we we gave you i - -0:15:05.440,0:15:08.480 -think i give you my +mental image anyhow we we gave you i think i give you my 0:15:06.480,0:15:10.000 -uh the big picture first and then let's - -0:15:08.480,0:15:13.040 -go on with the actual +uh the big picture first and then let's go on with the actual 0:15:10.000,0:15:13.920 -definitions right um so there are a few - -0:15:13.040,0:15:16.240 -definitions here +definitions right um so there are a few definitions here 0:15:13.920,0:15:18.079 -they are not quite equivalent but in - -0:15:16.240,0:15:21.440 -deep learning that's what we use +they are not quite equivalent but in deep learning that's what we use 0:15:18.079,0:15:24.160 -so here we go so the regularization - -0:15:21.440,0:15:25.920 -adds prior knowledge to a model a prior +so here we go 
so the regularization adds prior knowledge to a model a prior 0:15:24.160,0:15:27.120 -distribution is specified for the - -0:15:25.920,0:15:31.120 -parameters +distribution is specified for the parameters 0:15:27.120,0:15:34.000 -so we expect these parameters to be - -0:15:31.120,0:15:35.360 -coming from a specific distribution from +so we expect these parameters to be coming from a specific distribution from 0:15:34.000,0:15:39.040 -a specific - -0:15:35.360,0:15:41.279 -generation generating process okay +a specific generation generating process okay 0:15:39.040,0:15:43.759 -and then whenever we actually think - -0:15:41.279,0:15:46.959 -about regularization we can think about +and then whenever we actually think about regularization we can think about 0:15:43.759,0:15:49.120 -you know uh strongly assuming that these - -0:15:46.959,0:15:52.160 -parameters should be +you know uh strongly assuming that these parameters should be 0:15:49.120,0:15:54.560 -um coming from this specific - -0:15:52.160,0:15:56.639 -process that generates them okay so this +um coming from this specific process that generates them okay so this 0:15:54.560,0:15:58.639 -is talking about parameter space - -0:15:56.639,0:15:59.920 -then we can also talk about the +is talking about parameter space then we can also talk about the 0:15:58.639,0:16:02.240 -functional space - -0:15:59.920,0:16:04.959 -in this case we can be it can be seen a +functional space in this case we can be it can be seen a 0:16:02.240,0:16:08.000 -regularization is a restriction - -0:16:04.959,0:16:10.079 -of the set of possible learnable +regularization is a restriction of the set of possible learnable 0:16:08.000,0:16:11.920 -functions okay so these are again two - -0:16:10.079,0:16:14.720 -perspective one is on the weights +functions okay so these are again two perspective one is on the weights 0:16:11.920,0:16:15.199 -where how are supposed to be what kind - -0:16:14.720,0:16:17.040 -of +where how are supposed to be 
what kind of 0:16:15.199,0:16:18.720 -weights what kind of animals what kind - -0:16:17.040,0:16:20.800 -of objects +weights what kind of animals what kind of objects 0:16:18.720,0:16:23.680 -these weights are like they should be - -0:16:20.800,0:16:26.320 -somehow over some specific shape +these weights are like they should be somehow over some specific shape 0:16:23.680,0:16:27.759 -uh length or whatever structure there is - -0:16:26.320,0:16:30.800 -there is some structure that +uh length or whatever structure there is there is some structure that 0:16:27.759,0:16:32.959 -i assume uh in advance - -0:16:30.800,0:16:34.160 -that's the prior this means before in +i assume uh in advance that's the prior this means before in 0:16:32.959,0:16:36.240 -latin and in - -0:16:34.160,0:16:37.839 -others in another case instead if you +latin and in others in another case instead if you 0:16:36.240,0:16:39.920 -have all possible function - -0:16:37.839,0:16:41.519 -you'd like to find a restriction of +have all possible function you'd like to find a restriction of 0:16:39.920,0:16:45.279 -those possible functions - -0:16:41.519,0:16:47.759 -such that they are not too +those possible functions such that they are not too 0:16:45.279,0:16:48.399 -uh crazy okay they are not too extreme - -0:16:47.759,0:16:51.600 -as in +uh crazy okay they are not too extreme as in 0:16:48.399,0:16:54.480 -the way they behave ah - -0:16:51.600,0:16:55.600 -there's a question but in that image the +the way they behave ah there's a question but in that image the 0:16:54.480,0:16:58.959 -square is - -0:16:55.600,0:17:03.360 -still in the circle uh +square is still in the circle uh 0:16:58.959,0:17:05.760 -yeah i'm getting back - -0:17:03.360,0:17:06.400 -oh oh i see so maybe the circle should +yeah i'm getting back oh oh i see so maybe the circle should 0:17:05.760,0:17:09.520 -have been - -0:17:06.400,0:17:13.760 -smaller than the square okay +have been smaller than the square okay 
0:17:09.520,0:17:16.400 -right good point um okay cool cool - -0:17:13.760,0:17:18.319 -finally that's the last definition of +right good point um okay cool cool finally that's the last definition of 0:17:16.400,0:17:20.480 -regularization which is the real - -0:17:18.319,0:17:21.760 -real deep learning part which is the +regularization which is the real real deep learning part which is the 0:17:20.480,0:17:25.520 -following which is - -0:17:21.760,0:17:28.880 -yeah kind of not it's like you know +following which is yeah kind of not it's like you know 0:17:25.520,0:17:30.880 -as a stretch - -0:17:28.880,0:17:34.160 -okay my google thinks i'm talking +as a stretch okay my google thinks i'm talking 0:17:30.880,0:17:34.160 -italian what the heck - -0:17:34.400,0:17:42.000 -okay regularization is any modification +italian what the heck okay regularization is any modification 0:17:38.880,0:17:44.640 -we make to a learning algorithm that is - -0:17:42.000,0:17:46.640 -intended to reduce its generalization +we make to a learning algorithm that is intended to reduce its generalization 0:17:44.640,0:17:48.000 -error but not its training error okay so - -0:17:46.640,0:17:51.280 -this is actually +error but not its training error okay so this is actually 0:17:48.000,0:17:53.280 -a stretch because it's no longer - -0:17:51.280,0:17:54.400 -talking about prior knowledge and +a stretch because it's no longer talking about prior knowledge and 0:17:53.280,0:17:57.360 -functional space - -0:17:54.400,0:17:59.039 -but actually modification to learning +functional space but actually modification to learning 0:17:57.360,0:18:01.520 -algorithms so this is like - -0:17:59.039,0:18:02.640 -moving towards maybe programming you +algorithms so this is like moving towards maybe programming you 0:18:01.520,0:18:06.000 -know - -0:18:02.640,0:18:08.160 -so parameters function then it's like +know so parameters function then it's like 0:18:06.000,0:18:10.160 -algorithmic implementation right so 
- -0:18:08.160,0:18:13.360 -these are really three different +algorithmic implementation right so these are really three different 0:18:10.160,0:18:16.400 -perspective of the same thing - -0:18:13.360,0:18:18.640 -cool so first let's start with +perspective of the same thing cool so first let's start with 0:18:16.400,0:18:19.919 -regularizing regularizing techniques a - -0:18:18.640,0:18:22.240 -few examples +regularizing regularizing techniques a few examples 0:18:19.919,0:18:24.400 -so first actually i start with xavier - -0:18:22.240,0:18:27.280 -initialization i told you before that we +so first actually i start with xavier initialization i told you before that we 0:18:24.400,0:18:29.919 -can think about these parameters as - -0:18:27.280,0:18:31.039 -coming from some generation generating +can think about these parameters as coming from some generation generating 0:18:29.919,0:18:32.960 -process right - -0:18:31.039,0:18:34.320 -so whenever you initialize a network you +process right so whenever you initialize a network you 0:18:32.960,0:18:37.600 -can choose to - -0:18:34.320,0:18:41.120 -to you can choose to select one uh +can choose to to you can choose to select one uh 0:18:37.600,0:18:44.799 -regular um one prior right so these are - -0:18:41.120,0:18:46.559 -this is defining where your um +regular um one prior right so these are this is defining where your um 0:18:44.799,0:18:48.400 -your your weights are coming from so in - -0:18:46.559,0:18:51.039 -this case we can choose xavier normal +your your weights are coming from so in this case we can choose xavier normal 0:18:48.400,0:18:55.039 -which is a initialization technique - -0:18:51.039,0:18:56.960 -and this assumes this kind of gaussian +which is a initialization technique and this assumes this kind of gaussian 0:18:55.039,0:18:58.960 -gaussian distribution right so you have - -0:18:56.960,0:19:00.480 -the weight space by weight values and +gaussian distribution right so you have the weight space by 
weight values and 0:18:58.960,0:19:02.240 -you know the most of them will be peaked - -0:19:00.480,0:19:03.200 -towards the zero and then you have some +you know the most of them will be peaked towards the zero and then you have some 0:19:02.240,0:19:07.280 -kind of - -0:19:03.200,0:19:09.039 -um some some kind of +kind of um some some kind of 0:19:07.280,0:19:12.320 -standard deviation that is based on the - -0:19:09.039,0:19:15.440 -size of the input and output +standard deviation that is based on the size of the input and output 0:19:12.320,0:19:16.400 -size of that specific layer and so from - -0:19:15.440,0:19:18.799 -here we can +size of that specific layer and so from here we can 0:19:16.400,0:19:20.720 -start introducing the weight decay - -0:19:18.799,0:19:21.280 -weight decay is the first regularization +start introducing the weight decay weight decay is the first regularization 0:19:20.720,0:19:23.840 -technique - -0:19:21.280,0:19:24.400 -that is widespread in machine learning +technique that is widespread in machine learning 0:19:23.840,0:19:27.280 -not - -0:19:24.400,0:19:28.320 -be not maybe too much in neural nets +not be not maybe too much in neural nets 0:19:27.280,0:19:31.200 -still relevant - -0:19:28.320,0:19:32.720 -so weight decay uh you can find it in +still relevant so weight decay uh you can find it in 0:19:31.200,0:19:35.440 -directly inside the - -0:19:32.720,0:19:35.840 -optim package like you it's a flag in +directly inside the optim package like you it's a flag in 0:19:35.440,0:19:38.320 -the - -0:19:35.840,0:19:39.039 -the different in the different optimizer +the the different in the different optimizer 0:19:38.320,0:19:41.919 -is also called - -0:19:39.039,0:19:43.039 -l2 regularization ridge regression or +is also called l2 regularization ridge regression or 0:19:41.919,0:19:44.799 -gaussian prior - -0:19:43.039,0:19:47.280 -which basically tells you that things +gaussian prior which basically tells you that things 
0:19:44.799,0:19:48.799 -come from this gaussian process - -0:19:47.280,0:19:50.640 -or gaussian you know distribution +come from this gaussian process or gaussian you know distribution 0:19:48.799,0:19:53.200 -generating distribution - -0:19:50.640,0:19:54.559 -nevertheless we call it weight decay so +generating distribution nevertheless we call it weight decay so 0:19:53.200,0:19:57.840 -why do we call it weight decay - -0:19:54.559,0:19:59.360 -uh so this is first thing that you know +why do we call it weight decay uh so this is first thing that you know 0:19:57.840,0:19:59.679 -if you train neural net you're going to - -0:19:59.360,0:20:03.039 -call +if you train neural net you're going to call 0:19:59.679,0:20:05.600 -weight decay not the other things so - -0:20:03.039,0:20:07.520 -we can start with this j train that's +weight decay not the other things so we can start with this j train that's 0:20:05.600,0:20:10.559 -our objective - -0:20:07.520,0:20:12.320 -which is acting upon the parameters and +our objective which is acting upon the parameters and 0:20:10.559,0:20:14.080 -which is equal the old - -0:20:12.320,0:20:16.159 -training the the one without the +which is equal the old training the the one without the 0:20:14.080,0:20:20.240 -regularization - -0:20:16.159,0:20:22.400 -plus a penalty term like um like +regularization plus a penalty term like um like 0:20:20.240,0:20:23.440 -the following so we have the the square - -0:20:22.400,0:20:26.080 -norm +the following so we have the the square norm 0:20:23.440,0:20:27.679 -the square two norm right of these - -0:20:26.080,0:20:30.559 -parameters +the square two norm right of these parameters 0:20:27.679,0:20:30.880 -and so if you uh make the if you compute - -0:20:30.559,0:20:33.120 -the +and so if you uh make the if you compute the 0:20:30.880,0:20:34.320 -the gradient of course you're gonna get - -0:20:33.120,0:20:36.880 -just the +the gradient of course you're gonna get just the 0:20:34.320,0:20:39.760 
-uh lambda theta right because the two
-comes down simplifies you get that guy
+uh lambda theta right because the two comes down simplifies you get that guy

0:20:39.760,0:20:43.200
-so if you think about this um second
-equation
+so if you think about this um second equation

0:20:43.200,0:20:50.799
-what you see you say that the theta gets
-previous theta minus you know the
+what you see you say that the theta gets previous theta minus you know the

0:20:50.799,0:20:57.600
-the minus step so like
-minus a step towards uh the gradient
+the minus step so like minus a step towards uh the gradient

0:20:57.600,0:20:59.679
-some
-a step towards the opposite direction of
+some a step towards the opposite direction of

0:20:59.679,0:21:03.360
-the gradient such that you can go
-down the hill right in your training
+the gradient such that you can go down the hill right in your training

0:21:03.360,0:21:09.039
-laws
-minus some eta lambda which is a
+loss minus some eta lambda which is a

0:21:09.039,0:21:12.960
-a scalar multiplying by
+a scalar multiplying by

0:21:13.120,0:21:16.720
-theta right and that means that it's
+theta right and that means that it's

0:21:15.360,0:21:17.679
-going to be you know the first part
-there is you all go
+going to be you know the first part there is you all go

0:21:17.679,0:21:21.840
-down the hill whereas the other one
-tells you
+down the hill whereas the other one tells you

0:21:21.840,0:21:27.760
-and go also towards where
-zero right and so how does this how does
+and go also towards where zero right and so how does this how does

0:21:27.760,0:21:31.679
-this look
-so this looks like this right
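The update just described (one pull down the hill from the data term, one pull toward zero from the penalty) can be sketched in a few lines of plain Python; the learning rate, decay strength, and weights below are made-up values for illustration, not code from the lecture:

```python
# Weight decay sketch: theta <- theta - eta * (grad_J + lam * theta).
# Once training is done (grad_J = 0), each step just multiplies every
# component of theta by (1 - eta * lam), so the weights literally decay
# toward zero.

eta, lam = 0.1, 0.5        # learning rate and decay strength (made-up)
theta = [4.0, -2.0]        # pretend these are the trained weights

for _ in range(100):
    grad_J = [0.0, 0.0]    # the training-loss term is already at a minimum
    theta = [t - eta * (g + lam * t) for t, g in zip(theta, grad_J)]

print(theta)               # both components are now very close to zero
```

Each step shrinks the vector along the line connecting its head to the origin, which is exactly the vector field drawn on the slide.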
in every 0:21:31.679,0:21:35.600 -point so - -0:21:32.799,0:21:37.039 -consider we are already trained and the +point so consider we are already trained and the 0:21:35.600,0:21:39.280 -training loss is zero - -0:21:37.039,0:21:40.880 -and we just consider the second term +training loss is zero and we just consider the second term 0:21:39.280,0:21:44.320 -right so let's consider - -0:21:40.880,0:21:47.200 -we already finished training so +right so let's consider we already finished training so 0:21:44.320,0:21:49.919 -there is no there is not this term we - -0:21:47.200,0:21:53.600 -just have theta +there is no there is not this term we just have theta 0:21:49.919,0:21:56.640 -minus eta lambda theta what does it mean - -0:21:53.600,0:21:57.360 -so if there is no uh at the first term +minus eta lambda theta what does it mean so if there is no uh at the first term 0:21:56.640,0:22:00.240 -here - -0:21:57.360,0:22:01.520 -in any point you are so in theta you're +here in any point you are so in theta you're 0:22:00.240,0:22:04.640 -going to be subtracting - -0:22:01.520,0:22:07.679 -some multiplier some scalar you know i +going to be subtracting some multiplier some scalar you know i 0:22:04.640,0:22:08.400 -told you scalar a scalar is what scales - -0:22:07.679,0:22:12.080 -right +told you scalar a scalar is what scales right 0:22:08.400,0:22:14.400 -so this scalar scales this vector - -0:22:12.080,0:22:16.240 -probably by a factor that is slower than +so this scalar scales this vector probably by a factor that is slower than 0:22:14.400,0:22:18.559 -one and so if you're here - -0:22:16.240,0:22:19.520 -this one is going to take you down on +one and so if you're here this one is going to take you down on 0:22:18.559,0:22:21.520 -the point - -0:22:19.520,0:22:22.559 -that is connecting your head of the +the point that is connecting your head of the 0:22:21.520,0:22:27.840 -theta - -0:22:22.559,0:22:30.320 -towards zero right or this point here +theta towards zero right 
or this point here 0:22:27.840,0:22:31.600 -this is theta and then it takes you down - -0:22:30.320,0:22:33.520 -to zero okay +this is theta and then it takes you down to zero okay 0:22:31.600,0:22:35.760 -so if you don't have this term here and - -0:22:33.520,0:22:38.159 -you perform a few steps +so if you don't have this term here and you perform a few steps 0:22:35.760,0:22:39.039 -in this uh you know in this parameter - -0:22:38.159,0:22:41.440 -update +in this uh you know in this parameter update 0:22:39.039,0:22:43.120 -you're gonna get that the vector field - -0:22:41.440,0:22:45.200 -that you know results +you're gonna get that the vector field that you know results 0:22:43.120,0:22:47.200 -is something that attracts you towards - -0:22:45.200,0:22:49.039 -zero and that's why it's called weight +is something that attracts you towards zero and that's why it's called weight 0:22:47.200,0:22:54.720 -decay right so if you let it go - -0:22:49.039,0:22:57.200 -this stuff too it's gonna decay to zero +decay right so if you let it go this stuff too it's gonna decay to zero 0:22:54.720,0:22:57.919 -makes sense right so these are very cute - -0:22:57.200,0:23:01.919 -drawings +makes sense right so these are very cute drawings 0:22:57.919,0:23:04.159 -i think cool so - -0:23:01.919,0:23:06.159 -okay now you know about weight decay a +i think cool so okay now you know about weight decay a 0:23:04.159,0:23:08.960 -weight decay is also - -0:23:06.159,0:23:09.760 -we can think about this as adding a +weight decay is also we can think about this as adding a 0:23:08.960,0:23:11.679 -constraint - -0:23:09.760,0:23:13.760 -over the length of a vector so the +constraint over the length of a vector so the 0:23:11.679,0:23:17.039 -length of a vector is the you know - -0:23:13.760,0:23:20.159 -the the the euclidean norm +length of a vector is the you know the the the euclidean norm 0:23:17.039,0:23:21.600 -and so here we basically try to reduce - -0:23:20.159,0:23:24.480 -the 
length of this vector

0:23:21.600,0:23:26.080
-so weight decay is a way to reduce the
-length
+so weight decay is a way to reduce the length

0:23:26.080,0:23:33.840
-okay so l1
-what is this l1 so l1 can also be
+okay so l1 what is this l1 so l1 can also be

0:23:33.840,0:23:37.200
-used as a flag in the optimizer in torch
-it's also called lasso which is least
+used as a flag in the optimizer in torch it's also called lasso which is least

0:23:39.120,0:23:46.400
-absolute shrinking selector operator
-wow yeah statisticians whatever
+absolute shrinkage and selection operator wow yeah statisticians whatever

0:23:46.400,0:23:52.400
-it's also called a laplacian prior
-because it comes from a laplacian
+it's also called a laplacian prior because it comes from a laplacian

0:23:52.400,0:23:58.159
-probability distribution
-and then also it can be called as a
+probability distribution and then also it can be called as a

0:23:58.159,0:24:02.720
-sparsity prior why is that so this is
-this is pretty interesting so here in
+sparsity prior why is that so this is this is pretty interesting so here in

0:24:02.720,0:24:06.960
-the bottom part
-you can see there is the dashed line uh
+the bottom part you can see there is the dashed line uh

0:24:06.960,0:24:10.799
-represent
-my gaussian prior right and then here i
+represent my gaussian prior right and then here i

0:24:10.799,0:24:14.320
-just show you the laplace what's the
-difference with laplace laplace is the
+just show you the laplace what's the difference with laplace laplace is the

0:24:14.320,0:24:17.120
-same as gaussian so you have the
-exponential
+same as gaussian
so you have the exponential 0:24:17.120,0:24:20.799 -but instead of having the quadratic - -0:24:19.039,0:24:24.159 -square norm you have the +but instead of having the quadratic square norm you have the 0:24:20.799,0:24:27.200 -uh one norm okay and so the - -0:24:24.159,0:24:29.440 -the whereas the the the +uh one norm okay and so the the whereas the the the 0:24:27.200,0:24:31.039 -you know whereas the quadratic is very - -0:24:29.440,0:24:34.480 -shallow like it's very flat +you know whereas the quadratic is very shallow like it's very flat 0:24:31.039,0:24:36.720 -towards zero the the l1 is like a - -0:24:34.480,0:24:37.840 -it's a spiky right so that's why if you +towards zero the the l1 is like a it's a spiky right so that's why if you 0:24:36.720,0:24:39.679 -get the exponential - -0:24:37.840,0:24:41.200 -you get like you get a spike this is +get the exponential you get like you get a spike this is 0:24:39.679,0:24:42.720 -minus the - -0:24:41.200,0:24:44.320 -the absolute value right so you get a +minus the the absolute value right so you get a 0:24:42.720,0:24:47.279 -spike for the laplacian - -0:24:44.320,0:24:48.080 -or you get like a smooth for this square +spike for the laplacian or you get like a smooth for this square 0:24:47.279,0:24:50.000 -because you have the - -0:24:48.080,0:24:51.919 -parabola right which is smooth on the +because you have the parabola right which is smooth on the 0:24:50.000,0:24:55.360 -bottom part - -0:24:51.919,0:24:56.159 -okay so the point is that there is much +bottom part okay so the point is that there is much 0:24:55.360,0:24:59.440 -more mass - -0:24:56.159,0:25:02.400 -now in this region than +more mass now in this region than 0:24:59.440,0:25:04.320 -it was before right so this is pretty - -0:25:02.400,0:25:05.840 -this is like a spike there is much more +it was before right so this is pretty this is like a spike there is much more 0:25:04.320,0:25:08.080 -probability that you get something - 
-0:25:05.840,0:25:09.919
-towards zero nevertheless maybe this is
+probability that you get something towards zero nevertheless maybe this is

0:25:08.080,0:25:10.640
-not too clear as an explanation so i
-show you
+not too clear as an explanation so i show you

0:25:10.640,0:25:16.080
-the second diagram so in this case
-my training loss instead of being the
+the second diagram so in this case my training loss instead of being the

0:25:16.080,0:25:18.960
-all train loss i'm going to be summing
-lambda
+old train loss i'm going to be summing lambda

0:25:18.960,0:25:24.640
-the norm 1 of my theta okay
-therefore if you compute the gradient of
+the norm 1 of my theta okay therefore if you compute the gradient of

0:25:24.640,0:25:30.559
-the l1 what do you get
-l one is going to be
+the l1 what do you get l one is going to be

0:25:30.559,0:25:33.679
-just one right if you're positive or
-it's going to be
+just one right if you're positive or it's going to be

0:25:33.679,0:25:38.320
-minus one in the sine function yeah
-exactly
+minus one in the sign function yeah exactly

0:25:38.320,0:25:42.880
-so you get it it lambda sine function
-and so let's now think
+so you get it it lambda sign function and so let's now think

0:25:42.880,0:25:47.039
-the same way what happens uh if you
-already finished training you don't have
+the same way what happens uh if you already finished training you don't have

0:25:47.039,0:25:54.880
-this term over here and you just get
-theta minus eta lambda sine theta
+this term over here and you just get theta minus eta lambda sign theta

0:25:55.039,0:26:02.559
-so if you are on the
-on the x axis you know the the y
+so if you are on the on the x axis you
know the the y 0:26:02.559,0:26:07.120 -is completely doesn't have is is already - -0:26:04.880,0:26:08.960 -zero so you're going to get some +is completely doesn't have is is already zero so you're going to get some 0:26:07.120,0:26:10.480 -arrows bringing you in right so if - -0:26:08.960,0:26:11.279 -you're on the axis you're gonna get +arrows bringing you in right so if you're on the axis you're gonna get 0:26:10.480,0:26:14.559 -exactly as - -0:26:11.279,0:26:16.320 -l2 you're gonna go towards zero +exactly as l2 you're gonna go towards zero 0:26:14.559,0:26:18.320 -now what happened if you're in first - -0:26:16.320,0:26:21.919 -quadrant +now what happened if you're in first quadrant 0:26:18.320,0:26:24.799 -so in the first quadrant you get a sign - -0:26:21.919,0:26:26.240 -in both direction right scale by the +so in the first quadrant you get a sign in both direction right scale by the 0:26:24.799,0:26:30.240 -scalar factor there - -0:26:26.240,0:26:34.960 -and so it's going to be pointing down +scalar factor there and so it's going to be pointing down 0:26:30.240,0:26:38.080 -deeply so here i show you - -0:26:34.960,0:26:39.360 -the uh the the gray arrows here +deeply so here i show you the uh the the gray arrows here 0:26:38.080,0:26:41.679 -they're showing you the l2 - -0:26:39.360,0:26:45.039 -regularization which are taking you +they're showing you the l2 regularization which are taking you 0:26:41.679,0:26:47.600 -from the initial point towards zero - -0:26:45.039,0:26:48.559 -is proportional to this vector that is +from the initial point towards zero is proportional to this vector that is 0:26:47.600,0:26:52.159 -here - -0:26:48.559,0:26:54.000 -whereas the l1 which is going to be in a +here whereas the l1 which is going to be in a 0:26:52.159,0:26:57.520 -different color - -0:26:54.000,0:27:00.640 -and color green the l1 instead +different color and color green the l1 instead 0:26:57.520,0:27:03.679 -starting from here it takes you down - 
-0:27:00.640,0:27:04.240
-40 degrees here and then what happened
+starting from here it takes you down 45 degrees here and then what happened

0:27:03.679,0:27:07.200
-here
-well you just kill the y component right
+here well you just kill the y component right

0:27:07.200,0:27:12.159
-and so
-the l1 uh better feel
+and so the l1 uh vector field

0:27:12.159,0:27:17.760
-it will quickly kill
-components that are close to the axis
+it will quickly kill components that are close to the axis

0:27:17.760,0:27:20.559
-right so if you're kind of close to the
-axis this one bomb
+right so if you're kind of close to the axis this one boom

0:27:20.559,0:27:24.000
-takes you down to the axis in a view in
-a
+takes you down to the axis in a few in a

0:27:24.000,0:27:28.240
-few steps right and then if you still
-apply this one you're going to go down
+few steps right and then if you still apply this one you're going to go down

0:27:28.240,0:27:31.760
-the axis here right so this one allow
-you to
+the axis here right so this one allow you to

0:27:31.760,0:27:36.080
-quickly go down here and then if you
-still apply you can shrink the length
+quickly go down here and then if you still apply you can shrink the length

0:27:36.080,0:27:39.840
-but the point is that you're not looking
-at the
+but the point is that you're not looking at the

0:27:39.840,0:27:43.279
-length shrinking as in the
-in in the l2 right so l2 was just
+length shrinking as in the in in the l2 right so l2 was just

0:27:46.720,0:27:50.320
-shrinking the length of the vector
+shrinking the length of the vector

0:27:51.760,0:27:54.880
-in the l1 instead you actually gonna
+in the l1 instead you actually gonna

0:27:54.399,0:27:58.159
-kill
-0:27:54.880,0:28:01.600
-the components that are kind of cl
+kill the components that are kind of close

0:27:58.159,0:28:04.240
-near the axis okay so i think
-you can clearly now understand how this
+near the axis okay so i think you can clearly now understand how this

0:28:04.240,0:28:08.320
-works right so
-uh and this actually is quite relevant
+works right so uh and this actually is quite relevant

0:28:08.320,0:28:13.840
-for training
-let's say you know our regularized
+for training let's say you know our regularized

0:28:13.840,0:28:15.520
-regularized latent variable models
+regularized latent variable models

0:28:15.520,0:28:17.840
-because you can you know you can think
-about you know
+because you can you know you can think about you know

0:28:17.840,0:28:20.799
-a very quick way to regularize this this
+a very quick way to regularize this this

0:28:20.799,0:28:24.559
-latent virus is going to be just
-killing some of these components such
+latent variable is going to be just killing some of these components such

0:28:24.559,0:28:29.279
-that only the information is going to be
-restricted in a few of these
+that only the information is going to be restricted in a few of these

0:28:29.279,0:28:33.600
-values okay you like this stuff you like
-the drawings
+values okay you like this stuff you like the drawings

0:28:33.600,0:28:41.039
-no they're cute i think okay
-uh okay drop out right so we we talk i
+no they're cute i think okay uh okay dropout right so we we talk i

0:28:41.039,0:28:44.240
-think about dropout a few times but i
-never show you
+think about dropout a few times but i never show you

0:28:44.240,0:28:48.000
-the animation so
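The geometric picture just described (a fixed-size pull along each axis that kills small components first, rather than uniformly shrinking the vector) can be sketched in plain Python; the step size, strength, and starting point are made-up values for illustration, not the lecture's code:

```python
# L1 sketch: theta <- theta - eta * lam * sign(theta).
# Every component shrinks by the same fixed amount per step, so components
# that start close to an axis hit zero first -- that is the sparsity effect.

def sign(x):
    return (x > 0) - (x < 0)   # -1, 0, or +1

eta, lam = 0.1, 1.0
theta = [3.0, 0.25]            # one large component, one small one

for _ in range(5):
    theta = [t - eta * lam * sign(t) for t in theta]
    # clamp to zero once a component would cross the axis
    theta = [0.0 if abs(t) < eta * lam else t for t in theta]

print(theta)                   # small component exactly 0.0, large one ~2.5
```

After just two steps the small component is dead, while the large one has only lost a fixed 0.1 per step — unlike L2, which would have shrunk both proportionally.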
so boom okay so dropout what does this 0:28:51.520,0:28:55.360 -dropout - -0:28:52.240,0:28:56.320 -do so i can show you my ninja skills in +dropout do so i can show you my ninja skills in 0:28:55.360,0:28:59.600 -powerpoint - -0:28:56.320,0:29:02.399 -and we have an infinite loop animation +powerpoint and we have an infinite loop animation 0:28:59.600,0:29:04.080 -so the input in the pink is provided to - -0:29:02.399,0:29:06.559 -the network +so the input in the pink is provided to the network 0:29:04.080,0:29:07.600 -uh and then you have that these hidden - -0:29:06.559,0:29:11.679 -layers hidden +uh and then you have that these hidden layers hidden 0:29:07.600,0:29:13.760 -neurons are sometimes set to zero - -0:29:11.679,0:29:15.279 -in this case is you have a dropping rate +neurons are sometimes set to zero in this case is you have a dropping rate 0:29:13.760,0:29:17.440 -of 0.5 so - -0:29:15.279,0:29:18.799 -half of the neurons are gonna be turned +of 0.5 so half of the neurons are gonna be turned 0:29:17.440,0:29:22.640 -to zero - -0:29:18.799,0:29:25.760 -on uh randomly during the training +to zero on uh randomly during the training 0:29:22.640,0:29:26.080 -and so what happens here is that there - -0:29:25.760,0:29:28.640 -is +and so what happens here is that there is 0:29:26.080,0:29:29.679 -no more path between the input and the - -0:29:28.640,0:29:33.120 -output +no more path between the input and the output 0:29:29.679,0:29:36.480 -that is uh you know there is no - -0:29:33.120,0:29:37.600 -learning of a singular path for input to +that is uh you know there is no learning of a singular path for input to 0:29:36.480,0:29:39.679 -output so - -0:29:37.600,0:29:41.039 -every time if you want to try to +output so every time if you want to try to 0:29:39.679,0:29:43.520 -memorize one - -0:29:41.039,0:29:45.679 -specific input you can't because every +memorize one specific input you can't because every 0:29:43.520,0:29:49.279 -time you get a different network 
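The mechanics just described can be sketched in plain Python; the dropping rate, activations, and seed are made-up for illustration (this is not the lecture's code, and real training would use torch's built-in dropout):

```python
import random

# Dropout sketch: during training each hidden value is zeroed with
# probability p, so no fixed input-to-output path survives every forward
# pass; at inference nothing is dropped and the activations are scaled
# by (1 - p) so their expected magnitude matches what training saw.

def dropout(h, p, training):
    if training:
        return [0.0 if random.random() < p else x for x in h]
    return [(1.0 - p) * x for x in h]   # rescale instead of dropping

random.seed(0)                           # fixed seed for a reproducible demo
h = [1.0, 2.0, 3.0, 4.0]
print(dropout(h, p=0.5, training=True))  # a random subset of units is zeroed
print(dropout(h, p=0.5, training=False)) # [0.5, 1.0, 1.5, 2.0]
```

The inference-time scaling by (1 − p) is exactly the "half the neurons were doing the whole job" correction discussed below; the common inverted-dropout variant instead divides by (1 − p) during training so inference needs no scaling.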
- -0:29:45.679,0:29:53.520 -and so again this basically tell you +time you get a different network and so again this basically tells you 0:29:49.279,0:29:56.799 -uh oh scarf - -0:29:53.520,0:29:59.200 -okay so what happens here is that again +uh oh scarf okay so what happens here is that again 0:29:56.799,0:30:00.880 -before if we have like a fully connected - -0:29:59.200,0:30:03.600 -network like this +before if we have like a fully connected network like this 0:30:00.880,0:30:04.799 -you can think about oh i won't like to - -0:30:03.600,0:30:08.000 -memorize this +you can think about oh i would like to memorize this 0:30:04.799,0:30:10.720 -neuron uh going this path and - -0:30:08.000,0:30:12.000 -then here right so you can try to +neuron uh going this path and then here right so you can try to 0:30:10.720,0:30:15.600 -memorize - -0:30:12.000,0:30:17.520 -uh some specific you know sample you get +memorize uh some specific you know sample you get 0:30:15.600,0:30:19.760 -you can memorize a specific sample in - -0:30:17.520,0:30:20.799 -this case but again if you have the net +you can memorize a specific sample in this case but again if you have the 0:30:19.760,0:30:23.279 -product that is - -0:30:20.799,0:30:24.320 -taking off neurons sometimes sometimes +network that is turning off neurons sometimes 0:30:23.279,0:30:26.399 -this neuron here - -0:30:24.320,0:30:27.520 -on the left hand side doesn't exist +this neuron here on the left hand side doesn't exist 0:30:26.399,0:30:32.480 -right - -0:30:27.520,0:30:35.600 -and so if this one doesn't exist +right and so if this one doesn't exist 0:30:32.480,0:30:37.520 -then you cannot memorize a specific path - -0:30:35.600,0:30:39.360 -moreover you can think about this +then you cannot memorize a specific path moreover you can think about this 0:30:37.520,0:30:42.880 -dropout as - -0:30:39.360,0:30:46.799 -training a infinitely infinite +dropout as training an infinite 0:30:42.880,0:30:49.440
-number of networks that are different - -0:30:46.799,0:30:51.120 -right because every time you you drop +number of networks that are different right because every time you drop 0:30:49.440,0:30:52.559 -some neurons you basically get a new - -0:30:51.120,0:30:55.520 -network +some neurons you basically get a new network 0:30:52.559,0:30:57.039 -uh they all share the initial kind of - -0:30:55.520,0:30:58.000 -starting position with the initial +uh they all share the initial kind of starting position with the initial 0:30:57.039,0:31:00.960 -weights - -0:30:58.000,0:31:03.200 -but then at the end whenever you use it +weights but then at the end whenever you use it 0:31:00.960,0:31:05.279 -i inference usually you turn off this - -0:31:03.200,0:31:07.360 -dropout +at inference usually you turn off this dropout 0:31:05.279,0:31:09.440 -and then you have to scale the the - -0:31:07.360,0:31:11.440 -weights right because otherwise you get +and then you have to scale the weights right because otherwise you get 0:31:09.440,0:31:14.559 -a network that is you know - -0:31:11.440,0:31:16.159 -blowing you up this is because if you +a network that is you know blowing up this is because if you 0:31:14.559,0:31:19.120 -have half of the neurons - -0:31:16.159,0:31:21.120 -off you know the other neurons are doing +have half of the neurons off you know the other neurons are doing 0:31:19.120,0:31:23.600 -the half of the neurons are doing - -0:31:21.120,0:31:25.440 -the whole job and if you turn everyone +the whole job and if you turn everyone 0:31:23.600,0:31:26.159 -on you're going to have twice as many - -0:31:25.440,0:31:30.320 -more +on you're going to have twice as large 0:31:26.159,0:31:32.720 -uh values so so you can do two things - -0:31:30.320,0:31:34.480 -or when you actually use dropout you +values so you can do two things or when you actually use dropout you 0:31:32.720,0:31:37.440 -crank up you multiply by -
-0:31:34.480,0:31:40.399 -by let's say one over uh the dropping +crank up you multiply by let's say one over one minus the dropping 0:31:37.440,0:31:43.840 -rate so if you have dropping rate of 0.5 - -0:31:40.399,0:31:46.320 -you can multiply by two such that +rate so if you have dropping rate of 0.5 you can multiply by two such that 0:31:43.840,0:31:46.880 -uh your neurons are twice as powerful - -0:31:46.320,0:31:50.559 -right +uh your neurons are twice as powerful right 0:31:46.880,0:31:53.200 -twice is more powerful uh than - -0:31:50.559,0:31:54.480 -one minus 0.5 right one divided one +twice as powerful uh that is one divided by one 0:31:53.200,0:31:57.840 -minus 4.5 - -0:31:54.480,0:32:01.600 -so if you have a dropping rate of 0.1 +minus 0.5 so if you have a dropping rate of 0.1 0:31:57.840,0:32:02.080 -uh means you have 90 of your neurons - -0:32:01.600,0:32:04.240 -there +uh means you have 90% of your neurons there 0:32:02.080,0:32:05.360 -and so your neuron should be one over - -0:32:04.240,0:32:08.640 -0.9 +and so your neurons should be one over 0.9 0:32:05.360,0:32:11.200 -stronger right um - -0:32:08.640,0:32:12.080 -to be to have like the same kind of +stronger right um to have like the same kind of 0:32:11.200,0:32:15.440 -power right - -0:32:12.080,0:32:16.320 -in terms of values anyhow so you can +power right in terms of values anyhow so you can 0:32:15.440,0:32:19.840 -think about - -0:32:16.320,0:32:21.760 -uh drop dropout as having these multiple +think about uh dropout as having these multiple 0:32:19.840,0:32:23.919 -networks during training - -0:32:21.760,0:32:25.760 -but then whenever you use them at +networks during training but then whenever you use them at 0:32:23.919,0:32:28.080 -inference you turn off this dropout - -0:32:25.760,0:32:29.840 -module and you basically average out all +inference you turn off this dropout module and you basically average out all 0:32:28.080,0:32:31.360 -these performance of the
singular - -0:32:29.840,0:32:33.919 -network and +the performance of the singular networks and 0:32:31.360,0:32:34.799 -these allow you to get you know a much - -0:32:33.919,0:32:37.919 -better +this allows you to get you know a much better 0:32:34.799,0:32:38.640 -reduction of the noise uh which was - -0:32:37.919,0:32:41.440 -introduced +reduction of the noise uh which was introduced 0:32:38.640,0:32:42.320 -like that was arised by the the training - -0:32:41.440,0:32:43.679 -procedure +like that arose from the training procedure 0:32:42.320,0:32:45.760 -because again if you have you know - -0:32:43.679,0:32:47.039 -multiple experts you take the average of +because again if you have you know multiple experts you take the average of 0:32:45.760,0:32:48.000 -multiple experts you're going to get a - -0:32:47.039,0:32:50.399 -better +multiple experts you're going to get a better 0:32:48.000,0:32:53.120 -um answer because it's going to be - -0:32:50.399,0:32:56.640 -removing that kind of variability in the +um answer because it's going to be removing that kind of variability in the 0:32:53.120,0:32:59.600 -specific answer right - -0:32:56.640,0:33:01.519 -but perhaps we should keep in mind this +specific answer right but perhaps we should keep in mind this 0:32:59.600,0:33:03.360 -variability of the answers okay - -0:33:01.519,0:33:04.799 -because it can turn out quite +variability of the answers okay because it can turn out quite 0:33:03.360,0:33:08.559 -interesting - -0:33:04.799,0:33:10.880 -anyhow so dropout is amazing way to +interesting anyhow so dropout is an amazing way to 0:33:08.559,0:33:11.840 -basically have an automatic model - -0:33:10.880,0:33:15.039 -averaging +basically have an automatic model averaging 0:33:11.840,0:33:18.240 -modeling assembling performance - -0:33:15.039,0:33:20.320 -cool cool cool uh is dropout a good +model ensembling performance cool cool cool uh is dropout a good 0:33:18.240,0:33:21.200 -technique only for
classification task - -0:33:20.320,0:33:24.320 -or also +technique only for classification task or also 0:33:21.200,0:33:27.760 -for other tasks as well - -0:33:24.320,0:33:29.679 -like metric learning and coding learning +for other tasks as well like metric learning and coding learning 0:33:27.760,0:33:32.640 -i would say that dropout gives you a - -0:33:29.679,0:33:32.640 -much more robust +i would say that dropout gives you a much more robust 0:33:33.360,0:33:38.000 -network a much more robust prediction - -0:33:36.000,0:33:40.159 -regardless of the task it doesn't it +network a much more robust prediction regardless of the task it doesn't it 0:33:38.000,0:33:44.000 -doesn't restrict to classification - -0:33:40.159,0:33:46.640 -you basically train uh multiple networks +doesn't restrict to classification you basically train uh multiple networks 0:33:44.000,0:33:49.600 -of reduced size right and then you - -0:33:46.640,0:33:50.880 -average out this reduced size network +of reduced size right and then you average out this reduced size network 0:33:49.600,0:33:52.799 -so although at the end you're going to - -0:33:50.880,0:33:54.320 -have a large network this large network +so although at the end you're going to have a large network this large network 0:33:52.799,0:33:58.080 -is just the - -0:33:54.320,0:34:01.360 -average of small networks performance +is just the average of small networks performance 0:33:58.080,0:34:03.600 -so and also if you think in this way - -0:34:01.360,0:34:04.880 -the small network can no longer overfit +so and also if you think in this way the small network can no longer overfit 0:34:03.600,0:34:06.960 -right because they are - -0:34:04.880,0:34:08.159 -no longer that over parameterized +right because they are no longer that over parameterized 0:34:06.960,0:34:10.879 -perhaps right - -0:34:08.159,0:34:12.480 -and so dropout allows you allows you to +perhaps right and so dropout allows you allows you to 0:34:10.879,0:34:15.359 -fight 
overfitting - -0:34:12.480,0:34:16.879 -with several by different you know +fight overfitting with several different you know 0:34:15.359,0:34:20.960 -mechanisms - -0:34:16.879,0:34:23.440 -finally you can think uh if you apply +mechanisms finally you can think uh if you apply 0:34:20.960,0:34:25.040 -let's think about like uh applying - -0:34:23.440,0:34:28.159 -dropout to the input +let's think about like uh applying dropout to the input 0:34:25.040,0:34:30.240 -this is kind of uh sort of like - -0:34:28.159,0:34:31.440 -uh denoising out encoder no i mean you +this is kind of uh sort of like a denoising autoencoder no i mean you 0:34:30.240,0:34:33.919 -perturb the input - -0:34:31.440,0:34:35.839 -right in in this case and then you force +perturb the input right in this case and then you force 0:34:33.919,0:34:39.119 -still the output to be the same - -0:34:35.839,0:34:40.399 -so if you think about that you are going +still the output to be the same so if you think about that you are going 0:34:39.119,0:34:42.720 -to be - -0:34:40.399,0:34:44.399 -insensitive to some small variations of +to be insensitive to some small variations of 0:34:42.720,0:34:47.200 -the input - -0:34:44.399,0:34:48.159 -uh which are gonna make your network +the input uh which are gonna make your network 0:34:47.200,0:34:50.879 -more robust right - -0:34:48.159,0:34:51.919 -or the same as i was as i wrote you in +more robust right or the same as i wrote you in 0:34:50.879,0:34:54.560 -the midterm - -0:34:51.919,0:34:55.599 -uh how can you get a input that is you +the midterm uh how can you get an input that is you 0:34:54.560,0:34:58.320 -know annoying - -0:34:55.599,0:35:00.160 -you can find some noise in the input +know annoying you can find some noise in the input 0:34:58.320,0:35:03.359 -which is going to be - -0:35:00.160,0:35:05.680 -increasing your uh your loss right so +which is going to be increasing your loss right so 0:35:03.359,0:35:07.599 -you can do
some kind of adversarial - -0:35:05.680,0:35:09.599 -generation of noise and then you try to +you can do some kind of adversarial generation of noise and then you try to 0:35:07.599,0:35:13.040 -you train your network on these - -0:35:09.599,0:35:16.000 -um handcrafted samples which were +you train your network on these um handcrafted samples which were 0:35:13.040,0:35:17.760 -um corrected were like perturbed in - -0:35:16.000,0:35:20.560 -order to +um corrected were like perturbed in order to 0:35:17.760,0:35:21.680 -increase your your training loss right - -0:35:20.560,0:35:23.520 -okay so i gave you like +increase your your training loss right okay so i gave you like 0:35:21.680,0:35:27.119 -four different reasons why to use - -0:35:23.520,0:35:30.160 -dropout but then i don't use dropout +four different reasons why to use dropout but then i don't use dropout 0:35:27.119,0:35:31.520 -some not that often i actually do use it - -0:35:30.160,0:35:34.160 -for a different reason which i'm going +some not that often i actually do use it for a different reason which i'm going 0:35:31.520,0:35:38.240 -to be coming to that in a bit - -0:35:34.160,0:35:39.040 -um okay so early stopping so this is +to be coming to that in a bit um okay so early stopping so this is 0:35:38.240,0:35:42.160 -much - -0:35:39.040,0:35:42.560 -one of the most basic techniques uh if +much one of the most basic techniques uh if 0:35:42.160,0:35:45.359 -you're - -0:35:42.560,0:35:45.760 -training your model and your validation +you're training your model and your validation 0:35:45.359,0:35:48.880 -loss - -0:35:45.760,0:35:52.320 -starts starts increasing +loss starts starts increasing 0:35:48.880,0:35:54.320 -then you stop there okay - -0:35:52.320,0:35:55.920 -such that you get the lowest validation +then you stop there okay such that you get the lowest validation 0:35:54.320,0:35:57.440 -score and which - -0:35:55.920,0:35:59.520 -tells you okay you're not yet +score and which tells you okay 
you're not yet 0:35:57.440,0:36:02.400 -overfitting - -0:35:59.520,0:36:02.880 -uh and that basically doesn't let your +overfitting uh and that basically doesn't let your 0:36:02.400,0:36:04.880 -weights - -0:36:02.880,0:36:06.640 -grow too much right so instead of +weights grow too much right so instead of 0:36:04.880,0:36:08.640 -getting the l2 which is - -0:36:06.640,0:36:10.160 -trying not to get those weights to get +getting the l2 which is trying not to get those weights to get 0:36:08.640,0:36:13.680 -too lengthy too long - -0:36:10.160,0:36:16.960 -too long you just stop whenever they are +too lengthy too long too long you just stop whenever they are 0:36:13.680,0:36:16.960 -not yet that long right - -0:36:17.520,0:36:22.320 -uh fighting overfitting so these are +not yet that long right uh fighting overfitting so these are 0:36:20.079,0:36:25.520 -techniques that end up regularizing - -0:36:22.320,0:36:27.440 -our parameters our models but +techniques that end up regularizing our parameters our models but 0:36:25.520,0:36:29.760 -but they are not they are not - -0:36:27.440,0:36:32.480 -regularizers okay so this is important +but they are not they are not regularizers okay so this is important 0:36:29.760,0:36:33.920 -these are not regularizer although they - -0:36:32.480,0:36:37.040 -do regularize +these are not regularizer although they do regularize 0:36:33.920,0:36:39.280 -the uh network - -0:36:37.040,0:36:40.880 -okay as long as you keep this in mind we +the uh network okay as long as you keep this in mind we 0:36:39.280,0:36:43.680 -can also - -0:36:40.880,0:36:44.400 -see these uh other options but they are +can also see these uh other options but they are 0:36:43.680,0:36:46.880 -not - -0:36:44.400,0:36:47.599 -regularizing techniques right they do +not regularizing techniques right they do 0:36:46.880,0:36:50.240 -act as a - -0:36:47.599,0:36:51.200 -regularizer though first one batch +act as a regularizer though first one batch 
0:36:50.240,0:36:53.440 -normalization - -0:36:51.200,0:36:54.320 -okay so we talked about this several +normalization okay so we talked about this several 0:36:53.440,0:36:56.880 -times - -0:36:54.320,0:36:57.599 -we don't know quite how it works too +times we don't know quite how it works too 0:36:56.880,0:37:00.320 -well - -0:36:57.599,0:37:01.440 -there is an article on a blog post that +well there is an article on a blog post that 0:37:00.320,0:37:03.599 -is explaining this i - -0:37:01.440,0:37:04.800 -we put the link in the optimization +is explaining this i we put the link in the optimization 0:37:03.599,0:37:06.880 -lecture - -0:37:04.800,0:37:08.079 -check it out i think it's like lecture +lecture check it out i think it's like lecture 0:37:06.880,0:37:11.359 -seven of some - -0:37:08.079,0:37:12.480 -blog post i really can't remember anyhow +seven of some blog post i really can't remember anyhow 0:37:11.359,0:37:14.720 -so the point is that - -0:37:12.480,0:37:15.520 -you reset the the mu then the mean and +so the point is that you reset the the mu then the mean and 0:37:14.720,0:37:18.079 -the - -0:37:15.520,0:37:20.079 -sigma the the sigma square the variance +the sigma the the sigma square the variance 0:37:18.079,0:37:24.720 -at each layer - -0:37:20.079,0:37:26.480 -and these allow you to +at each layer and these allow you to 0:37:24.720,0:37:29.520 -okay when you reset the mean and the - -0:37:26.480,0:37:31.760 -sigma this is based on the specific +okay when you reset the mean and the sigma this is based on the specific 0:37:29.520,0:37:33.599 -batch you have right because you compute - -0:37:31.760,0:37:36.480 -the mean and the sigma square +batch you have right because you compute the mean and the sigma square 0:37:33.599,0:37:38.800 -over the specific batch but then if you - -0:37:36.480,0:37:42.800 -actually sample uniformly from your +over the specific batch but then if you actually sample uniformly from your 0:37:38.800,0:37:44.960 -training 
data set you will never have - -0:37:42.800,0:37:46.480 -two identical batches right so every +training data set you will never have two identical batches right so every 0:37:44.960,0:37:50.000 -batch will have a different - -0:37:46.480,0:37:51.280 -configuration of samples therefore if +batch will have a different configuration of samples therefore if 0:37:50.000,0:37:53.520 -you compute the mean and - -0:37:51.280,0:37:54.320 -the standard deviation they will always +you compute the mean and the standard deviation they will always 0:37:53.520,0:37:56.880 -be different - -0:37:54.320,0:37:58.079 -right and therefore again i said five +be different right and therefore again i said five 0:37:56.880,0:37:59.680 -times therefore - -0:37:58.079,0:38:02.880 -you are going to be applying a different +times therefore you are going to be applying a different 0:37:59.680,0:38:04.880 -correction per batch - -0:38:02.880,0:38:06.960 -and the model will never see twice the +correction per batch and the model will never see twice the 0:38:04.880,0:38:10.560 -same input right because they are - -0:38:06.960,0:38:14.160 -altered based on where they happen to +same input right because they are altered based on where they happen to 0:38:10.560,0:38:18.000 -uh appear in your training uh procedure - -0:38:14.160,0:38:21.119 -so because you never +uh appear in your training uh procedure so because you never 0:38:18.000,0:38:23.680 -showed the same uh same input twice - -0:38:21.119,0:38:25.599 -and this is so cool uh i really like it +showed the same uh same input twice and this is so cool uh i really like it 0:38:23.680,0:38:27.040 -and that's all you need usually most of - -0:38:25.599,0:38:30.320 -the time to train your network +and that's all you need usually most of the time to train your network 0:38:27.040,0:38:32.960 -don't drop out - -0:38:30.320,0:38:33.680 -and this technique also speeds up your +don't drop out and this technique also speeds up your 0:38:32.960,0:38:36.480 
-training like - -0:38:33.680,0:38:37.839 -crazy before batch norm was introduced +training like crazy before batch norm was introduced 0:38:36.480,0:38:40.880 -it was taking me i think - -0:38:37.839,0:38:44.000 -one week to train uh +it was taking me i think one week to train uh 0:38:40.880,0:38:46.160 -on imagenet i think at least - -0:38:44.000,0:38:47.280 -if it was if it wasn't a month it was +on imagenet i think at least if it was if it wasn't a month it was 0:38:46.160,0:38:50.400 -terrible i think - -0:38:47.280,0:38:51.760 -but again that's like eight years ago uh +terrible i think but again that's like eight years ago uh 0:38:50.400,0:38:54.160 -yeah it was terrible training on - -0:38:51.760,0:38:55.920 -imagenet uh with batch normalization i +yeah it was terrible training on imagenet uh with batch normalization i 0:38:54.160,0:38:59.280 -think you can train in one day so - -0:38:55.920,0:39:01.599 -that's ridiculous do you mean robust in +think you can train in one day so that's ridiculous do you mean robust in 0:38:59.280,0:39:04.000 -terms of adversarial learning as well - -0:39:01.599,0:39:05.920 -i don't understand why we don't see the +terms of adversarial learning as well i don't understand why we don't see the 0:39:04.000,0:39:09.359 -same sample twice - -0:39:05.920,0:39:12.960 -um i'm saying robust here +same sample twice um i'm saying robust here 0:39:09.359,0:39:15.920 -as in uh you're providing different - -0:39:12.960,0:39:16.720 -inputs every time because and so the +as in uh you're providing different inputs every time because and so the 0:39:15.920,0:39:19.119 -network gets - -0:39:16.720,0:39:20.160 -a better coverage what is the training +network gets a better coverage what is the training 0:39:19.119,0:39:22.560 -manifold - -0:39:20.160,0:39:24.800 -uh you don't see the same input twice +manifold uh you don't see the same input twice 0:39:22.560,0:39:27.200 -because the same input - -0:39:24.800,0:39:28.640 -based on how it appears 
in the in the +because the same input based on how it appears in the 0:39:27.200,0:39:32.000 -batch so if you appears - -0:39:28.640,0:39:35.119 -you have you know input 42 +batch so if it appears you have you know input 42 0:39:32.000,0:39:35.760 -and this input 42 happens in a given - -0:39:35.119,0:39:37.359 -batch +and this input 42 happens in a given batch 0:39:35.760,0:39:39.119 -you subtract the mean of the batch and - -0:39:37.359,0:39:41.520 -divide by the standard deviation +you subtract the mean of the batch and divide by the standard deviation 0:39:39.119,0:39:43.280 -and you get the the new you know value - -0:39:41.520,0:39:45.839 -right within the network +and you get the new you know value right within the network 0:39:43.280,0:39:46.880 -but then if that input 42 happens in a - -0:39:45.839,0:39:48.720 -different batch +but then if that input 42 happens in a different batch 0:39:46.880,0:39:51.040 -then the mean of the different batch is - -0:39:48.720,0:39:52.960 -gonna be a different mean +then the mean of the different batch is gonna be a different mean 0:39:51.040,0:39:54.480 -and therefore you're gonna get a - -0:39:52.960,0:39:56.640 -slightly different +and therefore you're gonna get a slightly different 0:39:54.480,0:39:58.560 -input every time so you never actually - -0:39:56.640,0:40:00.880 -observe the same input because they +input every time so you never actually observe the same input because they 0:39:58.560,0:40:02.800 -happen to be packed in a different batch - -0:40:00.880,0:40:04.720 -and therefore the statistics of that +happen to be packed in a different batch and therefore the statistics of that 0:40:02.800,0:40:06.720 -specific patch - -0:40:04.720,0:40:08.560 -will be just specific to that batch and +specific batch will be just specific to that batch and 0:40:06.720,0:40:10.160 -you know it's going to change every time - -0:40:08.560,0:40:12.350 -you're going to have a different batch +you know it's going to
change every time you're going to have a different batch 0:40:10.160,0:40:13.440 -so same input get a different - -0:40:12.350,0:40:16.000 -[Music] +so the same input gets a different 0:40:13.440,0:40:18.000 -correction let's say this way if it - -0:40:16.000,0:40:20.000 -appears in a different batch so it +correction let's say this way if it appears in a different batch so 0:40:18.000,0:40:21.839 -you never see the same input twice so - -0:40:20.000,0:40:24.960 -this technique is all i use usually for +you never see the same input twice so this technique is all i use usually for 0:40:21.839,0:40:27.440 -training my network um - -0:40:24.960,0:40:28.880 -and it works but again recently i've +training my network um and it works but again recently i've 0:40:27.440,0:40:31.440 -been using dropout for a different - -0:40:28.880,0:40:34.960 -reason so we're gonna be +been using dropout for a different reason so we're gonna be 0:40:31.440,0:40:38.160 -um okay we are gonna see this in - -0:40:34.960,0:40:41.200 -a few minutes uh +um okay we are gonna see this in a few minutes uh 0:40:38.160,0:40:43.040 -more data of course just providing more - -0:40:41.200,0:40:44.400 -data you're gonna find all over fitting +more data of course just providing more data you're gonna fight overfitting 0:40:43.040,0:40:47.520 -but then you know - -0:40:44.400,0:40:50.000 -ding ding ding okay +but then you know ding ding ding okay 0:40:47.520,0:40:52.079 -uh finally data augmentation so data - -0:40:50.000,0:40:54.319 -augmentation is also a very valid +uh finally data augmentation so data augmentation is also a very valid 0:40:52.079,0:40:55.680 -technique in order to you know prove - -0:40:54.319,0:40:58.319 -provide some kind of +technique in order to you know provide some kind of 0:40:55.680,0:41:00.960 -uh deformed version of the input if - -0:40:58.319,0:41:04.839 -you're talking about images we
have 0:41:00.960,0:41:07.440 -center crop color jitter different crops - -0:41:04.839,0:41:08.400 -transformations like i find random +center crop color jitter different crops transformations like i find random 0:41:07.440,0:41:11.200 -transformations - -0:41:08.400,0:41:13.200 -crops random rotation horizontal flip +transformations crops random rotation horizontal flip 0:41:11.200,0:41:16.560 -right if you see myself like that and - -0:41:13.200,0:41:20.000 -you flip my face i'm still me kind of +right if you see myself like that and you flip my face i'm still me kind of 0:41:16.560,0:41:23.440 -right so uh if it's upside down well - -0:41:20.000,0:41:24.800 -maybe not quite uh nevertheless you can +right so uh if it's upside down well maybe not quite uh nevertheless you can 0:41:23.440,0:41:28.160 -see that if you - -0:41:24.800,0:41:30.000 -provide some alterations that are +see that if you provide some alterations that are 0:41:28.160,0:41:31.680 -perturbation that you are if you like to - -0:41:30.000,0:41:33.680 -be insensitive against +perturbation that you are if you like to be insensitive against 0:41:31.680,0:41:35.359 -then you can improve your performance of - -0:41:33.680,0:41:37.119 -the network which is going to be +then you can improve your performance of the network which is going to be 0:41:35.359,0:41:38.160 -learning how to be insensitive to this - -0:41:37.119,0:41:41.760 -kind of +learning how to be insensitive to this kind of 0:41:38.160,0:41:43.599 -uh you know variations - -0:41:41.760,0:41:45.520 -okay okay okay so quickly quickly +uh you know variations okay okay okay so quickly quickly 0:41:43.599,0:41:47.359 -quickly oh okay transfer learning - -0:41:45.520,0:41:48.880 -we already know about transfer learning +quickly oh okay transfer learning we already know about transfer learning 0:41:47.359,0:41:50.160 -i think but again so you get your - -0:41:48.880,0:41:51.760 -network you already trained on a +i think but again so you get your 
network you already trained on a 0:41:50.160,0:41:53.599 -specific task you - -0:41:51.760,0:41:54.960 -just leave the first classifier there +specific task you just leave the first classifier there 0:41:53.599,0:41:57.839 -you move everything - -0:41:54.960,0:41:58.720 -you plug a new a new classifier or +you move everything you plug a new a new classifier or 0:41:57.839,0:42:01.280 -whatever - -0:41:58.720,0:42:03.520 -and then if you have you know a few data +whatever and then if you have you know a few data 0:42:01.280,0:42:04.400 -with a similar kind of training - -0:42:03.520,0:42:06.800 -distribution +with a similar kind of training distribution 0:42:04.400,0:42:07.680 -you just do transfer learning which is - -0:42:06.800,0:42:11.040 -again +you just do transfer learning which is again 0:42:07.680,0:42:14.960 -training just the final classifier - -0:42:11.040,0:42:18.079 -uh if you have lots of data +training just the final classifier uh if you have lots of data 0:42:14.960,0:42:18.720 -you should fine-tune because you would - -0:42:18.079,0:42:21.839 -like to +you should fine-tune because you would like to 0:42:18.720,0:42:22.400 -also improve this uh the performance of - -0:42:21.839,0:42:24.960 -the +also improve this uh the performance of the 0:42:22.400,0:42:26.960 -like you would like also to tweak the uh - -0:42:24.960,0:42:29.839 -feature extractor the blue +like you would like also to tweak the uh feature extractor the blue 0:42:26.960,0:42:31.520 -the blue layers and the colors are - -0:42:29.839,0:42:32.640 -flipped here damn the hidden layer +the blue layers and the colors are flipped here damn the hidden layer 0:42:31.520,0:42:35.680 -should have been green and the - -0:42:32.640,0:42:38.079 -output blue okay +should have been green and the output blue okay 0:42:35.680,0:42:39.359 -few data and different from training or - -0:42:38.079,0:42:41.440 -you want to do early +few data and different from training or you want to do early 
0:42:39.359,0:42:42.839 -uh transfer learning which means you - -0:42:41.440,0:42:46.240 -know you start +uh transfer learning which means you know you start 0:42:42.839,0:42:49.760 -changing um also you know - -0:42:46.240,0:42:50.240 -a little bit of the the of the other +changing um also you know a little bit of the the of the other 0:42:49.760,0:42:53.280 -layers - -0:42:50.240,0:42:54.800 -as well not all of them and then +layers as well not all of them and then 0:42:53.280,0:42:56.800 -yeah you want to remove a few more - -0:42:54.800,0:42:58.319 -layers actually yeah oh +yeah you want to remove a few more layers actually yeah oh 0:42:56.800,0:43:00.079 -my bad so you would like to remove a few - -0:42:58.319,0:43:01.839 -of those uh +my bad so you would like to remove a few of those uh 0:43:00.079,0:43:05.359 -final hidden layers because they are - -0:43:01.839,0:43:08.000 -kind of already specialized +final hidden layers because they are kind of already specialized 0:43:05.359,0:43:09.520 -so you want to retrain the base features - -0:43:08.000,0:43:11.040 -extractor here +so you want to retrain the base features extractor here 0:43:09.520,0:43:12.640 -and if you have lots of data which are - -0:43:11.040,0:43:16.960 -different from the training the +and if you have lots of data which are different from the training the 0:43:12.640,0:43:20.240 -distribution just train okay um - -0:43:16.960,0:43:21.200 -okay also you can use different +distribution just train okay um okay also you can use different 0:43:20.240,0:43:24.000 -learnings - -0:43:21.200,0:43:24.560 -learning rate for different layers right +learnings learning rate for different layers right 0:43:24.000,0:43:28.000 -to - -0:43:24.560,0:43:31.200 -improve performance so maybe you um +to improve performance so maybe you um 0:43:28.000,0:43:34.400 -you'd like to change um yeah - -0:43:31.200,0:43:36.400 -so you you can you can see that usually +you'd like to change um yeah so you you can you can see 
that usually 0:43:34.400,0:43:38.160 -these final layers are the ones that are - -0:43:36.400,0:43:39.440 -changing uh quicker because they are +these final layers are the ones that are changing uh quicker because they are 0:43:38.160,0:43:42.480 -close to the - -0:43:39.440,0:43:44.640 -uh to the loss but then again if you use +close to the loss but then again if you use 0:43:42.480,0:43:45.839 -uh batch norm all these layers are kind - -0:43:44.640,0:43:48.400 -of training the same +uh batch norm all these layers are kind of training at the same 0:43:45.839,0:43:50.079 -speed otherwise again you can see - -0:43:48.400,0:43:51.119 -whether you want to change learning rate +speed otherwise again you can see whether you want to change learning rate 0:43:50.079,0:43:54.560 -maybe change - -0:43:51.119,0:43:56.480 -these guys slower or not did you say is +maybe change these guys slower or not what did you say is 0:43:54.560,0:43:56.960 -the difference between transfer learning - -0:43:56.480,0:43:59.520 -and fine +the difference between transfer learning and fine 0:43:56.960,0:44:01.920 -tuning uh transfer learning i just train - -0:43:59.520,0:44:03.520 -define a classifier +tuning uh transfer learning i just train the final classifier 0:44:01.920,0:44:05.200 -because i don't have if you have few - -0:44:03.520,0:44:06.960 -data you don't have +because if you have few data you don't have 0:44:05.200,0:44:09.040 -enough you know you don't want to - -0:44:06.960,0:44:11.040 -overfit so you +enough you know you don't want to overfit so 0:44:09.040,0:44:12.640 -if you have a few data you want to just - -0:44:11.040,0:44:14.560 -reuse the whole +if you have a few data you want to just reuse the whole 0:44:12.640,0:44:16.880 -network from the previous task and you - -0:44:14.560,0:44:18.480 -just train the final classifier +network from the previous task and you just train the final classifier 0:44:16.880,0:44:21.119 -if you have lots of data then you can
- -0:44:18.480,0:44:23.760 -actually even um +if you have lots of data then you can actually even um 0:44:21.119,0:44:25.040 -try to have like some changes you can - -0:44:23.760,0:44:27.760 -also you know you can start +try to have like some changes you can also you know you can start 0:44:25.040,0:44:29.040 -you have a uh lower learning rate you - -0:44:27.760,0:44:32.560 -also change for +you have a uh lower learning rate you also change for 0:44:29.040,0:44:34.640 -this feature extractor if they are - -0:44:32.560,0:44:36.480 -similarly transfer learning you freeze +this feature extractor if they are similarly transfer learning you freeze 0:44:34.640,0:44:38.640 -the the base - -0:44:36.480,0:44:39.520 -network yeah i would say the transfer +the the base network yeah i would say the transfer 0:44:38.640,0:44:41.760 -learning you just - -0:44:39.520,0:44:43.200 -freeze the the blue guy and you just +learning you just freeze the the blue guy and you just 0:44:41.760,0:44:46.640 -train the orange - -0:44:43.200,0:44:48.800 -in uh fine tuning you actually tune +train the orange in uh fine tuning you actually tune 0:44:46.640,0:44:51.520 -all the other parameters as well maybe - -0:44:48.800,0:44:54.720 -with smaller learning rate +all the other parameters as well maybe with smaller learning rate 0:44:51.520,0:44:55.760 -this is the number 12 notebook here i'm - -0:44:54.720,0:44:59.040 -classifying the +this is the number 12 notebook here i'm classifying the 0:44:55.760,0:45:01.200 -sentiment of these reviews on the imdb - -0:44:59.040,0:45:02.839 -data set all right and so i'd like to +sentiment of these reviews on the imdb data set all right and so i'd like to 0:45:01.200,0:45:04.079 -compare different regularization - -0:45:02.839,0:45:07.200 -techniques +compare different regularization techniques 0:45:04.079,0:45:09.920 -so i'm just keeping everything because - -0:45:07.200,0:45:12.240 -i just like to show you the final result +so i'm just keeping everything 
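The two recipes just described can be sketched in PyTorch (the tiny backbone and classifier below are made-up stand-ins, not the course's model): transfer learning freezes the pretrained base, "the blue guy", and trains only the new head, "the orange", while fine-tuning unfreezes everything and gives the base a smaller learning rate via optimizer parameter groups.

```python
import torch.nn as nn
from torch.optim import Adam

# Hypothetical pretrained net: a feature-extractor base plus a fresh head
backbone = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
classifier = nn.Linear(128, 10)

# Transfer learning: freeze the base, train only the new classifier
for p in backbone.parameters():
    p.requires_grad = False
transfer_opt = Adam(classifier.parameters(), lr=1e-3)

# Fine-tuning: unfreeze everything, but update the pretrained base
# with a smaller learning rate than the head (parameter groups)
for p in backbone.parameters():
    p.requires_grad = True
finetune_opt = Adam([
    {"params": backbone.parameters(), "lr": 1e-5},   # slow for the base
    {"params": classifier.parameters(), "lr": 1e-3}, # faster for the head
])
```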
because i just like to show you the final result 0:45:09.920,0:45:14.880 -let me see where is the optimizer - -0:45:12.240,0:45:16.000 -so you can toggle different things at +let me see where is the optimizer so you can toggle different things at 0:45:14.880,0:45:18.319 -the beginning we have - -0:45:16.000,0:45:20.800 -no weight decay nothing right so we +the beginning we have no weight decay nothing right so we 0:45:18.319,0:45:22.560 -train with this regularizer - -0:45:20.800,0:45:25.040 -let's check what is the model so the +train with this regularizer let's check what is the model so the 0:45:22.560,0:45:28.480 -model is just a feed forward neural net - -0:45:25.040,0:45:30.880 -which is fifo on neural net we have some +model is just a feed forward neural net which is a feedforward neural net we have some 0:45:28.480,0:45:32.480 -embeddings a linear a linear - -0:45:30.880,0:45:34.480 -and then my forward is going to be +embeddings a linear a linear and then my forward is going to be 0:45:32.480,0:45:37.680 -getting my embeddings sending to the - -0:45:34.480,0:45:39.920 -forward the fully connected relo +getting my embeddings sending to the forward the fully connected relu 0:45:37.680,0:45:41.839 -uh and then you know you get the output - -0:45:39.920,0:45:43.599 -from this so second fully connected and +uh and then you know you get the output from this second fully connected and 0:45:41.839,0:45:44.800 -i'm outputting a sigmoid because i'm - -0:45:43.599,0:45:47.680 -just doing +i'm outputting a sigmoid because i'm just doing 0:45:44.800,0:45:49.200 -um i think a two-class classification - -0:45:47.680,0:45:50.800 -problem so we'd like to figure out if +um i think a two-class classification problem so we'd like to figure out if 0:45:49.200,0:45:52.880 -it's a positive review or a negative - -0:45:50.800,0:45:56.560 -review +it's a positive review or a negative review 0:45:52.880,0:46:00.400 -um and so this is the initial training - -0:45:56.560,0:46:03.440
-and we got you know +um and so this is the initial training and we got you know 0:46:00.400,0:46:05.680 -the validation curve climbs up as crazy - -0:46:03.440,0:46:07.119 -whereas the training curve goes down to +the validation curve climbs up as crazy whereas the training curve goes down to 0:46:05.680,0:46:09.599 -zero - -0:46:07.119,0:46:11.680 -and so here you can see uh the +zero and so here you can see uh the 0:46:09.599,0:46:15.520 -validation accuracy which goes up to - -0:46:11.680,0:46:16.560 -64 more or less so and here we just +validation accuracy which goes up to 64 more or less so and here we just 0:46:15.520,0:46:19.599 -store - -0:46:16.560,0:46:21.200 -the weight of the network +store the weight of the network 0:46:19.599,0:46:23.440 -for when there is no kind of - -0:46:21.200,0:46:25.839 -regularization okay +for when there is no kind of regularization okay 0:46:23.440,0:46:28.160 -then first thing i'd like to do is going - -0:46:25.839,0:46:31.599 -to be trying to do the +then first thing i'd like to do is going to be trying to do the 0:46:28.160,0:46:34.800 -weight l1 the l1 regularization - -0:46:31.599,0:46:34.800 -so let's see how to do that +weight l1 the l1 regularization so let's see how to do that 0:46:35.520,0:46:42.079 -so l1 regularization - -0:46:38.960,0:46:44.800 -okay toggle this one to do +so l1 regularization okay toggle this one to do 0:46:42.079,0:46:45.520 -l1 regularization so here i'm extracting - -0:46:44.800,0:46:48.400 -the +l1 regularization so here i'm extracting the 0:46:45.520,0:46:49.040 -model parameters and then i'm going to - -0:46:48.400,0:46:52.160 -be adding +model parameters and then i'm going to be adding 0:46:49.040,0:46:53.839 -some term to the - -0:46:52.160,0:46:56.400 -to the loss okay so the loss is going to +some term to the to the loss okay so the loss is going to 0:46:53.839,0:46:59.680 -be some part of this - -0:46:56.400,0:47:04.079 -uh like i'm gonna sum the the one norm +be some part of this 
uh like i'm gonna sum the the one norm 0:46:59.680,0:47:06.319 -of the fc1 to the loss okay - -0:47:04.079,0:47:10.079 -because there is no other way to do this +of the fc1 to the loss okay because there is no other way to do this 0:47:06.319,0:47:14.880 -in a pi torch for the moment - -0:47:10.079,0:47:18.160 -okay so let me re-initialize the network +in pytorch for the moment okay so let me re-initialize the network 0:47:14.880,0:47:21.510 -so i start here - -0:47:18.160,0:47:21.510 -[Music] +so i start here [Music] 0:47:22.079,0:47:25.760 -i get - -0:47:23.870,0:47:29.040 -[Music] +i get [Music] 0:47:25.760,0:47:31.680 -this one and then i start training here - -0:47:29.040,0:47:32.400 -so this guy is training uh how many +this one and then i start training here so this guy is training uh how many 0:47:31.680,0:47:36.000 -iterations - -0:47:32.400,0:47:38.960 -let's check 10 epochs okay one two three +iterations let's check 10 epochs okay one two three 0:47:36.000,0:47:39.760 -four five six all right so before we - -0:47:38.960,0:47:41.920 -were checking +four five six all right so before we were checking 0:47:39.760,0:47:46.160 -we can go down here we had the - -0:47:41.920,0:47:49.680 -validation accuracy was around 64. +we can go down here we had the validation accuracy was around 64.
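What the notebook's toggle does can be sketched like this (the tiny model and penalty strength are illustrative, not the notebook's actual code): since PyTorch has no built-in L1 option, you add the 1-norm of a layer's weights to the loss by hand, whereas L2 comes for free as the optimizer's weight_decay argument.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Tiny stand-in for the notebook's net; fc1 plays the role of self.fc1
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
fc1 = model[0]
criterion = nn.MSELoss()
lambda_l1 = 1e-3  # illustrative regularization strength

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = criterion(model(x), y)
# L1: summed onto the loss manually (the optimizer's weight_decay is L2 only)
loss = loss + lambda_l1 * fc1.weight.abs().sum()
loss.backward()

# L2: just one optimizer argument
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-4)
```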
0:47:46.160,0:47:52.000 -and now we have validation accuracy - -0:47:49.680,0:47:53.119 -went to 66 right so we actually have +and now we have validation accuracy went to 66 right so we actually have 0:47:52.000,0:47:56.240 -improved - -0:47:53.119,0:47:59.280 -the performance by getting these guys +improved the performance by getting these guys 0:47:56.240,0:48:03.119 -uh to be - -0:47:59.280,0:48:07.280 -oh it's getting down down +uh to be oh it's getting down down 0:48:03.119,0:48:09.920 -oh back up 67 looks good 68 - -0:48:07.280,0:48:11.119 -okay it's finished so i can show you in +oh back up 67 looks good 68 okay it's finished so i can show you in 0:48:09.920,0:48:14.079 -this case what happened with - -0:48:11.119,0:48:15.359 -l1 oh it's not yet finished okay it's +this case what happened with l1 oh it's not yet finished okay it's 0:48:14.079,0:48:17.760 -taking forever - -0:48:15.359,0:48:19.119 -okay while this is training okay okay +taking forever okay while this is training okay okay 0:48:17.760,0:48:20.640 -i'm gonna show you the the output of - -0:48:19.119,0:48:24.079 -this guy and then i'm gonna be +i'm gonna show you the the output of this guy and then i'm gonna be 0:48:20.640,0:48:26.640 -showing just briefly the uh - -0:48:24.079,0:48:27.599 -second usage of the dropout should we +showing just briefly the uh second usage of the dropout should we 0:48:26.640,0:48:31.680 -stop this guy - -0:48:27.599,0:48:34.000 -69 so you can see now here we are at 69 +stop this guy 69 so you can see now here we are at 69 0:48:31.680,0:48:36.640 -in validation at 30 right - -0:48:34.000,0:48:37.920 -okay cool and here you can see both the +in validation at 30 right okay cool and here you can see both the 0:48:36.640,0:48:40.319 -training and the validation - -0:48:37.920,0:48:43.040 -they are both losses they go down and +training and the validation they are both losses they go down and 0:48:40.319,0:48:46.480 -then here i show you the validation - 
-0:48:43.040,0:48:48.640 -which went up to 67 and 68 okay +then here i show you the validation which went up to 67 and 68 okay 0:48:46.480,0:48:50.319 -and so here i just show i gonna be - -0:48:48.640,0:48:53.280 -storing these weights +and so here i just show i gonna be storing these weights 0:48:50.319,0:48:53.280 -for the l1 - -0:48:53.520,0:48:57.760 -so here i just store this l1 over here +for the l1 so here i just store this l1 over here 0:48:56.800,0:49:01.040 -okay - -0:48:57.760,0:49:04.720 -i'm gonna go back here +okay i'm gonna go back here 0:49:01.040,0:49:08.400 -uh we are gonna be undoing - -0:49:04.720,0:49:11.599 -this one right because we don't +uh we are gonna be undoing this one right because we don't 0:49:08.400,0:49:13.040 -want uh l1 we're gonna be choosing now a - -0:49:11.599,0:49:16.480 -l2 regularizer +want uh l1 we're gonna be choosing now a l2 regularizer 0:49:13.040,0:49:18.880 -right so i can toggle this one - -0:49:16.480,0:49:19.920 -and toggle this on alright so now we +right so i can toggle this one and toggle this on alright so now we 0:49:18.880,0:49:23.920 -have - -0:49:19.920,0:49:23.920 -a weight decay of this value +have a weight decay of this value 0:49:24.240,0:49:30.720 -model i execute this one - -0:49:27.599,0:49:33.280 -and i execute these guys all right +model i execute this one and i execute these guys all right 0:49:30.720,0:49:33.760 -so while the l2 is training i'll just - -0:49:33.280,0:49:36.000 -show you +so while the l2 is training i'll just show you 0:49:33.760,0:49:37.520 -a quick uh overview about bayesian - -0:49:36.000,0:49:40.720 -neural nets +a quick uh overview about bayesian neural nets 0:49:37.520,0:49:43.520 -so estimating a predictive distribution - -0:49:40.720,0:49:44.480 -so why to care about uncertainty many +so estimating a predictive distribution so why to care about uncertainty many 0:49:43.520,0:49:46.400 -reasons - -0:49:44.480,0:49:48.240 -uh if you have a cat declassifier and +reasons uh 
if you have a cat-dog classifier and 0:49:46.400,0:49:49.839 -you show a hippopotamus - -0:49:48.240,0:49:52.000 -the network is going to tell you oh this +you show a hippopotamus the network is going to tell you oh this 0:49:49.839,0:49:53.920 -is a dog no - -0:49:52.000,0:49:56.079 -it doesn't know i cannot tell you oh +is a dog no it doesn't know it cannot tell you oh 0:49:53.920,0:49:57.920 -this is not of any of the above right - -0:49:56.079,0:49:59.440 -you can think about oh let's make a +this is not any of the above right you can think about oh let's make a 0:49:57.920,0:50:01.920 -third category - -0:49:59.440,0:50:03.040 -but then how can you show you how can +third category but then how can you show you how can 0:50:01.920,0:50:05.599 -you show the network - -0:50:03.040,0:50:06.960 -not a cat and not a dog uh it doesn't +you show the network not a cat and not a dog uh it doesn't 0:50:05.599,0:50:10.000 -quite work like that - -0:50:06.960,0:50:10.640 -so you can't really find i mean cat is +quite work like that so you can't really find i mean cat is 0:50:10.000,0:50:13.599 -an object - -0:50:10.640,0:50:15.200 -dog is a object not a cat or not a dog +an object dog is an object not a cat or not a dog 0:50:13.599,0:50:16.319 -is not an object so you can't really - -0:50:15.200,0:50:20.240 -train your network +is not an object so you can't really train your network 0:50:16.319,0:50:22.319 -to say everything else um - -0:50:20.240,0:50:24.079 -reliability on steering control let's +to say everything else um reliability on steering control let's 0:50:22.319,0:50:25.200 -say you're training your car to steer - -0:50:24.079,0:50:27.839 -right and left +say you're training your car to steer right and left 0:50:25.200,0:50:28.960 -and then your car say steer to the right - -0:50:27.839,0:50:32.160 -okay hold on +and then your car says steer to the right okay hold on 0:50:28.960,0:50:35.839 -how certain are you about this - -0:50:32.160,0:50:38.319 -action is it
is it gonna kill me right +how certain are you about this action is it is it gonna kill me right 0:50:35.839,0:50:39.359 -uh physics simulator prediction if you - -0:50:38.319,0:50:41.359 -know about +uh physics simulator prediction if you know about 0:50:39.359,0:50:42.400 -physics or physicists they always want - -0:50:41.359,0:50:44.960 -to know +physics or physicists they always want to know 0:50:42.400,0:50:46.400 -how certain you are about your value - -0:50:44.960,0:50:48.000 -right so measurements +how certain you are about your value right so measurements 0:50:46.400,0:50:49.920 -uh in physics always have you know you - -0:50:48.000,0:50:51.040 -have the value plus minus the +uh in physics always have you know you have the value plus minus the 0:50:49.920,0:50:52.800 -uncertainty - -0:50:51.040,0:50:54.800 -so you know your network should be able +uncertainty so you know your network should be able 0:50:52.800,0:50:58.000 -to tell you as well how certain - -0:50:54.800,0:50:59.920 -uh some number or what is the +to tell you as well how certain uh some number or what is the 0:50:58.000,0:51:02.480 -in the confidence interval for a - -0:50:59.920,0:51:04.559 -specific prediction +in the confidence interval for a specific prediction 0:51:02.480,0:51:06.720 -moreover you can think to use this for - -0:51:04.559,0:51:09.440 -minimizing action randomness when +moreover you can think to use this for minimizing action randomness when 0:51:06.720,0:51:10.400 -connected to a reward what the heck does - -0:51:09.440,0:51:13.040 -this mean +connected to a reward what the heck does this mean 0:51:10.400,0:51:13.839 -so if there is some uncertainty with - -0:51:13.040,0:51:15.839 -some +so if there is some uncertainty with some 0:51:13.839,0:51:17.839 -associated some to some actions you can - -0:51:15.839,0:51:20.880 -actually exploit that +associated some to some actions you can actually exploit that 0:51:17.839,0:51:24.480 -and train your model to minimize that - 
-0:51:20.880,0:51:25.440 -uncertainty and this is so cool because +and train your model to minimize that uncertainty and this is so cool because 0:51:24.480,0:51:27.520 -we - -0:51:25.440,0:51:29.599 -use something similar in my in our +we use something similar in my in our 0:51:27.520,0:51:32.480 -project right - -0:51:29.599,0:51:34.640 -so dropout i told you about before uh so +project right so dropout i told you about before uh so 0:51:32.480,0:51:36.559 -how this neural network dropout works - -0:51:34.640,0:51:39.040 -i just gonna be quickly going through +how this neural network dropout works i just gonna be quickly going through 0:51:36.559,0:51:42.800 -this i multiply my input and my - -0:51:39.040,0:51:47.440 -hidden layer with these random +this i multiply my input and my hidden layer with these random 0:51:42.800,0:51:47.440 -zero one masks okay and - -0:51:47.520,0:51:51.359 -you can have the activation function to +zero one masks okay and you can have the activation function to 0:51:49.520,0:51:52.720 -be some non-linearity and then here you - -0:51:51.359,0:51:55.280 -have this bernoulli +be some non-linearity and then here you have this bernoulli 0:51:52.720,0:51:56.000 -with the probability of one minus the - -0:51:55.280,0:51:58.640 -dropping out +with the probability of one minus the dropping out 0:51:56.000,0:52:00.319 -rate so this the dropping out rate and - -0:51:58.640,0:52:03.440 -then you want to scale +rate so this the dropping out rate and then you want to scale 0:52:00.319,0:52:04.800 -the delta such that you know you resize - -0:52:03.440,0:52:07.920 -the amplitude +the delta such that you know you resize the amplitude 0:52:04.800,0:52:09.200 -of those weights the training has just - -0:52:07.920,0:52:11.280 -finished so i'm gonna be +of those weights the training has just finished so i'm gonna be 0:52:09.200,0:52:12.240 -switching that i'm sorry for the context - -0:52:11.280,0:52:16.240 -switching +switching that i'm sorry for the 
context switching 0:52:12.240,0:52:18.000 -oh okay good call all right uh - -0:52:16.240,0:52:19.520 -calculate the variance yes someone was +oh okay good call all right uh calculate the variance yes someone was 0:52:18.000,0:52:22.000 -saying calculate the variance i know i'm - -0:52:19.520,0:52:25.200 -switching i'm sorry it's the last lesson +saying calculate the variance i know i'm switching i'm sorry it's the last lesson 0:52:22.000,0:52:28.559 -i'm making a mess okay so this is train - -0:52:25.200,0:52:32.160 -and we got 64 uh +i'm making a mess okay so this is train and we got 64 uh 0:52:28.559,0:52:32.720 -which is so these are also going both - -0:52:32.160,0:52:35.520 -down +which is so these are also going both down 0:52:32.720,0:52:38.000 -this is both the the l2 regularization - -0:52:35.520,0:52:39.200 -and before we were getting to 68 with +this is both the the l2 regularization and before we were getting to 68 with 0:52:38.000,0:52:42.640 -the l1 - -0:52:39.200,0:52:44.319 -here we get something else maybe +the l1 here we get something else maybe 0:52:42.640,0:52:46.079 -oh you can see it's still climbing right - -0:52:44.319,0:52:48.160 -so maybe i just stopped too early +oh you can see it's still climbing right so maybe i just stopped too early 0:52:46.079,0:52:50.960 -so if you keep training you're gonna get - -0:52:48.160,0:52:53.520 -a better performance +so if you keep training you're gonna get a better performance 0:52:50.960,0:52:54.640 -it's it's monotonic non-decreasing right - -0:52:53.520,0:52:57.359 -so i think +it's it's monotonic non-decreasing right so i think 0:52:54.640,0:52:59.760 -kind of so i think you can squeeze more - -0:52:57.359,0:53:03.599 -and here i'm gonna be saving +kind of so i think you can squeeze more and here i'm gonna be saving 0:52:59.760,0:53:05.839 -these weights in these l2 weights - -0:53:03.599,0:53:06.640 -okay so i saved that and the last one +these weights in these l2 weights okay so i saved that and 
the last one 0:53:05.839,0:53:08.720 -then i sh - -0:53:06.640,0:53:09.680 -then it's gonna be exactly the dropout +then i sh then it's gonna be exactly the dropout 0:53:08.720,0:53:13.200 -right so - -0:53:09.680,0:53:16.800 -go back here uh we turn off +right so go back here uh we turn off 0:53:13.200,0:53:19.599 -the l2 so we - -0:53:16.800,0:53:20.640 -turn off this guy we turn back the +the l2 so we turn off this guy we turn back the 0:53:19.599,0:53:22.400 -simple one - -0:53:20.640,0:53:24.720 -but then we have to go back in this +simple one but then we have to go back in this 0:53:22.400,0:53:29.599 -network we would like to turn - -0:53:24.720,0:53:34.800 -on the dropout rate +network we would like to turn on the dropout rate 0:53:29.599,0:53:34.800 -true there you go boom boom boom - -0:53:34.880,0:53:40.839 -okay is it training yeah it's training +true there you go boom boom boom okay is it training yeah it's training 0:53:38.480,0:53:43.040 -all right cool cool cool back to the - -0:53:40.839,0:53:46.720 -presentation +all right cool cool cool back to the presentation 0:53:43.040,0:53:50.559 -i i know i'm sorry i'm going over time - -0:53:46.720,0:53:50.559 -what a bad teacher +i i know i'm sorry i'm going over time what a bad teacher 0:53:51.200,0:53:54.800 -okay so this is actually what we are - -0:53:52.720,0:53:57.440 -doing the dropout part right +okay so this is actually what we are doing the dropout part right 0:53:54.800,0:53:58.000 -okay cool cool all right so this is my - -0:53:57.440,0:53:59.760 -dropout +okay cool cool all right so this is my dropout 0:53:58.000,0:54:01.520 -and i mean i mean i am basically - -0:53:59.760,0:54:03.599 -multiplying these inputs and hidden +and i mean i mean i am basically multiplying these inputs and hidden 0:54:01.520,0:54:06.400 -layers with masks - -0:54:03.599,0:54:07.040 -here you just have like a network which +layers with masks here you just have like a network which 0:54:06.400,0:54:08.880 -is trying - 
-0:54:07.040,0:54:10.160 -is trying to train that you know uh +is trying is trying to train that you know uh 0:54:08.880,0:54:12.880 -prediction uh - -0:54:10.160,0:54:14.720 -that is weakly prediction is like a co2 +prediction uh that is weekly prediction is like a co2 0:54:12.880,0:54:17.440 -concentration level - -0:54:14.720,0:54:18.240 -uh if you use a gaussian kernel with a +concentration level uh if you use a gaussian process with a 0:54:17.440,0:54:20.800 -square - -0:54:18.240,0:54:21.920 -exponential kernel you can get you know +squared exponential kernel you can get you know 0:54:20.800,0:54:24.240 -after the - -0:54:21.920,0:54:26.720 -dashed line the network say that you +after the dashed line the network says that you 0:54:24.240,0:54:28.640 -know the the model says i have no clue - -0:54:26.720,0:54:30.480 -so i give you my prediction which is +know the the model says i have no clue so i give you my prediction which is 0:54:28.640,0:54:31.280 -zero but then this is my confidence - -0:54:30.480,0:54:32.880 -level +zero but then this is my confidence level 0:54:31.280,0:54:34.400 -can we do something similar with neural - -0:54:32.880,0:54:37.839 -nets yes we can +can we do something similar with neural nets yes we can 0:54:34.400,0:54:38.799 -so this is a uh uncertainty estimation - -0:54:37.839,0:54:40.880 -we're using the +so this is a uh uncertainty estimation we're using the 0:54:38.799,0:54:42.319 -reload non-linearity in the network and - -0:54:40.880,0:54:45.599 -this is instead +relu non-linearity in the network and this is instead 0:54:42.319,0:54:48.400 -using tanh which is is actually nothing - -0:54:45.599,0:54:49.920 -um if i'd like to do a binary +using tanh which is actually nothing um if i'd like to do a binary 0:54:48.400,0:54:52.880 -classification - -0:54:49.920,0:54:53.359 -in the first case are gonna be my logic +classification in the first case these are gonna be my logits 0:54:52.880,0:54:56.720 -uh - -0:54:53.359,0:54:57.280 -on
this section -3 to 2.5 is the +uh on this section -3 to 2.5 is the 0:54:56.720,0:55:01.119 -training - -0:54:57.280,0:55:04.319 -training training interval and then +training training training interval and then 0:55:01.119,0:55:06.000 -if i show you if i show my network uh if - -0:55:04.319,0:55:09.040 -i ask oh what is the prediction for +if i show you if i show my network uh if i ask oh what is the prediction for 0:55:06.000,0:55:09.839 -x hat no x star if i don't use any - -0:55:09.040,0:55:12.640 -uncertainty +x hat no x star if i don't use any uncertainty 0:55:09.839,0:55:13.040 -estimation you're gonna get a very high - -0:55:12.640,0:55:15.280 -value +estimation you're gonna get a very high value 0:55:13.040,0:55:16.319 -right which is corresponding to oh this - -0:55:15.280,0:55:18.880 -is uh +right which is corresponding to oh this is uh 0:55:16.319,0:55:20.000 -one so this is my one class if i just - -0:55:18.880,0:55:22.880 -use the the +one so this is my one class if i just use the the 0:55:20.000,0:55:24.799 -white big thick line instead if you use - -0:55:22.880,0:55:27.599 -this uncertainty estimation +white big thick line instead if you use this uncertainty estimation 0:55:24.799,0:55:28.960 -you get this network to get those logics - -0:55:27.599,0:55:32.319 -here with it kind of +you get this network to get those logics here with it kind of 0:55:28.960,0:55:34.799 -you know blur - -0:55:32.319,0:55:36.880 -foggy shadow and therefore if you apply +you know blur foggy shadow and therefore if you apply 0:55:34.799,0:55:39.119 -the sigmoid you get basically - -0:55:36.880,0:55:40.559 -that to flip down from zero to one right +the sigmoid you get basically that to flip down from zero to one right 0:55:39.119,0:55:43.680 -so you - -0:55:40.559,0:55:46.839 -no longer say it's one you can say +so you no longer say it's one you can say 0:55:43.680,0:55:48.000 -it's one with some specific probability - -0:55:46.839,0:55:51.839 -right +it's one with some 
specific probability right 0:55:48.000,0:55:53.440 -um and here i'm showing you a network - -0:55:51.839,0:55:54.079 -that is trying to it was trained on +um and here i'm showing you a network that is trying to it was trained on 0:55:53.440,0:55:56.079 -ammunite - -0:55:54.079,0:55:57.680 -and then you provide a one that is you +mnist and then you provide a one that is you 0:55:56.079,0:55:59.839 -know tilting - -0:55:57.680,0:56:00.880 -and then you can see that it begins with +know tilting and then you can see that it begins with 0:55:59.839,0:56:03.760 -having a high - -0:56:00.880,0:56:04.160 -value for the logits for the purple for +having a high value for the logits for the purple for 0:56:03.760,0:56:06.400 -the - -0:56:04.160,0:56:08.400 -for the one and then as you move across +the for the one and then as you move across 0:56:06.400,0:56:08.960 -it becomes like a five and then becomes - -0:56:08.400,0:56:11.839 -a seven +it becomes like a five and then becomes a seven 0:56:08.960,0:56:12.240 -because it looks like some part of the - -0:56:11.839,0:56:15.280 -one +because it looks like some part of the one 0:56:12.240,0:56:16.240 -like some part of the seven right and - -0:56:15.280,0:56:19.680 -these are the output +like some part of the seven right and these are the output 0:56:16.240,0:56:22.799 -after the uh soft arc max so you see - -0:56:19.680,0:56:25.920 -that uh you know after you tilt they get +after the uh soft argmax so you see that uh you know after you tilt they get 0:56:22.799,0:56:26.640 -very blur and very spread around so how - -0:56:25.920,0:56:28.799 -can we +very blurred and very spread around so how can we 0:56:26.640,0:56:30.079 -have something like that and this is the - -0:56:28.799,0:56:33.119 -other notebook +have something like that and this is the other notebook 0:56:30.079,0:56:36.799 -so we are done here with the - -0:56:33.119,0:56:38.640 -regularization let me give you the final +so we are done here with the regularization
let me give you the final 0:56:36.799,0:56:40.799 -thing so here we can see with the - -0:56:38.640,0:56:41.599 -dropout you always have the validation +thing so here we can see with the dropout you always have the validation 0:56:40.799,0:56:43.680 -and train - -0:56:41.599,0:56:44.880 -curves they are one on the other and +and train curves they are one on the other and 0:56:43.680,0:56:47.760 -then this was the - -0:56:44.880,0:56:48.480 -l2 regularization i can execute this +then this was the l2 regularization i can execute this 0:56:47.760,0:56:50.480 -other one - -0:56:48.480,0:56:52.400 -which shows you also that this is keep +other one which shows you also that this is keep 0:56:50.480,0:56:53.760 -increasing right so although the model - -0:56:52.400,0:56:56.000 -is over parameterized we are not +increasing right so although the model is over parameterized we are not 0:56:53.760,0:56:58.640 -overfitting which was the case - -0:56:56.000,0:56:59.599 -uh at the beginning finally here let's +overfitting which was the case uh at the beginning finally here let's 0:56:58.640,0:57:02.640 -store - -0:56:59.599,0:57:05.839 -these weights in the dropout version +store these weights in the dropout version 0:57:02.640,0:57:08.480 -okay so i save all of them - -0:57:05.839,0:57:09.440 -uh and so i can start showing you a few +okay so i save all of them uh and so i can start showing you a few 0:57:08.480,0:57:12.480 -things - -0:57:09.440,0:57:15.760 -um for example this one +things um for example this one 0:57:12.480,0:57:19.040 -let's see if it works boom - -0:57:15.760,0:57:22.400 -so here you can see that the red +let's see if it works boom so here you can see that the red 0:57:19.040,0:57:23.119 -are the l1 and the red one are basically - -0:57:22.400,0:57:26.000 -all +are the l1 and the red one are basically all 0:57:23.119,0:57:26.400 -in the center bam and all the other reds - -0:57:26.000,0:57:28.799 -are +in the center bam and all the other reds are 
0:57:26.400,0:57:30.240 -to zero right so n1 i just show you the - -0:57:28.799,0:57:31.680 -histogram of the weights +to zero right so l1 i just show you the histogram of the weights 0:57:30.240,0:57:34.079 -when i train the network with the l1 - -0:57:31.680,0:57:35.520 -regularizer you get all of these are +when i train the network with the l1 regularizer you get all of these are 0:57:34.079,0:57:38.720 -here - -0:57:35.520,0:57:40.319 -in the purple case you actually have +here in the purple case you actually have 0:57:38.720,0:57:42.240 -it looks like it's higher i'm not - -0:57:40.319,0:57:46.720 -entirely sure +it looks like it's higher i'm not entirely sure 0:57:42.240,0:57:49.680 -why you have a higher peak at zero in l2 - -0:57:46.720,0:57:51.680 -but then the purple one have some values +why you have a higher peak at zero in l2 but then the purple one has some values 0:57:49.680,0:57:53.440 -as well here in the tails - -0:57:51.680,0:57:55.040 -whereas if there is no regularization +as well here in the tails whereas if there is no regularization 0:57:53.440,0:57:56.480 -you get something that is you know - -0:57:55.040,0:58:00.319 -resembling a much +you get something that is you know resembling a much 0:57:56.480,0:58:02.640 -spread a much spread - -0:58:00.319,0:58:03.839 -gaussian right so you get values that +more spread gaussian right so you get values that 0:58:02.640,0:58:07.040 -are much much more - -0:58:03.839,0:58:09.760 -much larger okay instead the l1 +are much much larger okay instead the l1 0:58:07.040,0:58:10.240 -should be all towards you know very very - -0:58:09.760,0:58:12.640 -short +should be all towards you know very very short 0:58:10.240,0:58:14.400 -again i'm not sure why this purple is - -0:58:12.640,0:58:16.319 -taller than the the red here i think +again i'm not sure why this purple is taller than the the red here i think 0:58:14.400,0:58:19.520 -it's an issue - -0:58:16.319,0:58:23.440 -so this i i show
you the the the weights +it's an issue so this i i show you the the the weights 0:58:19.520,0:58:26.799 -we can show lastly last individual one - -0:58:23.440,0:58:29.760 -l1 so l1 all are here +we can show lastly last individual one l1 so l1 all are here 0:58:26.799,0:58:29.760 -and this is - -0:58:29.839,0:58:33.359 -these are instead the one with nothing +and this is these are instead the one with nothing 0:58:31.440,0:58:36.880 -right so these are - -0:58:33.359,0:58:39.680 -the one without the regularization +right so these are the one without the regularization 0:58:36.880,0:58:41.839 -and these are the one with the l1 - -0:58:39.680,0:58:44.559 -regularization +and these are the one with the l1 regularization 0:58:41.839,0:58:46.540 -we can also have more bins to have i bet - -0:58:44.559,0:58:48.839 -a better understanding of what's going +we can also have more bins to have i bet a better understanding of what's going 0:58:46.540,0:58:50.000 -[Music] - -0:58:48.839,0:58:53.440 -on +[Music] on 0:58:50.000,0:58:56.559 -okay see boom fantastic right - -0:58:53.440,0:59:02.079 -i can show you also the weights +okay see boom fantastic right i can show you also the weights 0:58:56.559,0:59:02.079 -l2 l2 - -0:59:02.240,0:59:06.240 -l2 and l1 oh you can tell no what's the +l2 l2 l2 and l1 oh you can tell no what's the 0:59:04.559,0:59:07.839 -difference - -0:59:06.240,0:59:11.280 -but again there are a hundred thousand a +difference but again there are a hundred thousand a 0:59:07.839,0:59:11.280 -hundred thousand uh - -0:59:12.160,0:59:17.760 -not entirely sure but in the point the +hundred thousand uh not entirely sure but in the point the 0:59:15.839,0:59:19.040 -point is that in the l1 in the l1 you - -0:59:17.760,0:59:22.319 -have so many more weights +point is that in the l1 in the l1 you have so many more weights 0:59:19.040,0:59:25.760 -a cluster at the zero - -0:59:22.319,0:59:28.480 -but there are a few larger weights +a cluster at the zero but there are 
a few larger weights 0:59:25.760,0:59:28.960 -in the l2 you have all the weights are - -0:59:28.480,0:59:31.119 -pretty +in the l2 you have all the weights are pretty 0:59:28.960,0:59:32.559 -small can you see right there is no - -0:59:31.119,0:59:35.440 -large weights +small can you see right there is no large weights 0:59:32.559,0:59:36.160 -so l1 doesn't shrink the weight l1 just - -0:59:35.440,0:59:38.240 -get them +so l1 doesn't shrink the weight l1 just get them 0:59:36.160,0:59:39.760 -towards zero okay that's why you had - -0:59:38.240,0:59:43.520 -this big guy here +towards zero okay that's why you had this big guy here 0:59:39.760,0:59:48.720 -boom okay - -0:59:43.520,0:59:52.160 -um finally i know i'm over time +boom okay um finally i know i'm over time 0:59:48.720,0:59:57.520 -the last notebook which is the - -0:59:52.160,1:00:01.200 -one that is computing the uncertainty +the last notebook which is the one that is computing the uncertainty 0:59:57.520,1:00:05.839 -uh through user usage of the - -1:00:01.200,1:00:05.839 -dropout right so kernel execute all +uh through user usage of the dropout right so kernel execute all 1:00:06.160,1:00:09.760 -uh where is it run all - -1:00:08.400,1:00:11.760 -[Music] +uh where is it run all [Music] 1:00:09.760,1:00:13.200 -so what are we doing here how do we - -1:00:11.760,1:00:15.680 -compute the uncertainty +so what are we doing here how do we compute the uncertainty 1:00:13.200,1:00:17.520 -in the previous uh in the in the in the - -1:00:15.680,1:00:20.319 -previous +in the previous uh in the in the in the previous 1:00:17.520,1:00:22.000 -uh in the previous lesson right in the - -1:00:20.319,1:00:24.400 -slides i just showed you +uh in the previous lesson right in the slides i just showed you 1:00:22.000,1:00:25.440 -so here we have some points i try to fit - -1:00:24.400,1:00:28.160 -them +so here we have some points i try to fit them 1:00:25.440,1:00:28.960 -with my network and you get something - 
-1:00:28.160,1:00:32.079 -like this +with my network and you get something like this 1:00:28.960,1:00:32.640 -can you tell me what network i used what - -1:00:32.079,1:00:35.280 -is the +can you tell me what network i used what is the 1:00:32.640,1:00:37.599 -uh where is the chat can you tell what - -1:00:35.280,1:00:37.599 -is the +uh where is the chat can you tell what is the 1:00:37.680,1:00:41.680 -non-linearity i used you should know - -1:00:40.079,1:00:44.799 -right +non-linearity i used you should know right 1:00:41.680,1:00:44.799 -you don't answer answer - -1:00:44.880,1:00:52.400 -okay um and so here yeah +you don't answer answer okay um and so here yeah 1:00:49.040,1:00:55.119 -and then here i show you how - -1:00:52.400,1:00:55.440 -this uncertainty looks okay so what is +and then here i show you how this uncertainty looks okay so what is 1:00:55.119,1:00:58.240 -this - -1:00:55.440,1:01:01.040 -this i'm using the uh the network with +this this i'm using the uh the network with 1:00:58.240,1:01:03.280 -the dropout and then i actually don't - -1:01:01.040,1:01:04.079 -use the evaluation mode i just use the +the dropout and then i actually don't use the evaluation mode i just use the 1:01:03.280,1:01:06.079 -training mode - -1:01:04.079,1:01:07.440 -such that the dropout is still on and +training mode such that the dropout is still on and 1:01:06.079,1:01:10.319 -then i compute the variance - -1:01:07.440,1:01:11.680 -of the predictions of the network by +then i compute the variance of the predictions of the network by 1:01:10.319,1:01:14.480 -sending multiple times - -1:01:11.680,1:01:15.040 -the data through okay so here you have +sending multiple times the data through okay so here you have 1:01:14.480,1:01:18.000 -range - -1:01:15.040,1:01:18.559 -in hundred you know i just provide 100 +range in hundred you know i just provide 100 1:01:18.000,1:01:22.160 -times - -1:01:18.559,1:01:24.160 -my data inside the network okay so this +times my data inside 
the network okay so this 1:01:22.160,1:01:26.319 -is a network with the relu - -1:01:24.160,1:01:27.920 -let me show you how a network with it +is a network with the relu let me show you how a network with it 1:01:26.319,1:01:31.040 -hyperbolic dungeon - -1:01:27.920,1:01:34.000 -works so oh yeah +hyperbolic tangent works so oh yeah 1:01:31.040,1:01:36.559 -let me kill this one so here i create - -1:01:34.000,1:01:36.559 -the network +let me kill this one so here i create the network 1:01:37.680,1:01:41.040 -and this is the network train with the - -1:01:39.760,1:01:44.240 -hyperbolic tangent +and this is the network trained with the hyperbolic tangent 1:01:41.040,1:01:44.720 -such it's much nicer right and then i - -1:01:44.240,1:01:47.040 -show you +such it's much nicer right and then i show you 1:01:44.720,1:01:48.079 -the network is in train mode right but - -1:01:47.040,1:01:50.559 -then i i feed +the network is in train mode right but then i i feed 1:01:48.079,1:01:51.359 -several times i feed 100 times my data - -1:01:50.559,1:01:54.240 -points +several times i feed 100 times my data points 1:01:51.359,1:01:56.000 -inside and then i evaluate the mean you - -1:01:54.240,1:01:58.880 -can see now +inside and then i evaluate the mean you can see now 1:01:56.000,1:02:01.119 -that the network mean the network - -1:01:58.880,1:02:04.240 -outputs a uncertainty which is constant +that the network mean the network outputs an uncertainty which is constant 1:02:01.119,1:02:06.960 -even if you move outside this - -1:02:04.240,1:02:07.920 -interval which was the region where the +even if you move outside this interval which was the region where the 1:02:06.960,1:02:10.000 -training data - -1:02:07.920,1:02:12.000 -were coming so you can see now that +training data were coming so you can see now that 1:02:10.000,1:02:12.880 -these uncertainty estimation are a bit - -1:02:12.000,1:02:15.359 -you know funky +these uncertainty estimations are a bit you know funky
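The trick described here — keep the network in train mode so dropout stays on, feed the same data many times, and read the variance of the predictions as an uncertainty estimate — can be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the course notebook: the layer sizes, dropout rate, and input range are assumptions made up for the example.

```python
import torch
import torch.nn as nn

# Monte Carlo dropout sketch: dropout stays active at prediction time,
# so repeated forward passes are stochastic and their spread measures
# how far an input is from the training region.
torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(1, 64),
    nn.Tanh(),        # hyperbolic tangent, as in the second example
    nn.Dropout(0.5),  # assumed rate for the sketch
    nn.Linear(64, 1),
)

net.train()  # train mode keeps dropout ON (net.eval() would switch it off)

x = torch.linspace(-3, 3, 200).unsqueeze(1)  # 200 inputs, one feature each
with torch.no_grad():
    # "range in hundred": send the data through the stochastic net 100 times
    preds = torch.stack([net(x) for _ in range(100)])

mean = preds.mean(dim=0)  # point estimate per input
var = preds.var(dim=0)    # per-input uncertainty estimate
```

For the gradient-descent trick mentioned just below (minimizing the variance to move toward the training region), one would drop the `no_grad` context and set `x.requires_grad_(True)` so the variance stays differentiable with respect to the input.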
1:02:12.880,1:02:17.119 -as in different activation functions - -1:02:15.359,1:02:19.039 -give you different kind of estimation +as in different activation functions give you different kind of estimation 1:02:17.119,1:02:22.559 -they are not even calibrated - -1:02:19.039,1:02:24.079 -nevertheless you have the uncertainty +they are not even calibrated nevertheless you have the uncertainty 1:02:22.559,1:02:26.240 -close to the data points - -1:02:24.079,1:02:28.240 -it's very very very tiny right so you +close to the data points it's very very very tiny right so you 1:02:26.240,1:02:31.039 -can tell how far you are - -1:02:28.240,1:02:32.000 -from the training region and we use this +can tell how far you are from the training region and we use this 1:02:31.039,1:02:35.520 -this trick here - -1:02:32.000,1:02:38.079 -this this this part in order to +this trick here this this this part in order to 1:02:35.520,1:02:39.039 -so again this variance here is like it's - -1:02:38.079,1:02:40.880 -a it's a +so again this variance here is like it's a it's a 1:02:39.039,1:02:42.799 -differentiable function and so you can - -1:02:40.880,1:02:44.960 -run gradient descent +differentiable function and so you can run gradient descent 1:02:42.799,1:02:46.000 -right in this in order to minimize the - -1:02:44.960,1:02:48.240 -variance +right in this in order to minimize the variance 1:02:46.000,1:02:49.920 -and this would allow you to move towards - -1:02:48.240,1:02:53.520 -the region +and this would allow you to move towards the region 1:02:49.920,1:02:53.839 -where the uh where the uh data points - -1:02:53.520,1:02:56.000 -where +where the uh where the uh data points where 1:02:53.839,1:02:58.079 -basically the the training region this - -1:02:56.000,1:03:00.960 -this is what we use for the +basically the the training region this this is what we use for the 1:02:58.079,1:03:02.319 -our policy right in our uh driving - -1:03:00.960,1:03:05.440 -scenario +our policy right in our uh 
driving scenario 1:03:02.319,1:03:08.079 -so oh - -1:03:05.440,1:03:09.920 -that was it right uh we reached the end +so oh that was it right uh we reached the end 1:03:08.079,1:03:13.440 -of the class the end of the semester - -1:03:09.920,1:03:16.720 -uh it was such a great honor to be +of the class the end of the semester uh it was such a great honor to be 1:03:13.440,1:03:18.880 -your teacher for this semester i - -1:03:16.720,1:03:20.160 -screw up a little bit maybe halfway +your teacher for this semester i screw up a little bit maybe halfway 1:03:18.880,1:03:22.720 -through - -1:03:20.160,1:03:23.520 -thank you for you know helping me +through thank you for you know helping me 1:03:22.720,1:03:27.280 -getting back - -1:03:23.520,1:03:28.799 -uh on my feet uh +getting back uh on my feet uh 1:03:27.280,1:03:30.400 -if you need anything right really - -1:03:28.799,1:03:34.319 -anything just let me know i +if you need anything right really anything just let me know i 1:03:30.400,1:03:34.720 -i'm always open to discuss and help out - -1:03:34.319,1:03:37.280 -and +i'm always open to discuss and help out and 1:03:34.720,1:03:38.319 -explain and again as i told you before - -1:03:37.280,1:03:40.559 -we can even think +explain and again as i told you before we can even think 1:03:38.319,1:03:42.240 -to have one more extra lesson in a month - -1:03:40.559,1:03:45.440 -time if you want +to have one more extra lesson in a month time if you want 1:03:42.240,1:03:49.520 -the same way zoom and whatever uh - -1:03:45.440,1:03:52.000 -we about the energy based models um +the same way zoom and whatever uh we about the energy based models um 1:03:49.520,1:03:53.440 -again if you have any question about all - -1:03:52.000,1:03:56.319 -any of the lessons you can +again if you have any question about all any of the lessons you can 1:03:53.440,1:03:57.520 -write on youtube in the comments below i - -1:03:56.319,1:04:00.000 -will answer +write on youtube in the comments below i will 
answer 1:03:57.520,1:04:01.200 -if you have like specific uh if you are - -1:04:00.000,1:04:03.680 -interested in making +if you have like specific uh if you are interested in making 1:04:01.200,1:04:04.640 -drawings and visualization uh you can - -1:04:03.680,1:04:06.240 -always +drawings and visualization uh you can always 1:04:04.640,1:04:08.079 -actually should talk to me because i'm - -1:04:06.240,1:04:10.400 -actually uh +actually should talk to me because i'm actually uh 1:04:08.079,1:04:12.079 -creating a group for visualizing machine - -1:04:10.400,1:04:16.720 -learning stuff +creating a group for visualizing machine learning stuff 1:04:12.079,1:04:18.960 -um and we have the website we have - -1:04:16.720,1:04:21.119 -plenty of things to do english has to be +um and we have the website we have plenty of things to do english has to be 1:04:18.960,1:04:24.319 -fixed in many of the - -1:04:21.119,1:04:27.280 -uh in many of the of the of the +fixed in many of the uh in many of the of the of the 1:04:24.319,1:04:29.039 -contributions some math is broken and - -1:04:27.280,1:04:32.319 -you know there is plenty of +contributions some math is broken and you know there is plenty of 1:04:29.039,1:04:33.680 -things uh open source things to do if - -1:04:32.319,1:04:37.200 -you are +things uh open source things to do if you are 1:04:33.680,1:04:37.200 -inclined if you are interested and - -1:04:37.280,1:04:44.960 -and yeah i think pretty much that's it +inclined if you are interested and and yeah i think pretty much that's it 1:04:41.440,1:04:47.039 -um i i'll see you next monday right - -1:04:44.960,1:04:49.359 -again you should submit the three video +um i i'll see you next monday right again you should submit the three video 1:04:47.039,1:04:51.039 -presentation i made a um - -1:04:49.359,1:04:54.079 -i made a tutorial about how to make a +presentation i made a um i made a tutorial about how to make a 1:04:51.039,1:04:56.079 -presentation if you like how i teach 
and - -1:04:54.079,1:04:58.319 -you may want to hear my opinion about +presentation if you like how i teach and you may want to hear my opinion about 1:04:56.079,1:05:01.839 -how you should present your work - -1:04:58.319,1:05:05.760 -uh it's on again on youtube +how you should present your work uh it's on again on youtube 1:05:01.839,1:05:09.039 -and yeah i think that's it all right - -1:05:05.760,1:05:12.799 -so again thank you so much and +and yeah i think that's it all right so again thank you so much and 1:05:09.039,1:05:13.920 -i can't wait to see all your results for - -1:05:12.799,1:05:16.010 -the +i can't wait to see all your results for the 1:05:13.920,1:05:17.440 -for for the project um - -1:05:16.010,1:05:22.559 -[Music] +for for the project um [Music] 1:05:17.440,1:05:25.680 -see you on monday good luck bye - -1:05:22.559,1:05:28.319 -about the class ah [ __ ] there was one +see you on monday good luck bye about the class ah [ __ ] there was one 1:05:25.680,1:05:28.319 -more notebook - -1:05:28.480,1:05:31.680 -damn okay +more notebook damn okay 1:05:32.160,1:05:36.640 -okay let me ah okay i can't go over i'm - -1:05:34.880,1:05:37.599 -too late right in the extra and there is +okay let me ah okay i can't go over i'm too late right in the extra and there is 1:05:36.640,1:05:41.200 -one more notebook - -1:05:37.599,1:05:44.720 -i wanted to talk about which is the +one more notebook i wanted to talk about which is the 1:05:41.200,1:05:47.839 -so this is the projection notebook yeah - -1:05:44.720,1:05:47.839 -okay so +so this is the projection notebook yeah okay so 1:05:48.240,1:05:52.799 -ah okay maybe we can do an extra lesson - -1:05:50.640,1:05:53.839 -with the projection uh and i talk about +ah okay maybe we can do an extra lesson with the projection uh and i talk about 1:05:52.799,1:05:57.039 -this next week - -1:05:53.839,1:06:01.839 -up to you guys more questions i know i +this next week up to you guys more questions i know i 
1:05:57.039,1:06:01.839 -it's late and uh there was this notebook - -1:06:02.839,1:06:06.079 -it's +it's late and uh there was this notebook it's 1:06:04.160,1:06:08.319 -okay yeah you know i want to be teaching - -1:06:06.079,1:06:08.319 -more +okay yeah you know i want to be teaching more 1:06:09.280,1:06:14.079 -okay no no question there is a question - -1:06:12.240,1:06:16.559 -google uses +okay no no question there is a question google uses 1:06:14.079,1:06:17.680 -visor to select either parameters for - -1:06:16.559,1:06:20.720 -its neural for +vizier to select hyperparameters for its neural for 1:06:17.680,1:06:22.480 -its networks those tend to be either - -1:06:20.720,1:06:24.799 -random search or gaussian process for +its networks those tend to be either random search or gaussian process for 1:06:22.480,1:06:27.920 -hyper parameter optimize exactly - -1:06:24.799,1:06:29.359 -uh yeah but i haven't worked like i +hyperparameter optimization exactly uh yeah but i haven't worked like i 1:06:27.920,1:06:32.720 -haven't tried them out so i can't - -1:06:29.359,1:06:36.319 -really give you a opinion so i +haven't tried them out so i can't really give you an opinion so i 1:06:32.720,1:06:36.319 -i know they exist but i'm not - -1:06:36.400,1:06:40.559 -i don't exactly know everything yet +i know they exist but i'm not i don't exactly know everything yet 1:06:41.839,1:06:48.720 -okay uh i think that's it right - -1:06:45.280,1:06:51.839 -okay so see you monday thanks yeah +okay uh i think that's it right okay so see you monday thanks yeah 1:06:48.720,1:06:53.839 -of course boy post - -1:06:51.839,1:06:55.680 -a lasagna oh i put the i put the lemon +of course boy post a lasagna oh i put the i put the lemon 1:06:53.839,1:06:57.280 -cake - -1:06:55.680,1:06:59.280 -right keep the teaching going yeah +cake right keep the teaching going yeah 1:06:57.280,1:07:01.359 -that's for sure - -1:06:59.280,1:07:02.720 -i think we are there jan is teaching +that's for sure i think
we are there jan is teaching 1:07:01.359,1:07:06.079 -also in the in the fall - -1:07:02.720,1:07:08.160 -actually jan and jung are pairing up +also in the in the fall actually jan and jung are pairing up 1:07:06.079,1:07:10.000 -and they are teaching in the fall and i - -1:07:08.160,1:07:12.799 -will be also teaching the labs +and they are teaching in the fall and i will be also teaching the labs 1:07:10.000,1:07:14.079 -but i don't know we haven't yet - -1:07:12.799,1:07:17.839 -discussed the content +but i don't know we haven't yet discussed the content 1:07:14.079,1:07:18.319 -i'm like oh boy more teaching but it's - -1:07:17.839,1:07:22.559 -fun +i'm like oh boy more teaching but it's fun 1:07:18.319,1:07:24.880 -but okay - -1:07:22.559,1:07:24.880 -bye +but okay bye 1:07:26.030,1:07:31.039 -[Music] - -1:07:27.520,1:07:34.720 -okay so i think +[Music] okay so i think 1:07:31.039,1:07:36.000 -that was it for today unless there are - -1:07:34.720,1:07:38.880 -some questions for me +that was it for today unless there are some questions for me 1:07:36.000,1:07:39.520 -for jan uh i know you send me emails i - -1:07:38.880,1:07:42.799 -have +for jan uh i know you send me emails i have 1:07:39.520,1:07:43.119 -a few i think a few hundred emails from - -1:07:42.799,1:07:46.799 -you +a few i think a few hundred emails from you 1:07:43.119,1:07:49.440 -i will answer uh - -1:07:46.799,1:07:50.960 -i will answer don't worry uh don't don't +i will answer uh i will answer don't worry uh don't don't 1:07:49.440,1:07:52.319 -don't worry too much we can figure out - -1:07:50.960,1:07:53.440 -what's happening right don't don't freak +don't worry too much we can figure out what's happening right don't don't freak 1:07:52.319,1:07:55.359 -out - -1:07:53.440,1:07:57.119 -as i told you before we can have an +out as i told you before we can have an 1:07:55.359,1:07:59.760 -extra lesson in one month - -1:07:57.119,1:08:00.400 -for the energy based models uh whenever +extra lesson in 
one month for the energy based models uh whenever 1:07:59.760,1:08:03.520 -i'm done - -1:08:00.400,1:08:04.000 -preparing it uh again this is like up to +i'm done preparing it uh again this is like up to 1:08:03.520,1:08:06.319 -you - -1:08:04.000,1:08:08.079 -voluntary it's not it's completely off +you voluntary it's not it's completely off 1:08:06.319,1:08:10.000 -class right it's like - -1:08:08.079,1:08:11.520 -i was thinking that it makes sense since +class right it's like i was thinking that it makes sense since 1:08:10.000,1:08:13.280 -someone asked to - -1:08:11.520,1:08:15.520 -create like a lab for the energy based +someone asked to create like a lab for the energy based 1:08:13.280,1:08:18.880 -model and i said yes well i - -1:08:15.520,1:08:20.960 -i always keep my word so uh i didn't +model and i said yes well i i always keep my word so uh i didn't 1:08:18.880,1:08:24.080 -manage to do it on time but you know - -1:08:20.960,1:08:27.520 -i will do i will work for this +manage to do it on time but you know i will do i will work for this 1:08:24.080,1:08:27.520 -um questions - -1:08:28.080,1:08:31.600 -nope all right so it was has been an +um questions nope all right so it was has been an 1:08:30.319,1:08:35.279 -honor uh seriously - -1:08:31.600,1:08:37.600 -i i loved being uh been teaching +honor uh seriously i i loved being uh been teaching 1:08:35.279,1:08:38.480 -to you this semester uh you had so many - -1:08:37.600,1:08:40.080 -questions and +to you this semester uh you had so many questions and 1:08:38.480,1:08:41.920 -especially when we switched to this - -1:08:40.080,1:08:45.679 -online format +especially when we switched to this online format 1:08:41.920,1:08:47.600 -i think i personally loved it right so - -1:08:45.679,1:08:49.920 -at least in my opinion before we had jan +i think i personally loved it right so at least in my opinion before we had jan 1:08:47.600,1:08:53.440 -lecturing and maybe you are a bit shy - -1:08:49.920,1:08:55.839 -uh 
i'm not shy i mean i i don't care +lecturing and maybe you are a bit shy uh i'm not shy i mean i i don't care 1:08:53.440,1:08:58.239 -so i i think this format where you write - -1:08:55.839,1:09:01.120 -questions and i just read out whatever +so i i think this format where you write questions and i just read out whatever 1:08:58.239,1:09:02.640 -uh it's in your mind uh it really worked - -1:09:01.120,1:09:04.719 -well in terms of +uh it's in your mind uh it really worked well in terms of 1:09:02.640,1:09:06.159 -you know figuring out what are those - -1:09:04.719,1:09:09.440 -aspects that are at least +you know figuring out what are those aspects that are at least 1:09:06.159,1:09:12.560 -a little bit uh harder to uh to to - -1:09:09.440,1:09:15.040 -to to catch right uh because again we +a little bit uh harder to uh to to to to catch right uh because again we 1:09:12.560,1:09:16.080 -we may not be able to figure out what is - -1:09:15.040,1:09:19.520 -the part that is +we may not be able to figure out what is the part that is 1:09:16.080,1:09:21.520 -less um less clear - -1:09:19.520,1:09:23.520 -maybe because we've been talking about +less um less clear maybe because we've been talking about 1:09:21.520,1:09:25.359 -these things for a while now - -1:09:23.520,1:09:26.719 -so again i think if you write those +these things for a while now so again i think if you write those 1:09:25.359,1:09:28.480 -questions i read them - -1:09:26.719,1:09:30.799 -and we have like a speaker we have like +questions i read them and we have like a speaker we have like 1:09:28.480,1:09:32.799 -some kind of conversation - -1:09:30.799,1:09:34.159 -presentation it's much more effective in +some kind of conversation presentation it's much more effective in 1:09:32.799,1:09:37.359 -terms of - -1:09:34.159,1:09:39.359 -content and delivery right yeah i want +terms of content and delivery right yeah i want 1:09:37.359,1:09:40.719 -to echo what alfredo said it was a - 
-1:09:39.359,1:09:42.880 -it was a pleasure teaching the class as +to echo what alfredo said it was a it was a pleasure teaching the class as 1:09:40.719,1:09:46.880 -well you know despite the circumstances - -1:09:42.880,1:09:48.000 -and uh um you know i'm very thankful to +well you know despite the circumstances and uh um you know i'm very thankful to 1:09:46.880,1:09:49.759 -alfredo i think - -1:09:48.000,1:09:52.159 -you know he's putting his heart into +alfredo i think you know he's putting his heart into 1:09:49.759,1:09:55.280 -this as you can tell - -1:09:52.159,1:09:57.679 -and um and and +this as you can tell and um and and 1:09:55.280,1:10:00.840 -you know i'm i'm i'm really i'm really - -1:09:57.679,1:10:03.520 -thankful for for him to do all this job +you know i'm i'm i'm really i'm really thankful for for him to do all this job 1:10:00.840,1:10:05.600 -um because i think it uh it makes a huge - -1:10:03.520,1:10:08.719 -difference in terms of the +um because i think it uh it makes a huge difference in terms of the 1:10:05.600,1:10:10.320 -uh usefulness of the class and um so - -1:10:08.719,1:10:12.320 -thank you alfredo +uh usefulness of the class and um so thank you alfredo 1:10:10.320,1:10:14.159 -thank you and jacquin right justin made - -1:10:12.320,1:10:16.400 -the whole the challenge +thank you and jacquin right justin made the whole the challenge 1:10:14.159,1:10:17.280 -actually did a huge amount oh my god - -1:10:16.400,1:10:19.199 -this last month +actually did a huge amount oh my god this last month 1:10:17.280,1:10:21.040 -that's the biggest competition possible - -1:10:19.199,1:10:24.159 -to put together the data +that's the biggest competition possible to put together the data 1:10:21.040,1:10:26.400 -the basic code the data loader uh - -1:10:24.159,1:10:27.199 -this was i mean he worked on this for +the basic code the data loader uh this was i mean he worked on this for 1:10:26.400,1:10:29.679 -you know a lot - -1:10:27.199,1:10:30.640 
-for the last few months and and then you +you know a lot for the last few months and and then you 1:10:29.679,1:10:32.960 -know gathering - -1:10:30.640,1:10:34.159 -gathering all the other results so thank +know gathering gathering all the other results so thank 1:10:32.960,1:10:36.159 -you - -1:10:34.159,1:10:37.360 -yeah i think it's been two months now +you yeah i think it's been two months now 1:10:36.159,1:10:40.560 -he's been working - -1:10:37.360,1:10:42.880 -on this stuff all right guys +he's been working on this stuff all right guys 1:10:40.560,1:10:44.080 -thank you you always get me uh you know - -1:10:42.880,1:10:47.440 -just tweet me +thank you you always get me uh you know just tweet me 1:10:44.080,1:10:49.760 -i answer every time uh uh anything you - -1:10:47.440,1:10:52.000 -need you know you can find me my door +i answer every time uh uh anything you need you know you can find me my door 1:10:49.760,1:10:54.880 -is always open uh or in the office or - -1:10:52.000,1:10:57.679 -here on on zoom right so +is always open uh or in the office or here on on zoom right so 1:10:54.880,1:10:59.920 -as alfredo said this this project uh we - -1:10:57.679,1:11:03.440 -have this uh autonomous driving project +as alfredo said this this project uh we have this uh autonomous driving project 1:10:59.920,1:11:05.600 -and uh you know uh we need all the help - -1:11:03.440,1:11:07.040 -we can get with this so if you +and uh you know uh we need all the help we can get with this so if you 1:11:05.600,1:11:09.199 -are in some of the top teams and you are - -1:11:07.040,1:11:11.920 -interested in participating uh +are in some of the top teams and you are interested in participating uh 1:11:09.199,1:11:12.640 -get in touch with alfredo and you know - -1:11:11.920,1:11:14.960 -you could +get in touch with alfredo and you know you could 1:11:12.640,1:11:17.040 -work on this during the summer or or - -1:11:14.960,1:11:20.880 -perhaps beyond +work on this during the summer 
or or perhaps beyond 1:11:17.040,1:11:20.880 -all right all right um goodbye guys - -1:11:22.000,1:11:27.840 -all right okay bye bye guys - -1:11:25.440,1:11:27.840 -bye - +all right all right um goodbye guys all right okay bye bye guys diff --git a/docs/en/week15/practicum15A.sbv b/docs/en/week15/practicum15A.sbv index 8d11aa945..d81e370a4 100644 --- a/docs/en/week15/practicum15A.sbv +++ b/docs/en/week15/practicum15A.sbv @@ -1,4599 +1,2297 @@ 0:00:02.720,0:00:06.000 -all right all right all right - -0:00:04.240,0:00:08.080 -so today we're gonna be talking again +all right all right all right so today we're gonna be talking again 0:00:06.000,0:00:10.240 -about foundations of deep learning - -0:00:08.080,0:00:11.040 -that's me alfredo and you can find me on +about foundations of deep learning that's me alfredo and you can find me on 0:00:10.240,0:00:13.840 -twitter - -0:00:11.040,0:00:15.679 -on the handle alfcnz actually if you +twitter on the handle alfcnz actually if you 0:00:13.840,0:00:17.840 -check twitter you can find some - -0:00:15.679,0:00:18.960 -you could find some news about today's +check twitter you can find some you could find some news about today's 0:00:17.840,0:00:21.359 -lesson - -0:00:18.960,0:00:22.560 -since i posted online like yesterday +lesson since i posted online like yesterday 0:00:21.359,0:00:25.359 -night - -0:00:22.560,0:00:27.199 -so the deal is always the same as soon +night so the deal is always the same as soon 0:00:25.359,0:00:28.720 -as you don't understand as - -0:00:27.199,0:00:30.400 -soon as i don't make sense since i +as you don't understand as soon as i don't make sense since i 0:00:28.720,0:00:32.239 -didn't sleep and i've been working on - -0:00:30.400,0:00:34.160 -this stuff for the last 30 hours +didn't sleep and i've been working on this stuff for the last 30 hours 0:00:32.239,0:00:36.079 -it's very likely i'm not gonna be making - -0:00:34.160,0:00:38.399 -much sense at some times +it's very likely i'm not gonna be 
making much sense at some times 0:00:36.079,0:00:39.520 -so every time something is not clear - -0:00:38.399,0:00:41.920 -just stop me +so every time something is not clear just stop me 0:00:39.520,0:00:42.559 -ask me anything because again if we keep - -0:00:41.920,0:00:44.320 -going +ask me anything because again if we keep going 0:00:42.559,0:00:47.360 -and you're not following then we are not - -0:00:44.320,0:00:47.360 -going anywhere okay +and you're not following then we are not going anywhere okay 0:00:53.520,0:00:56.879 -all right so today we're going to be - -0:00:55.120,0:00:59.520 -talking about inference +all right so today we're going to be talking about inference 0:00:56.879,0:01:01.120 -for latent variable energy-based models - -0:00:59.520,0:01:04.799 -ebns +for latent variable energy-based models ebms 0:01:01.120,0:01:07.920 -for example the ellipse likewise - -0:01:04.799,0:01:10.640 -we have cover only inference only +for example the ellipse likewise we have covered only inference only 0:01:07.920,0:01:12.320 -inference in our first lab - -0:01:10.640,0:01:14.799 -today we're gonna be only covering +inference in our first lab today we're gonna be only covering 0:01:12.320,0:01:18.080 -inference for energy based models - -0:01:14.799,0:01:19.040 -i will not say the word training ever +inference for energy based models i will not say the word training ever 0:01:18.080,0:01:21.680 -again - -0:01:19.040,0:01:22.560 -okay i'll try at least so today we're +again okay i'll try at least so today we're 0:01:21.680,0:01:25.280 -gonna be talking about - -0:01:22.560,0:01:26.159 -inference what is this stuff and where +gonna be talking about inference what is this stuff and where 0:01:25.280,0:01:28.080 -do we start - -0:01:26.159,0:01:29.360 -we're going to be starting from our +do we start we're going to be starting from our 0:01:28.080,0:01:30.960 -training examples - -0:01:29.360,0:01:32.880 -training training samples i said i +training examples training
training samples i said i 0:01:30.960,0:01:34.880 -wasn't going to say this word - -0:01:32.880,0:01:36.560 -all right so let's see what kind of data +wasn't going to say this word all right so let's see what kind of data 0:01:34.880,0:01:38.799 -we're going to be working on and why we - -0:01:36.560,0:01:42.240 -need these energy based models +we're going to be working on and why we need these energy based models 0:01:38.799,0:01:45.920 -so we can think about our data why - -0:01:42.240,0:01:49.200 -bold y is uh it's been living +so we can think about our data why bold y is uh it's been living 0:01:45.920,0:01:52.000 -having two components y one y two - -0:01:49.200,0:01:52.960 -y one is gonna be this row one function +having two components y one y two y one is gonna be this rho one function 0:01:52.000,0:01:55.840 -of x - -0:01:52.960,0:01:57.439 -which is going to be my input multiplied +of x which is going to be my input multiplied 0:01:55.840,0:01:59.920 -by the cosine of theta - -0:01:57.439,0:02:00.479 -which is some you know angle we don't +by the cosine of theta which is some you know angle we don't 0:01:59.920,0:02:03.040 -know - -0:02:00.479,0:02:04.960 -plus some epsilon noise and then row 2 +know plus some epsilon noise and then rho 2 0:02:03.040,0:02:07.680 -is going to be again a function of my - -0:02:04.960,0:02:08.080 -input x and then it's multiplied by a +is going to be again a function of my input x and then it's multiplied by a 0:02:07.680,0:02:11.360 -sine - -0:02:08.080,0:02:14.480 -of this theta we have no axis plus some +sine of this theta we have no axis plus some 0:02:11.360,0:02:18.480 -uh noise epsilon - -0:02:14.480,0:02:21.840 -rho is a function that maps the input +uh noise epsilon rho is a function that maps the input 0:02:18.480,0:02:22.560 -one dimensional r into a r2 and so it's - -0:02:21.840,0:02:25.440 -mapping my +one dimensional r into a r2 and so it's mapping my 0:02:22.560,0:02:26.480 -x into something that is this alpha
x - -0:02:25.440,0:02:28.480 -plus beta +x into something that is this alpha x plus beta 0:02:26.480,0:02:29.840 -1 minus x for the first component and - -0:02:28.480,0:02:33.680 -the other is beta +1 minus x for the first component and the other is beta 0:02:29.840,0:02:36.239 -times x plus alpha multiply 1 minus x - -0:02:33.680,0:02:38.720 -and and then everything is multiplied by +times x plus alpha multiply 1 minus x and and then everything is multiplied by 0:02:36.239,0:02:41.840 -this exponential - -0:02:38.720,0:02:45.440 -of x so alpha and beta +this exponential of x so alpha and beta 0:02:41.840,0:02:45.760 -are simply 1.5 and 2 so this is simply - -0:02:45.440,0:02:50.319 -the +are simply 1.5 and 2 so this is simply the 0:02:45.760,0:02:53.760 -equation for a ellipse but then if x - -0:02:50.319,0:02:56.640 -goes from zero to one as i show you here +equation for a ellipse but then if x goes from zero to one as i show you here 0:02:53.760,0:02:57.840 -you're gonna have that this is gonna be - -0:02:56.640,0:03:00.400 -drawing +you're gonna have that this is gonna be drawing 0:02:57.840,0:03:02.000 -some sort of horn that is exponentially - -0:03:00.400,0:03:03.840 -no in the profile +some sort of horn that is exponentially no in the profile 0:03:02.000,0:03:05.280 -and then it starts as like something - -0:03:03.840,0:03:07.360 -like this and then eventually +and then it starts as like something like this and then eventually 0:03:05.280,0:03:09.440 -like horizontal ellipse and eventually - -0:03:07.360,0:03:12.720 -end up as a vertical ellipse +like horizontal ellipse and eventually end up as a vertical ellipse 0:03:09.440,0:03:15.840 -okay x here is going to be sample from - -0:03:12.720,0:03:18.400 -the uniform distribution +okay x here is going to be sample from the uniform distribution 0:03:15.840,0:03:22.640 -similarly theta is also sampled from the - -0:03:18.400,0:03:25.040 -uniform distribution from 0 to 2 pi +similarly theta is also sampled from 
the uniform distribution from 0 to 2 pi 0:03:22.640,0:03:26.239 -epsilon instead is sampled from a normal - -0:03:25.040,0:03:28.159 -distribution +epsilon instead is sampled from a normal distribution 0:03:26.239,0:03:31.680 -with mean 0 and then a standard - -0:03:28.159,0:03:34.319 -deviation of 1 over 20. +with mean 0 and then a standard deviation of 1 over 20. 0:03:31.680,0:03:34.879 -so again as you might have seen from - -0:03:34.319,0:03:37.440 -twitter +so again as you might have seen from twitter 0:03:34.879,0:03:38.159 -this stuff looks pretty cool and it - -0:03:37.440,0:03:40.959 -looks like +this stuff looks pretty cool and it looks like 0:03:38.159,0:03:42.720 -that but then since we have magic on - -0:03:40.959,0:03:45.200 -this side we can do this +that but then since we have magic on this side we can do this 0:03:42.720,0:03:47.280 -and so you can see here how we're gonna - -0:03:45.200,0:03:50.879 -be having this exponential +and so you can see here how we're gonna be having this exponential 0:03:47.280,0:03:53.840 -uh side right this exponential envelope - -0:03:50.879,0:03:55.680 -we start with the uh ellipse that is +uh side right this exponential envelope we start with the uh ellipse that is 0:03:53.840,0:03:57.360 -like vertical and then we end up with - -0:03:55.680,0:04:01.040 -this horizontal one +like vertical and then we end up with this horizontal one 0:03:57.360,0:04:04.080 -okay um - -0:04:01.040,0:04:07.280 -what we want to pay attention here +okay um what we want to pay attention here 0:04:04.080,0:04:10.480 -is that at a given specific location - -0:04:07.280,0:04:13.760 -x there is no +is that at a given specific location x there is no 0:04:10.480,0:04:16.880 -one y only right so we cannot really - -0:04:13.760,0:04:18.959 -train a neural net that is like a +one y only right so we cannot really train a neural net that is like a 0:04:16.880,0:04:20.799 -vector to vector mapping because there - -0:04:18.959,0:04:23.199 -is no 
vector to map +vector to vector mapping because there is no vector to map 0:04:20.799,0:04:24.320 -well there is a bunch of vectors right - -0:04:23.199,0:04:27.919 -so given one +well there is a bunch of vectors right so given one 0:04:24.320,0:04:28.639 -single input x there are many many many - -0:04:27.919,0:04:31.520 -possible +single input x there are many many many possible 0:04:28.639,0:04:32.160 -y's there is like a whole uh ellipse - -0:04:31.520,0:04:35.600 -right +y's there is like a whole uh ellipse right 0:04:32.160,0:04:38.080 -uh per given x so we can't really use - -0:04:35.600,0:04:38.960 -normal uh feed forward neural net to do +uh per given x so we can't really use normal uh feed forward neural net to do 0:04:38.080,0:04:41.199 -this - -0:04:38.960,0:04:42.479 -uh similarly if we are just talking +this uh similarly if we are just talking 0:04:41.199,0:04:45.360 -about y's - -0:04:42.479,0:04:45.759 -given one value of y one i cannot even +about y's given one value of y one i cannot even 0:04:45.360,0:04:48.240 -tell - -0:04:45.759,0:04:49.120 -what is the other corresponding y two +tell what is the other corresponding y two 0:04:48.240,0:04:51.680 -because there are - -0:04:49.120,0:04:54.080 -always two almost always two values for +because there are always two almost always two values for 0:04:51.680,0:04:57.120 -y two given one y one right - -0:04:54.080,0:04:59.759 -and so using vectors to vectors mapping +y two given one y one right and so using vectors to vectors mapping 0:04:57.120,0:05:00.320 -as we've been learning so far is not - -0:04:59.759,0:05:02.400 -quite +as we've been learning so far is not quite 0:05:00.320,0:05:04.400 -uh sufficient so today we're going to be - -0:05:02.400,0:05:06.400 -figuring out how to use these latent +uh sufficient so today we're going to be figuring out how to use these latent 0:05:04.400,0:05:07.600 -variable energy-based models to deal - -0:05:06.400,0:05:11.840 -with this kind of +variable 
energy-based models to deal with this kind of 0:05:07.600,0:05:11.840 -multimodal you know outcome - -0:05:12.320,0:05:16.639 -so to make things simple and make my +multimodal you know outcome so to make things simple and make my 0:05:15.600,0:05:19.120 -life easier - -0:05:16.639,0:05:21.280 -we're gonna do a few simplifications uh +life easier we're gonna do a few simplifications uh 0:05:19.120,0:05:25.360 -the first one i'm gonna be removing the - -0:05:21.280,0:05:28.720 -input so there will be no input data +the first one i'm gonna be removing the input so there will be no input data 0:05:25.360,0:05:32.560 -my model will not have input data - -0:05:28.720,0:05:36.320 -and this is like what anyhow i i fix my +my model will not have input data and this is like what anyhow i i fix my 0:05:32.560,0:05:38.320 -x to zero so by fixing the x to zero i'm - -0:05:36.320,0:05:39.919 -gonna have that my exponential becomes +x to zero so by fixing the x to zero i'm gonna have that my exponential becomes 0:05:38.320,0:05:42.639 -simply 1. - -0:05:39.919,0:05:43.520 -and then basically we turn out having +simply 1. 
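[editor's note: the data-generation step just described — x fixed to 0, so the exponential envelope is 1, the first radius becomes beta = 2 and the second alpha = 1.5 — can be sketched in plain Python. All names here are mine, and stdlib `random`/`math` stand in for whatever the practicum notebook actually uses:]

```python
import math
import random

random.seed(0)

# With x fixed to 0, each training point is
#   y = (2 * cos(theta) + eps1, 1.5 * sin(theta) + eps2)
# where theta ~ U[0, 2*pi) is the hidden angle we never observe,
# and eps1, eps2 ~ N(0, 1/20) are per-component noise.
ALPHA, BETA = 1.5, 2.0
N_SAMPLES = 24

def sample_y():
    theta = random.uniform(0.0, 2.0 * math.pi)   # latent, not given to the model
    eps1 = random.gauss(0.0, 1.0 / 20.0)
    eps2 = random.gauss(0.0, 1.0 / 20.0)
    return (BETA * math.cos(theta) + eps1, ALPHA * math.sin(theta) + eps2)

# Capital Y: the collection of all 24 noisy samples on the 2 x 1.5 ellipse.
Y = [sample_y() for _ in range(N_SAMPLES)]
```

[every point lands near the ellipse (y1/2)^2 + (y2/1.5)^2 = 1, which is the blue "potato" plotted later]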
and then basically we turn out having 0:05:42.639,0:05:46.720 -row 1 - -0:05:43.520,0:05:48.560 -that becomes 2 right so alpha +row 1 that becomes 2 right so alpha 0:05:46.720,0:05:50.240 -gets deleted by 0 and then you just have - -0:05:48.560,0:05:53.199 -the beta multiplied by +gets deleted by 0 and then you just have the beta multiplied by 0:05:50.240,0:05:54.080 -a 1 and then row two automatically is - -0:05:53.199,0:05:57.360 -gonna get +a 1 and then row two automatically is gonna get 0:05:54.080,0:06:00.400 -the amplitude of 1.5 right and so - -0:05:57.360,0:06:04.240 -my data points y are going to be simply +the amplitude of 1.5 right and so my data points y are going to be simply 0:06:00.400,0:06:05.520 -points coming from this twice the cosine - -0:06:04.240,0:06:08.160 -of this uniform +points coming from this twice the cosine of this uniform 0:06:05.520,0:06:09.199 -simply uniformly sample theta and then - -0:06:08.160,0:06:13.360 -1.5 +simply uniformly sample theta and then 1.5 0:06:09.199,0:06:16.720 -sine this uniform theta - -0:06:13.360,0:06:18.720 -the collection of all my y's +sine this uniform theta the collection of all my y's 0:06:16.720,0:06:20.319 -will give me capital y so capital y is - -0:06:18.720,0:06:22.400 -going to be the collection of all my +will give me capital y so capital y is going to be the collection of all my 0:06:20.319,0:06:25.600 -sample and here i decided to just use - -0:06:22.400,0:06:27.759 -24 samples so i have 24 +sample and here i decided to just use 24 samples so i have 24 0:06:25.600,0:06:28.800 -different samples from the uniform - -0:06:27.759,0:06:31.440 -distribution +different samples from the uniform distribution 0:06:28.800,0:06:33.600 -okay and per each of these samples there - -0:06:31.440,0:06:35.680 -will be +okay and per each of these samples there will be 0:06:33.600,0:06:39.120 -one epsilon for the first component and - -0:06:35.680,0:06:42.160 -one epsilon for the second component +one epsilon for 
the first component and one epsilon for the second component 0:06:39.120,0:06:44.800 -all right so what we try to do today - -0:06:42.160,0:06:45.199 -is going to be to learn well to learn +all right so what we try to do today is going to be to learn well to learn 0:06:44.800,0:06:48.160 -wrong - -0:06:45.199,0:06:49.680 -we are not learning anything we um we +wrong we are not learning anything we um we 0:06:48.160,0:06:52.479 -imagine that someone gave us - -0:06:49.680,0:06:53.120 -a already trained already learned +imagine that someone gave us a already trained already learned 0:06:52.479,0:06:54.319 -network - -0:06:53.120,0:06:57.280 -we're going to be learning how to +network we're going to be learning how to 0:06:54.319,0:07:00.960 -perform inference how we can use a model - -0:06:57.280,0:07:02.960 -to figure out if one point it belongs or +perform inference how we can use a model to figure out if one point it belongs or 0:07:00.960,0:07:04.080 -doesn't belong to what was the training - -0:07:02.960,0:07:07.919 -manifold +doesn't belong to what was the training manifold 0:07:04.080,0:07:10.800 -okay so this is my - -0:07:07.919,0:07:11.919 -training data these are my ys which are +okay so this is my training data these are my ys which are 0:07:10.800,0:07:14.800 -again an ellipse - -0:07:11.919,0:07:15.440 -you can see here the major radii radius +again an ellipse you can see here the major radii radius 0:07:14.800,0:07:18.160 -is - -0:07:15.440,0:07:21.120 -two you can see right there are one two +is two you can see right there are one two 0:07:18.160,0:07:24.400 -three four boxes each box is 0.5 - -0:07:21.120,0:07:26.400 -so this radius here is two and then the +three four boxes each box is 0.5 so this radius here is two and then the 0:07:24.400,0:07:29.680 -minor radius is gonna have one two - -0:07:26.400,0:07:32.960 -three boxes uh each box is 0.5 +minor radius is gonna have one two three boxes uh each box is 0.5 0:07:29.680,0:07:36.560 -and so this 
is the minor radius of 1.5 - -0:07:32.960,0:07:38.240 -when you said there's no input just +and so this is the minor radius of 1.5 when you said there's no input just 0:07:36.560,0:07:40.800 -what is theta do you consider that an - -0:07:38.240,0:07:41.520 -input or so theta we don't have access +what is theta do you consider that an input or so theta we don't have access 0:07:40.800,0:07:44.879 -to right - -0:07:41.520,0:07:47.840 -so theta is something we don't see x +to right so theta is something we don't see x 0:07:44.879,0:07:48.720 -could be the input we provide the model - -0:07:47.840,0:07:51.440 -to figure out +could be the input we provide the model to figure out 0:07:48.720,0:07:51.919 -at what location we are at that kind of - -0:07:51.440,0:07:54.080 -uh +at what location we are at that kind of uh 0:07:51.919,0:07:56.400 -horn allows us to figure out the - -0:07:54.080,0:07:59.759 -dimension of those +horn allows us to figure out the dimension of those 0:07:56.400,0:08:01.440 -ellipse ellipses but then we theta - -0:07:59.759,0:08:03.199 -here is something we don't have access +ellipse ellipses but then we theta here is something we don't have access 0:08:01.440,0:08:06.319 -to so theta was - -0:08:03.199,0:08:09.599 -is simply a variable which is +to so theta was is simply a variable which is 0:08:06.319,0:08:10.400 -uh missing which was used for generating - -0:08:09.599,0:08:12.160 -our data +uh missing which was used for generating our data 0:08:10.400,0:08:13.919 -but we don't have access to so it's a - -0:08:12.160,0:08:18.080 -missing variable it's a missing +but we don't have access to so it's a missing variable it's a missing 0:08:13.919,0:08:21.360 -input okay so we don't have access okay - -0:08:18.080,0:08:25.440 -all right so let's look at what the +input okay so we don't have access okay all right so let's look at what the 0:08:21.360,0:08:28.479 -model manifold is so in this case - -0:08:25.440,0:08:31.360 -i'm gonna have a latent input 
+model manifold is so in this case i'm gonna have a latent input 0:08:28.479,0:08:33.200 -which is something uh latent mean means - -0:08:31.360,0:08:33.680 -it's missing i don't have access to this +which is something uh latent mean means it's missing i don't have access to this 0:08:33.200,0:08:36.640 -input - -0:08:33.680,0:08:38.080 -still there is some you know potential +input still there is some you know potential 0:08:36.640,0:08:39.839 -input - -0:08:38.080,0:08:41.680 -you notice here is the same color as +input you notice here is the same color as 0:08:39.839,0:08:45.120 -that theta right - -0:08:41.680,0:08:47.920 -anyhow so i have my z z which is uh +that theta right anyhow so i have my z z which is uh 0:08:45.120,0:08:49.040 -i can decide to like take it from zero - -0:08:47.920,0:08:51.839 -to two pi +i can decide to like take it from zero to two pi 0:08:49.040,0:08:53.440 -uh without the so that square bracket - -0:08:51.839,0:08:55.680 -flipped square bracket means +uh without the so that square bracket flipped square bracket means 0:08:53.440,0:08:56.800 -i'm considering a vector that goes from - -0:08:55.680,0:09:00.240 -0 to 2 pi +i'm considering a vector that goes from 0 to 2 pi 0:08:56.800,0:09:03.440 -with 2 pi excluded with a step - -0:09:00.240,0:09:06.560 -you know pi over 24. and so +with 2 pi excluded with a step you know pi over 24. 
and so 0:09:03.440,0:09:09.760 -this one basically is like a line where - -0:09:06.560,0:09:11.120 -there are many points there are uh 40 48 +this one basically is like a line where there are many points there are uh 40 48 0:09:09.760,0:09:14.320 -points right - -0:09:11.120,0:09:17.680 -from zero to two pi excluded so this +points right from zero to two pi excluded so this 0:09:14.320,0:09:19.839 -latent input goes inside a decoder - -0:09:17.680,0:09:21.040 -and then the decoder is going to give me +latent input goes inside a decoder and then the decoder is going to give me 0:09:19.839,0:09:23.519 -this y - -0:09:21.040,0:09:25.519 -tilde and y is bold because again it +this y tilde and y is bold because again it 0:09:23.519,0:09:27.920 -lives in two dimensions - -0:09:25.519,0:09:28.959 -uh more precisely we're gonna have that +lives in two dimensions uh more precisely we're gonna have that 0:09:27.920,0:09:32.560 -by varying - -0:09:28.959,0:09:33.279 -z over one line y tilde is gonna be +by varying z over one line y tilde is gonna be 0:09:32.560,0:09:37.760 -varying - -0:09:33.279,0:09:39.040 -around a uh ellipse okay +varying around a uh ellipse okay 0:09:37.760,0:09:41.760 -on the other side instead we're going to - -0:09:39.040,0:09:44.560 -have these bold y which are my +on the other side instead we're going to have these bold y which are my 0:09:41.760,0:09:45.920 -observations so how do i know these are - -0:09:44.560,0:09:48.560 -observations +observations so how do i know these are observations 0:09:45.920,0:09:49.120 -because it's uh this circle it's shaded - -0:09:48.560,0:09:51.040 -whereas +because it's uh this circle it's shaded whereas 0:09:49.120,0:09:52.160 -those other circles are simply - -0:09:51.040,0:09:54.080 -transparent +those other circles are simply transparent 0:09:52.160,0:09:56.640 -the bottom one is a little bit gray - -0:09:54.080,0:09:58.399 -which means i have access to this data +the bottom one is a little bit gray which means 
i have access to this data 0:09:56.640,0:10:01.760 -okay - -0:09:58.399,0:10:02.399 -cool so this is how these points look +okay cool so this is how these points look 0:10:01.760,0:10:05.120 -right the - -0:10:02.399,0:10:06.160 -blue points are the one sample from my +right the blue points are the one sample from my 0:10:05.120,0:10:07.760 -data generation - -0:10:06.160,0:10:09.440 -generated distribution we already +data generation generated distribution we already 0:10:07.760,0:10:12.399 -sampled them we have 24 - -0:10:09.440,0:10:13.839 -and then here i just decided to plot uh +sampled them we have 24 and then here i just decided to plot uh 0:10:12.399,0:10:17.760 -48 - -0:10:13.839,0:10:19.519 -of these values from like +48 of these values from like 0:10:17.760,0:10:21.519 -reconstruction of those latent variables - -0:10:19.519,0:10:24.800 -right such that i can clearly see +reconstruction of those latent variables right such that i can clearly see 0:10:21.519,0:10:25.519 -what the network thinks uh the the true - -0:10:24.800,0:10:28.720 -manifold +what the network thinks uh the the true manifold 0:10:25.519,0:10:30.000 -is okay in the second episode - -0:10:28.720,0:10:32.480 -when we are gonna be learning we're +is okay in the second episode when we are gonna be learning we're 0:10:30.000,0:10:35.600 -gonna be figuring out how to match - -0:10:32.480,0:10:38.480 -my internal belief the the violet one +gonna be figuring out how to match my internal belief the the violet one 0:10:35.600,0:10:39.760 -with actual the the data we have but - -0:10:38.480,0:10:40.880 -we're not going to be seeing that this +with actual the the data we have but we're not going to be seeing that this 0:10:39.760,0:10:42.800 -time this time - -0:10:40.880,0:10:45.040 -we already have this model which is +time this time we already have this model which is 0:10:42.800,0:10:48.160 -pretty bad since it's not - -0:10:45.040,0:10:51.440 -already matching the data and still +pretty bad 
since it's not already matching the data and still 0:10:48.160,0:10:53.600 -going to be seeing how to use this model - -0:10:51.440,0:10:54.560 -so what what determines the shape of the +going to be seeing how to use this model so what what determines the shape of the 0:10:53.600,0:10:56.560 -red or - -0:10:54.560,0:10:58.000 -orange points is it the alpha and the +red or orange points is it the alpha and the 0:10:56.560,0:11:00.959 -beta - -0:10:58.000,0:11:01.600 -uh alpha and beta are determining the +beta uh alpha and beta are determining the 0:11:00.959,0:11:03.760 -side the - -0:11:01.600,0:11:06.000 -the shape of that blue thing right so +side the the shape of that blue thing right so 0:11:03.760,0:11:07.519 -the overall thing it was that horn - -0:11:06.000,0:11:10.079 -uh i showed you before the one that was +the overall thing it was that horn uh i showed you before the one that was 0:11:07.519,0:11:12.000 -spinning and then we decided to slice it - -0:11:10.079,0:11:14.640 -at a specific value of +spinning and then we decided to slice it at a specific value of 0:11:12.000,0:11:15.040 -x right so this is like a cross section - -0:11:14.640,0:11:18.000 -which +x right so this is like a cross section which 0:11:15.040,0:11:19.279 -gives us this potato the blue potato on - -0:11:18.000,0:11:21.120 -the other side i'm going to be telling +gives us this potato the blue potato on the other side i'm going to be telling 0:11:19.279,0:11:24.959 -you what is inside the decoder - -0:11:21.120,0:11:27.440 -we have a internal belief for what the +you what is inside the decoder we have a internal belief for what the 0:11:24.959,0:11:28.560 -true data manifold is right that's the - -0:11:27.440,0:11:32.079 -net network +true data manifold is right that's the net network 0:11:28.560,0:11:33.839 -that the model believe about the uh - -0:11:32.079,0:11:35.839 -you know the the how the data is +that the model believe about the uh you know the the how the data is 
0:11:33.839,0:11:38.800 -supposed to look - -0:11:35.839,0:11:39.839 -okay let me let me show you in the next +supposed to look okay let me let me show you in the next 0:11:38.800,0:11:42.880 -slide a little bit - -0:11:39.839,0:11:45.920 -more information so maybe we can get uh +slide a little bit more information so maybe we can get uh 0:11:42.880,0:11:48.959 -you know sync so here - -0:11:45.920,0:11:49.680 -we're going to be looking at this energy +you know sync so here we're going to be looking at this energy 0:11:48.959,0:11:52.800 -function - -0:11:49.680,0:11:54.000 -so what is this energy function so this +function so what is this energy function so this 0:11:52.800,0:11:57.760 -energy function - -0:11:54.000,0:12:01.200 -it's um something that tells me +energy function it's um something that tells me 0:11:57.760,0:12:04.880 -what is the compatibility between this y - -0:12:01.200,0:12:07.360 -tilde and y the blue y right +what is the compatibility between this y tilde and y the blue y right 0:12:04.880,0:12:09.040 -and so basically in this case here - -0:12:07.360,0:12:12.399 -measures the distance between +and so basically in this case here measures the distance between 0:12:09.040,0:12:16.079 -my given training sample and my - -0:12:12.399,0:12:19.040 -reconstruction my given my best guess +my given training sample and my reconstruction my given my best guess 0:12:16.079,0:12:20.880 -about what i think it should be the real - -0:12:19.040,0:12:23.600 -data point +about what i think it should be the real data point 0:12:20.880,0:12:24.240 -so let's give more context here right so - -0:12:23.600,0:12:27.839 -my +so let's give more context here right so my 0:12:24.240,0:12:31.440 -energy e function of my - -0:12:27.839,0:12:34.639 -y data point and my latent variable z +energy e function of my y data point and my latent variable z 0:12:31.440,0:12:38.079 -it's gonna be the sum of the square - -0:12:34.639,0:12:38.880 -euclidean distances of the two +it's 
gonna be the sum of the square euclidean distances of the two 0:12:38.079,0:12:42.240 -components - -0:12:38.880,0:12:44.959 -so we have component one of the y minus +components so we have component one of the y minus 0:12:42.240,0:12:46.160 -component one of this g which is our - -0:12:44.959,0:12:48.160 -decoder +component one of this g which is our decoder 0:12:46.160,0:12:50.560 -function of z squared and then we have - -0:12:48.160,0:12:52.560 -the other one is going to be y2 minus +function of z squared and then we have the other one is going to be y2 minus 0:12:50.560,0:12:54.000 -g2 which is the second component of this - -0:12:52.560,0:12:57.200 -output of the decoder +g2 which is the second component of this output of the decoder 0:12:54.000,0:13:01.279 -squared and this importantly - -0:12:57.200,0:13:04.320 -uh happens for every y we pick +squared and this importantly uh happens for every y we pick 0:13:01.279,0:13:07.519 -from capital y so in this case - -0:13:04.320,0:13:10.560 -we have 24 different +from capital y so in this case we have 24 different 0:13:07.519,0:13:11.120 -e's right so we can index 24 different - -0:13:10.560,0:13:15.279 -e's +e's right so we can index 24 different e's 0:13:11.120,0:13:18.560 -based on the specific why you pick - -0:13:15.279,0:13:21.440 -more about this in the next slide so +based on the specific why you pick more about this in the next slide so 0:13:18.560,0:13:22.079 -what is this decoder so this decoder is - -0:13:21.440,0:13:24.639 -a little bit +what is this decoder so this decoder is a little bit 0:13:22.079,0:13:25.920 -cooked as in you know i know what is the - -0:13:24.639,0:13:28.720 -data generating +cooked as in you know i know what is the data generating 0:13:25.920,0:13:29.839 -uh process so i can put inside the g - -0:13:28.720,0:13:32.639 -what is quite +uh process so i can put inside the g what is quite 0:13:29.839,0:13:32.880 -uh you know align with what i think you - -0:13:32.639,0:13:36.000 
-know +uh you know align with what i think you know 0:13:32.880,0:13:39.839 -it's a very good guess about how - -0:13:36.000,0:13:42.560 -the output should look uh so my g +it's a very good guess about how the output should look uh so my g 0:13:39.839,0:13:43.519 -which is a two component function g one - -0:13:42.560,0:13:47.120 -g two +which is a two component function g one g two 0:13:43.519,0:13:50.160 -maps uh the real line to this r2 - -0:13:47.120,0:13:52.399 -and therefore it maps my z into these +maps uh the real line to this r2 and therefore it maps my z into these 0:13:50.160,0:13:54.959 -two components which are going to be w1 - -0:13:52.399,0:13:56.720 -cosine cosine of z and then the second +two components which are going to be w1 cosine cosine of z and then the second 0:13:54.959,0:13:59.760 -component is going to be w2 because - -0:13:56.720,0:14:03.120 -the sine of z to notice here +component is going to be w2 because the sine of z to notice here 0:13:59.760,0:14:06.720 -the only parameters we have available - -0:14:03.120,0:14:07.600 -in this network in this decoder are w1 +the only parameters we have available in this network in this decoder are w1 0:14:06.720,0:14:11.040 -and w2 - -0:14:07.600,0:14:13.279 -okay cosine x and sine z +and w2 okay cosine x and sine z 0:14:11.040,0:14:14.880 -sorry cosine z and sine z are you know - -0:14:13.279,0:14:15.680 -knowledge a priori you know i know +sorry cosine z and sine z are you know knowledge a priori you know i know 0:14:14.880,0:14:18.800 -already and i - -0:14:15.680,0:14:21.199 -put there my best guess for that +already and i put there my best guess for that 0:14:18.800,0:14:22.800 -and so again this network has two - -0:14:21.199,0:14:25.440 -parameters nevertheless +and so again this network has two parameters nevertheless 0:14:22.800,0:14:26.480 -with two parameters we can still do many - -0:14:25.440,0:14:30.560 -things +with two parameters we can still do many things 0:14:26.480,0:14:33.920 
-so again stress once again uh this e - -0:14:30.560,0:14:37.600 -happens to exist for any peak +so again stress once again uh this e happens to exist for any peak 0:14:33.920,0:14:40.560 -of y in this set of all y's - -0:14:37.600,0:14:42.320 -so let's put uh this e on on the top +of y in this set of all y's so let's put uh this e on on the top 0:14:40.560,0:14:45.440 -here just so i can - -0:14:42.320,0:14:49.440 -i can clear the screen below and so +here just so i can i can clear the screen below and so 0:14:45.440,0:14:52.959 -now i show you all 24 - -0:14:49.440,0:14:55.519 -energies we have how do we +now i show you all 24 energies we have how do we 0:14:52.959,0:14:57.600 -how do i get this stuff right so these - -0:14:55.519,0:14:58.160 -energies are coming from the fact that i +how do i get this stuff right so these energies are coming from the fact that i 0:14:57.600,0:15:02.000 -pick - -0:14:58.160,0:15:04.560 -a specific y so the first one i pick +pick a specific y so the first one i pick 0:15:02.000,0:15:05.760 -y prime which is like my peak of y is - -0:15:04.560,0:15:08.320 -going to be the first +y prime which is like my peak of y is going to be the first 0:15:05.760,0:15:09.040 -of my training sample and therefore i - -0:15:08.320,0:15:11.680 -can call +of my training sample and therefore i can call 0:15:09.040,0:15:12.639 -the first energy my e1 right so i can - -0:15:11.680,0:15:14.240 -index them +the first energy my e1 right so i can index them 0:15:12.639,0:15:16.240 -right now since i have you know a - -0:15:14.240,0:15:18.720 -discrete number of training samples +right now since i have you know a discrete number of training samples 0:15:16.240,0:15:19.519 -i have a discrete number of energies in - -0:15:18.720,0:15:22.399 -this case +i have a discrete number of energies in this case 0:15:19.519,0:15:23.920 -so this is my e1 and then the last one - -0:15:22.399,0:15:25.519 -on the row is going to be in the one +so this is my e1 and then the 
last one on the row is going to be in the one 0:15:23.920,0:15:28.000 -associated to the sixth - -0:15:25.519,0:15:28.560 -sample of my training sample my training +associated to the sixth sample of my training sample my training 0:15:28.000,0:15:32.480 -set - -0:15:28.560,0:15:34.880 -and therefore i have my e6 +set and therefore i have my e6 0:15:32.480,0:15:35.519 -uh if we go down until the last row - -0:15:34.880,0:15:38.880 -we're gonna be +uh if we go down until the last row we're gonna be 0:15:35.519,0:15:40.959 -seeing uh i'm gonna be picking the 19th - -0:15:38.880,0:15:43.839 -sample from my training set and then i'm +seeing uh i'm gonna be picking the 19th sample from my training set and then i'm 0:15:40.959,0:15:46.880 -going to have this e 19 over there - -0:15:43.839,0:15:49.759 -and finally if i pick my y prime +going to have this e 19 over there and finally if i pick my y prime 0:15:46.880,0:15:50.399 -being the last the 24th example then - -0:15:49.759,0:15:54.079 -i'll be +being the last the 24th example then i'll be 0:15:50.399,0:15:56.560 -ending up with the e24 on the x axis - -0:15:54.079,0:15:58.959 -of each of these little cells you're +ending up with the e24 on the x axis of each of these little cells you're 0:15:56.560,0:16:02.160 -going to be having z - -0:15:58.959,0:16:05.839 -so each of these e's know e 1 e 2 e 3 +going to be having z so each of these e's know e 1 e 2 e 3 0:16:02.160,0:16:08.800 -e until 24 are functions - -0:16:05.839,0:16:10.720 -of my z latent variable which is +e until 24 are functions of my z latent variable which is 0:16:08.800,0:16:14.560 -spanning as we said before - -0:16:10.720,0:16:15.360 -zero to two pi in this uh drawing here i +spanning as we said before zero to two pi in this uh drawing here i 0:16:14.560,0:16:18.880 -just have them - -0:16:15.360,0:16:19.680 -separated by uh pi over 12. so i have +just have them separated by uh pi over 12. 
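[editor's note: the 24 per-sample energies E_n(z) = (y'_1 - g_1(z))^2 + (y'_2 - g_2(z))^2 can be computed as below. The decoder weights and the stand-in training points are mine, chosen only so the sketch runs self-contained:]

```python
import math

w1, w2 = -1.0, 1.5                               # placeholder decoder weights

def g(z):
    """Decoder: g(z) = (w1 * cos(z), w2 * sin(z))."""
    return (w1 * math.cos(z), w2 * math.sin(z))

def energy(y_prime, z):
    """Squared Euclidean distance between a picked y' and its reconstruction."""
    g1, g2 = g(z)
    return (y_prime[0] - g1) ** 2 + (y_prime[1] - g2) ** 2

# Latent grid: 48 z values in [0, 2*pi), and 24 stand-in training points
# placed on the 2 x 1.5 ellipse (noise omitted for brevity).
z_grid = [k * math.pi / 24.0 for k in range(48)]
Y = [(2.0 * math.cos(2.0 * math.pi * n / 24.0),
      1.5 * math.sin(2.0 * math.pi * n / 24.0)) for n in range(24)]

# One energy curve per training sample: E[n][k] = E(y'_n, z_k).
# These are the 24 wiggly functions of z in the grid of little plots.
E = [[energy(y_prime, z) for z in z_grid] for y_prime in Y]
```

[free inference, covered next, just reads off the minimising z of one such curve, e.g. `min(range(48), key=lambda k: E[22][k])` for the 23rd sample]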
so i have 0:16:18.880,0:16:23.680 -nice - -0:16:19.680,0:16:27.199 -separation for drawing this function +nice separation for drawing this function 0:16:23.680,0:16:30.240 -so moreover the range of this energy - -0:16:27.199,0:16:31.920 -in this case is going to be 0 to 12 and +so moreover the range of this energy in this case is going to be 0 to 12 and 0:16:30.240,0:16:33.040 -we are we're going to be computing these - -0:16:31.920,0:16:35.120 -values in a +we are we're going to be computing these values in a 0:16:33.040,0:16:36.480 -just in a short moment such that we can - -0:16:35.120,0:16:39.040 -better understand +just in a short moment such that we can better understand 0:16:36.480,0:16:41.519 -what the heck i'm talking about right so - -0:16:39.040,0:16:45.360 -again until yesterday i had no clue +what the heck i'm talking about right so again until yesterday i had no clue 0:16:41.519,0:16:47.759 -about what these were okay so i am - -0:16:45.360,0:16:48.720 -very new to this topic as well and +about what these were okay so i am very new to this topic as well and 0:16:47.759,0:16:51.279 -therefore we are - -0:16:48.720,0:16:52.240 -exploring together what is this jungle +therefore we are exploring together what is this jungle 0:16:51.279,0:16:55.519 -of very - -0:16:52.240,0:16:57.680 -funny weird wiggly functions okay +of very funny weird wiggly functions okay 0:16:55.519,0:16:58.639 -we are gonna start by cherry picking two - -0:16:57.680,0:17:01.839 -of them +we are gonna start by cherry picking two of them 0:16:58.639,0:17:05.839 -uh for example the e23 - -0:17:01.839,0:17:09.199 -the e23 looks pretty you know +uh for example the e23 the e23 looks pretty you know 0:17:05.839,0:17:12.160 -kind of okay it looks very uh - -0:17:09.199,0:17:12.720 -mostly smooth and i think it looks like +kind of okay it looks very uh mostly smooth and i think it looks like 0:17:12.160,0:17:14.640 -uh - -0:17:12.720,0:17:16.480 -you know even convex in the in the +uh you 
know even convex in the in the 0:17:14.640,0:17:18.319 -central part - -0:17:16.480,0:17:20.079 -and then i'm gonna be of course if i +central part and then i'm gonna be of course if i 0:17:18.319,0:17:21.839 -pick the nice one and smooth one i'm - -0:17:20.079,0:17:24.880 -gonna be also picking some weird stuff +pick the nice one and smooth one i'm gonna be also picking some weird stuff 0:17:21.839,0:17:25.199 -like like the the double wiggly the one - -0:17:24.880,0:17:27.760 -which +like like the the double wiggly the one which 0:17:25.199,0:17:28.559 -is wiggling yeah but as i said let's - -0:17:27.760,0:17:30.480 -start with +is wiggling yeah but as i said let's start with 0:17:28.559,0:17:33.360 -ease and let's start with the with a - -0:17:30.480,0:17:35.200 -simple version okay so far everything is +ease and let's start with the with a simple version okay so far everything is 0:17:33.360,0:17:36.720 -all right no one is writing anything on - -0:17:35.200,0:17:39.120 -the chat and +all right no one is writing anything on the chat and 0:17:36.720,0:17:40.559 -you know sean just asked a few questions - -0:17:39.120,0:17:46.720 -so so far we are +you know sean just asked a few questions so so far we are 0:17:40.559,0:17:50.480 -all on board or i lost some of you - -0:17:46.720,0:17:54.160 -no yeah so so basically like this square +all on board or i lost some of you no yeah so so basically like this square 0:17:50.480,0:17:54.960 -would be y23 and then the x-axis is - -0:17:54.160,0:17:57.039 -showing +would be y23 and then the x-axis is showing 0:17:54.960,0:17:58.480 -as you vary z you're going to be - -0:17:57.039,0:18:02.400 -evaluating this e +as you vary z you're going to be evaluating this e 0:17:58.480,0:18:03.039 -y 23 and z yeah yeah this is the e23 the - -0:18:02.400,0:18:05.360 -the one i +y 23 and z yeah yeah this is the e23 the the one i 0:18:03.039,0:18:06.080 -show you right now great the lecture's - -0:18:05.360,0:18:08.080 -going +show you 
right now great the lecture's going 0:18:06.080,0:18:09.760 -great i'm understanding as well okay - -0:18:08.080,0:18:12.880 -okay that's fantastic +great i'm understanding as well okay okay that's fantastic 0:18:09.760,0:18:16.080 -okay so let's look at this first - -0:18:12.880,0:18:16.720 -example on this kind of u shape so how +okay so let's look at this first example on this kind of u shape so how 0:18:16.080,0:18:19.760 -does - -0:18:16.720,0:18:23.039 -this u shape uh arise right +does this u shape uh arise right 0:18:19.760,0:18:26.320 -and so this is the current configuration - -0:18:23.039,0:18:28.160 -we have y prime is going to be the 23rd +and so this is the current configuration we have y prime is going to be the 23rd 0:18:26.320,0:18:31.360 -example from my training - -0:18:28.160,0:18:34.400 -set which is refigured here by that +example from my training set which is refigured here by that 0:18:31.360,0:18:38.320 -green x on the right hand side okay - -0:18:34.400,0:18:41.600 -so over here whenever i start +green x on the right hand side okay so over here whenever i start 0:18:38.320,0:18:43.840 -my z my z and i start with z equals zero - -0:18:41.600,0:18:44.960 -it actually turns out that z zero +my z my z and i start with z equals zero it actually turns out that z zero 0:18:43.840,0:18:47.760 -corresponds - -0:18:44.960,0:18:48.080 -to this location over here so if i send +corresponds to this location over here so if i send 0:18:47.760,0:18:50.400 -z - -0:18:48.080,0:18:52.160 -equals zero inside the decoder i'm gonna +z equals zero inside the decoder i'm gonna 0:18:50.400,0:18:55.520 -get a point over here - -0:18:52.160,0:18:58.240 -why is that oh because simply the w1 +get a point over here why is that oh because simply the w1 0:18:55.520,0:18:58.880 -we just randomly generated is a negative - -0:18:58.240,0:19:02.960 -number +we just randomly generated is a negative number 0:18:58.880,0:19:05.440 -and so this uh this size over here from - 
-0:19:02.960,0:19:07.520 -zero like this the point from here to +and so this uh this size over here from zero like this the point from here to 0:19:05.440,0:19:10.000 -here this is my w1 - -0:19:07.520,0:19:11.280 -and instead w2 is going to be a positive +here this is my w1 and instead w2 is going to be a positive 0:19:10.000,0:19:14.000 -number over here - -0:19:11.280,0:19:15.600 -so whenever we have z equals 0 you're +number over here so whenever we have z equals 0 you're 0:19:14.000,0:19:17.200 -going to have that the cosine of - -0:19:15.600,0:19:19.039 -0 is going to be equal to 1 so it +going to have that the cosine of 0 is going to be equal to 1 so it 0:19:17.200,0:19:21.440 -becomes 1 multiplied by - -0:19:19.039,0:19:22.320 -a negative number i go down here and +becomes 1 multiplied by a negative number i go down here and 0:19:21.440,0:19:25.360 -then - -0:19:22.320,0:19:27.039 -0 sine of zero is going to be zero so +then 0 sine of zero is going to be zero so 0:19:25.360,0:19:28.400 -you're gonna be you're gonna be on the x - -0:19:27.039,0:19:31.120 -axis +you're gonna be you're gonna be on the x axis 0:19:28.400,0:19:32.080 -so over here this is gonna be my initial - -0:19:31.120,0:19:34.799 -point +so over here this is gonna be my initial point 0:19:32.080,0:19:35.360 -uh how far is this point from the green - -0:19:34.799,0:19:38.480 -x +uh how far is this point from the green x 0:19:35.360,0:19:41.840 -let's count so we have one two boxes - -0:19:38.480,0:19:45.120 -three four boxes five six +let's count so we have one two boxes three four boxes five six 0:19:41.840,0:19:48.480 -six boxes and seven right so - -0:19:45.120,0:19:50.799 -two boxes are one right so seven +six boxes and seven right so two boxes are one right so seven 0:19:48.480,0:19:51.520 -boxes means we have three and a half - -0:19:50.799,0:19:54.799 -right +boxes means we have three and a half right 0:19:51.520,0:19:57.440 -so if i count it correctly one two three - 
-0:19:54.799,0:19:58.160 -three and a half so the distance between +so if i count it correctly one two three three and a half so the distance between 0:19:57.440,0:20:00.720 -this point - -0:19:58.160,0:20:01.520 -over here and the green guy over here +this point over here and the green guy over here 0:20:00.720,0:20:04.159 -it's roughly - -0:20:01.520,0:20:04.799 -three and a half now if you take three +it's roughly three and a half now if you take three 0:20:04.159,0:20:07.919 -and a half - -0:20:04.799,0:20:10.320 -and you square it you get +and a half and you square it you get 0:20:07.919,0:20:11.919 -yeah you guess it's right it's 12 right - -0:20:10.320,0:20:12.480 -and that's why we get this point over +yeah you guess it's right it's 12 right and that's why we get this point over 0:20:11.919,0:20:13.840 -here - -0:20:12.480,0:20:16.480 -you don't trust me take out the +here you don't trust me take out the 0:20:13.840,0:20:17.039 -calculator and check how much is 3.5 - -0:20:16.480,0:20:20.240 -squared +calculator and check how much is 3.5 squared 0:20:17.039,0:20:23.280 -okay anyhow so that's why we start - -0:20:20.240,0:20:26.159 -at this location uh here 12 right +okay anyhow so that's why we start at this location uh here 12 right 0:20:23.280,0:20:26.480 -as we keep uh increasing z and we go - -0:20:26.159,0:20:29.280 -from +as we keep uh increasing z and we go from 0:20:26.480,0:20:30.880 -zero to pi half we end up at this - -0:20:29.280,0:20:34.080 -location over here +zero to pi half we end up at this location over here 0:20:30.880,0:20:36.000 -and then as we keep going until pi - -0:20:34.080,0:20:37.200 -you're gonna get and ending up in this +and then as we keep going until pi you're gonna get and ending up in this 0:20:36.000,0:20:40.480 -location over here - -0:20:37.200,0:20:44.159 -and as you can tell uh pi +location over here and as you can tell uh pi 0:20:40.480,0:20:46.640 -you're gonna be at one square away from - -0:20:44.159,0:20:47.760 
-this green boy and so one square is +you're gonna be at one square away from this green boy and so one square is 0:20:46.640,0:20:52.159 -gonna be - -0:20:47.760,0:20:55.360 -0.5 0.5 square is 0.25 +gonna be 0.5 0.5 square is 0.25 0:20:52.159,0:20:55.840 -and therefore the height of this red - -0:20:55.360,0:20:59.280 -curve +and therefore the height of this red curve 0:20:55.840,0:21:02.640 -at this location over here it's 0.25 - -0:20:59.280,0:21:05.760 -very close to zero okay and then +at this location over here it's 0.25 very close to zero okay and then 0:21:02.640,0:21:06.559 -we still keep cranking up that z and we - -0:21:05.760,0:21:10.080 -go to +we still keep cranking up that z and we go to 0:21:06.559,0:21:12.159 -three three half pi and then you keep - -0:21:10.080,0:21:13.840 -going up to two pi right and two pi +three three half pi and then you keep going up to two pi right and two pi 0:21:12.159,0:21:14.880 -we're gonna be basically getting up to - -0:21:13.840,0:21:18.000 -the same location +we're gonna be basically getting up to the same location 0:21:14.880,0:21:19.760 -where we started okay and then if you - -0:21:18.000,0:21:21.280 -keep going you're gonna repeat this one +where we started okay and then if you keep going you're gonna repeat this one 0:21:19.760,0:21:23.919 -it's gonna be going up and down - -0:21:21.280,0:21:25.360 -up and down up and down all right all +it's gonna be going up and down up and down up and down all right all 0:21:23.919,0:21:28.159 -right so this looks pretty - -0:21:25.360,0:21:30.000 -okay i think no no no crazy stuff but +right so this looks pretty okay i think no no no crazy stuff but 0:21:28.159,0:21:32.000 -then we saw the other one was kind of - -0:21:30.000,0:21:33.679 -wiggly right what happened there so +then we saw the other one was kind of wiggly right what happened there so 0:21:32.000,0:21:36.720 -instead of using the y - -0:21:33.679,0:21:38.720 -23 we're going to be using now the y10 +instead of 
using the y 23 we're going to be using now the y10 0:21:36.720,0:21:40.480 -which is this - -0:21:38.720,0:21:42.480 -thing right like a signature like yarn +which is this thing right like a signature like jan's 0:21:40.480,0:21:45.039 -signature all right - -0:21:42.480,0:21:45.600 -so what happened here so in this case +signature all right so what happened here so in this case 0:21:45.039,0:21:48.799 -our - -0:21:45.600,0:21:52.400 -y prime which is the peak we have +our y prime which is the pick we have 0:21:48.799,0:21:54.000 -from my possible wise is this guy over - -0:21:52.400,0:21:56.400 -here the top +from my possible y's is this guy over here the top 0:21:54.000,0:21:58.080 -x over here and again as i told you - -0:21:56.400,0:22:00.159 -before whenever z +x over here and again as i told you before whenever z 0:21:58.080,0:22:01.440 -is equal to 0 we start at this location - -0:22:00.159,0:22:03.120 -over here +is equal to 0 we start at this location over here 0:22:01.440,0:22:04.480 -so if you have understood what i'm - -0:22:03.120,0:22:06.559 -talking about and now we're going to be +so if you have understood what i'm talking about and now we're going to be 0:22:04.480,0:22:07.760 -doing an exercise such that you answer - -0:22:06.559,0:22:09.760 -me +doing an exercise such that you answer me 0:22:07.760,0:22:11.679 -can you tell me what is the distance - -0:22:09.760,0:22:15.440 -between this location over here +can you tell me what is the distance between this location over here 0:22:11.679,0:22:17.120 -and this point over here so question for - -0:22:15.440,0:22:21.120 -the people at home +and this point over here so question for the people at home 0:22:17.120,0:22:24.840 -can anyone tell me what is the length - -0:22:21.120,0:22:28.320 -of this segment i just +can anyone tell me what is the length of this segment i just 0:22:24.840,0:22:31.760 -draw and i'm - -0:22:28.320,0:22:35.280 -okay 1.5 times 1.4 +drew and i'm okay 1.5 times 1.4
0:22:31.760,0:22:37.520 -which is square root of 2. yes so - -0:22:35.280,0:22:39.200 -that's correct and if you square it +which is square root of 2. yes so that's correct and if you square it 0:22:37.520,0:22:43.840 -you're gonna have what - -0:22:39.200,0:22:47.760 -is going to be 1.5 times 1.5 times 2 +you're gonna have what is going to be 1.5 times 1.5 times 2 0:22:43.840,0:22:49.280 -right i just squared so you said 1.5 - -0:22:47.760,0:22:50.559 -times square root of 2. i'm just +right i just squared so you said 1.5 times square root of 2. i'm just 0:22:49.280,0:22:50.880 -squaring everything so we're going to - -0:22:50.559,0:22:54.720 -get +squaring everything so we're going to get 0:22:50.880,0:22:55.760 -1.5 squared times 2. so 1.5 times 2 it's - -0:22:54.720,0:23:00.320 -3 +1.5 squared times 2. so 1.5 times 2 it's 3 0:22:55.760,0:23:02.960 -and 3 times 1.4 1.5 is 4.5 right - -0:23:00.320,0:23:03.679 -and so we can determine that my initial +and 3 times 1.4 1.5 is 4.5 right and so we can determine that my initial 0:23:02.960,0:23:05.919 -energy - -0:23:03.679,0:23:06.720 -which is the square length of this +energy which is the square length of this 0:23:05.919,0:23:10.000 -segment - -0:23:06.720,0:23:13.679 -is going to be 4.5 which is exactly what +segment is going to be 4.5 which is exactly what 0:23:10.000,0:23:16.640 -this initial value over here is okay so - -0:23:13.679,0:23:16.640 -this point over here +this initial value over here is okay so this point over here 0:23:16.880,0:23:24.720 -it's 4.5 um - -0:23:21.360,0:23:27.039 -can you just repeat why you know that +it's 4.5 um can you just repeat why you know that 0:23:24.720,0:23:28.159 -z equals zero corresponds to the - -0:23:27.039,0:23:31.760 -leftmost point +z equals zero corresponds to the leftmost point 0:23:28.159,0:23:32.799 -yes so i know that uh this is because i - -0:23:31.760,0:23:37.520 -checked the code +yes so i know that uh this is because i checked the code 
0:23:32.799,0:23:40.960 -i know that my w1 it's um - -0:23:37.520,0:23:44.400 -it's equal to something uh that is +i know that my w1 it's um it's equal to something uh that is 0:23:40.960,0:23:47.840 -uh minus 1.5 right - -0:23:44.400,0:23:51.840 -something like that minus one +uh minus 1.5 right something like that minus one 0:23:47.840,0:23:51.840 -point five - -0:23:52.400,0:23:57.200 -okay and then we have the w tool +point five okay and then we have the w two 0:24:01.039,0:24:04.159 -and i'm drawing with the touchpad so - -0:24:03.200,0:24:07.200 -it's crazy +and i'm drawing with the touchpad so it's crazy 0:24:04.159,0:24:10.880 -uh this is 0.3 - -0:24:07.200,0:24:10.880 -0.4 something like that +uh this is 0.3 0.4 something like that 0:24:11.200,0:24:15.600 -zero point let's say four - -0:24:15.919,0:24:22.240 -that looks like a one but okay +zero point let's say four that looks like a one but okay 0:24:19.520,0:24:22.720 -okay believe me it's a four okay when we - -0:24:22.240,0:24:26.080 -go pi +okay believe me it's a four okay when we go pi 0:24:22.720,0:24:28.720 -half we are roughly uh one unit - -0:24:26.080,0:24:30.000 -away from this point so one square is +half we are roughly uh one unit away from this point so one square is 0:24:28.720,0:24:32.559 -going to be roughly one - -0:24:30.000,0:24:33.679 -right i mean some something roughly one +going to be roughly one right i mean some something roughly one 0:24:32.559,0:24:35.600 -square is gonna still be - -0:24:33.679,0:24:36.960 -there so this height over here is gonna +square is gonna still be there so this height over here is gonna 0:24:35.600,0:24:39.120 -be one - -0:24:36.960,0:24:40.480 -and then we climb up to this location +be one and then we climb up to this location 0:24:39.120,0:24:43.600 -over here - -0:24:40.480,0:24:48.000 -uh in this location over here +over here uh in this location over here 0:24:43.600,0:24:51.120 -we should basically get the same - -0:24:48.000,0:24:52.480 -point
over here so then over +we should basically get the same point over here so then over 0:24:51.120,0:24:54.400 -here we're going to get a similar value - -0:24:52.480,0:24:56.960 -a little bit smaller and then we +here we're going to get a similar value a little bit smaller and then we 0:24:54.400,0:24:57.600 -oh what happened here so when we go to - -0:24:56.960,0:25:00.799 -three +oh what happened here so when we go to three 0:24:57.600,0:25:02.720 -three half pi we actually are - -0:25:00.799,0:25:04.080 -at this location over here and we have +three half pi we actually are at this location over here and we have 0:25:02.720,0:25:06.000 -another minimum right - -0:25:04.080,0:25:07.360 -what happened here so basically you had +another minimum right what happened here so basically you had 0:25:06.000,0:25:10.559 -this point is - -0:25:07.360,0:25:13.360 -closer to my green guy than +this point is closer to my green guy than 0:25:10.559,0:25:14.320 -a point over here right and so in this - -0:25:13.360,0:25:16.720 -case +a point over here right and so in this case 0:25:14.320,0:25:17.840 -this function here this energy has a - -0:25:16.720,0:25:20.320 -local minima +this function here this energy has a local minima 0:25:17.840,0:25:23.520 -which is happening at three three half - -0:25:20.320,0:25:26.960 -pi at this location over here +which is happening at three three half pi at this location over here 0:25:23.520,0:25:28.000 -all right cool uh let's go back to the - -0:25:26.960,0:25:29.840 -arrow +all right cool uh let's go back to the arrow 0:25:28.000,0:25:31.440 -okay so now we determine that this - -0:25:29.840,0:25:34.400 -height was 4.5 +okay so now we determine that this height was 4.5 0:25:31.440,0:25:36.000 -this was one and then this something we - -0:25:34.400,0:25:39.760 -can figure this is gonna be like two +this was one and then this something we can figure this is gonna be like two 0:25:36.000,0:25:39.760 -square this is gonna be four okay - 
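The arithmetic in this walkthrough can be checked numerically. A minimal sketch, assuming the ellipse decoder Dec(z) = (w1 cos z, w2 sin z) and values eyeballed from the plot, w1 ≈ -1.5, w2 ≈ 0.4, with the 23rd sample sitting at roughly (2, 0); these numbers are read off the drawing, not taken from the lecture's code:

```python
import numpy as np

# Assumed values, eyeballed from the lecture's plot (not exact):
w1, w2 = -1.5, 0.4            # decoder weights: w1 negative, w2 positive
y23 = np.array([2.0, 0.0])    # the "green x", 3.5 units right of Dec(0)

def decoder(z):
    """Dec(z) = (w1*cos z, w2*sin z): a point on an ellipse."""
    return np.array([w1 * np.cos(z), w2 * np.sin(z)])

def energy(y, z):
    """E(y, z): squared distance between y and the decoded point."""
    return np.sum((y - decoder(z)) ** 2)

print(energy(y23, 0.0))        # 3.5 ** 2 = 12.25, the "roughly 12" start
print(energy(y23, np.pi))      # 0.5 ** 2 = 0.25, the bottom of the U
print(energy(y23, 2 * np.pi))  # back to 12.25: the energy repeats in z
```

With these stand-in numbers the curve starts near 12, dips to 0.25 at z = pi, and repeats every 2 pi, which is the U shape traced above.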
-0:25:40.159,0:25:44.880 -okay so what happened now oh all this +square this is gonna be four okay okay so what happened now oh all this 0:25:43.600,0:25:48.000 -stuff is still here - -0:25:44.880,0:25:49.360 -okay clean all right free energy so what +stuff is still here okay clean all right free energy so what 0:25:48.000,0:25:51.840 -is this free energy - -0:25:49.360,0:25:53.679 -so we're gonna figure that out right now +is this free energy so we're gonna figure that out right now 0:25:51.840,0:25:56.320 -so the free energy - -0:25:53.679,0:25:58.159 -um actually this is the zero temperature +so the free energy um actually this is the zero temperature 0:25:56.320,0:26:00.400 -limit of the free energy - -0:25:58.159,0:26:01.279 -it's going to be simply the mean minimum +limit of the free energy it's going to be simply the minimum 0:26:00.400,0:26:04.480 -value of this - -0:26:01.279,0:26:07.840 -e function with respect to z +value of this e function with respect to z 0:26:04.480,0:26:11.200 -so we can compute this - -0:26:07.840,0:26:13.440 -z check which is gonna be equal +so we can compute this z check which is gonna be equal 0:26:11.200,0:26:15.520 -we can define it as being the arc mean - -0:26:13.440,0:26:18.559 -of this energy function +we can define it as being the argmin of this energy function 0:26:15.520,0:26:21.039 -uh with respect to z why the check - -0:26:18.559,0:26:23.600 -well because the check is pointing +uh with respect to z why the check well because the check is pointing 0:26:21.039,0:26:26.320 -downwards right so whenever i minimize - -0:26:23.600,0:26:27.520 -my energy i found the location that is +downwards right so whenever i minimize my energy i found the location that is 0:26:26.320,0:26:29.279 -the lowest one - -0:26:27.520,0:26:30.559 -and theref that's why i'm gonna put the +the lowest one and therefore that's why i'm gonna put the 0:26:29.279,0:26:33.840 -check means means - -0:26:30.559,0:26:36.880 -that z is the location
where the uh +check means means that z is the location where the uh 0:26:33.840,0:26:40.880 -the energy is the lowest okay - -0:26:36.880,0:26:41.279 -um and how do can we find that set right +the energy is the lowest okay um and how do can we find that z right 0:26:40.880,0:26:44.720 -so - -0:26:41.279,0:26:47.360 -if um if z it's basically +so if um if z it's basically 0:26:44.720,0:26:48.320 -uh discrete like let's say we have like - -0:26:47.360,0:26:50.240 -k means +uh discrete like let's say we have like k means 0:26:48.320,0:26:51.520 -uh we have we can do exhaustive search - -0:26:50.240,0:26:55.120 -we can check every z +uh we have we can do exhaustive search we can check every z 0:26:51.520,0:26:57.120 -we have otherwise we can use - -0:26:55.120,0:26:59.039 -techniques like gradient based +we have otherwise we can use techniques like gradient based 0:26:57.120,0:27:02.080 -techniques such as - -0:26:59.039,0:27:03.919 -gradient descent and keep +techniques such as gradient descent and keep 0:27:02.080,0:27:05.760 -like pay attention i didn't say - -0:27:03.919,0:27:07.919 -stochastic gradient descent +like pay attention i didn't say stochastic gradient descent 0:27:05.760,0:27:09.840 -because here we are not doing any - -0:27:07.919,0:27:13.440 -stochastic something right +because here we are not doing any stochastic something right 0:27:09.840,0:27:15.760 -uh e is a function of what we know - -0:27:13.440,0:27:17.679 -everything uh when we do stochastic a in +uh e is a function of what we know everything uh when we do stochastic gradient descent 0:27:15.760,0:27:19.760 -the same we are minimizing that - -0:27:17.679,0:27:21.200 -loss function which is expressed as an +we are minimizing that loss function which is expressed as an 0:27:19.760,0:27:24.000 -average of those pair - -0:27:21.200,0:27:25.120 -sample loss functions right here instead +average of those per-sample loss functions right here instead 0:27:24.000,0:27:28.320 -we are minimizing - 
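For a one-dimensional z, the exhaustive-search option just mentioned is easy to sketch: evaluate E on a fine grid of z and keep the smallest value and its location ž. The decoder weights and target below are eyeballed stand-ins (w1 ≈ -1.5, w2 ≈ 0.4, y' ≈ (2, 0)), not the lecture's exact numbers:

```python
import numpy as np

# Assumed stand-in values, eyeballed from the plot (not exact):
w1, w2 = -1.5, 0.4
y_prime = np.array([2.0, 0.0])

def energy(y, z):
    """E(y, z) = ||y - Dec(z)||^2 for the ellipse decoder."""
    dec = np.array([w1 * np.cos(z), w2 * np.sin(z)])
    return np.sum((y - dec) ** 2)

def free_energy(y, n_grid=10_000):
    """Zero-temperature free energy F(y) = min_z E(y, z), by grid search."""
    zs = np.linspace(0.0, 2.0 * np.pi, n_grid)
    Es = np.array([energy(y, z) for z in zs])
    i = np.argmin(Es)
    return Es[i], zs[i]            # (F(y), z-check)

F, z_check = free_energy(y_prime)  # F near 0.25, z-check near pi
```

Grid search cannot get trapped in a local minimum, which makes it a handy baseline before reaching for the gradient-based techniques.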
-0:27:25.120,0:27:30.159 -this specific value of e there is no +we are minimizing this specific value of e there is no 0:27:28.320,0:27:32.000 -average so it's not stochastic - -0:27:30.159,0:27:33.919 -and therefore you're going to be using +average so it's not stochastic and therefore you're going to be using 0:27:32.000,0:27:35.279 -you can use algorithms such that - -0:27:33.919,0:27:39.279 -conjugate gradient +you can use algorithms such as conjugate gradient 0:27:35.279,0:27:42.480 -line search lbf gs and so on okay - -0:27:39.279,0:27:43.279 -so let's look at and let's figure out +line search lbfgs and so on okay so let's look at and let's figure out 0:27:42.480,0:27:45.919 -what is this - -0:27:43.279,0:27:47.520 -free energy right how it works so given +what is this free energy right how it works so given 0:27:45.919,0:27:51.440 -that we have defined this - -0:27:47.520,0:27:53.200 -uh z check this uh free energy +that we have defined this uh z check this uh free energy 0:27:51.440,0:27:54.960 -the the zero limit for the free energy - -0:27:53.200,0:27:57.679 -is going to simply be this +the the zero limit for the free energy is going to simply be this 0:27:54.960,0:27:58.000 -energy e computed in the location of my - -0:27:57.679,0:28:02.159 -z +energy e computed in the location of my z 0:27:58.000,0:28:05.279 -check so let's visualize here - -0:28:02.159,0:28:07.600 -this uh e uh so this +check so let's visualize here this uh e uh so this 0:28:05.279,0:28:09.760 -e10 all the energy for the sample when i - -0:28:07.600,0:28:12.799 -pick the sample pen +e10 all the energy for the sample when i pick the sample ten 0:28:09.760,0:28:17.120 -i initialize my latent variable z - -0:28:12.799,0:28:19.360 -tilde the orange one with some volume +i initialize my latent variable z tilde the orange one with some value 0:28:17.120,0:28:20.640 -and then i'm gonna be running a - -0:28:19.360,0:28:23.440 -gradient base method for +and then i'm gonna be running a
gradient-based method for 0:28:20.640,0:28:25.360 -minimization therefore i end up in the - -0:28:23.440,0:28:27.600 -blue location which is my z +minimization therefore i end up in the blue location which is my z 0:28:25.360,0:28:28.720 -check and it's blue because it's cold so - -0:28:27.600,0:28:30.640 -it's like low +check and it's blue because it's cold so it's like low 0:28:28.720,0:28:32.240 -i usually think about this energy as - -0:28:30.640,0:28:35.200 -being like a temperature right +i usually think about this energy as being like a temperature right 0:28:32.240,0:28:37.360 -i mean if you multiply by the boltzmann - -0:28:35.200,0:28:40.000 -boltzmann constant no k +i mean if you multiply by the boltzmann constant you know k 0:28:37.360,0:28:40.880 -kt uh you're gonna get like the some - -0:28:40.000,0:28:43.279 -energy right +kt uh you're gonna get like some energy right 0:28:40.880,0:28:45.440 -so energy and temperature are very uh - -0:28:43.279,0:28:48.640 -very closely related +so energy and temperature are very uh very closely related 0:28:45.440,0:28:49.840 -um and so again i use the the blue to - -0:28:48.640,0:28:53.279 -show you that is low +um and so again i use the the blue to show you that is low 0:28:49.840,0:28:55.600 -and cold and so at that location that z - -0:28:53.279,0:28:56.480 -check the yeah at that location we +and cold and so at that location that z check the yeah at that location we 0:28:55.600,0:28:59.520 -reached the minimum - -0:28:56.480,0:29:02.640 -of this energy and that is my uh +reached the minimum of this energy and that is my uh 0:28:59.520,0:29:05.039 -free energy the zero limit for the free - -0:29:02.640,0:29:05.039 -energy +free energy the zero limit for the free energy 0:29:05.200,0:29:10.960 -cool cool cool so so in practice - -0:29:08.640,0:29:12.000 -this could depend on the initialization +cool cool cool so so in practice this could depend on the initialization 0:29:10.960,0:29:15.679 -then - 
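The loop just described, initialize z-tilde, run plain gradient descent on z (not SGD: nothing here is stochastic), land on z-check, and read the free energy off as E(z-check), can be sketched in PyTorch. The decoder weights and target are the same eyeballed stand-ins as before, not the lecture's exact values:

```python
import math
import torch

# Assumed stand-in values, eyeballed from the plot (not exact):
w1, w2 = -1.5, 0.4
y = torch.tensor([2.0, 0.0])           # the target sample y'

def energy(z):
    """E(y, z) = ||y - Dec(z)||^2 with Dec(z) = (w1 cos z, w2 sin z)."""
    dec = torch.stack((w1 * torch.cos(z), w2 * torch.sin(z)))
    return torch.sum((y - dec) ** 2)

def minimize(z_init, steps=500, lr=0.1):
    """Plain (deterministic) gradient descent on the latent z."""
    z = torch.tensor(z_init, requires_grad=True)
    for _ in range(steps):
        E = energy(z)
        E.backward()                    # dE/dz
        with torch.no_grad():
            z -= lr * z.grad            # one descent step
        z.grad.zero_()
    return z.item(), energy(z).item()   # (z-check, free energy estimate)

z_check, F = minimize(z_init=1.0)       # lands near pi, where E = 0.25
```

For a wiggly energy like e10's, the same loop can land in a local minimum depending on z_init, which is exactly the initialization caveat raised here.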
-0:29:12.000,0:29:18.960 -oh yeah oh yeah so well +then oh yeah oh yeah so well 0:29:15.679,0:29:20.240 -yeah the initialization uh so your - -0:29:18.960,0:29:23.039 -algorithm +yeah the initialization uh so your algorithm 0:29:20.240,0:29:25.440 -will screw up depending on the - -0:29:23.039,0:29:29.360 -initialization for sure +will screw up depending on the initialization for sure 0:29:25.440,0:29:31.120 -so i can show you later on that lbfgs - -0:29:29.360,0:29:34.240 -actually gets the wrong +so i can show you later on that lbfgs actually gets the wrong 0:29:31.120,0:29:36.320 -minimum but nevertheless the free energy - -0:29:34.240,0:29:39.440 -is the global minimum right +minimum but nevertheless the free energy is the global minimum right 0:29:36.320,0:29:42.000 -so i'm telling you here that - -0:29:39.440,0:29:43.039 -the value the minimum value of e is the +so i'm telling you here that the value the minimum value of e is the 0:29:42.000,0:29:44.880 -free energy - -0:29:43.039,0:29:46.159 -if we don't get there because we don't +free energy if we don't get there because we don't 0:29:44.880,0:29:48.720 -know how to get there and then - -0:29:46.159,0:29:50.640 -it's a different issue right so it's not +know how to get there and then it's a different issue right so it's not 0:29:48.720,0:29:54.399 -dependent on the initialization - -0:29:50.640,0:29:57.200 -uh the initialization will make +dependent on the initialization uh the initialization will make 0:29:54.399,0:29:58.480 -your algorithm more or less likely to - -0:29:57.200,0:30:01.600 -converge to the actual +your algorithm more or less likely to converge to the actual 0:29:58.480,0:30:04.880 -correct solution all right - -0:30:01.600,0:30:08.080 -so what happens here +correct solution all right so what happens here 0:30:04.880,0:30:10.480 -so in this case here we have uh the blue - -0:30:08.080,0:30:11.760 -points are my points from the training +so in this case here we have uh the blue points are 
my points from the training 0:30:10.480,0:30:15.200 -distribution - -0:30:11.760,0:30:18.880 -the tilde one are you know same poles +distribution the tilde one are you know samples 0:30:15.200,0:30:22.559 -from my distrib from my model - -0:30:18.880,0:30:25.200 -then my y prime is the peak i have +from my distribution from my model then my y prime is the pick i have 0:30:22.559,0:30:26.799 -chosen right so this is my 10th item in - -0:30:25.200,0:30:31.039 -the training set +chosen right so this is my 10th item in the training set 0:30:26.799,0:30:33.760 -then my z-tilde which is the initialized - -0:30:31.039,0:30:34.799 -the value i initialize z with if i send +then my z-tilde which is the initialized the value i initialize z with if i send 0:30:33.760,0:30:36.559 -it through the - -0:30:34.799,0:30:38.320 -decoder i showed you before it's going +it through the decoder i showed you before it's going 0:30:36.559,0:30:41.760 -to generate this point - -0:30:38.320,0:30:44.720 -here this this location over here +to generate this point here this this location over here 0:30:41.760,0:30:46.000 -then i can run some some some - -0:30:44.720,0:30:48.799 -minimization algorithm +then i can run some some some minimization algorithm 0:30:46.000,0:30:50.960 -and then you end up in that location the - -0:30:48.799,0:30:53.360 -blue location and the blue x +and then you end up in that location the blue location and the blue x 0:30:50.960,0:30:54.399 -is going to be the you know decoded - -0:30:53.360,0:30:58.080 -version of the z +is going to be the you know decoded version of the z 0:30:54.399,0:31:01.440 -check which is the closest item to this - -0:30:58.080,0:31:03.519 -green boy over here so +check which is the closest item to this green boy over here so 0:31:01.440,0:31:05.120 -why are we doing this stuff here how can - -0:31:03.519,0:31:07.679 -we use this model what we use +why are we doing this stuff here how can we use this model what we use 0:31:05.120,0:31:08.960
-can you what can we use this model for - -0:31:07.679,0:31:11.519 -so we can think about +can you what can we use this model for so we can think about 0:31:08.960,0:31:13.200 -you know if we have someone has trained - -0:31:11.519,0:31:17.360 -this model and has given +you know if we have someone has trained this model and has given 0:31:13.200,0:31:20.960 -that to us we can potentially - -0:31:17.360,0:31:24.799 -find what is the closest value +that to us we can potentially find what is the closest value 0:31:20.960,0:31:26.559 -in our possible you know - -0:31:24.799,0:31:28.640 -set of all possible values we can +in our possible you know set of all possible values we can 0:31:26.559,0:31:32.000 -generate which is the closest - -0:31:28.640,0:31:34.240 -to your sample but so we can use this +generate which is the closest to your sample but so we can use this 0:31:32.000,0:31:36.799 -for performing denoising for example so - -0:31:34.240,0:31:38.480 -if i have an image which is corrupted +for performing denoising for example so if i have an image which is corrupted 0:31:36.799,0:31:41.360 -which is going to be there for far from - -0:31:38.480,0:31:44.080 -my manifold the model manifold +which is going to be there for far from my manifold the model manifold 0:31:41.360,0:31:45.120 -then i can ask my model hey model can - -0:31:44.080,0:31:48.399 -you tell me what +then i can ask my model hey model can you tell me what 0:31:45.120,0:31:51.919 -is the latent which is gonna give you - -0:31:48.399,0:31:54.640 -the uh decoded version the decoded +is the latent which is gonna give you the uh decoded version the decoded 0:31:51.919,0:31:56.320 -uh item here which is the closest as - -0:31:54.640,0:31:58.960 -possible to the +uh item here which is the closest as possible to the 0:31:56.320,0:32:00.480 -uh image i'm looking at and then - -0:31:58.960,0:32:03.279 -potentially we could just +uh image i'm looking at and then potentially we could just 0:32:00.480,0:32:04.559 
-pick this value over here as you know uh - -0:32:03.279,0:32:08.080 -cleaned up version +pick this value over here as you know uh cleaned up version 0:32:04.559,0:32:10.799 -of my uh corrupted input - -0:32:08.080,0:32:12.159 -what is the energy uh the free energy +of my uh corrupted input what is the energy uh the free energy 0:32:10.799,0:32:14.559 -the free energy now - -0:32:12.159,0:32:15.600 -it's simply the square distance between +the free energy now it's simply the square distance between 0:32:14.559,0:32:18.399 -the green point - -0:32:15.600,0:32:20.240 -and the blue x right so if you take this +the green point and the blue x right so if you take this 0:32:18.399,0:32:21.200 -location these two boxes which is - -0:32:20.240,0:32:24.000 -basically one +location these two boxes which is basically one 0:32:21.200,0:32:24.559 -one square which is rough again one is - -0:32:24.000,0:32:27.600 -going to be +one square which is rough again one is going to be 0:32:24.559,0:32:31.279 -the free energy corresponding to this x - -0:32:27.600,0:32:33.919 -over here okay so every x every location +the free energy corresponding to this x over here okay so every x every location 0:32:31.279,0:32:35.279 -here in the training manifold will have - -0:32:33.919,0:32:38.720 -a +here in the training manifold will have a 0:32:35.279,0:32:40.240 -free energy which is determining what is - -0:32:38.720,0:32:43.279 -the +free energy which is determining what is the 0:32:40.240,0:32:44.559 -distance to the the what is the closest - -0:32:43.279,0:32:47.600 -distance to the manifold +distance to the the what is the closest distance to the manifold 0:32:44.559,0:32:50.799 -okay so you can see in this case - -0:32:47.600,0:32:51.519 -that let's say uh our model is well +okay so you can see in this case that let's say uh our model is well 0:32:50.799,0:32:54.159 -trained - -0:32:51.519,0:32:54.960 -we can tell that this location over here +trained we can tell that this location over 
here 0:32:54.159,0:32:58.000 -has much - -0:32:54.960,0:32:58.880 -a much lower free energy than a point +has much a much lower free energy than a point 0:32:58.000,0:33:01.600 -over here - -0:32:58.880,0:33:02.480 -and so these points could be more likely +over here and so these points could be more likely 0:33:01.600,0:33:05.600 -coming from - -0:33:02.480,0:33:08.000 -these you know uh could be compatible +coming from these you know uh could be compatible 0:33:05.600,0:33:11.200 -with what this model has been trained on - -0:33:08.000,0:33:12.159 -or like we show in this case this model +with what this model has been trained on or like we show in this case this model 0:33:11.200,0:33:14.880 -is definitely - -0:33:12.159,0:33:16.159 -not well trained uh so what do you mean +is definitely not well trained uh so what do you mean 0:33:14.880,0:33:18.799 -by well trained - -0:33:16.159,0:33:19.840 -uh in this case here just for you know +by well trained uh in this case here just for you know 0:33:18.799,0:33:23.440 -pedagogical - -0:33:19.840,0:33:24.000 -uh sake i didn't train fully in this +pedagogical uh sake i didn't train fully in this 0:33:23.440,0:33:27.440 -model - -0:33:24.000,0:33:30.559 -such that there are errors in ideally +model such that there are errors in ideally 0:33:27.440,0:33:33.120 -those purple points should be exactly - -0:33:30.559,0:33:34.080 -matching those blue points and that +those purple points should be exactly matching those blue points and that 0:33:33.120,0:33:36.559 -would be you know - -0:33:34.080,0:33:37.840 -a well-trained model which is capturing +would be you know a well-trained model which is capturing 0:33:36.559,0:33:40.480 -all the dependencies - -0:33:37.840,0:33:42.000 -between those y variables and this is +all the dependencies between those y variables and this is 0:33:40.480,0:33:44.799 -again one cross section - -0:33:42.000,0:33:45.360 -right of that horn this is a not well +again one cross section right of that 
horn this is a not well 0:33:44.799,0:33:47.679 -trained - -0:33:45.360,0:33:48.480 -model which means i stopped training +trained model which means i stopped training 0:33:47.679,0:33:51.120 -after - -0:33:48.480,0:33:53.039 -a few epochs and therefore the model +after a few epochs and therefore the model 0:33:51.120,0:33:54.799 -tried to get there but it didn't quite - -0:33:53.039,0:33:58.399 -manage to get yet there +tried to get there but it didn't quite manage to get yet there 0:33:54.799,0:34:00.480 -uh we can think about that - -0:33:58.399,0:34:02.799 -or we can think about this is a +uh we can think about that or we can think about this is a 0:34:00.480,0:34:05.039 -well-trained model so you actually learn - -0:34:02.799,0:34:06.159 -properly and then these points here are +well-trained model so you actually learn properly and then these points here are 0:34:05.039,0:34:08.879 -much further away - -0:34:06.159,0:34:10.720 -so by computing the free energy of these +much further away so by computing the free energy of these 0:34:08.879,0:34:14.000 -points you can have like a - -0:34:10.720,0:34:17.359 -measure of how far they are from the +points you can have like a measure of how far they are from the 0:34:14.000,0:34:19.359 -learned distribution okay - -0:34:17.359,0:34:20.800 -all right so let's move on and let's +learned distribution okay all right so let's move on and let's 0:34:19.359,0:34:24.320 -look now instead - -0:34:20.800,0:34:27.040 -of the the 23rd right and the 23rd u +look now instead of the the 23rd right and the 23rd u 0:34:24.320,0:34:28.639 -shape and so in this case instead oh - -0:34:27.040,0:34:30.960 -it's much easier we just have a global +shape and so in this case instead oh it's much easier we just have a global 0:34:28.639,0:34:34.079 -minimum and a global maximum - -0:34:30.960,0:34:36.159 -so there is a question if this were +minimum and a global maximum so there is a question if this were 0:34:34.079,0:34:37.200 -for 
denoising and the model was trained - -0:34:36.159,0:34:40.079 -to the point +for denoising and the model was trained to the point 0:34:37.200,0:34:41.280 -where the t-points were on top of the - -0:34:40.079,0:34:46.879 -till points +where the t-points were on top of the till points 0:34:41.280,0:34:46.879 -wouldn't it be not do any denoising - -0:34:47.280,0:34:51.599 -so i believe that you're saying if the +wouldn't it be not do any denoising so i believe that you're saying if the 0:34:48.960,0:34:54.560 -till points are far away from the - -0:34:51.599,0:34:56.720 -uh from the from the purple one right so +till points are far away from the uh from the from the purple one right so 0:34:54.560,0:34:59.280 -that would mean that the model would - -0:34:56.720,0:35:01.119 -would be not well trained right so if we +that would mean that the model would would be not well trained right so if we 0:34:59.280,0:35:03.119 -yeah if these blue points were like up - -0:35:01.119,0:35:03.760 -here and all of them would have been you +yeah if these blue points were like up here and all of them would have been you 0:35:03.119,0:35:07.520 -know - -0:35:03.760,0:35:10.160 -uh closer to some point over over here +know uh closer to some point over over here 0:35:07.520,0:35:11.599 -that means that this model is bad is - -0:35:10.160,0:35:13.520 -badly trained right +that means that this model is bad is badly trained right 0:35:11.599,0:35:15.839 -so again today we don't talk about - -0:35:13.520,0:35:16.800 -training so this is simply what has been +so again today we don't talk about training so this is simply what has been 0:35:15.839,0:35:19.040 -given to us - -0:35:16.800,0:35:20.560 -and we just play uh with what we have +given to us and we just play uh with what we have 0:35:19.040,0:35:23.359 -and try to figure out - -0:35:20.560,0:35:23.920 -what this energy and what is free energy +and try to figure out what this energy and what is free energy 0:35:23.359,0:35:26.720 -mean - 
-0:35:23.920,0:35:27.839 -okay so this is how to use this stuff +mean okay so this is how to use this stuff 0:35:26.720,0:35:31.520 -rather than - -0:35:27.839,0:35:33.520 -to learn this stuff learning next time +rather than to learn this stuff learning next time 0:35:31.520,0:35:36.160 -it's enough to understand how to use - -0:35:33.520,0:35:38.800 -this trust me +it's enough to understand how to use this trust me 0:35:36.160,0:35:40.000 -all right so let's figure out what's - -0:35:38.800,0:35:42.720 -going on with this u +all right so let's figure out what's going on with this u 0:35:40.000,0:35:43.119 -shape so the u shape instead comes from - -0:35:42.720,0:35:46.240 -this +shape so the u shape instead comes from this 0:35:43.119,0:35:49.680 -kind of example here so again here - -0:35:46.240,0:35:54.079 -we initialize to the location in +kind of example here so again here we initialize to the location in 0:35:49.680,0:35:55.680 -orange and then by running some you know - -0:35:54.079,0:35:57.280 -gradient based method or whatever +orange and then by running some you know gradient based method or whatever 0:35:55.680,0:36:00.640 -minimization process - -0:35:57.280,0:36:02.800 -we find these blue x which is my z +minimization process we find these blue x which is my z 0:36:00.640,0:36:04.079 -check so we go from the z tilde which is - -0:36:02.800,0:36:07.040 -the initialized +check so we go from the z tilde which is the initialized 0:36:04.079,0:36:08.400 -uh value for my latent to this z check - -0:36:07.040,0:36:11.599 -which is the +uh value for my latent to this z check which is the 0:36:08.400,0:36:15.040 -value at which i find the minimum - -0:36:11.599,0:36:16.640 -for my uh energy +value at which i find the minimum for my uh energy 0:36:15.040,0:36:19.040 -since this is periodic i'm going to show - -0:36:16.640,0:36:22.720 -you just on the next repetition so i +since this is periodic i'm going to show you just on the next repetition so i 
0:36:19.040,0:36:25.119
-don't clutter too much the chart
-
-0:36:22.720,0:36:26.160
-and this came from this configuration
+don't clutter too much the chart and this came from this configuration

0:36:25.119,0:36:28.720
-over here
-
-0:36:26.160,0:36:30.079
-we start with these training points uh
+over here we start with these training points uh

0:36:28.720,0:36:32.480
-these are points from
-
-0:36:30.079,0:36:33.680
-you know i just sampled them from my
+these are points from you know i just sampled them from my

0:36:32.480,0:36:36.720
-mother
-
-0:36:33.680,0:36:40.079
-my peak was this green over here
+model my pick was this green over here

0:36:36.720,0:36:41.200
-and in this case perhaps i can tell that
-
-0:36:40.079,0:36:43.680
-the model
+and in this case perhaps i can tell that the model

0:36:41.200,0:36:44.560
-we initialized the the latent with the
-
-0:36:43.680,0:36:46.240
-orange
+we initialized the the latent with the orange

0:36:44.560,0:36:48.480
-and then he actually went a little bit
-
-0:36:46.240,0:36:50.800
-too much i think he didn't choose
+and then it actually went a little bit too much i think it didn't choose

0:36:48.480,0:36:54.160
-the exact best right so this is like a
-
-0:36:50.800,0:36:54.720
-bit it overshoot a little bit i think uh
+the exact best right so this is like a bit it overshot a little bit i think uh

0:36:54.160,0:36:57.280
-anyhow
-
-0:36:54.720,0:36:58.079
-this over free energy of this location
+anyhow the free energy of this location

0:36:57.280,0:37:03.359
-here
-
-0:36:58.079,0:37:03.359
-is going to be 0.25 right 0.5 square
+here is going to be 0.25 right 0.5 squared

0:37:03.520,0:37:07.520
-cool cool cool so what's left to show
-
-0:37:07.040,0:37:09.680
-you
+cool cool cool so what's left to show you

0:37:07.520,0:37:11.040
-well just a few more things but we are
-
-0:37:09.680,0:37:12.880
-almost finished
+well just a few more things but we are almost finished

0:37:11.040,0:37:14.160
-and then i'm 
looking for all your
-
-0:37:12.880,0:37:17.920
-questions because i
+and then i'm looking for all your questions because i

0:37:14.160,0:37:23.839
-i i really i really know you have
-
-0:37:17.920,0:37:23.839
-questions i i have questions
+i i really i really know you have questions i i have questions

0:37:28.400,0:37:33.280
-so let's in this case compute the free
-
-0:37:32.079,0:37:35.920
-energy
+so let's in this case compute the free energy

0:37:33.280,0:37:36.640
-for every location i show you in this
-
-0:37:35.920,0:37:39.200
-grid
+for every location i show you in this grid

0:37:36.640,0:37:41.040
-what does computing the free energy for
-
-0:37:39.200,0:37:44.160
-every location means
+what does computing the free energy for every location mean

0:37:41.040,0:37:46.880
-so just for sake of you know
-
-0:37:44.160,0:37:48.320
-clarity i'm gonna just repeat myself uh
+so just for sake of you know clarity i'm gonna just repeat myself uh

0:37:46.880,0:37:51.440
-because i like to listen
-
-0:37:48.320,0:37:54.560
-or to talk right so i i like to talk so
+because i like to listen or to talk right so i i like to talk so

0:37:51.440,0:37:57.599
-uh so let's let's select in
-
-0:37:54.560,0:38:01.839
-in green here let's say i'm picking this
+uh so let's let's select in in green here let's say i'm picking this

0:37:57.599,0:38:04.079
-sample over here as my first location
-
-0:38:01.839,0:38:05.599
-so given that location there i'm gonna
+sample over here as my first location so given that location there i'm gonna

0:38:04.079,0:38:07.680
-be picking a
-
-0:38:05.599,0:38:10.000
-orange do we have orange there is no
+be picking an orange do we have orange there is no

0:38:07.680,0:38:13.920
-orange okay i have to pick red
-
-0:38:10.000,0:38:17.520
-sorry so let's say i initialize
+orange okay i have to pick red sorry so let's say i initialize

0:38:13.920,0:38:20.880
-my my latent variable such that
-
-0:38:17.520,0:38:24.079
-the g the decoded version of the
+my 
my latent variable such that the g the decoded version of the 0:38:20.880,0:38:24.400 -z tilde is this point over here then we - -0:38:24.079,0:38:28.160 -run +z tilde is this point over here then we run 0:38:24.400,0:38:30.320 -our minimization process to perform - -0:38:28.160,0:38:32.800 -inference right to find out what is that +our minimization process to perform inference right to find out what is that 0:38:30.320,0:38:35.119 -check so whenever we - -0:38:32.800,0:38:36.079 -find z check that process is called +check so whenever we find z check that process is called 0:38:35.119,0:38:40.000 -inference - -0:38:36.079,0:38:43.119 -given an energy given a sample +inference given an energy given a sample 0:38:40.000,0:38:44.320 -y not given a location y i do inference - -0:38:43.119,0:38:47.839 -to figure out +y not given a location y i do inference to figure out 0:38:44.320,0:38:49.920 -what was the most likely latent variable - -0:38:47.839,0:38:51.440 -missing variable that generated that +what was the most likely latent variable missing variable that generated that 0:38:49.920,0:38:54.079 -point over there - -0:38:51.440,0:38:55.359 -so inference again means we are doing +point over there so inference again means we are doing 0:38:54.079,0:38:58.960 -minimization - -0:38:55.359,0:39:00.800 -and therefore we are moving around our +minimization and therefore we are moving around our 0:38:58.960,0:39:03.440 -model manifold until i get to this - -0:39:00.800,0:39:05.280 -location over here +model manifold until i get to this location over here 0:39:03.440,0:39:07.119 -what is this location over there this - -0:39:05.280,0:39:08.480 -location is the location that is the +what is this location over there this location is the location that is the 0:39:07.119,0:39:11.680 -closest to my - -0:39:08.480,0:39:14.880 -uh sample y that i picked +closest to my uh sample y that i picked 0:39:11.680,0:39:16.720 -therefore what is my free energy so my - 
-0:39:14.880,0:39:17.359
-free energy is going to be simply the
+therefore what is my free energy so my free energy is going to be simply the

0:39:16.720,0:39:21.200
-square
-
-0:39:17.359,0:39:23.920
-distance from this green guy
+square distance from this green guy

0:39:21.200,0:39:24.560
-and the red one right so this segment
-
-0:39:23.920,0:39:26.640
-over here
+and the red one right so this segment over here

0:39:24.560,0:39:28.480
-squared is going to be the free energy
-
-0:39:26.640,0:39:31.839
-of this point over here
+squared is going to be the free energy of this point over here

0:39:28.480,0:39:34.640
-so question from for you how do
-
-0:39:31.839,0:39:36.079
-the free energy of the point on top left
+so question for you how does the free energy of the point on top left

0:39:34.640,0:39:38.210
-compares
-
-0:39:36.079,0:39:41.269
-with the energy of
+compare with the energy of

0:39:38.210,0:39:41.269
-[Music]
-
-0:39:41.520,0:39:49.440
-the point i circle in yellow over here
+[Music] the point i circled in yellow over here

0:39:46.800,0:39:50.480
-which one is larger which one is smaller
-
-0:39:49.440,0:39:53.839
-and
+which one is larger which one is smaller and

0:39:50.480,0:39:54.720
-where where is my z check for the second
-
-0:39:53.839,0:39:58.079
-example
+where where is my z check for the second example

0:39:54.720,0:39:59.839
-green is larger yes green is larger
-
-0:39:58.079,0:40:03.920
-because this distance here
+green is larger yes green is larger because this distance here

0:39:59.839,0:40:07.200
-square it's gonna be much larger than
-
-0:40:03.920,0:40:10.000
-the which distance so similarly
+square it's gonna be much larger than the which distance so similarly

0:40:07.200,0:40:10.960
-if we initialize you know with luck and
-
-0:40:10.000,0:40:13.440
-we run great in
+if we initialize you know with luck and we run gradient

0:40:10.960,0:40:15.520
-the descent like gradient based methods
-
-0:40:13.440,0:40:17.040
-we may end up 
in a location that is over
+descent like gradient based methods we may end up in a location that is over

0:40:15.520,0:40:19.520
-here
-
-0:40:17.040,0:40:20.960
-and therefore the free energy is going
+here and therefore the free energy is going

0:40:19.520,0:40:21.359
-to be the square distance between that
-
-0:40:20.960,0:40:24.000
-point
+to be the square distance between that point

0:40:21.359,0:40:25.599
-and that point here so definitely this
-
-0:40:24.000,0:40:27.920
-point would be much larger
+and that point here so definitely this point would be much larger

0:40:25.599,0:40:29.040
-uh the energy free energy with respect
-
-0:40:27.920,0:40:31.040
-to this point
+uh the energy free energy with respect to this point

0:40:29.040,0:40:32.880
-so some other question someone can make
-
-0:40:31.040,0:40:36.640
-is gonna be uh
+so some other question someone can ask is gonna be uh

0:40:32.880,0:40:38.960
-how far is the green point from my
-
-0:40:36.640,0:40:40.079
-distribution right how how far is the
+how far is the green point from my distribution right how how far is the

0:40:38.960,0:40:42.000
-green point from my
-
-0:40:40.079,0:40:44.079
-learned distribution and the learn
+green point from my learned distribution and the learned

0:40:42.000,0:40:46.720
-distribution here is represented by the
-
-0:40:44.079,0:40:48.400
-those blue points right so you can tell
+distribution here is represented by those blue points right so you can tell

0:40:46.720,0:40:50.400
-that that point in the top left
-
-0:40:48.400,0:40:52.000
-it's it's going to have a higher energy
+that that point in the top left it's it's going to have a higher energy

0:40:50.400,0:40:55.200
-so it's further away it's
-
-0:40:52.000,0:40:58.319
-less compatible with you know
+so it's further away it's less compatible with you know

0:40:55.200,0:41:00.480
-uh with respect to 
the other guy right all right so we are almost almost done 0:41:00.480,0:41:06.160 -here - -0:41:01.839,0:41:09.040 -so let's to make like some exercise +here so let's to make like some exercise 0:41:06.160,0:41:10.400 -pay attention to those five values right - -0:41:09.040,0:41:13.200 -so i'm picking that +pay attention to those five values right so i'm picking that 0:41:10.400,0:41:14.319 -uh row there just below the x axis and - -0:41:13.200,0:41:17.359 -i'm picking the first +uh row there just below the x axis and i'm picking the first 0:41:14.319,0:41:19.680 -and then the fourth uh and so on right - -0:41:17.359,0:41:20.640 -example and so i'm going to be plotting +and then the fourth uh and so on right example and so i'm going to be plotting 0:41:19.680,0:41:23.520 -now - -0:41:20.640,0:41:23.839 -these energy functions they look pretty +now these energy functions they look pretty 0:41:23.520,0:41:26.960 -much - -0:41:23.839,0:41:30.240 -like this so for the +much like this so for the 0:41:26.960,0:41:33.520 -blue one as you can expect we extend - -0:41:30.240,0:41:36.880 -up to 20 and then we go +blue one as you can expect we extend up to 20 and then we go 0:41:33.520,0:41:38.720 -down to 2.5 roughly 20 is going to be in - -0:41:36.880,0:41:40.640 -this location like the distance between +down to 2.5 roughly 20 is going to be in this location like the distance between 0:41:38.720,0:41:40.960 -this point and this further point away - -0:41:40.640,0:41:44.000 -here +this point and this further point away here 0:41:40.960,0:41:44.800 -squared and then instead 2.5 square is - -0:41:44.000,0:41:47.760 -going to be +squared and then instead 2.5 square is going to be 0:41:44.800,0:41:49.599 -you know this distance here square - -0:41:47.760,0:41:51.440 -similarly you're gonna have you know an +you know this distance here square similarly you're gonna have you know an 0:41:49.599,0:41:55.520 -energy function for the red one - -0:41:51.440,0:41:57.760 -for the 
purple green and orange +energy function for the red one for the purple green and orange 0:41:55.520,0:41:59.599 -then given that i compute all these - -0:41:57.760,0:42:01.599 -values for the energy +then given that i compute all these values for the energy 0:41:59.599,0:42:03.760 -i can now compute what is the free - -0:42:01.599,0:42:06.160 -energy so the free energy +i can now compute what is the free energy so the free energy 0:42:03.760,0:42:07.280 -instead of being a function is going to - -0:42:06.160,0:42:10.640 -be a +instead of being a function is going to be a 0:42:07.280,0:42:12.480 -value when i pick a specific location - -0:42:10.640,0:42:14.560 -right so it's no longer a function of +value when i pick a specific location right so it's no longer a function of 0:42:12.480,0:42:15.200 -the latent whenever we compute the free - -0:42:14.560,0:42:18.240 -energy +the latent whenever we compute the free energy 0:42:15.200,0:42:21.040 -the latent disappears and i - -0:42:18.240,0:42:22.880 -get that z check which is the optimal +the latent disappears and i get that z check which is the optimal 0:42:21.040,0:42:26.000 -latent on the latent that is - -0:42:22.880,0:42:30.079 -the most likely giving me uh +latent on the latent that is the most likely giving me uh 0:42:26.000,0:42:32.079 -that that point so here we have that the - -0:42:30.079,0:42:34.240 -z check for the blue curve happens over +that that point so here we have that the z check for the blue curve happens over 0:42:32.079,0:42:36.400 -here similarly the z check for the - -0:42:34.240,0:42:39.280 -orange green and purple and red +here similarly the z check for the orange green and purple and red 0:42:36.400,0:42:39.920 -are happening in these locations uh - -0:42:39.280,0:42:42.079 -there +are happening in these locations uh there 0:42:39.920,0:42:44.480 -for sure we could have ended up caught - -0:42:42.079,0:42:47.200 -in this local minimum right that +for sure we could have ended up caught in 
this local minimum right that 0:42:44.480,0:42:47.920 -definitely could be a pitfall of you - -0:42:47.200,0:42:52.240 -know +definitely could be a pitfall of you know 0:42:47.920,0:42:56.720 -of using some gradient-based methods - -0:42:52.240,0:43:00.400 -so question now for the audience +of using some gradient-based methods so question now for the audience 0:42:56.720,0:43:04.319 -i'm removing everything what is f - -0:43:00.400,0:43:05.680 -infinity so how many dimensions does +i'm removing everything what is f infinity so how many dimensions does 0:43:04.319,0:43:08.000 -this stuff okay - -0:43:05.680,0:43:09.839 -can someone remind me this right so can +this stuff okay can someone remind me this right so can 0:43:08.000,0:43:10.640 -someone tell me what is the domain and - -0:43:09.839,0:43:15.359 -the image +someone tell me what is the domain and the image 0:43:10.640,0:43:15.359 -of these function on the chart - -0:43:15.680,0:43:19.280 -where does f infinity live +of these function on the chart where does f infinity live 0:43:17.800,0:43:24.640 -[Music] - -0:43:19.280,0:43:24.640 -anyone run sir is anyone listening +[Music] anyone run sir is anyone listening 0:43:25.280,0:43:31.680 -hello okay so y - -0:43:28.560,0:43:34.960 -is uh on r24 yeah but that's +hello okay so y is uh on r24 yeah but that's 0:43:31.680,0:43:38.079 -uh there is just an r i don't know what - -0:43:34.960,0:43:41.760 -just r means um +uh there is just an r i don't know what just r means um 0:43:38.079,0:43:44.880 -the capital y is 20 - -0:43:41.760,0:43:45.920 -capital y has 24 items each item in +the capital y is 20 capital y has 24 items each item in 0:43:44.880,0:43:47.920 -capital y - -0:43:45.920,0:43:50.160 -are two dimensional right so y is a +capital y are two dimensional right so y is a 0:43:47.920,0:43:52.160 -matrix but i'm not asking the capital y - -0:43:50.160,0:43:55.680 -i'm asking capital f +matrix but i'm not asking the capital y i'm asking capital f 
0:43:52.160,0:43:57.599
-infinity right so capital f infinity is
-
-0:43:55.680,0:43:59.440
-uh someone mentioned here it's
+infinity right so capital f infinity is uh someone mentioned here it's

0:43:57.599,0:44:01.920
-definitely a real value
-
-0:43:59.440,0:44:02.960
-uh in our case is actually positively
+definitely a real value uh in our case is actually positively

0:44:01.920,0:44:05.359
-non-negatively
-
-0:44:02.960,0:44:06.720
-value right because it's a it's a square
+non-negative value right because it's a it's a square

0:44:05.359,0:44:08.880
-sum of squares
-
-0:44:06.720,0:44:12.160
-and the domain instead what is the
+sum of squares and the domain instead what is the

0:44:08.880,0:44:12.160
-domain of capital f
-
0:44:16.240,0:44:23.200
-the domain is going to be the uh the
+domain of capital f the domain is going to be the uh the

0:44:19.280,0:44:26.160
-basically the where y uh
-
-0:44:23.200,0:44:28.160
-the bold y belongs to no so the ball y
+basically the where y uh the bold y belongs to no so the bold y

0:44:26.160,0:44:30.960
-it's a vector in two dimensions so
-
-0:44:28.160,0:44:31.520
-that's gonna be r two right so again
+it's a vector in two dimensions so that's gonna be r two right so again

0:44:30.960,0:44:33.520
-these f
-
-0:44:31.520,0:44:35.760
-are scalar values so i'm gonna be
+these f are scalar values so i'm gonna be

0:44:33.520,0:44:36.560
-representing the different intensities
-
-0:44:35.760,0:44:39.920
-of this
+representing the different intensities of this

0:44:36.560,0:44:43.040
-scalar value with this color bar here
-
-0:44:39.920,0:44:45.920
-i will represent in a violet very dark
+scalar value with this color bar here i will represent in a violet very dark

0:44:43.040,0:44:46.720
-maybe not even able to see in this free
-
-0:44:45.920,0:44:50.079
-energy
+maybe not even able to see in this free energy

0:44:46.720,0:44:52.160
-uh equals zero then in aqua
-
-0:44:50.079,0:44:53.200
-i'm gonna be representing this 
zero +uh equals zero then in aqua i'm gonna be representing this zero 0:44:52.160,0:44:56.880 -temperature limit - -0:44:53.200,0:44:59.440 -uh free energy for uh equal one and then +temperature limit uh free energy for uh equal one and then 0:44:56.880,0:45:00.079 -everything that is above and beyond the - -0:44:59.440,0:45:03.520 -value two +everything that is above and beyond the value two 0:45:00.079,0:45:06.560 -is going to be yellow and so this is - -0:45:03.520,0:45:10.240 -how that grid looks +is going to be yellow and so this is how that grid looks 0:45:06.560,0:45:12.319 -okay so each location in that grid here - -0:45:10.240,0:45:14.160 -and i show you before and those green +okay so each location in that grid here and i show you before and those green 0:45:12.319,0:45:17.119 -points have a - -0:45:14.160,0:45:18.160 -free energy which is here represented by +points have a free energy which is here represented by 0:45:17.119,0:45:19.839 -this color - -0:45:18.160,0:45:22.400 -in this location over here in the bottom +this color in this location over here in the bottom 0:45:19.839,0:45:24.480 -side you can see it's yellow - -0:45:22.400,0:45:25.599 -which means it has a free energy which +side you can see it's yellow which means it has a free energy which 0:45:24.480,0:45:29.280 -is - -0:45:25.599,0:45:32.720 -equal or larger than 2. moreover +is equal or larger than 2. 
moreover

0:45:29.280,0:45:34.319
-those arrows are pointing
-
-0:45:32.720,0:45:36.720
-are the gradient right so these are
+those arrows are pointing are the gradient right so these are

0:45:34.319,0:45:39.920
-pointing in the direction of maximum
-
-0:45:36.720,0:45:43.119
-uh ascend as we move closer
+pointing in the direction of maximum uh ascent as we move closer

0:45:39.920,0:45:43.440
-to the uh this region here you're gonna
-
-0:45:43.119,0:45:45.599
-get
+to the uh this region here you're gonna get

0:45:43.440,0:45:47.200
-finally you're gonna see some colors and
-
-0:45:45.599,0:45:48.640
-here you can tell the free energy is
+finally you're gonna see some colors and here you can tell the free energy is

0:45:47.200,0:45:51.599
-getting lower lower lower
-
-0:45:48.640,0:45:53.359
-until we hit the location where this
+getting lower lower lower until we hit the location where this

0:45:51.599,0:45:55.280
-reconstruction happen
-
-0:45:53.359,0:45:58.160
-which is the location the region where
+reconstruction happens which is the location the region where

0:45:55.280,0:46:01.760
-my free energy is zero
-
-0:45:58.160,0:46:05.599
-when we train this modern we try to get
+my free energy is zero when we train this model we try to get

0:46:01.760,0:46:08.400
-this zero energy level to be matching
-
-0:46:05.599,0:46:08.800
-the location of these blue points of
+this zero energy level to be matching the location of these blue points of

0:46:08.400,0:46:11.280
-course
-
-0:46:08.800,0:46:12.560
-as you can tell this model is very
+course as you can tell this model is very

0:46:11.280,0:46:16.000
-poorly trained
-
-0:46:12.560,0:46:18.560
-and therefore this energy surface is not
+poorly trained and therefore this energy surface is not

0:46:16.000,0:46:20.640
-well matching my training point it's
-
-0:46:18.560,0:46:22.640
-getting close but it's not yet there
+well matching my training points it's getting close but it's not yet there

0:46:20.640,0:46:24.640
-so 
next time we're gonna see how to - -0:46:22.640,0:46:28.240 -stretch this energy +so next time we're gonna see how to stretch this energy 0:46:24.640,0:46:32.240 -such that it's gonna be you know nicely - -0:46:28.240,0:46:35.599 -fitting on these blue points okay +such that it's gonna be you know nicely fitting on these blue points okay 0:46:32.240,0:46:37.119 -uh why is the energy surface single - -0:46:35.599,0:46:40.240 -value +uh why is the energy surface single value 0:46:37.119,0:46:43.280 -so the energy surface - -0:46:40.240,0:46:46.319 -which is the value of f infinity right +so the energy surface which is the value of f infinity right 0:46:43.280,0:46:47.119 -and f infinity is the minimum of my - -0:46:46.319,0:46:51.280 -energy so +and f infinity is the minimum of my energy so 0:46:47.119,0:46:54.800 -energy the capital e it's a function - -0:46:51.280,0:46:57.280 -over all possible latent but then +energy the capital e it's a function over all possible latent but then 0:46:54.800,0:46:59.440 -given that we have this function we're - -0:46:57.280,0:47:01.920 -going to find what is the minimum value +given that we have this function we're going to find what is the minimum value 0:46:59.440,0:47:03.119 -minimum value that this energy can take - -0:47:01.920,0:47:06.960 -that minimum value +minimum value that this energy can take that minimum value 0:47:03.119,0:47:10.560 -is the uh zero temperature limit - -0:47:06.960,0:47:13.839 -of the free energy which is this f +is the uh zero temperature limit of the free energy which is this f 0:47:10.560,0:47:16.880 -infinity okay and so e - -0:47:13.839,0:47:19.680 -y z is a function of y and z +infinity okay and so e y z is a function of y and z 0:47:16.880,0:47:21.200 -but then whenever we take out the z with - -0:47:19.680,0:47:24.240 -the minimization +but then whenever we take out the z with the minimization 0:47:21.200,0:47:25.359 -we get this f which is going to be a - -0:47:24.240,0:47:28.960 -function 
+we get this f which is going to be a function 0:47:25.359,0:47:31.760 -of y right so every time i move across - -0:47:28.960,0:47:32.640 -the y space here we have y1 and y2 the +of y right so every time i move across the y space here we have y1 and y2 the 0:47:31.760,0:47:34.960 -two components - -0:47:32.640,0:47:37.119 -you're gonna have that f will have you +two components you're gonna have that f will have you 0:47:34.960,0:47:38.440 -know larger than two larger than two - -0:47:37.119,0:47:41.839 -blah blah blah then +know larger than two larger than two blah blah blah then 0:47:38.440,0:47:42.400 -1.75 1.50 and so on lower values until - -0:47:41.839,0:47:44.640 -we get +1.75 1.50 and so on lower values until we get 0:47:42.400,0:47:46.160 -f roughly zero and then actually it - -0:47:44.640,0:47:47.920 -increases a little bit +f roughly zero and then actually it increases a little bit 0:47:46.160,0:47:50.480 -so maybe next time i also gonna show you - -0:47:47.920,0:47:51.359 -this chart uh in a 3d version also +so maybe next time i also gonna show you this chart uh in a 3d version also 0:47:50.480,0:47:54.720 -rotating - -0:47:51.359,0:47:57.040 -i didn't have time to do that um +rotating i didn't have time to do that um 0:47:54.720,0:47:57.839 -did i answer your question is it clear - -0:47:57.040,0:48:00.800 -why this +did i answer your question is it clear why this 0:47:57.839,0:48:02.559 -energy function is single value like as - -0:48:00.800,0:48:03.520 -in a scalar value right you mean single +energy function is single value like as in a scalar value right you mean single 0:48:02.559,0:48:05.200 -value - -0:48:03.520,0:48:07.839 -am i understanding the question +value am i understanding the question 0:48:05.200,0:48:07.839 -correctly - -0:48:08.480,0:48:14.960 -but we have 24 y's so the +correctly but we have 24 y's so the 0:48:11.520,0:48:18.000 -capital the y's are these blue points - -0:48:14.960,0:48:21.839 -right now my y's i'm using +capital the 
y's are these blue points right now my y's i'm using

0:48:18.000,0:48:24.160
-are this one so they're not 24 there are
-
-0:48:21.839,0:48:26.000
-so if you count from here let me go a
+are this one so they're not 24 there are so if you count from here let me go a

0:48:24.160,0:48:29.119
-bit larger i can see
-
-0:48:26.000,0:48:32.240
-so here we have blah blah blah 12
+bit larger i can see so here we have blah blah blah 12

0:48:29.119,0:48:35.839
-and 12 here plus one we have 25
-
-0:48:32.240,0:48:38.640
-and then here we had eight
+and 12 here plus one we have 25 and then here we had eight

0:48:35.839,0:48:41.280
-and eight sixteen plus one seventeen so
-
-0:48:38.640,0:48:42.800
-right now we have 17 times 25
+and eight sixteen plus one seventeen so right now we have 17 times 25

0:48:41.280,0:48:46.839
-i don't know how much it is someone can
-
-0:48:42.800,0:48:49.839
-compute okay google how much is 17 times
+i don't know how much it is someone can compute okay google how much is 17 times

0:48:46.839,0:48:53.960
-25
-
-0:48:49.839,0:48:57.359
-okay she's not listening oh 120
+25 okay she's not listening oh 120

0:48:53.960,0:49:00.400
-425 uh there you go
-
-0:48:57.359,0:49:03.760
-so right now we have 425
+425 uh there you go so right now we have 425

0:49:00.400,0:49:08.079
-points right so we have 424
-
-0:49:03.760,0:49:11.440
-24 425 energy functions
+points right so we have 425 energy functions

0:49:08.079,0:49:13.200
-of which function of y so
-
-0:49:11.440,0:49:15.359
-given that i pick a y i have an energy
+of which function of y so given that i pick a y i have an energy

0:49:13.200,0:49:16.960
-function those are functions in z
-
-0:49:15.359,0:49:18.400
-given that you pick the minimum value of
+function those are functions in z given that you pick the minimum value of

0:49:16.960,0:49:19.839
-this energy function that's going to be
-
-0:49:18.400,0:49:23.280
-your free energy
+this energy function that's going to be your free energy
0:49:19.839,0:49:24.720 -for a specific y so you remove that - -0:49:23.280,0:49:28.720 -latent variable so we have +for a specific y so you remove that latent variable so we have 0:49:24.720,0:49:31.040 -an internal possible way of spending our - -0:49:28.720,0:49:32.400 -uh manifold right so you want to think +an internal possible way of spanning our uh manifold right so you want to think 0:49:31.040,0:49:35.280 -about this as you know - -0:49:32.400,0:49:37.440 -you have like uh your potato in your +about this as you know you have like uh your potato in your 0:49:35.280,0:49:38.079 -model like your model thinks about the - -0:49:37.440,0:49:41.200 -data is +model like your model thinks about the data is 0:49:38.079,0:49:41.680 -distributed as this kind of shape and - -0:49:41.200,0:49:44.319 -then +distributed as this kind of shape and then 0:49:41.680,0:49:46.319 -your latent variable allows you to go - -0:49:44.319,0:49:48.800 -all around this potato +your latent variable allows you to go all around this potato 0:49:46.319,0:49:49.760 -so right now if you add if you ask me oh - -0:49:48.800,0:49:52.559 -is this +so right now if you add if you ask me oh is this 0:49:49.760,0:49:54.480 -point here on your manifold or not so if - -0:49:52.559,0:49:57.200 -this point is on my manifold +point here on your manifold or not so if this point is on my manifold 0:49:54.480,0:49:58.400 -i know that by going around my manifold - -0:49:57.200,0:50:01.599 -and find out if +i know that by going around my manifold and find out if 0:49:58.400,0:50:05.280 -oh i get there right and so if - -0:50:01.599,0:50:07.440 -the free energy of that point is zero +oh i get there right and so if the free energy of that point is zero 0:50:05.280,0:50:09.280 -therefore it means that that point you - -0:50:07.440,0:50:11.680 -are asking me about +therefore it means that that point you are asking me about 0:50:09.280,0:50:12.800 -leaves on the manifold that the model - -0:50:11.680,0:50:15.359
-has learned +lives on the manifold that the model has learned 0:50:12.800,0:50:16.720 -if your free energy is not zero then - -0:50:15.359,0:50:19.839 -it's gonna be simply +if your free energy is not zero then it's gonna be simply 0:50:16.720,0:50:20.720 -equal to the quadratic uh euclidean - -0:50:19.839,0:50:23.599 -distance +equal to the quadratic uh euclidean distance 0:50:20.720,0:50:24.240 -from that location from your point and - -0:50:23.599,0:50:28.960 -my +from that location from your point and my 0:50:24.240,0:50:33.680 -manifold right did i answer the question - -0:50:28.960,0:50:36.880 -yeah okay uh more questions for me +manifold right did i answer the question yeah okay uh more questions for me 0:50:33.680,0:50:40.160 -oh was everything clear i i this stuff i - -0:50:36.880,0:50:40.559 -really just digested it uh like in the +oh was everything clear i i this stuff i really just digested it uh like in the 0:50:40.160,0:50:43.119 -past - -0:50:40.559,0:50:45.839 -30 hours so again i might not have done +past 30 hours so again i might not have done 0:50:43.119,0:50:48.880 -a very good job - -0:50:45.839,0:50:52.640 -let's see how do we choose a function to +a very good job let's see how do we choose a function to 0:50:48.880,0:50:52.640 -represent the data manifold - -0:50:53.040,0:50:59.040 -in this case it seemed like we chose a +represent the data manifold in this case it seemed like we chose a 0:50:56.000,0:51:02.400 -ellipse based on the data but how about - -0:50:59.040,0:51:04.400 -other scenarios yeah definitely uh +ellipse based on the data but how about other scenarios yeah definitely uh 0:51:02.400,0:51:05.680 -there is a lot of you know research - -0:51:04.400,0:51:07.040 -going in uh +there is a lot of you know research going in uh 0:51:05.680,0:51:10.160 -architectures right network - -0:51:07.040,0:51:13.599 -architectures so +architectures right network architectures so 0:51:10.160,0:51:14.319 -but again uh right now yeah we we chose
- -0:51:13.599,0:51:16.720 -that +but again uh right now yeah we we chose that 0:51:14.319,0:51:18.559 -next time i'm gonna be trying to learn - -0:51:16.720,0:51:20.960 -the level of compatibility +next time i'm gonna be trying to learn the level of compatibility 0:51:18.559,0:51:22.559 -like i'm gonna try to learn this energy - -0:51:20.960,0:51:26.079 -for the x y +like i'm gonna try to learn this energy for the x y 0:51:22.559,0:51:27.599 -z the the triple right and so - -0:51:26.079,0:51:29.359 -we're gonna be just using neural nets +z the the triple right and so we're gonna be just using neural nets 0:51:27.599,0:51:31.599 -right even the sine and cosine - -0:51:29.359,0:51:33.359 -you can somehow approximate them with a +right even the sine and cosine you can somehow approximate them with a 0:51:31.599,0:51:34.319 -few layers right so instead of having - -0:51:33.359,0:51:38.319 -these +few layers right so instead of having these 0:51:34.319,0:51:40.720 -uh z function uh the g function - -0:51:38.319,0:51:41.760 -over here instead of having this very +uh z function uh the g function over here instead of having this very 0:51:40.720,0:51:43.440 -simple thing - -0:51:41.760,0:51:45.520 -we can think about having you know a few +simple thing we can think about having you know a few 0:51:43.440,0:51:47.920 -layers of a neural net right - -0:51:45.520,0:51:49.280 -so you can still use a few layers of a +layers of a neural net right so you can still use a few layers of a 0:51:47.920,0:51:51.280 -neural net - -0:51:49.280,0:51:52.720 -but you're not going to be using the +neural net but you're not going to be using the 0:51:51.280,0:51:54.559 -neural net to do vector - -0:51:52.720,0:51:56.160 -vector mapping you're going to be using +neural net to do vector vector mapping you're going to be using 0:51:54.559,0:51:59.599 -a neural net to do - -0:51:56.160,0:52:01.359 -a bunch of vectors to scalars right +a neural net to do a bunch of vectors to scalars right 
0:51:59.599,0:52:02.960 -so bunch of vectors to scalars is going - -0:52:01.359,0:52:05.520 -to be this +so bunch of vectors to scalars is going to be this 0:52:02.960,0:52:06.800 -energy-based you know way of thinking - -0:52:05.520,0:52:09.839 -about things +energy-based you know way of thinking about things 0:52:06.800,0:52:10.960 -uh because again how do you - -0:52:09.839,0:52:12.559 -let's say you want to translate +uh because again how do you let's say you want to translate 0:52:10.960,0:52:13.280 -something from one language to another - -0:52:12.559,0:52:15.280 -language right +something from one language to another language right 0:52:13.280,0:52:16.559 -so i have one sentence but i may - -0:52:15.280,0:52:18.240 -translate that sentence +so i have one sentence but i may translate that sentence 0:52:16.559,0:52:20.240 -in different ways in another language - -0:52:18.240,0:52:22.079 -right so how do you train this you +in different ways in another language right so how do you train this you 0:52:20.240,0:52:24.079 -cannot really say - -0:52:22.079,0:52:26.079 -i do soft marks because first of all +cannot really say i do softmax because first of all 0:52:24.079,0:52:29.520 -there is an infinite number - -0:52:26.079,0:52:31.200 -of sentences so you can't do that +there is an infinite number of sentences so you can't do that 0:52:29.520,0:52:33.280 -but then there might be even multiple - -0:52:31.200,0:52:35.440 -sentences that are correctly associated +but then there might be even multiple sentences that are correctly associated 0:52:33.280,0:52:37.839 -to your first sentence - -0:52:35.440,0:52:39.200 -so this energy based model allow you to +to your first sentence so this energy based model allow you to 0:52:37.839,0:52:41.680 -end up with a - -0:52:39.200,0:52:42.319 -score scoring mechanism which is this +end up with a score scoring mechanism which is this 0:52:41.680,0:52:46.720 -energy - -0:52:42.319,0:52:49.839 -which is telling you how
compatible are +energy which is telling you how compatible are 0:52:46.720,0:52:50.880 -points right so here x y and z are all - -0:52:49.839,0:52:52.640 -interchangeable +points right so here x y and z are all interchangeable 0:52:50.880,0:52:54.079 -given one i can find the other right so - -0:52:52.640,0:52:57.280 -if i have the energy +given one i can find the other right so if i have the energy 0:52:54.079,0:52:58.559 -if my model learned the energy i can - -0:52:57.280,0:53:00.960 -find x given y +if my model learned the energy i can find x given y 0:52:58.559,0:53:01.920 -i can find y given z i can find z given - -0:53:00.960,0:53:04.400 -x i can find +i can find y given z i can find z given x i can find 0:53:01.920,0:53:06.240 -all kind of combination those x y z i - -0:53:04.400,0:53:07.599 -don't even have to write them x y and z +all kind of combination those x y z i don't even have to write them x y and z 0:53:06.240,0:53:08.480 -i can just write all the components - -0:53:07.599,0:53:10.640 -right and then i can +i can just write all the components right and then i can 0:53:08.480,0:53:12.640 -as long as my model learns that right it - -0:53:10.640,0:53:16.000 -learns all the +as long as my model learns that right it learns all the 0:53:12.640,0:53:16.480 -uh how do you call them um interactions - -0:53:16.000,0:53:20.000 -that +uh how do you call them um interactions that 0:53:16.480,0:53:21.760 -exist in my data that's why uh jan likes - -0:53:20.000,0:53:23.040 -them so much and they're super powerful +exist in my data that's why uh jan likes them so much and they're super powerful 0:53:21.760,0:53:26.000 -because they don't make too many - -0:53:23.040,0:53:26.000 -assumptions i think +because they don't make too many assumptions i think 0:53:26.160,0:53:32.880 -uh did i okay i answer your question uh - -0:53:29.200,0:53:36.640 -we are over time so i think +uh did i okay i answer your question uh we are over time so i think 0:53:32.880,0:53:38.079 
-this lesson was kind of - -0:53:36.640,0:53:40.400 -fine i don't know you had to tell me +this lesson was kind of fine i don't know you had to tell me 0:53:38.079,0:53:42.640 -because i really don't know - -0:53:40.400,0:53:44.319 -i hope you like this yeah that was great +because i really don't know i hope you like this yeah that was great 0:53:42.640,0:53:47.040 -okay because people are very - -0:53:44.319,0:53:48.960 -quiet today i wanted to make also a +okay because people are very quiet today i wanted to make also a 0:53:47.040,0:53:49.599 -notebook but then the notebook is really - -0:53:48.960,0:53:52.000 -ugly +notebook but then the notebook is really ugly 0:53:49.599,0:53:52.720 -because i use the notebook to make very - -0:53:52.000,0:53:54.559 -pretty +because i use the notebook to make very pretty 0:53:52.720,0:53:56.640 -visualization but the code is really - -0:53:54.559,0:53:58.800 -ugly maybe next time +visualization but the code is really ugly maybe next time 0:53:56.640,0:54:00.000 -i can share with you a cleanup version - -0:53:58.800,0:54:02.880 -of the notebook for +i can share with you a cleanup version of the notebook for 0:54:00.000,0:54:03.440 -pedagogical purpose right and especially - -0:54:02.880,0:54:06.000 -going to be +pedagogical purpose right and especially going to be 0:54:03.440,0:54:07.599 -showing you this network which doesn't - -0:54:06.000,0:54:08.640 -have an input doesn't have a forward +showing you this network which doesn't have an input doesn't have a forward 0:54:07.599,0:54:10.720 -function - -0:54:08.640,0:54:12.240 -which is so funny uh and then we're +function which is so funny uh and then we're 0:54:10.720,0:54:14.559 -gonna be learning perhaps - -0:54:12.240,0:54:16.079 -what is the free energy without this +gonna be learning perhaps what is the free energy without this 0:54:14.559,0:54:18.880 -beta that goes to - -0:54:16.079,0:54:19.680 -um to infinity and we're gonna be +beta that goes to um to infinity and 
we're gonna be 0:54:18.880,0:54:21.760 -learning how to do - -0:54:19.680,0:54:23.920 -learning okay so today again we just +learning how to do learning okay so today again we just 0:54:21.760,0:54:25.359 -learned so let me get to the beginning - -0:54:23.920,0:54:28.319 -so we can end up +learned so let me get to the beginning so we can end up 0:54:25.359,0:54:29.760 -here so today we talk about inference - -0:54:28.319,0:54:32.000 -okay +here so today we talk about inference okay 0:54:29.760,0:54:33.440 -we do inference by doing minimization of - -0:54:32.000,0:54:34.960 -an energy function +we do inference by doing minimization of an energy function 0:54:33.440,0:54:36.559 -learning is something we're going to be - -0:54:34.960,0:54:37.040 -talking about next time they don't they +learning is something we're going to be talking about next time they don't they 0:54:36.559,0:54:39.280 -don't - -0:54:37.040,0:54:41.359 -they don't have anything to share well +don't they don't have anything to share well 0:54:39.280,0:54:43.520 -it's two different topics right - -0:54:41.359,0:54:45.280 -next time the other one and then the +it's two different topics right next time the other one and then the 0:54:43.520,0:54:45.760 -other part so it was inference for - -0:54:45.280,0:54:48.000 -latent +other part so it was inference for latent 0:54:45.760,0:54:49.119 -variable energy based model which allow - -0:54:48.000,0:54:51.920 -you to capture +variable energy based model which allow you to capture 0:54:49.119,0:54:53.040 -this multi multi modality of you know - -0:54:51.920,0:54:56.720 -multi +this multi multi modality of you know multi 0:54:53.040,0:54:58.000 -multi modality of coexistence of things - -0:54:56.720,0:54:59.200 -right you can you don't have simply +multi modality of coexistence of things right you can you don't have simply 0:54:58.000,0:55:01.760 -vector to vector you have - -0:54:59.200,0:55:02.960 -one too many options right and then we +vector to vector 
you have one-to-many options right and then we 0:55:01.760,0:55:06.400 -talk about - -0:55:02.960,0:55:09.599 -uh we talk about this stuff here now +talk about uh we talk about this stuff here now 0:55:06.400,0:55:12.799 -and how we can possibly try to learn - -0:55:09.599,0:55:15.359 -this combination of x y uh x y +and how we can possibly try to learn this combination of x y uh x y 0:55:12.799,0:55:18.000 -combination right - -0:55:15.359,0:55:20.480 -uh so there's a question so minimizing +combination right uh so there's a question so minimizing 0:55:18.000,0:55:24.480 -energy regarding to - -0:55:20.480,0:55:28.480 -train manifold basically means denoising +energy regarding to trained manifold basically means denoising 0:55:24.480,0:55:29.760 -uh i think you can think about that as - -0:55:28.480,0:55:33.680 -yeah in that way +uh i think you can think about that as yeah in that way 0:55:29.760,0:55:35.040 -so the real manifold - -0:55:33.680,0:55:37.119 -okay depends which one is the real +so the real manifold okay depends which one is the real 0:55:35.040,0:55:40.079 -manifold right so if the model - -0:55:37.119,0:55:42.240 -has learned the the real manifold then +manifold right so if the model has learned the the real manifold then 0:55:40.079,0:55:44.240 -you know by minimizing the energy - -0:55:42.240,0:55:46.079 -you can find what is the denoised +you know by minimizing the energy you can find what is the denoised 0:55:44.240,0:55:48.000 -version of your input - -0:55:46.079,0:55:49.200 -another option you have to denoise this +version of your input another option you have to denoise this 0:55:48.000,0:55:50.079 -stuff is going to be if you find - -0:55:49.200,0:55:51.520 -yourself here +stuff is going to be if you find yourself here 0:55:50.079,0:55:53.280 -you can compute this energy you can - -0:55:51.520,0:55:54.799 -follow the gradient and then here you +you can compute this energy you can follow the gradient and then here you 0:55:53.280,0:55:56.960
-can recompute the energy - -0:55:54.799,0:55:58.960 -you can still go and follow the energy +can recompute the energy you can still go and follow the energy 0:55:56.960,0:56:01.119 -the gradient so you can end up boom - -0:55:58.960,0:56:03.680 -down on the manifold right so you can +the gradient so you can end up boom down on the manifold right so you can 0:56:01.119,0:56:06.079 -make little steps so you can just go - -0:56:03.680,0:56:07.599 -uh you know you can find out where to go +make little steps so you can just go uh you know you can find out where to go 0:56:06.079,0:56:10.640 -or you can use the - -0:56:07.599,0:56:13.599 -uh you know the z check +or you can use the uh you know the z check 0:56:10.640,0:56:15.520 -to find out what is the uh best - -0:56:13.599,0:56:17.839 -approximation of your point over here +to find out what is the uh best approximation of your point over here 0:56:15.520,0:56:17.839 -right - -0:56:18.160,0:56:22.799 -okay all right so that was it um thank +right okay all right so that was it um thank 0:56:21.280,0:56:23.520 -you for listening you have a nice - -0:56:22.799,0:56:25.839 -evening +you for listening you have a nice evening 0:56:23.520,0:56:27.200 -i see on friday i feel free to ask jan - -0:56:25.839,0:56:30.240 -questions about this +i see on friday i feel free to ask jan questions about this 0:56:27.200,0:56:30.640 -uh this this practicum he he was you - -0:56:30.240,0:56:33.920 -know +uh this this practicum he he was you know 0:56:30.640,0:56:34.400 -helping a lot as well right have a good - -0:56:33.920,0:56:36.880 -night +helping a lot as well right have a good night 0:56:34.400,0:56:36.880 -bye bye - -0:56:37.760,0:56:41.520 -so how can you get more out of this +bye bye so how can you get more out of this 0:56:39.680,0:56:44.400 -lesson today - -0:56:41.520,0:56:44.960 -comprehension if something was not yet +lesson today comprehension if something was not yet 0:56:44.400,0:56:47.760 -clear - 
-0:56:44.960,0:56:49.440 -you should really ask me uh anything in +clear you should really ask me uh anything in 0:56:47.760,0:56:51.440 -the comment section below okay - -0:56:49.440,0:56:53.440 -i will answer every comment that you +the comment section below okay i will answer every comment that you 0:56:51.440,0:56:55.520 -write over there - -0:56:53.440,0:56:56.960 -news if you would like to keep up with +write over there news if you would like to keep up with 0:56:55.520,0:56:57.440 -everything i post online you should - -0:56:56.960,0:57:01.599 -check +everything i post online you should check 0:56:57.440,0:57:03.839 -my twitter account under alph cnz - -0:57:01.599,0:57:05.200 -if you also would like youtube to notify +my twitter account under alfcnz if you also would like youtube to notify 0:57:03.839,0:57:08.160 -about you the latest - -0:57:05.200,0:57:09.839 -videos i upload then press that +you about the latest videos i upload then press that 0:57:08.160,0:57:11.119 -subscribe button and turn on the - -0:57:09.839,0:57:12.640 -notification bell +subscribe button and turn on the notification bell 0:57:11.119,0:57:15.280 -such that you're not going to be missing - -0:57:12.640,0:57:19.599 -any content if you like this video +such that you're not going to be missing any content if you like this video 0:57:15.280,0:57:22.720 -put a like on it it means a lot to me - -0:57:19.599,0:57:25.200 -searching we have a companion website +put a like on it it means a lot to me searching we have a companion website 0:57:22.720,0:57:26.000 -where you can find each and every video - -0:57:25.200,0:57:29.200 -transcribed +where you can find each and every video transcribed 0:57:26.000,0:57:29.839 -by students that volunteered for example - -0:57:29.200,0:57:32.880 -here +by students that volunteered for example here 0:57:29.839,0:57:35.839 -you can see this lesson transcribed - -0:57:32.880,0:57:36.400 -as you can tell the titles are links +you can see this lesson
transcribed as you can tell the titles are links 0:57:35.839,0:57:38.880 -which are - -0:57:36.400,0:57:40.400 -redirecting you to the correct section +which are redirecting you to the correct section 0:57:38.880,0:57:43.599 -in the video - -0:57:40.400,0:57:47.359 -so here we have this lesson transcribed +in the video so here we have this lesson transcribed 0:57:43.599,0:57:49.359 -to you in english moreover - -0:57:47.359,0:57:51.119 -not only english is available as you can +to you in english moreover not only english is available as you can 0:57:49.359,0:57:55.040 -tell here there is the - -0:57:51.119,0:57:56.559 -english flag you can go up on top for +tell here there is the english flag you can go up on top for 0:57:55.040,0:57:59.680 -example and i show you - -0:57:56.559,0:58:01.920 -the home page here you can find that +example and i show you the home page here you can find that 0:57:59.680,0:58:03.040 -many languages are available arabic - -0:58:01.920,0:58:05.280 -spanish version +many languages are available arabic spanish version 0:58:03.040,0:58:06.240 -french italian japanese korean russian - -0:58:05.280,0:58:08.720 -turkish +french italian japanese korean russian turkish 0:58:06.240,0:58:10.400 -and chinese and more are coming if you - -0:58:08.720,0:58:11.119 -would like to contribute and add your +and chinese and more are coming if you would like to contribute and add your 0:58:10.400,0:58:13.520 -own language - -0:58:11.119,0:58:14.720 -don't hesitate to contact me on twitter +own language don't hesitate to contact me on twitter 0:58:13.520,0:58:17.839 -or by email - -0:58:14.720,0:58:19.280 -this is the language part moreover it +or by email this is the language part moreover it 0:58:17.839,0:58:21.599 -really really helps - -0:58:19.280,0:58:22.319 -if you implement things that we cover in +really really helps if you implement things that we cover in 0:58:21.599,0:58:24.960 -class - -0:58:22.319,0:58:25.920 -with file torch and you know a 
notebook +class with pytorch and you know a notebook 0:58:24.960,0:58:28.640 -perhaps - -0:58:25.920,0:58:30.160 -in some patient today class didn't have +perhaps in some patient today class didn't have 0:58:28.640,0:58:32.480 -a companion notebook - -0:58:30.160,0:58:33.440 -but nevertheless i would really +a companion notebook but nevertheless i would really 0:58:32.480,0:58:36.559 -recommend you to - -0:58:33.440,0:58:39.440 -try to put together a few trials +recommend you to try to put together a few trials 0:58:36.559,0:58:40.400 -yourself such that you can test your - -0:58:39.440,0:58:43.119 -knowledge +yourself such that you can test your knowledge 0:58:40.400,0:58:44.240 -finally if you find any bug in the - -0:58:43.119,0:58:46.640 -previous notebooks +finally if you find any bug in the previous notebooks 0:58:44.240,0:58:47.839 -in the website anywhere you're really - -0:58:46.640,0:58:50.160 -encouraged to +in the website anywhere you're really encouraged to 0:58:47.839,0:58:51.359 -point them out on github or if you find - -0:58:50.160,0:58:53.680 -yourself inclined +point them out on github or if you find yourself inclined 0:58:51.359,0:58:55.680 -you can also send a pull request such - -0:58:53.680,0:58:57.760 -that you can be an official contributor +you can also send a pull request such that you can be an official contributor 0:58:55.680,0:59:00.240 -to this project - -0:58:57.760,0:59:01.119 -and don't forget to like share and +to this project and don't forget to like share and 0:59:00.240,0:59:04.240 -subscribe - -0:59:01.119,0:59:06.319 -bye bye - -0:59:04.240,0:59:06.319 -you - +subscribe bye bye diff --git a/docs/en/week15/practicum15B.sbv b/docs/en/week15/practicum15B.sbv index 6b83859d3..a97a511e8 100644 --- a/docs/en/week15/practicum15B.sbv +++ b/docs/en/week15/practicum15B.sbv @@ -1,4794 +1,2396 @@ 0:00:01.920,0:00:08.160 -so we share the screen - -0:00:05.040,0:00:10.240 -and i'm opening the chat +so we share the screen and i'm opening
the chat 0:00:08.160,0:00:12.160 -all right so i have the chat open so you - -0:00:10.240,0:00:14.240 -can interact with me +all right so i have the chat open so you can interact with me 0:00:12.160,0:00:16.480 -and so a small recap from last time last - -0:00:14.240,0:00:19.359 -time we've been talking about energy +and so a small recap from last time last time we've been talking about energy 0:00:16.480,0:00:20.080 -uh and actually we've been talking about - -0:00:19.359,0:00:23.119 -inference +uh and actually we've been talking about inference 0:00:20.080,0:00:24.560 -how to find set how to find y check uh - -0:00:23.119,0:00:28.160 -how to compute y +how to find set how to find y check uh how to compute y 0:00:24.560,0:00:30.080 -f and e okay and so let me just start - -0:00:28.160,0:00:32.160 -i guess with the the last slide from +f and e okay and so let me just start i guess with the the last slide from 0:00:30.080,0:00:35.120 -last time so we - -0:00:32.160,0:00:35.920 -had computed this f infinity uh which is +last time so we had computed this f infinity uh which is 0:00:35.120,0:00:39.680 -called - -0:00:35.920,0:00:42.480 -uh zero temperature limit uh free energy +called uh zero temperature limit uh free energy 0:00:39.680,0:00:44.239 -uh as a function of my y and y is going - -0:00:42.480,0:00:46.399 -to be a two dimensional +uh as a function of my y and y is going to be a two dimensional 0:00:44.239,0:00:47.360 -vector right so whenever i'm going to be - -0:00:46.399,0:00:49.520 -plotting this f +vector right so whenever i'm going to be plotting this f 0:00:47.360,0:00:51.520 -infinity of y it's going to be a scalar - -0:00:49.520,0:00:55.120 -field means this a height +infinity of y it's going to be a scalar field means this a height 0:00:51.520,0:00:57.840 -over like a 2d region okay - -0:00:55.120,0:00:59.039 -so we saw already this stuff that since +over like a 2d region okay so we saw already this stuff that since 0:00:57.840,0:01:00.800 -it's 
gonna have different height i'm - -0:00:59.039,0:01:03.840 -gonna represent with the +it's gonna have different height i'm gonna represent with the 0:01:00.800,0:01:07.760 -color purple the height equals zero - -0:01:03.840,0:01:09.760 -and then color equal green for +color purple the height equals zero and then color equal green for 0:01:07.760,0:01:10.960 -equal one and then everything that is - -0:01:09.760,0:01:14.240 -above and beyond +equal one and then everything that is above and beyond 0:01:10.960,0:01:14.799 -the free energy equal tool is going to - -0:01:14.240,0:01:18.720 -be in +the free energy equal two is going to be in 0:01:14.799,0:01:22.159 -yellow okay and so - -0:01:18.720,0:01:23.920 -this is how this stuff looks i +yellow okay and so this is how this stuff looks i 0:01:22.159,0:01:26.240 -would like to remind you that this free - -0:01:23.920,0:01:29.280 -energy was the quadratic +would like to remind you that this free energy was the quadratic 0:01:26.240,0:01:31.600 -uh a cliton distance from the model - -0:01:29.280,0:01:35.119 -manifold right so all points that are +uh a euclidean distance from the model manifold right so all points that are 0:01:31.600,0:01:36.720 -within the um within the model manifold - -0:01:35.119,0:01:39.360 -they have zero cost right +within the um within the model manifold they have zero cost right 0:01:36.720,0:01:40.240 -this is sorry zero energy free energy - -0:01:39.360,0:01:41.759 -because again the +this is sorry zero energy free energy because again the 0:01:40.240,0:01:44.240 -the distance between them and the - -0:01:41.759,0:01:45.840 -manifold is zero so zero squared is zero +the distance between them and the manifold is zero so zero squared is zero 0:01:44.240,0:01:47.360 -and then as you move away it's gonna - -0:01:45.840,0:01:50.799 -it's gonna increase up +and then as you move away it's gonna it's gonna increase up 0:01:47.360,0:01:54.159 -uh quadratically so - -0:01:50.799,0:01:55.920 -uh so far
everything should be uh +uh quadratically so uh so far everything should be uh 0:01:54.159,0:01:58.000 -known understood and you know you you - -0:01:55.920,0:01:58.719 -took yeah you had one week to to go over +known understood and you know you you took yeah you had one week to to go over 0:01:58.000,0:02:01.759 -this stuff so - -0:01:58.719,0:02:04.079 -i i assume everyone is quite familiar +this stuff so i i assume everyone is quite familiar 0:02:01.759,0:02:06.079 -so something that you may notice right - -0:02:04.079,0:02:07.840 -now is gonna be in the side +so something that you may notice right now is gonna be in the side 0:02:06.079,0:02:11.039 -of these ellipse you're going to have - -0:02:07.840,0:02:13.200 -like a region that is slightly +of these ellipse you're going to have like a region that is slightly 0:02:11.039,0:02:14.879 -slightly lighter right you can see a - -0:02:13.200,0:02:17.680 -lighter degree of +slightly lighter right you can see a lighter degree of 0:02:14.879,0:02:18.879 -purple so what's going on over there so - -0:02:17.680,0:02:22.000 -let me show you this +purple so what's going on over there so let me show you this 0:02:18.879,0:02:23.440 -uh image here uh with the height - -0:02:22.000,0:02:25.280 -you know proportional to the actual +uh image here uh with the height you know proportional to the actual 0:02:23.440,0:02:28.160 -height of this um - -0:02:25.280,0:02:30.080 -of this free energy okay so i'm gonna +height of this um of this free energy okay so i'm gonna 0:02:28.160,0:02:32.319 -change the color map such that - -0:02:30.080,0:02:34.239 -uh you can clearly see what's going on +change the color map such that uh you can clearly see what's going on 0:02:32.319,0:02:34.720 -and i'm gonna be using this one which is - -0:02:34.239,0:02:37.840 -called +and i'm gonna be using this one which is called 0:02:34.720,0:02:40.400 -cold warm so cold means like - -0:02:37.840,0:02:42.080 -f infinity equals zero i'm going to be +cold 
warm so cold means like f infinity equals zero i'm going to be 0:02:40.400,0:02:45.120 -using the blue color - -0:02:42.080,0:02:45.680 -for f infinity equal 0.5 i'm gonna be +using the blue color for f infinity equal 0.5 i'm gonna be 0:02:45.120,0:02:48.400 -using - -0:02:45.680,0:02:50.480 -a gray color and then for everything +using a gray color and then for everything 0:02:48.400,0:02:55.120 -that is above and beyond - -0:02:50.480,0:02:57.680 -f infinity one is going to be in red +that is above and beyond f infinity one is going to be in red 0:02:55.120,0:02:59.680 -and so this is going to be um the the - -0:02:57.680,0:03:00.720 -image you saw before now that was like +and so this is going to be um the the image you saw before now that was like 0:02:59.680,0:03:03.120 -simply saw from - -0:03:00.720,0:03:04.080 -uh from top here i'm gonna show you the +simply saw from uh from top here i'm gonna show you the 0:03:03.120,0:03:07.360 -contour - -0:03:04.080,0:03:08.959 -so each uh line here they share the same +contour so each uh line here they share the same 0:03:07.360,0:03:11.519 -value of the free energy - -0:03:08.959,0:03:13.760 -okay so let me spin this little guy so +value of the free energy okay so let me spin this little guy so 0:03:11.519,0:03:16.560 -it's that you can see all around - -0:03:13.760,0:03:17.120 -as you can tell all the regions like the +it's that you can see all around as you can tell all the regions like the 0:03:16.560,0:03:20.400 -height - -0:03:17.120,0:03:22.239 -around the the the ellipse that is +height around the the the ellipse that is 0:03:20.400,0:03:23.440 -with the the manifold ellipse is gonna - -0:03:22.239,0:03:25.280 -have zero energy +with the the manifold ellipse is gonna have zero energy 0:03:23.440,0:03:27.440 -and as you move away from that you're - -0:03:25.280,0:03:29.599 -gonna have like a quadratic thing right +and as you move away from that you're gonna have like a quadratic thing right 
0:03:27.440,0:03:33.519 -so you're gonna have like a parabola - -0:03:29.599,0:03:35.120 -uh what you notice is that in the center +so you're gonna have like a parabola uh what you notice is that in the center 0:03:33.519,0:03:36.799 -so on the outside of course is going to - -0:03:35.120,0:03:38.400 -be like a parabola but in the center +so on the outside of course is going to be like a parabola but in the center 0:03:36.799,0:03:39.840 -those two things are going to be going - -0:03:38.400,0:03:43.200 -up on a peak +those two things are going to be going up on a peak 0:03:39.840,0:03:46.720 -right and this might - -0:03:43.200,0:03:48.879 -or might not be wanted and so the +right and this might or might not be wanted and so the 0:03:46.720,0:03:50.879 -this we're gonna start today lesson by - -0:03:48.879,0:03:53.920 -learning how to relax +this we're gonna start today lesson by learning how to relax 0:03:50.879,0:03:56.000 -this uh free energy this infinite zero - -0:03:53.920,0:03:58.879 -temperature limit free energy +this uh free energy this infinite zero temperature limit free energy 0:03:56.000,0:03:59.599 -to a more you know uh a free energy - -0:03:58.879,0:04:01.599 -without +to a more you know uh a free energy without 0:03:59.599,0:04:03.920 -local minima such that you know it's a - -0:04:01.599,0:04:06.080 -bit more smooth +local minima such that you know it's a bit more smooth 0:04:03.920,0:04:07.599 -let me take here a cross section of this - -0:04:06.080,0:04:10.159 -you know bathtub +let me take here a cross section of this you know bathtub 0:04:07.599,0:04:10.640 -for y one equals zero so i'm gonna be - -0:04:10.159,0:04:13.680 -chaff +for y one equals zero so i'm gonna be chaff 0:04:10.640,0:04:15.519 -chopping it in a correspondence of y one - -0:04:13.680,0:04:17.359 -equals zero +chopping it in a correspondence of y one equals zero 0:04:15.519,0:04:19.359 -so what we get is going to be the - -0:04:17.359,0:04:22.240 -following you're gonna see 
now +so what we get is going to be the following you're gonna see now 0:04:19.359,0:04:24.080 -that those two branches are gonna be my - -0:04:22.240,0:04:26.400 -parabolic branches right +that those two branches are gonna be my parabolic branches right 0:04:24.080,0:04:27.520 -so again what is this free energy free - -0:04:26.400,0:04:30.800 -energy +so again what is this free energy free energy 0:04:27.520,0:04:31.919 -was the square distance of your given - -0:04:30.800,0:04:34.720 -point +was the square distance of your given point 0:04:31.919,0:04:35.360 -to the closest point on the manifold - -0:04:34.720,0:04:37.600 -right +to the closest point on the manifold right 0:04:35.360,0:04:38.759 -so if you're on the manifold which is - -0:04:37.600,0:04:42.400 -like location +so if you're on the manifold which is like location 0:04:38.759,0:04:43.120 -0.4 for example then the distance - -0:04:42.400,0:04:45.840 -between you +0.4 for example then the distance between you 0:04:43.120,0:04:47.280 -and the manifold is going to be zero and - -0:04:45.840,0:04:48.160 -therefore the square of zero is going to +and the manifold is going to be zero and therefore the square of zero is going to 0:04:47.280,0:04:50.240 -be zero - -0:04:48.160,0:04:52.240 -as you move away let's say we move to +be zero as you move away let's say we move to 0:04:50.240,0:04:55.520 -the right hand side of this - -0:04:52.240,0:04:56.080 -0.4 as you move linearly to the right +the right hand side of this 0.4 as you move linearly to the right 0:04:55.520,0:04:57.759 -hand side - -0:04:56.080,0:04:59.360 -you're going to be increasing +hand side you're going to be increasing 0:04:57.759,0:05:00.320 -quadratically right that's why we - -0:04:59.360,0:05:02.960 -observe +quadratically right that's why we observe 0:05:00.320,0:05:04.320 -this energy free energy going up - -0:05:02.960,0:05:06.720 -quadratically +this energy free energy going up quadratically 0:05:04.320,0:05:07.919 -similarly what 
happens on the other side - -0:05:06.720,0:05:09.919 -of course +similarly what happens on the other side of course 0:05:07.919,0:05:11.919 -the same happens as you move towards the - -0:05:09.919,0:05:13.120 -zero right and so as you move towards +the same happens as you move towards the zero right and so as you move towards 0:05:11.919,0:05:16.560 -the zero you're gonna get - -0:05:13.120,0:05:19.120 -that you try to climb up that parabola +the zero you're gonna get that you try to climb up that parabola 0:05:16.560,0:05:19.840 -and we have this peak over here and so - -0:05:19.120,0:05:21.680 -in the next +and we have this peak over here and so in the next 0:05:19.840,0:05:24.160 -slide we're gonna be learning how to - -0:05:21.680,0:05:26.720 -smooth that peak +slide we're gonna be learning how to smooth that peak 0:05:24.160,0:05:27.840 -i'll let you i tell you later why we uh - -0:05:26.720,0:05:30.560 -what is very why +i'll let you i tell you later why we uh what is very why 0:05:27.840,0:05:31.039 -this is very useful like why we why my - -0:05:30.560,0:05:34.560 -why +this is very useful like why we why my why 0:05:31.039,0:05:38.479 -we might want to do so okay - -0:05:34.560,0:05:39.120 -so free energy we we know right the the +we might want to do so okay so free energy we we know right the the 0:05:38.479,0:05:42.400 -minimum - -0:05:39.120,0:05:45.440 -value of the energy e that +minimum value of the energy e that 0:05:42.400,0:05:47.759 -is spanning across y and z right so - -0:05:45.440,0:05:50.639 -you have this energy we saw that uh for +is spanning across y and z right so you have this energy we saw that uh for 0:05:47.759,0:05:53.600 -a given y we have like an energy over z - -0:05:50.639,0:05:55.199 -and then the free energy was the value +a given y we have like an energy over z and then the free energy was the value 0:05:53.600,0:05:58.000 -of the energy correspondent - -0:05:55.199,0:05:59.600 -to the location where we have the +of the energy 
correspondent to the location where we have the 0:05:58.000,0:06:00.479 -minimum value right so the minimum value - -0:05:59.600,0:06:04.080 -of this +minimum value right so the minimum value of this 0:06:00.479,0:06:05.759 -e is going to be my free energy - -0:06:04.080,0:06:08.720 -now i'm going to be introducing a +e is going to be my free energy now i'm going to be introducing a 0:06:05.759,0:06:12.479 -relaxed version which is going to be - -0:06:08.720,0:06:14.880 -this uh purple f so +relaxed version which is going to be this uh purple f so 0:06:12.479,0:06:15.600 -this purple f function parameterized by - -0:06:14.880,0:06:19.919 -beta +this purple f function parameterized by beta 0:06:15.600,0:06:21.199 -is going to be simply this expression uh - -0:06:19.919,0:06:24.400 -what is this beta +is going to be simply this expression uh what is this beta 0:06:21.199,0:06:26.479 -right so this beta it's in - -0:06:24.400,0:06:27.680 -physics it's called the inverse +right so this beta it's in physics it's called the inverse 0:06:26.479,0:06:29.759 -temperature - -0:06:27.680,0:06:31.360 -the thermo thermodynamic inverse +temperature the thermo thermodynamic inverse 0:06:29.759,0:06:34.560 -temperature or the - -0:06:31.360,0:06:37.360 -coldness and it's simply one over +temperature or the coldness and it's simply one over 0:06:34.560,0:06:40.080 -uh kb which is the boltzmann constant - -0:06:37.360,0:06:42.560 -multiplied by the temperature okay +uh kb which is the boltzmann constant multiplied by the temperature okay 0:06:40.080,0:06:44.720 -so again if t that capital t the - -0:06:42.560,0:06:46.479 -temperature is very very very high +so again if t that capital t the temperature is very very very high 0:06:44.720,0:06:48.080 -like it's very warm like you're on the - -0:06:46.479,0:06:52.160 -sun beta is gonna be +like it's very warm like you're on the sun beta is gonna be 0:06:48.080,0:06:54.639 -extremely small right it's gonna be zero - 
-0:06:52.160,0:06:55.919 -instead if temperature the temperature +extremely small right it's gonna be zero instead if temperature the temperature 0:06:54.639,0:06:58.400 -is cold like - -0:06:55.919,0:06:59.599 -zero kelvin then automatically you're +is cold like zero kelvin then automatically you're 0:06:58.400,0:07:03.520 -gonna get that beta - -0:06:59.599,0:07:06.720 -it's plus infinity right and so +gonna get that beta it's plus infinity right and so 0:07:03.520,0:07:10.240 -now you can understand why - -0:07:06.720,0:07:12.080 -i call my f infinity the zero +now you can understand why i call my f infinity the zero 0:07:10.240,0:07:15.039 -temperature limit - -0:07:12.080,0:07:16.160 -free energy so it's zero temperature +temperature limit free energy so it's zero temperature 0:07:15.039,0:07:18.800 -it's super cold right - -0:07:16.160,0:07:19.840 -so capital t is zero meaning beta is +it's super cold right so capital t is zero meaning beta is 0:07:18.800,0:07:22.080 -plus infinity - -0:07:19.840,0:07:24.240 -so again if you have this free energy +plus infinity so again if you have this free energy 0:07:22.080,0:07:25.680 -with so-called free energy - -0:07:24.240,0:07:28.160 -the free energy is going to be exactly +with so-called free energy the free energy is going to be exactly 0:07:25.680,0:07:30.160 -the minimum otherwise if you relax - -0:07:28.160,0:07:31.840 -this constraint as you warm up a little +the minimum otherwise if you relax this constraint as you warm up a little 0:07:30.160,0:07:33.840 -bit this free energy - -0:07:31.840,0:07:35.680 -the free energy is going to be a +bit this free energy the free energy is going to be a 0:07:33.840,0:07:38.800 -summation of multiple - -0:07:35.680,0:07:40.960 -things right so this s here is the s for +summation of multiple things right so this s here is the s for 0:07:38.800,0:07:43.680 -sum is a summation of all these - -0:07:40.960,0:07:46.720 -components here multiplied by the +sum is a summation of all 
these components here multiplied by the 0:07:43.680,0:07:48.879 -interval cool - -0:07:46.720,0:07:50.080 -this symbol over here it's simply the +interval cool this symbol over here it's simply the 0:07:48.879,0:07:53.280 -measure - -0:07:50.080,0:07:56.319 -of the domain of z so in our case +measure of the domain of z so in our case 0:07:53.280,0:07:56.960 -uh z goes from zero to two pi and - -0:07:56.319,0:08:00.000 -therefore +uh z goes from zero to two pi and therefore 0:07:56.960,0:08:00.560 -this item over here it simply means two - -0:08:00.000,0:08:03.680 -pi +this item over here it simply means two pi 0:08:00.560,0:08:07.520 -okay all right all right but who - -0:08:03.680,0:08:09.759 -who remembers what this kbt is right +okay all right all right but who who remembers what this kbt is right 0:08:07.520,0:08:11.199 -what is this kbt why are we talking - -0:08:09.759,0:08:14.160 -about energies right +what is this kbt why are we talking about energies right 0:08:11.199,0:08:14.960 -so again from physics no 101 you might - -0:08:14.160,0:08:18.000 -remember that +so again from physics no 101 you might remember that 0:08:14.960,0:08:20.319 -the average - -0:08:18.000,0:08:21.039 -translational kinetic energy was the two +the average translational kinetic energy was the two 0:08:20.319,0:08:25.039 -third - -0:08:21.039,0:08:25.840 -kbt no and therefore kbt or two third +third kbt no and therefore kbt or two third 0:08:25.039,0:08:29.199 -kbt - -0:08:25.840,0:08:30.080 -express the uh kinetic energy right of +kbt express the uh kinetic energy right of 0:08:29.199,0:08:33.200 -this - -0:08:30.080,0:08:35.680 -let's say gas with all those particles +this let's say gas with all those particles 0:08:33.200,0:08:36.959 -and so the temperature allows you to - -0:08:35.680,0:08:38.959 -express +and so the temperature allows you to express 0:08:36.959,0:08:41.360 -uh the energy right so you have - -0:08:38.959,0:08:44.560 -temperature and energy are connected +uh the 
energy right so you have temperature and energy are connected 0:08:41.360,0:08:46.720 -um so you can make uh - -0:08:44.560,0:08:48.800 -a quick you know check check here and +um so you can make uh a quick you know check check here and 0:08:46.720,0:08:51.920 -beta since it's going to be the inverse - -0:08:48.800,0:08:55.600 -of kbt it's going to be in one over +beta since it's going to be the inverse of kbt it's going to be in one over 0:08:51.920,0:08:58.800 -joule right and so here we have these - -0:08:55.600,0:09:00.800 -one over joule means that this stuff is +joule right and so here we have these one over joule means that this stuff is 0:08:58.800,0:09:01.839 -joule therefore f is going to be an - -0:09:00.800,0:09:04.080 -energy +joule therefore f is going to be an energy 0:09:01.839,0:09:04.880 -and then inside this exponential we have - -0:09:04.080,0:09:07.680 -one over +and then inside this exponential we have one over 0:09:04.880,0:09:09.279 -joule times the e which is joule and - -0:09:07.680,0:09:12.399 -then if you multiply the two you +joule times the e which is joule and then if you multiply the two you 0:09:09.279,0:09:12.800 -then the two you know units cancel out - -0:09:12.399,0:09:16.000 -so +then the two you know units cancel out so 0:09:12.800,0:09:18.560 -everything works just fine - -0:09:16.000,0:09:19.360 -all right all right all right um and +everything works just fine all right all right all right um and 0:09:18.560,0:09:21.760 -also yes - -0:09:19.360,0:09:23.279 -the the dimension of z cancel out with +also yes the the dimension of z cancel out with 0:09:21.760,0:09:24.080 -this dimension right so everything is - -0:09:23.279,0:09:27.360 -just pure +this dimension right so everything is just pure 0:09:24.080,0:09:29.040 -uh pure number okay again these are this - -0:09:27.360,0:09:30.640 -is not machine learning this is physics +uh pure number okay again these are this is not machine learning this is physics 0:09:29.040,0:09:32.320 
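The β = 1/(k_B·T) definition and the units check above can be sketched numerically. This is an editor's illustration, not the lecture's code: the constant value and function names are mine, and the mean kinetic energy uses the textbook value ⟨E_k⟩ = (3/2)·k_B·T.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, in joule per kelvin

def coldness(T):
    """Thermodynamic inverse temperature beta = 1 / (k_B * T), in 1/joule."""
    return 1.0 / (K_B * T)

def mean_kinetic_energy(T):
    """Textbook average translational kinetic energy (3/2) k_B T, in joule."""
    return 1.5 * K_B * T

beta_hot = coldness(5778.0)  # roughly the sun's surface temperature: tiny beta
beta_cold = coldness(1e-9)   # a nanokelvin above absolute zero: huge beta

# beta * E is dimensionless -- (1/joule) * joule -- so exp(-beta * E) is sound.
dimensionless = coldness(300.0) * mean_kinetic_energy(300.0)
```

Hot means small β, cold means large β, and the product β·E carries no units, exactly as the dimensional check in the lecture concludes.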
-just to give you a little bit of you - -0:09:30.640,0:09:34.800 -know uh +just to give you a little bit of you know uh 0:09:32.320,0:09:36.320 -overview about what this stuff where - -0:09:34.800,0:09:38.000 -this stuff comes from right so this is +overview about what this stuff where this stuff comes from right so this is 0:09:36.320,0:09:39.839 -just from our friends from the physics - -0:09:38.000,0:09:41.600 -department +just from our friends from the physics department 0:09:39.839,0:09:43.600 -all right all right all right so i want - -0:09:41.600,0:09:45.519 -to compute this free energy in this +all right all right all right so i want to compute this free energy in this 0:09:43.600,0:09:48.720 -relaxed version of this - -0:09:45.519,0:09:50.399 -uh free energy uh since i don't want to +relaxed version of this uh free energy uh since i don't want to 0:09:48.720,0:09:53.360 -compute this integral - -0:09:50.399,0:09:55.440 -i may not know how to do that i simply +compute this integral i may not know how to do that i simply 0:09:53.360,0:09:58.959 -use a simple discretization right - -0:09:55.440,0:10:01.519 -and so i replace this latin s with a +use a simple discretization right and so i replace this latin s with a 0:09:58.959,0:10:02.320 -greek s right and then i replace this - -0:10:01.519,0:10:05.040 -latin d +greek s right and then i replace this latin d 0:10:02.320,0:10:06.800 -with a greek t so everything else is - -0:10:05.040,0:10:09.440 -just the same so i go from the +with a greek t so everything else is just the same so i go from the 0:10:06.800,0:10:11.680 -time continuous to a discretization very - -0:10:09.440,0:10:13.600 -simple discretization it works +time continuous to a discretization very simple discretization it works 0:10:11.680,0:10:15.519 -in our case because z is like one - -0:10:13.600,0:10:18.720 -dimension i saw you know everything is +in our case because z is like one dimension i saw you know everything is 0:10:15.519,0:10:22.160 
-pretty easy uh - -0:10:18.720,0:10:24.480 -moreover here i will just define and +pretty easy uh moreover here i will just define and 0:10:22.160,0:10:25.760 -pay attention i am defining right now - -0:10:24.480,0:10:29.200 -for this class +pay attention i am defining right now for this class 0:10:25.760,0:10:32.880 -okay this thing uh has been the - -0:10:29.200,0:10:36.480 -soft mean of e so my +okay this thing uh as being the soft min of e so my 0:10:32.880,0:10:39.120 -free energy uh the purple one - -0:10:36.480,0:10:39.839 -it's simply the relaxation of the zero +free energy uh the purple one it's simply the relaxation of the zero 0:10:39.120,0:10:42.079 -temperature - -0:10:39.839,0:10:43.120 -limit is going to be simply this soft +temperature limit is going to be simply this soft 0:10:42.079,0:10:45.360 -mean so - -0:10:43.120,0:10:46.560 -the zero temperature the super cold one +min so the zero temperature the super cold one 0:10:45.360,0:10:50.000 -is simply the mean - -0:10:46.560,0:10:52.480 -okay am i n min whereas if i +is simply the min okay m i n min whereas if i 0:10:50.000,0:10:53.120 -compute if i relax if i turn on the - -0:10:52.480,0:10:55.680 -temperature +compute if i relax if i turn on the temperature 0:10:53.120,0:10:56.640 -like i increase the thermostat i'm gonna - -0:10:55.680,0:10:59.760 -have this +like i increase the thermostat i'm gonna have this 0:10:56.640,0:11:02.399 -soft mean which is this log - -0:10:59.760,0:11:03.360 -summation of exponential okay and i call +soft min which is this log summation of exponential okay and i call 0:11:02.399,0:11:05.920 -this actual - -0:11:03.360,0:11:07.279 -soft mean why do i call it actual soft +this actual soft min why do i call it actual soft 0:11:05.920,0:11:10.560 -meme because other people - -0:11:07.279,0:11:12.720 -uh most of the people outside this class +min because other people uh most of the people outside this class 0:11:10.560,0:11:14.000 -will call this the soft means
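The "actual soft min" defined here — minus 1/β times the log of the mean of exp(−βE) over the discretized z grid — can be sketched as below. A hedged illustration, not the lecture's code: `actual_softmin` is my name for the definition on the slide, and the shift by min(E) is the usual log-sum-exp stabilization.

```python
import numpy as np

def actual_softmin(E, beta):
    """Soft minimum: -(1/beta) * log( mean( exp(-beta * E) ) ).
    Shifting by min(E) keeps the exponentials from underflowing."""
    E = np.asarray(E, dtype=float)
    m = E.min()
    # mean(exp(-beta*E)) = exp(-beta*m) * mean(exp(-beta*(E - m)))
    return m - np.log(np.mean(np.exp(-beta * (E - m)))) / beta
```

For large β (super cold) this approaches the plain min; warming up (β → 0) relaxes it toward the mean, which is exactly the limit discussed next.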
something - -0:11:12.720,0:11:16.480 -else and i will let you +will call this the soft min something else and i will let you 0:11:14.000,0:11:18.480 -know a bit more about these in a few - -0:11:16.480,0:11:21.920 -slides okay +know a bit more about these in a few slides okay 0:11:18.480,0:11:25.680 -something that is super interesting is - -0:11:21.920,0:11:29.440 -computing the limit of this free energy +something that is super interesting is computing the limit of this free energy 0:11:25.680,0:11:31.440 -here for beta that goes to zero so - -0:11:29.440,0:11:33.120 -whenever you increase the temperature +here for beta that goes to zero so whenever you increase the temperature 0:11:31.440,0:11:35.279 -as the temperature on the sun like it's - -0:11:33.120,0:11:39.200 -super warm what is the most +as the temperature on the sun like it's super warm what is the most 0:11:35.279,0:11:41.920 -relaxed version of this min - -0:11:39.200,0:11:43.360 -and so if you do that you're gonna see +relaxed version of this min and so if you do that you're gonna see 0:11:41.920,0:11:46.480 -that - -0:11:43.360,0:11:48.320 -this stuff ends up being the average but +that this stuff ends up being the average but 0:11:46.480,0:11:50.720 -again this is just you know - -0:11:48.320,0:11:51.519 -um it's not relevant it's not too +again this is just you know um it's not relevant it's not too 0:11:50.720,0:11:54.720 -important - -0:11:51.519,0:11:55.120 -uh is the derivation just i can show you +important uh is the derivation just i can show you 0:11:54.720,0:11:56.959 -here - -0:11:55.120,0:11:58.240 -and i just show you so you can you have +here and i just show you so you can you have 0:11:56.959,0:12:01.279 -access later - -0:11:58.240,0:12:03.920 -uh the limit of this free energy +access later uh the limit of this free energy 0:12:01.279,0:12:04.399 -for beta that goes to zero so it's very - -0:12:03.920,0:12:06.959 -warm +for beta that goes to zero so it's very warm
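For reference, the β → 0⁺ limit shown on the slide can be reconstructed in a few lines, using the discretized free energy with N bins and first-order expansions in β (a sketch of the slide's derivation, not a verbatim copy):

```latex
\begin{aligned}
F_\beta(y) &= -\frac{1}{\beta}\,\log\!\Big(\frac{1}{N}\sum_{k=1}^{N} e^{-\beta E_k}\Big),
  \qquad E_k = E(y, z_k), \\
e^{-\beta E_k} &= 1 - \beta E_k + O(\beta^2)
  \;\Longrightarrow\;
  \frac{1}{N}\sum_{k=1}^{N} e^{-\beta E_k} = 1 - \beta \bar{E} + O(\beta^2),
  \qquad \bar{E} = \frac{1}{N}\sum_{k=1}^{N} E_k, \\
F_\beta(y) &= -\frac{1}{\beta}\log\!\big(1 - \beta \bar{E} + O(\beta^2)\big)
  = -\frac{1}{\beta}\big(-\beta \bar{E} + O(\beta^2)\big)
  = \bar{E} + O(\beta)
  \;\xrightarrow[\beta \to 0^{+}]{}\; \frac{1}{N}\sum_{k=1}^{N} E_k .
\end{aligned}
```

So the fully warmed-up free energy is just the average of the energies over the latent grid.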
0:12:04.399,0:12:07.920 -super warm it ends up being simply the - -0:12:06.959,0:12:11.839 -average +super warm it ends up being simply the average 0:12:07.920,0:12:15.279 -of the energy okay across those heads - -0:12:11.839,0:12:17.200 -again not to uh you don't have to get +of the energy okay across those heads again not to uh you don't have to get 0:12:15.279,0:12:19.440 -scared about that math - -0:12:17.200,0:12:20.800 -all right so let's compute this free +scared about that math all right so let's compute this free 0:12:19.440,0:12:23.040 -energy for - -0:12:20.800,0:12:24.079 -the cases we saw before right so we are +energy for the cases we saw before right so we are 0:12:23.040,0:12:25.920 -still doing inference - -0:12:24.079,0:12:27.760 -as last time but instead of using the +still doing inference as last time but instead of using the 0:12:25.920,0:12:29.360 -cold inference the cold free energy - -0:12:27.760,0:12:30.560 -we're going to use this you know relaxed +cold inference the cold free energy we're going to use this you know relaxed 0:12:29.360,0:12:34.000 -version - -0:12:30.560,0:12:37.279 -for the y equal 23 so if you remember +version for the y equal 23 so if you remember 0:12:34.000,0:12:39.440 -the y equal 23 was this x - -0:12:37.279,0:12:42.160 -the green x on the right hand side and +the y equal 23 was this x the green x on the right hand side and 0:12:39.440,0:12:44.800 -then here the free energy was the square - -0:12:42.160,0:12:46.560 -of the distance between the blue x and +then here the free energy was the square of the distance between the blue x and 0:12:44.800,0:12:49.440 -the green x right so the - -0:12:46.560,0:12:52.240 -the distance was 0.5 square would would +the green x right so the the distance was 0.5 square would would 0:12:49.440,0:12:53.519 -have been 0.25 and that would have been - -0:12:52.240,0:12:56.240 -the free energy +have been 0.25 and that would have been the free energy 0:12:53.519,0:12:57.040 -uh zero 
temperature zero zero - -0:12:56.240,0:12:59.680 -temperature +uh zero temperature zero zero temperature 0:12:57.040,0:13:01.440 -limit free energy but in this case we - -0:12:59.680,0:13:04.399 -have now to consider +limit free energy but in this case we have now to consider 0:13:01.440,0:13:05.120 -all these contributions and so i'm gonna - -0:13:04.399,0:13:08.959 -show you +all these contributions and so i'm gonna show you 0:13:05.120,0:13:12.560 -how all those little - -0:13:08.959,0:13:15.920 -z dz will contribute to this free energy +how all those little z dz will contribute to this free energy 0:13:12.560,0:13:19.519 -and so we choose a beta equal 1 - -0:13:15.920,0:13:21.760 -and we have now this so given that y +and so we choose a beta equal 1 and we have now this so given that y 0:13:19.519,0:13:23.760 -prime is going to be this x on the right - -0:13:21.760,0:13:27.600 -hand side +prime is going to be this x on the right hand side 0:13:23.760,0:13:31.040 -my free energy now comes from the - -0:13:27.600,0:13:31.920 -addition of all these uh terms here the +my free energy now comes from the addition of all these uh terms here the 0:13:31.040,0:13:35.120 -exponential - -0:13:31.920,0:13:38.399 -of you know minus the energy of +exponential of you know minus the energy of 0:13:35.120,0:13:40.000 -all of this right so all the squares - -0:13:38.399,0:13:41.920 -like the exponential of the negative +all of this right so all the squares like the exponential of the negative 0:13:40.000,0:13:44.560 -squares right - -0:13:41.920,0:13:45.440 -so as you can tell those points that are +squares right so as you can tell those points that are 0:13:44.560,0:13:48.560 -close to the - -0:13:45.440,0:13:50.240 -x will have like a +close to the x will have like a 0:13:48.560,0:13:51.920 -smaller energy and therefore the - -0:13:50.240,0:13:54.480 -exponential will be larger +smaller energy and therefore the exponential will be larger 0:13:51.920,0:13:55.920 -and that's why 
you can see them but for - -0:13:54.480,0:13:58.320 -energy that are you know +and that's why you can see them but for energy that are you know 0:13:55.920,0:14:00.240 -further away that are very high energy - -0:13:58.320,0:14:01.839 -you do not do the exponential of main +further away that are very high energy you do the exponential of 0:14:00.240,0:14:04.079 -minus and large number you're going to - -0:14:01.839,0:14:07.040 -get basically zero so they don't count +minus a large number you're going to get basically zero so they don't count 0:14:04.079,0:14:09.120 -in this summation in this integral okay - -0:14:07.040,0:14:12.560 -first question for people at home +in this summation in this integral okay first question for people at home 0:14:09.120,0:14:17.600 -to just check if you are following - -0:14:12.560,0:14:20.079 -how that where does 0.75 come from +to just check if you are following how that where does 0.75 come from 0:14:17.600,0:14:21.120 -so where does this value over here come - -0:14:20.079,0:14:23.760 -from +so where does this value over here come from 0:14:21.120,0:14:25.279 -and you're supposed to type on the chart - -0:14:23.760,0:14:27.920 -such that i can read +and you're supposed to type on the chat such that i can read 0:14:25.279,0:14:29.199 -aloud what you're saying so i'm asking - -0:14:27.920,0:14:33.680 -once again +aloud what you're saying so i'm asking once again 0:14:29.199,0:14:36.240 -where does this value over here 075 - -0:14:33.680,0:14:36.240 -come from +where does this value over here 075 come from 0:14:39.440,0:14:45.839 -and someone is to reply - -0:14:47.480,0:14:53.440 -contribution to the energy yes yes no +and someone is to reply contribution to the energy yes yes no 0:14:49.760,0:14:56.399 -the number 075 i i need you to tell me - -0:14:53.440,0:14:58.240 -how to compute 0.75 where does that +the number 075 i i need you to tell me how to compute 0.75 where does that 0:14:56.399,0:15:02.399 -number come
from - -0:14:58.240,0:15:04.560 -you have all these closest why +number come from you have all these closest why 0:15:02.399,0:15:05.600 -till then yeah tell me uh how do i how - -0:15:04.560,0:15:08.800 -do i compute +till then yeah tell me uh how do i how do i compute 0:15:05.600,0:15:08.800 -1 over 2 pi no - -0:15:10.880,0:15:17.600 -x okay x minus beta e okay so how much +1 over 2 pi no x okay x minus beta e okay so how much 0:15:15.199,0:15:17.600 -is e - -0:15:19.839,0:15:23.440 -e is the square distance right so how +is e e is the square distance right so how 0:15:22.240,0:15:26.639 -much is it - -0:15:23.440,0:15:30.240 -how much is okay e is 0 25 +much is it how much is okay e is 0 25 0:15:26.639,0:15:34.079 -correct and so e to the minus - -0:15:30.240,0:15:37.120 -0 25 is going to be 0 75 +correct and so e to the minus 0 25 is going to be 0 75 0:15:34.079,0:15:40.160 -correct okay - -0:15:37.120,0:15:43.360 -so jc got the right answer +correct okay so jc got the right answer 0:15:40.160,0:15:43.839 -good job so great now we know where that - -0:15:43.360,0:15:45.680 -number +good job so great now we know where that number 0:15:43.839,0:15:48.000 -comes from so every time you see this - -0:15:45.680,0:15:49.600 -diagram so although it looks very sparse +comes from so every time you see this diagram so although it looks very sparse 0:15:48.000,0:15:51.839 -and pretty and whatever you always have - -0:15:49.600,0:15:52.800 -to pay attention to the number i put on +and pretty and whatever you always have to pay attention to the number i put on 0:15:51.839,0:15:54.560 -this - -0:15:52.800,0:15:57.120 -on the screen right so those numbers are +this on the screen right so those numbers are 0:15:54.560,0:16:00.560 -not random number they are - -0:15:57.120,0:16:03.519 -computed by my computer and you always +not random number they are computed by my computer and you always 0:16:00.560,0:16:04.160 -always always have to check on a piece - -0:16:03.519,0:16:06.720 
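The arithmetic behind this exchange can be checked on a couple of lines (illustrative names, β = 1 as in the lecture): a candidate manifold point at distance d has energy E = d², and contributes a weight exp(−β·E) to the sum. Note that exp(−0.25) ≈ 0.78.

```python
import math

def contribution(d, beta=1.0):
    """Unnormalized weight exp(-beta * E) of a candidate point at distance d,
    with the squared-distance energy E = d ** 2."""
    return math.exp(-beta * d ** 2)

w_near = contribution(0.5)  # E = 0.25, weight exp(-0.25) ~ 0.78
w_far = contribution(1.1)   # E = 1.21, weight exp(-1.21) ~ 0.30
w_gone = contribution(5.0)  # E = 25: exp of a large negative number, ~ 0
```

Nearby points dominate the sum; points far from y' are exponentially suppressed and effectively do not count in the integral.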
-of paper +always always have to check on a piece of paper 0:16:04.160,0:16:08.240 -that these numbers make sense because if - -0:16:06.720,0:16:09.600 -they don't make sense +that these numbers make sense because if they don't make sense 0:16:08.240,0:16:11.600 -then you're not understanding what's - -0:16:09.600,0:16:12.560 -going on okay so you have to pay +then you're not understanding what's going on okay so you have to pay 0:16:11.600,0:16:15.600 -attention - -0:16:12.560,0:16:18.240 -to the numbers and you know +attention to the numbers and you know 0:16:15.600,0:16:19.519 -okay i'm a physicist right so you always - -0:16:18.240,0:16:22.480 -i always +okay i'm a physicist right so you always i always 0:16:19.519,0:16:23.839 -uh have in advance in my mind the answer - -0:16:22.480,0:16:26.079 -that my program my +uh have in advance in my mind the answer that my program my 0:16:23.839,0:16:28.000 -network my whatever is supposed to do - -0:16:26.079,0:16:29.600 -right if i make an electronic circuit i +network my whatever is supposed to do right if i make an electronic circuit i 0:16:28.000,0:16:31.360 -must understand - -0:16:29.600,0:16:32.720 -i must know in advance what is the you +must understand i must know in advance what is the you 0:16:31.360,0:16:34.720 -know voltage somewhere - -0:16:32.720,0:16:36.399 -here and there before i actually measure +know voltage somewhere here and there before i actually measure 0:16:34.720,0:16:39.600 -it otherwise uh - -0:16:36.399,0:16:41.759 -you know it don't go much ahead +it otherwise uh you know it don't go much ahead 0:16:39.600,0:16:43.680 -all right all right all right so let's - -0:16:41.759,0:16:46.480 -move on and let's now consider +all right all right all right so let's move on and let's now consider 0:16:43.680,0:16:47.519 -instead the case for when i have y y - -0:16:46.480,0:16:51.279 -prime +instead the case for when i have y y prime 0:16:47.519,0:16:52.240 -equal 10 right so the 10th item so which - 
-0:16:51.279,0:16:55.839 -is the +equal 10 right so the 10th item so which is the 0:16:52.240,0:16:58.560 -element on the top there so in this case - -0:16:55.839,0:16:59.839 -i'm gonna get that all those points here +element on the top there so in this case i'm gonna get that all those points here 0:16:58.560,0:17:02.160 -will contribute - -0:16:59.839,0:17:06.400 -to the free energy right in this case +will contribute to the free energy right in this case 0:17:02.160,0:17:09.520 -we're gonna have a number to 0.26 - -0:17:06.400,0:17:12.799 -27 okay someone else that is not jesse +we're gonna have a number to 0.26 27 okay someone else that is not jesse 0:17:09.520,0:17:15.039 -uh can write on the chat how much - -0:17:12.799,0:17:17.039 -where that number comes from so where +uh can write on the chat how much where that number comes from so where 0:17:15.039,0:17:18.799 -does 026 come from - -0:17:17.039,0:17:21.439 -i think you must have understand +does 026 come from i think you must have understand 0:17:18.799,0:17:21.439 -understood now - -0:17:22.559,0:17:30.160 -e to the minus 1 kind of yes so +understood now e to the minus 1 kind of yes so 0:17:26.559,0:17:31.600 -so the distance here is 1.1 1.1 square - -0:17:30.160,0:17:34.000 -is going to be 1.2 +so the distance here is 1.1 1.1 square is going to be 1.2 0:17:31.600,0:17:36.160 -and then you take e to the minus 1.2 - -0:17:34.000,0:17:38.640 -which is 0 26 yeah +and then you take e to the minus 1.2 which is 0 26 yeah 0:17:36.160,0:17:39.919 -that's correct all right all right all - -0:17:38.640,0:17:42.960 -right okay so +that's correct all right all right all right okay so 0:17:39.919,0:17:47.840 -next question uh what happens now - -0:17:42.960,0:17:47.840 -if my y prime is going to be the origin +next question uh what happens now if my y prime is going to be the origin 0:17:48.480,0:17:53.760 -so what happens if my y prime is the - -0:17:52.000,0:17:54.559 -origin for the zero temperature you're +so 
what happens if my y prime is the origin for the zero temperature you're 0:17:53.760,0:17:57.280 -going to get the - -0:17:54.559,0:17:59.440 -square the distance right from either +going to get the square the distance right from either 0:17:57.280,0:18:03.120 -side - -0:17:59.440,0:18:04.400 -in this case what's going to be the main +side in this case what's going to be the main 0:18:03.120,0:18:05.919 -difference if you warm up the - -0:18:04.400,0:18:06.880 -temperature right so you're not zero +difference if you warm up the temperature right so you're not zero 0:18:05.919,0:18:08.240 -temperature it's not - -0:18:06.880,0:18:09.520 -it's not freezing cold we are going to +temperature it's not it's not freezing cold we are going to 0:18:08.240,0:18:10.559 -be increasing a little bit the - -0:18:09.520,0:18:13.919 -temperature +be increasing a little bit the temperature 0:18:10.559,0:18:14.400 -and how is this free energy changing - -0:18:13.919,0:18:19.360 -from +and how is this free energy changing from 0:18:14.400,0:18:19.360 -before anyone can type on the chart - -0:18:20.000,0:18:24.400 -it's symmetric yeah that's perfect how +before anyone can type on the chat it's symmetric yeah that's perfect how 0:18:23.120,0:18:26.320 -do you know - -0:18:24.400,0:18:28.240 -you already saw the slides before oh you +do you know you already saw the slides before oh you 0:18:26.320,0:18:31.919 -actually got it right - -0:18:28.240,0:18:31.919 -okay i assume you got it right +actually got it right okay i assume you got it right 0:18:32.000,0:18:35.360 -all right okay that's perfect yes it's - -0:18:33.760,0:18:38.559 -symmetric right so +all right okay that's perfect yes it's symmetric right so 0:18:35.360,0:18:42.559 -uh a point now inside - -0:18:38.559,0:18:43.120 -oh okay yeah i don't know if it's a he +uh a point now inside oh okay yeah i don't know if it's a he 0:18:42.559,0:18:45.120 -or she - -0:18:43.120,0:18:46.720 -but studied physics in the undergrad +or she
but studied physics in the undergrad 0:18:45.120,0:18:49.919 -okay cool - -0:18:46.720,0:18:52.320 -all right uh so +okay cool all right uh so 0:18:49.919,0:18:54.080 -in this case you have again that all - -0:18:52.320,0:18:56.640 -those points on the top on the bottom +in this case you have again that all those points on the top on the bottom 0:18:54.080,0:18:58.720 -will contribute to the free energy uh - -0:18:56.640,0:19:01.760 -given that i choose that y +will contribute to the free energy uh given that i choose that y 0:18:58.720,0:19:04.880 -uh y prime to be in the center okay - -0:19:01.760,0:19:06.320 -all right so that's pretty much oh +uh y prime to be in the center okay all right so that's pretty much oh 0:19:04.880,0:19:09.520 -but why we are talking why are we - -0:19:06.320,0:19:12.640 -talking about this right so we came here +but why we are talking why are we talking about this right so we came here 0:19:09.520,0:19:14.720 -because we had that issue with the - -0:19:12.640,0:19:17.600 -picky picky center right i showed you +because we had that issue with the picky picky center right i showed you 0:19:14.720,0:19:19.360 -before that spinning bathtub - -0:19:17.600,0:19:21.280 -and then the cross-section here that we +before that spinning bathtub and then the cross-section here that we 0:19:19.360,0:19:23.120 -had this picky thing - -0:19:21.280,0:19:24.720 -which was coming from the cold free +had this picky thing which was coming from the cold free 0:19:23.120,0:19:26.799 -energy let's - -0:19:24.720,0:19:29.679 -let me show you what happens now if i +energy let's let me show you what happens now if i 0:19:26.799,0:19:32.880 -choose the warm free energy right - -0:19:29.679,0:19:35.919 -and so if i do that i'm gonna get if i +choose the warm free energy right and so if i do that i'm gonna get if i 0:19:32.880,0:19:35.919 -can scroll my screen - -0:19:37.600,0:19:40.880 -oh you don't see anything okay let me +can scroll my screen oh you don't see 
anything okay let me
0:19:39.679,0:19:43.840
-click click
-
-0:19:40.880,0:19:44.400
-okay all right and so the red one was
+click click okay all right and so the red one was
0:19:43.840,0:19:47.120
-the
-
-0:19:44.400,0:19:47.919
-super cold the beta is the coldness
+the super cold the beta is the coldness
0:19:47.120,0:19:51.120
-again so
-
-0:19:47.919,0:19:52.799
-large beta is called and then we
+again so large beta is cold and then we
0:19:51.120,0:19:54.880
-reduce the coldness so we increase the
-
-0:19:52.799,0:19:57.840
-temperature and as you can see the
+reduce the coldness so we increase the temperature and as you can see the
0:19:54.880,0:19:58.720
-the picky part becomes smooth smooth
-
-0:19:57.840,0:20:02.400
-smooth
+the peaky part becomes smooth smooth smooth
0:19:58.720,0:20:05.120
-until it becomes oh
-
-0:20:02.400,0:20:06.159
-becomes some a parabola with a single
+until it becomes oh becomes some a parabola with a single
0:20:05.120,0:20:09.200
-global
-
-0:20:06.159,0:20:11.600
-minima oh
+global minima oh
0:20:09.200,0:20:15.039
-this is coming out to be remember what
-
-0:20:11.600,0:20:17.200
-happens if beta goes to zero
+this is coming out to be remember what happens if beta goes to zero
0:20:15.039,0:20:21.200
-you get the average right so you
-
-0:20:17.200,0:20:24.559
-actually recover the msc
+you get the average right so you actually recover the mse
0:20:21.200,0:20:28.480
-okay i'm just giving like small uh
-
-0:20:24.559,0:20:31.360
-small like information bits
+okay i'm just giving like small uh small like information bits
0:20:28.480,0:20:31.919
-pills whatever but again yeah so
-
-0:20:31.360,0:20:33.679
-whenever
+pills whatever but again yeah so whenever
0:20:31.919,0:20:35.679
-we increase the temperature you're going
-
-0:20:33.679,0:20:36.799
-to be relaxing until you get just one
+we increase the temperature you're going to be relaxing until you get just one
0:20:35.679,0:20:38.320
-single minimum
-
-0:20:36.799,0:20:40.559 -and then there are no more latent +single minimum and then there are no more latent 0:20:38.320,0:20:43.840 -because we just average out everything - -0:20:40.559,0:20:46.480 -without those weights right anyhow +because we just average out everything without those weights right anyhow 0:20:43.840,0:20:47.440 -uh i i think now if you if you need to - -0:20:46.480,0:20:49.440 -implement this stuff +uh i i think now if you if you need to implement this stuff 0:20:47.440,0:20:52.000 -in pytorch you're gonna be be getting - -0:20:49.440,0:20:54.640 -like quite frustrated because +in pytorch you're gonna be be getting like quite frustrated because 0:20:52.000,0:20:55.280 -they use different names for the things - -0:20:54.640,0:20:58.000 -i just +they use different names for the things i just 0:20:55.280,0:20:59.440 -defined and someone say oh you should - -0:20:58.000,0:21:01.679 -have used their names no +defined and someone say oh you should have used their names no 0:20:59.440,0:21:04.159 -because those are wrong right so i use - -0:21:01.679,0:21:06.400 -the correct name so the one that is +because those are wrong right so i use the correct name so the one that is 0:21:04.159,0:21:07.520 -that makes sense i will try to sell it - -0:21:06.400,0:21:09.760 -to you this way +that makes sense i will try to sell it to you this way 0:21:07.520,0:21:12.159 -so let me explain to you a little bit of - -0:21:09.760,0:21:15.039 -you know what is the nomenclature i use +so let me explain to you a little bit of you know what is the nomenclature i use 0:21:12.159,0:21:15.440 -uh such that it makes sense at least to - -0:21:15.039,0:21:18.240 -me +uh such that it makes sense at least to me 0:21:15.440,0:21:19.760 -otherwise things don't make sense to me - -0:21:18.240,0:21:21.600 -so this is the actual +otherwise things don't make sense to me so this is the actual 0:21:19.760,0:21:23.440 -soft max right not the soft max that - -0:21:21.600,0:21:25.039 
-people talk outside this class this is
+soft max right not the soft max that people talk outside this class this is
0:21:23.440,0:21:27.840
-the actual soft max
-
-0:21:25.039,0:21:29.679
-which is this you know one over betas
+the actual soft max which is this you know one over betas
0:21:27.840,0:21:31.039
-log of blah blah blah some of the
-
-0:21:29.679,0:21:34.640
-exponentials
+log of blah blah blah some of the exponentials
0:21:31.039,0:21:38.000
-i just expanded these um the previous
-
-0:21:34.640,0:21:40.000
-uh i just expanded the the one over z
+i just expanded these um the previous uh i just expanded the the one over z
0:21:38.000,0:21:42.480
-i took it out right in the in the
-
-0:21:40.000,0:21:44.159
-logarithm so i just split the two things
+i took it out right in the in the logarithm so i just split the two things
0:21:42.480,0:21:46.480
-so how do we implement this stuff in
-
-0:21:44.159,0:21:49.360
-pytorch well you just use this function
+so how do we implement this stuff in pytorch well you just use this function
0:21:46.480,0:21:52.640
-which is called torch dot log sum x
-
-0:21:49.360,0:21:53.120
-which is this soft max actual softmax
+which is called torch.logsumexp which is this soft max actual softmax
0:21:52.640,0:21:55.600
-right
-
-0:21:53.120,0:21:56.880
-and then plus or minus that additional
+right and then plus or minus that additional
0:21:55.600,0:21:58.960
-constant over there
-
-0:21:56.880,0:22:00.240
-right so this is how you want to use how
+constant over there right so this is how you want to use how
0:21:58.960,0:22:03.440
-to implement that
-
-0:22:00.240,0:22:06.559
-because you know it's numerically stable
+to implement that because you know it's numerically stable
0:22:03.440,0:22:08.320
-moreover if you
-
-0:22:06.559,0:22:10.240
-this is the actual definition of the
+moreover if you this is the actual definition of the
0:22:08.320,0:22:12.640
-actual soft min
-
-0:22:10.240,0:22:14.240
-and you can see this is 
what i wrote
+actual soft min and you can see this is what i wrote
0:22:12.640,0:22:16.240
-before
-
-0:22:14.240,0:22:17.840
-you can think about that it's very
+before you can think about that it's very
0:22:16.240,0:22:19.360
-similar to the softmax right
-
-0:22:17.840,0:22:21.520
-the actual softmax what's the only
+similar to the softmax right the actual softmax what's the only
0:22:19.360,0:22:24.240
-difference there are two minuses right
-
-0:22:21.520,0:22:24.880
-and so you can do that you can get get
+difference there are two minuses right and so you can do that you can get get
0:22:24.240,0:22:26.640
-that away
-
-0:22:24.880,0:22:28.400
-with you know you put a minus in front
+that away with you know you put a minus in front
0:22:26.640,0:22:30.480
-so you cancel the first minus
-
-0:22:28.400,0:22:31.760
-and you put a minus inside so you cancel
+so you cancel the first minus and you put a minus inside so you cancel
0:22:30.480,0:22:34.320
-the other minus
-
-0:22:31.760,0:22:36.240
-and so you know the soft mean is simply
+the other minus and so you know the soft min is simply
0:22:34.320,0:22:38.400
-uh you can implement it as a
-
-0:22:36.240,0:22:40.559
-soft max with the two minuses okay
+uh you can implement it as a soft max with the two minuses okay
0:22:38.400,0:22:42.320
-against actual softmax
-
-0:22:40.559,0:22:43.760
-and then someone of course is going to
+again the actual softmax and then someone of course is going to
0:22:42.320,0:22:46.000
-be asking
-
-0:22:43.760,0:22:48.480
-but what is the softmax we use in class
+be asking but what is the softmax we use in class
0:22:46.000,0:22:51.840
-every time so that one
-
-0:22:48.480,0:22:55.120
-is actually the soft arc max right
+every time so that one is actually the soft arc max right
0:22:51.840,0:22:56.880
-why is that right because a arc max is
-
-0:22:55.120,0:23:00.000
-going to be like a one hot vector
+why is that right because an arc max is going to be like a one hot vector
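the softmax and softmin bookkeeping above can be sketched in a few lines of dependency-free python — a minimal sketch, not the course code: the function names follow the lecture's nomenclature, the example energies are made up, and the max-shift trick inlined below is what torch.logsumexp would do for you in pytorch.

```python
import math

# "actual" soft max, per the lecture:  (1/beta) * log(sum_k exp(beta * v_k)).
# Computed stably by factoring out the max (the same trick torch.logsumexp uses).
def softmax(values, beta):
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

# soft min = soft max with the two minuses: flip the argument, flip the result.
def softmin(values, beta):
    return -softmax([-v for v in values], beta)

# softmin plus the "additional constant" (the 1/Z normalization pulled out of
# the log).  beta -> 0 (hot) recovers the average of the energies; beta -> inf
# (cold) recovers the minimum.
def free_energy(energies, beta):
    return softmin(energies, beta) + math.log(len(energies)) / beta
```

with a toy set of latent energies [1, 2, 4], a very cold beta gives back the min (1) and the max (4), while a very warm beta gives back the average 7/3 — the two temperature limits discussed above.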
0:22:56.880,0:23:02.799
-and d1 tells you what is the index
-
-0:23:00.000,0:23:04.000
-of the element that has the maximum
+and the one tells you what is the index of the element that has the maximum
0:23:02.799,0:23:06.720
-value right
-
-0:23:04.000,0:23:08.000
-so the max gives you retrieves the
+value right so the max retrieves the
0:23:06.720,0:23:10.240
-maximum value
-
-0:23:08.000,0:23:12.400
-you know and then the arc max is going
+maximum value you know and then the arc max is going
0:23:10.240,0:23:14.320
-to tell you where is the index
-
-0:23:12.400,0:23:15.840
-pointing to that maximum value right so
+to tell you where is the index pointing to that maximum value right so
0:23:14.320,0:23:18.000
-this is like a vector
-
-0:23:15.840,0:23:19.200
-with a one hot vector and the other one
+this is like a vector with a one hot vector and the other one
0:23:18.000,0:23:21.440
-is a scalar
-
-0:23:19.200,0:23:23.600
-similarly whenever i compute this soft
+is a scalar similarly whenever i compute this soft
0:23:21.440,0:23:26.159
-max the softer version of the max
-
-0:23:23.600,0:23:28.400
-now this max is not just the max it's
+max the softer version of the max now this max is not just the max it's
0:23:26.159,0:23:30.880
-going to be like a summation of this
-
-0:23:28.400,0:23:32.640
-uh the logarithm of the summation of the
+going to be like a summation of this uh the logarithm of the summation of the
0:23:30.880,0:23:34.400
-exponential right
-
-0:23:32.640,0:23:36.080
-which you can change the temperature if
+exponential right which you can change the temperature if
0:23:34.400,0:23:37.200
-you get the temperature super cold you
-retrieve the max
+you get the temperature super cold you retrieve the max
0:23:37.200,0:23:41.039
-if you warm up the temperature you get
-
-0:23:38.720,0:23:43.840
-something like more
+if you warm up the temperature you get something like more
0:23:41.039,0:23:45.039
-like a weighted 
summation and for the
-
-0:23:43.840,0:23:46.880
-soft dark max
+like a weighted summation and for the soft arc max
0:23:45.039,0:23:48.960
-which was like the arc max is the one
-
-0:23:46.880,0:23:50.640
-hot if it's super cold
+which was like the arc max is the one hot if it's super cold
0:23:48.960,0:23:52.080
-it's gonna still be one hot but if you
-
-0:23:50.640,0:23:53.600
-warm up the temperature
+it's gonna still be one hot but if you warm up the temperature
0:23:52.080,0:23:55.679
-you're gonna get a distribution
-
-0:23:53.600,0:23:58.000
-probability distribution right
+you're gonna get a distribution probability distribution right
0:23:55.679,0:23:59.200
-so whenever someone says oh the soft max
-
-0:23:58.000,0:24:01.360
-gives you the probability distribution
+so whenever someone says oh the soft max gives you the probability distribution
0:23:59.200,0:24:04.159
-now that's the soft dark mask okay
-
-0:24:01.360,0:24:06.000
-uh arc max being the one hot or the zero
+now that's the soft arc max okay uh arc max being the one hot or the zero
0:24:04.159,0:24:06.720
-temperature limited limit gives you the
-
-0:24:06.000,0:24:08.080
-one hot
+temperature limit gives you the one hot
0:24:06.720,0:24:10.000
-if you increase the temperature you get
-
-0:24:08.080,0:24:12.799
-a distribution so finally
+if you increase the temperature you get a distribution so finally
0:24:10.000,0:24:14.080
-these are the correct names no one is
-
-0:24:12.799,0:24:18.159
-using but me
+these are the correct names no one is using but me
0:24:14.080,0:24:21.360
-so i hope i didn't create confusion
-
-0:24:18.159,0:24:24.960
-if i did sorry but still this is the
+so i hope i didn't create confusion if i did sorry but still this is the
0:24:21.360,0:24:27.840
-correct way of seeing these things okay
-
-0:24:24.960,0:24:28.799
-because it makes sense right so again uh
+correct way of seeing these things okay because it makes sense right so again uh
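the naming distinction above — the soft arc max is what most libraries ship under the name "softmax" — can be sketched the same way; a minimal dependency-free illustration with made-up scores (in pytorch this is what torch.softmax computes on beta-scaled inputs):

```python
import math

# soft arc max: exponentiate and normalize.  Cold temperature (large beta)
# gives back nearly a one-hot vector at the arg max; warm temperature
# (small beta) relaxes it into a spread-out probability distribution.
def soft_argmax(values, beta):
    m = max(values)  # shift by the max so the exponentials cannot overflow
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]
```

freezing it (beta = 100) concentrates almost all the mass on the largest entry, while warming it (beta near 0) spreads the mass out toward a uniform distribution — the two regimes described above.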
0:24:27.840,0:24:31.279
-if you have the
-
-0:24:28.799,0:24:32.000
-max if you have a function you want to
+if you have the max if you have a function you want to
0:24:31.279,0:24:34.640
-find the
-
-0:24:32.000,0:24:36.080
-max it's here right if you have this
+find the max it's here right if you have this
0:24:34.640,0:24:38.240
-function you want to find the mean
-
-0:24:36.080,0:24:39.679
-you can take the function you flip it
+function you want to find the min you can take the function you flip it
0:24:38.240,0:24:41.039
-you find the max
-
-0:24:39.679,0:24:42.559
-and then you flip it back again you get
+you find the max and then you flip it back again you get
0:24:41.039,0:24:43.039
-the mean right so that's what i show you
-
-0:24:42.559,0:24:45.600
-here
+the min right so that's what i show you here
0:24:43.039,0:24:46.880
-i show you that soft min is simply the
-
-0:24:45.600,0:24:50.799
-flipped version
+i show you that soft min is simply the flipped version
0:24:46.880,0:24:54.000
-the negative right of the max with a
-
-0:24:50.799,0:24:55.919
-flipped in argument okay
+the negative right of the max with a flipped in argument okay
0:24:54.000,0:24:57.039
-all right all right so enough me talking
-
-0:24:55.919,0:24:59.919
-about
+all right all right so enough me talking about
0:24:57.039,0:25:01.520
-mathematics and things i hope it was
-
-0:24:59.919,0:25:05.200
-fine
+mathematics and things i hope it was fine
0:25:01.520,0:25:08.640
-so this was the part
-
-0:25:05.200,0:25:09.440
-uh that was concluding the last lesson
+so this was the part uh that was concluding the last lesson
0:25:08.640,0:25:12.880
-right so
-
-0:25:09.440,0:25:14.880
-this is the end of the inference
+right so this is the end of the inference
0:25:12.880,0:25:16.000
-and we figured that there is the free
-
-0:25:14.880,0:25:18.880
-energy
+and we figured that there is the free energy
0:25:16.000,0:25:19.679
-there is a very cold one or there is a
-
-0:25:18.880,0:25:21.520
-warm
+there is a very cold one or there is a warm
0:25:19.679,0:25:23.120
-version or there is a very hot version
-
-0:25:21.520,0:25:24.080
-the hot version is going to be the
+version or there is a very hot version the hot version is going to be the
0:25:23.120,0:25:25.760
-average
-
-0:25:24.080,0:25:27.600
-the warm version is going to be like
+average the warm version is going to be like
0:25:25.760,0:25:30.400
-something you you may like
-
-0:25:27.600,0:25:31.200
-is like this marginalization of the of
+something you you may like is like this marginalization of the of
0:25:30.400,0:25:33.360
-the latent
-
-0:25:31.200,0:25:34.720
-and then the super cold version the zero
+the latent and then the super cold version the zero
0:25:33.360,0:25:37.200
-temperature limit is going to be
-
-0:25:34.720,0:25:39.200
-this exactly the minimum version minimum
+temperature limit is going to be this exactly the minimum version minimum
0:25:37.200,0:25:42.480
-value
-
-0:25:39.200,0:25:45.120
-uh what i showed you was the fact that
+value uh what i showed you was the fact that
0:25:42.480,0:25:45.919
-this model is a very poorly trained
-
-0:25:45.120,0:25:49.120
-model
+this model is a very poorly trained model
0:25:45.919,0:25:51.760
-because those low energy
-
-0:25:49.120,0:25:53.679
-regions were not you know happening
+because those low energy regions were not you know happening
0:25:51.760,0:25:56.480
-around this training set right so
-
-0:25:53.679,0:25:58.080
-let me show you once again uh the same
+around this training set right so let me show you once again uh the same
0:25:56.480,0:26:00.480
-and the same diagram i showed you
-
-0:25:58.080,0:26:02.080
-at the beginning of today's lesson which
+and the same diagram i showed you at the beginning of today's lesson which
0:26:00.480,0:26:04.320
-is this one over here
-
-0:26:02.080,0:26:05.520
-so here i show you with these white
+is this one over here so here i show you with these y
0:26:04.320,0:26:08.559
-checks a few
-
-0:26:05.520,0:26:12.240
-uh samples uh on the
+checks a few uh samples uh on the
0:26:08.559,0:26:14.000
-model manifold and then the y's the blue
-
-0:26:12.240,0:26:15.520
-eyes are the training sample but we
+model manifold and then the y's the blue y's are the training sample but we
0:26:14.000,0:26:17.200
-never use the training sample right i
-
-0:26:15.520,0:26:20.080
-just use the training sample to
+never use the training sample right i just use the training sample to
0:26:17.200,0:26:21.039
-compute the energy the free energy but
-
-0:26:20.080,0:26:22.799
-we never
+compute the energy the free energy but we never
0:26:21.039,0:26:24.480
-use them to learn because we didn't talk
-
-0:26:22.799,0:26:27.679
-about learning we talked about
+use them to learn because we didn't talk about learning we talked about
0:26:24.480,0:26:29.120
-inference so far right and so
-
-0:26:27.679,0:26:31.600
-guess what is going to be the next part
+inference so far right and so guess what is going to be the next part
0:26:29.120,0:26:34.080
-of today's lesson
-
-0:26:31.600,0:26:35.360
-you guessed it right training so now
+of today's lesson you guessed it right training so now
0:26:34.080,0:26:38.640
-we're going to be starting
-
-0:26:35.360,0:26:39.120
-uh to learn how to train learn how to
+we're going to be starting uh to learn how to train learn how to
0:26:38.640,0:26:41.440
-learn
-
-0:26:39.120,0:26:42.640
-train train how to learn no learn how to
+learn train train how to learn no learn how to
0:26:41.440,0:26:45.600
-train
-
-0:26:42.640,0:26:48.960
-energy based model okay unless there are
+train energy based model okay unless there are
0:26:45.600,0:26:48.960
-questions for me on the chat
-
-0:26:50.320,0:26:55.760
-no questions everything clear meta
+questions for me on the chat no questions everything clear meta
0:26:53.039,0:26:57.600
-learning yes
-
-0:26:55.760,0:26:59.919
-no that one is a different subject next
+learning yes no that one 
is a different subject next 0:26:57.600,0:26:59.919 -time - -0:27:00.159,0:27:04.720 -all right okay so i think yeah there is +time all right okay so i think yeah there is 0:27:02.960,0:27:06.880 -no big deal right so this is just - -0:27:04.720,0:27:08.799 -inference we didn't talk about any crazy +no big deal right so this is just inference we didn't talk about any crazy 0:27:06.880,0:27:09.279 -stuff and we talked in for inference - -0:27:08.799,0:27:11.039 -about +stuff and we talked in for inference about 0:27:09.279,0:27:12.559 -about the inference the whole last - -0:27:11.039,0:27:15.360 -lesson so +about the inference the whole last lesson so 0:27:12.559,0:27:17.840 -i guess we can move on move on and start - -0:27:15.360,0:27:20.320 -the training +i guess we can move on move on and start the training 0:27:17.840,0:27:21.279 -finding a well-behaved energy function - -0:27:20.320,0:27:24.080 -right +finding a well-behaved energy function right 0:27:21.279,0:27:25.200 -what does this mean so this means we - -0:27:24.080,0:27:27.840 -have to introduce +what does this mean so this means we have to introduce 0:27:25.200,0:27:30.240 -a loss functional what's a loss - -0:27:27.840,0:27:33.679 -functional +a loss functional what's a loss functional 0:27:30.240,0:27:35.440 -well it's a metric it's a scalar - -0:27:33.679,0:27:38.640 -function +well it's a metric it's a scalar function 0:27:35.440,0:27:42.000 -that is telling you how good your - -0:27:38.640,0:27:44.559 -energy function is right so we have an +that is telling you how good your energy function is right so we have an 0:27:42.000,0:27:45.520 -energy function which is this free - -0:27:44.559,0:27:47.360 -energy +energy function which is this free energy 0:27:45.520,0:27:49.279 -and then we're going to have a function - -0:27:47.360,0:27:51.039 -of my function +and then we're going to have a function of my function 0:27:49.279,0:27:53.840 -which is giving me a scalar which is - -0:27:51.039,0:27:56.880 
-just telling me how good this +which is giving me a scalar which is just telling me how good this 0:27:53.840,0:27:59.520 -energy function is right so - -0:27:56.880,0:28:00.240 -a loss functional gives me a scalar +energy function is right so a loss functional gives me a scalar 0:27:59.520,0:28:03.679 -given that i - -0:28:00.240,0:28:06.320 -feed a function +given that i feed a function 0:28:03.679,0:28:07.679 -and here i just show to you that if i - -0:28:06.320,0:28:10.720 -have this curly l +and here i just show to you that if i have this curly l 0:28:07.679,0:28:13.600 -as you know the loss functional for the - -0:28:10.720,0:28:15.840 -all the whole batch my whole data set i +as you know the loss functional for the all the whole batch my whole data set i 0:28:13.600,0:28:19.600 -can also express this as the average - -0:28:15.840,0:28:21.600 -of these per sample loss functionals +can also express this as the average of these per sample loss functionals 0:28:19.600,0:28:23.919 -okay so i just do the average of those - -0:28:21.600,0:28:26.640 -per sample those functions +okay so i just do the average of those per sample those functions 0:28:23.919,0:28:28.559 -cool so what the heck am i talking about - -0:28:26.640,0:28:30.720 -right so i'm just giving you i'm +cool so what the heck am i talking about right so i'm just giving you i'm 0:28:28.559,0:28:32.080 -making so much hype but i didn't tell - -0:28:30.720,0:28:33.520 -you anything so far +making so much hype but i didn't tell you anything so far 0:28:32.080,0:28:35.120 -and we already know this stuff from you - -0:28:33.520,0:28:37.840 -know machine learning and you know +and we already know this stuff from you know machine learning and you know 0:28:35.120,0:28:39.360 -previous lessons and so here we go with - -0:28:37.840,0:28:41.520 -the first loss function +previous lessons and so here we go with the first loss function 0:28:39.360,0:28:43.200 -which is the which is the energy loss - 
-0:28:41.520,0:28:46.240
-function
+which is the which is the energy loss function
0:28:43.200,0:28:49.360
-so this energy loss functional
-
-0:28:46.240,0:28:52.720
-it's simply the free energy
+so this energy loss functional it's simply the free energy
0:28:49.360,0:28:54.720
-f evaluated in my y
-
-0:28:52.720,0:28:56.840
-where y is the data point on the data
+f evaluated in my y where y is the data point on the data
0:28:54.720,0:28:59.840
-set right
-
-0:28:56.840,0:29:01.600
-so whenever whenever we train these
+set right so whenever whenever we train these
0:28:59.840,0:29:03.200
-models we're going to be minimizing the
-loss functionalities
+models we're going to be minimizing the loss functional
0:29:03.200,0:29:08.000
-right the loss function and so in this
-
-0:29:06.399,0:29:11.120
-case the loss functional
+right the loss function and so in this case the loss functional
0:29:08.000,0:29:14.399
-is actually the free energy at the
-
-0:29:11.120,0:29:17.440
-training point of course right i mean
+is actually the free energy at the training point of course right i mean
0:29:14.399,0:29:18.559
-what this energy function has to do with
-
-0:29:17.440,0:29:21.200
-the free energy
+what this energy function has to do with the free energy
0:29:18.559,0:29:22.559
-should be small for data that comes from
-
-0:29:21.200,0:29:25.279
-the training distribution
+should be small for data that comes from the training distribution
0:29:22.559,0:29:26.960
-large elsewhere right and so what is the
-
-0:29:25.279,0:29:28.480
-easiest way to do that well of course
+large elsewhere right and so what is the easiest way to do that well of course
0:29:26.960,0:29:31.679
-we're gonna just have the
-
-0:29:28.480,0:29:34.799
-loss functional being the free energy
+we're gonna just have the loss functional being the free energy
0:29:31.679,0:29:37.600
-evaluated at the training point
-
-0:29:34.799,0:29:39.440
-so if it's larger than zero then 
the
+evaluated at the training point so if it's larger than zero then the
0:29:37.600,0:29:41.039
-training of the network you know
-
-0:29:39.440,0:29:42.960
-changing the parameters such that we
+training of the network you know changing the parameters such that we
0:29:41.039,0:29:45.520
-minimize the loss functional
-
-0:29:42.960,0:29:47.120
-is going to be squeezing down the free
+minimize the loss functional is going to be squeezing down the free
0:29:45.520,0:29:48.640
-energy on those
-
-0:29:47.120,0:29:50.399
-points right so you have the point you
+energy on those points right so you have the point you
0:29:48.640,0:29:53.679
-have a free energy boom
-
-0:29:50.399,0:29:56.799
-point free energy bomb all right so
+have a free energy boom point free energy boom all right so
0:29:53.679,0:29:58.480
-we just small like clamp like we
-
-0:29:56.799,0:30:00.559
-we are reducing the free energy in
+we just small like clamp like we we are reducing the free energy in
0:29:58.480,0:30:04.000
-correspondence to all these
-
-0:30:00.559,0:30:06.000
-uh whites why is
+correspondence to all these uh y's why is
0:30:04.000,0:30:07.120
-there is a check there is a check
-because i
+there is a check there is a check because i
0:30:07.120,0:30:10.240
-want to emphasize the fact that we are
-
-0:30:08.960,0:30:12.240
-trying to push down
+want to emphasize the fact that we are trying to push down
0:30:10.240,0:30:13.440
-the energy at those locations right so i
-
-0:30:12.240,0:30:16.799
-push down there is the
+the energy at those locations right so i push down there is the
0:30:13.440,0:30:19.600
-arrow pointing down i push down
-
-0:30:16.799,0:30:20.960
-all right okay i might sound silly but
+arrow pointing down i push down all right okay i might sound silly but
0:30:19.600,0:30:24.880
-it doesn't matter i like
-
-0:30:20.960,0:30:26.320
-myself silly so instead now we're gonna
+it doesn't matter i like myself silly so instead now we're 
gonna
0:30:24.880,0:30:28.399
-be introducing these
-
-0:30:26.320,0:30:29.520
-uh contrastive methods what is a
+be introducing these uh contrastive methods what is a
0:30:28.399,0:30:32.080
-contrastive method
-
-0:30:29.520,0:30:32.720
-uh in this case this contrasting method
+contrastive method uh in this case this contrastive method
0:30:32.080,0:30:35.440
-will have
-
-0:30:32.720,0:30:36.399
-a white check which is blue why is blue
+will have a y check which is blue why is it blue
0:30:35.440,0:30:38.320
-because it's cold
-
-0:30:36.399,0:30:40.320
-right so we want to try to get low
+because it's cold right so we want to try to get low
0:30:38.320,0:30:41.600
-energy again the energy the temperature
-
-0:30:40.320,0:30:44.399
-are connected right
+energy again the energy the temperature are connected right
0:30:41.600,0:30:46.320
-so low energy is going to be cold blue
-
-0:30:44.399,0:30:50.159
-and then i have a white hot
+so low energy is going to be cold blue and then i have a y hat
0:30:46.320,0:30:51.520
-why is hot why is red why why hot is red
-
-0:30:50.159,0:30:53.679
-i want to increase the energy right
+why is it hot why is it red why why hot is red i want to increase the energy right
0:30:51.520,0:30:56.720
-that's why there is the the hot pointing
-
-0:30:53.679,0:31:00.080
-upwards and so in this case
+that's why there is the hat pointing upwards and so in this case
0:30:56.720,0:31:04.000
-uh given that m is a positive number
-
-0:31:00.080,0:31:08.480
-the difference f of y hat
+uh given that m is a positive number the difference f of y hat
0:31:04.000,0:31:10.880
-minus f of y check that the difference
-
-0:31:08.480,0:31:11.760
-it will the network will try to make it
+minus f of y check that the difference it will the network will try to make it
0:31:10.880,0:31:15.360
-larger than
-
-0:31:11.760,0:31:19.039
-m right so for as long as the difference
+larger than m right so for as long as the difference
0:31:15.360,0:31:22.159
-is 
smaller than m then these - -0:31:19.039,0:31:22.559 -you know this value over here will have +is smaller than m then these you know this value over here will have 0:31:22.159,0:31:25.919 -a - -0:31:22.559,0:31:29.519 -positive value whenever f y hat +a positive value whenever f y hat 0:31:25.919,0:31:32.799 -minus f y check will be larger than - -0:31:29.519,0:31:34.240 -m then you're gonna have that +minus f y check will be larger than m then you're gonna have that 0:31:32.799,0:31:36.000 -you know the output of this stuff is - -0:31:34.240,0:31:38.880 -gonna be zero +you know the output of this stuff is gonna be zero 0:31:36.000,0:31:39.360 -okay because there is a a positive part - -0:31:38.880,0:31:42.559 -so +okay because there is a a positive part so 0:31:39.360,0:31:42.880 -again this hinge loss will simply try to - -0:31:42.559,0:31:45.919 -get +again this hinge loss will simply try to get 0:31:42.880,0:31:46.720 -that second difference to be larger than - -0:31:45.919,0:31:50.960 -the +that second difference to be larger than the 0:31:46.720,0:31:50.960 -uh the first term the margin - -0:31:51.039,0:31:54.640 -in order to have like a smoother version +uh the first term the margin in order to have like a smoother version 0:31:53.120,0:31:56.960 -of this margin - -0:31:54.640,0:31:58.799 -this is like very binary right if you're +of this margin this is like very binary right if you're 0:31:56.960,0:32:01.279 -lower than the margin you push - -0:31:58.799,0:32:03.120 -larger than the margin you stop pushing +lower than the margin you push larger than the margin you stop pushing 0:32:01.279,0:32:07.679 -you can use this other version the - -0:32:03.120,0:32:10.799 -the loss log loss functional +you can use this other version the the loss log loss functional 0:32:07.679,0:32:13.279 -which is a smooth margin uh - -0:32:10.799,0:32:14.399 -you can you can see right whenever you +which is a smooth margin uh you can you can see right whenever you 
0:32:13.279,0:32:17.440
-have that
-
-0:32:14.399,0:32:18.240
-inside these in this parenthesis you
+have that inside these in this parenthesis you
0:32:17.440,0:32:20.399
-have a very
-
-0:32:18.240,0:32:21.679
-negative number so if this is very very
+have a very negative number so if this is very very
0:32:20.399,0:32:23.600
-large and this is zero
-
-0:32:21.679,0:32:25.360
-let's say you're gonna have the x of a
+large and this is zero let's say you're gonna have the exp of a
0:32:23.600,0:32:25.919
-very negative number which is roughly
-
-0:32:25.360,0:32:28.080
-zero
+very negative number which is roughly zero
0:32:25.919,0:32:29.279
-and i have the log of one which is you
-
-0:32:28.080,0:32:32.240
-know stop pushing
+and i have the log of one which is you know stop pushing
0:32:29.279,0:32:33.600
-there is no more instead if this value
-
-0:32:32.240,0:32:35.840
-here is large
+there is no more instead if this value here is large
0:32:33.600,0:32:37.600
-and this value is maybe negative or
-
-0:32:35.840,0:32:38.880
-whatever is zero
+and this value is maybe negative or whatever is zero
0:32:37.600,0:32:40.640
-you're gonna have the exponential of
-
-0:32:38.880,0:32:42.799
-this number which is gonna be very large
+you're gonna have the exponential of this number which is gonna be very large
0:32:40.640,0:32:44.240
-and then you're gonna have the one plus
-
-0:32:42.799,0:32:47.679
-this exponential
+and then you're gonna have the one plus this exponential
0:32:44.240,0:32:48.240
-uh which again the one gets neglected
-
-0:32:47.679,0:32:49.840
-you don't get
+uh which again the one gets neglected you don't get
0:32:48.240,0:32:51.840
-the log of this x but you're gonna get
-
-0:32:49.840,0:32:53.679
-basically the uh the
+the log of this exp but you're gonna get basically the uh the
0:32:51.840,0:32:57.279
-the loss is gonna be proportional to the
-
-0:32:53.679,0:33:00.880
-energy right if it's very large
+the loss is gonna be proportional to the energy 
right if it's very large 0:32:57.279,0:33:03.519 -cool cool but again for our case uh - -0:33:00.880,0:33:04.240 -we just have a very tiny one-dimensional +cool cool but again for our case uh we just have a very tiny one-dimensional 0:33:03.519,0:33:06.399 -latent - -0:33:04.240,0:33:07.760 -so we don't need to do this uh this +latent so we don't need to do this uh this 0:33:06.399,0:33:09.760 -contrastive sampling - -0:33:07.760,0:33:11.440 -uh contrastive learning it's necessary +contrastive sampling uh contrastive learning it's necessary 0:33:09.760,0:33:13.440 -whenever you have like a - -0:33:11.440,0:33:14.720 -um you know maybe like a high +whenever you have like a um you know maybe like a high 0:33:13.440,0:33:18.320 -dimensional latent - -0:33:14.720,0:33:20.080 -and so on um so let's just +dimensional latent and so on um so let's just 0:33:18.320,0:33:22.559 -you know let's just train this model - -0:33:20.080,0:33:26.080 -because i didn't train this model so far +you know let's just train this model because i didn't train this model so far 0:33:22.559,0:33:29.519 -with this energy loss functional - -0:33:26.080,0:33:31.360 -okay and so i train this model it takes +with this energy loss functional okay and so i train this model it takes 0:33:29.519,0:33:33.840 -one epoch to converge - -0:33:31.360,0:33:36.159 -it's ridiculously fast okay but it's a +one epoch to converge it's ridiculously fast okay but it's a 0:33:33.840,0:33:38.880 -toy example so you understand that - -0:33:36.159,0:33:39.200 -and i'm gonna start by uh showing you +toy example so you understand that and i'm gonna start by uh showing you 0:33:38.880,0:33:41.600 -the - -0:33:39.200,0:33:43.519 -zero temperature limit the super cold +the zero temperature limit the super cold 0:33:41.600,0:33:45.200 -free energy okay - -0:33:43.519,0:33:46.720 -uh on the left hand side i'm gonna show +free energy okay uh on the left hand side i'm gonna show 0:33:45.200,0:33:48.559 -you the untrained version 
which is the - -one we already saw before +you the untrained version which is the one we already saw before 0:33:48.559,0:33:54.559 -so in this case for every training - -point the blue point i have a +so in this case for every training point the blue point i have a 0:33:54.559,0:33:57.919 -corresponding - -x which is the location on the model +corresponding x which is the location on the model 0:33:57.919,0:34:04.559 -manifold that is the closest to that - -training point okay whenever i train +manifold that is the closest to that training point okay whenever i train 0:34:04.559,0:34:11.200 -i'm gonna be you know uh get a gradient - -that gradient is gonna be i just i told +i'm gonna be you know uh get a gradient that gradient is gonna be i just i told 0:34:11.200,0:34:13.520 -you before - -if you if you get the mean you're gonna +you before if you get the min you're gonna 0:34:13.520,0:34:17.280 -get one item and then if you do the - -derivative you're gonna get the +get one item and then if you do the derivative you're gonna get the 0:34:17.280,0:34:21.040 -argument which is just the one in - -correspondence to the +argmin which is just the one in correspondence to the 0:34:21.040,0:34:25.919 -lowest value and so that one is going to - -be represented here +lowest value and so that one is going to be represented here 0:34:25.919,0:34:31.359 -by uh that - -arrow over here right so this arrow here +by uh that arrow over here right so this arrow here 0:34:31.359,0:34:35.440 -is going to be - -the energy the derivative of the energy +is going to be the energy the derivative of the energy 0:34:35.440,0:34:39.599 -which is going to be - -just the distance like the
y minus +which is going to be just the distance like the y minus 0:34:39.599,0:34:43.440 -y check and then that's going to be - -multiplied by +y tilde and then that's going to be multiplied by 0:34:43.440,0:34:48.240 -you know the one in corresponding to the - -location that is closest to uh +you know the one corresponding to the location that is closest to uh 0:34:48.240,0:34:54.879 -to our point all right so - -what this means is that during training +to our point all right so what this means is that during training 0:34:54.879,0:34:59.040 -whenever we use the ztl the zero - -temperature limit you're gonna get +whenever we use the ztl the zero temperature limit you're gonna get 0:34:59.040,0:35:02.240 -the location on the manifold that is - -closest to your training point +the location on the manifold that is closest to your training point 0:35:02.240,0:35:05.520 -and then you're gonna get this point to - -be moving there +and then you're gonna get this point to be moving there 0:35:05.520,0:35:08.800 -you have this training point you get - -this location that is on the manifold +you have this training point you get this location that is on the manifold 0:35:08.800,0:35:11.200 -closer to this point - -and then you get a gradient that is +closer to this point and then you get a gradient that is 0:35:11.200,0:35:15.119 -making going up here - -same you have a training point here +making it go up here same you have a training point here 0:35:15.119,0:35:19.520 -close this point to the manifold here - -you get this point a gradient that goes +close this point to the manifold here you get this point a gradient that goes 0:35:19.520,0:35:23.359 -down here - -okay so this is the training
procedure +down here okay so this is the training procedure 0:35:23.359,0:35:28.079 -when using this zero temperature limit - -one epoch later +when using this zero temperature limit one epoch later 0:35:28.079,0:35:31.520 -on the right hand side the train version - -bam +on the right hand side the trained version bam 0:35:31.520,0:35:38.720 -all those axes automatically managed to - -arrive to destination finished +all those x's automatically managed to arrive to destination finished 0:35:38.720,0:35:42.400 -so this is like a well-trained model - -which i'll show you +so this is like a well-trained model which i'll show you 0:35:42.400,0:35:46.160 -where i show you the energy uh going to - -zero in +where i show you the energy uh going to zero 0:35:46.160,0:35:50.880 -in the all around like acro - -corresponding to all the locations +all around corresponding to all the locations 0:35:50.880,0:35:54.640 -corresponding to my training data set - -right the training points the +corresponding to my training data set right the training points the 0:35:54.640,0:36:01.359 -the blue points what happens if you - -have two closest point on a manifold if +the blue points what happens if you have two closest points on a manifold if 0:36:01.359,0:36:08.000 -for example if y is at zero zero - -um +for example if y is at zero zero um 0:36:09.280,0:36:14.240 -right so in the energy in the in the - -zero temperature limit you're going to +right so in the energy in the in the zero temperature limit you're going to 0:36:14.240,0:36:17.119 -get just one point it's going to be - -pulled there +get just one point it's going to be pulled there 0:36:17.119,0:36:22.480 -and this is very prone to
overfitting - -0:36:20.320,0:36:23.440 -let's say our z is not just one +and this is very prone to overfitting let's say our z is not just one 0:36:22.480,0:36:25.280 -dimensional - -0:36:23.440,0:36:27.119 -large it's larger right so instead of +dimensional large it's larger right so instead of 0:36:25.280,0:36:29.599 -having like a ellipse you're gonna have - -0:36:27.119,0:36:31.599 -like a potato +having like a ellipse you're gonna have like a potato 0:36:29.599,0:36:33.200 -if you haven't hold on let me finish the - -0:36:31.599,0:36:35.040 -answer if you have a potato +if you haven't hold on let me finish the answer if you have a potato 0:36:33.200,0:36:37.440 -or potato you're gonna get all these - -0:36:35.040,0:36:40.960 -locations on the potato to go +or potato you're gonna get all these locations on the potato to go 0:36:37.440,0:36:44.000 -to those training points and so if your - -0:36:40.960,0:36:45.599 -z is a high dimensional latent variable +to those training points and so if your z is a high dimensional latent variable 0:36:44.000,0:36:47.599 -you end up with a you start with a - -0:36:45.599,0:36:49.760 -potato and you end up with a porcupine +you end up with a you start with a potato and you end up with a porcupine 0:36:47.599,0:36:51.359 -with all those peaks going uh you know - -0:36:49.760,0:36:52.800 -going out and this is basically +with all those peaks going uh you know going out and this is basically 0:36:51.359,0:36:53.280 -overfitting you just memorize the - -0:36:52.800,0:36:55.440 -training +overfitting you just memorize the training 0:36:53.280,0:36:58.320 -set in our case this doesn't happen - -0:36:55.440,0:37:01.680 -because our latent is one dimensional so +set in our case this doesn't happen because our latent is one dimensional so 0:36:58.320,0:37:04.320 -you can't really pull spikes out - -0:37:01.680,0:37:04.320 -of that thing +you can't really pull spikes out of that thing 0:37:05.119,0:37:11.839 -but nevertheless we may 
want to figure - -out how to deal with this overfitting +but nevertheless we may want to figure out how to deal with this overfitting 0:37:11.839,0:37:18.240 -uh by using this you know temperature - -regularization thing right so before i +uh by using this you know temperature regularization thing right so before i 0:37:18.240,0:37:21.760 -show you there was a peak - -if there is a zero temperature limit +show you there was a peak if there is a zero temperature limit 0:37:21.760,0:37:25.760 -then if you increase the temperature you - -actually smooth out that peak +then if you increase the temperature you actually smooth out that peak 0:37:25.760,0:37:30.640 -and so here i'm going to show you uh - -then i answer the other question +and so here i'm going to show you uh then i answer the other question 0:37:30.640,0:37:34.079 -actually let me see what happens here - -how +actually let me see what happens here how 0:37:34.079,0:37:38.800 -do we update the energy function is it - -parametrized with uh +do we update the energy function is it parametrized with uh 0:37:38.800,0:37:41.359 -oh here this is definition from last - -time +oh here this is the definition from last time 0:37:42.000,0:37:47.440 -right so my energy function is this one - -right +right so my energy function is this one right 0:37:47.440,0:37:52.000 -where so my energy function is my model - -right +where so my energy function is my model right 0:37:52.000,0:37:56.000 -which is the square difference between - -the locations and because the laden for +which is the square difference between the locations and the sine of the latent for 0:37:56.000,0:37:58.640 -the first component and the code of the - -latent for the +the first component and the cosine of the latent for the second component so this 0:37:58.640,0:38:03.280 -is like - -this is how e is parametrized right +is like this is how e is parametrized right 0:38:03.359,0:38:08.079 -uh does the learning interpolate between - -the points +uh does the learning interpolate between the points 0:38:08.079,0:38:11.200 -uh it asked would this algorithm learn - -the +uh it asked would this algorithm learn the 0:38:11.200,0:38:14.880 -mod the whole ellipse or just the blue - -points +whole ellipse or just the blue points 0:38:14.880,0:38:18.640 -okay so i'm getting there okay is there - -a visualization +okay so i'm getting there okay is there a visualization 0:38:18.640,0:38:24.240 -for the spikes to talk about when - -overfitting yeah getting there as well +for the spikes to talk about when overfitting yeah getting there as well 0:38:24.240,0:38:28.160 -all right so we were telling like we - -were talking about +all right so we were talking about 0:38:28.160,0:38:32.079 -how we train this energy function right - -so this energy energy function +how we train this energy function right so this energy function 0:38:32.079,0:38:35.280 -is going to be this color thing i show - -you over here +is going to be this color thing i show you over here 0:38:35.280,0:38:39.280 -and this is you know a different - -representation it's simply the location +and this is you know a different representation it's simply the location 0:38:39.280,0:38:43.680 -of that - -uh violet ellipse +of that uh violet ellipse 0:38:43.680,0:38:48.400 -training for the zero temperature zero - -temperature limit
means you take that +training for the zero temperature zero temperature limit means you take that 0:38:48.400,0:38:51.760 -point - -0:38:49.280,0:38:53.520 -of these ellipse you try to pull it up +point of these ellipse you try to pull it up 0:38:51.760,0:38:55.760 -right how you pull it up - -0:38:53.520,0:38:56.800 -the only two parameters we had in this +right how you pull it up the only two parameters we had in this 0:38:55.760,0:39:00.240 -model were - -0:38:56.800,0:39:03.599 -w1 and w2 which were con controlling the +model were w1 and w2 which were con controlling the 0:39:00.240,0:39:04.960 -x radius and the y radius right so we - -0:39:03.599,0:39:07.359 -had two parameters +x radius and the y radius right so we had two parameters 0:39:04.960,0:39:08.079 -and with two parameters we try to fit - -0:39:07.359,0:39:11.119 -all these +and with two parameters we try to fit all these 0:39:08.079,0:39:11.440 -y's right and so basically the network - -0:39:11.119,0:39:13.599 -will +y's right and so basically the network will 0:39:11.440,0:39:15.200 -like the the training procedure gradient - -0:39:13.599,0:39:17.680 -descent will eventually +like the the training procedure gradient descent will eventually 0:39:15.200,0:39:19.200 -try to change the size of this ellipse - -0:39:17.680,0:39:20.880 -such that it +try to change the size of this ellipse such that it 0:39:19.200,0:39:22.240 -you know expands and they're going to be - -0:39:20.880,0:39:25.680 -matching all those +you know expands and they're going to be matching all those 0:39:22.240,0:39:29.040 -uh blue dots okay - -0:39:25.680,0:39:31.280 -the spiky thing was i was saying is that +uh blue dots okay the spiky thing was i was saying is that 0:39:29.040,0:39:33.200 -if you have a high dimensional z - -0:39:31.280,0:39:34.800 -like in this case z is one dimension so +if you have a high dimensional z like in this case z is one dimension so 0:39:33.200,0:39:36.960 -you have like one line - 
-0:39:34.800,0:39:38.800 -like that if that is two dimensional +you have like one line like that if that is two dimensional 0:39:36.960,0:39:41.119 -it's going to be the whole surface right - -0:39:38.800,0:39:42.800 -and so now it's trivial to overfit you +it's going to be the whole surface right and so now it's trivial to overfit you 0:39:41.119,0:39:44.160 -can move anywhere in the plane there is - -0:39:42.800,0:39:46.960 -no more constraint +can move anywhere in the plane there is no more constraint 0:39:44.160,0:39:47.359 -of living on that line and so we have to - -0:39:46.960,0:39:49.760 -see +of living on that line and so we have to see 0:39:47.359,0:39:51.200 -how we can avoid overfitting but in this - -0:39:49.760,0:39:53.359 -case it doesn't happen but +how we can avoid overfitting but in this case it doesn't happen but 0:39:51.200,0:39:54.720 -you know we can see now that by - -0:39:53.359,0:39:57.839 -increasing the temperature +you know we can see now that by increasing the temperature 0:39:54.720,0:39:59.599 -we no longer pick points individually - -0:39:57.839,0:40:02.720 -so we are using this marginalization +we no longer pick points individually so we are using this marginalization 0:39:59.599,0:40:06.160 -this vision thingy - -0:40:02.720,0:40:08.800 -so on the bottom part is marginalization +this vision thingy so on the bottom part is marginalization 0:40:06.160,0:40:11.200 -on the left hand side i show you how the - -0:40:08.800,0:40:14.160 -training uh works right +on the left hand side i show you how the training uh works right 0:40:11.200,0:40:15.440 -so you have that all those locations - -0:40:14.160,0:40:18.720 -contribute +so you have that all those locations contribute 0:40:15.440,0:40:21.119 -to these you know the gradient - -0:40:18.720,0:40:22.960 -are just the average of those arrows +to these you know the gradient are just the average of those arrows 0:40:21.119,0:40:26.079 -here so given that we pick - -0:40:22.960,0:40:29.680 
-one y that is this green x +here so given that we pick one y that is this green x 0:40:26.079,0:40:32.960 -over here you get these all these - -points on this manifold will contribute +over here you get these all these points on this manifold will contribute 0:40:32.960,0:40:37.760 -and will be attracted there here before - -we have only one point gets pulled up +and will be attracted there here before we have only one point gets pulled up 0:40:37.760,0:40:41.839 -here we have that all these points get - -pulled up right so it's much harder to +here we have that all these points get pulled up right so it's much harder to 0:40:41.839,0:40:45.280 -overfit something you want to pay - -attention here +overfit something you want to pay attention here 0:40:45.280,0:40:48.960 -is that how do i compute the gradient so - -the gradient +is that how do i compute the gradient so the gradient 0:40:48.960,0:40:52.000 -i'm computing the gradient of this soft - -mean +i'm computing the gradient of this soft min 0:40:52.000,0:40:57.040 -and so automatically we are gonna get a - -soft argument right so if you have a max +and so automatically we are gonna get a soft argmin right so if you have a max 0:40:57.040,0:41:00.560 -you do the gradient you're gonna get the - -arc max or if you have a mean +you do the gradient you're gonna get the argmax or if you have a min 0:41:00.560,0:41:04.240 -the gradient is gonna be the argument - -here we have a soft +the gradient is gonna be the argmin here we have a soft 0:41:04.240,0:41:08.400 -mean and therefore the gradient is going - -to be the soft argument +min and therefore the gradient is going to be the soft argmin 0:41:08.400,0:41:11.920 -multiplied by the the derivative of this -
-0:41:11.280,0:41:13.680 -energy +multiplied by the derivative of this energy 0:41:11.920,0:41:15.920 -and which is going to be simply this - -vector right so the energy is the square +and which is going to be simply this vector right so the energy is the square 0:41:15.920,0:41:18.079 -distance if you do the derivative you're - -going to get the vector which are +distance if you do the derivative you're going to get the vector which are 0:41:19.520,0:41:26.400 -here shown in white and then the height - -is gonna be uh basically given to you +here shown in white and then the height is gonna be uh basically given to you 0:41:26.400,0:41:29.440 -by the you know the the vector - -multiplied +by the you know the vector multiplied 0:41:29.440,0:41:36.560 -by this soft argument - -cool wow that's a lot to take i think +by this soft argmin cool wow that's a lot to take i think 0:41:36.560,0:41:39.599 -but - -it's it's i think it's just great uh +but it's it's i think it's just great uh 0:41:39.599,0:41:44.240 -finally i train the last one - -and i'm gonna get something like this on +finally i train the last one and i'm gonna get something like this on 0:41:44.240,0:41:48.800 -the right hand side - -okay so before i show you +the right hand side okay so before i show you 0:41:48.800,0:41:53.839 -the cross-section for the left-hand side - -the untrained version i'm going to show +the cross-section for the left-hand side the untrained version i'm going to show 0:41:53.839,0:41:56.800 -you now the cross-section for this train - -version +you now the cross-section for this trained version 0:41:56.800,0:42:01.359 -so the zero temperature limit the super - -cold one i'm gonna get this red one with +so
the zero temperature limit the super cold one i'm gonna get this red one with 0:42:01.359,0:42:05.280 -a spike - -0:42:02.640,0:42:05.839 -and then as you increase the temperature +a spike and then as you increase the temperature 0:42:05.280,0:42:08.400 -as you - -0:42:05.839,0:42:10.079 -reduce this beta we're moving up until +as you reduce this beta we're moving up until 0:42:08.400,0:42:14.400 -you get this you know average - -0:42:10.079,0:42:17.359 -version this parabolic uh blue one right +you get this you know average version this parabolic uh blue one right 0:42:14.400,0:42:18.800 -okay okay okay and so all of this was - -0:42:17.359,0:42:22.079 -about +okay okay okay and so all of this was about 0:42:18.800,0:42:23.680 -unsupervised learning right so far we - -0:42:22.079,0:42:28.400 -only have seen +unsupervised learning right so far we only have seen 0:42:23.680,0:42:30.880 -y's where are the x's - -0:42:28.400,0:42:32.319 -and so this is like yesterday night i'm +y's where are the x's and so this is like yesterday night i'm 0:42:30.880,0:42:34.560 -like okay maybe i don't talk about - -0:42:32.319,0:42:36.079 -supervised learning like i don't +like okay maybe i don't talk about supervised learning like i don't 0:42:34.560,0:42:38.000 -how long is going to take me to now - -0:42:36.079,0:42:39.040 -train a model with the x's and +how long is going to take me to now train a model with the x's and 0:42:38.000,0:42:42.400 -everything and - -0:42:39.040,0:42:44.800 -i don't want to do it but then +everything and i don't want to do it but then 0:42:42.400,0:42:45.680 -i just change one line of code and - -0:42:44.800,0:42:48.000 -everything just +i just change one line of code and everything just 0:42:45.680,0:42:49.920 -works so everything we have seen so far - -0:42:48.000,0:42:51.760 -is exactly the same +works so everything we have seen so far is exactly the same 0:42:49.920,0:42:53.200 -for the unconditional which is this - -0:42:51.760,0:42:56.240 
-unsupervised +for the unconditional which is this unsupervised 0:42:53.200,0:42:57.760 -learning way and it's gonna - -0:42:56.240,0:43:00.240 -like one line change you're gonna get +learning way and it's gonna like one line change you're gonna get 0:42:57.760,0:43:00.640 -the supervised like the self-supervised - -0:43:00.240,0:43:03.040 -the +the supervised like the self-supervised the 0:43:00.640,0:43:03.760 -conditional and so now in the last five - -0:43:03.040,0:43:04.880 -minutes +conditional and so now in the last five minutes 0:43:03.760,0:43:06.880 -we're gonna be talking about the - -0:43:04.880,0:43:08.240 -self-supervised learning or the +we're gonna be talking about the self-supervised learning or the 0:43:06.880,0:43:10.400 -conditional case - -0:43:08.240,0:43:12.480 -what does this mean so let's get back to +conditional case what does this mean so let's get back to 0:43:10.400,0:43:13.280 -the training data this is my training - -0:43:12.480,0:43:16.560 -data right +the training data this is my training data right 0:43:13.280,0:43:19.119 -we have we try to learn this horn - -0:43:16.560,0:43:21.040 -that is starting with a horizontal mouth +we have we try to learn this horn that is starting with a horizontal mouth 0:43:19.119,0:43:24.240 -like it's like a closed mouth ah - -0:43:21.040,0:43:27.680 -like that and then it goes like a very +like it's like a closed mouth ah like that and then it goes like a very 0:43:24.240,0:43:28.400 -you know tall and narrow and then the - -0:43:27.680,0:43:31.119 -profile +you know tall and narrow and then the profile 0:43:28.400,0:43:31.760 -the envelope is exponential right so - -0:43:31.119,0:43:34.880 -here +the envelope is exponential right so here 0:43:31.760,0:43:38.240 -the the the radius - -0:43:34.880,0:43:39.839 -goes from beta to alpha and in x +the the the radius goes from beta to alpha and in x 0:43:38.240,0:43:41.839 -in exp like it's multiplied by the - -0:43:39.839,0:43:43.599 -exponential of two 
times the x +it's multiplied by the exponential of two times the x 0:43:41.839,0:43:45.920 -and the other case the goes from alpha - -to beta and also it is multiplied by +and in the other case it goes from alpha to beta and also it is multiplied by 0:43:45.920,0:43:50.560 -this - -exponential so let's see if we can learn +this exponential so let's see if we can learn 0:43:50.560,0:43:54.960 -this stuff and i didn't know if it was - -easy or hard i thought it was hard +this stuff and i didn't know if it was easy or hard i thought it was hard 0:43:54.960,0:43:59.119 -it was very easy and so untrained model - -manifold +it was very easy and so untrained model manifold 0:43:59.119,0:44:03.119 -so let's give it a look how does my - -model look now +so let's give it a look how does my model look now 0:44:03.119,0:44:07.359 -so i have a z and since i have control - -over z i take +so i have a z and since i have control over z i take 0:44:07.359,0:44:12.079 -you know zero to two pi to pi excluded - -that's why the bracket is flipped +you know zero to two pi two pi excluded that's why the bracket is flipped 0:44:12.079,0:44:19.359 -with an interval of pi over 24. - -0:44:16.240,0:44:21.760 -so i get a line on over there i fit this +with an interval of pi over 24.
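the discretization just described, a z grid from 0 to 2 pi with 2 pi excluded in steps of pi over 24, together with an x grid from 0 to 1 in steps of 0.02, can be sketched in a few lines. this is a hedged sketch only: the decoder body and the alpha and beta envelope values below are my own assumptions, not the lecture's notebook code

```python
import numpy as np

# Minimal sketch of the sampling grids described in the lecture; the decoder
# body and the alpha/beta values are assumptions, not the lecture's code.
z = np.arange(0, 2 * np.pi, np.pi / 24)  # latent angle, 2*pi excluded
x = np.arange(0, 1, 0.02)                # observed input, 50 samples in [0, 1)

alpha, beta = 0.5, 1.5                   # assumed envelope radii

def decoder(x, z):
    # an ellipse in z whose radii grow with an exponential envelope in x,
    # loosely following "multiplied by the exponential of two times the x"
    rx = beta * np.exp(2 * x)
    ry = alpha * np.exp(2 * x)
    return np.stack([rx * np.cos(z), ry * np.sin(z)], axis=-1)

Z, X = np.meshgrid(z, x)                 # grid of (latent, input) pairs
manifold = decoder(X, Z)                 # one 2-d point per (x, z) pair
```

plotting manifold for each fixed x traces one ellipse, and sweeping x stretches the ellipses exponentially into the horn shape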
so i get a line on over there i feed this 0:44:19.359,0:44:22.480 -z on the decoder and then i'm gonna get - -my y +z on the decoder and then i'm gonna get my y 0:44:22.480,0:44:26.319 -tilde which is gonna be moving uh like - -going around +tilde which is gonna be moving uh like going around 0:44:26.319,0:44:32.960 -ellipses because that's how my network - -is routed inside the decoder right +ellipses because that's how my network is routed inside the decoder right 0:44:32.960,0:44:37.520 -moreover we're gonna have our y's our - -observer why is observed +moreover we're gonna have our y's our y is observed 0:44:37.520,0:44:41.280 -you can see is observed because there is - -a shade in that +you can see it is observed because there is a shade in that 0:44:41.280,0:44:45.200 -bubble there in the circle and now we - -have a predictor +bubble there in the circle and now we have a predictor 0:44:45.200,0:44:48.319 -and the decoder not only takes my latent - -z +and the decoder not only takes my latent z 0:44:48.319,0:44:51.760 -but also a predictor and the predictor - -is fed +but also a predictor and the predictor is fed 0:44:51.760,0:44:56.480 -with my observed x and since again if i - -have control over z +with my observed x and since again if i have control over x 0:44:56.480,0:45:02.960 -i can simply say it goes from 0 to 1 - -with 0.02 interval +i can simply say it goes from 0 to 1 with 0.02 interval 0:45:02.960,0:45:07.520 -let me show you how my untrained network - -manifold looks right so this is what +let me show you how my untrained network manifold looks right so this is what 0:45:07.520,0:45:13.599 -this untrained network manifold looks - -all right so
how do i train this well i +this untrained network manifold looks all right so how do i train this well i 0:45:13.599,0:45:19.359 -just do the zero temperature limit - -0:45:15.680,0:45:22.000 -uh free energy training so given my +just do the zero temperature limit uh free energy training so given my 0:45:19.359,0:45:22.640 -horn as before i take one point one y - -0:45:22.000,0:45:25.200 -point +horn as before i take one point one y point 0:45:22.640,0:45:25.839 -i find the closest point on my manifold - -0:45:25.200,0:45:27.839 -and then +i find the closest point on my manifold and then 0:45:25.839,0:45:29.440 -i try to pull it up i take this other - -0:45:27.839,0:45:31.440 -point i take the closest point +i try to pull it up i take this other point i take the closest point 0:45:29.440,0:45:33.280 -and i put it down there i take this - -0:45:31.440,0:45:34.240 -point over here i take the closest point +and i put it down there i take this point over here i take the closest point 0:45:33.280,0:45:36.000 -and then put it on - -0:45:34.240,0:45:38.560 -i take this point over here on the on +and then put it on i take this point over here on the on 0:45:36.000,0:45:39.200 -the horn i take the closest point on the - -0:45:38.560,0:45:41.760 -manifold +the horn i take the closest point on the manifold 0:45:39.200,0:45:42.240 -and i pull it down i do that for one - -0:45:41.760,0:45:45.119 -epoch +and i pull it down i do that for one epoch 0:45:42.240,0:45:47.119 -only i told you it was very easy to - -0:45:45.119,0:45:49.839 -train this model +only i told you it was very easy to train this model 0:45:47.119,0:45:51.599 -and we get actually i had to define - -0:45:49.839,0:45:53.359 -first what is the energy function right +and we get actually i had to define first what is the energy function right 0:45:51.599,0:45:56.240 -so my energy function - -0:45:53.359,0:45:57.920 -uh in this case is going to be this e of +so my energy function uh in this case is going to be this 
e of 0:45:56.240,0:46:01.200 -x y and z - -where again those two components uh like +x y and z where again those two components uh like 0:46:01.200,0:46:05.040 -it's going to be the sum of the square - -distances +it's going to be the sum of the square distances 0:46:05.040,0:46:10.560 -but in this case i have f and g right so - -we have a predictor f +but in this case i have f and g right so we have a predictor f 0:46:10.560,0:46:16.800 -which are both of them mapping r to r r2 - -and then f is going to be a neural net +which are both of them mapping r to r2 and then f is going to be a neural net 0:46:16.800,0:46:20.160 -mapping my input x through a linear - -layer and +mapping my input x through a linear layer and 0:46:20.160,0:46:23.440 -reload to a eight dimensional hidden - -layer +relu to an eight dimensional hidden layer 0:46:23.440,0:46:26.880 -then i go again through another linear - -layer and reload another eight +then i go again through another linear layer and relu another eight 0:46:26.880,0:46:29.680 -dimensional - -hidden layer and then i have my final +dimensional hidden layer and then i have my final 0:46:29.680,0:46:32.880 -linear layer to end up in two dimensions - -so i have a four layer +linear layer to end up in two dimensions so i have a four layer 0:46:32.880,0:46:38.560 -network input two hidden of size eight - -and then one output of size two +network input two hidden of size eight and then one output of size two 0:46:38.560,0:46:42.000 -and then my g function is simply what - -allows me to get +and then my g function is simply what allows me to get 0:46:42.000,0:46:47.839 -this z going in in loops okay - -and but then the point now is
that these +this z going in loops okay and but then the point now is that these 0:46:47.839,0:46:53.359 -two components are going to be scaled - -by the output of f +two components are going to be scaled by the output of f 0:46:53.359,0:46:58.000 -so this is my model very very tiny very - -tiny model +so this is my model very very tiny very tiny model 0:46:58.000,0:47:01.760 -and i'm going to be training it and then - -i show you +and i'm going to be training it and then i show you 0:47:01.760,0:47:05.040 -the model manifold so again i take the - -same +the model manifold so again i take the same 0:47:05.040,0:47:09.040 -discretization for the znx and this is - -how +discretization for the z and the x and this is how 0:47:09.040,0:47:16.240 -the training train model manifold looks - -it's awesome right i think it's just +the trained model manifold looks it's awesome right i think it's just 0:47:16.240,0:47:19.839 -great - -all right and this one took nothing no +great all right and this one took nothing no 0:47:19.839,0:47:25.359 -time to train - -so how can we move on how +time to train so how can we move on how 0:47:25.359,0:47:29.200 -how what do we do next right how do we - -move forward from here +how what do we do next right how do we move forward from here 0:47:29.200,0:47:34.079 -so there are a few more ways to scale - -this up not to toy example so +so there are a few more ways to scale this up to a non-toy example so 0:47:34.079,0:47:37.359 -so far i've been kind of cheating right - -i've been always +so far i've been kind of cheating right i've been always 0:47:37.359,0:47:41.520 -uh embedding into d decoder the fact - -that my +uh embedding into the decoder the
fact that my 0:47:41.520,0:47:45.359 -z goes around circles but i don't know - -0:47:44.960,0:47:48.160 -that +z goes around circles but i don't know that 0:47:45.359,0:47:49.920 -right so we don't know that and so we - -0:47:48.160,0:47:53.359 -may use something like this +right so we don't know that and so we may use something like this 0:47:49.920,0:47:56.559 -in this case my g function takes my - -0:47:53.359,0:47:58.720 -f and z +in this case my g function takes my f and z 0:47:56.559,0:47:59.839 -and then you know g can be a neural net - -0:47:58.720,0:48:01.680 -as well and +and then you know g can be a neural net as well and 0:47:59.839,0:48:02.960 -in this case i have to learn the fact - -0:48:01.680,0:48:06.240 -that this stuff +in this case i have to learn the fact that this stuff 0:48:02.960,0:48:09.920 -moves around circles so i in this case i - -0:48:06.240,0:48:13.119 -should be learning this sine and cosine +moves around circles so i in this case i should be learning this sine and cosine 0:48:09.920,0:48:15.839 -but then how do i know that actually - -0:48:13.119,0:48:18.000 -z is one dimensional well i know because +but then how do i know that actually z is one dimensional well i know because 0:48:15.839,0:48:20.559 -i generated my data right so - -0:48:18.000,0:48:22.480 -i am the owner of my data generation +i generated my data right so i am the owner of my data generation 0:48:20.559,0:48:25.680 -process so i knew that - -0:48:22.480,0:48:28.000 -uh theta was a one-dimensional item so +process so i knew that uh theta was a one-dimensional item so 0:48:25.680,0:48:29.359 -definitely i can just use a latent that - -0:48:28.000,0:48:31.040 -is one-dimensional but +definitely i can just use a latent that is one-dimensional but 0:48:29.359,0:48:33.599 -no one can tell me that for you know - -0:48:31.040,0:48:36.079 -natural images or whatever right so +no one can tell me that for you know natural images or whatever right so 0:48:33.599,0:48:37.760 
-that's the other big issue and so how
-
-0:48:36.079,0:48:38.720
-how would we deal with the fact that we
+that's the other big issue and so how how would we deal with the fact that we

0:48:37.760,0:48:42.240
-don't know
-
-0:48:38.720,0:48:44.160
-what is the correct size of my latent
+don't know what is the correct size of my latent

0:48:42.240,0:48:45.839
-uh because again if you choose a large
-
-0:48:44.160,0:48:48.319
-latent you're going to be
+uh because again if you choose a large latent you're going to be

0:48:45.839,0:48:49.839
-very easily overfitting everything and
-
-0:48:48.319,0:48:51.520
-so in this case
+very easily overfitting everything and so in this case

0:48:49.839,0:48:53.920
-what changes from the previous slide
-
-0:48:51.520,0:48:58.319
-which is this one is that now z
+what changes from the previous slide which is this one is that now z

0:48:53.920,0:49:00.400
-is a vector okay so z is a vector
-
-0:48:58.319,0:49:01.760
-no longer just a single line so actually
+is a vector okay so z is a vector no longer just a single line so actually

0:49:00.400,0:49:03.200
-this should be a vector and this should
-
-0:49:01.760,0:49:06.640
-be like whatever
+this should be a vector and this should be like whatever

0:49:03.200,0:49:09.680
-the shape and now my g goes from
-
-0:49:06.640,0:49:11.200
-the dimension of f and you know
+the shape and now my g goes from the dimension of f and you know

0:49:09.680,0:49:11.599
-cartesian product with the dimension of
-
-0:49:11.200,0:49:15.119
-set
+cartesian product with the dimension of z

0:49:11.599,0:49:18.240
-into r2 now the issue is that
-
-0:49:15.119,0:49:20.640
-we need to regularize
+into r2 now the issue is that we need to regularize

0:49:18.240,0:49:21.920
-this loss functional because otherwise
-
-0:49:20.640,0:49:23.040
-you are going to be drastically over
+this loss functional because otherwise you are going to be drastically over

0:49:21.920,0:49:25.280
-fitting right
-
-0:49:23.040,0:49:26.160
-and so this is what current research
+fitting right and so this is what current research

0:49:25.280,0:49:28.880
-with jan
-
-0:49:26.160,0:49:30.000
-uh is now what me and my students are
+with yann uh is now what me and my students are

0:49:28.880,0:49:32.559
-doing
-
-0:49:30.000,0:49:35.200
-with jan we are trying to figure out
+doing with yann we are trying to figure out

0:49:32.559,0:49:37.440
-ways to regularize the latent variable
-
-0:49:35.200,0:49:38.240
-such that we can you know make things
+ways to regularize the latent variable such that we can you know make things

0:49:37.440,0:49:42.480
-actually not
-
-0:49:38.240,0:49:44.640
-simply overfit and that was it
+actually not simply overfit and that was it

0:49:42.480,0:49:45.680
-that was all i had to tell you about
-
-0:49:44.640,0:49:49.040
-latent variable
+that was all i had to tell you about latent variable

0:49:45.680,0:49:51.680
-energy-based models inference training
-
-0:49:49.040,0:49:52.880
-zero temperature limit a bit warmer than
+energy-based models inference training zero temperature limit a bit warmer than

0:49:51.680,0:49:55.680
-free energy
-
-0:49:52.880,0:49:56.240
-uh and then we saw the unconditional
+free energy uh and then we saw the unconditional

0:49:55.680,0:49:57.920
-case
-
-0:49:56.240,0:50:00.640
-with unsupervised learning and then we
+case with unsupervised learning and then we

0:49:57.920,0:50:02.640
-have seen the conditional case with the
-
-0:50:00.640,0:50:04.240
-uh self-supervised learning right where
+have seen the conditional case with the uh self-supervised learning right where

0:50:02.640,0:50:07.280
-we have access to these
-
-0:50:04.240,0:50:09.760
-acts and the code to train
+we have access to these x's and the code to train

0:50:07.280,0:50:11.760
-these two models uh like the code that i
-
-0:50:09.760,0:50:14.319
-use for training the conditional case
+these two models uh like the code that i use for training the conditional 
case 0:50:11.760,0:50:16.480 -it's just the same code as i use for the - -0:50:14.319,0:50:18.160 -unsupervised in a supervised case but +it's just the same code as i use for the unsupervised in a supervised case but 0:50:16.480,0:50:20.079 -with one line change so - -0:50:18.160,0:50:22.559 -really really it doesn't take much +with one line change so really really it doesn't take much 0:50:20.079,0:50:25.680 -effort to put this together - -0:50:22.559,0:50:28.240 -what it took some effort was to draw +effort to put this together what it took some effort was to draw 0:50:25.680,0:50:30.640 -the slides but again that's just because - -0:50:28.240,0:50:33.680 -i like making things pretty +the slides but again that's just because i like making things pretty 0:50:30.640,0:50:37.119 -and that was it ah - -0:50:33.680,0:50:39.440 -thank you for listening questions please +and that was it ah thank you for listening questions please 0:50:37.119,0:50:41.520 -go on i mean it's done right class is - -0:50:39.440,0:50:44.720 -finished +go on i mean it's done right class is finished 0:50:41.520,0:50:44.720 -you can ask anything you want - -0:50:45.200,0:50:48.000 -are you still awake +you can ask anything you want are you still awake 0:50:51.280,0:50:55.119 -yes okay someone is awake can you - -0:50:53.040,0:50:57.359 -explain the input dimension of +yes okay someone is awake can you explain the input dimension of 0:50:55.119,0:50:59.200 -g again yes i can explain as much as you - -0:50:57.359,0:51:00.240 -want as you want so now it's office +g again yes i can explain as much as you want as you want so now it's office 0:50:59.200,0:51:02.880 -hours right - -0:51:00.240,0:51:04.400 -you can ask anything you want uh hold on +hours right you can ask anything you want uh hold on 0:51:02.880,0:51:05.839 -first question so can you explain the - -0:51:04.400,0:51:07.920 -input dimension of g +first question so can you explain the input dimension of g 0:51:05.839,0:51:09.680 -uh in 
this case let me go back to the - -0:51:07.920,0:51:13.200 -first case +uh in this case let me go back to the first case 0:51:09.680,0:51:15.839 -in this case g is 1 because it was - -0:51:13.200,0:51:18.400 -fed with z right and then the output was +in this case g is 1 because it was fed with z right and then the output was 0:51:15.839,0:51:20.559 -this g1 and g2 which were cosine and - -0:51:18.400,0:51:23.200 -sine +this g1 and g2 which were cosine and sine 0:51:20.559,0:51:24.559 -in the second case g is going to be the - -0:51:23.200,0:51:26.559 -input is going to be this +in the second case g is going to be the input is going to be this 0:51:24.559,0:51:28.160 -f which we don't know exactly the - -0:51:26.559,0:51:31.599 -dimension can be anything +f which we don't know exactly the dimension can be anything 0:51:28.160,0:51:33.839 -so the dimension of f and then - -0:51:31.599,0:51:34.960 -that given that i know that z is one +so the dimension of f and then that given that i know that z is one 0:51:33.839,0:51:37.599 -dimensional - -0:51:34.960,0:51:38.079 -finally which is the super you know norm +dimensional finally which is the super you know norm 0:51:37.599,0:51:41.520 -like - -0:51:38.079,0:51:41.920 -the actual case that the more realistic +like the actual case that the more realistic 0:51:41.520,0:51:44.640 -case - -0:51:41.920,0:51:45.520 -is this one where we don't necessarily +case is this one where we don't necessarily 0:51:44.640,0:51:48.079 -know what is - -0:51:45.520,0:51:48.720 -the supposed the dimension for the +know what is the supposed the dimension for the 0:51:48.079,0:51:51.359 -latent - -0:51:48.720,0:51:53.359 -and therefore now we're going to use a +latent and therefore now we're going to use a 0:51:51.359,0:51:54.800 -whatever dimensional variable latent - -0:51:53.359,0:51:58.000 -variable but +whatever dimensional variable latent variable but 0:51:54.800,0:51:59.280 -it's going to be necessary to regularize - 
-0:51:58.000,0:52:02.319 -the +it's going to be necessary to regularize the 0:51:59.280,0:52:04.400 -loss functional otherwise as i was - -0:52:02.319,0:52:07.599 -pointing out you can easily overfit +loss functional otherwise as i was pointing out you can easily overfit 0:52:04.400,0:52:09.920 -by using that zero temperature limit - -0:52:07.599,0:52:11.359 -uh nevertheless you can use you can warm +by using that zero temperature limit uh nevertheless you can use you can warm 0:52:09.920,0:52:13.599 -up the the temperature - -0:52:11.359,0:52:15.280 -and use that as a regularizer of course +up the the temperature and use that as a regularizer of course 0:52:13.599,0:52:18.880 -right - -0:52:15.280,0:52:21.520 -did you get it uh yeah +right did you get it uh yeah 0:52:18.880,0:52:23.200 -so next question how does this look - -0:52:21.520,0:52:24.960 -without a latent variable +so next question how does this look without a latent variable 0:52:23.200,0:52:26.880 -okay without a latent variable it's - -0:52:24.960,0:52:30.079 -exactly as turning +okay without a latent variable it's exactly as turning 0:52:26.880,0:52:33.920 -beta to zero okay - -0:52:30.079,0:52:37.280 -so beta to zero you just average +beta to zero okay so beta to zero you just average 0:52:33.920,0:52:39.680 -over all possible values how - -0:52:37.280,0:52:41.599 -does what does happen what what are you +over all possible values how does what does happen what what are you 0:52:39.680,0:52:44.319 -going to be ending up having - -0:52:41.599,0:52:44.800 -if you start here on the left side and +going to be ending up having if you start here on the left side and 0:52:44.319,0:52:47.119 -then - -0:52:44.800,0:52:47.920 -instead of having all these arrows that +then instead of having all these arrows that 0:52:47.119,0:52:50.400 -are shaped - -0:52:47.920,0:52:51.119 -now like that all these arrows will have +are shaped now like that all these arrows will have 0:52:50.400,0:52:54.000 -the same - 
-0:52:51.119,0:52:55.599
-length well actually these points over
+the same length well actually these points over

0:52:54.000,0:52:56.880
-here will be even longer now because
-
-0:52:55.599,0:53:00.960
-they are further away
+here will be even longer now because they are further away

0:52:56.880,0:53:03.280
-so these ellipse will be pulled
-
-0:53:00.960,0:53:04.559
-in every direction and the way to
+so this ellipse will be pulled in every direction and the way to

0:53:03.280,0:53:07.200
-minimize this energy
-
-0:53:04.559,0:53:08.480
-is actually to make it collapse in a
+minimize this energy is actually to make it collapse in a

0:53:07.200,0:53:12.079
-single point
-
-0:53:08.480,0:53:13.760
-center in zero and so that's the actual
+single point centered in zero and so that's the actual

0:53:12.079,0:53:16.160
-it's a very good question right so what
-
-0:53:13.760,0:53:20.000
-is the classical
+it's a very good question right so what is the classical

0:53:16.160,0:53:22.000
-failure mode in you know neural network
-
-0:53:20.000,0:53:24.000
-whenever you have multiple targets
+failure mode in you know neural networks whenever you have multiple targets

0:53:22.000,0:53:26.880
-associated to the same input
-
-0:53:24.000,0:53:29.440
-you end up predicting the average of all
+associated to the same input you end up predicting the average of all

0:53:26.880,0:53:31.520
-the possible targets
-
-0:53:29.440,0:53:32.960
-in this case the average of all possible
+the possible targets in this case the average of all possible

0:53:31.520,0:53:34.640
-targets that are all those
-
-0:53:32.960,0:53:36.960
-points in the ellipse is going to be
+targets that are all those points in the ellipse is going to be

0:53:34.640,0:53:39.680
-just the point in the origin
-
-0:53:36.960,0:53:41.440
-which is like the collapse of your model
+just the point in the origin which is like the collapse of your model

0:53:39.680,0:53:42.880
-right so that's a very good question and
-
-0:53:41.440,0:53:46.480
-the point is that
+right so that's a very good question and the point is that

0:53:42.880,0:53:49.760
-if you try to learn multi modal output
-
-0:53:46.480,0:53:54.079
-multimodal data set a data with
+if you try to learn multimodal output multimodal data set a data with

0:53:49.760,0:53:56.640
-a msc like without latent with zero
-
-0:53:54.079,0:53:57.440
-zero beta infinite temperature you're
+an mse like without latent with zero zero beta infinite temperature you're

0:53:56.640,0:54:00.559
-just you know
-
-0:53:57.440,0:54:03.599
-collapsing uh to the mean
+just you know collapsing uh to the mean

0:54:00.559,0:54:04.000
-the average right m e a and not m i n
-
-0:54:03.599,0:54:07.280
-mean
+the average right m e a n and not m i n mean

0:54:04.000,0:54:09.599
-average all right another question
-
-0:54:07.280,0:54:10.880
-uh to be clear at the zero temperature
+average all right another question uh to be clear at the zero temperature

0:54:09.599,0:54:14.000
-limit the loss
-
-0:54:10.880,0:54:18.079
-is only considering the energy
+limit the loss is only considering the energy

0:54:14.000,0:54:21.440
-of the nearest point yeah
-
-0:54:18.079,0:54:24.559
-and as we warm it up the loss is using
+of the nearest point yeah and as we warm it up the loss is using

0:54:21.440,0:54:26.640
-a weighted sum of all points and yes
-
-0:54:24.559,0:54:28.480
-and the weighting weights that you're
+a weighted sum of all points and yes and the weighting weights that you're

0:54:26.640,0:54:30.480
-using for the weight of the sum
-
-0:54:28.480,0:54:31.760
-is the are the weights that are coming
+using for the weight of the sum is the are the weights that are coming

0:54:30.480,0:54:34.559
-from the uh
-
-0:54:31.760,0:54:36.000
-soft argument right if you take the arg
+from the uh soft argmax right if you take the

0:54:34.559,0:54:38.400
-softening
-
-0:54:36.000,0:54:39.760
-you have soft mean of the energy right
+softmin you have the softmin 
of the energy right

0:54:38.400,0:54:41.920
-so that's what you get
-
-0:54:39.760,0:54:44.160
-you have the soft mean of the energy
+so that's what you get you have the softmin of the energy

0:54:41.920,0:54:45.280
-right so the f tilde it's soft mean of
-
-0:54:44.160,0:54:48.240
-the energy
+right so the f tilde it's the softmin of the energy

0:54:45.280,0:54:49.520
-you take the derivative of the softmin
-
-0:54:48.240,0:54:51.440
-you get the
+you take the derivative of the softmin you get the

0:54:49.520,0:54:53.680
-what you get you get the exponential
-
-0:54:51.440,0:54:56.160
-divided by the sum of exponential
+what you get you get the exponential divided by the sum of exponentials

0:54:53.680,0:54:56.799
-so that's the soft argument right
-
-0:54:56.160,0:55:00.160
-multiply
+so that's the soft argmax right multiply

0:54:56.799,0:55:03.359
-by e prime what is e prime e
-
-0:55:00.160,0:55:04.960
-was the square distance so if you take
+by e prime what is e prime e was the square distance so if you take

0:55:03.359,0:55:05.599
-the derivative of the square distance
-
-0:55:04.960,0:55:08.319
-you just get
+the derivative of the square distance you just get

0:55:05.599,0:55:10.079
-the vector which is now multiplied by
-
-0:55:08.319,0:55:12.559
-this soft argument
+the vector which is now multiplied by this soft argmax

0:55:10.079,0:55:13.359
-so exactly what you said uh which is
-
-0:55:12.559,0:55:15.839
-very good
+so exactly what you said uh which is very good

0:55:13.359,0:55:18.000
-summary i'm gonna just read it again and
-
-0:55:15.839,0:55:21.119
-i show the other chart
+summary i'm gonna just read it again and i show the other chart

0:55:18.000,0:55:22.720
-so i just read your comment
-
-0:55:21.119,0:55:25.119
-to be clear at the zero temperature
+so i just read your comment to be clear at the zero temperature

0:55:22.720,0:55:26.799
-limit the loss is only considering the
-
-0:55:25.119,0:55:28.480
-energy of the nearest point
+limit the 
loss is only considering the energy of the nearest point

0:55:26.799,0:55:29.839
-the distance the square distance to the
-
-0:55:28.480,0:55:31.599
-closest point yeah
+the distance the square distance to the closest point yeah

0:55:29.839,0:55:34.160
-and as you warm it up the loss is going
-
-0:55:31.599,0:55:37.200
-to be the weighted sum
+and as you warm it up the loss is going to be the weighted sum

0:55:34.160,0:55:42.480
-of not the points right what is sum uh
-
-0:55:37.200,0:55:45.599
-of all those um contributions right
+of not the points right what is sum uh of all those um contributions right

0:55:42.480,0:55:46.319
-the x uh this exponential of the minus
-
-0:55:45.599,0:55:49.119
-beta e
+the x uh this exponential of the minus beta e

0:55:46.319,0:55:50.319
-right that's what that was written here
-
-0:55:49.119,0:55:51.760
-on the top right
+right that's what that was written here on the top right

0:55:50.319,0:55:53.760
-so as you warm it up you're going to get
-
-0:55:51.760,0:55:56.160
-this exponential which is the soft mean
+so as you warm it up you're going to get this exponential which is the softmin

0:55:53.760,0:55:56.799
-so soft mean and then if you compute the
-
-0:55:56.160,0:55:59.280
-uh
+so softmin and then if you compute the uh

0:55:56.799,0:56:00.960
-the derivative you're going to get the
-
-0:55:59.280,0:56:02.640
-soft argument multiplied by the
+the derivative you're going to get the soft argmax multiplied by the

0:56:00.960,0:56:04.400
-derivative of the energy
-
-0:56:02.640,0:56:06.319
-which are the arrows multiplied by the
+derivative of the energy which are the arrows multiplied by the

0:56:04.400,0:56:09.359
-soft argument so cool
-
-0:56:06.319,0:56:10.799
-what happens if we allow z to move
+soft argmax so cool what happens if we allow z to move

0:56:09.359,0:56:13.839
-freely into the space
-
-0:56:10.799,0:56:15.839
-we're going to basically get
+freely into the space we're going to basically get 
a collapsed 0:56:13.839,0:56:17.119 -network this model can simply output - -0:56:15.839,0:56:20.640 -zero everywhere +network this model can simply output zero everywhere 0:56:17.119,0:56:21.520 -and that's where you may need to use the - -0:56:20.640,0:56:24.559 -contrastive +and that's where you may need to use the contrastive 0:56:21.520,0:56:26.319 -uh cases right so in that case uh you - -0:56:24.559,0:56:28.319 -know a very easy way to get +uh cases right so in that case uh you know a very easy way to get 0:56:26.319,0:56:31.200 -zero energy is gonna be just everything - -0:56:28.319,0:56:33.119 -zero right uh but in this in the +zero energy is gonna be just everything zero right uh but in this in the 0:56:31.200,0:56:35.599 -in this case you can use the contrastive - -0:56:33.119,0:56:37.760 -case you can say oh no in this case it +in this case you can use the contrastive case you can say oh no in this case it 0:56:35.599,0:56:40.720 -should be larger than some margin - -0:56:37.760,0:56:42.319 -and so that's how you can deal with this +should be larger than some margin and so that's how you can deal with this 0:56:40.720,0:56:45.760 -larger than uh - -0:56:42.319,0:56:48.880 -like z into d okay so +larger than uh like z into d okay so 0:56:45.760,0:56:49.520 -taking beta okay so taking beta uh to - -0:56:48.880,0:56:51.760 -zero +taking beta okay so taking beta uh to zero 0:56:49.520,0:56:53.839 -would defeat the purpose of having a - -0:56:51.760,0:56:54.319 -latent variable at all that's exactly +would defeat the purpose of having a latent variable at all that's exactly 0:56:53.839,0:56:56.720 -yeah - -0:56:54.319,0:56:58.480 -and so this is what i kind of briefly +yeah and so this is what i kind of briefly 0:56:56.720,0:57:02.400 -show you i didn't talk about - -0:56:58.480,0:57:04.160 -but this is like a a quick uh derivation +show you i didn't talk about but this is like a a quick uh derivation 0:57:02.400,0:57:06.559 -by showing you that if you go 
beta
-
-0:57:04.160,0:57:07.440
-equals zero like the limit for beta that
+by showing you that if you go beta equals zero like the limit for beta that

0:57:06.559,0:57:10.480
-tends to zero
-
-0:57:07.440,0:57:11.440
-you retrieve the average across all the
+tends to zero you retrieve the average across all the

0:57:10.480,0:57:14.640
-latent
-
-0:57:11.440,0:57:16.079
-and that's basically the you end up with
+latent and that's basically the you end up with

0:57:14.640,0:57:18.240
-having msc right
-
-0:57:16.079,0:57:19.920
-you end up throwing away all those kind
+having mse right you end up throwing away all those kind

0:57:18.240,0:57:23.760
-of uh the goodies
-
-0:57:19.920,0:57:26.319
-right and that was pretty much it
+of uh the goodies right and that was pretty much it

0:57:23.760,0:57:28.480
-how can you get more out of this lesson
-
-0:57:26.319,0:57:30.480
-firstly comprehension
+how can you get more out of this lesson firstly comprehension

0:57:28.480,0:57:33.119
-if anything was not clear ask me
-
-0:57:30.480,0:57:34.480
-anything in the comment section below
+if anything was not clear ask me anything in the comment section below

0:57:33.119,0:57:36.640
-if you would like to follow up with the
-
-0:57:34.480,0:57:39.760
-latest news follow me on twitter
+if you would like to follow up with the latest news follow me on twitter

0:57:36.640,0:57:42.160
-under the endl alph cnz
-
-0:57:39.760,0:57:44.000
-if you would like to be notified when i
+under the handle alfcnz if you would like to be notified when i

0:57:42.160,0:57:45.680
-upload the latest video
-
-0:57:44.000,0:57:47.839
-don't forget to subscribe to the channel
+upload the latest video don't forget to subscribe to the channel

0:57:45.680,0:57:50.079
-and turn on the notification bell
-
-0:57:47.839,0:57:52.000
-and if you like this video don't forget
+and turn on the notification bell and if you like this video don't forget

0:57:50.079,0:57:54.400
-to put a thumb up
-
-0:57:52.000,0:57:55.920
-this video has a transcript in english
+to put a thumb up this video has a transcript in english

0:57:54.400,0:57:57.359
-and if you would like to contribute to
-
-0:57:55.920,0:58:00.160
-the translation in your language
+and if you would like to contribute to the translation in your language

0:57:57.359,0:58:01.839
-please let me know so here as you can
-
-0:58:00.160,0:58:05.359
-see we have the
+please let me know so here as you can see we have the

0:58:01.839,0:58:07.920
-write up where we can see all these
-
-0:58:05.359,0:58:10.319
-video that has been transcribed here in
+write-up where we can see all these videos that have been transcribed here in

0:58:07.920,0:58:13.040
-plain english
-
-0:58:10.319,0:58:14.240
-and then again as i said before if we go
+plain english and then again as i said before if we go

0:58:13.040,0:58:16.240
-back to the homepage
-
-0:58:14.240,0:58:18.720
-we can see here in the english flag and
+back to the homepage we can see here the english flag and

0:58:16.240,0:58:20.799
-we can select different languages
-
-0:58:18.720,0:58:22.079
-now we have arabic spanish version
+we can select different languages now we have arabic spanish version

0:58:20.799,0:58:24.960
-french italian japanese
-
-0:58:22.079,0:58:25.920
-korean russian turkish and chinese and
+french italian japanese korean russian turkish and chinese and

0:58:24.960,0:58:28.640
-your language is just
-
-0:58:25.920,0:58:29.359
-waiting for you to be translated in
+your language is just waiting for you to be translated in

0:58:28.640,0:58:31.599
-finally
-
-0:58:29.359,0:58:32.960
-do play with notebook and by torch in
+finally do play with the notebook and pytorch in

0:58:31.599,0:58:35.520
-order to get yourself
-
-0:58:32.960,0:58:36.640
-more acquainted with all these new
+order to get yourself more acquainted with all these new

0:58:35.520,0:58:38.880
-topics
-
-0:58:36.640,0:58:41.040
-and then if you find any typo or
+topics and then if 
you find any typo or 0:58:38.880,0:58:41.520 -mistakes or anything just please let me - -0:58:41.040,0:58:43.760 -know +mistakes or anything just please let me know 0:58:41.520,0:58:46.079 -directly on github or if you feel brave - -0:58:43.760,0:58:48.960 -enough you can even send a pull request +directly on github or if you feel brave enough you can even send a pull request 0:58:46.079,0:58:51.280 -it will be gladly appreciated thank you - -0:58:48.960,0:58:54.240 -for listening and don't forget to like +it will be gladly appreciated thank you for listening and don't forget to like 0:58:51.280,0:58:54.240 -share and subscribe - -0:58:55.000,0:58:58.000 -bye - +share and subscribe bye
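The free-energy discussion in the Q&A above can be sketched numerically: the free energy is the softmin of the energies over the latent, at zero temperature (large beta) it reduces to the squared distance to the closest point of the ellipse, at infinite temperature (beta equal to zero) it reduces to the plain average of the energies (the MSE-style collapse to the mean), and the weights in the gradient are the softargmax of the negated energies. This is an illustrative sketch written for this transcript, not the notebook used in the lecture; the ellipse semi-axes (1.5 and 0.5), the grid size, and all names here are assumptions.

```python
import math

# Hedged sketch (not the lecture notebook): latent-variable EBM on the toy
# ellipse, with assumed semi-axes 1.5 and 0.5 and a uniform grid over z.

def energy(y, z):
    # E(y, z): squared distance from y to the decoder output g(z) on the ellipse
    gz = (1.5 * math.cos(z), 0.5 * math.sin(z))
    return (y[0] - gz[0]) ** 2 + (y[1] - gz[1]) ** 2

def free_energy(y, beta, n=360):
    # F_beta(y) = -1/beta * log( mean_z exp(-beta * E(y, z)) )  (a softmin)
    zs = [2 * math.pi * k / n for k in range(n)]
    es = [energy(y, z) for z in zs]
    if beta == 0:
        return sum(es) / n            # beta -> 0 limit: plain average of energies
    m = min(es)                        # shift by the minimum for numerical stability
    s = sum(math.exp(-beta * (e - m)) for e in es) / n
    return m - math.log(s) / beta      # beta -> inf: approaches min(es)

def softargmax_weights(y, beta, n=360):
    # exp(-beta * E) / sum exp(-beta * E): the weights in the gradient of F_beta
    zs = [2 * math.pi * k / n for k in range(n)]
    es = [energy(y, z) for z in zs]
    m = min(es)
    ws = [math.exp(-beta * (e - m)) for e in es]
    t = sum(ws)
    return [w / t for w in ws]

y = (2.0, 0.0)
print(free_energy(y, 0))        # average energy, approximately 5.25
print(free_energy(y, 1000.0))   # approximately the minimum energy (2 - 1.5)^2 = 0.25
print(max(softargmax_weights(y, 1000.0)))  # mass concentrates on the closest z
```

At low beta every z contributes almost equally, so the model is pulled toward the mean of the targets (the collapse to the origin described above), while at high beta the weight of the closest point dominates, matching the picture of the arrows being rescaled by the softargmax.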