diff --git a/docs/en/week14/practicum14.sbv b/docs/en/week14/practicum14.sbv index 179935366..cb8a594c5 100644 --- a/docs/en/week14/practicum14.sbv +++ b/docs/en/week14/practicum14.sbv @@ -1,5679 +1,2837 @@ 0:00:00.320,0:00:07.759 -so today last lesson um - -0:00:04.160,0:00:09.599 -yeah i'm smiling but i'm sad uh +so today last lesson um yeah i'm smiling but i'm sad uh 0:00:07.759,0:00:12.559 -i wanted to talk about energy-based - -0:00:09.599,0:00:14.400 -models and how to train them +i wanted to talk about energy-based models and how to train them 0:00:12.559,0:00:16.080 -but i think i need to prepare like for a - -0:00:14.400,0:00:18.960 -month before that +but i think i need to prepare like for a month before that 0:00:16.080,0:00:20.640 -so actually uh if you are still - -0:00:18.960,0:00:22.000 -interested in this summer you will be +so actually uh if you are still interested in this summer you will be 0:00:20.640,0:00:24.720 -able to - -0:00:22.000,0:00:25.760 -get a tutorial on energy-based models uh +able to get a tutorial on energy-based models uh 0:00:24.720,0:00:28.880 -we are writing a paper - -0:00:25.760,0:00:30.560 -with jan together and so we actually i'm +we are writing a paper with jan together and so we actually i'm 0:00:28.880,0:00:32.559 -planning to get this paper written as - -0:00:30.560,0:00:33.600 -like part it's going to be math and then +planning to get this paper written as like part it's going to be math and then 0:00:32.559,0:00:35.040 -part is going to be actually the - -0:00:33.600,0:00:39.120 -implementation +part is going to be actually the implementation 0:00:35.040,0:00:41.840 -such that you can actually execute uh - -0:00:39.120,0:00:42.559 -the paper basically and you can get you +such that you can actually execute uh the paper basically and you can get you 0:00:41.840,0:00:44.399 -know - -0:00:42.559,0:00:47.600 -a better understanding of what's going +know a better understanding of what's going 0:00:44.399,0:00:50.480 -on 
um - -0:00:47.600,0:00:51.680 -yeah so yeah that's gonna come out maybe +on um yeah so yeah that's gonna come out maybe 0:00:50.480,0:00:54.480 -in a month uh - -0:00:51.680,0:00:56.079 -we we i have to do pretty a pretty good +in a month uh we we i have to do pretty a pretty good 0:00:54.480,0:00:59.440 -job there - -0:00:56.079,0:01:01.199 -um so and maybe uh if the +job there um so and maybe uh if the 0:00:59.440,0:01:02.960 -maybe we can even have a additional - -0:01:01.199,0:01:05.040 -class later on +maybe we can even have a additional class later on 0:01:02.960,0:01:06.799 -if you're interested and you know i'm - -0:01:05.040,0:01:10.000 -always here uh +if you're interested and you know i'm always here uh 0:01:06.799,0:01:11.280 -up for uh teaching you so again if - -0:01:10.000,0:01:12.240 -you're interested in this energy-based +up for uh teaching you so again if you're interested in this energy-based 0:01:11.280,0:01:14.400 -model later on - -0:01:12.240,0:01:15.439 -like outside the course and whatever we +model later on like outside the course and whatever we 0:01:14.400,0:01:18.640 -can again meet - -0:01:15.439,0:01:19.360 -and uh record and and pretend it's +can again meet and uh record and and pretend it's 0:01:18.640,0:01:22.560 -actually - -0:01:19.360,0:01:24.400 -one more class okay so yeah i i didn't +actually one more class okay so yeah i i didn't 0:01:22.560,0:01:26.560 -manage to do it for today - -0:01:24.400,0:01:28.240 -so today we're going to be covering um +manage to do it for today so today we're going to be covering um 0:01:26.560,0:01:31.520 -if i get to finish two - -0:01:28.240,0:01:35.280 -topics um we never +if i get to finish two topics um we never 0:01:31.520,0:01:36.960 -talked about them uh too much before uh - -0:01:35.280,0:01:38.880 -because they are more machine learning +talked about them uh too much before uh because they are more machine learning 0:01:36.960,0:01:41.119 -related but nevertheless - 
-0:01:38.880,0:01:42.240 -we care also in deep learning and the +related but nevertheless we care also in deep learning and the 0:01:41.119,0:01:45.360 -topic of today - -0:01:42.240,0:01:47.200 -is regularization overfitting and +topic of today is regularization overfitting and 0:01:45.360,0:01:50.479 -regularization let me start - -0:01:47.200,0:01:53.600 -sharing the screen so again this is my +regularization let me start sharing the screen so again this is my 0:01:50.479,0:01:56.000 -as usual perspective of - -0:01:53.600,0:01:57.280 -the topic uh it's not usually the +as usual perspective of the topic uh it's not usually the 0:01:56.000,0:02:00.320 -mainstream but you know - -0:01:57.280,0:02:02.560 -it's what you get since it's my +mainstream but you know it's what you get since it's my 0:02:00.320,0:02:04.079 -view and i'm your educator your - -0:02:02.560,0:02:06.479 -instructor today +view and i'm your educator your instructor today 0:02:04.079,0:02:08.879 -so overfitting and regularization - -0:02:06.479,0:02:11.039 -connection between them right so +so overfitting and regularization connection between them right so 0:02:08.879,0:02:12.160 -those are two different topics those are - -0:02:11.039,0:02:12.560 -two different things but they are of +those are two different topics those are two different things but they are of 0:02:12.160,0:02:16.480 -course - -0:02:12.560,0:02:16.959 -connected so i start with this drawing +course connected so i start with this drawing 0:02:16.480,0:02:19.360 -here - -0:02:16.959,0:02:20.720 -uh someone told me it's not intuitive +here uh someone told me it's not intuitive 0:02:19.360,0:02:23.760 -but again - -0:02:20.720,0:02:27.040 -for me so there you get it +but again for me so there you get it 0:02:23.760,0:02:30.000 -uh here i'm showing you in the - -0:02:27.040,0:02:30.720 -uh with the pink box the data complexity +uh here i'm showing you in the uh with the pink box the data complexity 0:02:30.000,0:02:35.200 -okay so 
- -0:02:30.720,0:02:37.920 -those dots are sampled from my +okay so those dots are sampled from my 0:02:35.200,0:02:38.879 -samples from from my training data set - -0:02:37.920,0:02:41.280 -and +samples from from my training data set and 0:02:38.879,0:02:42.800 -then i tried to fit their three - -0:02:41.280,0:02:46.800 -different models okay +then i tried to fit their three different models okay 0:02:42.800,0:02:50.160 -so in the first case that is - -0:02:46.800,0:02:52.480 -basically the model complexity is below +so in the first case that is basically the model complexity is below 0:02:50.160,0:02:54.480 -is under is - -0:02:52.480,0:02:55.519 -you know it's smaller than the data +is under is you know it's smaller than the data 0:02:54.480,0:02:57.599 -complexity - -0:02:55.519,0:02:59.120 -and therefore you have some phenomenon +complexity and therefore you have some phenomenon 0:02:57.599,0:03:00.239 -called under fitting right because you - -0:02:59.120,0:03:02.879 -try to fit +called under fitting right because you try to fit 0:03:00.239,0:03:06.480 -uh what looks like a parabola with a - -0:03:02.879,0:03:08.400 -straight line and therefore you're +uh what looks like a parabola with a straight line and therefore you're 0:03:06.480,0:03:09.920 -not you're not going you're not doing a - -0:03:08.400,0:03:12.560 -good job right +not you're not going you're not doing a good job right 0:03:09.920,0:03:14.239 -then what happened next here we actually - -0:03:12.560,0:03:17.040 -have the right fitting +then what happened next here we actually have the right fitting 0:03:14.239,0:03:17.599 -in this case the model complexity - -0:03:17.040,0:03:21.120 -matches +in this case the model complexity matches 0:03:17.599,0:03:22.560 -the data complexity right um and so - -0:03:21.120,0:03:24.480 -in this case what's the difference with +the data complexity right um and so in this case what's the difference with 0:03:22.560,0:03:27.280 -the previous case uh - 
-0:03:24.480,0:03:28.000 -in this case you have zero error right +the previous case uh in this case you have zero error right 0:03:27.280,0:03:31.280 -so your - -0:03:28.000,0:03:32.000 -your model exactly matches the training +so your your model exactly matches the training 0:03:31.280,0:03:36.239 -points those - -0:03:32.000,0:03:36.560 -points finally we have overfitting where +points those points finally we have overfitting where 0:03:36.239,0:03:39.920 -the - -0:03:36.560,0:03:43.200 -model complexity is actually +the model complexity is actually 0:03:39.920,0:03:44.799 -greater than the data complexity in this - -0:03:43.200,0:03:48.159 -case +greater than the data complexity in this case 0:03:44.799,0:03:51.519 -the model doesn't choose a parabola - -0:03:48.159,0:03:54.959 -because why question for your +the model doesn't choose a parabola because why question for your 0:03:51.519,0:03:58.239 -audience live my own live audience why - -0:03:54.959,0:04:00.400 -is this model like +audience live my own live audience why is this model like 0:03:58.239,0:04:03.040 -wiggly in this case why is not a - -0:04:00.400,0:04:05.200 -parabola +wiggly in this case why is not a parabola 0:04:03.040,0:04:06.720 -and you're supposed to type in the chat - -0:04:05.200,0:04:08.879 -because otherwise +and you're supposed to type in the chat because otherwise 0:04:06.720,0:04:10.080 -i don't know if you're following so my - -0:04:08.879,0:04:12.959 -question is in +i don't know if you're following so my question is in 0:04:10.080,0:04:15.280 -the last case my data my model - -0:04:12.959,0:04:19.280 -complexity is superior than the +the last case my data my model complexity is superior than the 0:04:15.280,0:04:21.120 -it's larger than the data complexity and - -0:04:19.280,0:04:22.800 -although those points look like they +it's larger than the data complexity and although those points look like they 0:04:21.120,0:04:26.320 -belong to a parabola - -0:04:22.800,0:04:28.560 -my 
model decides to get that spiky guy +belong to a parabola my model decides to get that spiky guy 0:04:26.320,0:04:30.479 -like spiky peak on the left and you know - -0:04:28.560,0:04:35.759 -some weird stuff +like spiky peak on the left and you know some weird stuff 0:04:30.479,0:04:37.440 -model doesn't learn but memorizes um - -0:04:35.759,0:04:39.759 -overfitting but sure sure it's written +model doesn't learn but memorizes um overfitting but sure sure it's written 0:04:37.440,0:04:41.600 -they're overfitting but why - -0:04:39.759,0:04:44.240 -if those points are coming from a +they're overfitting but why if those points are coming from a 0:04:41.600,0:04:45.600 -parabola i would expect even a very - -0:04:44.240,0:04:47.520 -larger model +parabola i would expect even a very larger model 0:04:45.600,0:04:48.720 -would make like a very nice parabola - -0:04:47.520,0:04:51.440 -right +would make like a very nice parabola right 0:04:48.720,0:04:52.240 -you're privately writing to me don't - -0:04:51.440,0:04:56.479 -private +you're privately writing to me don't private 0:04:52.240,0:04:59.600 -private right um - -0:04:56.479,0:05:02.160 -so if +private right um so if 0:04:59.600,0:05:04.000 -and this is a big if right if my points - -0:05:02.160,0:05:06.800 -my training points come from +and this is a big if right if my points my training points come from 0:05:04.000,0:05:09.520 -an actual parabola even the overfitting - -0:05:06.800,0:05:12.240 -model would be making a perfect parabola +an actual parabola even the overfitting model would be making a perfect parabola 0:05:09.520,0:05:12.720 -the point here is that there is some - -0:05:12.240,0:05:14.800 -noise +the point here is that there is some noise 0:05:12.720,0:05:17.680 -right there is always some noise and - -0:05:14.800,0:05:20.960 -therefore the model that perfectly +right there is always some noise and therefore the model that perfectly 0:05:17.680,0:05:23.280 -goes through every training point - 
-0:05:20.960,0:05:24.400 -will be like that it's going to be like +goes through every training point will be like that it's going to be like 0:05:23.280,0:05:26.960 -crazy because - -0:05:24.400,0:05:28.479 -all those points don't exactly live on +crazy because all those points don't exactly live on 0:05:26.960,0:05:31.360 -the parabola but they are - -0:05:28.479,0:05:32.479 -slightly offset and in order to be +the parabola but they are slightly offset and in order to be 0:05:31.360,0:05:34.960 -perfectly - -0:05:32.479,0:05:36.400 -uh going through them you're gonna have +perfectly uh going through them you're gonna have 0:05:34.960,0:05:37.120 -you know the mother is gonna have to try - -0:05:36.400,0:05:39.680 -to +you know the mother is gonna have to try to 0:05:37.120,0:05:40.960 -come up with some funky function okay - -0:05:39.680,0:05:44.080 -does it make sense +come up with some funky function okay does it make sense 0:05:40.960,0:05:47.120 -so the point is that without noise - -0:05:44.080,0:05:49.759 -this would be just a perfect parabola +so the point is that without noise this would be just a perfect parabola 0:05:47.120,0:05:53.840 -so someone would say okay maybe we - -0:05:49.759,0:05:53.840 -should use the right fitting right +so someone would say okay maybe we should use the right fitting right 0:05:54.080,0:05:57.759 -in machine learning maybe we are doing - -0:05:56.319,0:06:00.479 -deep learning +in machine learning maybe we are doing deep learning 0:05:57.759,0:06:01.840 -and it's not quite the case right - -0:06:00.479,0:06:03.759 -fitting +and it's not quite the case right fitting 0:06:01.840,0:06:05.199 -it's it's definitely not the case - -0:06:03.759,0:06:08.639 -actually our models +it's it's definitely not the case actually our models 0:06:05.199,0:06:10.960 -are so so so so powerful that they - -0:06:08.639,0:06:12.000 -even managed to learn noise like there +are so so so so powerful that they even managed to learn noise like there 
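[Editor's note: the noisy-parabola story above can be sketched numerically. This is an illustrative reconstruction, not from the lecture: the same noisy samples of a parabola are fitted with an underpowered, a matched, and an over-parameterized polynomial. All sizes and the noise scale are made up for the sketch.]

```python
import numpy as np

rng = np.random.default_rng(0)

# 11 training points sampled from a parabola, plus a little noise
x = np.linspace(-1, 1, 11)
y = x**2 + rng.normal(scale=0.05, size=x.shape)

def train_error(degree):
    """Mean squared training error of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

err_under = train_error(1)   # a straight line cannot follow the parabola: underfitting
err_right = train_error(2)   # matches the data-generating parabola, but not the noise
err_over  = train_error(10)  # 11 points, degree 10: interpolates every point, noise included
```

The degree-10 fit drives the training error to essentially zero precisely because it bends through the noisy offsets — the "funky function" from the lecture — while the degree-2 fit keeps a small residual equal to the noise it refuses to memorize.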
0:06:10.960,0:06:14.319 -was a paper there - -0:06:12.000,0:06:15.520 -where they were showing that you can +was a paper there where they were showing that you can 0:06:14.319,0:06:18.720 -label imagenet - -0:06:15.520,0:06:19.280 -with random labels you can get a network +label imagenet with random labels you can get a network 0:06:18.720,0:06:21.759 -to - -0:06:19.280,0:06:22.319 -you know perfectly memorize every label +to you know perfectly memorize every label 0:06:21.759,0:06:25.039 -uh - -0:06:22.319,0:06:25.759 -for each of these samples so you can +uh for each of these samples so you can 0:06:25.039,0:06:28.080 -clearly - -0:06:25.759,0:06:30.400 -tell that these the models we are using +clearly tell that these the models we are using 0:06:28.080,0:06:32.080 -are absolutely over parameterized and - -0:06:30.400,0:06:35.520 -therefore means that +are absolutely over parameterized and therefore means that 0:06:32.080,0:06:38.720 -you have way more power than - -0:06:35.520,0:06:40.240 -the uh you know then then it's necessary +you have way more power than the uh you know then then it's necessary 0:06:38.720,0:06:44.080 -in order to learn - -0:06:40.240,0:06:47.759 -the structure of the data nevertheless +in order to learn the structure of the data nevertheless 0:06:44.080,0:06:50.560 -we actually need that hmm - -0:06:47.759,0:06:52.960 -so let's figure out what's going on okay +we actually need that hmm so let's figure out what's going on okay 0:06:50.560,0:06:52.960 -um - -0:06:53.199,0:06:57.120 -oh actually maybe you know the answer +um oh actually maybe you know the answer 0:06:54.639,0:06:58.960 -right so what is the point - -0:06:57.120,0:07:00.400 -why do we want to go in very very high +right so what is the point why do we want to go in very very high 0:06:58.960,0:07:03.840 -dimensional space - -0:07:00.400,0:07:06.479 -i told you a few times right because +dimensional space i told you a few times right because 0:07:03.840,0:07:06.479 -who answers 
- -0:07:07.199,0:07:14.960 -come on it's the last class answer me +who answers come on it's the last class answer me 0:07:12.000,0:07:15.199 -why do we want to go in very to expand - -0:07:14.960,0:07:17.680 -the +why do we want to go in very to expand the 0:07:15.199,0:07:19.039 -the data distribution yeah optimization - -0:07:17.680,0:07:20.960 -is easier yeah fantastic +the data distribution yeah optimization is easier yeah fantastic 0:07:19.039,0:07:23.199 -that's the point right whenever we go in - -0:07:20.960,0:07:25.120 -a hype over parameterized space +that's the point right whenever we go in a hype over parameterized space 0:07:23.199,0:07:27.280 -everything is very easy to move around - -0:07:25.120,0:07:29.039 -right and therefore we always +everything is very easy to move around right and therefore we always 0:07:27.280,0:07:30.319 -would like to put ourselves in the - -0:07:29.039,0:07:32.400 -overfitting +would like to put ourselves in the overfitting 0:07:30.319,0:07:35.120 -scenarios with our networks because it's - -0:07:32.400,0:07:37.919 -the training is going to be easier +scenarios with our networks because it's the training is going to be easier 0:07:35.120,0:07:38.960 -nevertheless what's the problem now well - -0:07:37.919,0:07:41.280 -the problem is they +nevertheless what's the problem now well the problem is they 0:07:38.960,0:07:42.800 -they're going to be like they wiggle - -0:07:41.280,0:07:46.080 -like crazy +they're going to be like they wiggle like crazy 0:07:42.800,0:07:48.400 -um another another thing um - -0:07:46.080,0:07:49.199 -so this is point number one point number +um another another thing um so this is point number one point number 0:07:48.400,0:07:52.400 -two - -0:07:49.199,0:07:55.680 -why would you think you actually +two why would you think you actually 0:07:52.400,0:07:59.840 -have to overfit - -0:07:55.680,0:07:59.840 -when writing your script +have to overfit when writing your script 0:08:01.199,0:08:05.840 
-second question i know interactive - -0:08:04.400,0:08:09.520 -question today +second question i know interactive question today 0:08:05.840,0:08:13.840 -actually sure there is some trend - -0:08:09.520,0:08:15.120 -you can model okay maybe +actually sure there is some trend you can model okay maybe 0:08:13.840,0:08:17.919 -maybe it's in the right direction but - -0:08:15.120,0:08:21.199 -it's too complicated as an answer +maybe it's in the right direction but it's too complicated as an answer 0:08:17.919,0:08:23.199 -um so are you experts - -0:08:21.199,0:08:24.879 -you're a network trainer you should be +um so are you experts you're a network trainer you should be 0:08:23.199,0:08:28.560 -right because you've been - -0:08:24.879,0:08:31.680 -following these lessons for a bit but um +right because you've been following these lessons for a bit but um 0:08:28.560,0:08:32.479 -at the beginning okay try to answer this - -0:08:31.680,0:08:36.159 -question +at the beginning okay try to answer this question 0:08:32.479,0:08:36.159 -so why would you like to overfit - -0:08:36.479,0:08:42.159 -i even tell you one one bit more i would +so why would you like to overfit i even tell you one one bit more i would 0:08:39.760,0:08:43.039 -always i do always start training my - -0:08:42.159,0:08:45.680 -network on +always i do always start training my network on 0:08:43.039,0:08:45.680 -one batch - -0:08:46.880,0:08:50.959 -if the model has capabilities so this is +one batch if the model has capabilities so this is 0:08:49.519,0:08:54.399 -the number one rule - -0:08:50.959,0:08:56.640 -to debug machine learning code okay +the number one rule to debug machine learning code okay 0:08:54.399,0:08:57.920 -you would like to see whether you [ __ ] - -0:08:56.640,0:09:00.959 -up in your +you would like to see whether you [ __ ] up in your 0:08:57.920,0:09:03.440 -model creation okay so first thing - -0:09:00.959,0:09:04.480 -you can just get a batch of the correct +model creation okay 
so first thing you can just get a batch of the correct 0:09:03.440,0:09:07.120 -size - -0:09:04.480,0:09:07.600 -uh even with random noise right even you +size uh even with random noise right even you 0:09:07.120,0:09:10.080 -know - -0:09:07.600,0:09:11.440 -torch dot trend something with random +know torch dot trend something with random 0:09:10.080,0:09:13.440 -labels - -0:09:11.440,0:09:15.360 -and then you would like to go over a few +labels and then you would like to go over a few 0:09:13.440,0:09:17.120 -epochs with one batch - -0:09:15.360,0:09:19.200 -with random crap which could be the +epochs with one batch with random crap which could be the 0:09:17.120,0:09:21.680 -first batch of your data set or whatever - -0:09:19.200,0:09:22.320 -just to prove that your model can learn +first batch of your data set or whatever just to prove that your model can learn 0:09:21.680,0:09:25.279 -okay - -0:09:22.320,0:09:25.760 -you can easily make some tiny mistakes +okay you can easily make some tiny mistakes 0:09:25.279,0:09:28.080 -uh - -0:09:25.760,0:09:29.360 -like i made a few times like doing the +uh like i made a few times like doing the 0:09:28.080,0:09:34.560 -zero - -0:09:29.360,0:09:34.560 -zero grad uh after the backward +zero zero grad uh after the backward 0:09:35.279,0:09:39.200 -yeah i know it happens and nothing - -0:09:37.519,0:09:40.800 -happens nothing learns okay so you +yeah i know it happens and nothing happens nothing learns okay so you 0:09:39.200,0:09:43.440 -always want to see - -0:09:40.800,0:09:44.080 -that your model model can learn right +always want to see that your model model can learn right 0:09:43.440,0:09:46.000 -then if - -0:09:44.080,0:09:48.080 -you can memorize yeah fantastic we are +then if you can memorize yeah fantastic we are 0:09:46.000,0:09:50.880 -going to be now learning how to - -0:09:48.080,0:09:51.680 -uh improve performance of a model that +going to be now learning how to uh improve performance of a model that 
0:09:50.880,0:09:54.640 -memorizes - -0:09:51.680,0:09:55.920 -uh its own data okay so two reasons +memorizes uh its own data okay so two reasons 0:09:54.640,0:09:57.120 -right first one we said over - -0:09:55.920,0:10:00.000 -parameterize +right first one we said over parameterize 0:09:57.120,0:10:01.920 -uh models are easy to train because the - -0:10:00.000,0:10:04.959 -landscape is much smoother +uh models are easy to train because the landscape is much smoother 0:10:01.920,0:10:06.720 -us and you know if you have a - -0:10:04.959,0:10:08.399 -over parameterized model you're gonna +us and you know if you have a over parameterized model you're gonna 0:10:06.720,0:10:10.000 -have you can - -0:10:08.399,0:10:11.839 -uh ideally start with different +have you can uh ideally start with different 0:10:10.000,0:10:14.240 -initializations so you get initial - -0:10:11.839,0:10:15.360 -points in the parameter space and then +initializations so you get initial points in the parameter space and then 0:10:14.240,0:10:17.120 -whenever you train - -0:10:15.360,0:10:18.640 -these different models all of them will +whenever you train these different models all of them will 0:10:17.120,0:10:21.760 -converge to a different - -0:10:18.640,0:10:25.120 -position because you can +converge to a different position because you can 0:10:21.760,0:10:28.000 -think about like a same model you can - -0:10:25.120,0:10:29.360 -permute all the weights you're gonna get +think about like a same model you can permute all the weights you're gonna get 0:10:28.000,0:10:30.000 -i mean you permeate the weights per - -0:10:29.360,0:10:33.040 -layer +i mean you permeate the weights per layer 0:10:30.000,0:10:35.279 -you can still get the same uh model - -0:10:33.040,0:10:37.440 -at the end so they are comparable in +you can still get the same uh model at the end so they are comparable in 0:10:35.279,0:10:40.000 -terms of the function approximator - -0:10:37.440,0:10:41.680 -you are building nevertheless 
in the +terms of the function approximator you are building nevertheless in the 0:10:40.000,0:10:42.800 -parameter space they are not the same - -0:10:41.680,0:10:45.519 -right so in the function +parameter space they are not the same right so in the function 0:10:42.800,0:10:46.880 -space they are exactly equivalent models - -0:10:45.519,0:10:48.800 -in the parameter space they are +space they are exactly equivalent models in the parameter space they are 0:10:46.880,0:10:51.760 -absolutely different models - -0:10:48.800,0:10:53.519 -nevertheless they will converge to +absolutely different models nevertheless they will converge to 0:10:51.760,0:10:55.440 -equivalently - -0:10:53.519,0:10:58.560 -equivalent models as in they will +equivalently equivalent models as in they will 0:10:55.440,0:11:01.760 -perform equivalently equivalently - -0:10:58.560,0:11:05.040 -good right are you following right am i +perform equivalently equivalently good right are you following right am i 0:11:01.760,0:11:06.880 -talking about weird stuff today but uh - -0:11:05.040,0:11:08.079 -i guess this counts a bit also from +talking about weird stuff today but uh i guess this counts a bit also from 0:11:06.880,0:11:10.160 -joanne's class - -0:11:08.079,0:11:11.120 -where we talk about parameter space and +joanne's class where we talk about parameter space and 0:11:10.160,0:11:13.120 -functional - -0:11:11.120,0:11:14.880 -uh functional space it's so so cool in +functional uh functional space it's so so cool in 0:11:13.120,0:11:17.600 -that class i think next year - -0:11:14.880,0:11:18.240 -i will try to put it online as well okay +that class i think next year i will try to put it online as well okay 0:11:17.600,0:11:19.680 -okay so - -0:11:18.240,0:11:21.680 -first point over pardon me over +okay so first point over pardon me over 0:11:19.680,0:11:24.079 -parameterization helps with training - -0:11:21.680,0:11:25.120 -second point or over parameterization +parameterization helps with 
training second point or over parameterization 0:11:24.079,0:11:28.320 -helps you with - -0:11:25.120,0:11:31.279 -math debugging can you repeat the point +helps you with math debugging can you repeat the point 0:11:28.320,0:11:33.680 -about function and parameter space yeah - -0:11:31.279,0:11:35.279 -so if you have a neural net and you +about function and parameter space yeah so if you have a neural net and you 0:11:33.680,0:11:38.320 -permute the rows - -0:11:35.279,0:11:38.880 -in your matrices right and then you +permute the rows in your matrices right and then you 0:11:38.320,0:11:42.399 -permute - -0:11:38.880,0:11:46.000 -the uh the column of the +permute the uh the column of the 0:11:42.399,0:11:47.600 -uh the next layer you can basically - -0:11:46.000,0:11:48.800 -you know you can reorganize the weight +uh the next layer you can basically you know you can reorganize the weight 0:11:47.600,0:11:49.839 -so you can get always the same - -0:11:48.800,0:11:51.360 -performance right +so you can get always the same performance right 0:11:49.839,0:11:53.200 -so if you have the first matrix you have - -0:11:51.360,0:11:54.079 -first element of the hidden layer equal +so if you have the first matrix you have first element of the hidden layer equal 0:11:53.200,0:11:55.920 -some number - -0:11:54.079,0:11:58.240 -let's say the hidden layer has size of +some number let's say the hidden layer has size of 0:11:55.920,0:11:59.200 -two right so you have a matrix with two - -0:11:58.240,0:12:01.040 -rows +two right so you have a matrix with two rows 0:11:59.200,0:12:03.600 -and so you can swap the the rows you're - -0:12:01.040,0:12:06.320 -gonna get a hidden layer that is flipped +and so you can swap the the rows you're gonna get a hidden layer that is flipped 0:12:03.600,0:12:07.120 -and then the last the next next weight - -0:12:06.320,0:12:10.399 -matrix +and then the last the next next weight matrix 0:12:07.120,0:12:13.839 -you can flip the um - 
-0:12:10.399,0:12:17.519 -columns i guess uh and you would get +you can flip the um columns i guess uh and you would get 0:12:13.839,0:12:19.519 -exactly the same network the same - -0:12:17.519,0:12:20.959 -you would sorry you would get exactly +exactly the same network the same you would sorry you would get exactly 0:12:19.519,0:12:22.560 -the same function - -0:12:20.959,0:12:24.079 -it's gonna give you exactly the same +the same function it's gonna give you exactly the same 0:12:22.560,0:12:26.399 -number as an output - -0:12:24.079,0:12:27.519 -although the parameters the the +number as an output although the parameters the the 0:12:26.399,0:12:29.440 -parameters are actually - -0:12:27.519,0:12:30.720 -different right because you swap them so +parameters are actually different right because you swap them so 0:12:29.440,0:12:36.240 -the same parameter - -0:12:30.720,0:12:37.680 -w11 is going to be w21 right so they are +the same parameter w11 is going to be w21 right so they are 0:12:36.240,0:12:39.200 -different so in the parameter space - -0:12:37.680,0:12:41.360 -these are different models so there are +different so in the parameter space these are different models so there are 0:12:39.200,0:12:42.560 -one point is here in the parameter space - -0:12:41.360,0:12:44.720 -another point is here +one point is here in the parameter space another point is here 0:12:42.560,0:12:46.880 -nevertheless the mapping from the - -0:12:44.720,0:12:49.120 -parameter space to the functional space +nevertheless the mapping from the parameter space to the functional space 0:12:46.880,0:12:50.639 -both of them both these two initial - -0:12:49.120,0:12:53.120 -those two configuration +both of them both these two initial those two configuration 0:12:50.639,0:12:54.079 -will map to the same function right - -0:12:53.120,0:12:55.920 -because the +will map to the same function right because the 0:12:54.079,0:12:58.079 -function connects the input to the - -0:12:55.920,0:13:00.160 
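[Editor's note: the row/column permutation argument being made here can be checked numerically. A minimal sketch with a hidden layer of size two, as in the example — shapes and values are illustrative. Swapping the rows of the first weight matrix and the columns of the second gives a different point in parameter space but exactly the same function.]

```python
import numpy as np

rng = np.random.default_rng(0)

# two-layer net: h = relu(W1 @ x), out = W2 @ h, hidden size 2
W1 = rng.normal(size=(2, 3))   # rows of W1  <-> hidden units
W2 = rng.normal(size=(1, 2))   # columns of W2 <-> hidden units

def forward(A, B, v):
    h = np.maximum(A @ v, 0.0)  # relu is elementwise, so it commutes with permutations
    return B @ h

# permute the hidden units: swap the rows of W1 and the columns of W2
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
W1p = P @ W1        # swapped rows
W2p = W2 @ P.T      # swapped columns (P.T is P's inverse)

x_in = rng.normal(size=3)
# forward(W1, W2, x_in) and forward(W1p, W2p, x_in) agree exactly,
# even though (W1, W2) and (W1p, W2p) are different parameter vectors
```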
-output and they're going to be the same +function connects the input to the output and they're going to be the same 0:12:58.079,0:13:01.200 -even if you do this permutation of the - -0:13:00.160,0:13:05.440 -rows and then and +even if you do this permutation of the rows and then and 0:13:01.200,0:13:08.800 -of the columns right makes sense - -0:13:05.440,0:13:12.560 -so if if we +of the columns right makes sense so if if we 0:13:08.800,0:13:12.560 -if the space of parameters - -0:13:12.720,0:13:17.120 -if the space for parameter space is very +if the space of parameters if the space for parameter space is very 0:13:14.880,0:13:20.320 -big for a given data set can we say - -0:13:17.120,0:13:21.680 -that the model is very uncertain about +big for a given data set can we say that the model is very uncertain about 0:13:20.320,0:13:23.360 -its prediction okay we are going to be - -0:13:21.680,0:13:25.120 -talking about uncertainty in a bit so +its prediction okay we are going to be talking about uncertainty in a bit so 0:13:23.360,0:13:27.839 -i'll address that in a bit - -0:13:25.120,0:13:28.480 -all right so we always start with the +i'll address that in a bit all right so we always start with the 0:13:27.839,0:13:31.600 -third - -0:13:28.480,0:13:33.600 -uh column here with overfitting uh i +third uh column here with overfitting uh i 0:13:31.600,0:13:35.920 -always want to have a model that is over - -0:13:33.600,0:13:37.760 -parameterized because it's easy to learn +always want to have a model that is over parameterized because it's easy to learn 0:13:35.920,0:13:39.120 -and also it's going to be powerful in - -0:13:37.760,0:13:40.160 -terms +and also it's going to be powerful in terms 0:13:39.120,0:13:42.240 -in the sense that it's going to be - -0:13:40.160,0:13:45.600 -learning more than what we +in the sense that it's going to be learning more than what we 0:13:42.240,0:13:47.600 -expect um and so - -0:13:45.600,0:13:48.720 -how do we deal with these 
overfitting +expect um and so how do we deal with these overfitting 0:13:47.600,0:13:51.199 -how do we - -0:13:48.720,0:13:52.800 -improve now the validation or tasting +how do we improve now the validation or tasting 0:13:51.199,0:13:55.120 -performances right so - -0:13:52.800,0:13:56.560 -we we said that overfitting means uh we +performances right so we we said that overfitting means uh we 0:13:55.120,0:13:57.839 -didn't say we're gonna see that next - -0:13:56.560,0:13:59.920 -slide but +didn't say we're gonna see that next slide but 0:13:57.839,0:14:01.040 -here we see how to fight this kind of - -0:13:59.920,0:14:03.279 -you know overfitting +here we see how to fight this kind of you know overfitting 0:14:01.040,0:14:04.560 -so we start from the right hand side - -0:14:03.279,0:14:06.639 -where we introduce +so we start from the right hand side where we introduce 0:14:04.560,0:14:08.320 -this weak regularizer so there is no - -0:14:06.639,0:14:11.680 -regularization +this weak regularizer so there is no regularization 0:14:08.320,0:14:12.399 -therefore the last plot the sixth plot - -0:14:11.680,0:14:15.440 -here +therefore the last plot the sixth plot here 0:14:12.399,0:14:18.560 -is the same as my third plot okay - -0:14:15.440,0:14:21.600 -then i keep uh adding some +is the same as my third plot okay then i keep uh adding some 0:14:18.560,0:14:23.360 -medium regularizer and so i - -0:14:21.600,0:14:24.959 -i like to think about this as you know +medium regularizer and so i i like to think about this as you know 0:14:23.360,0:14:29.120 -smoothing edges right so my - -0:14:24.959,0:14:31.279 -square gets around edges +smoothing edges right so my square gets around edges 0:14:29.120,0:14:34.399 -and you can tell now that this second - -0:14:31.279,0:14:37.440 -plot here is different from my second +and you can tell now that this second plot here is different from my second 0:14:34.399,0:14:39.279 -window here right so the the - -0:14:37.440,0:14:41.120 
-medium regularization is different from +window here right so the the medium regularization is different from 0:14:39.279,0:14:43.199 -this just right fitting - -0:14:41.120,0:14:44.800 -as you can see there are some you know +this just right fitting as you can see there are some you know 0:14:43.199,0:14:48.079 -corners here - -0:14:44.800,0:14:48.959 -finally if you crank up this medicine +corners here finally if you crank up this medicine 0:14:48.079,0:14:50.480 -this kind of - -0:14:48.959,0:14:52.320 -you know it's like a drug you you're +this kind of you know it's like a drug you you're 0:14:50.480,0:14:55.519 -drugging you're hitting you're - -0:14:52.320,0:14:57.760 -poisoning your model for to restrict the +drugging you're hitting you're poisoning your model for to restrict the 0:14:55.519,0:14:59.199 -it's it's power then you get like a very - -0:14:57.760,0:15:01.440 -strong regularizer +it's it's power then you get like a very strong regularizer 0:14:59.199,0:15:02.720 -which gives you the the circular one - -0:15:01.440,0:15:05.440 -that's this this is my +which gives you the the circular one that's this this is my 0:15:02.720,0:15:06.480 -mental image anyhow we we gave you i - -0:15:05.440,0:15:08.480 -think i give you my +mental image anyhow we we gave you i think i give you my 0:15:06.480,0:15:10.000 -uh the big picture first and then let's - -0:15:08.480,0:15:13.040 -go on with the actual +uh the big picture first and then let's go on with the actual 0:15:10.000,0:15:13.920 -definitions right um so there are a few - -0:15:13.040,0:15:16.240 -definitions here +definitions right um so there are a few definitions here 0:15:13.920,0:15:18.079 -they are not quite equivalent but in - -0:15:16.240,0:15:21.440 -deep learning that's what we use +they are not quite equivalent but in deep learning that's what we use 0:15:18.079,0:15:24.160 -so here we go so the regularization - -0:15:21.440,0:15:25.920 -adds prior knowledge to a model a prior +so here we go 
so the regularization adds prior knowledge to a model a prior 0:15:24.160,0:15:27.120 -distribution is specified for the - -0:15:25.920,0:15:31.120 -parameters +distribution is specified for the parameters 0:15:27.120,0:15:34.000 -so we expect these parameters to be - -0:15:31.120,0:15:35.360 -coming from a specific distribution from +so we expect these parameters to be coming from a specific distribution from 0:15:34.000,0:15:39.040 -a specific - -0:15:35.360,0:15:41.279 -generation generating process okay +a specific generation generating process okay 0:15:39.040,0:15:43.759 -and then whenever we actually think - -0:15:41.279,0:15:46.959 -about regularization we can think about +and then whenever we actually think about regularization we can think about 0:15:43.759,0:15:49.120 -you know uh strongly assuming that these - -0:15:46.959,0:15:52.160 -parameters should be +you know uh strongly assuming that these parameters should be 0:15:49.120,0:15:54.560 -um coming from this specific - -0:15:52.160,0:15:56.639 -process that generates them okay so this +um coming from this specific process that generates them okay so this 0:15:54.560,0:15:58.639 -is talking about parameter space - -0:15:56.639,0:15:59.920 -then we can also talk about the +is talking about parameter space then we can also talk about the 0:15:58.639,0:16:02.240 -functional space - -0:15:59.920,0:16:04.959 -in this case we can be it can be seen a +functional space in this case we can be it can be seen a 0:16:02.240,0:16:08.000 -regularization is a restriction - -0:16:04.959,0:16:10.079 -of the set of possible learnable +regularization is a restriction of the set of possible learnable 0:16:08.000,0:16:11.920 -functions okay so these are again two - -0:16:10.079,0:16:14.720 -perspective one is on the weights +functions okay so these are again two perspective one is on the weights 0:16:11.920,0:16:15.199 -where how are supposed to be what kind - -0:16:14.720,0:16:17.040 -of +where how are supposed to be 
what kind of 0:16:15.199,0:16:18.720 -weights what kind of animals what kind - -0:16:17.040,0:16:20.800 -of objects +weights what kind of animals what kind of objects 0:16:18.720,0:16:23.680 -these weights are like they should be - -0:16:20.800,0:16:26.320 -somehow over some specific shape +these weights are like they should be somehow over some specific shape 0:16:23.680,0:16:27.759 -uh length or whatever structure there is - -0:16:26.320,0:16:30.800 -there is some structure that +uh length or whatever structure there is there is some structure that 0:16:27.759,0:16:32.959 -i assume uh in advance - -0:16:30.800,0:16:34.160 -that's the prior this means before in +i assume uh in advance that's the prior this means before in 0:16:32.959,0:16:36.240 -latin and in - -0:16:34.160,0:16:37.839 -others in another case instead if you +latin and in others in another case instead if you 0:16:36.240,0:16:39.920 -have all possible function - -0:16:37.839,0:16:41.519 -you'd like to find a restriction of +have all possible function you'd like to find a restriction of 0:16:39.920,0:16:45.279 -those possible functions - -0:16:41.519,0:16:47.759 -such that they are not too +those possible functions such that they are not too 0:16:45.279,0:16:48.399 -uh crazy okay they are not too extreme - -0:16:47.759,0:16:51.600 -as in +uh crazy okay they are not too extreme as in 0:16:48.399,0:16:54.480 -the way they behave ah - -0:16:51.600,0:16:55.600 -there's a question but in that image the +the way they behave ah there's a question but in that image the 0:16:54.480,0:16:58.959 -square is - -0:16:55.600,0:17:03.360 -still in the circle uh +square is still in the circle uh 0:16:58.959,0:17:05.760 -yeah i'm getting back - -0:17:03.360,0:17:06.400 -oh oh i see so maybe the circle should +yeah i'm getting back oh oh i see so maybe the circle should 0:17:05.760,0:17:09.520 -have been - -0:17:06.400,0:17:13.760 -smaller than the square okay +have been smaller than the square okay 
0:17:09.520,0:17:16.400 -right good point um okay cool cool - -0:17:13.760,0:17:18.319 -finally that's the last definition of +right good point um okay cool cool finally that's the last definition of 0:17:16.400,0:17:20.480 -regularization which is the real - -0:17:18.319,0:17:21.760 -real deep learning part which is the +regularization which is the real real deep learning part which is the 0:17:20.480,0:17:25.520 -following which is - -0:17:21.760,0:17:28.880 -yeah kind of not it's like you know +following which is yeah kind of not it's like you know 0:17:25.520,0:17:30.880 -as a stretch - -0:17:28.880,0:17:34.160 -okay my google thinks i'm talking +as a stretch okay my google thinks i'm talking 0:17:30.880,0:17:34.160 -italian what the heck - -0:17:34.400,0:17:42.000 -okay regularization is any modification +italian what the heck okay regularization is any modification 0:17:38.880,0:17:44.640 -we make to a learning algorithm that is - -0:17:42.000,0:17:46.640 -intended to reduce its generalization +we make to a learning algorithm that is intended to reduce its generalization 0:17:44.640,0:17:48.000 -error but not its training error okay so - -0:17:46.640,0:17:51.280 -this is actually +error but not its training error okay so this is actually 0:17:48.000,0:17:53.280 -a stretch because it's no longer - -0:17:51.280,0:17:54.400 -talking about prior knowledge and +a stretch because it's no longer talking about prior knowledge and 0:17:53.280,0:17:57.360 -functional space - -0:17:54.400,0:17:59.039 -but actually modification to learning +functional space but actually modification to learning 0:17:57.360,0:18:01.520 -algorithms so this is like - -0:17:59.039,0:18:02.640 -moving towards maybe programming you +algorithms so this is like moving towards maybe programming you 0:18:01.520,0:18:06.000 -know - -0:18:02.640,0:18:08.160 -so parameters function then it's like +know so parameters function then it's like 0:18:06.000,0:18:10.160 -algorithmic implementation right so 
- -0:18:08.160,0:18:13.360 -these are really three different +algorithmic implementation right so these are really three different 0:18:10.160,0:18:16.400 -perspective of the same thing - -0:18:13.360,0:18:18.640 -cool so first let's start with +perspective of the same thing cool so first let's start with 0:18:16.400,0:18:19.919 -regularizing regularizing techniques a - -0:18:18.640,0:18:22.240 -few examples +regularizing regularizing techniques a few examples 0:18:19.919,0:18:24.400 -so first actually i start with xavier - -0:18:22.240,0:18:27.280 -initialization i told you before that we +so first actually i start with xavier initialization i told you before that we 0:18:24.400,0:18:29.919 -can think about these parameters as - -0:18:27.280,0:18:31.039 -coming from some generation generating +can think about these parameters as coming from some generation generating 0:18:29.919,0:18:32.960 -process right - -0:18:31.039,0:18:34.320 -so whenever you initialize a network you +process right so whenever you initialize a network you 0:18:32.960,0:18:37.600 -can choose to - -0:18:34.320,0:18:41.120 -to you can choose to select one uh +can choose to to you can choose to select one uh 0:18:37.600,0:18:44.799 -regular um one prior right so these are - -0:18:41.120,0:18:46.559 -this is defining where your um +regular um one prior right so these are this is defining where your um 0:18:44.799,0:18:48.400 -your your weights are coming from so in - -0:18:46.559,0:18:51.039 -this case we can choose xavier normal +your your weights are coming from so in this case we can choose xavier normal 0:18:48.400,0:18:55.039 -which is a initialization technique - -0:18:51.039,0:18:56.960 -and this assumes this kind of gaussian +which is a initialization technique and this assumes this kind of gaussian 0:18:55.039,0:18:58.960 -gaussian distribution right so you have - -0:18:56.960,0:19:00.480 -the weight space by weight values and +gaussian distribution right so you have the weight space by 
weight values and 0:18:58.960,0:19:02.240 -you know the most of them will be peaked - -0:19:00.480,0:19:03.200 -towards the zero and then you have some +you know the most of them will be peaked towards the zero and then you have some 0:19:02.240,0:19:07.280 -kind of - -0:19:03.200,0:19:09.039 -um some some kind of +kind of um some some kind of 0:19:07.280,0:19:12.320 -standard deviation that is based on the - -0:19:09.039,0:19:15.440 -size of the input and output +standard deviation that is based on the size of the input and output 0:19:12.320,0:19:16.400 -size of that specific layer and so from - -0:19:15.440,0:19:18.799 -here we can +size of that specific layer and so from here we can 0:19:16.400,0:19:20.720 -start introducing the weight decay - -0:19:18.799,0:19:21.280 -weight decay is the first regularization +start introducing the weight decay weight decay is the first regularization 0:19:20.720,0:19:23.840 -technique - -0:19:21.280,0:19:24.400 -that is widespread in machine learning +technique that is widespread in machine learning 0:19:23.840,0:19:27.280 -not - -0:19:24.400,0:19:28.320 -be not maybe too much in neural nets +not be not maybe too much in neural nets 0:19:27.280,0:19:31.200 -still relevant - -0:19:28.320,0:19:32.720 -so weight decay uh you can find it in +still relevant so weight decay uh you can find it in 0:19:31.200,0:19:35.440 -directly inside the - -0:19:32.720,0:19:35.840 -optim package like you it's a flag in +directly inside the optim package like you it's a flag in 0:19:35.440,0:19:38.320 -the - -0:19:35.840,0:19:39.039 -the different in the different optimizer +the the different in the different optimizer 0:19:38.320,0:19:41.919 -is also called - -0:19:39.039,0:19:43.039 -l2 regularization ridge regression or +is also called l2 regularization ridge regression or 0:19:41.919,0:19:44.799 -gaussian prior - -0:19:43.039,0:19:47.280 -which basically tells you that things +gaussian prior which basically tells you that things 
0:19:44.799,0:19:48.799 -come from this gaussian process - -0:19:47.280,0:19:50.640 -or gaussian you know distribution +come from this gaussian process or gaussian you know distribution 0:19:48.799,0:19:53.200 -generating distribution - -0:19:50.640,0:19:54.559 -nevertheless we call it weight decay so +generating distribution nevertheless we call it weight decay so 0:19:53.200,0:19:57.840 -why do we call it weight decay - -0:19:54.559,0:19:59.360 -uh so this is first thing that you know +why do we call it weight decay uh so this is first thing that you know 0:19:57.840,0:19:59.679 -if you train neural net you're going to - -0:19:59.360,0:20:03.039 -call +if you train neural net you're going to call 0:19:59.679,0:20:05.600 -weight decay not the other things so - -0:20:03.039,0:20:07.520 -we can start with this j train that's +weight decay not the other things so we can start with this j train that's 0:20:05.600,0:20:10.559 -our objective - -0:20:07.520,0:20:12.320 -which is acting upon the parameters and +our objective which is acting upon the parameters and 0:20:10.559,0:20:14.080 -which is equal the old - -0:20:12.320,0:20:16.159 -training the the one without the +which is equal the old training the the one without the 0:20:14.080,0:20:20.240 -regularization - -0:20:16.159,0:20:22.400 -plus a penalty term like um like +regularization plus a penalty term like um like 0:20:20.240,0:20:23.440 -the following so we have the the square - -0:20:22.400,0:20:26.080 -norm +the following so we have the the square norm 0:20:23.440,0:20:27.679 -the square two norm right of these - -0:20:26.080,0:20:30.559 -parameters +the square two norm right of these parameters 0:20:27.679,0:20:30.880 -and so if you uh make the if you compute - -0:20:30.559,0:20:33.120 -the +and so if you uh make the if you compute the 0:20:30.880,0:20:34.320 -the gradient of course you're gonna get - -0:20:33.120,0:20:36.880 -just the +the gradient of course you're gonna get just the 0:20:34.320,0:20:39.760 
-uh lambda theta right because the two
-comes down simplifies you get that guy
+uh lambda theta right because the two comes down simplifies you get that guy

0:20:39.760,0:20:43.200
-so if you think about this um second
-equation
+so if you think about this um second equation

0:20:43.200,0:20:50.799
-what you see you say that the theta gets
-previous theta minus you know the
+what you see you say that the theta gets previous theta minus you know the

0:20:50.799,0:20:57.600
-the minus step so like
-minus a step towards uh the gradient
+the minus step so like minus a step towards uh the gradient

0:20:57.600,0:20:59.679
-some
-a step towards the opposite direction of
+some a step towards the opposite direction of

0:20:59.679,0:21:03.360
-the gradient such that you can go
-down the hill right in your training
+the gradient such that you can go down the hill right in your training

0:21:03.360,0:21:09.039
-laws
-minus some eta lambda which is a
+loss minus some eta lambda which is a

0:21:09.039,0:21:12.960
-a scalar multiplying by
+a scalar multiplying by

0:21:13.120,0:21:16.720
-theta right and that means that it's
+theta right and that means that it's

0:21:15.360,0:21:17.679
-going to be you know the first part
-there is you all go
+going to be you know the first part there is you all go

0:21:17.679,0:21:21.840
-down the hill whereas the other one
-tells you
+down the hill whereas the other one tells you

0:21:21.840,0:21:27.760
-and go also towards where
-zero right and so how does this how does
+and go also towards where zero right and so how does this how does

0:21:27.760,0:21:31.679
-this look
-so this looks like this right
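The update just described (one pull down the hill from the data term, one pull toward zero from the penalty) can be sketched in a few lines of plain Python; the learning rate, decay strength, and weights below are made-up values for illustration, not code from the lecture:

```python
# Weight decay sketch: theta <- theta - eta * (grad_J + lam * theta).
# Once training is done (grad_J = 0), each step just multiplies every
# component of theta by (1 - eta * lam), so the weights literally decay
# toward zero.

eta, lam = 0.1, 0.5        # learning rate and decay strength (made-up)
theta = [4.0, -2.0]        # pretend these are the trained weights

for _ in range(100):
    grad_J = [0.0, 0.0]    # the training-loss term is already at a minimum
    theta = [t - eta * (g + lam * t) for t, g in zip(theta, grad_J)]

print(theta)               # both components are now very close to zero
```

Each step shrinks the vector along the line connecting its head to the origin, which is exactly the vector field drawn on the slide.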
in every 0:21:31.679,0:21:35.600 -point so - -0:21:32.799,0:21:37.039 -consider we are already trained and the +point so consider we are already trained and the 0:21:35.600,0:21:39.280 -training loss is zero - -0:21:37.039,0:21:40.880 -and we just consider the second term +training loss is zero and we just consider the second term 0:21:39.280,0:21:44.320 -right so let's consider - -0:21:40.880,0:21:47.200 -we already finished training so +right so let's consider we already finished training so 0:21:44.320,0:21:49.919 -there is no there is not this term we - -0:21:47.200,0:21:53.600 -just have theta +there is no there is not this term we just have theta 0:21:49.919,0:21:56.640 -minus eta lambda theta what does it mean - -0:21:53.600,0:21:57.360 -so if there is no uh at the first term +minus eta lambda theta what does it mean so if there is no uh at the first term 0:21:56.640,0:22:00.240 -here - -0:21:57.360,0:22:01.520 -in any point you are so in theta you're +here in any point you are so in theta you're 0:22:00.240,0:22:04.640 -going to be subtracting - -0:22:01.520,0:22:07.679 -some multiplier some scalar you know i +going to be subtracting some multiplier some scalar you know i 0:22:04.640,0:22:08.400 -told you scalar a scalar is what scales - -0:22:07.679,0:22:12.080 -right +told you scalar a scalar is what scales right 0:22:08.400,0:22:14.400 -so this scalar scales this vector - -0:22:12.080,0:22:16.240 -probably by a factor that is slower than +so this scalar scales this vector probably by a factor that is slower than 0:22:14.400,0:22:18.559 -one and so if you're here - -0:22:16.240,0:22:19.520 -this one is going to take you down on +one and so if you're here this one is going to take you down on 0:22:18.559,0:22:21.520 -the point - -0:22:19.520,0:22:22.559 -that is connecting your head of the +the point that is connecting your head of the 0:22:21.520,0:22:27.840 -theta - -0:22:22.559,0:22:30.320 -towards zero right or this point here +theta towards zero right 
or this point here 0:22:27.840,0:22:31.600 -this is theta and then it takes you down - -0:22:30.320,0:22:33.520 -to zero okay +this is theta and then it takes you down to zero okay 0:22:31.600,0:22:35.760 -so if you don't have this term here and - -0:22:33.520,0:22:38.159 -you perform a few steps +so if you don't have this term here and you perform a few steps 0:22:35.760,0:22:39.039 -in this uh you know in this parameter - -0:22:38.159,0:22:41.440 -update +in this uh you know in this parameter update 0:22:39.039,0:22:43.120 -you're gonna get that the vector field - -0:22:41.440,0:22:45.200 -that you know results +you're gonna get that the vector field that you know results 0:22:43.120,0:22:47.200 -is something that attracts you towards - -0:22:45.200,0:22:49.039 -zero and that's why it's called weight +is something that attracts you towards zero and that's why it's called weight 0:22:47.200,0:22:54.720 -decay right so if you let it go - -0:22:49.039,0:22:57.200 -this stuff too it's gonna decay to zero +decay right so if you let it go this stuff too it's gonna decay to zero 0:22:54.720,0:22:57.919 -makes sense right so these are very cute - -0:22:57.200,0:23:01.919 -drawings +makes sense right so these are very cute drawings 0:22:57.919,0:23:04.159 -i think cool so - -0:23:01.919,0:23:06.159 -okay now you know about weight decay a +i think cool so okay now you know about weight decay a 0:23:04.159,0:23:08.960 -weight decay is also - -0:23:06.159,0:23:09.760 -we can think about this as adding a +weight decay is also we can think about this as adding a 0:23:08.960,0:23:11.679 -constraint - -0:23:09.760,0:23:13.760 -over the length of a vector so the +constraint over the length of a vector so the 0:23:11.679,0:23:17.039 -length of a vector is the you know - -0:23:13.760,0:23:20.159 -the the the euclidean norm +length of a vector is the you know the the the euclidean norm 0:23:17.039,0:23:21.600 -and so here we basically try to reduce - -0:23:20.159,0:23:24.480 -the 
length of this vector

0:23:21.600,0:23:26.080
-so weight decay is a way to reduce the
-length
+so weight decay is a way to reduce the length

0:23:26.080,0:23:33.840
-okay so l1
-what is this l1 so l1 can also be
+okay so l1 what is this l1 so l1 can also be

0:23:33.840,0:23:37.200
-used as a flag in the optimizer in torch
-it's also called lasso which is least
+used as a flag in the optimizer in torch it's also called lasso which is least

0:23:39.120,0:23:46.400
-absolute shrinking selector operator
-wow yeah statisticians whatever
+absolute shrinkage and selection operator wow yeah statisticians whatever

0:23:46.400,0:23:52.400
-it's also called a laplacian prior
-because it comes from a laplacian
+it's also called a laplacian prior because it comes from a laplacian

0:23:52.400,0:23:58.159
-probability distribution
-and then also it can be called as a
+probability distribution and then also it can be called as a

0:23:58.159,0:24:02.720
-sparsity prior why is that so this is
-this is pretty interesting so here in
+sparsity prior why is that so this is this is pretty interesting so here in

0:24:02.720,0:24:06.960
-the bottom part
-you can see there is the dashed line uh
+the bottom part you can see there is the dashed line uh

0:24:06.960,0:24:10.799
-represent
-my gaussian prior right and then here i
+represent my gaussian prior right and then here i

0:24:10.799,0:24:14.320
-just show you the laplace what's the
-difference with laplace laplace is the
+just show you the laplace what's the difference with laplace laplace is the

0:24:14.320,0:24:17.120
-same as gaussian so you have the
-exponential
+same as gaussian
so you have the exponential 0:24:17.120,0:24:20.799 -but instead of having the quadratic - -0:24:19.039,0:24:24.159 -square norm you have the +but instead of having the quadratic square norm you have the 0:24:20.799,0:24:27.200 -uh one norm okay and so the - -0:24:24.159,0:24:29.440 -the whereas the the the +uh one norm okay and so the the whereas the the the 0:24:27.200,0:24:31.039 -you know whereas the quadratic is very - -0:24:29.440,0:24:34.480 -shallow like it's very flat +you know whereas the quadratic is very shallow like it's very flat 0:24:31.039,0:24:36.720 -towards zero the the l1 is like a - -0:24:34.480,0:24:37.840 -it's a spiky right so that's why if you +towards zero the the l1 is like a it's a spiky right so that's why if you 0:24:36.720,0:24:39.679 -get the exponential - -0:24:37.840,0:24:41.200 -you get like you get a spike this is +get the exponential you get like you get a spike this is 0:24:39.679,0:24:42.720 -minus the - -0:24:41.200,0:24:44.320 -the absolute value right so you get a +minus the the absolute value right so you get a 0:24:42.720,0:24:47.279 -spike for the laplacian - -0:24:44.320,0:24:48.080 -or you get like a smooth for this square +spike for the laplacian or you get like a smooth for this square 0:24:47.279,0:24:50.000 -because you have the - -0:24:48.080,0:24:51.919 -parabola right which is smooth on the +because you have the parabola right which is smooth on the 0:24:50.000,0:24:55.360 -bottom part - -0:24:51.919,0:24:56.159 -okay so the point is that there is much +bottom part okay so the point is that there is much 0:24:55.360,0:24:59.440 -more mass - -0:24:56.159,0:25:02.400 -now in this region than +more mass now in this region than 0:24:59.440,0:25:04.320 -it was before right so this is pretty - -0:25:02.400,0:25:05.840 -this is like a spike there is much more +it was before right so this is pretty this is like a spike there is much more 0:25:04.320,0:25:08.080 -probability that you get something - 
-0:25:05.840,0:25:09.919
-towards zero nevertheless maybe this is
+probability that you get something towards zero nevertheless maybe this is

0:25:08.080,0:25:10.640
-not too clear as an explanation so i
-show you
+not too clear as an explanation so i show you

0:25:10.640,0:25:16.080
-the second diagram so in this case
-my training loss instead of being the
+the second diagram so in this case my training loss instead of being the

0:25:16.080,0:25:18.960
-all train loss i'm going to be summing
-lambda
+old train loss i'm going to be summing lambda

0:25:18.960,0:25:24.640
-the norm 1 of my theta okay
-therefore if you compute the gradient of
+the norm 1 of my theta okay therefore if you compute the gradient of

0:25:24.640,0:25:30.559
-the l1 what do you get
-l one is going to be
+the l1 what do you get l one is going to be

0:25:30.559,0:25:33.679
-just one right if you're positive or
-it's going to be
+just one right if you're positive or it's going to be

0:25:33.679,0:25:38.320
-minus one in the sine function yeah
-exactly
+minus one in the sign function yeah exactly

0:25:38.320,0:25:42.880
-so you get it it lambda sine function
-and so let's now think
+so you get it it lambda sign function and so let's now think

0:25:42.880,0:25:47.039
-the same way what happens uh if you
-already finished training you don't have
+the same way what happens uh if you already finished training you don't have

0:25:47.039,0:25:54.880
-this term over here and you just get
-theta minus eta lambda sine theta
+this term over here and you just get theta minus eta lambda sign theta

0:25:55.039,0:26:02.559
-so if you are on the
-on the x axis you know the the y
+so if you are on the on the x axis you
know the the y 0:26:02.559,0:26:07.120 -is completely doesn't have is is already - -0:26:04.880,0:26:08.960 -zero so you're going to get some +is completely doesn't have is is already zero so you're going to get some 0:26:07.120,0:26:10.480 -arrows bringing you in right so if - -0:26:08.960,0:26:11.279 -you're on the axis you're gonna get +arrows bringing you in right so if you're on the axis you're gonna get 0:26:10.480,0:26:14.559 -exactly as - -0:26:11.279,0:26:16.320 -l2 you're gonna go towards zero +exactly as l2 you're gonna go towards zero 0:26:14.559,0:26:18.320 -now what happened if you're in first - -0:26:16.320,0:26:21.919 -quadrant +now what happened if you're in first quadrant 0:26:18.320,0:26:24.799 -so in the first quadrant you get a sign - -0:26:21.919,0:26:26.240 -in both direction right scale by the +so in the first quadrant you get a sign in both direction right scale by the 0:26:24.799,0:26:30.240 -scalar factor there - -0:26:26.240,0:26:34.960 -and so it's going to be pointing down +scalar factor there and so it's going to be pointing down 0:26:30.240,0:26:38.080 -deeply so here i show you - -0:26:34.960,0:26:39.360 -the uh the the gray arrows here +deeply so here i show you the uh the the gray arrows here 0:26:38.080,0:26:41.679 -they're showing you the l2 - -0:26:39.360,0:26:45.039 -regularization which are taking you +they're showing you the l2 regularization which are taking you 0:26:41.679,0:26:47.600 -from the initial point towards zero - -0:26:45.039,0:26:48.559 -is proportional to this vector that is +from the initial point towards zero is proportional to this vector that is 0:26:47.600,0:26:52.159 -here - -0:26:48.559,0:26:54.000 -whereas the l1 which is going to be in a +here whereas the l1 which is going to be in a 0:26:52.159,0:26:57.520 -different color - -0:26:54.000,0:27:00.640 -and color green the l1 instead +different color and color green the l1 instead 0:26:57.520,0:27:03.679 -starting from here it takes you down - 
-0:27:00.640,0:27:04.240
-40 degrees here and then what happened
+starting from here it takes you down 45 degrees here and then what happened

0:27:03.679,0:27:07.200
-here
-well you just kill the y component right
+here well you just kill the y component right

0:27:07.200,0:27:12.159
-and so
-the l1 uh better feel
+and so the l1 uh vector field

0:27:12.159,0:27:17.760
-it will quickly kill
-components that are close to the axis
+it will quickly kill components that are close to the axis

0:27:17.760,0:27:20.559
-right so if you're kind of close to the
-axis this one bomb
+right so if you're kind of close to the axis this one boom

0:27:20.559,0:27:24.000
-takes you down to the axis in a view in
-a
+takes you down to the axis in a few in a

0:27:24.000,0:27:28.240
-few steps right and then if you still
-apply this one you're going to go down
+few steps right and then if you still apply this one you're going to go down

0:27:28.240,0:27:31.760
-the axis here right so this one allow
-you to
+the axis here right so this one allow you to

0:27:31.760,0:27:36.080
-quickly go down here and then if you
-still apply you can shrink the length
+quickly go down here and then if you still apply you can shrink the length

0:27:36.080,0:27:39.840
-but the point is that you're not looking
-at the
+but the point is that you're not looking at the

0:27:39.840,0:27:43.279
-length shrinking as in the
-in in the l2 right so l2 was just
+length shrinking as in the in in the l2 right so l2 was just

0:27:46.720,0:27:50.320
-shrinking the length of the vector
+shrinking the length of the vector

0:27:51.760,0:27:54.880
-in the l1 instead you actually gonna
+in the l1 instead you actually gonna

0:27:54.399,0:27:58.159
-kill
-0:27:54.880,0:28:01.600
-the components that are kind of cl
+kill the components that are kind of close

0:27:58.159,0:28:04.240
-near the axis okay so i think
-you can clearly now understand how this
+near the axis okay so i think you can clearly now understand how this

0:28:04.240,0:28:08.320
-works right so
-uh and this actually is quite relevant
+works right so uh and this actually is quite relevant

0:28:08.320,0:28:13.840
-for training
-let's say you know our regularized
+for training let's say you know our regularized

0:28:13.840,0:28:15.520
-regularized latent variable models
+regularized latent variable models

0:28:15.520,0:28:17.840
-because you can you know you can think
-about you know
+because you can you know you can think about you know

0:28:17.840,0:28:20.799
-a very quick way to regularize this this
+a very quick way to regularize this this

0:28:20.799,0:28:24.559
-latent virus is going to be just
-killing some of these components such
+latent variable is going to be just killing some of these components such

0:28:24.559,0:28:29.279
-that only the information is going to be
-restricted in a few of these
+that only the information is going to be restricted in a few of these

0:28:29.279,0:28:33.600
-values okay you like this stuff you like
-the drawings
+values okay you like this stuff you like the drawings

0:28:33.600,0:28:41.039
-no they're cute i think okay
-uh okay drop out right so we we talk i
+no they're cute i think okay uh okay dropout right so we we talk i

0:28:41.039,0:28:44.240
-think about dropout a few times but i
-never show you
+think about dropout a few times but i never show you

0:28:44.240,0:28:48.000
-the animation so
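The geometric picture just described (a fixed-size pull along each axis that kills small components first, rather than uniformly shrinking the vector) can be sketched in plain Python; the step size, strength, and starting point are made-up values for illustration, not the lecture's code:

```python
# L1 sketch: theta <- theta - eta * lam * sign(theta).
# Every component shrinks by the same fixed amount per step, so components
# that start close to an axis hit zero first -- that is the sparsity effect.

def sign(x):
    return (x > 0) - (x < 0)   # -1, 0, or +1

eta, lam = 0.1, 1.0
theta = [3.0, 0.25]            # one large component, one small one

for _ in range(5):
    theta = [t - eta * lam * sign(t) for t in theta]
    # clamp to zero once a component would cross the axis
    theta = [0.0 if abs(t) < eta * lam else t for t in theta]

print(theta)                   # small component exactly 0.0, large one ~2.5
```

After just two steps the small component is dead, while the large one has only lost a fixed 0.1 per step — unlike L2, which would have shrunk both proportionally.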
so boom okay so dropout what does this 0:28:51.520,0:28:55.360 -dropout - -0:28:52.240,0:28:56.320 -do so i can show you my ninja skills in +dropout do so i can show you my ninja skills in 0:28:55.360,0:28:59.600 -powerpoint - -0:28:56.320,0:29:02.399 -and we have an infinite loop animation +powerpoint and we have an infinite loop animation 0:28:59.600,0:29:04.080 -so the input in the pink is provided to - -0:29:02.399,0:29:06.559 -the network +so the input in the pink is provided to the network 0:29:04.080,0:29:07.600 -uh and then you have that these hidden - -0:29:06.559,0:29:11.679 -layers hidden +uh and then you have that these hidden layers hidden 0:29:07.600,0:29:13.760 -neurons are sometimes set to zero - -0:29:11.679,0:29:15.279 -in this case is you have a dropping rate +neurons are sometimes set to zero in this case is you have a dropping rate 0:29:13.760,0:29:17.440 -of 0.5 so - -0:29:15.279,0:29:18.799 -half of the neurons are gonna be turned +of 0.5 so half of the neurons are gonna be turned 0:29:17.440,0:29:22.640 -to zero - -0:29:18.799,0:29:25.760 -on uh randomly during the training +to zero on uh randomly during the training 0:29:22.640,0:29:26.080 -and so what happens here is that there - -0:29:25.760,0:29:28.640 -is +and so what happens here is that there is 0:29:26.080,0:29:29.679 -no more path between the input and the - -0:29:28.640,0:29:33.120 -output +no more path between the input and the output 0:29:29.679,0:29:36.480 -that is uh you know there is no - -0:29:33.120,0:29:37.600 -learning of a singular path for input to +that is uh you know there is no learning of a singular path for input to 0:29:36.480,0:29:39.679 -output so - -0:29:37.600,0:29:41.039 -every time if you want to try to +output so every time if you want to try to 0:29:39.679,0:29:43.520 -memorize one - -0:29:41.039,0:29:45.679 -specific input you can't because every +memorize one specific input you can't because every 0:29:43.520,0:29:49.279 -time you get a different network 
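The mechanics just described can be sketched in plain Python; the dropping rate, activations, and seed are made-up for illustration (this is not the lecture's code, and real training would use torch's built-in dropout):

```python
import random

# Dropout sketch: during training each hidden value is zeroed with
# probability p, so no fixed input-to-output path survives every forward
# pass; at inference nothing is dropped and the activations are scaled
# by (1 - p) so their expected magnitude matches what training saw.

def dropout(h, p, training):
    if training:
        return [0.0 if random.random() < p else x for x in h]
    return [(1.0 - p) * x for x in h]   # rescale instead of dropping

random.seed(0)                           # fixed seed for a reproducible demo
h = [1.0, 2.0, 3.0, 4.0]
print(dropout(h, p=0.5, training=True))  # a random subset of units is zeroed
print(dropout(h, p=0.5, training=False)) # [0.5, 1.0, 1.5, 2.0]
```

The inference-time scaling by (1 − p) is exactly the "half the neurons were doing the whole job" correction discussed below; the common inverted-dropout variant instead divides by (1 − p) during training so inference needs no scaling.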
- -0:29:45.679,0:29:53.520 -and so again this basically tell you +time you get a different network and so again this basically tells you 0:29:49.279,0:29:56.799 -uh oh scarf - -0:29:53.520,0:29:59.200 -okay so what happens here is that again +uh oh scarf okay so what happens here is that again 0:29:56.799,0:30:00.880 -before if we have like a fully connected - -0:29:59.200,0:30:03.600 -network like this +before if we have like a fully connected network like this 0:30:00.880,0:30:04.799 -you can think about oh i won't like to - -0:30:03.600,0:30:08.000 -memorize this +you can think about oh i would like to memorize this 0:30:04.799,0:30:10.720 -neuron uh going this path and - -0:30:08.000,0:30:12.000 -then here right so you can try to +neuron uh going this path and then here right so you can try to 0:30:10.720,0:30:15.600 -memorize - -0:30:12.000,0:30:17.520 -uh some specific you know sample you get +memorize uh some specific you know sample you get 0:30:15.600,0:30:19.760 -you can memorize a specific sample in - -0:30:17.520,0:30:20.799 -this case but again if you have the net +you can memorize a specific sample in this case but again if you have the 0:30:19.760,0:30:23.279 -product that is - -0:30:20.799,0:30:24.320 -taking off neurons sometimes sometimes +network that is turning off neurons sometimes 0:30:23.279,0:30:26.399 -this neuron here - -0:30:24.320,0:30:27.520 -on the left hand side doesn't exist +this neuron here on the left hand side doesn't exist 0:30:26.399,0:30:32.480 -right - -0:30:27.520,0:30:35.600 -and so if this one doesn't exist +right and so if this one doesn't exist 0:30:32.480,0:30:37.520 -then you cannot memorize a specific path - -0:30:35.600,0:30:39.360 -moreover you can think about this +then you cannot memorize a specific path moreover you can think about this 0:30:37.520,0:30:42.880 -dropout as - -0:30:39.360,0:30:46.799 -training a infinitely infinite +dropout as training an infinite 0:30:42.880,0:30:49.440
-number of networks that are different - -0:30:46.799,0:30:51.120 -right because every time you you drop +number of networks that are different right because every time you drop 0:30:49.440,0:30:52.559 -some neurons you basically get a new - -0:30:51.120,0:30:55.520 -network +some neurons you basically get a new network 0:30:52.559,0:30:57.039 -uh they all share the initial kind of - -0:30:55.520,0:30:58.000 -starting position with the initial +uh they all share the initial kind of starting position with the initial 0:30:57.039,0:31:00.960 -weights - -0:30:58.000,0:31:03.200 -but then at the end whenever you use it +weights but then at the end whenever you use it 0:31:00.960,0:31:05.279 -i inference usually you turn off this - -0:31:03.200,0:31:07.360 -dropout +at inference usually you turn off this dropout 0:31:05.279,0:31:09.440 -and then you have to scale the the - -0:31:07.360,0:31:11.440 -weights right because otherwise you get +and then you have to scale the weights right because otherwise you get 0:31:09.440,0:31:14.559 -a network that is you know - -0:31:11.440,0:31:16.159 -blowing you up this is because if you +a network that is you know blowing up this is because if you 0:31:14.559,0:31:19.120 -have half of the neurons - -0:31:16.159,0:31:21.120 -off you know the other neurons are doing +have half of the neurons off you know the other neurons are doing 0:31:19.120,0:31:23.600 -the half of the neurons are doing - -0:31:21.120,0:31:25.440 -the whole job and if you turn everyone +the whole job and if you turn everyone 0:31:23.600,0:31:26.159 -on you're going to have twice as many - -0:31:25.440,0:31:30.320 -more +on you're going to have twice as large 0:31:26.159,0:31:32.720 -uh values so so you can do two things - -0:31:30.320,0:31:34.480 -or when you actually use dropout you +values so you can do two things or when you actually use dropout you 0:31:32.720,0:31:37.440 -crank up you multiply by -
-0:31:34.480,0:31:40.399 -by let's say one over uh the dropping +crank up you multiply by let's say one over one minus the dropping 0:31:37.440,0:31:43.840 -rate so if you have dropping rate of 0.5 - -0:31:40.399,0:31:46.320 -you can multiply by two such that +rate so if you have dropping rate of 0.5 you can multiply by two such that 0:31:43.840,0:31:46.880 -uh your neurons are twice as powerful - -0:31:46.320,0:31:50.559 -right +uh your neurons are twice as powerful right 0:31:46.880,0:31:53.200 -twice is more powerful uh than - -0:31:50.559,0:31:54.480 -one minus 0.5 right one divided one +twice as powerful uh that is one divided by one 0:31:53.200,0:31:57.840 -minus 4.5 - -0:31:54.480,0:32:01.600 -so if you have a dropping rate of 0.1 +minus 0.5 so if you have a dropping rate of 0.1 0:31:57.840,0:32:02.080 -uh means you have 90 of your neurons - -0:32:01.600,0:32:04.240 -there +uh means you have 90% of your neurons there 0:32:02.080,0:32:05.360 -and so your neuron should be one over - -0:32:04.240,0:32:08.640 -0.9 +and so your neurons should be one over 0.9 0:32:05.360,0:32:11.200 -stronger right um - -0:32:08.640,0:32:12.080 -to be to have like the same kind of +stronger right um to have like the same kind of 0:32:11.200,0:32:15.440 -power right - -0:32:12.080,0:32:16.320 -in terms of values anyhow so you can +power right in terms of values anyhow so you can 0:32:15.440,0:32:19.840 -think about - -0:32:16.320,0:32:21.760 -uh drop dropout as having these multiple +think about uh dropout as having these multiple 0:32:19.840,0:32:23.919 -networks during training - -0:32:21.760,0:32:25.760 -but then whenever you use them at +networks during training but then whenever you use them at 0:32:23.919,0:32:28.080 -inference you turn off this dropout - -0:32:25.760,0:32:29.840 -module and you basically average out all +inference you turn off this dropout module and you basically average out all 0:32:28.080,0:32:31.360 -these performance of the
singular - -0:32:29.840,0:32:33.919 -network and +the performance of the singular networks and 0:32:31.360,0:32:34.799 -these allow you to get you know a much - -0:32:33.919,0:32:37.919 -better +this allows you to get you know a much better 0:32:34.799,0:32:38.640 -reduction of the noise uh which was - -0:32:37.919,0:32:41.440 -introduced +reduction of the noise uh which was introduced 0:32:38.640,0:32:42.320 -like that was arised by the the training - -0:32:41.440,0:32:43.679 -procedure +like that arose from the training procedure 0:32:42.320,0:32:45.760 -because again if you have you know - -0:32:43.679,0:32:47.039 -multiple experts you take the average of +because again if you have you know multiple experts you take the average of 0:32:45.760,0:32:48.000 -multiple experts you're going to get a - -0:32:47.039,0:32:50.399 -better +multiple experts you're going to get a better 0:32:48.000,0:32:53.120 -um answer because it's going to be - -0:32:50.399,0:32:56.640 -removing that kind of variability in the +um answer because it's going to be removing that kind of variability in the 0:32:53.120,0:32:59.600 -specific answer right - -0:32:56.640,0:33:01.519 -but perhaps we should keep in mind this +specific answer right but perhaps we should keep in mind this 0:32:59.600,0:33:03.360 -variability of the answers okay - -0:33:01.519,0:33:04.799 -because it can turn out quite +variability of the answers okay because it can turn out quite 0:33:03.360,0:33:08.559 -interesting - -0:33:04.799,0:33:10.880 -anyhow so dropout is amazing way to +interesting anyhow so dropout is an amazing way to 0:33:08.559,0:33:11.840 -basically have an automatic model - -0:33:10.880,0:33:15.039 -averaging +basically have an automatic model averaging 0:33:11.840,0:33:18.240 -modeling assembling performance - -0:33:15.039,0:33:20.320 -cool cool cool uh is dropout a good +model ensembling performance cool cool cool uh is dropout a good 0:33:18.240,0:33:21.200 -technique only for
classification task - -0:33:20.320,0:33:24.320 -or also +technique only for classification task or also 0:33:21.200,0:33:27.760 -for other tasks as well - -0:33:24.320,0:33:29.679 -like metric learning and coding learning +for other tasks as well like metric learning and coding learning 0:33:27.760,0:33:32.640 -i would say that dropout gives you a - -0:33:29.679,0:33:32.640 -much more robust +i would say that dropout gives you a much more robust 0:33:33.360,0:33:38.000 -network a much more robust prediction - -0:33:36.000,0:33:40.159 -regardless of the task it doesn't it +network a much more robust prediction regardless of the task it doesn't it 0:33:38.000,0:33:44.000 -doesn't restrict to classification - -0:33:40.159,0:33:46.640 -you basically train uh multiple networks +doesn't restrict to classification you basically train uh multiple networks 0:33:44.000,0:33:49.600 -of reduced size right and then you - -0:33:46.640,0:33:50.880 -average out this reduced size network +of reduced size right and then you average out this reduced size network 0:33:49.600,0:33:52.799 -so although at the end you're going to - -0:33:50.880,0:33:54.320 -have a large network this large network +so although at the end you're going to have a large network this large network 0:33:52.799,0:33:58.080 -is just the - -0:33:54.320,0:34:01.360 -average of small networks performance +is just the average of small networks performance 0:33:58.080,0:34:03.600 -so and also if you think in this way - -0:34:01.360,0:34:04.880 -the small network can no longer overfit +so and also if you think in this way the small network can no longer overfit 0:34:03.600,0:34:06.960 -right because they are - -0:34:04.880,0:34:08.159 -no longer that over parameterized +right because they are no longer that over parameterized 0:34:06.960,0:34:10.879 -perhaps right - -0:34:08.159,0:34:12.480 -and so dropout allows you allows you to +perhaps right and so dropout allows you allows you to 0:34:10.879,0:34:15.359 -fight 
overfitting - -0:34:12.480,0:34:16.879 -with several by different you know +fight overfitting with several different you know 0:34:15.359,0:34:20.960 -mechanisms - -0:34:16.879,0:34:23.440 -finally you can think uh if you apply +mechanisms finally you can think uh if you apply 0:34:20.960,0:34:25.040 -let's think about like uh applying - -0:34:23.440,0:34:28.159 -dropout to the input +let's think about like uh applying dropout to the input 0:34:25.040,0:34:30.240 -this is kind of uh sort of like - -0:34:28.159,0:34:31.440 -uh denoising out encoder no i mean you +this is kind of uh sort of like a denoising autoencoder no i mean you 0:34:30.240,0:34:33.919 -perturb the input - -0:34:31.440,0:34:35.839 -right in in this case and then you force +perturb the input right in this case and then you force 0:34:33.919,0:34:39.119 -still the output to be the same - -0:34:35.839,0:34:40.399 -so if you think about that you are going +still the output to be the same so if you think about that you are going 0:34:39.119,0:34:42.720 -to be - -0:34:40.399,0:34:44.399 -insensitive to some small variations of +to be insensitive to some small variations of 0:34:42.720,0:34:47.200 -the input - -0:34:44.399,0:34:48.159 -uh which are gonna make your network +the input uh which are gonna make your network 0:34:47.200,0:34:50.879 -more robust right - -0:34:48.159,0:34:51.919 -or the same as i was as i wrote you in +more robust right or the same as i wrote you in 0:34:50.879,0:34:54.560 -the midterm - -0:34:51.919,0:34:55.599 -uh how can you get a input that is you +the midterm uh how can you get an input that is you 0:34:54.560,0:34:58.320 -know annoying - -0:34:55.599,0:35:00.160 -you can find some noise in the input +know annoying you can find some noise in the input 0:34:58.320,0:35:03.359 -which is going to be - -0:35:00.160,0:35:05.680 -increasing your uh your loss right so +which is going to be increasing your loss right so 0:35:03.359,0:35:07.599 -you can do
some kind of adversarial - -0:35:05.680,0:35:09.599 -generation of noise and then you try to +you can do some kind of adversarial generation of noise and then you try to 0:35:07.599,0:35:13.040 -you train your network on these - -0:35:09.599,0:35:16.000 -um handcrafted samples which were +you train your network on these um handcrafted samples which were 0:35:13.040,0:35:17.760 -um corrected were like perturbed in - -0:35:16.000,0:35:20.560 -order to +um corrected were like perturbed in order to 0:35:17.760,0:35:21.680 -increase your your training loss right - -0:35:20.560,0:35:23.520 -okay so i gave you like +increase your your training loss right okay so i gave you like 0:35:21.680,0:35:27.119 -four different reasons why to use - -0:35:23.520,0:35:30.160 -dropout but then i don't use dropout +four different reasons why to use dropout but then i don't use dropout 0:35:27.119,0:35:31.520 -some not that often i actually do use it - -0:35:30.160,0:35:34.160 -for a different reason which i'm going +some not that often i actually do use it for a different reason which i'm going 0:35:31.520,0:35:38.240 -to be coming to that in a bit - -0:35:34.160,0:35:39.040 -um okay so early stopping so this is +to be coming to that in a bit um okay so early stopping so this is 0:35:38.240,0:35:42.160 -much - -0:35:39.040,0:35:42.560 -one of the most basic techniques uh if +much one of the most basic techniques uh if 0:35:42.160,0:35:45.359 -you're - -0:35:42.560,0:35:45.760 -training your model and your validation +you're training your model and your validation 0:35:45.359,0:35:48.880 -loss - -0:35:45.760,0:35:52.320 -starts starts increasing +loss starts starts increasing 0:35:48.880,0:35:54.320 -then you stop there okay - -0:35:52.320,0:35:55.920 -such that you get the lowest validation +then you stop there okay such that you get the lowest validation 0:35:54.320,0:35:57.440 -score and which - -0:35:55.920,0:35:59.520 -tells you okay you're not yet +score and which tells you okay 
you're not yet 0:35:57.440,0:36:02.400 -overfitting - -0:35:59.520,0:36:02.880 -uh and that basically doesn't let your +overfitting uh and that basically doesn't let your 0:36:02.400,0:36:04.880 -weights - -0:36:02.880,0:36:06.640 -grow too much right so instead of +weights grow too much right so instead of 0:36:04.880,0:36:08.640 -getting the l2 which is - -0:36:06.640,0:36:10.160 -trying not to get those weights to get +getting the l2 which is trying not to get those weights to get 0:36:08.640,0:36:13.680 -too lengthy too long - -0:36:10.160,0:36:16.960 -too long you just stop whenever they are +too lengthy too long too long you just stop whenever they are 0:36:13.680,0:36:16.960 -not yet that long right - -0:36:17.520,0:36:22.320 -uh fighting overfitting so these are +not yet that long right uh fighting overfitting so these are 0:36:20.079,0:36:25.520 -techniques that end up regularizing - -0:36:22.320,0:36:27.440 -our parameters our models but +techniques that end up regularizing our parameters our models but 0:36:25.520,0:36:29.760 -but they are not they are not - -0:36:27.440,0:36:32.480 -regularizers okay so this is important +but they are not they are not regularizers okay so this is important 0:36:29.760,0:36:33.920 -these are not regularizer although they - -0:36:32.480,0:36:37.040 -do regularize +these are not regularizer although they do regularize 0:36:33.920,0:36:39.280 -the uh network - -0:36:37.040,0:36:40.880 -okay as long as you keep this in mind we +the uh network okay as long as you keep this in mind we 0:36:39.280,0:36:43.680 -can also - -0:36:40.880,0:36:44.400 -see these uh other options but they are +can also see these uh other options but they are 0:36:43.680,0:36:46.880 -not - -0:36:44.400,0:36:47.599 -regularizing techniques right they do +not regularizing techniques right they do 0:36:46.880,0:36:50.240 -act as a - -0:36:47.599,0:36:51.200 -regularizer though first one batch +act as a regularizer though first one batch 
0:36:50.240,0:36:53.440 -normalization - -0:36:51.200,0:36:54.320 -okay so we talked about this several +normalization okay so we talked about this several 0:36:53.440,0:36:56.880 -times - -0:36:54.320,0:36:57.599 -we don't know quite how it works too +times we don't know quite how it works too 0:36:56.880,0:37:00.320 -well - -0:36:57.599,0:37:01.440 -there is an article on a blog post that +well there is an article on a blog post that 0:37:00.320,0:37:03.599 -is explaining this i - -0:37:01.440,0:37:04.800 -we put the link in the optimization +is explaining this i we put the link in the optimization 0:37:03.599,0:37:06.880 -lecture - -0:37:04.800,0:37:08.079 -check it out i think it's like lecture +lecture check it out i think it's like lecture 0:37:06.880,0:37:11.359 -seven of some - -0:37:08.079,0:37:12.480 -blog post i really can't remember anyhow +seven of some blog post i really can't remember anyhow 0:37:11.359,0:37:14.720 -so the point is that - -0:37:12.480,0:37:15.520 -you reset the the mu then the mean and +so the point is that you reset the the mu then the mean and 0:37:14.720,0:37:18.079 -the - -0:37:15.520,0:37:20.079 -sigma the the sigma square the variance +the sigma the the sigma square the variance 0:37:18.079,0:37:24.720 -at each layer - -0:37:20.079,0:37:26.480 -and these allow you to +at each layer and these allow you to 0:37:24.720,0:37:29.520 -okay when you reset the mean and the - -0:37:26.480,0:37:31.760 -sigma this is based on the specific +okay when you reset the mean and the sigma this is based on the specific 0:37:29.520,0:37:33.599 -batch you have right because you compute - -0:37:31.760,0:37:36.480 -the mean and the sigma square +batch you have right because you compute the mean and the sigma square 0:37:33.599,0:37:38.800 -over the specific batch but then if you - -0:37:36.480,0:37:42.800 -actually sample uniformly from your +over the specific batch but then if you actually sample uniformly from your 0:37:38.800,0:37:44.960 -training 
data set you will never have - -0:37:42.800,0:37:46.480 -two identical batches right so every +training data set you will never have two identical batches right so every 0:37:44.960,0:37:50.000 -batch will have a different - -0:37:46.480,0:37:51.280 -configuration of samples therefore if +batch will have a different configuration of samples therefore if 0:37:50.000,0:37:53.520 -you compute the mean and - -0:37:51.280,0:37:54.320 -the standard deviation they will always +you compute the mean and the standard deviation they will always 0:37:53.520,0:37:56.880 -be different - -0:37:54.320,0:37:58.079 -right and therefore again i said five +be different right and therefore again i said five 0:37:56.880,0:37:59.680 -times therefore - -0:37:58.079,0:38:02.880 -you are going to be applying a different +times therefore you are going to be applying a different 0:37:59.680,0:38:04.880 -correction per batch - -0:38:02.880,0:38:06.960 -and the model will never see twice the +correction per batch and the model will never see twice the 0:38:04.880,0:38:10.560 -same input right because they are - -0:38:06.960,0:38:14.160 -altered based on where they happen to +same input right because they are altered based on where they happen to 0:38:10.560,0:38:18.000 -uh appear in your training uh procedure - -0:38:14.160,0:38:21.119 -so because you never +uh appear in your training uh procedure so because you never 0:38:18.000,0:38:23.680 -showed the same uh same input twice - -0:38:21.119,0:38:25.599 -and this is so cool uh i really like it +showed the same uh same input twice and this is so cool uh i really like it 0:38:23.680,0:38:27.040 -and that's all you need usually most of - -0:38:25.599,0:38:30.320 -the time to train your network +and that's all you need usually most of the time to train your network 0:38:27.040,0:38:32.960 -don't drop out - -0:38:30.320,0:38:33.680 -and this technique also speeds up your +don't drop out and this technique also speeds up your 0:38:32.960,0:38:36.480 
-training like - -0:38:33.680,0:38:37.839 -crazy before batch norm was introduced +training like crazy before batch norm was introduced 0:38:36.480,0:38:40.880 -it was taking me i think - -0:38:37.839,0:38:44.000 -one week to train uh +it was taking me i think one week to train uh 0:38:40.880,0:38:46.160 -on imagenet i think at least - -0:38:44.000,0:38:47.280 -if it was if it wasn't a month it was +on imagenet i think at least if it was if it wasn't a month it was 0:38:46.160,0:38:50.400 -terrible i think - -0:38:47.280,0:38:51.760 -but again that's like eight years ago uh +terrible i think but again that's like eight years ago uh 0:38:50.400,0:38:54.160 -yeah it was terrible training on - -0:38:51.760,0:38:55.920 -imagenet uh with batch normalization i +yeah it was terrible training on imagenet uh with batch normalization i 0:38:54.160,0:38:59.280 -think you can train in one day so - -0:38:55.920,0:39:01.599 -that's ridiculous do you mean robust in +think you can train in one day so that's ridiculous do you mean robust in 0:38:59.280,0:39:04.000 -terms of adversarial learning as well - -0:39:01.599,0:39:05.920 -i don't understand why we don't see the +terms of adversarial learning as well i don't understand why we don't see the 0:39:04.000,0:39:09.359 -same sample twice - -0:39:05.920,0:39:12.960 -um i'm saying robust here +same sample twice um i'm saying robust here 0:39:09.359,0:39:15.920 -as in uh you're providing different - -0:39:12.960,0:39:16.720 -inputs every time because and so the +as in uh you're providing different inputs every time because and so the 0:39:15.920,0:39:19.119 -network gets - -0:39:16.720,0:39:20.160 -a better coverage what is the training +network gets a better coverage what is the training 0:39:19.119,0:39:22.560 -manifold - -0:39:20.160,0:39:24.800 -uh you don't see the same input twice +manifold uh you don't see the same input twice 0:39:22.560,0:39:27.200 -because the same input - -0:39:24.800,0:39:28.640 -based on how it appears 
in the in the +because the same input based on how it appears in the 0:39:27.200,0:39:32.000 -batch so if you appears - -0:39:28.640,0:39:35.119 -you have you know input 42 +batch so if it appears you have you know input 42 0:39:32.000,0:39:35.760 -and this input 42 happens in a given - -0:39:35.119,0:39:37.359 -batch +and this input 42 happens in a given batch 0:39:35.760,0:39:39.119 -you subtract the mean of the batch and - -0:39:37.359,0:39:41.520 -divide by the standard deviation +you subtract the mean of the batch and divide by the standard deviation 0:39:39.119,0:39:43.280 -and you get the the new you know value - -0:39:41.520,0:39:45.839 -right within the network +and you get the new you know value right within the network 0:39:43.280,0:39:46.880 -but then if that input 42 happens in a - -0:39:45.839,0:39:48.720 -different batch +but then if that input 42 happens in a different batch 0:39:46.880,0:39:51.040 -then the mean of the different batch is - -0:39:48.720,0:39:52.960 -gonna be a different mean +then the mean of the different batch is gonna be a different mean 0:39:51.040,0:39:54.480 -and therefore you're gonna get a - -0:39:52.960,0:39:56.640 -slightly different +and therefore you're gonna get a slightly different 0:39:54.480,0:39:58.560 -input every time so you never actually - -0:39:56.640,0:40:00.880 -observe the same input because they +input every time so you never actually observe the same input because they 0:39:58.560,0:40:02.800 -happen to be packed in a different batch - -0:40:00.880,0:40:04.720 -and therefore the statistics of that +happen to be packed in a different batch and therefore the statistics of that 0:40:02.800,0:40:06.720 -specific patch - -0:40:04.720,0:40:08.560 -will be just specific to that batch and +specific batch will be just specific to that batch and 0:40:06.720,0:40:10.160 -you know it's going to change every time - -0:40:08.560,0:40:12.350 -you're going to have a different batch +you know it's going to
change every time you're going to have a different batch 0:40:10.160,0:40:13.440 -so same input get a different - -0:40:12.350,0:40:16.000 -[Music] +so the same input gets a different 0:40:13.440,0:40:18.000 -correction let's say this way if it - -0:40:16.000,0:40:20.000 -appears in a different batch so it +correction let's say this way if it appears in a different batch so 0:40:18.000,0:40:21.839 -you never see the same input twice so - -0:40:20.000,0:40:24.960 -this technique is all i use usually for +you never see the same input twice so this technique is all i use usually for 0:40:21.839,0:40:27.440 -training my network um - -0:40:24.960,0:40:28.880 -and it works but again recently i've +training my network um and it works but again recently i've 0:40:27.440,0:40:31.440 -been using dropout for a different - -0:40:28.880,0:40:34.960 -reason so we're gonna be +been using dropout for a different reason so we're gonna be 0:40:31.440,0:40:38.160 -um okay we are gonna see this in - -0:40:34.960,0:40:41.200 -a few minutes uh +um okay we are gonna see this in a few minutes uh 0:40:38.160,0:40:43.040 -more data of course just providing more - -0:40:41.200,0:40:44.400 -data you're gonna find all over fitting +more data of course just providing more data you're gonna fight overfitting 0:40:43.040,0:40:47.520 -but then you know - -0:40:44.400,0:40:50.000 -ding ding ding okay +but then you know ding ding ding okay 0:40:47.520,0:40:52.079 -uh finally data augmentation so data - -0:40:50.000,0:40:54.319 -augmentation is also a very valid +uh finally data augmentation so data augmentation is also a very valid 0:40:52.079,0:40:55.680 -technique in order to you know prove - -0:40:54.319,0:40:58.319 -provide some kind of +technique in order to you know provide some kind of 0:40:55.680,0:41:00.960 -uh deformed version of the input if - -0:40:58.319,0:41:04.839 -you're talking about images we
have 0:41:00.960,0:41:07.440 -center crop color jitter different crops - -0:41:04.839,0:41:08.400 -transformations like i find random +center crop color jitter different crops transformations like i find random 0:41:07.440,0:41:11.200 -transformations - -0:41:08.400,0:41:13.200 -crops random rotation horizontal flip +transformations crops random rotation horizontal flip 0:41:11.200,0:41:16.560 -right if you see myself like that and - -0:41:13.200,0:41:20.000 -you flip my face i'm still me kind of +right if you see myself like that and you flip my face i'm still me kind of 0:41:16.560,0:41:23.440 -right so uh if it's upside down well - -0:41:20.000,0:41:24.800 -maybe not quite uh nevertheless you can +right so uh if it's upside down well maybe not quite uh nevertheless you can 0:41:23.440,0:41:28.160 -see that if you - -0:41:24.800,0:41:30.000 -provide some alterations that are +see that if you provide some alterations that are 0:41:28.160,0:41:31.680 -perturbation that you are if you like to - -0:41:30.000,0:41:33.680 -be insensitive against +perturbation that you are if you like to be insensitive against 0:41:31.680,0:41:35.359 -then you can improve your performance of - -0:41:33.680,0:41:37.119 -the network which is going to be +then you can improve your performance of the network which is going to be 0:41:35.359,0:41:38.160 -learning how to be insensitive to this - -0:41:37.119,0:41:41.760 -kind of +learning how to be insensitive to this kind of 0:41:38.160,0:41:43.599 -uh you know variations - -0:41:41.760,0:41:45.520 -okay okay okay so quickly quickly +uh you know variations okay okay okay so quickly quickly 0:41:43.599,0:41:47.359 -quickly oh okay transfer learning - -0:41:45.520,0:41:48.880 -we already know about transfer learning +quickly oh okay transfer learning we already know about transfer learning 0:41:47.359,0:41:50.160 -i think but again so you get your - -0:41:48.880,0:41:51.760 -network you already trained on a +i think but again so you get your 
network you already trained on a 0:41:50.160,0:41:53.599 -specific task you - -0:41:51.760,0:41:54.960 -just leave the first classifier there +specific task you just leave the first classifier there 0:41:53.599,0:41:57.839 -you move everything - -0:41:54.960,0:41:58.720 -you plug a new a new classifier or +you move everything you plug a new a new classifier or 0:41:57.839,0:42:01.280 -whatever - -0:41:58.720,0:42:03.520 -and then if you have you know a few data +whatever and then if you have you know a few data 0:42:01.280,0:42:04.400 -with a similar kind of training - -0:42:03.520,0:42:06.800 -distribution +with a similar kind of training distribution 0:42:04.400,0:42:07.680 -you just do transfer learning which is - -0:42:06.800,0:42:11.040 -again +you just do transfer learning which is again 0:42:07.680,0:42:14.960 -training just the final classifier - -0:42:11.040,0:42:18.079 -uh if you have lots of data +training just the final classifier uh if you have lots of data 0:42:14.960,0:42:18.720 -you should fine-tune because you would - -0:42:18.079,0:42:21.839 -like to +you should fine-tune because you would like to 0:42:18.720,0:42:22.400 -also improve this uh the performance of - -0:42:21.839,0:42:24.960 -the +also improve this uh the performance of the 0:42:22.400,0:42:26.960 -like you would like also to tweak the uh - -0:42:24.960,0:42:29.839 -feature extractor the blue +like you would like also to tweak the uh feature extractor the blue 0:42:26.960,0:42:31.520 -the blue layers and the colors are - -0:42:29.839,0:42:32.640 -flipped here damn the hidden layer +the blue layers and the colors are flipped here damn the hidden layer 0:42:31.520,0:42:35.680 -should have been green and the - -0:42:32.640,0:42:38.079 -output blue okay +should have been green and the output blue okay 0:42:35.680,0:42:39.359 -few data and different from training or - -0:42:38.079,0:42:41.440 -you want to do early +few data and different from training or you want to do early 
0:42:39.359,0:42:42.839 -uh transfer learning which means you - -0:42:41.440,0:42:46.240 -know you start +uh transfer learning which means you know you start 0:42:42.839,0:42:49.760 -changing um also you know - -0:42:46.240,0:42:50.240 -a little bit of the the of the other +changing um also you know a little bit of the the of the other 0:42:49.760,0:42:53.280 -layers - -0:42:50.240,0:42:54.800 -as well not all of them and then +layers as well not all of them and then 0:42:53.280,0:42:56.800 -yeah you want to remove a few more - -0:42:54.800,0:42:58.319 -layers actually yeah oh +yeah you want to remove a few more layers actually yeah oh 0:42:56.800,0:43:00.079 -my bad so you would like to remove a few - -0:42:58.319,0:43:01.839 -of those uh +my bad so you would like to remove a few of those uh 0:43:00.079,0:43:05.359 -final hidden layers because they are - -0:43:01.839,0:43:08.000 -kind of already specialized +final hidden layers because they are kind of already specialized 0:43:05.359,0:43:09.520 -so you want to retrain the base features - -0:43:08.000,0:43:11.040 -extractor here +so you want to retrain the base features extractor here 0:43:09.520,0:43:12.640 -and if you have lots of data which are - -0:43:11.040,0:43:16.960 -different from the training the +and if you have lots of data which are different from the training the 0:43:12.640,0:43:20.240 -distribution just train okay um - -0:43:16.960,0:43:21.200 -okay also you can use different +distribution just train okay um okay also you can use different 0:43:20.240,0:43:24.000 -learnings - -0:43:21.200,0:43:24.560 -learning rate for different layers right +learnings learning rate for different layers right 0:43:24.000,0:43:28.000 -to - -0:43:24.560,0:43:31.200 -improve performance so maybe you um +to improve performance so maybe you um 0:43:28.000,0:43:34.400 -you'd like to change um yeah - -0:43:31.200,0:43:36.400 -so you you can you can see that usually +you'd like to change um yeah so you you can you can see 
that usually 0:43:34.400,0:43:38.160 -these final layers are the ones that are - -0:43:36.400,0:43:39.440 -changing uh quicker because they are +these final layers are the ones that are changing uh quicker because they are 0:43:38.160,0:43:42.480 -close to the - -0:43:39.440,0:43:44.640 -uh to the loss but then again if you use +close to the loss but then again if you use 0:43:42.480,0:43:45.839 -uh batch norm all these layers are kind - -0:43:44.640,0:43:48.400 -of training the same +uh batch norm all these layers are kind of training at the same 0:43:45.839,0:43:50.079 -speed otherwise again you can see - -0:43:48.400,0:43:51.119 -whether you want to change learning rate +speed otherwise again you can see whether you want to change learning rate 0:43:50.079,0:43:54.560 -maybe change - -0:43:51.119,0:43:56.480 -these guys slower or not did you say is +maybe change these guys slower or not what did you say is 0:43:54.560,0:43:56.960 -the difference between transfer learning - -0:43:56.480,0:43:59.520 -and fine +the difference between transfer learning and fine 0:43:56.960,0:44:01.920 -tuning uh transfer learning i just train - -0:43:59.520,0:44:03.520 -define a classifier +tuning uh transfer learning i just train the final classifier 0:44:01.920,0:44:05.200 -because i don't have if you have few - -0:44:03.520,0:44:06.960 -data you don't have +because if you have few data you don't have 0:44:05.200,0:44:09.040 -enough you know you don't want to - -0:44:06.960,0:44:11.040 -overfit so you +enough you know you don't want to overfit so 0:44:09.040,0:44:12.640 -if you have a few data you want to just - -0:44:11.040,0:44:14.560 -reuse the whole +if you have a few data you want to just reuse the whole 0:44:12.640,0:44:16.880 -network from the previous task and you - -0:44:14.560,0:44:18.480 -just train the final classifier +network from the previous task and you just train the final classifier 0:44:16.880,0:44:21.119 -if you have lots of data then you can
- -0:44:18.480,0:44:23.760 -actually even um +if you have lots of data then you can actually even um 0:44:21.119,0:44:25.040 -try to have like some changes you can - -0:44:23.760,0:44:27.760 -also you know you can start +try to have like some changes you can also you know you can start 0:44:25.040,0:44:29.040 -you have a uh lower learning rate you - -0:44:27.760,0:44:32.560 -also change for +you have a uh lower learning rate you also change for 0:44:29.040,0:44:34.640 -this feature extractor if they are - -0:44:32.560,0:44:36.480 -similarly transfer learning you freeze +this feature extractor if they are similarly transfer learning you freeze 0:44:34.640,0:44:38.640 -the the base - -0:44:36.480,0:44:39.520 -network yeah i would say the transfer +the the base network yeah i would say the transfer 0:44:38.640,0:44:41.760 -learning you just - -0:44:39.520,0:44:43.200 -freeze the the blue guy and you just +learning you just freeze the the blue guy and you just 0:44:41.760,0:44:46.640 -train the orange - -0:44:43.200,0:44:48.800 -in uh fine tuning you actually tune +train the orange in uh fine tuning you actually tune 0:44:46.640,0:44:51.520 -all the other parameters as well maybe - -0:44:48.800,0:44:54.720 -with smaller learning rate +all the other parameters as well maybe with smaller learning rate 0:44:51.520,0:44:55.760 -this is the number 12 notebook here i'm - -0:44:54.720,0:44:59.040 -classifying the +this is the number 12 notebook here i'm classifying the 0:44:55.760,0:45:01.200 -sentiment of these reviews on the imdb - -0:44:59.040,0:45:02.839 -data set all right and so i'd like to +sentiment of these reviews on the imdb data set all right and so i'd like to 0:45:01.200,0:45:04.079 -compare different regularization - -0:45:02.839,0:45:07.200 -techniques +compare different regularization techniques 0:45:04.079,0:45:09.920 -so i'm just keeping everything because - -0:45:07.200,0:45:12.240 -i just like to show you the final result +so i'm just keeping everything 
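The two recipes just described can be sketched in PyTorch (the tiny backbone and classifier below are made-up stand-ins, not the course's model): transfer learning freezes the pretrained base, "the blue guy", and trains only the new head, "the orange", while fine-tuning unfreezes everything and gives the base a smaller learning rate via optimizer parameter groups.

```python
import torch.nn as nn
from torch.optim import Adam

# Hypothetical pretrained net: a feature-extractor base plus a fresh head
backbone = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
classifier = nn.Linear(128, 10)

# Transfer learning: freeze the base, train only the new classifier
for p in backbone.parameters():
    p.requires_grad = False
transfer_opt = Adam(classifier.parameters(), lr=1e-3)

# Fine-tuning: unfreeze everything, but update the pretrained base
# with a smaller learning rate than the head (parameter groups)
for p in backbone.parameters():
    p.requires_grad = True
finetune_opt = Adam([
    {"params": backbone.parameters(), "lr": 1e-5},   # slow for the base
    {"params": classifier.parameters(), "lr": 1e-3}, # faster for the head
])
```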
because i just like to show you the final result 0:45:09.920,0:45:14.880 -let me see where is the optimizer - -0:45:12.240,0:45:16.000 -so you can toggle different things at +let me see where is the optimizer so you can toggle different things at 0:45:14.880,0:45:18.319 -the beginning we have - -0:45:16.000,0:45:20.800 -no weight decay nothing right so we +the beginning we have no weight decay nothing right so we 0:45:18.319,0:45:22.560 -train with this regularizer - -0:45:20.800,0:45:25.040 -let's check what is the model so the +train with this regularizer let's check what is the model so the 0:45:22.560,0:45:28.480 -model is just a feed forward neural net - -0:45:25.040,0:45:30.880 -which is fifo on neural net we have some +model is just a feed forward neural net which is a feedforward neural net we have some 0:45:28.480,0:45:32.480 -embeddings a linear a linear - -0:45:30.880,0:45:34.480 -and then my forward is going to be +embeddings a linear a linear and then my forward is going to be 0:45:32.480,0:45:37.680 -getting my embeddings sending to the - -0:45:34.480,0:45:39.920 -forward the fully connected relo +getting my embeddings sending to the forward the fully connected relu 0:45:37.680,0:45:41.839 -uh and then you know you get the output - -0:45:39.920,0:45:43.599 -from this so second fully connected and +uh and then you know you get the output from this second fully connected and 0:45:41.839,0:45:44.800 -i'm outputting a sigmoid because i'm - -0:45:43.599,0:45:47.680 -just doing +i'm outputting a sigmoid because i'm just doing 0:45:44.800,0:45:49.200 -um i think a two-class classification - -0:45:47.680,0:45:50.800 -problem so we'd like to figure out if +um i think a two-class classification problem so we'd like to figure out if 0:45:49.200,0:45:52.880 -it's a positive review or a negative - -0:45:50.800,0:45:56.560 -review +it's a positive review or a negative review 0:45:52.880,0:46:00.400 -um and so this is the initial training - -0:45:56.560,0:46:03.440
-and we got you know +um and so this is the initial training and we got you know 0:46:00.400,0:46:05.680 -the validation curve climbs up as crazy - -0:46:03.440,0:46:07.119 -whereas the training curve goes down to +the validation curve climbs up as crazy whereas the training curve goes down to 0:46:05.680,0:46:09.599 -zero - -0:46:07.119,0:46:11.680 -and so here you can see uh the +zero and so here you can see uh the 0:46:09.599,0:46:15.520 -validation accuracy which goes up to - -0:46:11.680,0:46:16.560 -64 more or less so and here we just +validation accuracy which goes up to 64 more or less so and here we just 0:46:15.520,0:46:19.599 -store - -0:46:16.560,0:46:21.200 -the weight of the network +store the weight of the network 0:46:19.599,0:46:23.440 -for when there is no kind of - -0:46:21.200,0:46:25.839 -regularization okay +for when there is no kind of regularization okay 0:46:23.440,0:46:28.160 -then first thing i'd like to do is going - -0:46:25.839,0:46:31.599 -to be trying to do the +then first thing i'd like to do is going to be trying to do the 0:46:28.160,0:46:34.800 -weight l1 the l1 regularization - -0:46:31.599,0:46:34.800 -so let's see how to do that +weight l1 the l1 regularization so let's see how to do that 0:46:35.520,0:46:42.079 -so l1 regularization - -0:46:38.960,0:46:44.800 -okay toggle this one to do +so l1 regularization okay toggle this one to do 0:46:42.079,0:46:45.520 -l1 regularization so here i'm extracting - -0:46:44.800,0:46:48.400 -the +l1 regularization so here i'm extracting the 0:46:45.520,0:46:49.040 -model parameters and then i'm going to - -0:46:48.400,0:46:52.160 -be adding +model parameters and then i'm going to be adding 0:46:49.040,0:46:53.839 -some term to the - -0:46:52.160,0:46:56.400 -to the loss okay so the loss is going to +some term to the to the loss okay so the loss is going to 0:46:53.839,0:46:59.680 -be some part of this - -0:46:56.400,0:47:04.079 -uh like i'm gonna sum the the one norm +be some part of this 
uh like i'm gonna sum the the one norm 0:46:59.680,0:47:06.319 -of the fc1 to the loss okay - -0:47:04.079,0:47:10.079 -because there is no other way to do this +of the fc1 to the loss okay because there is no other way to do this 0:47:06.319,0:47:14.880 -in a pi torch for the moment - -0:47:10.079,0:47:18.160 -okay so let me re-initialize the network +in pytorch for the moment okay so let me re-initialize the network 0:47:14.880,0:47:21.510 -so i start here - -0:47:18.160,0:47:21.510 -[Music] +so i start here [Music] 0:47:22.079,0:47:25.760 -i get - -0:47:23.870,0:47:29.040 -[Music] +i get [Music] 0:47:25.760,0:47:31.680 -this one and then i start training here - -0:47:29.040,0:47:32.400 -so this guy is training uh how many +this one and then i start training here so this guy is training uh how many 0:47:31.680,0:47:36.000 -iterations - -0:47:32.400,0:47:38.960 -let's check 10 epochs okay one two three +iterations let's check 10 epochs okay one two three 0:47:36.000,0:47:39.760 -four five six all right so before we - -0:47:38.960,0:47:41.920 -were checking +four five six all right so before we were checking 0:47:39.760,0:47:46.160 -we can go down here we had the - -0:47:41.920,0:47:49.680 -validation accuracy was around 64. +we can go down here we had the validation accuracy was around 64.
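What the notebook's toggle does can be sketched like this (the tiny model and penalty strength are illustrative, not the notebook's actual code): since PyTorch has no built-in L1 option, you add the 1-norm of a layer's weights to the loss by hand, whereas L2 comes for free as the optimizer's weight_decay argument.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Tiny stand-in for the notebook's net; fc1 plays the role of self.fc1
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
fc1 = model[0]
criterion = nn.MSELoss()
lambda_l1 = 1e-3  # illustrative regularization strength

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = criterion(model(x), y)
# L1: summed onto the loss manually (the optimizer's weight_decay is L2 only)
loss = loss + lambda_l1 * fc1.weight.abs().sum()
loss.backward()

# L2: just one optimizer argument
optimizer = torch.optim.Adam(model.parameters(), weight_decay=1e-4)
```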
0:47:46.160,0:47:52.000 -and now we have validation accuracy - -0:47:49.680,0:47:53.119 -went to 66 right so we actually have +and now we have validation accuracy went to 66 right so we actually have 0:47:52.000,0:47:56.240 -improved - -0:47:53.119,0:47:59.280 -the performance by getting these guys +improved the performance by getting these guys 0:47:56.240,0:48:03.119 -uh to be - -0:47:59.280,0:48:07.280 -oh it's getting down down +uh to be oh it's getting down down 0:48:03.119,0:48:09.920 -oh back up 67 looks good 68 - -0:48:07.280,0:48:11.119 -okay it's finished so i can show you in +oh back up 67 looks good 68 okay it's finished so i can show you in 0:48:09.920,0:48:14.079 -this case what happened with - -0:48:11.119,0:48:15.359 -l1 oh it's not yet finished okay it's +this case what happened with l1 oh it's not yet finished okay it's 0:48:14.079,0:48:17.760 -taking forever - -0:48:15.359,0:48:19.119 -okay while this is training okay okay +taking forever okay while this is training okay okay 0:48:17.760,0:48:20.640 -i'm gonna show you the the output of - -0:48:19.119,0:48:24.079 -this guy and then i'm gonna be +i'm gonna show you the the output of this guy and then i'm gonna be 0:48:20.640,0:48:26.640 -showing just briefly the uh - -0:48:24.079,0:48:27.599 -second usage of the dropout should we +showing just briefly the uh second usage of the dropout should we 0:48:26.640,0:48:31.680 -stop this guy - -0:48:27.599,0:48:34.000 -69 so you can see now here we are at 69 +stop this guy 69 so you can see now here we are at 69 0:48:31.680,0:48:36.640 -in validation at 30 right - -0:48:34.000,0:48:37.920 -okay cool and here you can see both the +in validation at 30 right okay cool and here you can see both the 0:48:36.640,0:48:40.319 -training and the validation - -0:48:37.920,0:48:43.040 -they are both losses they go down and +training and the validation they are both losses they go down and 0:48:40.319,0:48:46.480 -then here i show you the validation - 
-0:48:43.040,0:48:48.640 -which went up to 67 and 68 okay +then here i show you the validation which went up to 67 and 68 okay 0:48:46.480,0:48:50.319 -and so here i just show i gonna be - -0:48:48.640,0:48:53.280 -storing these weights +and so here i just show i gonna be storing these weights 0:48:50.319,0:48:53.280 -for the l1 - -0:48:53.520,0:48:57.760 -so here i just store this l1 over here +for the l1 so here i just store this l1 over here 0:48:56.800,0:49:01.040 -okay - -0:48:57.760,0:49:04.720 -i'm gonna go back here +okay i'm gonna go back here 0:49:01.040,0:49:08.400 -uh we are gonna be undoing - -0:49:04.720,0:49:11.599 -this one right because we don't +uh we are gonna be undoing this one right because we don't 0:49:08.400,0:49:13.040 -want uh l1 we're gonna be choosing now a - -0:49:11.599,0:49:16.480 -l2 regularizer +want uh l1 we're gonna be choosing now a l2 regularizer 0:49:13.040,0:49:18.880 -right so i can toggle this one - -0:49:16.480,0:49:19.920 -and toggle this on alright so now we +right so i can toggle this one and toggle this on alright so now we 0:49:18.880,0:49:23.920 -have - -0:49:19.920,0:49:23.920 -a weight decay of this value +have a weight decay of this value 0:49:24.240,0:49:30.720 -model i execute this one - -0:49:27.599,0:49:33.280 -and i execute these guys all right +model i execute this one and i execute these guys all right 0:49:30.720,0:49:33.760 -so while the l2 is training i'll just - -0:49:33.280,0:49:36.000 -show you +so while the l2 is training i'll just show you 0:49:33.760,0:49:37.520 -a quick uh overview about bayesian - -0:49:36.000,0:49:40.720 -neural nets +a quick uh overview about bayesian neural nets 0:49:37.520,0:49:43.520 -so estimating a predictive distribution - -0:49:40.720,0:49:44.480 -so why to care about uncertainty many +so estimating a predictive distribution so why to care about uncertainty many 0:49:43.520,0:49:46.400 -reasons - -0:49:44.480,0:49:48.240 -uh if you have a cat declassifier and +reasons uh 
if you have a cat-dog classifier and 0:49:46.400,0:49:49.839 -you show a hippopotamus - -0:49:48.240,0:49:52.000 -the network is going to tell you oh this +you show a hippopotamus the network is going to tell you oh this 0:49:49.839,0:49:53.920 -is a dog no - -0:49:52.000,0:49:56.079 -it doesn't know i cannot tell you oh +is a dog no it doesn't know it cannot tell you oh 0:49:53.920,0:49:57.920 -this is not of any of the above right - -0:49:56.079,0:49:59.440 -you can think about oh let's make a +this is not any of the above right you can think about oh let's make a 0:49:57.920,0:50:01.920 -third category - -0:49:59.440,0:50:03.040 -but then how can you show you how can +third category but then how can you show you how can 0:50:01.920,0:50:05.599 -you show the network - -0:50:03.040,0:50:06.960 -not a cat and not a dog uh it doesn't +you show the network not a cat and not a dog uh it doesn't 0:50:05.599,0:50:10.000 -quite work like that - -0:50:06.960,0:50:10.640 -so you can't really find i mean cat is +quite work like that so you can't really find i mean cat is 0:50:10.000,0:50:13.599 -an object - -0:50:10.640,0:50:15.200 -dog is a object not a cat or not a dog +an object dog is an object not a cat or not a dog 0:50:13.599,0:50:16.319 -is not an object so you can't really - -0:50:15.200,0:50:20.240 -train your network +is not an object so you can't really train your network 0:50:16.319,0:50:22.319 -to say everything else um - -0:50:20.240,0:50:24.079 -reliability on steering control let's +to say everything else um reliability on steering control let's 0:50:22.319,0:50:25.200 -say you're training your car to steer - -0:50:24.079,0:50:27.839 -right and left +say you're training your car to steer right and left 0:50:25.200,0:50:28.960 -and then your car say steer to the right - -0:50:27.839,0:50:32.160 -okay hold on +and then your car says steer to the right okay hold on 0:50:28.960,0:50:35.839 -how certain are you about this - -0:50:32.160,0:50:38.319 -action is it
is it gonna kill me right +how certain are you about this action is it is it gonna kill me right 0:50:35.839,0:50:39.359 -uh physics simulator prediction if you - -0:50:38.319,0:50:41.359 -know about +uh physics simulator prediction if you know about 0:50:39.359,0:50:42.400 -physics or physicists they always want - -0:50:41.359,0:50:44.960 -to know +physics or physicists they always want to know 0:50:42.400,0:50:46.400 -how certain you are about your value - -0:50:44.960,0:50:48.000 -right so measurements +how certain you are about your value right so measurements 0:50:46.400,0:50:49.920 -uh in physics always have you know you - -0:50:48.000,0:50:51.040 -have the value plus minus the +uh in physics always have you know you have the value plus minus the 0:50:49.920,0:50:52.800 -uncertainty - -0:50:51.040,0:50:54.800 -so you know your network should be able +uncertainty so you know your network should be able 0:50:52.800,0:50:58.000 -to tell you as well how certain - -0:50:54.800,0:50:59.920 -uh some number or what is the +to tell you as well how certain uh some number or what is the 0:50:58.000,0:51:02.480 -in the confidence interval for a - -0:50:59.920,0:51:04.559 -specific prediction +in the confidence interval for a specific prediction 0:51:02.480,0:51:06.720 -moreover you can think to use this for - -0:51:04.559,0:51:09.440 -minimizing action randomness when +moreover you can think to use this for minimizing action randomness when 0:51:06.720,0:51:10.400 -connected to a reward what the heck does - -0:51:09.440,0:51:13.040 -this mean +connected to a reward what the heck does this mean 0:51:10.400,0:51:13.839 -so if there is some uncertainty with - -0:51:13.040,0:51:15.839 -some +so if there is some uncertainty with some 0:51:13.839,0:51:17.839 -associated some to some actions you can - -0:51:15.839,0:51:20.880 -actually exploit that +associated some to some actions you can actually exploit that 0:51:17.839,0:51:24.480 -and train your model to minimize that - 
-0:51:20.880,0:51:25.440 -uncertainty and this is so cool because +and train your model to minimize that uncertainty and this is so cool because 0:51:24.480,0:51:27.520 -we - -0:51:25.440,0:51:29.599 -use something similar in my in our +we use something similar in my in our 0:51:27.520,0:51:32.480 -project right - -0:51:29.599,0:51:34.640 -so dropout i told you about before uh so +project right so dropout i told you about before uh so 0:51:32.480,0:51:36.559 -how this neural network dropout works - -0:51:34.640,0:51:39.040 -i just gonna be quickly going through +how this neural network dropout works i just gonna be quickly going through 0:51:36.559,0:51:42.800 -this i multiply my input and my - -0:51:39.040,0:51:47.440 -hidden layer with these random +this i multiply my input and my hidden layer with these random 0:51:42.800,0:51:47.440 -zero one masks okay and - -0:51:47.520,0:51:51.359 -you can have the activation function to +zero one masks okay and you can have the activation function to 0:51:49.520,0:51:52.720 -be some non-linearity and then here you - -0:51:51.359,0:51:55.280 -have this bernoulli +be some non-linearity and then here you have this bernoulli 0:51:52.720,0:51:56.000 -with the probability of one minus the - -0:51:55.280,0:51:58.640 -dropping out +with the probability of one minus the dropping out 0:51:56.000,0:52:00.319 -rate so this the dropping out rate and - -0:51:58.640,0:52:03.440 -then you want to scale +rate so this the dropping out rate and then you want to scale 0:52:00.319,0:52:04.800 -the delta such that you know you resize - -0:52:03.440,0:52:07.920 -the amplitude +the delta such that you know you resize the amplitude 0:52:04.800,0:52:09.200 -of those weights the training has just - -0:52:07.920,0:52:11.280 -finished so i'm gonna be +of those weights the training has just finished so i'm gonna be 0:52:09.200,0:52:12.240 -switching that i'm sorry for the context - -0:52:11.280,0:52:16.240 -switching +switching that i'm sorry for the 
context switching 0:52:12.240,0:52:18.000 -oh okay good call all right uh - -0:52:16.240,0:52:19.520 -calculate the variance yes someone was +oh okay good call all right uh calculate the variance yes someone was 0:52:18.000,0:52:22.000 -saying calculate the variance i know i'm - -0:52:19.520,0:52:25.200 -switching i'm sorry it's the last lesson +saying calculate the variance i know i'm switching i'm sorry it's the last lesson 0:52:22.000,0:52:28.559 -i'm making a mess okay so this is train - -0:52:25.200,0:52:32.160 -and we got 64 uh +i'm making a mess okay so this is train and we got 64 uh 0:52:28.559,0:52:32.720 -which is so these are also going both - -0:52:32.160,0:52:35.520 -down +which is so these are also going both down 0:52:32.720,0:52:38.000 -this is both the the l2 regularization - -0:52:35.520,0:52:39.200 -and before we were getting to 68 with +this is both the the l2 regularization and before we were getting to 68 with 0:52:38.000,0:52:42.640 -the l1 - -0:52:39.200,0:52:44.319 -here we get something else maybe +the l1 here we get something else maybe 0:52:42.640,0:52:46.079 -oh you can see it's still climbing right - -0:52:44.319,0:52:48.160 -so maybe i just stopped too early +oh you can see it's still climbing right so maybe i just stopped too early 0:52:46.079,0:52:50.960 -so if you keep training you're gonna get - -0:52:48.160,0:52:53.520 -a better performance +so if you keep training you're gonna get a better performance 0:52:50.960,0:52:54.640 -it's it's monotonic non-decreasing right - -0:52:53.520,0:52:57.359 -so i think +it's it's monotonic non-decreasing right so i think 0:52:54.640,0:52:59.760 -kind of so i think you can squeeze more - -0:52:57.359,0:53:03.599 -and here i'm gonna be saving +kind of so i think you can squeeze more and here i'm gonna be saving 0:52:59.760,0:53:05.839 -these weights in these l2 weights - -0:53:03.599,0:53:06.640 -okay so i saved that and the last one +these weights in these l2 weights okay so i saved that and 
the last one 0:53:05.839,0:53:08.720 -then i sh - -0:53:06.640,0:53:09.680 -then it's gonna be exactly the dropout +then i sh then it's gonna be exactly the dropout 0:53:08.720,0:53:13.200 -right so - -0:53:09.680,0:53:16.800 -go back here uh we turn off +right so go back here uh we turn off 0:53:13.200,0:53:19.599 -the l2 so we - -0:53:16.800,0:53:20.640 -turn off this guy we turn back the +the l2 so we turn off this guy we turn back the 0:53:19.599,0:53:22.400 -simple one - -0:53:20.640,0:53:24.720 -but then we have to go back in this +simple one but then we have to go back in this 0:53:22.400,0:53:29.599 -network we would like to turn - -0:53:24.720,0:53:34.800 -on the dropout rate +network we would like to turn on the dropout rate 0:53:29.599,0:53:34.800 -true there you go boom boom boom - -0:53:34.880,0:53:40.839 -okay is it training yeah it's training +true there you go boom boom boom okay is it training yeah it's training 0:53:38.480,0:53:43.040 -all right cool cool cool back to the - -0:53:40.839,0:53:46.720 -presentation +all right cool cool cool back to the presentation 0:53:43.040,0:53:50.559 -i i know i'm sorry i'm going over time - -0:53:46.720,0:53:50.559 -what a bad teacher +i i know i'm sorry i'm going over time what a bad teacher 0:53:51.200,0:53:54.800 -okay so this is actually what we are - -0:53:52.720,0:53:57.440 -doing the dropout part right +okay so this is actually what we are doing the dropout part right 0:53:54.800,0:53:58.000 -okay cool cool all right so this is my - -0:53:57.440,0:53:59.760 -dropout +okay cool cool all right so this is my dropout 0:53:58.000,0:54:01.520 -and i mean i mean i am basically - -0:53:59.760,0:54:03.599 -multiplying these inputs and hidden +and i mean i mean i am basically multiplying these inputs and hidden 0:54:01.520,0:54:06.400 -layers with masks - -0:54:03.599,0:54:07.040 -here you just have like a network which +layers with masks here you just have like a network which 0:54:06.400,0:54:08.880 -is trying - 
-0:54:07.040,0:54:10.160 -is trying to train that you know uh +is trying is trying to train that you know uh 0:54:08.880,0:54:12.880 -prediction uh - -0:54:10.160,0:54:14.720 -that is weakly prediction is like a co2 +prediction uh that is weekly prediction is like a co2 0:54:12.880,0:54:17.440 -concentration level - -0:54:14.720,0:54:18.240 -uh if you use a gaussian kernel with a +concentration level uh if you use a gaussian process with a 0:54:17.440,0:54:20.800 -square - -0:54:18.240,0:54:21.920 -exponential kernel you can get you know +squared exponential kernel you can get you know 0:54:20.800,0:54:24.240 -after the - -0:54:21.920,0:54:26.720 -dashed line the network say that you +after the dashed line the network says that you 0:54:24.240,0:54:28.640 -know the the model says i have no clue - -0:54:26.720,0:54:30.480 -so i give you my prediction which is +know the the model says i have no clue so i give you my prediction which is 0:54:28.640,0:54:31.280 -zero but then this is my confidence - -0:54:30.480,0:54:32.880 -level +zero but then this is my confidence level 0:54:31.280,0:54:34.400 -can we do something similar with neural - -0:54:32.880,0:54:37.839 -nets yes we can +can we do something similar with neural nets yes we can 0:54:34.400,0:54:38.799 -so this is a uh uncertainty estimation - -0:54:37.839,0:54:40.880 -we're using the +so this is a uh uncertainty estimation we're using the 0:54:38.799,0:54:42.319 -reload non-linearity in the network and - -0:54:40.880,0:54:45.599 -this is instead +relu non-linearity in the network and this is instead 0:54:42.319,0:54:48.400 -using tanh which is is actually nothing - -0:54:45.599,0:54:49.920 -um if i'd like to do a binary +using tanh which is actually nothing um if i'd like to do a binary 0:54:48.400,0:54:52.880 -classification - -0:54:49.920,0:54:53.359 -in the first case are gonna be my logic +classification in the first case these are gonna be my logits 0:54:52.880,0:54:56.720 -uh - -0:54:53.359,0:54:57.280 -on
this section -3 to 2.5 is the +uh on this section -3 to 2.5 is the 0:54:56.720,0:55:01.119 -training - -0:54:57.280,0:55:04.319 -training training interval and then +training training training interval and then 0:55:01.119,0:55:06.000 -if i show you if i show my network uh if - -0:55:04.319,0:55:09.040 -i ask oh what is the prediction for +if i show you if i show my network uh if i ask oh what is the prediction for 0:55:06.000,0:55:09.839 -x hat no x star if i don't use any - -0:55:09.040,0:55:12.640 -uncertainty +x hat no x star if i don't use any uncertainty 0:55:09.839,0:55:13.040 -estimation you're gonna get a very high - -0:55:12.640,0:55:15.280 -value +estimation you're gonna get a very high value 0:55:13.040,0:55:16.319 -right which is corresponding to oh this - -0:55:15.280,0:55:18.880 -is uh +right which is corresponding to oh this is uh 0:55:16.319,0:55:20.000 -one so this is my one class if i just - -0:55:18.880,0:55:22.880 -use the the +one so this is my one class if i just use the the 0:55:20.000,0:55:24.799 -white big thick line instead if you use - -0:55:22.880,0:55:27.599 -this uncertainty estimation +white big thick line instead if you use this uncertainty estimation 0:55:24.799,0:55:28.960 -you get this network to get those logics - -0:55:27.599,0:55:32.319 -here with it kind of +you get this network to get those logics here with it kind of 0:55:28.960,0:55:34.799 -you know blur - -0:55:32.319,0:55:36.880 -foggy shadow and therefore if you apply +you know blur foggy shadow and therefore if you apply 0:55:34.799,0:55:39.119 -the sigmoid you get basically - -0:55:36.880,0:55:40.559 -that to flip down from zero to one right +the sigmoid you get basically that to flip down from zero to one right 0:55:39.119,0:55:43.680 -so you - -0:55:40.559,0:55:46.839 -no longer say it's one you can say +so you no longer say it's one you can say 0:55:43.680,0:55:48.000 -it's one with some specific probability - -0:55:46.839,0:55:51.839 -right +it's one with some 
specific probability right 0:55:48.000,0:55:53.440 -um and here i'm showing you a network - -0:55:51.839,0:55:54.079 -that is trying to it was trained on +um and here i'm showing you a network that is trying to it was trained on 0:55:53.440,0:55:56.079 -ammunite - -0:55:54.079,0:55:57.680 -and then you provide a one that is you +mnist and then you provide a one that is you 0:55:56.079,0:55:59.839 -know tilting - -0:55:57.680,0:56:00.880 -and then you can see that it begins with +know tilting and then you can see that it begins with 0:55:59.839,0:56:03.760 -having a high - -0:56:00.880,0:56:04.160 -value for the logits for the purple for +having a high value for the logits for the purple for 0:56:03.760,0:56:06.400 -the - -0:56:04.160,0:56:08.400 -for the one and then as you move across +the for the one and then as you move across 0:56:06.400,0:56:08.960 -it becomes like a five and then becomes - -0:56:08.400,0:56:11.839 -a seven +it becomes like a five and then becomes a seven 0:56:08.960,0:56:12.240 -because it looks like some part of the - -0:56:11.839,0:56:15.280 -one +because it looks like some part of the one 0:56:12.240,0:56:16.240 -like some part of the seven right and - -0:56:15.280,0:56:19.680 -these are the output +like some part of the seven right and these are the output 0:56:16.240,0:56:22.799 -after the uh soft arc max so you see - -0:56:19.680,0:56:25.920 -that uh you know after you tilt they get +after the uh soft argmax so you see that uh you know after you tilt they get 0:56:22.799,0:56:26.640 -very blur and very spread around so how - -0:56:25.920,0:56:28.799 -can we +very blurred and very spread around so how can we 0:56:26.640,0:56:30.079 -have something like that and this is the - -0:56:28.799,0:56:33.119 -other notebook +have something like that and this is the other notebook 0:56:30.079,0:56:36.799 -so we are done here with the - -0:56:33.119,0:56:38.640 -regularization let me give you the final +so we are done here with the regularization
let me give you the final 0:56:36.799,0:56:40.799 -thing so here we can see with the - -0:56:38.640,0:56:41.599 -dropout you always have the validation +thing so here we can see with the dropout you always have the validation 0:56:40.799,0:56:43.680 -and train - -0:56:41.599,0:56:44.880 -curves they are one on the other and +and train curves they are one on the other and 0:56:43.680,0:56:47.760 -then this was the - -0:56:44.880,0:56:48.480 -l2 regularization i can execute this +then this was the l2 regularization i can execute this 0:56:47.760,0:56:50.480 -other one - -0:56:48.480,0:56:52.400 -which shows you also that this is keep +other one which shows you also that this is keep 0:56:50.480,0:56:53.760 -increasing right so although the model - -0:56:52.400,0:56:56.000 -is over parameterized we are not +increasing right so although the model is over parameterized we are not 0:56:53.760,0:56:58.640 -overfitting which was the case - -0:56:56.000,0:56:59.599 -uh at the beginning finally here let's +overfitting which was the case uh at the beginning finally here let's 0:56:58.640,0:57:02.640 -store - -0:56:59.599,0:57:05.839 -these weights in the dropout version +store these weights in the dropout version 0:57:02.640,0:57:08.480 -okay so i save all of them - -0:57:05.839,0:57:09.440 -uh and so i can start showing you a few +okay so i save all of them uh and so i can start showing you a few 0:57:08.480,0:57:12.480 -things - -0:57:09.440,0:57:15.760 -um for example this one +things um for example this one 0:57:12.480,0:57:19.040 -let's see if it works boom - -0:57:15.760,0:57:22.400 -so here you can see that the red +let's see if it works boom so here you can see that the red 0:57:19.040,0:57:23.119 -are the l1 and the red one are basically - -0:57:22.400,0:57:26.000 -all +are the l1 and the red one are basically all 0:57:23.119,0:57:26.400 -in the center bam and all the other reds - -0:57:26.000,0:57:28.799 -are +in the center bam and all the other reds are 
0:57:26.400,0:57:30.240 -to zero right so n1 i just show you the - -0:57:28.799,0:57:31.680 -histogram of the weights +to zero right so l1 i just show you the histogram of the weights 0:57:30.240,0:57:34.079 -when i train the network with the l1 - -0:57:31.680,0:57:35.520 -regularizer you get all of these are +when i train the network with the l1 regularizer you get all of these are 0:57:34.079,0:57:38.720 -here - -0:57:35.520,0:57:40.319 -in the purple case you actually have +here in the purple case you actually have 0:57:38.720,0:57:42.240 -it looks like it's higher i'm not - -0:57:40.319,0:57:46.720 -entirely sure +it looks like it's higher i'm not entirely sure 0:57:42.240,0:57:49.680 -why you have a higher peak at zero in l2 - -0:57:46.720,0:57:51.680 -but then the purple one have some values +why you have a higher peak at zero in l2 but then the purple one has some values 0:57:49.680,0:57:53.440 -as well here in the tails - -0:57:51.680,0:57:55.040 -whereas if there is no regularization +as well here in the tails whereas if there is no regularization 0:57:53.440,0:57:56.480 -you get something that is you know - -0:57:55.040,0:58:00.319 -resembling a much +you get something that is you know resembling a much 0:57:56.480,0:58:02.640 -spread a much spread - -0:58:00.319,0:58:03.839 -gaussian right so you get values that +more spread gaussian right so you get values that 0:58:02.640,0:58:07.040 -are much much more - -0:58:03.839,0:58:09.760 -much larger okay instead the l1 +are much much larger okay instead the l1 0:58:07.040,0:58:10.240 -should be all towards you know very very - -0:58:09.760,0:58:12.640 -short +should be all towards you know very very short 0:58:10.240,0:58:14.400 -again i'm not sure why this purple is - -0:58:12.640,0:58:16.319 -taller than the the red here i think +again i'm not sure why this purple is taller than the the red here i think 0:58:14.400,0:58:19.520 -it's an issue - -0:58:16.319,0:58:23.440 -so this i i show
you the the the weights +it's an issue so this i i show you the the the weights 0:58:19.520,0:58:26.799 -we can show lastly last individual one - -0:58:23.440,0:58:29.760 -l1 so l1 all are here +we can show lastly last individual one l1 so l1 all are here 0:58:26.799,0:58:29.760 -and this is - -0:58:29.839,0:58:33.359 -these are instead the one with nothing +and this is these are instead the one with nothing 0:58:31.440,0:58:36.880 -right so these are - -0:58:33.359,0:58:39.680 -the one without the regularization +right so these are the one without the regularization 0:58:36.880,0:58:41.839 -and these are the one with the l1 - -0:58:39.680,0:58:44.559 -regularization +and these are the one with the l1 regularization 0:58:41.839,0:58:46.540 -we can also have more bins to have i bet - -0:58:44.559,0:58:48.839 -a better understanding of what's going +we can also have more bins to have i bet a better understanding of what's going 0:58:46.540,0:58:50.000 -[Music] - -0:58:48.839,0:58:53.440 -on +[Music] on 0:58:50.000,0:58:56.559 -okay see boom fantastic right - -0:58:53.440,0:59:02.079 -i can show you also the weights +okay see boom fantastic right i can show you also the weights 0:58:56.559,0:59:02.079 -l2 l2 - -0:59:02.240,0:59:06.240 -l2 and l1 oh you can tell no what's the +l2 l2 l2 and l1 oh you can tell no what's the 0:59:04.559,0:59:07.839 -difference - -0:59:06.240,0:59:11.280 -but again there are a hundred thousand a +difference but again there are a hundred thousand a 0:59:07.839,0:59:11.280 -hundred thousand uh - -0:59:12.160,0:59:17.760 -not entirely sure but in the point the +hundred thousand uh not entirely sure but in the point the 0:59:15.839,0:59:19.040 -point is that in the l1 in the l1 you - -0:59:17.760,0:59:22.319 -have so many more weights +point is that in the l1 in the l1 you have so many more weights 0:59:19.040,0:59:25.760 -a cluster at the zero - -0:59:22.319,0:59:28.480 -but there are a few larger weights +a cluster at the zero but there are 
a few larger weights 0:59:25.760,0:59:28.960 -in the l2 you have all the weights are - -0:59:28.480,0:59:31.119 -pretty +in the l2 you have all the weights are pretty 0:59:28.960,0:59:32.559 -small can you see right there is no - -0:59:31.119,0:59:35.440 -large weights +small can you see right there is no large weights 0:59:32.559,0:59:36.160 -so l1 doesn't shrink the weight l1 just - -0:59:35.440,0:59:38.240 -get them +so l1 doesn't shrink the weight l1 just get them 0:59:36.160,0:59:39.760 -towards zero okay that's why you had - -0:59:38.240,0:59:43.520 -this big guy here +towards zero okay that's why you had this big guy here 0:59:39.760,0:59:48.720 -boom okay - -0:59:43.520,0:59:52.160 -um finally i know i'm over time +boom okay um finally i know i'm over time 0:59:48.720,0:59:57.520 -the last notebook which is the - -0:59:52.160,1:00:01.200 -one that is computing the uncertainty +the last notebook which is the one that is computing the uncertainty 0:59:57.520,1:00:05.839 -uh through user usage of the - -1:00:01.200,1:00:05.839 -dropout right so kernel execute all +uh through user usage of the dropout right so kernel execute all 1:00:06.160,1:00:09.760 -uh where is it run all - -1:00:08.400,1:00:11.760 -[Music] +uh where is it run all [Music] 1:00:09.760,1:00:13.200 -so what are we doing here how do we - -1:00:11.760,1:00:15.680 -compute the uncertainty +so what are we doing here how do we compute the uncertainty 1:00:13.200,1:00:17.520 -in the previous uh in the in the in the - -1:00:15.680,1:00:20.319 -previous +in the previous uh in the in the in the previous 1:00:17.520,1:00:22.000 -uh in the previous lesson right in the - -1:00:20.319,1:00:24.400 -slides i just showed you +uh in the previous lesson right in the slides i just showed you 1:00:22.000,1:00:25.440 -so here we have some points i try to fit - -1:00:24.400,1:00:28.160 -them +so here we have some points i try to fit them 1:00:25.440,1:00:28.960 -with my network and you get something - 
-1:00:28.160,1:00:32.079 -like this +with my network and you get something like this 1:00:28.960,1:00:32.640 -can you tell me what network i used what - -1:00:32.079,1:00:35.280 -is the +can you tell me what network i used what is the 1:00:32.640,1:00:37.599 -uh where is the chat can you tell what - -1:00:35.280,1:00:37.599 -is the +uh where is the chat can you tell what is the 1:00:37.680,1:00:41.680 -non-linearity i used you should know - -1:00:40.079,1:00:44.799 -right +non-linearity i used you should know right 1:00:41.680,1:00:44.799 -you don't answer answer - -1:00:44.880,1:00:52.400 -okay um and so here yeah +you don't answer answer okay um and so here yeah 1:00:49.040,1:00:55.119 -and then here i show you how - -1:00:52.400,1:00:55.440 -this uncertainty looks okay so what is +and then here i show you how this uncertainty looks okay so what is 1:00:55.119,1:00:58.240 -this - -1:00:55.440,1:01:01.040 -this i'm using the uh the network with +this this i'm using the uh the network with 1:00:58.240,1:01:03.280 -the dropout and then i actually don't - -1:01:01.040,1:01:04.079 -use the evaluation mode i just use the +the dropout and then i actually don't use the evaluation mode i just use the 1:01:03.280,1:01:06.079 -training mode - -1:01:04.079,1:01:07.440 -such that the dropout is still on and +training mode such that the dropout is still on and 1:01:06.079,1:01:10.319 -then i compute the variance - -1:01:07.440,1:01:11.680 -of the predictions of the network by +then i compute the variance of the predictions of the network by 1:01:10.319,1:01:14.480 -sending multiple times - -1:01:11.680,1:01:15.040 -the data through okay so here you have +sending multiple times the data through okay so here you have 1:01:14.480,1:01:18.000 -range - -1:01:15.040,1:01:18.559 -in hundred you know i just provide 100 +range in hundred you know i just provide 100 1:01:18.000,1:01:22.160 -times - -1:01:18.559,1:01:24.160 -my data inside the network okay so this +times my data inside 
the network okay so this 1:01:22.160,1:01:26.319 -is a network with the relu - -1:01:24.160,1:01:27.920 -let me show you how a network with it +is a network with the relu let me show you how a network with it 1:01:26.319,1:01:31.040 -hyperbolic dungeon - -1:01:27.920,1:01:34.000 -works so oh yeah +hyperbolic tangent works so oh yeah 1:01:31.040,1:01:36.559 -let me kill this one so here i create - -1:01:34.000,1:01:36.559 -the network +let me kill this one so here i create the network 1:01:37.680,1:01:41.040 -and this is the network train with the - -1:01:39.760,1:01:44.240 -hyperbolic tangent +and this is the network trained with the hyperbolic tangent 1:01:41.040,1:01:44.720 -such it's much nicer right and then i - -1:01:44.240,1:01:47.040 -show you +such it's much nicer right and then i show you 1:01:44.720,1:01:48.079 -the network is in train mode right but - -1:01:47.040,1:01:50.559 -then i i feed +the network is in train mode right but then i i feed 1:01:48.079,1:01:51.359 -several times i feed 100 times my data - -1:01:50.559,1:01:54.240 -points +several times i feed 100 times my data points 1:01:51.359,1:01:56.000 -inside and then i evaluate the mean you - -1:01:54.240,1:01:58.880 -can see now +inside and then i evaluate the mean you can see now 1:01:56.000,1:02:01.119 -that the network mean the network - -1:01:58.880,1:02:04.240 -outputs a uncertainty which is constant +that the network mean the network outputs an uncertainty which is constant 1:02:01.119,1:02:06.960 -even if you move outside this - -1:02:04.240,1:02:07.920 -interval which was the region where the +even if you move outside this interval which was the region where the 1:02:06.960,1:02:10.000 -training data - -1:02:07.920,1:02:12.000 -were coming so you can see now that +training data were coming so you can see now that 1:02:10.000,1:02:12.880 -these uncertainty estimation are a bit - -1:02:12.000,1:02:15.359 -you know funky +these uncertainty estimations are a bit you know funky
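The trick described here — keep the network in train mode so dropout stays on, feed the same data many times, and read the variance of the predictions as an uncertainty estimate — can be sketched in PyTorch roughly as follows. This is an illustrative sketch, not the course notebook: the layer sizes, dropout rate, and input range are assumptions made up for the example.

```python
import torch
import torch.nn as nn

# Monte Carlo dropout sketch: dropout stays active at prediction time,
# so repeated forward passes are stochastic and their spread measures
# how far an input is from the training region.
torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(1, 64),
    nn.Tanh(),        # hyperbolic tangent, as in the second example
    nn.Dropout(0.5),  # assumed rate for the sketch
    nn.Linear(64, 1),
)

net.train()  # train mode keeps dropout ON (net.eval() would switch it off)

x = torch.linspace(-3, 3, 200).unsqueeze(1)  # 200 inputs, one feature each
with torch.no_grad():
    # "range in hundred": send the data through the stochastic net 100 times
    preds = torch.stack([net(x) for _ in range(100)])

mean = preds.mean(dim=0)  # point estimate per input
var = preds.var(dim=0)    # per-input uncertainty estimate
```

For the gradient-descent trick mentioned just below (minimizing the variance to move toward the training region), one would drop the `no_grad` context and set `x.requires_grad_(True)` so the variance stays differentiable with respect to the input.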
1:02:12.880,1:02:17.119 -as in different activation functions - -1:02:15.359,1:02:19.039 -give you different kind of estimation +as in different activation functions give you different kind of estimation 1:02:17.119,1:02:22.559 -they are not even calibrated - -1:02:19.039,1:02:24.079 -nevertheless you have the uncertainty +they are not even calibrated nevertheless you have the uncertainty 1:02:22.559,1:02:26.240 -close to the data points - -1:02:24.079,1:02:28.240 -it's very very very tiny right so you +close to the data points it's very very very tiny right so you 1:02:26.240,1:02:31.039 -can tell how far you are - -1:02:28.240,1:02:32.000 -from the training region and we use this +can tell how far you are from the training region and we use this 1:02:31.039,1:02:35.520 -this trick here - -1:02:32.000,1:02:38.079 -this this this part in order to +this trick here this this this part in order to 1:02:35.520,1:02:39.039 -so again this variance here is like it's - -1:02:38.079,1:02:40.880 -a it's a +so again this variance here is like it's a it's a 1:02:39.039,1:02:42.799 -differentiable function and so you can - -1:02:40.880,1:02:44.960 -run gradient descent +differentiable function and so you can run gradient descent 1:02:42.799,1:02:46.000 -right in this in order to minimize the - -1:02:44.960,1:02:48.240 -variance +right in this in order to minimize the variance 1:02:46.000,1:02:49.920 -and this would allow you to move towards - -1:02:48.240,1:02:53.520 -the region +and this would allow you to move towards the region 1:02:49.920,1:02:53.839 -where the uh where the uh data points - -1:02:53.520,1:02:56.000 -where +where the uh where the uh data points where 1:02:53.839,1:02:58.079 -basically the the training region this - -1:02:56.000,1:03:00.960 -this is what we use for the +basically the the training region this this is what we use for the 1:02:58.079,1:03:02.319 -our policy right in our uh driving - -1:03:00.960,1:03:05.440 -scenario +our policy right in our uh 
driving scenario 1:03:02.319,1:03:08.079 -so oh - -1:03:05.440,1:03:09.920 -that was it right uh we reached the end +so oh that was it right uh we reached the end 1:03:08.079,1:03:13.440 -of the class the end of the semester - -1:03:09.920,1:03:16.720 -uh it was such a great honor to be +of the class the end of the semester uh it was such a great honor to be 1:03:13.440,1:03:18.880 -your teacher for this semester i - -1:03:16.720,1:03:20.160 -screw up a little bit maybe halfway +your teacher for this semester i screw up a little bit maybe halfway 1:03:18.880,1:03:22.720 -through - -1:03:20.160,1:03:23.520 -thank you for you know helping me +through thank you for you know helping me 1:03:22.720,1:03:27.280 -getting back - -1:03:23.520,1:03:28.799 -uh on my feet uh +getting back uh on my feet uh 1:03:27.280,1:03:30.400 -if you need anything right really - -1:03:28.799,1:03:34.319 -anything just let me know i +if you need anything right really anything just let me know i 1:03:30.400,1:03:34.720 -i'm always open to discuss and help out - -1:03:34.319,1:03:37.280 -and +i'm always open to discuss and help out and 1:03:34.720,1:03:38.319 -explain and again as i told you before - -1:03:37.280,1:03:40.559 -we can even think +explain and again as i told you before we can even think 1:03:38.319,1:03:42.240 -to have one more extra lesson in a month - -1:03:40.559,1:03:45.440 -time if you want +to have one more extra lesson in a month time if you want 1:03:42.240,1:03:49.520 -the same way zoom and whatever uh - -1:03:45.440,1:03:52.000 -we about the energy based models um +the same way zoom and whatever uh we about the energy based models um 1:03:49.520,1:03:53.440 -again if you have any question about all - -1:03:52.000,1:03:56.319 -any of the lessons you can +again if you have any question about all any of the lessons you can 1:03:53.440,1:03:57.520 -write on youtube in the comments below i - -1:03:56.319,1:04:00.000 -will answer +write on youtube in the comments below i will 
answer 1:03:57.520,1:04:01.200 -if you have like specific uh if you are - -1:04:00.000,1:04:03.680 -interested in making +if you have like specific uh if you are interested in making 1:04:01.200,1:04:04.640 -drawings and visualization uh you can - -1:04:03.680,1:04:06.240 -always +drawings and visualization uh you can always 1:04:04.640,1:04:08.079 -actually should talk to me because i'm - -1:04:06.240,1:04:10.400 -actually uh +actually should talk to me because i'm actually uh 1:04:08.079,1:04:12.079 -creating a group for visualizing machine - -1:04:10.400,1:04:16.720 -learning stuff +creating a group for visualizing machine learning stuff 1:04:12.079,1:04:18.960 -um and we have the website we have - -1:04:16.720,1:04:21.119 -plenty of things to do english has to be +um and we have the website we have plenty of things to do english has to be 1:04:18.960,1:04:24.319 -fixed in many of the - -1:04:21.119,1:04:27.280 -uh in many of the of the of the +fixed in many of the uh in many of the of the of the 1:04:24.319,1:04:29.039 -contributions some math is broken and - -1:04:27.280,1:04:32.319 -you know there is plenty of +contributions some math is broken and you know there is plenty of 1:04:29.039,1:04:33.680 -things uh open source things to do if - -1:04:32.319,1:04:37.200 -you are +things uh open source things to do if you are 1:04:33.680,1:04:37.200 -inclined if you are interested and - -1:04:37.280,1:04:44.960 -and yeah i think pretty much that's it +inclined if you are interested and and yeah i think pretty much that's it 1:04:41.440,1:04:47.039 -um i i'll see you next monday right - -1:04:44.960,1:04:49.359 -again you should submit the three video +um i i'll see you next monday right again you should submit the three video 1:04:47.039,1:04:51.039 -presentation i made a um - -1:04:49.359,1:04:54.079 -i made a tutorial about how to make a +presentation i made a um i made a tutorial about how to make a 1:04:51.039,1:04:56.079 -presentation if you like how i teach 
and - -1:04:54.079,1:04:58.319 -you may want to hear my opinion about +presentation if you like how i teach and you may want to hear my opinion about 1:04:56.079,1:05:01.839 -how you should present your work - -1:04:58.319,1:05:05.760 -uh it's on again on youtube +how you should present your work uh it's on again on youtube 1:05:01.839,1:05:09.039 -and yeah i think that's it all right - -1:05:05.760,1:05:12.799 -so again thank you so much and +and yeah i think that's it all right so again thank you so much and 1:05:09.039,1:05:13.920 -i can't wait to see all your results for - -1:05:12.799,1:05:16.010 -the +i can't wait to see all your results for the 1:05:13.920,1:05:17.440 -for for the project um - -1:05:16.010,1:05:22.559 -[Music] +for for the project um [Music] 1:05:17.440,1:05:25.680 -see you on monday good luck bye - -1:05:22.559,1:05:28.319 -about the class ah [ __ ] there was one +see you on monday good luck bye about the class ah [ __ ] there was one 1:05:25.680,1:05:28.319 -more notebook - -1:05:28.480,1:05:31.680 -damn okay +more notebook damn okay 1:05:32.160,1:05:36.640 -okay let me ah okay i can't go over i'm - -1:05:34.880,1:05:37.599 -too late right in the extra and there is +okay let me ah okay i can't go over i'm too late right in the extra and there is 1:05:36.640,1:05:41.200 -one more notebook - -1:05:37.599,1:05:44.720 -i wanted to talk about which is the +one more notebook i wanted to talk about which is the 1:05:41.200,1:05:47.839 -so this is the projection notebook yeah - -1:05:44.720,1:05:47.839 -okay so +so this is the projection notebook yeah okay so 1:05:48.240,1:05:52.799 -ah okay maybe we can do an extra lesson - -1:05:50.640,1:05:53.839 -with the projection uh and i talk about +ah okay maybe we can do an extra lesson with the projection uh and i talk about 1:05:52.799,1:05:57.039 -this next week - -1:05:53.839,1:06:01.839 -up to you guys more questions i know i +this next week up to you guys more questions i know i 
1:05:57.039,1:06:01.839 -it's late and uh there was this notebook - -1:06:02.839,1:06:06.079 -it's +it's late and uh there was this notebook it's 1:06:04.160,1:06:08.319 -okay yeah you know i want to be teaching - -1:06:06.079,1:06:08.319 -more +okay yeah you know i want to be teaching more 1:06:09.280,1:06:14.079 -okay no no question there is a question - -1:06:12.240,1:06:16.559 -google uses +okay no no question there is a question google uses 1:06:14.079,1:06:17.680 -visor to select either parameters for - -1:06:16.559,1:06:20.720 -its neural for +vizier to select hyperparameters for its neural for 1:06:17.680,1:06:22.480 -its networks those tend to be either - -1:06:20.720,1:06:24.799 -random search or gaussian process for +its networks those tend to be either random search or gaussian process for 1:06:22.480,1:06:27.920 -hyper parameter optimize exactly - -1:06:24.799,1:06:29.359 -uh yeah but i haven't worked like i +hyperparameter optimization exactly uh yeah but i haven't worked like i 1:06:27.920,1:06:32.720 -haven't tried them out so i can't - -1:06:29.359,1:06:36.319 -really give you a opinion so i +haven't tried them out so i can't really give you an opinion so i 1:06:32.720,1:06:36.319 -i know they exist but i'm not - -1:06:36.400,1:06:40.559 -i don't exactly know everything yet +i know they exist but i'm not i don't exactly know everything yet 1:06:41.839,1:06:48.720 -okay uh i think that's it right - -1:06:45.280,1:06:51.839 -okay so see you monday thanks yeah +okay uh i think that's it right okay so see you monday thanks yeah 1:06:48.720,1:06:53.839 -of course boy post - -1:06:51.839,1:06:55.680 -a lasagna oh i put the i put the lemon +of course boy post a lasagna oh i put the i put the lemon 1:06:53.839,1:06:57.280 -cake - -1:06:55.680,1:06:59.280 -right keep the teaching going yeah +cake right keep the teaching going yeah 1:06:57.280,1:07:01.359 -that's for sure - -1:06:59.280,1:07:02.720 -i think we are there jan is teaching +that's for sure i think
we are there jan is teaching 1:07:01.359,1:07:06.079 -also in the in the fall - -1:07:02.720,1:07:08.160 -actually jan and jung are pairing up +also in the in the fall actually jan and jung are pairing up 1:07:06.079,1:07:10.000 -and they are teaching in the fall and i - -1:07:08.160,1:07:12.799 -will be also teaching the labs +and they are teaching in the fall and i will be also teaching the labs 1:07:10.000,1:07:14.079 -but i don't know we haven't yet - -1:07:12.799,1:07:17.839 -discussed the content +but i don't know we haven't yet discussed the content 1:07:14.079,1:07:18.319 -i'm like oh boy more teaching but it's - -1:07:17.839,1:07:22.559 -fun +i'm like oh boy more teaching but it's fun 1:07:18.319,1:07:24.880 -but okay - -1:07:22.559,1:07:24.880 -bye +but okay bye 1:07:26.030,1:07:31.039 -[Music] - -1:07:27.520,1:07:34.720 -okay so i think +[Music] okay so i think 1:07:31.039,1:07:36.000 -that was it for today unless there are - -1:07:34.720,1:07:38.880 -some questions for me +that was it for today unless there are some questions for me 1:07:36.000,1:07:39.520 -for jan uh i know you send me emails i - -1:07:38.880,1:07:42.799 -have +for jan uh i know you send me emails i have 1:07:39.520,1:07:43.119 -a few i think a few hundred emails from - -1:07:42.799,1:07:46.799 -you +a few i think a few hundred emails from you 1:07:43.119,1:07:49.440 -i will answer uh - -1:07:46.799,1:07:50.960 -i will answer don't worry uh don't don't +i will answer uh i will answer don't worry uh don't don't 1:07:49.440,1:07:52.319 -don't worry too much we can figure out - -1:07:50.960,1:07:53.440 -what's happening right don't don't freak +don't worry too much we can figure out what's happening right don't don't freak 1:07:52.319,1:07:55.359 -out - -1:07:53.440,1:07:57.119 -as i told you before we can have an +out as i told you before we can have an 1:07:55.359,1:07:59.760 -extra lesson in one month - -1:07:57.119,1:08:00.400 -for the energy based models uh whenever +extra lesson in 
one month for the energy based models uh whenever 1:07:59.760,1:08:03.520 -i'm done - -1:08:00.400,1:08:04.000 -preparing it uh again this is like up to +i'm done preparing it uh again this is like up to 1:08:03.520,1:08:06.319 -you - -1:08:04.000,1:08:08.079 -voluntary it's not it's completely off +you voluntary it's not it's completely off 1:08:06.319,1:08:10.000 -class right it's like - -1:08:08.079,1:08:11.520 -i was thinking that it makes sense since +class right it's like i was thinking that it makes sense since 1:08:10.000,1:08:13.280 -someone asked to - -1:08:11.520,1:08:15.520 -create like a lab for the energy based +someone asked to create like a lab for the energy based 1:08:13.280,1:08:18.880 -model and i said yes well i - -1:08:15.520,1:08:20.960 -i always keep my word so uh i didn't +model and i said yes well i i always keep my word so uh i didn't 1:08:18.880,1:08:24.080 -manage to do it on time but you know - -1:08:20.960,1:08:27.520 -i will do i will work for this +manage to do it on time but you know i will do i will work for this 1:08:24.080,1:08:27.520 -um questions - -1:08:28.080,1:08:31.600 -nope all right so it was has been an +um questions nope all right so it was has been an 1:08:30.319,1:08:35.279 -honor uh seriously - -1:08:31.600,1:08:37.600 -i i loved being uh been teaching +honor uh seriously i i loved being uh been teaching 1:08:35.279,1:08:38.480 -to you this semester uh you had so many - -1:08:37.600,1:08:40.080 -questions and +to you this semester uh you had so many questions and 1:08:38.480,1:08:41.920 -especially when we switched to this - -1:08:40.080,1:08:45.679 -online format +especially when we switched to this online format 1:08:41.920,1:08:47.600 -i think i personally loved it right so - -1:08:45.679,1:08:49.920 -at least in my opinion before we had jan +i think i personally loved it right so at least in my opinion before we had jan 1:08:47.600,1:08:53.440 -lecturing and maybe you are a bit shy - -1:08:49.920,1:08:55.839 -uh 
i'm not shy i mean i i don't care +lecturing and maybe you are a bit shy uh i'm not shy i mean i i don't care 1:08:53.440,1:08:58.239 -so i i think this format where you write - -1:08:55.839,1:09:01.120 -questions and i just read out whatever +so i i think this format where you write questions and i just read out whatever 1:08:58.239,1:09:02.640 -uh it's in your mind uh it really worked - -1:09:01.120,1:09:04.719 -well in terms of +uh it's in your mind uh it really worked well in terms of 1:09:02.640,1:09:06.159 -you know figuring out what are those - -1:09:04.719,1:09:09.440 -aspects that are at least +you know figuring out what are those aspects that are at least 1:09:06.159,1:09:12.560 -a little bit uh harder to uh to to - -1:09:09.440,1:09:15.040 -to to catch right uh because again we +a little bit uh harder to uh to to to to catch right uh because again we 1:09:12.560,1:09:16.080 -we may not be able to figure out what is - -1:09:15.040,1:09:19.520 -the part that is +we may not be able to figure out what is the part that is 1:09:16.080,1:09:21.520 -less um less clear - -1:09:19.520,1:09:23.520 -maybe because we've been talking about +less um less clear maybe because we've been talking about 1:09:21.520,1:09:25.359 -these things for a while now - -1:09:23.520,1:09:26.719 -so again i think if you write those +these things for a while now so again i think if you write those 1:09:25.359,1:09:28.480 -questions i read them - -1:09:26.719,1:09:30.799 -and we have like a speaker we have like +questions i read them and we have like a speaker we have like 1:09:28.480,1:09:32.799 -some kind of conversation - -1:09:30.799,1:09:34.159 -presentation it's much more effective in +some kind of conversation presentation it's much more effective in 1:09:32.799,1:09:37.359 -terms of - -1:09:34.159,1:09:39.359 -content and delivery right yeah i want +terms of content and delivery right yeah i want 1:09:37.359,1:09:40.719 -to echo what alfredo said it was a - 
-1:09:39.359,1:09:42.880 -it was a pleasure teaching the class as +to echo what alfredo said it was a it was a pleasure teaching the class as 1:09:40.719,1:09:46.880 -well you know despite the circumstances - -1:09:42.880,1:09:48.000 -and uh um you know i'm very thankful to +well you know despite the circumstances and uh um you know i'm very thankful to 1:09:46.880,1:09:49.759 -alfredo i think - -1:09:48.000,1:09:52.159 -you know he's putting his heart into +alfredo i think you know he's putting his heart into 1:09:49.759,1:09:55.280 -this as you can tell - -1:09:52.159,1:09:57.679 -and um and and +this as you can tell and um and and 1:09:55.280,1:10:00.840 -you know i'm i'm i'm really i'm really - -1:09:57.679,1:10:03.520 -thankful for for him to do all this job +you know i'm i'm i'm really i'm really thankful for for him to do all this job 1:10:00.840,1:10:05.600 -um because i think it uh it makes a huge - -1:10:03.520,1:10:08.719 -difference in terms of the +um because i think it uh it makes a huge difference in terms of the 1:10:05.600,1:10:10.320 -uh usefulness of the class and um so - -1:10:08.719,1:10:12.320 -thank you alfredo +uh usefulness of the class and um so thank you alfredo 1:10:10.320,1:10:14.159 -thank you and jacquin right justin made - -1:10:12.320,1:10:16.400 -the whole the challenge +thank you and jacquin right justin made the whole the challenge 1:10:14.159,1:10:17.280 -actually did a huge amount oh my god - -1:10:16.400,1:10:19.199 -this last month +actually did a huge amount oh my god this last month 1:10:17.280,1:10:21.040 -that's the biggest competition possible - -1:10:19.199,1:10:24.159 -to put together the data +that's the biggest competition possible to put together the data 1:10:21.040,1:10:26.400 -the basic code the data loader uh - -1:10:24.159,1:10:27.199 -this was i mean he worked on this for +the basic code the data loader uh this was i mean he worked on this for 1:10:26.400,1:10:29.679 -you know a lot - -1:10:27.199,1:10:30.640 
-for the last few months and and then you +you know a lot for the last few months and and then you 1:10:29.679,1:10:32.960 -know gathering - -1:10:30.640,1:10:34.159 -gathering all the other results so thank +know gathering gathering all the other results so thank 1:10:32.960,1:10:36.159 -you - -1:10:34.159,1:10:37.360 -yeah i think it's been two months now +you yeah i think it's been two months now 1:10:36.159,1:10:40.560 -he's been working - -1:10:37.360,1:10:42.880 -on this stuff all right guys +he's been working on this stuff all right guys 1:10:40.560,1:10:44.080 -thank you you always get me uh you know - -1:10:42.880,1:10:47.440 -just tweet me +thank you you always get me uh you know just tweet me 1:10:44.080,1:10:49.760 -i answer every time uh uh anything you - -1:10:47.440,1:10:52.000 -need you know you can find me my door +i answer every time uh uh anything you need you know you can find me my door 1:10:49.760,1:10:54.880 -is always open uh or in the office or - -1:10:52.000,1:10:57.679 -here on on zoom right so +is always open uh or in the office or here on on zoom right so 1:10:54.880,1:10:59.920 -as alfredo said this this project uh we - -1:10:57.679,1:11:03.440 -have this uh autonomous driving project +as alfredo said this this project uh we have this uh autonomous driving project 1:10:59.920,1:11:05.600 -and uh you know uh we need all the help - -1:11:03.440,1:11:07.040 -we can get with this so if you +and uh you know uh we need all the help we can get with this so if you 1:11:05.600,1:11:09.199 -are in some of the top teams and you are - -1:11:07.040,1:11:11.920 -interested in participating uh +are in some of the top teams and you are interested in participating uh 1:11:09.199,1:11:12.640 -get in touch with alfredo and you know - -1:11:11.920,1:11:14.960 -you could +get in touch with alfredo and you know you could 1:11:12.640,1:11:17.040 -work on this during the summer or or - -1:11:14.960,1:11:20.880 -perhaps beyond +work on this during the summer 
or or perhaps beyond 1:11:17.040,1:11:20.880 -all right all right um goodbye guys - -1:11:22.000,1:11:27.840 -all right okay bye bye guys - -1:11:25.440,1:11:27.840 -bye - +all right all right um goodbye guys all right okay bye bye guys diff --git a/docs/en/week15/practicum15A.sbv b/docs/en/week15/practicum15A.sbv index 8d11aa945..d81e370a4 100644 --- a/docs/en/week15/practicum15A.sbv +++ b/docs/en/week15/practicum15A.sbv @@ -1,4599 +1,2297 @@ 0:00:02.720,0:00:06.000 -all right all right all right - -0:00:04.240,0:00:08.080 -so today we're gonna be talking again +all right all right all right so today we're gonna be talking again 0:00:06.000,0:00:10.240 -about foundations of deep learning - -0:00:08.080,0:00:11.040 -that's me alfredo and you can find me on +about foundations of deep learning that's me alfredo and you can find me on 0:00:10.240,0:00:13.840 -twitter - -0:00:11.040,0:00:15.679 -on the handle alfcnz actually if you +twitter on the handle alfcnz actually if you 0:00:13.840,0:00:17.840 -check twitter you can find some - -0:00:15.679,0:00:18.960 -you could find some news about today's +check twitter you can find some you could find some news about today's 0:00:17.840,0:00:21.359 -lesson - -0:00:18.960,0:00:22.560 -since i posted online like yesterday +lesson since i posted online like yesterday 0:00:21.359,0:00:25.359 -night - -0:00:22.560,0:00:27.199 -so the deal is always the same as soon +night so the deal is always the same as soon 0:00:25.359,0:00:28.720 -as you don't understand as - -0:00:27.199,0:00:30.400 -soon as i don't make sense since i +as you don't understand as soon as i don't make sense since i 0:00:28.720,0:00:32.239 -didn't sleep and i've been working on - -0:00:30.400,0:00:34.160 -this stuff for the last 30 hours +didn't sleep and i've been working on this stuff for the last 30 hours 0:00:32.239,0:00:36.079 -it's very likely i'm not gonna be making - -0:00:34.160,0:00:38.399 -much sense at some times +it's very likely i'm not gonna be 
making much sense at some times 0:00:36.079,0:00:39.520 -so every time something is not clear - -0:00:38.399,0:00:41.920 -just stop me +so every time something is not clear just stop me 0:00:39.520,0:00:42.559 -ask me anything because again if we keep - -0:00:41.920,0:00:44.320 -going +ask me anything because again if we keep going 0:00:42.559,0:00:47.360 -and you're not following then we are not - -0:00:44.320,0:00:47.360 -going anywhere okay +and you're not following then we are not going anywhere okay 0:00:53.520,0:00:56.879 -all right so today we're going to be - -0:00:55.120,0:00:59.520 -talking about inference +all right so today we're going to be talking about inference 0:00:56.879,0:01:01.120 -for latent variable energy-based models - -0:00:59.520,0:01:04.799 -ebns +for latent variable energy-based models ebms 0:01:01.120,0:01:07.920 -for example the ellipse likewise - -0:01:04.799,0:01:10.640 -we have cover only inference only +for example the ellipse likewise we have covered only inference only 0:01:07.920,0:01:12.320 -inference in our first lab - -0:01:10.640,0:01:14.799 -today we're gonna be only covering +inference in our first lab today we're gonna be only covering 0:01:12.320,0:01:18.080 -inference for energy based models - -0:01:14.799,0:01:19.040 -i will not say the word training ever +inference for energy based models i will not say the word training ever 0:01:18.080,0:01:21.680 -again - -0:01:19.040,0:01:22.560 -okay i'll try at least so today we're +again okay i'll try at least so today we're 0:01:21.680,0:01:25.280 -gonna be talking about - -0:01:22.560,0:01:26.159 -inference what is this stuff and where +gonna be talking about inference what is this stuff and where 0:01:25.280,0:01:28.080 -do we start - -0:01:26.159,0:01:29.360 -we're going to be starting from our +do we start we're going to be starting from our 0:01:28.080,0:01:30.960 -training examples - -0:01:29.360,0:01:32.880 -training training samples i said i +training examples training
training samples i said i 0:01:30.960,0:01:34.880 -wasn't going to say this word - -0:01:32.880,0:01:36.560 -all right so let's see what kind of data +wasn't going to say this word all right so let's see what kind of data 0:01:34.880,0:01:38.799 -we're going to be working on and why we - -0:01:36.560,0:01:42.240 -need these energy based models +we're going to be working on and why we need these energy based models 0:01:38.799,0:01:45.920 -so we can think about our data why - -0:01:42.240,0:01:49.200 -bold y is uh it's been living +so we can think about our data why bold y is uh it's been living 0:01:45.920,0:01:52.000 -having two components y one y two - -0:01:49.200,0:01:52.960 -y one is gonna be this row one function +having two components y one y two y one is gonna be this rho one function 0:01:52.000,0:01:55.840 -of x - -0:01:52.960,0:01:57.439 -which is going to be my input multiplied +of x which is going to be my input multiplied 0:01:55.840,0:01:59.920 -by the cosine of theta - -0:01:57.439,0:02:00.479 -which is some you know angle we don't +by the cosine of theta which is some you know angle we don't 0:01:59.920,0:02:03.040 -know - -0:02:00.479,0:02:04.960 -plus some epsilon noise and then row 2 +know plus some epsilon noise and then rho 2 0:02:03.040,0:02:07.680 -is going to be again a function of my - -0:02:04.960,0:02:08.080 -input x and then it's multiplied by a +is going to be again a function of my input x and then it's multiplied by a 0:02:07.680,0:02:11.360 -sine - -0:02:08.080,0:02:14.480 -of this theta we have no axis plus some +sine of this theta we have no axis plus some 0:02:11.360,0:02:18.480 -uh noise epsilon - -0:02:14.480,0:02:21.840 -rho is a function that maps the input +uh noise epsilon rho is a function that maps the input 0:02:18.480,0:02:22.560 -one dimensional r into a r2 and so it's - -0:02:21.840,0:02:25.440 -mapping my +one dimensional r into a r2 and so it's mapping my 0:02:22.560,0:02:26.480 -x into something that is this alpha
x - -0:02:25.440,0:02:28.480 -plus beta +x into something that is this alpha x plus beta 0:02:26.480,0:02:29.840 -1 minus x for the first component and - -0:02:28.480,0:02:33.680 -the other is beta +1 minus x for the first component and the other is beta 0:02:29.840,0:02:36.239 -times x plus alpha multiply 1 minus x - -0:02:33.680,0:02:38.720 -and and then everything is multiplied by +times x plus alpha multiply 1 minus x and and then everything is multiplied by 0:02:36.239,0:02:41.840 -this exponential - -0:02:38.720,0:02:45.440 -of x so alpha and beta +this exponential of x so alpha and beta 0:02:41.840,0:02:45.760 -are simply 1.5 and 2 so this is simply - -0:02:45.440,0:02:50.319 -the +are simply 1.5 and 2 so this is simply the 0:02:45.760,0:02:53.760 -equation for a ellipse but then if x - -0:02:50.319,0:02:56.640 -goes from zero to one as i show you here +equation for a ellipse but then if x goes from zero to one as i show you here 0:02:53.760,0:02:57.840 -you're gonna have that this is gonna be - -0:02:56.640,0:03:00.400 -drawing +you're gonna have that this is gonna be drawing 0:02:57.840,0:03:02.000 -some sort of horn that is exponentially - -0:03:00.400,0:03:03.840 -no in the profile +some sort of horn that is exponentially no in the profile 0:03:02.000,0:03:05.280 -and then it starts as like something - -0:03:03.840,0:03:07.360 -like this and then eventually +and then it starts as like something like this and then eventually 0:03:05.280,0:03:09.440 -like horizontal ellipse and eventually - -0:03:07.360,0:03:12.720 -end up as a vertical ellipse +like horizontal ellipse and eventually end up as a vertical ellipse 0:03:09.440,0:03:15.840 -okay x here is going to be sample from - -0:03:12.720,0:03:18.400 -the uniform distribution +okay x here is going to be sample from the uniform distribution 0:03:15.840,0:03:22.640 -similarly theta is also sampled from the - -0:03:18.400,0:03:25.040 -uniform distribution from 0 to 2 pi +similarly theta is also sampled from 
the uniform distribution from 0 to 2 pi 0:03:22.640,0:03:26.239 -epsilon instead is sampled from a normal - -0:03:25.040,0:03:28.159 -distribution +epsilon instead is sampled from a normal distribution 0:03:26.239,0:03:31.680 -with mean 0 and then a standard - -0:03:28.159,0:03:34.319 -deviation of 1 over 20. +with mean 0 and then a standard deviation of 1 over 20. 0:03:31.680,0:03:34.879 -so again as you might have seen from - -0:03:34.319,0:03:37.440 -twitter +so again as you might have seen from twitter 0:03:34.879,0:03:38.159 -this stuff looks pretty cool and it - -0:03:37.440,0:03:40.959 -looks like +this stuff looks pretty cool and it looks like 0:03:38.159,0:03:42.720 -that but then since we have magic on - -0:03:40.959,0:03:45.200 -this side we can do this +that but then since we have magic on this side we can do this 0:03:42.720,0:03:47.280 -and so you can see here how we're gonna - -0:03:45.200,0:03:50.879 -be having this exponential +and so you can see here how we're gonna be having this exponential 0:03:47.280,0:03:53.840 -uh side right this exponential envelope - -0:03:50.879,0:03:55.680 -we start with the uh ellipse that is +uh side right this exponential envelope we start with the uh ellipse that is 0:03:53.840,0:03:57.360 -like vertical and then we end up with - -0:03:55.680,0:04:01.040 -this horizontal one +like vertical and then we end up with this horizontal one 0:03:57.360,0:04:04.080 -okay um - -0:04:01.040,0:04:07.280 -what we want to pay attention here +okay um what we want to pay attention here 0:04:04.080,0:04:10.480 -is that at a given specific location - -0:04:07.280,0:04:13.760 -x there is no +is that at a given specific location x there is no 0:04:10.480,0:04:16.880 -one y only right so we cannot really - -0:04:13.760,0:04:18.959 -train a neural net that is like a +one y only right so we cannot really train a neural net that is like a 0:04:16.880,0:04:20.799 -vector to vector mapping because there - -0:04:18.959,0:04:23.199 -is no 
vector to map +vector to vector mapping because there is no vector to map 0:04:20.799,0:04:24.320 -well there is a bunch of vectors right - -0:04:23.199,0:04:27.919 -so given one +well there is a bunch of vectors right so given one 0:04:24.320,0:04:28.639 -single input x there are many many many - -0:04:27.919,0:04:31.520 -possible +single input x there are many many many possible 0:04:28.639,0:04:32.160 -y's there is like a whole uh ellipse - -0:04:31.520,0:04:35.600 -right +y's there is like a whole uh ellipse right 0:04:32.160,0:04:38.080 -uh per given x so we can't really use - -0:04:35.600,0:04:38.960 -normal uh feed forward neural net to do +uh per given x so we can't really use normal uh feed forward neural net to do 0:04:38.080,0:04:41.199 -this - -0:04:38.960,0:04:42.479 -uh similarly if we are just talking +this uh similarly if we are just talking 0:04:41.199,0:04:45.360 -about y's - -0:04:42.479,0:04:45.759 -given one value of y one i cannot even +about y's given one value of y one i cannot even 0:04:45.360,0:04:48.240 -tell - -0:04:45.759,0:04:49.120 -what is the other corresponding y two +tell what is the other corresponding y two 0:04:48.240,0:04:51.680 -because there are - -0:04:49.120,0:04:54.080 -always two almost always two values for +because there are always two almost always two values for 0:04:51.680,0:04:57.120 -y two given one y one right - -0:04:54.080,0:04:59.759 -and so using vectors to vectors mapping +y two given one y one right and so using vectors to vectors mapping 0:04:57.120,0:05:00.320 -as we've been learning so far is not - -0:04:59.759,0:05:02.400 -quite +as we've been learning so far is not quite 0:05:00.320,0:05:04.400 -uh sufficient so today we're going to be - -0:05:02.400,0:05:06.400 -figuring out how to use these latent +uh sufficient so today we're going to be figuring out how to use these latent 0:05:04.400,0:05:07.600 -variable energy-based models to deal - -0:05:06.400,0:05:11.840 -with this kind of +variable 
energy-based models to deal with this kind of 0:05:07.600,0:05:11.840 -multimodal you know outcome - -0:05:12.320,0:05:16.639 -so to make things simple and make my +multimodal you know outcome so to make things simple and make my 0:05:15.600,0:05:19.120 -life easier - -0:05:16.639,0:05:21.280 -we're gonna do a few simplifications uh +life easier we're gonna do a few simplifications uh 0:05:19.120,0:05:25.360 -the first one i'm gonna be removing the - -0:05:21.280,0:05:28.720 -input so there will be no input data +the first one i'm gonna be removing the input so there will be no input data 0:05:25.360,0:05:32.560 -my model will not have input data - -0:05:28.720,0:05:36.320 -and this is like what anyhow i i fix my +my model will not have input data and this is like what anyhow i i fix my 0:05:32.560,0:05:38.320 -x to zero so by fixing the x to zero i'm - -0:05:36.320,0:05:39.919 -gonna have that my exponential becomes +x to zero so by fixing the x to zero i'm gonna have that my exponential becomes 0:05:38.320,0:05:42.639 -simply 1. - -0:05:39.919,0:05:43.520 -and then basically we turn out having +simply 1. 
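[editor's note: the data-generation step just described — x fixed to 0, so the exponential envelope is 1, the first radius becomes beta = 2 and the second alpha = 1.5 — can be sketched in plain Python. All names here are mine, and stdlib `random`/`math` stand in for whatever the practicum notebook actually uses:]

```python
import math
import random

random.seed(0)

# With x fixed to 0, each training point is
#   y = (2 * cos(theta) + eps1, 1.5 * sin(theta) + eps2)
# where theta ~ U[0, 2*pi) is the hidden angle we never observe,
# and eps1, eps2 ~ N(0, 1/20) are per-component noise.
ALPHA, BETA = 1.5, 2.0
N_SAMPLES = 24

def sample_y():
    theta = random.uniform(0.0, 2.0 * math.pi)   # latent, not given to the model
    eps1 = random.gauss(0.0, 1.0 / 20.0)
    eps2 = random.gauss(0.0, 1.0 / 20.0)
    return (BETA * math.cos(theta) + eps1, ALPHA * math.sin(theta) + eps2)

# Capital Y: the collection of all 24 noisy samples on the 2 x 1.5 ellipse.
Y = [sample_y() for _ in range(N_SAMPLES)]
```

[every point lands near the ellipse (y1/2)^2 + (y2/1.5)^2 = 1, which is the blue "potato" plotted later]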
and then basically we turn out having 0:05:42.639,0:05:46.720 -row 1 - -0:05:43.520,0:05:48.560 -that becomes 2 right so alpha +row 1 that becomes 2 right so alpha 0:05:46.720,0:05:50.240 -gets deleted by 0 and then you just have - -0:05:48.560,0:05:53.199 -the beta multiplied by +gets deleted by 0 and then you just have the beta multiplied by 0:05:50.240,0:05:54.080 -a 1 and then row two automatically is - -0:05:53.199,0:05:57.360 -gonna get +a 1 and then row two automatically is gonna get 0:05:54.080,0:06:00.400 -the amplitude of 1.5 right and so - -0:05:57.360,0:06:04.240 -my data points y are going to be simply +the amplitude of 1.5 right and so my data points y are going to be simply 0:06:00.400,0:06:05.520 -points coming from this twice the cosine - -0:06:04.240,0:06:08.160 -of this uniform +points coming from this twice the cosine of this uniform 0:06:05.520,0:06:09.199 -simply uniformly sample theta and then - -0:06:08.160,0:06:13.360 -1.5 +simply uniformly sample theta and then 1.5 0:06:09.199,0:06:16.720 -sine this uniform theta - -0:06:13.360,0:06:18.720 -the collection of all my y's +sine this uniform theta the collection of all my y's 0:06:16.720,0:06:20.319 -will give me capital y so capital y is - -0:06:18.720,0:06:22.400 -going to be the collection of all my +will give me capital y so capital y is going to be the collection of all my 0:06:20.319,0:06:25.600 -sample and here i decided to just use - -0:06:22.400,0:06:27.759 -24 samples so i have 24 +sample and here i decided to just use 24 samples so i have 24 0:06:25.600,0:06:28.800 -different samples from the uniform - -0:06:27.759,0:06:31.440 -distribution +different samples from the uniform distribution 0:06:28.800,0:06:33.600 -okay and per each of these samples there - -0:06:31.440,0:06:35.680 -will be +okay and per each of these samples there will be 0:06:33.600,0:06:39.120 -one epsilon for the first component and - -0:06:35.680,0:06:42.160 -one epsilon for the second component +one epsilon for 
the first component and one epsilon for the second component 0:06:39.120,0:06:44.800 -all right so what we try to do today - -0:06:42.160,0:06:45.199 -is going to be to learn well to learn +all right so what we try to do today is going to be to learn well to learn 0:06:44.800,0:06:48.160 -wrong - -0:06:45.199,0:06:49.680 -we are not learning anything we um we +wrong we are not learning anything we um we 0:06:48.160,0:06:52.479 -imagine that someone gave us - -0:06:49.680,0:06:53.120 -a already trained already learned +imagine that someone gave us a already trained already learned 0:06:52.479,0:06:54.319 -network - -0:06:53.120,0:06:57.280 -we're going to be learning how to +network we're going to be learning how to 0:06:54.319,0:07:00.960 -perform inference how we can use a model - -0:06:57.280,0:07:02.960 -to figure out if one point it belongs or +perform inference how we can use a model to figure out if one point it belongs or 0:07:00.960,0:07:04.080 -doesn't belong to what was the training - -0:07:02.960,0:07:07.919 -manifold +doesn't belong to what was the training manifold 0:07:04.080,0:07:10.800 -okay so this is my - -0:07:07.919,0:07:11.919 -training data these are my ys which are +okay so this is my training data these are my ys which are 0:07:10.800,0:07:14.800 -again an ellipse - -0:07:11.919,0:07:15.440 -you can see here the major radii radius +again an ellipse you can see here the major radii radius 0:07:14.800,0:07:18.160 -is - -0:07:15.440,0:07:21.120 -two you can see right there are one two +is two you can see right there are one two 0:07:18.160,0:07:24.400 -three four boxes each box is 0.5 - -0:07:21.120,0:07:26.400 -so this radius here is two and then the +three four boxes each box is 0.5 so this radius here is two and then the 0:07:24.400,0:07:29.680 -minor radius is gonna have one two - -0:07:26.400,0:07:32.960 -three boxes uh each box is 0.5 +minor radius is gonna have one two three boxes uh each box is 0.5 0:07:29.680,0:07:36.560 -and so this 
is the minor radius of 1.5 - -0:07:32.960,0:07:38.240 -when you said there's no input just +and so this is the minor radius of 1.5 when you said there's no input just 0:07:36.560,0:07:40.800 -what is theta do you consider that an - -0:07:38.240,0:07:41.520 -input or so theta we don't have access +what is theta do you consider that an input or so theta we don't have access 0:07:40.800,0:07:44.879 -to right - -0:07:41.520,0:07:47.840 -so theta is something we don't see x +to right so theta is something we don't see x 0:07:44.879,0:07:48.720 -could be the input we provide the model - -0:07:47.840,0:07:51.440 -to figure out +could be the input we provide the model to figure out 0:07:48.720,0:07:51.919 -at what location we are at that kind of - -0:07:51.440,0:07:54.080 -uh +at what location we are at that kind of uh 0:07:51.919,0:07:56.400 -horn allows us to figure out the - -0:07:54.080,0:07:59.759 -dimension of those +horn allows us to figure out the dimension of those 0:07:56.400,0:08:01.440 -ellipse ellipses but then we theta - -0:07:59.759,0:08:03.199 -here is something we don't have access +ellipse ellipses but then we theta here is something we don't have access 0:08:01.440,0:08:06.319 -to so theta was - -0:08:03.199,0:08:09.599 -is simply a variable which is +to so theta was is simply a variable which is 0:08:06.319,0:08:10.400 -uh missing which was used for generating - -0:08:09.599,0:08:12.160 -our data +uh missing which was used for generating our data 0:08:10.400,0:08:13.919 -but we don't have access to so it's a - -0:08:12.160,0:08:18.080 -missing variable it's a missing +but we don't have access to so it's a missing variable it's a missing 0:08:13.919,0:08:21.360 -input okay so we don't have access okay - -0:08:18.080,0:08:25.440 -all right so let's look at what the +input okay so we don't have access okay all right so let's look at what the 0:08:21.360,0:08:28.479 -model manifold is so in this case - -0:08:25.440,0:08:31.360 -i'm gonna have a latent input 
+model manifold is so in this case i'm gonna have a latent input 0:08:28.479,0:08:33.200 -which is something uh latent mean means - -0:08:31.360,0:08:33.680 -it's missing i don't have access to this +which is something uh latent mean means it's missing i don't have access to this 0:08:33.200,0:08:36.640 -input - -0:08:33.680,0:08:38.080 -still there is some you know potential +input still there is some you know potential 0:08:36.640,0:08:39.839 -input - -0:08:38.080,0:08:41.680 -you notice here is the same color as +input you notice here is the same color as 0:08:39.839,0:08:45.120 -that theta right - -0:08:41.680,0:08:47.920 -anyhow so i have my z z which is uh +that theta right anyhow so i have my z z which is uh 0:08:45.120,0:08:49.040 -i can decide to like take it from zero - -0:08:47.920,0:08:51.839 -to two pi +i can decide to like take it from zero to two pi 0:08:49.040,0:08:53.440 -uh without the so that square bracket - -0:08:51.839,0:08:55.680 -flipped square bracket means +uh without the so that square bracket flipped square bracket means 0:08:53.440,0:08:56.800 -i'm considering a vector that goes from - -0:08:55.680,0:09:00.240 -0 to 2 pi +i'm considering a vector that goes from 0 to 2 pi 0:08:56.800,0:09:03.440 -with 2 pi excluded with a step - -0:09:00.240,0:09:06.560 -you know pi over 24. and so +with 2 pi excluded with a step you know pi over 24. 
and so 0:09:03.440,0:09:09.760 -this one basically is like a line where - -0:09:06.560,0:09:11.120 -there are many points there are uh 40 48 +this one basically is like a line where there are many points there are uh 40 48 0:09:09.760,0:09:14.320 -points right - -0:09:11.120,0:09:17.680 -from zero to two pi excluded so this +points right from zero to two pi excluded so this 0:09:14.320,0:09:19.839 -latent input goes inside a decoder - -0:09:17.680,0:09:21.040 -and then the decoder is going to give me +latent input goes inside a decoder and then the decoder is going to give me 0:09:19.839,0:09:23.519 -this y - -0:09:21.040,0:09:25.519 -tilde and y is bold because again it +this y tilde and y is bold because again it 0:09:23.519,0:09:27.920 -lives in two dimensions - -0:09:25.519,0:09:28.959 -uh more precisely we're gonna have that +lives in two dimensions uh more precisely we're gonna have that 0:09:27.920,0:09:32.560 -by varying - -0:09:28.959,0:09:33.279 -z over one line y tilde is gonna be +by varying z over one line y tilde is gonna be 0:09:32.560,0:09:37.760 -varying - -0:09:33.279,0:09:39.040 -around a uh ellipse okay +varying around a uh ellipse okay 0:09:37.760,0:09:41.760 -on the other side instead we're going to - -0:09:39.040,0:09:44.560 -have these bold y which are my +on the other side instead we're going to have these bold y which are my 0:09:41.760,0:09:45.920 -observations so how do i know these are - -0:09:44.560,0:09:48.560 -observations +observations so how do i know these are observations 0:09:45.920,0:09:49.120 -because it's uh this circle it's shaded - -0:09:48.560,0:09:51.040 -whereas +because it's uh this circle it's shaded whereas 0:09:49.120,0:09:52.160 -those other circles are simply - -0:09:51.040,0:09:54.080 -transparent +those other circles are simply transparent 0:09:52.160,0:09:56.640 -the bottom one is a little bit gray - -0:09:54.080,0:09:58.399 -which means i have access to this data +the bottom one is a little bit gray which means 
i have access to this data 0:09:56.640,0:10:01.760 -okay - -0:09:58.399,0:10:02.399 -cool so this is how these points look +okay cool so this is how these points look 0:10:01.760,0:10:05.120 -right the - -0:10:02.399,0:10:06.160 -blue points are the one sample from my +right the blue points are the one sample from my 0:10:05.120,0:10:07.760 -data generation - -0:10:06.160,0:10:09.440 -generated distribution we already +data generation generated distribution we already 0:10:07.760,0:10:12.399 -sampled them we have 24 - -0:10:09.440,0:10:13.839 -and then here i just decided to plot uh +sampled them we have 24 and then here i just decided to plot uh 0:10:12.399,0:10:17.760 -48 - -0:10:13.839,0:10:19.519 -of these values from like +48 of these values from like 0:10:17.760,0:10:21.519 -reconstruction of those latent variables - -0:10:19.519,0:10:24.800 -right such that i can clearly see +reconstruction of those latent variables right such that i can clearly see 0:10:21.519,0:10:25.519 -what the network thinks uh the the true - -0:10:24.800,0:10:28.720 -manifold +what the network thinks uh the the true manifold 0:10:25.519,0:10:30.000 -is okay in the second episode - -0:10:28.720,0:10:32.480 -when we are gonna be learning we're +is okay in the second episode when we are gonna be learning we're 0:10:30.000,0:10:35.600 -gonna be figuring out how to match - -0:10:32.480,0:10:38.480 -my internal belief the the violet one +gonna be figuring out how to match my internal belief the the violet one 0:10:35.600,0:10:39.760 -with actual the the data we have but - -0:10:38.480,0:10:40.880 -we're not going to be seeing that this +with actual the the data we have but we're not going to be seeing that this 0:10:39.760,0:10:42.800 -time this time - -0:10:40.880,0:10:45.040 -we already have this model which is +time this time we already have this model which is 0:10:42.800,0:10:48.160 -pretty bad since it's not - -0:10:45.040,0:10:51.440 -already matching the data and still +pretty bad 
since it's not already matching the data and still 0:10:48.160,0:10:53.600 -going to be seeing how to use this model - -0:10:51.440,0:10:54.560 -so what what determines the shape of the +going to be seeing how to use this model so what what determines the shape of the 0:10:53.600,0:10:56.560 -red or - -0:10:54.560,0:10:58.000 -orange points is it the alpha and the +red or orange points is it the alpha and the 0:10:56.560,0:11:00.959 -beta - -0:10:58.000,0:11:01.600 -uh alpha and beta are determining the +beta uh alpha and beta are determining the 0:11:00.959,0:11:03.760 -side the - -0:11:01.600,0:11:06.000 -the shape of that blue thing right so +side the the shape of that blue thing right so 0:11:03.760,0:11:07.519 -the overall thing it was that horn - -0:11:06.000,0:11:10.079 -uh i showed you before the one that was +the overall thing it was that horn uh i showed you before the one that was 0:11:07.519,0:11:12.000 -spinning and then we decided to slice it - -0:11:10.079,0:11:14.640 -at a specific value of +spinning and then we decided to slice it at a specific value of 0:11:12.000,0:11:15.040 -x right so this is like a cross section - -0:11:14.640,0:11:18.000 -which +x right so this is like a cross section which 0:11:15.040,0:11:19.279 -gives us this potato the blue potato on - -0:11:18.000,0:11:21.120 -the other side i'm going to be telling +gives us this potato the blue potato on the other side i'm going to be telling 0:11:19.279,0:11:24.959 -you what is inside the decoder - -0:11:21.120,0:11:27.440 -we have a internal belief for what the +you what is inside the decoder we have a internal belief for what the 0:11:24.959,0:11:28.560 -true data manifold is right that's the - -0:11:27.440,0:11:32.079 -net network +true data manifold is right that's the net network 0:11:28.560,0:11:33.839 -that the model believe about the uh - -0:11:32.079,0:11:35.839 -you know the the how the data is +that the model believe about the uh you know the the how the data is 
0:11:33.839,0:11:38.800 -supposed to look - -0:11:35.839,0:11:39.839 -okay let me let me show you in the next +supposed to look okay let me let me show you in the next 0:11:38.800,0:11:42.880 -slide a little bit - -0:11:39.839,0:11:45.920 -more information so maybe we can get uh +slide a little bit more information so maybe we can get uh 0:11:42.880,0:11:48.959 -you know sync so here - -0:11:45.920,0:11:49.680 -we're going to be looking at this energy +you know sync so here we're going to be looking at this energy 0:11:48.959,0:11:52.800 -function - -0:11:49.680,0:11:54.000 -so what is this energy function so this +function so what is this energy function so this 0:11:52.800,0:11:57.760 -energy function - -0:11:54.000,0:12:01.200 -it's um something that tells me +energy function it's um something that tells me 0:11:57.760,0:12:04.880 -what is the compatibility between this y - -0:12:01.200,0:12:07.360 -tilde and y the blue y right +what is the compatibility between this y tilde and y the blue y right 0:12:04.880,0:12:09.040 -and so basically in this case here - -0:12:07.360,0:12:12.399 -measures the distance between +and so basically in this case here measures the distance between 0:12:09.040,0:12:16.079 -my given training sample and my - -0:12:12.399,0:12:19.040 -reconstruction my given my best guess +my given training sample and my reconstruction my given my best guess 0:12:16.079,0:12:20.880 -about what i think it should be the real - -0:12:19.040,0:12:23.600 -data point +about what i think it should be the real data point 0:12:20.880,0:12:24.240 -so let's give more context here right so - -0:12:23.600,0:12:27.839 -my +so let's give more context here right so my 0:12:24.240,0:12:31.440 -energy e function of my - -0:12:27.839,0:12:34.639 -y data point and my latent variable z +energy e function of my y data point and my latent variable z 0:12:31.440,0:12:38.079 -it's gonna be the sum of the square - -0:12:34.639,0:12:38.880 -euclidean distances of the two +it's 
gonna be the sum of the square euclidean distances of the two 0:12:38.079,0:12:42.240 -components - -0:12:38.880,0:12:44.959 -so we have component one of the y minus +components so we have component one of the y minus 0:12:42.240,0:12:46.160 -component one of this g which is our - -0:12:44.959,0:12:48.160 -decoder +component one of this g which is our decoder 0:12:46.160,0:12:50.560 -function of z squared and then we have - -0:12:48.160,0:12:52.560 -the other one is going to be y2 minus +function of z squared and then we have the other one is going to be y2 minus 0:12:50.560,0:12:54.000 -g2 which is the second component of this - -0:12:52.560,0:12:57.200 -output of the decoder +g2 which is the second component of this output of the decoder 0:12:54.000,0:13:01.279 -squared and this importantly - -0:12:57.200,0:13:04.320 -uh happens for every y we pick +squared and this importantly uh happens for every y we pick 0:13:01.279,0:13:07.519 -from capital y so in this case - -0:13:04.320,0:13:10.560 -we have 24 different +from capital y so in this case we have 24 different 0:13:07.519,0:13:11.120 -e's right so we can index 24 different - -0:13:10.560,0:13:15.279 -e's +e's right so we can index 24 different e's 0:13:11.120,0:13:18.560 -based on the specific why you pick - -0:13:15.279,0:13:21.440 -more about this in the next slide so +based on the specific why you pick more about this in the next slide so 0:13:18.560,0:13:22.079 -what is this decoder so this decoder is - -0:13:21.440,0:13:24.639 -a little bit +what is this decoder so this decoder is a little bit 0:13:22.079,0:13:25.920 -cooked as in you know i know what is the - -0:13:24.639,0:13:28.720 -data generating +cooked as in you know i know what is the data generating 0:13:25.920,0:13:29.839 -uh process so i can put inside the g - -0:13:28.720,0:13:32.639 -what is quite +uh process so i can put inside the g what is quite 0:13:29.839,0:13:32.880 -uh you know align with what i think you - -0:13:32.639,0:13:36.000 
-know +uh you know align with what i think you know 0:13:32.880,0:13:39.839 -it's a very good guess about how - -0:13:36.000,0:13:42.560 -the output should look uh so my g +it's a very good guess about how the output should look uh so my g 0:13:39.839,0:13:43.519 -which is a two component function g one - -0:13:42.560,0:13:47.120 -g two +which is a two component function g one g two 0:13:43.519,0:13:50.160 -maps uh the real line to this r2 - -0:13:47.120,0:13:52.399 -and therefore it maps my z into these +maps uh the real line to this r2 and therefore it maps my z into these 0:13:50.160,0:13:54.959 -two components which are going to be w1 - -0:13:52.399,0:13:56.720 -cosine cosine of z and then the second +two components which are going to be w1 cosine cosine of z and then the second 0:13:54.959,0:13:59.760 -component is going to be w2 because - -0:13:56.720,0:14:03.120 -the sine of z to notice here +component is going to be w2 because the sine of z to notice here 0:13:59.760,0:14:06.720 -the only parameters we have available - -0:14:03.120,0:14:07.600 -in this network in this decoder are w1 +the only parameters we have available in this network in this decoder are w1 0:14:06.720,0:14:11.040 -and w2 - -0:14:07.600,0:14:13.279 -okay cosine x and sine z +and w2 okay cosine x and sine z 0:14:11.040,0:14:14.880 -sorry cosine z and sine z are you know - -0:14:13.279,0:14:15.680 -knowledge a priori you know i know +sorry cosine z and sine z are you know knowledge a priori you know i know 0:14:14.880,0:14:18.800 -already and i - -0:14:15.680,0:14:21.199 -put there my best guess for that +already and i put there my best guess for that 0:14:18.800,0:14:22.800 -and so again this network has two - -0:14:21.199,0:14:25.440 -parameters nevertheless +and so again this network has two parameters nevertheless 0:14:22.800,0:14:26.480 -with two parameters we can still do many - -0:14:25.440,0:14:30.560 -things +with two parameters we can still do many things 0:14:26.480,0:14:33.920 
-so again stress once again uh this e - -0:14:30.560,0:14:37.600 -happens to exist for any peak +so again stress once again uh this e happens to exist for any peak 0:14:33.920,0:14:40.560 -of y in this set of all y's - -0:14:37.600,0:14:42.320 -so let's put uh this e on on the top +of y in this set of all y's so let's put uh this e on on the top 0:14:40.560,0:14:45.440 -here just so i can - -0:14:42.320,0:14:49.440 -i can clear the screen below and so +here just so i can i can clear the screen below and so 0:14:45.440,0:14:52.959 -now i show you all 24 - -0:14:49.440,0:14:55.519 -energies we have how do we +now i show you all 24 energies we have how do we 0:14:52.959,0:14:57.600 -how do i get this stuff right so these - -0:14:55.519,0:14:58.160 -energies are coming from the fact that i +how do i get this stuff right so these energies are coming from the fact that i 0:14:57.600,0:15:02.000 -pick - -0:14:58.160,0:15:04.560 -a specific y so the first one i pick +pick a specific y so the first one i pick 0:15:02.000,0:15:05.760 -y prime which is like my peak of y is - -0:15:04.560,0:15:08.320 -going to be the first +y prime which is like my peak of y is going to be the first 0:15:05.760,0:15:09.040 -of my training sample and therefore i - -0:15:08.320,0:15:11.680 -can call +of my training sample and therefore i can call 0:15:09.040,0:15:12.639 -the first energy my e1 right so i can - -0:15:11.680,0:15:14.240 -index them +the first energy my e1 right so i can index them 0:15:12.639,0:15:16.240 -right now since i have you know a - -0:15:14.240,0:15:18.720 -discrete number of training samples +right now since i have you know a discrete number of training samples 0:15:16.240,0:15:19.519 -i have a discrete number of energies in - -0:15:18.720,0:15:22.399 -this case +i have a discrete number of energies in this case 0:15:19.519,0:15:23.920 -so this is my e1 and then the last one - -0:15:22.399,0:15:25.519 -on the row is going to be in the one +so this is my e1 and then the 
last one on the row is going to be in the one 0:15:23.920,0:15:28.000 -associated to the sixth - -0:15:25.519,0:15:28.560 -sample of my training sample my training +associated to the sixth sample of my training sample my training 0:15:28.000,0:15:32.480 -set - -0:15:28.560,0:15:34.880 -and therefore i have my e6 +set and therefore i have my e6 0:15:32.480,0:15:35.519 -uh if we go down until the last row - -0:15:34.880,0:15:38.880 -we're gonna be +uh if we go down until the last row we're gonna be 0:15:35.519,0:15:40.959 -seeing uh i'm gonna be picking the 19th - -0:15:38.880,0:15:43.839 -sample from my training set and then i'm +seeing uh i'm gonna be picking the 19th sample from my training set and then i'm 0:15:40.959,0:15:46.880 -going to have this e 19 over there - -0:15:43.839,0:15:49.759 -and finally if i pick my y prime +going to have this e 19 over there and finally if i pick my y prime 0:15:46.880,0:15:50.399 -being the last the 24th example then - -0:15:49.759,0:15:54.079 -i'll be +being the last the 24th example then i'll be 0:15:50.399,0:15:56.560 -ending up with the e24 on the x axis - -0:15:54.079,0:15:58.959 -of each of these little cells you're +ending up with the e24 on the x axis of each of these little cells you're 0:15:56.560,0:16:02.160 -going to be having z - -0:15:58.959,0:16:05.839 -so each of these e's know e 1 e 2 e 3 +going to be having z so each of these e's know e 1 e 2 e 3 0:16:02.160,0:16:08.800 -e until 24 are functions - -0:16:05.839,0:16:10.720 -of my z latent variable which is +e until 24 are functions of my z latent variable which is 0:16:08.800,0:16:14.560 -spanning as we said before - -0:16:10.720,0:16:15.360 -zero to two pi in this uh drawing here i +spanning as we said before zero to two pi in this uh drawing here i 0:16:14.560,0:16:18.880 -just have them - -0:16:15.360,0:16:19.680 -separated by uh pi over 12. so i have +just have them separated by uh pi over 12. 
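[editor's note: the 24 per-sample energies E_n(z) = (y'_1 - g_1(z))^2 + (y'_2 - g_2(z))^2 can be computed as below. The decoder weights and the stand-in training points are mine, chosen only so the sketch runs self-contained:]

```python
import math

w1, w2 = -1.0, 1.5                               # placeholder decoder weights

def g(z):
    """Decoder: g(z) = (w1 * cos(z), w2 * sin(z))."""
    return (w1 * math.cos(z), w2 * math.sin(z))

def energy(y_prime, z):
    """Squared Euclidean distance between a picked y' and its reconstruction."""
    g1, g2 = g(z)
    return (y_prime[0] - g1) ** 2 + (y_prime[1] - g2) ** 2

# Latent grid: 48 z values in [0, 2*pi), and 24 stand-in training points
# placed on the 2 x 1.5 ellipse (noise omitted for brevity).
z_grid = [k * math.pi / 24.0 for k in range(48)]
Y = [(2.0 * math.cos(2.0 * math.pi * n / 24.0),
      1.5 * math.sin(2.0 * math.pi * n / 24.0)) for n in range(24)]

# One energy curve per training sample: E[n][k] = E(y'_n, z_k).
# These are the 24 wiggly functions of z in the grid of little plots.
E = [[energy(y_prime, z) for z in z_grid] for y_prime in Y]
```

[free inference, covered next, just reads off the minimising z of one such curve, e.g. `min(range(48), key=lambda k: E[22][k])` for the 23rd sample]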
so i have 0:16:18.880,0:16:23.680 -nice - -0:16:19.680,0:16:27.199 -separation for drawing this function +nice separation for drawing this function 0:16:23.680,0:16:30.240 -so moreover the range of this energy - -0:16:27.199,0:16:31.920 -in this case is going to be 0 to 12 and +so moreover the range of this energy in this case is going to be 0 to 12 and 0:16:30.240,0:16:33.040 -we are we're going to be computing these - -0:16:31.920,0:16:35.120 -values in a +we are we're going to be computing these values in a 0:16:33.040,0:16:36.480 -just in a short moment such that we can - -0:16:35.120,0:16:39.040 -better understand +just in a short moment such that we can better understand 0:16:36.480,0:16:41.519 -what the heck i'm talking about right so - -0:16:39.040,0:16:45.360 -again until yesterday i had no clue +what the heck i'm talking about right so again until yesterday i had no clue 0:16:41.519,0:16:47.759 -about what these were okay so i am - -0:16:45.360,0:16:48.720 -very new to this topic as well and +about what these were okay so i am very new to this topic as well and 0:16:47.759,0:16:51.279 -therefore we are - -0:16:48.720,0:16:52.240 -exploring together what is this jungle +therefore we are exploring together what is this jungle 0:16:51.279,0:16:55.519 -of very - -0:16:52.240,0:16:57.680 -funny weird wiggly functions okay +of very funny weird wiggly functions okay 0:16:55.519,0:16:58.639 -we are gonna start by cherry picking two - -0:16:57.680,0:17:01.839 -of them +we are gonna start by cherry picking two of them 0:16:58.639,0:17:05.839 -uh for example the e23 - -0:17:01.839,0:17:09.199 -the e23 looks pretty you know +uh for example the e23 the e23 looks pretty you know 0:17:05.839,0:17:12.160 -kind of okay it looks very uh - -0:17:09.199,0:17:12.720 -mostly smooth and i think it looks like +kind of okay it looks very uh mostly smooth and i think it looks like 0:17:12.160,0:17:14.640 -uh - -0:17:12.720,0:17:16.480 -you know even convex in the in the +uh you 
know even convex in the in the 0:17:14.640,0:17:18.319 -central part - -0:17:16.480,0:17:20.079 -and then i'm gonna be of course if i +central part and then i'm gonna be of course if i 0:17:18.319,0:17:21.839 -pick the nice one and smooth one i'm - -0:17:20.079,0:17:24.880 -gonna be also picking some weird stuff +pick the nice one and smooth one i'm gonna be also picking some weird stuff 0:17:21.839,0:17:25.199 -like like the the double wiggly the one - -0:17:24.880,0:17:27.760 -which +like like the the double wiggly the one which 0:17:25.199,0:17:28.559 -is wiggling yeah but as i said let's - -0:17:27.760,0:17:30.480 -start with +is wiggling yeah but as i said let's start with 0:17:28.559,0:17:33.360 -ease and let's start with the with a - -0:17:30.480,0:17:35.200 -simple version okay so far everything is +ease and let's start with the with a simple version okay so far everything is 0:17:33.360,0:17:36.720 -all right no one is writing anything on - -0:17:35.200,0:17:39.120 -the chat and +all right no one is writing anything on the chat and 0:17:36.720,0:17:40.559 -you know sean just asked a few questions - -0:17:39.120,0:17:46.720 -so so far we are +you know sean just asked a few questions so so far we are 0:17:40.559,0:17:50.480 -all on board or i lost some of you - -0:17:46.720,0:17:54.160 -no yeah so so basically like this square +all on board or i lost some of you no yeah so so basically like this square 0:17:50.480,0:17:54.960 -would be y23 and then the x-axis is - -0:17:54.160,0:17:57.039 -showing +would be y23 and then the x-axis is showing 0:17:54.960,0:17:58.480 -as you vary z you're going to be - -0:17:57.039,0:18:02.400 -evaluating this e +as you vary z you're going to be evaluating this e 0:17:58.480,0:18:03.039 -y 23 and z yeah yeah this is the e23 the - -0:18:02.400,0:18:05.360 -the one i +y 23 and z yeah yeah this is the e23 the the one i 0:18:03.039,0:18:06.080 -show you right now great the lecture's - -0:18:05.360,0:18:08.080 -going +show you 
right now great the lecture's going 0:18:06.080,0:18:09.760 -great i'm understanding as well okay - -0:18:08.080,0:18:12.880 -okay that's fantastic +great i'm understanding as well okay okay that's fantastic 0:18:09.760,0:18:16.080 -okay so let's look at this first - -0:18:12.880,0:18:16.720 -example on this kind of u shape so how +okay so let's look at this first example on this kind of u shape so how 0:18:16.080,0:18:19.760 -does - -0:18:16.720,0:18:23.039 -this u shape uh arise right +does this u shape uh arise right 0:18:19.760,0:18:26.320 -and so this is the current configuration - -0:18:23.039,0:18:28.160 -we have y prime is going to be the 23rd +and so this is the current configuration we have y prime is going to be the 23rd 0:18:26.320,0:18:31.360 -example from my training - -0:18:28.160,0:18:34.400 -set which is refigured here by that +example from my training set which is refigured here by that 0:18:31.360,0:18:38.320 -green x on the right hand side okay - -0:18:34.400,0:18:41.600 -so over here whenever i start +green x on the right hand side okay so over here whenever i start 0:18:38.320,0:18:43.840 -my z my z and i start with z equals zero - -0:18:41.600,0:18:44.960 -it actually turns out that z zero +my z my z and i start with z equals zero it actually turns out that z zero 0:18:43.840,0:18:47.760 -corresponds - -0:18:44.960,0:18:48.080 -to this location over here so if i send +corresponds to this location over here so if i send 0:18:47.760,0:18:50.400 -z - -0:18:48.080,0:18:52.160 -equals zero inside the decoder i'm gonna +z equals zero inside the decoder i'm gonna 0:18:50.400,0:18:55.520 -get a point over here - -0:18:52.160,0:18:58.240 -why is that oh because simply the w1 +get a point over here why is that oh because simply the w1 0:18:55.520,0:18:58.880 -we just randomly generated is a negative - -0:18:58.240,0:19:02.960 -number +we just randomly generated is a negative number 0:18:58.880,0:19:05.440 -and so this uh this size over here from - 
-0:19:02.960,0:19:07.520 -zero like this the point from here to +and so this uh this size over here from zero like this the point from here to 0:19:05.440,0:19:10.000 -here this is my w1 - -0:19:07.520,0:19:11.280 -and instead w2 is going to be a positive +here this is my w1 and instead w2 is going to be a positive 0:19:10.000,0:19:14.000 -number over here - -0:19:11.280,0:19:15.600 -so whenever we have z equals 0 you're +number over here so whenever we have z equals 0 you're 0:19:14.000,0:19:17.200 -going to have that the cosine of - -0:19:15.600,0:19:19.039 -0 is going to be equal to 1 so it +going to have that the cosine of 0 is going to be equal to 1 so it 0:19:17.200,0:19:21.440 -becomes 1 multiplied by - -0:19:19.039,0:19:22.320 -a negative number i go down here and +becomes 1 multiplied by a negative number i go down here and 0:19:21.440,0:19:25.360 -then - -0:19:22.320,0:19:27.039 -0 sine of zero is going to be zero so +then 0 sine of zero is going to be zero so 0:19:25.360,0:19:28.400 -you're gonna be you're gonna be on the x - -0:19:27.039,0:19:31.120 -axis +you're gonna be you're gonna be on the x axis 0:19:28.400,0:19:32.080 -so over here this is gonna be my initial - -0:19:31.120,0:19:34.799 -point +so over here this is gonna be my initial point 0:19:32.080,0:19:35.360 -uh how far is this point from the green - -0:19:34.799,0:19:38.480 -x +uh how far is this point from the green x 0:19:35.360,0:19:41.840 -let's count so we have one two boxes - -0:19:38.480,0:19:45.120 -three four boxes five six +let's count so we have one two boxes three four boxes five six 0:19:41.840,0:19:48.480 -six boxes and seven right so - -0:19:45.120,0:19:50.799 -two boxes are one right so seven +six boxes and seven right so two boxes are one right so seven 0:19:48.480,0:19:51.520 -boxes means we have three and a half - -0:19:50.799,0:19:54.799 -right +boxes means we have three and a half right 0:19:51.520,0:19:57.440 -so if i count it correctly one two three - 
-0:19:54.799,0:19:58.160 -three and a half so the distance between +so if i count it correctly one two three three and a half so the distance between 0:19:57.440,0:20:00.720 -this point - -0:19:58.160,0:20:01.520 -over here and the green guy over here +this point over here and the green guy over here 0:20:00.720,0:20:04.159 -it's roughly - -0:20:01.520,0:20:04.799 -three and a half now if you take three +it's roughly three and a half now if you take three 0:20:04.159,0:20:07.919 -and a half - -0:20:04.799,0:20:10.320 -and you square it you get +and a half and you square it you get 0:20:07.919,0:20:11.919 -yeah you guess it's right it's 12 right - -0:20:10.320,0:20:12.480 -and that's why we get this point over +yeah you guess it's right it's 12 right and that's why we get this point over 0:20:11.919,0:20:13.840 -here - -0:20:12.480,0:20:16.480 -you don't trust me take out the +here you don't trust me take out the 0:20:13.840,0:20:17.039 -calculator and check how much is 3.5 - -0:20:16.480,0:20:20.240 -squared +calculator and check how much is 3.5 squared 0:20:17.039,0:20:23.280 -okay anyhow so that's why we start - -0:20:20.240,0:20:26.159 -at this location uh here 12 right +okay anyhow so that's why we start at this location uh here 12 right 0:20:23.280,0:20:26.480 -as we keep uh increasing z and we go - -0:20:26.159,0:20:29.280 -from +as we keep uh increasing z and we go from 0:20:26.480,0:20:30.880 -zero to pi half we end up at this - -0:20:29.280,0:20:34.080 -location over here +zero to pi half we end up at this location over here 0:20:30.880,0:20:36.000 -and then as we keep going until pi - -0:20:34.080,0:20:37.200 -you're gonna get and ending up in this +and then as we keep going until pi you're gonna get and ending up in this 0:20:36.000,0:20:40.480 -location over here - -0:20:37.200,0:20:44.159 -and as you can tell uh pi +location over here and as you can tell uh pi 0:20:40.480,0:20:46.640 -you're gonna be at one square away from - -0:20:44.159,0:20:47.760 
-this green boy and so one square is +you're gonna be at one square away from this green boy and so one square is 0:20:46.640,0:20:52.159 -gonna be - -0:20:47.760,0:20:55.360 -0.5 0.5 square is 0.25 +gonna be 0.5 0.5 square is 0.25 0:20:52.159,0:20:55.840 -and therefore the height of this red - -0:20:55.360,0:20:59.280 -curve +and therefore the height of this red curve 0:20:55.840,0:21:02.640 -at this location over here it's 0.25 - -0:20:59.280,0:21:05.760 -very close to zero okay and then +at this location over here it's 0.25 very close to zero okay and then 0:21:02.640,0:21:06.559 -we still keep cranking up that z and we - -0:21:05.760,0:21:10.080 -go to +we still keep cranking up that z and we go to 0:21:06.559,0:21:12.159 -three three half pi and then you keep - -0:21:10.080,0:21:13.840 -going up to two pi right and two pi +three three half pi and then you keep going up to two pi right and two pi 0:21:12.159,0:21:14.880 -we're gonna be basically getting up to - -0:21:13.840,0:21:18.000 -the same location +we're gonna be basically getting up to the same location 0:21:14.880,0:21:19.760 -where we started okay and then if you - -0:21:18.000,0:21:21.280 -keep going you're gonna repeat this one +where we started okay and then if you keep going you're gonna repeat this one 0:21:19.760,0:21:23.919 -it's gonna be going up and down - -0:21:21.280,0:21:25.360 -up and down up and down all right all +it's gonna be going up and down up and down up and down all right all 0:21:23.919,0:21:28.159 -right so this looks pretty - -0:21:25.360,0:21:30.000 -okay i think no no no crazy stuff but +right so this looks pretty okay i think no no no crazy stuff but 0:21:28.159,0:21:32.000 -then we saw the other one was kind of - -0:21:30.000,0:21:33.679 -wiggly right what happened there so +then we saw the other one was kind of wiggly right what happened there so 0:21:32.000,0:21:36.720 -instead of using the y - -0:21:33.679,0:21:38.720 -23 we're going to be using now the y10 +instead of 
using the y 23 we're going to be using now the y10 0:21:36.720,0:21:40.480 -which is this - -0:21:38.720,0:21:42.480 -thing right like a signature like yarn +which is this thing right like a signature like jan's 0:21:40.480,0:21:45.039 -signature all right - -0:21:42.480,0:21:45.600 -so what happened here so in this case +signature all right so what happened here so in this case 0:21:45.039,0:21:48.799 -our - -0:21:45.600,0:21:52.400 -y prime which is the peak we have +our y prime which is the pick we have 0:21:48.799,0:21:54.000 -from my possible wise is this guy over - -0:21:52.400,0:21:56.400 -here the top +from my possible y's is this guy over here the top 0:21:54.000,0:21:58.080 -x over here and again as i told you - -0:21:56.400,0:22:00.159 -before whenever z +x over here and again as i told you before whenever z 0:21:58.080,0:22:01.440 -is equal to 0 we start at this location - -0:22:00.159,0:22:03.120 -over here +is equal to 0 we start at this location over here 0:22:01.440,0:22:04.480 -so if you have understood what i'm - -0:22:03.120,0:22:06.559 -talking about and now we're going to be +so if you have understood what i'm talking about and now we're going to be 0:22:04.480,0:22:07.760 -doing an exercise such that you answer - -0:22:06.559,0:22:09.760 -me +doing an exercise such that you answer me 0:22:07.760,0:22:11.679 -can you tell me what is the distance - -0:22:09.760,0:22:15.440 -between this location over here +can you tell me what is the distance between this location over here 0:22:11.679,0:22:17.120 -and this point over here so question for - -0:22:15.440,0:22:21.120 -the people at home +and this point over here so question for the people at home 0:22:17.120,0:22:24.840 -can anyone tell me what is the length - -0:22:21.120,0:22:28.320 -of this segment i just +can anyone tell me what is the length of this segment i just 0:22:24.840,0:22:31.760 -draw and i'm - -0:22:28.320,0:22:35.280 -okay 1.5 times 1.4 +drew and i'm okay 1.5 times 1.4
0:22:31.760,0:22:37.520 -which is square root of 2. yes so - -0:22:35.280,0:22:39.200 -that's correct and if you square it +which is square root of 2. yes so that's correct and if you square it 0:22:37.520,0:22:43.840 -you're gonna have what - -0:22:39.200,0:22:47.760 -is going to be 1.5 times 1.5 times 2 +you're gonna have what is going to be 1.5 times 1.5 times 2 0:22:43.840,0:22:49.280 -right i just squared so you said 1.5 - -0:22:47.760,0:22:50.559 -times square root of 2. i'm just +right i just squared so you said 1.5 times square root of 2. i'm just 0:22:49.280,0:22:50.880 -squaring everything so we're going to - -0:22:50.559,0:22:54.720 -get +squaring everything so we're going to get 0:22:50.880,0:22:55.760 -1.5 squared times 2. so 1.5 times 2 it's - -0:22:54.720,0:23:00.320 -3 +1.5 squared times 2. so 1.5 times 2 it's 3 0:22:55.760,0:23:02.960 -and 3 times 1.4 1.5 is 4.5 right - -0:23:00.320,0:23:03.679 -and so we can determine that my initial +and 3 times 1.4 1.5 is 4.5 right and so we can determine that my initial 0:23:02.960,0:23:05.919 -energy - -0:23:03.679,0:23:06.720 -which is the square length of this +energy which is the square length of this 0:23:05.919,0:23:10.000 -segment - -0:23:06.720,0:23:13.679 -is going to be 4.5 which is exactly what +segment is going to be 4.5 which is exactly what 0:23:10.000,0:23:16.640 -this initial value over here is okay so - -0:23:13.679,0:23:16.640 -this point over here +this initial value over here is okay so this point over here 0:23:16.880,0:23:24.720 -it's 4.5 um - -0:23:21.360,0:23:27.039 -can you just repeat why you know that +it's 4.5 um can you just repeat why you know that 0:23:24.720,0:23:28.159 -z equals zero corresponds to the - -0:23:27.039,0:23:31.760 -leftmost point +z equals zero corresponds to the leftmost point 0:23:28.159,0:23:32.799 -yes so i know that uh this is because i - -0:23:31.760,0:23:37.520 -checked the code +yes so i know that uh this is because i checked the code 
0:23:32.799,0:23:40.960 -i know that my w1 it's um - -0:23:37.520,0:23:44.400 -it's equal to something uh that is +i know that my w1 it's um it's equal to something uh that is 0:23:40.960,0:23:47.840 -uh minus 1.5 right - -0:23:44.400,0:23:51.840 -something like that minus one +uh minus 1.5 right something like that minus one 0:23:47.840,0:23:51.840 -point five - -0:23:52.400,0:23:57.200 -okay and then we have the w tool +point five okay and then we have the w two 0:24:01.039,0:24:04.159 -and i'm drawing with the touchpad so - -0:24:03.200,0:24:07.200 -it's crazy +and i'm drawing with the touchpad so it's crazy 0:24:04.159,0:24:10.880 -uh this is 0.3 - -0:24:07.200,0:24:10.880 -0.4 something like that +uh this is 0.3 0.4 something like that 0:24:11.200,0:24:15.600 -zero point let's say four - -0:24:15.919,0:24:22.240 -that looks like a one but okay +zero point let's say four that looks like a one but okay 0:24:19.520,0:24:22.720 -okay believe me it's a four okay when we - -0:24:22.240,0:24:26.080 -go pi +okay believe me it's a four okay when we go pi 0:24:22.720,0:24:28.720 -half we are roughly uh one unit - -0:24:26.080,0:24:30.000 -away from this point so one square is +half we are roughly uh one unit away from this point so one square is 0:24:28.720,0:24:32.559 -going to be roughly one - -0:24:30.000,0:24:33.679 -right i mean some something roughly one +going to be roughly one right i mean some something roughly one 0:24:32.559,0:24:35.600 -square is gonna still be - -0:24:33.679,0:24:36.960 -there so this height over here is gonna +square is gonna still be there so this height over here is gonna 0:24:35.600,0:24:39.120 -be one - -0:24:36.960,0:24:40.480 -and then we climb up to this location +be one and then we climb up to this location 0:24:39.120,0:24:43.600 -over here - -0:24:40.480,0:24:48.000 -uh in this location over here +over here uh in this location over here 0:24:43.600,0:24:51.120 -we should basically get the same - -0:24:48.000,0:24:52.480 -point
over here so then over +we should basically get the same point over here so then over 0:24:51.120,0:24:54.400 -here we're going to get a similar value - -0:24:52.480,0:24:56.960 -a little bit smaller and then we +here we're going to get a similar value a little bit smaller and then we 0:24:54.400,0:24:57.600 -oh what happened here so when we go to - -0:24:56.960,0:25:00.799 -three +oh what happened here so when we go to three 0:24:57.600,0:25:02.720 -three half pi we actually are - -0:25:00.799,0:25:04.080 -at this location over here and we have +three half pi we actually are at this location over here and we have 0:25:02.720,0:25:06.000 -another minimum right - -0:25:04.080,0:25:07.360 -what happened here so basically you had +another minimum right what happened here so basically you had 0:25:06.000,0:25:10.559 -this point is - -0:25:07.360,0:25:13.360 -closer to my green guy than +this point is closer to my green guy than 0:25:10.559,0:25:14.320 -a point over here right and so in this - -0:25:13.360,0:25:16.720 -case +a point over here right and so in this case 0:25:14.320,0:25:17.840 -this function here this energy has a - -0:25:16.720,0:25:20.320 -local minima +this function here this energy has a local minima 0:25:17.840,0:25:23.520 -which is happening at three three half - -0:25:20.320,0:25:26.960 -pi at this location over here +which is happening at three three half pi at this location over here 0:25:23.520,0:25:28.000 -all right cool uh let's go back to the - -0:25:26.960,0:25:29.840 -arrow +all right cool uh let's go back to the arrow 0:25:28.000,0:25:31.440 -okay so now we determine that this - -0:25:29.840,0:25:34.400 -height was 4.5 +okay so now we determine that this height was 4.5 0:25:31.440,0:25:36.000 -this was one and then this something we - -0:25:34.400,0:25:39.760 -can figure this is gonna be like two +this was one and then this something we can figure this is gonna be like two 0:25:36.000,0:25:39.760 -square this is gonna be four okay - 
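The arithmetic in this walkthrough can be checked numerically. A minimal sketch, assuming the ellipse decoder Dec(z) = (w1 cos z, w2 sin z) and values eyeballed from the plot, w1 ≈ -1.5, w2 ≈ 0.4, with the 23rd sample sitting at roughly (2, 0); these numbers are read off the drawing, not taken from the lecture's code:

```python
import numpy as np

# Assumed values, eyeballed from the lecture's plot (not exact):
w1, w2 = -1.5, 0.4            # decoder weights: w1 negative, w2 positive
y23 = np.array([2.0, 0.0])    # the "green x", 3.5 units right of Dec(0)

def decoder(z):
    """Dec(z) = (w1*cos z, w2*sin z): a point on an ellipse."""
    return np.array([w1 * np.cos(z), w2 * np.sin(z)])

def energy(y, z):
    """E(y, z): squared distance between y and the decoded point."""
    return np.sum((y - decoder(z)) ** 2)

print(energy(y23, 0.0))        # 3.5 ** 2 = 12.25, the "roughly 12" start
print(energy(y23, np.pi))      # 0.5 ** 2 = 0.25, the bottom of the U
print(energy(y23, 2 * np.pi))  # back to 12.25: the energy repeats in z
```

With these stand-in numbers the curve starts near 12, dips to 0.25 at z = pi, and repeats every 2 pi, which is the U shape traced above.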
-0:25:40.159,0:25:44.880 -okay so what happened now oh all this +square this is gonna be four okay okay so what happened now oh all this 0:25:43.600,0:25:48.000 -stuff is still here - -0:25:44.880,0:25:49.360 -okay clean all right free energy so what +stuff is still here okay clean all right free energy so what 0:25:48.000,0:25:51.840 -is this free energy - -0:25:49.360,0:25:53.679 -so we're gonna figure that out right now +is this free energy so we're gonna figure that out right now 0:25:51.840,0:25:56.320 -so the free energy - -0:25:53.679,0:25:58.159 -um actually this is the zero temperature +so the free energy um actually this is the zero temperature 0:25:56.320,0:26:00.400 -limit of the free energy - -0:25:58.159,0:26:01.279 -it's going to be simply the mean minimum +limit of the free energy it's going to be simply the minimum 0:26:00.400,0:26:04.480 -value of this - -0:26:01.279,0:26:07.840 -e function with respect to z +value of this e function with respect to z 0:26:04.480,0:26:11.200 -so we can compute this - -0:26:07.840,0:26:13.440 -z check which is gonna be equal +so we can compute this z check which is gonna be equal 0:26:11.200,0:26:15.520 -we can define it as being the arc mean - -0:26:13.440,0:26:18.559 -of this energy function +we can define it as being the argmin of this energy function 0:26:15.520,0:26:21.039 -uh with respect to z why the check - -0:26:18.559,0:26:23.600 -well because the check is pointing +uh with respect to z why the check well because the check is pointing 0:26:21.039,0:26:26.320 -downwards right so whenever i minimize - -0:26:23.600,0:26:27.520 -my energy i found the location that is +downwards right so whenever i minimize my energy i found the location that is 0:26:26.320,0:26:29.279 -the lowest one - -0:26:27.520,0:26:30.559 -and theref that's why i'm gonna put the +the lowest one and therefore that's why i'm gonna put the 0:26:29.279,0:26:33.840 -check means means - -0:26:30.559,0:26:36.880 -that z is the location
where the uh +check means means that z is the location where the uh 0:26:33.840,0:26:40.880 -the energy is the lowest okay - -0:26:36.880,0:26:41.279 -um and how do can we find that set right +the energy is the lowest okay um and how do can we find that z right 0:26:40.880,0:26:44.720 -so - -0:26:41.279,0:26:47.360 -if um if z it's basically +so if um if z it's basically 0:26:44.720,0:26:48.320 -uh discrete like let's say we have like - -0:26:47.360,0:26:50.240 -k means +uh discrete like let's say we have like k means 0:26:48.320,0:26:51.520 -uh we have we can do exhaustive search - -0:26:50.240,0:26:55.120 -we can check every z +uh we have we can do exhaustive search we can check every z 0:26:51.520,0:26:57.120 -we have otherwise we can use - -0:26:55.120,0:26:59.039 -techniques like gradient based +we have otherwise we can use techniques like gradient based 0:26:57.120,0:27:02.080 -techniques such as - -0:26:59.039,0:27:03.919 -gradient descent and keep +techniques such as gradient descent and keep 0:27:02.080,0:27:05.760 -like pay attention i didn't say - -0:27:03.919,0:27:07.919 -stochastic gradient descent +like pay attention i didn't say stochastic gradient descent 0:27:05.760,0:27:09.840 -because here we are not doing any - -0:27:07.919,0:27:13.440 -stochastic something right +because here we are not doing any stochastic something right 0:27:09.840,0:27:15.760 -uh e is a function of what we know - -0:27:13.440,0:27:17.679 -everything uh when we do stochastic a in +uh e is a function of what we know everything uh when we do stochastic gradient descent 0:27:15.760,0:27:19.760 -the same we are minimizing that - -0:27:17.679,0:27:21.200 -loss function which is expressed as an +we are minimizing that loss function which is expressed as an 0:27:19.760,0:27:24.000 -average of those pair - -0:27:21.200,0:27:25.120 -sample loss functions right here instead +average of those per-sample loss functions right here instead 0:27:24.000,0:27:28.320 -we are minimizing - 
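For a one-dimensional z, the exhaustive-search option just mentioned is easy to sketch: evaluate E on a fine grid of z and keep the smallest value and its location ž. The decoder weights and target below are eyeballed stand-ins (w1 ≈ -1.5, w2 ≈ 0.4, y' ≈ (2, 0)), not the lecture's exact numbers:

```python
import numpy as np

# Assumed stand-in values, eyeballed from the plot (not exact):
w1, w2 = -1.5, 0.4
y_prime = np.array([2.0, 0.0])

def energy(y, z):
    """E(y, z) = ||y - Dec(z)||^2 for the ellipse decoder."""
    dec = np.array([w1 * np.cos(z), w2 * np.sin(z)])
    return np.sum((y - dec) ** 2)

def free_energy(y, n_grid=10_000):
    """Zero-temperature free energy F(y) = min_z E(y, z), by grid search."""
    zs = np.linspace(0.0, 2.0 * np.pi, n_grid)
    Es = np.array([energy(y, z) for z in zs])
    i = np.argmin(Es)
    return Es[i], zs[i]            # (F(y), z-check)

F, z_check = free_energy(y_prime)  # F near 0.25, z-check near pi
```

Grid search cannot get trapped in a local minimum, which makes it a handy baseline before reaching for the gradient-based techniques.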
-0:27:25.120,0:27:30.159 -this specific value of e there is no +we are minimizing this specific value of e there is no 0:27:28.320,0:27:32.000 -average so it's not stochastic - -0:27:30.159,0:27:33.919 -and therefore you're going to be using +average so it's not stochastic and therefore you're going to be using 0:27:32.000,0:27:35.279 -you can use algorithms such that - -0:27:33.919,0:27:39.279 -conjugate gradient +you can use algorithms such as conjugate gradient 0:27:35.279,0:27:42.480 -line search lbf gs and so on okay - -0:27:39.279,0:27:43.279 -so let's look at and let's figure out +line search lbfgs and so on okay so let's look at and let's figure out 0:27:42.480,0:27:45.919 -what is this - -0:27:43.279,0:27:47.520 -free energy right how it works so given +what is this free energy right how it works so given 0:27:45.919,0:27:51.440 -that we have defined this - -0:27:47.520,0:27:53.200 -uh z check this uh free energy +that we have defined this uh z check this uh free energy 0:27:51.440,0:27:54.960 -the the zero limit for the free energy - -0:27:53.200,0:27:57.679 -is going to simply be this +the the zero limit for the free energy is going to simply be this 0:27:54.960,0:27:58.000 -energy e computed in the location of my - -0:27:57.679,0:28:02.159 -z +energy e computed in the location of my z 0:27:58.000,0:28:05.279 -check so let's visualize here - -0:28:02.159,0:28:07.600 -this uh e uh so this +check so let's visualize here this uh e uh so this 0:28:05.279,0:28:09.760 -e10 all the energy for the sample when i - -0:28:07.600,0:28:12.799 -pick the sample pen +e10 all the energy for the sample when i pick the sample ten 0:28:09.760,0:28:17.120 -i initialize my latent variable z - -0:28:12.799,0:28:19.360 -tilde the orange one with some volume +i initialize my latent variable z tilde the orange one with some value 0:28:17.120,0:28:20.640 -and then i'm gonna be running a - -0:28:19.360,0:28:23.440 -gradient base method for +and then i'm gonna be running a
gradient-based method for 0:28:20.640,0:28:25.360 -minimization therefore i end up in the - -0:28:23.440,0:28:27.600 -blue location which is my z +minimization therefore i end up in the blue location which is my z 0:28:25.360,0:28:28.720 -check and it's blue because it's cold so - -0:28:27.600,0:28:30.640 -it's like low +check and it's blue because it's cold so it's like low 0:28:28.720,0:28:32.240 -i usually think about this energy as - -0:28:30.640,0:28:35.200 -being like a temperature right +i usually think about this energy as being like a temperature right 0:28:32.240,0:28:37.360 -i mean if you multiply by the boltzmann - -0:28:35.200,0:28:40.000 -boltzmann constant no k +i mean if you multiply by the boltzmann constant you know k 0:28:37.360,0:28:40.880 -kt uh you're gonna get like the some - -0:28:40.000,0:28:43.279 -energy right +kt uh you're gonna get like some energy right 0:28:40.880,0:28:45.440 -so energy and temperature are very uh - -0:28:43.279,0:28:48.640 -very closely related +so energy and temperature are very uh very closely related 0:28:45.440,0:28:49.840 -um and so again i use the the blue to - -0:28:48.640,0:28:53.279 -show you that is low +um and so again i use the the blue to show you that is low 0:28:49.840,0:28:55.600 -and cold and so at that location that z - -0:28:53.279,0:28:56.480 -check the yeah at that location we +and cold and so at that location that z check the yeah at that location we 0:28:55.600,0:28:59.520 -reached the minimum - -0:28:56.480,0:29:02.640 -of this energy and that is my uh +reached the minimum of this energy and that is my uh 0:28:59.520,0:29:05.039 -free energy the zero limit for the free - -0:29:02.640,0:29:05.039 -energy +free energy the zero limit for the free energy 0:29:05.200,0:29:10.960 -cool cool cool so so in practice - -0:29:08.640,0:29:12.000 -this could depend on the initialization +cool cool cool so so in practice this could depend on the initialization 0:29:10.960,0:29:15.679 -then - 
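The loop just described, initialize z-tilde, run plain gradient descent on z (not SGD: nothing here is stochastic), land on z-check, and read the free energy off as E(z-check), can be sketched in PyTorch. The decoder weights and target are the same eyeballed stand-ins as before, not the lecture's exact values:

```python
import math
import torch

# Assumed stand-in values, eyeballed from the plot (not exact):
w1, w2 = -1.5, 0.4
y = torch.tensor([2.0, 0.0])           # the target sample y'

def energy(z):
    """E(y, z) = ||y - Dec(z)||^2 with Dec(z) = (w1 cos z, w2 sin z)."""
    dec = torch.stack((w1 * torch.cos(z), w2 * torch.sin(z)))
    return torch.sum((y - dec) ** 2)

def minimize(z_init, steps=500, lr=0.1):
    """Plain (deterministic) gradient descent on the latent z."""
    z = torch.tensor(z_init, requires_grad=True)
    for _ in range(steps):
        E = energy(z)
        E.backward()                    # dE/dz
        with torch.no_grad():
            z -= lr * z.grad            # one descent step
        z.grad.zero_()
    return z.item(), energy(z).item()   # (z-check, free energy estimate)

z_check, F = minimize(z_init=1.0)       # lands near pi, where E = 0.25
```

For a wiggly energy like e10's, the same loop can land in a local minimum depending on z_init, which is exactly the initialization caveat raised here.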
-0:29:12.000,0:29:18.960 -oh yeah oh yeah so well +then oh yeah oh yeah so well 0:29:15.679,0:29:20.240 -yeah the initialization uh so your - -0:29:18.960,0:29:23.039 -algorithm +yeah the initialization uh so your algorithm 0:29:20.240,0:29:25.440 -will screw up depending on the - -0:29:23.039,0:29:29.360 -initialization for sure +will screw up depending on the initialization for sure 0:29:25.440,0:29:31.120 -so i can show you later on that lbfgs - -0:29:29.360,0:29:34.240 -actually gets the wrong +so i can show you later on that lbfgs actually gets the wrong 0:29:31.120,0:29:36.320 -minimum but nevertheless the free energy - -0:29:34.240,0:29:39.440 -is the global minimum right +minimum but nevertheless the free energy is the global minimum right 0:29:36.320,0:29:42.000 -so i'm telling you here that - -0:29:39.440,0:29:43.039 -the value the minimum value of e is the +so i'm telling you here that the value the minimum value of e is the 0:29:42.000,0:29:44.880 -free energy - -0:29:43.039,0:29:46.159 -if we don't get there because we don't +free energy if we don't get there because we don't 0:29:44.880,0:29:48.720 -know how to get there and then - -0:29:46.159,0:29:50.640 -it's a different issue right so it's not +know how to get there and then it's a different issue right so it's not 0:29:48.720,0:29:54.399 -dependent on the initialization - -0:29:50.640,0:29:57.200 -uh the initialization will make +dependent on the initialization uh the initialization will make 0:29:54.399,0:29:58.480 -your algorithm more or less likely to - -0:29:57.200,0:30:01.600 -converge to the actual +your algorithm more or less likely to converge to the actual 0:29:58.480,0:30:04.880 -correct solution all right - -0:30:01.600,0:30:08.080 -so what happens here +correct solution all right so what happens here 0:30:04.880,0:30:10.480 -so in this case here we have uh the blue - -0:30:08.080,0:30:11.760 -points are my points from the training +so in this case here we have uh the blue points are 
my points from the training 0:30:10.480,0:30:15.200 -distribution - -0:30:11.760,0:30:18.880 -the tilde one are you know same poles +distribution the tilde one are you know samples 0:30:15.200,0:30:22.559 -from my distrib from my model - -0:30:18.880,0:30:25.200 -then my y prime is the peak i have +from my distribution from my model then my y prime is the pick i have 0:30:22.559,0:30:26.799 -chosen right so this is my 10th item in - -0:30:25.200,0:30:31.039 -the training set +chosen right so this is my 10th item in the training set 0:30:26.799,0:30:33.760 -then my z-tilde which is the initialized - -0:30:31.039,0:30:34.799 -the value i initialize z with if i send +then my z-tilde which is the initialized the value i initialize z with if i send 0:30:33.760,0:30:36.559 -it through the - -0:30:34.799,0:30:38.320 -decoder i showed you before it's going +it through the decoder i showed you before it's going 0:30:36.559,0:30:41.760 -to generate this point - -0:30:38.320,0:30:44.720 -here this this location over here +to generate this point here this this location over here 0:30:41.760,0:30:46.000 -then i can run some some some - -0:30:44.720,0:30:48.799 -minimization algorithm +then i can run some some some minimization algorithm 0:30:46.000,0:30:50.960 -and then you end up in that location the - -0:30:48.799,0:30:53.360 -blue location and the blue x +and then you end up in that location the blue location and the blue x 0:30:50.960,0:30:54.399 -is going to be the you know decoded - -0:30:53.360,0:30:58.080 -version of the z +is going to be the you know decoded version of the z 0:30:54.399,0:31:01.440 -check which is the closest item to this - -0:30:58.080,0:31:03.519 -green boy over here so +check which is the closest item to this green boy over here so 0:31:01.440,0:31:05.120 -why are we doing this stuff here how can - -0:31:03.519,0:31:07.679 -we use this model what we use +why are we doing this stuff here how can we use this model what we use 0:31:05.120,0:31:08.960
-can you what can we use this model for - -0:31:07.679,0:31:11.519 -so we can think about +can you what can we use this model for so we can think about 0:31:08.960,0:31:13.200 -you know if we have someone has trained - -0:31:11.519,0:31:17.360 -this model and has given +you know if we have someone has trained this model and has given 0:31:13.200,0:31:20.960 -that to us we can potentially - -0:31:17.360,0:31:24.799 -find what is the closest value +that to us we can potentially find what is the closest value 0:31:20.960,0:31:26.559 -in our possible you know - -0:31:24.799,0:31:28.640 -set of all possible values we can +in our possible you know set of all possible values we can 0:31:26.559,0:31:32.000 -generate which is the closest - -0:31:28.640,0:31:34.240 -to your sample but so we can use this +generate which is the closest to your sample but so we can use this 0:31:32.000,0:31:36.799 -for performing denoising for example so - -0:31:34.240,0:31:38.480 -if i have an image which is corrupted +for performing denoising for example so if i have an image which is corrupted 0:31:36.799,0:31:41.360 -which is going to be there for far from - -0:31:38.480,0:31:44.080 -my manifold the model manifold +which is going to be there for far from my manifold the model manifold 0:31:41.360,0:31:45.120 -then i can ask my model hey model can - -0:31:44.080,0:31:48.399 -you tell me what +then i can ask my model hey model can you tell me what 0:31:45.120,0:31:51.919 -is the latent which is gonna give you - -0:31:48.399,0:31:54.640 -the uh decoded version the decoded +is the latent which is gonna give you the uh decoded version the decoded 0:31:51.919,0:31:56.320 -uh item here which is the closest as - -0:31:54.640,0:31:58.960 -possible to the +uh item here which is the closest as possible to the 0:31:56.320,0:32:00.480 -uh image i'm looking at and then - -0:31:58.960,0:32:03.279 -potentially we could just +uh image i'm looking at and then potentially we could just 0:32:00.480,0:32:04.559 
-pick this value over here as you know uh - -0:32:03.279,0:32:08.080 -cleaned up version +pick this value over here as you know uh cleaned up version 0:32:04.559,0:32:10.799 -of my uh corrupted input - -0:32:08.080,0:32:12.159 -what is the energy uh the free energy +of my uh corrupted input what is the energy uh the free energy 0:32:10.799,0:32:14.559 -the free energy now - -0:32:12.159,0:32:15.600 -it's simply the square distance between +the free energy now it's simply the square distance between 0:32:14.559,0:32:18.399 -the green point - -0:32:15.600,0:32:20.240 -and the blue x right so if you take this +the green point and the blue x right so if you take this 0:32:18.399,0:32:21.200 -location these two boxes which is - -0:32:20.240,0:32:24.000 -basically one +location these two boxes which is basically one 0:32:21.200,0:32:24.559 -one square which is rough again one is - -0:32:24.000,0:32:27.600 -going to be +one square which is rough again one is going to be 0:32:24.559,0:32:31.279 -the free energy corresponding to this x - -0:32:27.600,0:32:33.919 -over here okay so every x every location +the free energy corresponding to this x over here okay so every x every location 0:32:31.279,0:32:35.279 -here in the training manifold will have - -0:32:33.919,0:32:38.720 -a +here in the training manifold will have a 0:32:35.279,0:32:40.240 -free energy which is determining what is - -0:32:38.720,0:32:43.279 -the +free energy which is determining what is the 0:32:40.240,0:32:44.559 -distance to the the what is the closest - -0:32:43.279,0:32:47.600 -distance to the manifold +distance to the the what is the closest distance to the manifold 0:32:44.559,0:32:50.799 -okay so you can see in this case - -0:32:47.600,0:32:51.519 -that let's say uh our model is well +okay so you can see in this case that let's say uh our model is well 0:32:50.799,0:32:54.159 -trained - -0:32:51.519,0:32:54.960 -we can tell that this location over here +trained we can tell that this location over 
here 0:32:54.159,0:32:58.000 -has much - -0:32:54.960,0:32:58.880 -a much lower free energy than a point +has much a much lower free energy than a point 0:32:58.000,0:33:01.600 -over here - -0:32:58.880,0:33:02.480 -and so these points could be more likely +over here and so these points could be more likely 0:33:01.600,0:33:05.600 -coming from - -0:33:02.480,0:33:08.000 -these you know uh could be compatible +coming from these you know uh could be compatible 0:33:05.600,0:33:11.200 -with what this model has been trained on - -0:33:08.000,0:33:12.159 -or like we show in this case this model +with what this model has been trained on or like we show in this case this model 0:33:11.200,0:33:14.880 -is definitely - -0:33:12.159,0:33:16.159 -not well trained uh so what do you mean +is definitely not well trained uh so what do you mean 0:33:14.880,0:33:18.799 -by well trained - -0:33:16.159,0:33:19.840 -uh in this case here just for you know +by well trained uh in this case here just for you know 0:33:18.799,0:33:23.440 -pedagogical - -0:33:19.840,0:33:24.000 -uh sake i didn't train fully in this +pedagogical uh sake i didn't train fully in this 0:33:23.440,0:33:27.440 -model - -0:33:24.000,0:33:30.559 -such that there are errors in ideally +model such that there are errors in ideally 0:33:27.440,0:33:33.120 -those purple points should be exactly - -0:33:30.559,0:33:34.080 -matching those blue points and that +those purple points should be exactly matching those blue points and that 0:33:33.120,0:33:36.559 -would be you know - -0:33:34.080,0:33:37.840 -a well-trained model which is capturing +would be you know a well-trained model which is capturing 0:33:36.559,0:33:40.480 -all the dependencies - -0:33:37.840,0:33:42.000 -between those y variables and this is +all the dependencies between those y variables and this is 0:33:40.480,0:33:44.799 -again one cross section - -0:33:42.000,0:33:45.360 -right of that horn this is a not well +again one cross section right of that 
horn this is a not well 0:33:44.799,0:33:47.679 -trained - -0:33:45.360,0:33:48.480 -model which means i stopped training +trained model which means i stopped training 0:33:47.679,0:33:51.120 -after - -0:33:48.480,0:33:53.039 -a few epochs and therefore the model +after a few epochs and therefore the model 0:33:51.120,0:33:54.799 -tried to get there but it didn't quite - -0:33:53.039,0:33:58.399 -manage to get yet there +tried to get there but it didn't quite manage to get yet there 0:33:54.799,0:34:00.480 -uh we can think about that - -0:33:58.399,0:34:02.799 -or we can think about this is a +uh we can think about that or we can think about this is a 0:34:00.480,0:34:05.039 -well-trained model so you actually learn - -0:34:02.799,0:34:06.159 -properly and then these points here are +well-trained model so you actually learn properly and then these points here are 0:34:05.039,0:34:08.879 -much further away - -0:34:06.159,0:34:10.720 -so by computing the free energy of these +much further away so by computing the free energy of these 0:34:08.879,0:34:14.000 -points you can have like a - -0:34:10.720,0:34:17.359 -measure of how far they are from the +points you can have like a measure of how far they are from the 0:34:14.000,0:34:19.359 -learned distribution okay - -0:34:17.359,0:34:20.800 -all right so let's move on and let's +learned distribution okay all right so let's move on and let's 0:34:19.359,0:34:24.320 -look now instead - -0:34:20.800,0:34:27.040 -of the the 23rd right and the 23rd u +look now instead of the the 23rd right and the 23rd u 0:34:24.320,0:34:28.639 -shape and so in this case instead oh - -0:34:27.040,0:34:30.960 -it's much easier we just have a global +shape and so in this case instead oh it's much easier we just have a global 0:34:28.639,0:34:34.079 -minimum and a global maximum - -0:34:30.960,0:34:36.159 -so there is a question if this were +minimum and a global maximum so there is a question if this were 0:34:34.079,0:34:37.200 -for 
denoising and the model was trained - -0:34:36.159,0:34:40.079 -to the point +for denoising and the model was trained to the point 0:34:37.200,0:34:41.280 -where the t-points were on top of the - -0:34:40.079,0:34:46.879 -till points +where the t-points were on top of the till points 0:34:41.280,0:34:46.879 -wouldn't it be not do any denoising - -0:34:47.280,0:34:51.599 -so i believe that you're saying if the +wouldn't it be not do any denoising so i believe that you're saying if the 0:34:48.960,0:34:54.560 -till points are far away from the - -0:34:51.599,0:34:56.720 -uh from the from the purple one right so +till points are far away from the uh from the from the purple one right so 0:34:54.560,0:34:59.280 -that would mean that the model would - -0:34:56.720,0:35:01.119 -would be not well trained right so if we +that would mean that the model would would be not well trained right so if we 0:34:59.280,0:35:03.119 -yeah if these blue points were like up - -0:35:01.119,0:35:03.760 -here and all of them would have been you +yeah if these blue points were like up here and all of them would have been you 0:35:03.119,0:35:07.520 -know - -0:35:03.760,0:35:10.160 -uh closer to some point over over here +know uh closer to some point over over here 0:35:07.520,0:35:11.599 -that means that this model is bad is - -0:35:10.160,0:35:13.520 -badly trained right +that means that this model is bad is badly trained right 0:35:11.599,0:35:15.839 -so again today we don't talk about - -0:35:13.520,0:35:16.800 -training so this is simply what has been +so again today we don't talk about training so this is simply what has been 0:35:15.839,0:35:19.040 -given to us - -0:35:16.800,0:35:20.560 -and we just play uh with what we have +given to us and we just play uh with what we have 0:35:19.040,0:35:23.359 -and try to figure out - -0:35:20.560,0:35:23.920 -what this energy and what is free energy +and try to figure out what this energy and what is free energy 0:35:23.359,0:35:26.720 -mean - 
-0:35:23.920,0:35:27.839 -okay so this is how to use this stuff +mean okay so this is how to use this stuff 0:35:26.720,0:35:31.520 -rather than - -0:35:27.839,0:35:33.520 -to learn this stuff learning next time +rather than to learn this stuff learning next time 0:35:31.520,0:35:36.160 -it's enough to understand how to use - -0:35:33.520,0:35:38.800 -this trust me +it's enough to understand how to use this trust me 0:35:36.160,0:35:40.000 -all right so let's figure out what's - -0:35:38.800,0:35:42.720 -going on with this u +all right so let's figure out what's going on with this u 0:35:40.000,0:35:43.119 -shape so the u shape instead comes from - -0:35:42.720,0:35:46.240 -this +shape so the u shape instead comes from this 0:35:43.119,0:35:49.680 -kind of example here so again here - -0:35:46.240,0:35:54.079 -we initialize to the location in +kind of example here so again here we initialize to the location in 0:35:49.680,0:35:55.680 -orange and then by running some you know - -0:35:54.079,0:35:57.280 -gradient based method or whatever +orange and then by running some you know gradient based method or whatever 0:35:55.680,0:36:00.640 -minimization process - -0:35:57.280,0:36:02.800 -we find these blue x which is my z +minimization process we find these blue x which is my z 0:36:00.640,0:36:04.079 -check so we go from the z tilde which is - -0:36:02.800,0:36:07.040 -the initialized +check so we go from the z tilde which is the initialized 0:36:04.079,0:36:08.400 -uh value for my latent to this z check - -0:36:07.040,0:36:11.599 -which is the +uh value for my latent to this z check which is the 0:36:08.400,0:36:15.040 -value at which i find the minimum - -0:36:11.599,0:36:16.640 -for my uh energy +value at which i find the minimum for my uh energy 0:36:15.040,0:36:19.040 -since this is periodic i'm going to show - -0:36:16.640,0:36:22.720 -you just on the next repetition so i +since this is periodic i'm going to show you just on the next repetition so i 
0:36:19.040,0:36:25.119
-don't clutter too much the chart
-
-0:36:22.720,0:36:26.160
-and this came from this configuration
+don't clutter too much the chart and this came from this configuration

0:36:25.119,0:36:28.720
-over here
-
-0:36:26.160,0:36:30.079
-we start with these training points uh
+over here we start with these training points uh

0:36:28.720,0:36:32.480
-these are points from
-
-0:36:30.079,0:36:33.680
-you know i just sampled them from my
+these are points from you know i just sampled them from my

0:36:32.480,0:36:36.720
-mother
-
-0:36:33.680,0:36:40.079
-my peak was this green over here
+model my pick was this green over here

0:36:36.720,0:36:41.200
-and in this case perhaps i can tell that
-
-0:36:40.079,0:36:43.680
-the model
+and in this case perhaps i can tell that the model

0:36:41.200,0:36:44.560
-we initialized the the latent with the
-
-0:36:43.680,0:36:46.240
-orange
+we initialized the the latent with the orange

0:36:44.560,0:36:48.480
-and then he actually went a little bit
-
-0:36:46.240,0:36:50.800
-too much i think he didn't choose
+and then it actually went a little bit too much i think it didn't choose

0:36:48.480,0:36:54.160
-the exact best right so this is like a
-
-0:36:50.800,0:36:54.720
-bit it overshoot a little bit i think uh
+the exact best right so this is like a bit it overshot a little bit i think uh

0:36:54.160,0:36:57.280
-anyhow
-
-0:36:54.720,0:36:58.079
-this over free energy of this location
+anyhow the free energy of this location

0:36:57.280,0:37:03.359
-here
-
-0:36:58.079,0:37:03.359
-is going to be 0.25 right 0.5 square
+here is going to be 0.25 right 0.5 squared

0:37:03.520,0:37:07.520
-cool cool cool so what's left to show
-
-0:37:07.040,0:37:09.680
-you
+cool cool cool so what's left to show you

0:37:07.520,0:37:11.040
-well just a few more things but we are
-
-0:37:09.680,0:37:12.880
-almost finished
+well just a few more things but we are almost finished

0:37:11.040,0:37:14.160
-and then i'm 
looking for all your
-
-0:37:12.880,0:37:17.920
-questions because i
+and then i'm looking for all your questions because i

0:37:14.160,0:37:23.839
-i i really i really know you have
-
-0:37:17.920,0:37:23.839
-questions i i have questions
+i i really i really know you have questions i i have questions

0:37:28.400,0:37:33.280
-so let's in this case compute the free
-
-0:37:32.079,0:37:35.920
-energy
+so let's in this case compute the free energy

0:37:33.280,0:37:36.640
-for every location i show you in this
-
-0:37:35.920,0:37:39.200
-grid
+for every location i show you in this grid

0:37:36.640,0:37:41.040
-what does computing the free energy for
-
-0:37:39.200,0:37:44.160
-every location means
+what does computing the free energy for every location mean

0:37:41.040,0:37:46.880
-so just for sake of you know
-
-0:37:44.160,0:37:48.320
-clarity i'm gonna just repeat myself uh
+so just for sake of you know clarity i'm gonna just repeat myself uh

0:37:46.880,0:37:51.440
-because i like to listen
-
-0:37:48.320,0:37:54.560
-or to talk right so i i like to talk so
+because i like to listen or to talk right so i i like to talk so

0:37:51.440,0:37:57.599
-uh so let's let's select in
-
-0:37:54.560,0:38:01.839
-in green here let's say i'm picking this
+uh so let's let's select in in green here let's say i'm picking this

0:37:57.599,0:38:04.079
-sample over here as my first location
-
-0:38:01.839,0:38:05.599
-so given that location there i'm gonna
+sample over here as my first location so given that location there i'm gonna

0:38:04.079,0:38:07.680
-be picking a
-
-0:38:05.599,0:38:10.000
-orange do we have orange there is no
+be picking an orange do we have orange there is no

0:38:07.680,0:38:13.920
-orange okay i have to pick red
-
-0:38:10.000,0:38:17.520
-sorry so let's say i initialize
+orange okay i have to pick red sorry so let's say i initialize

0:38:13.920,0:38:20.880
-my my latent variable such that
-
-0:38:17.520,0:38:24.079
-the g the decoded version of the
+my 
my latent variable such that the g the decoded version of the 0:38:20.880,0:38:24.400 -z tilde is this point over here then we - -0:38:24.079,0:38:28.160 -run +z tilde is this point over here then we run 0:38:24.400,0:38:30.320 -our minimization process to perform - -0:38:28.160,0:38:32.800 -inference right to find out what is that +our minimization process to perform inference right to find out what is that 0:38:30.320,0:38:35.119 -check so whenever we - -0:38:32.800,0:38:36.079 -find z check that process is called +check so whenever we find z check that process is called 0:38:35.119,0:38:40.000 -inference - -0:38:36.079,0:38:43.119 -given an energy given a sample +inference given an energy given a sample 0:38:40.000,0:38:44.320 -y not given a location y i do inference - -0:38:43.119,0:38:47.839 -to figure out +y not given a location y i do inference to figure out 0:38:44.320,0:38:49.920 -what was the most likely latent variable - -0:38:47.839,0:38:51.440 -missing variable that generated that +what was the most likely latent variable missing variable that generated that 0:38:49.920,0:38:54.079 -point over there - -0:38:51.440,0:38:55.359 -so inference again means we are doing +point over there so inference again means we are doing 0:38:54.079,0:38:58.960 -minimization - -0:38:55.359,0:39:00.800 -and therefore we are moving around our +minimization and therefore we are moving around our 0:38:58.960,0:39:03.440 -model manifold until i get to this - -0:39:00.800,0:39:05.280 -location over here +model manifold until i get to this location over here 0:39:03.440,0:39:07.119 -what is this location over there this - -0:39:05.280,0:39:08.480 -location is the location that is the +what is this location over there this location is the location that is the 0:39:07.119,0:39:11.680 -closest to my - -0:39:08.480,0:39:14.880 -uh sample y that i picked +closest to my uh sample y that i picked 0:39:11.680,0:39:16.720 -therefore what is my free energy so my - 
-0:39:14.880,0:39:17.359
-free energy is going to be simply the
+therefore what is my free energy so my free energy is going to be simply the

0:39:16.720,0:39:21.200
-square
-
-0:39:17.359,0:39:23.920
-distance from this green guy
+square distance from this green guy

0:39:21.200,0:39:24.560
-and the red one right so this segment
-
-0:39:23.920,0:39:26.640
-over here
+and the red one right so this segment over here

0:39:24.560,0:39:28.480
-squared is going to be the free energy
-
-0:39:26.640,0:39:31.839
-of this point over here
+squared is going to be the free energy of this point over here

0:39:28.480,0:39:34.640
-so question from for you how do
-
-0:39:31.839,0:39:36.079
-the free energy of the point on top left
+so question for you how does the free energy of the point on top left

0:39:34.640,0:39:38.210
-compares
-
-0:39:36.079,0:39:41.269
-with the energy of
+compare with the energy of

0:39:38.210,0:39:41.269
-[Music]
-
-0:39:41.520,0:39:49.440
-the point i circle in yellow over here
+[Music] the point i circled in yellow over here

0:39:46.800,0:39:50.480
-which one is larger which one is smaller
-
-0:39:49.440,0:39:53.839
-and
+which one is larger which one is smaller and

0:39:50.480,0:39:54.720
-where where is my z check for the second
-
-0:39:53.839,0:39:58.079
-example
+where where is my z check for the second example

0:39:54.720,0:39:59.839
-green is larger yes green is larger
-
-0:39:58.079,0:40:03.920
-because this distance here
+green is larger yes green is larger because this distance here

0:39:59.839,0:40:07.200
-square it's gonna be much larger than
-
-0:40:03.920,0:40:10.000
-the which distance so similarly
+square it's gonna be much larger than the which distance so similarly

0:40:07.200,0:40:10.960
-if we initialize you know with luck and
-
-0:40:10.000,0:40:13.440
-we run great in
+if we initialize you know with luck and we run gradient

0:40:10.960,0:40:15.520
-the descent like gradient based methods
-
-0:40:13.440,0:40:17.040
-we may end up 
in a location that is over
+descent like gradient based methods we may end up in a location that is over

0:40:15.520,0:40:19.520
-here
-
-0:40:17.040,0:40:20.960
-and therefore the free energy is going
+here and therefore the free energy is going

0:40:19.520,0:40:21.359
-to be the square distance between that
-
-0:40:20.960,0:40:24.000
-point
+to be the square distance between that point

0:40:21.359,0:40:25.599
-and that point here so definitely this
-
-0:40:24.000,0:40:27.920
-point would be much larger
+and that point here so definitely this point would be much larger

0:40:25.599,0:40:29.040
-uh the energy free energy with respect
-
-0:40:27.920,0:40:31.040
-to this point
+uh the energy free energy with respect to this point

0:40:29.040,0:40:32.880
-so some other question someone can make
-
-0:40:31.040,0:40:36.640
-is gonna be uh
+so some other question someone can ask is gonna be uh

0:40:32.880,0:40:38.960
-how far is the green point from my
-
-0:40:36.640,0:40:40.079
-distribution right how how far is the
+how far is the green point from my distribution right how how far is the

0:40:38.960,0:40:42.000
-green point from my
-
-0:40:40.079,0:40:44.079
-learned distribution and the learn
+green point from my learned distribution and the learned

0:40:42.000,0:40:46.720
-distribution here is represented by the
-
-0:40:44.079,0:40:48.400
-those blue points right so you can tell
+distribution here is represented by those blue points right so you can tell

0:40:46.720,0:40:50.400
-that that point in the top left
-
-0:40:48.400,0:40:52.000
-it's it's going to have a higher energy
+that that point in the top left it's it's going to have a higher energy

0:40:50.400,0:40:55.200
-so it's further away it's
-
-0:40:52.000,0:40:58.319
-less compatible with you know
+so it's further away it's less compatible with you know

0:40:55.200,0:41:00.480
-uh with respect to 
the other guy right all right so we are almost almost done 0:41:00.480,0:41:06.160 -here - -0:41:01.839,0:41:09.040 -so let's to make like some exercise +here so let's to make like some exercise 0:41:06.160,0:41:10.400 -pay attention to those five values right - -0:41:09.040,0:41:13.200 -so i'm picking that +pay attention to those five values right so i'm picking that 0:41:10.400,0:41:14.319 -uh row there just below the x axis and - -0:41:13.200,0:41:17.359 -i'm picking the first +uh row there just below the x axis and i'm picking the first 0:41:14.319,0:41:19.680 -and then the fourth uh and so on right - -0:41:17.359,0:41:20.640 -example and so i'm going to be plotting +and then the fourth uh and so on right example and so i'm going to be plotting 0:41:19.680,0:41:23.520 -now - -0:41:20.640,0:41:23.839 -these energy functions they look pretty +now these energy functions they look pretty 0:41:23.520,0:41:26.960 -much - -0:41:23.839,0:41:30.240 -like this so for the +much like this so for the 0:41:26.960,0:41:33.520 -blue one as you can expect we extend - -0:41:30.240,0:41:36.880 -up to 20 and then we go +blue one as you can expect we extend up to 20 and then we go 0:41:33.520,0:41:38.720 -down to 2.5 roughly 20 is going to be in - -0:41:36.880,0:41:40.640 -this location like the distance between +down to 2.5 roughly 20 is going to be in this location like the distance between 0:41:38.720,0:41:40.960 -this point and this further point away - -0:41:40.640,0:41:44.000 -here +this point and this further point away here 0:41:40.960,0:41:44.800 -squared and then instead 2.5 square is - -0:41:44.000,0:41:47.760 -going to be +squared and then instead 2.5 square is going to be 0:41:44.800,0:41:49.599 -you know this distance here square - -0:41:47.760,0:41:51.440 -similarly you're gonna have you know an +you know this distance here square similarly you're gonna have you know an 0:41:49.599,0:41:55.520 -energy function for the red one - -0:41:51.440,0:41:57.760 -for the 
purple green and orange +energy function for the red one for the purple green and orange 0:41:55.520,0:41:59.599 -then given that i compute all these - -0:41:57.760,0:42:01.599 -values for the energy +then given that i compute all these values for the energy 0:41:59.599,0:42:03.760 -i can now compute what is the free - -0:42:01.599,0:42:06.160 -energy so the free energy +i can now compute what is the free energy so the free energy 0:42:03.760,0:42:07.280 -instead of being a function is going to - -0:42:06.160,0:42:10.640 -be a +instead of being a function is going to be a 0:42:07.280,0:42:12.480 -value when i pick a specific location - -0:42:10.640,0:42:14.560 -right so it's no longer a function of +value when i pick a specific location right so it's no longer a function of 0:42:12.480,0:42:15.200 -the latent whenever we compute the free - -0:42:14.560,0:42:18.240 -energy +the latent whenever we compute the free energy 0:42:15.200,0:42:21.040 -the latent disappears and i - -0:42:18.240,0:42:22.880 -get that z check which is the optimal +the latent disappears and i get that z check which is the optimal 0:42:21.040,0:42:26.000 -latent on the latent that is - -0:42:22.880,0:42:30.079 -the most likely giving me uh +latent on the latent that is the most likely giving me uh 0:42:26.000,0:42:32.079 -that that point so here we have that the - -0:42:30.079,0:42:34.240 -z check for the blue curve happens over +that that point so here we have that the z check for the blue curve happens over 0:42:32.079,0:42:36.400 -here similarly the z check for the - -0:42:34.240,0:42:39.280 -orange green and purple and red +here similarly the z check for the orange green and purple and red 0:42:36.400,0:42:39.920 -are happening in these locations uh - -0:42:39.280,0:42:42.079 -there +are happening in these locations uh there 0:42:39.920,0:42:44.480 -for sure we could have ended up caught - -0:42:42.079,0:42:47.200 -in this local minimum right that +for sure we could have ended up caught in 
this local minimum right that 0:42:44.480,0:42:47.920 -definitely could be a pitfall of you - -0:42:47.200,0:42:52.240 -know +definitely could be a pitfall of you know 0:42:47.920,0:42:56.720 -of using some gradient-based methods - -0:42:52.240,0:43:00.400 -so question now for the audience +of using some gradient-based methods so question now for the audience 0:42:56.720,0:43:04.319 -i'm removing everything what is f - -0:43:00.400,0:43:05.680 -infinity so how many dimensions does +i'm removing everything what is f infinity so how many dimensions does 0:43:04.319,0:43:08.000 -this stuff okay - -0:43:05.680,0:43:09.839 -can someone remind me this right so can +this stuff okay can someone remind me this right so can 0:43:08.000,0:43:10.640 -someone tell me what is the domain and - -0:43:09.839,0:43:15.359 -the image +someone tell me what is the domain and the image 0:43:10.640,0:43:15.359 -of these function on the chart - -0:43:15.680,0:43:19.280 -where does f infinity live +of these function on the chart where does f infinity live 0:43:17.800,0:43:24.640 -[Music] - -0:43:19.280,0:43:24.640 -anyone run sir is anyone listening +[Music] anyone run sir is anyone listening 0:43:25.280,0:43:31.680 -hello okay so y - -0:43:28.560,0:43:34.960 -is uh on r24 yeah but that's +hello okay so y is uh on r24 yeah but that's 0:43:31.680,0:43:38.079 -uh there is just an r i don't know what - -0:43:34.960,0:43:41.760 -just r means um +uh there is just an r i don't know what just r means um 0:43:38.079,0:43:44.880 -the capital y is 20 - -0:43:41.760,0:43:45.920 -capital y has 24 items each item in +the capital y is 20 capital y has 24 items each item in 0:43:44.880,0:43:47.920 -capital y - -0:43:45.920,0:43:50.160 -are two dimensional right so y is a +capital y are two dimensional right so y is a 0:43:47.920,0:43:52.160 -matrix but i'm not asking the capital y - -0:43:50.160,0:43:55.680 -i'm asking capital f +matrix but i'm not asking the capital y i'm asking capital f 
0:43:52.160,0:43:57.599
-infinity right so capital f infinity is
-
-0:43:55.680,0:43:59.440
-uh someone mentioned here it's
+infinity right so capital f infinity is uh someone mentioned here it's

0:43:57.599,0:44:01.920
-definitely a real value
-
-0:43:59.440,0:44:02.960
-uh in our case is actually positively
+definitely a real value uh in our case is actually positively

0:44:01.920,0:44:05.359
-non-negatively
-
-0:44:02.960,0:44:06.720
-value right because it's a it's a square
+non-negative value right because it's a it's a square

0:44:05.359,0:44:08.880
-sum of squares
-
-0:44:06.720,0:44:12.160
-and the domain instead what is the
+sum of squares and the domain instead what is the

0:44:08.880,0:44:12.160
-domain of capital f
-
0:44:16.240,0:44:23.200
-the domain is going to be the uh the
+domain of capital f the domain is going to be the uh the

0:44:19.280,0:44:26.160
-basically the where y uh
-
-0:44:23.200,0:44:28.160
-the bold y belongs to no so the ball y
+basically the where y uh the bold y belongs to no so the bold y

0:44:26.160,0:44:30.960
-it's a vector in two dimensions so
-
-0:44:28.160,0:44:31.520
-that's gonna be r two right so again
+it's a vector in two dimensions so that's gonna be r two right so again

0:44:30.960,0:44:33.520
-these f
-
-0:44:31.520,0:44:35.760
-are scalar values so i'm gonna be
+these f are scalar values so i'm gonna be

0:44:33.520,0:44:36.560
-representing the different intensities
-
-0:44:35.760,0:44:39.920
-of this
+representing the different intensities of this

0:44:36.560,0:44:43.040
-scalar value with this color bar here
-
-0:44:39.920,0:44:45.920
-i will represent in a violet very dark
+scalar value with this color bar here i will represent in a violet very dark

0:44:43.040,0:44:46.720
-maybe not even able to see in this free
-
-0:44:45.920,0:44:50.079
-energy
+maybe not even able to see in this free energy

0:44:46.720,0:44:52.160
-uh equals zero then in aqua
-
-0:44:50.079,0:44:53.200
-i'm gonna be representing this 
zero +uh equals zero then in aqua i'm gonna be representing this zero 0:44:52.160,0:44:56.880 -temperature limit - -0:44:53.200,0:44:59.440 -uh free energy for uh equal one and then +temperature limit uh free energy for uh equal one and then 0:44:56.880,0:45:00.079 -everything that is above and beyond the - -0:44:59.440,0:45:03.520 -value two +everything that is above and beyond the value two 0:45:00.079,0:45:06.560 -is going to be yellow and so this is - -0:45:03.520,0:45:10.240 -how that grid looks +is going to be yellow and so this is how that grid looks 0:45:06.560,0:45:12.319 -okay so each location in that grid here - -0:45:10.240,0:45:14.160 -and i show you before and those green +okay so each location in that grid here and i show you before and those green 0:45:12.319,0:45:17.119 -points have a - -0:45:14.160,0:45:18.160 -free energy which is here represented by +points have a free energy which is here represented by 0:45:17.119,0:45:19.839 -this color - -0:45:18.160,0:45:22.400 -in this location over here in the bottom +this color in this location over here in the bottom 0:45:19.839,0:45:24.480 -side you can see it's yellow - -0:45:22.400,0:45:25.599 -which means it has a free energy which +side you can see it's yellow which means it has a free energy which 0:45:24.480,0:45:29.280 -is - -0:45:25.599,0:45:32.720 -equal or larger than 2. moreover +is equal or larger than 2. 
moreover

0:45:29.280,0:45:34.319
-those arrows are pointing
-
-0:45:32.720,0:45:36.720
-are the gradient right so these are
+those arrows are pointing are the gradient right so these are

0:45:34.319,0:45:39.920
-pointing in the direction of maximum
-
-0:45:36.720,0:45:43.119
-uh ascend as we move closer
+pointing in the direction of maximum uh ascent as we move closer

0:45:39.920,0:45:43.440
-to the uh this region here you're gonna
-
-0:45:43.119,0:45:45.599
-get
+to the uh this region here you're gonna get

0:45:43.440,0:45:47.200
-finally you're gonna see some colors and
-
-0:45:45.599,0:45:48.640
-here you can tell the free energy is
+finally you're gonna see some colors and here you can tell the free energy is

0:45:47.200,0:45:51.599
-getting lower lower lower
-
-0:45:48.640,0:45:53.359
-until we hit the location where this
+getting lower lower lower until we hit the location where this

0:45:51.599,0:45:55.280
-reconstruction happen
-
-0:45:53.359,0:45:58.160
-which is the location the region where
+reconstruction happens which is the location the region where

0:45:55.280,0:46:01.760
-my free energy is zero
-
-0:45:58.160,0:46:05.599
-when we train this modern we try to get
+my free energy is zero when we train this model we try to get

0:46:01.760,0:46:08.400
-this zero energy level to be matching
-
-0:46:05.599,0:46:08.800
-the location of these blue points of
+this zero energy level to be matching the location of these blue points of

0:46:08.400,0:46:11.280
-course
-
-0:46:08.800,0:46:12.560
-as you can tell this model is very
+course as you can tell this model is very

0:46:11.280,0:46:16.000
-poorly trained
-
-0:46:12.560,0:46:18.560
-and therefore this energy surface is not
+poorly trained and therefore this energy surface is not

0:46:16.000,0:46:20.640
-well matching my training point it's
-
-0:46:18.560,0:46:22.640
-getting close but it's not yet there
+well matching my training points it's getting close but it's not yet there

0:46:20.640,0:46:24.640
-so 
next time we're gonna see how to - -0:46:22.640,0:46:28.240 -stretch this energy +so next time we're gonna see how to stretch this energy 0:46:24.640,0:46:32.240 -such that it's gonna be you know nicely - -0:46:28.240,0:46:35.599 -fitting on these blue points okay +such that it's gonna be you know nicely fitting on these blue points okay 0:46:32.240,0:46:37.119 -uh why is the energy surface single - -0:46:35.599,0:46:40.240 -value +uh why is the energy surface single value 0:46:37.119,0:46:43.280 -so the energy surface - -0:46:40.240,0:46:46.319 -which is the value of f infinity right +so the energy surface which is the value of f infinity right 0:46:43.280,0:46:47.119 -and f infinity is the minimum of my - -0:46:46.319,0:46:51.280 -energy so +and f infinity is the minimum of my energy so 0:46:47.119,0:46:54.800 -energy the capital e it's a function - -0:46:51.280,0:46:57.280 -over all possible latent but then +energy the capital e it's a function over all possible latent but then 0:46:54.800,0:46:59.440 -given that we have this function we're - -0:46:57.280,0:47:01.920 -going to find what is the minimum value +given that we have this function we're going to find what is the minimum value 0:46:59.440,0:47:03.119 -minimum value that this energy can take - -0:47:01.920,0:47:06.960 -that minimum value +minimum value that this energy can take that minimum value 0:47:03.119,0:47:10.560 -is the uh zero temperature limit - -0:47:06.960,0:47:13.839 -of the free energy which is this f +is the uh zero temperature limit of the free energy which is this f 0:47:10.560,0:47:16.880 -infinity okay and so e - -0:47:13.839,0:47:19.680 -y z is a function of y and z +infinity okay and so e y z is a function of y and z 0:47:16.880,0:47:21.200 -but then whenever we take out the z with - -0:47:19.680,0:47:24.240 -the minimization +but then whenever we take out the z with the minimization 0:47:21.200,0:47:25.359 -we get this f which is going to be a - -0:47:24.240,0:47:28.960 -function 
+we get this f which is going to be a function 0:47:25.359,0:47:31.760 -of y right so every time i move across - -0:47:28.960,0:47:32.640 -the y space here we have y1 and y2 the +of y right so every time i move across the y space here we have y1 and y2 the 0:47:31.760,0:47:34.960 -two components - -0:47:32.640,0:47:37.119 -you're gonna have that f will have you +two components you're gonna have that f will have you 0:47:34.960,0:47:38.440 -know larger than two larger than two - -0:47:37.119,0:47:41.839 -blah blah blah then +know larger than two larger than two blah blah blah then 0:47:38.440,0:47:42.400 -1.75 1.50 and so on lower values until - -0:47:41.839,0:47:44.640 -we get +1.75 1.50 and so on lower values until we get 0:47:42.400,0:47:46.160 -f roughly zero and then actually it - -0:47:44.640,0:47:47.920 -increases a little bit +f roughly zero and then actually it increases a little bit 0:47:46.160,0:47:50.480 -so maybe next time i also gonna show you - -0:47:47.920,0:47:51.359 -this chart uh in a 3d version also +so maybe next time i also gonna show you this chart uh in a 3d version also 0:47:50.480,0:47:54.720 -rotating - -0:47:51.359,0:47:57.040 -i didn't have time to do that um +rotating i didn't have time to do that um 0:47:54.720,0:47:57.839 -did i answer your question is it clear - -0:47:57.040,0:48:00.800 -why this +did i answer your question is it clear why this 0:47:57.839,0:48:02.559 -energy function is single value like as - -0:48:00.800,0:48:03.520 -in a scalar value right you mean single +energy function is single value like as in a scalar value right you mean single 0:48:02.559,0:48:05.200 -value - -0:48:03.520,0:48:07.839 -am i understanding the question +value am i understanding the question 0:48:05.200,0:48:07.839 -correctly - -0:48:08.480,0:48:14.960 -but we have 24 y's so the +correctly but we have 24 y's so the 0:48:11.520,0:48:18.000 -capital the y's are these blue points - -0:48:14.960,0:48:21.839 -right now my y's i'm using +capital the 
y's are these blue points right now my y's i'm using

0:48:18.000,0:48:24.160
-are this one so they're not 24 there are
-
-0:48:21.839,0:48:26.000
-so if you count from here let me go a
+are this one so they're not 24 there are so if you count from here let me go a

0:48:24.160,0:48:29.119
-bit larger i can see
-
-0:48:26.000,0:48:32.240
-so here we have blah blah blah 12
+bit larger i can see so here we have blah blah blah 12

0:48:29.119,0:48:35.839
-and 12 here plus one we have 25
-
-0:48:32.240,0:48:38.640
-and then here we had eight
+and 12 here plus one we have 25 and then here we had eight

0:48:35.839,0:48:41.280
-and eight sixteen plus one seventeen so
-
-0:48:38.640,0:48:42.800
-right now we have 17 times 25
+and eight sixteen plus one seventeen so right now we have 17 times 25

0:48:41.280,0:48:46.839
-i don't know how much it is someone can
-
-0:48:42.800,0:48:49.839
-compute okay google how much is 17 times
+i don't know how much it is someone can compute okay google how much is 17 times

0:48:46.839,0:48:53.960
-25
-
-0:48:49.839,0:48:57.359
-okay she's not listening oh 120
+25 okay she's not listening oh 120

0:48:53.960,0:49:00.400
-425 uh there you go
-
-0:48:57.359,0:49:03.760
-so right now we have 425
+425 uh there you go so right now we have 425

0:49:00.400,0:49:08.079
-points right so we have 424
-
-0:49:03.760,0:49:11.440
-24 425 energy functions
+points right so we have 425 energy functions

0:49:08.079,0:49:13.200
-of which function of y so
-
-0:49:11.440,0:49:15.359
-given that i pick a y i have an energy
+of which function of y so given that i pick a y i have an energy

0:49:13.200,0:49:16.960
-function those are functions in z
-
-0:49:15.359,0:49:18.400
-given that you pick the minimum value of
+function those are functions in z given that you pick the minimum value of

0:49:16.960,0:49:19.839
-this energy function that's going to be
-
-0:49:18.400,0:49:23.280
-your free energy
+this energy function that's going to be your free energy
0:49:19.839,0:49:24.720 -for a specific y so you remove that - -0:49:23.280,0:49:28.720 -latent variable so we have +for a specific y so you remove that latent variable so we have 0:49:24.720,0:49:31.040 -an internal possible way of spending our - -0:49:28.720,0:49:32.400 -uh manifold right so you want to think +an internal possible way of spanning our uh manifold right so you want to think 0:49:31.040,0:49:35.280 -about this as you know - -0:49:32.400,0:49:37.440 -you have like uh your potato in your +about this as you know you have like uh your potato in your 0:49:35.280,0:49:38.079 -model like your model thinks about the - -0:49:37.440,0:49:41.200 -data is +model like your model thinks about the data is 0:49:38.079,0:49:41.680 -distributed as this kind of shape and - -0:49:41.200,0:49:44.319 -then +distributed as this kind of shape and then 0:49:41.680,0:49:46.319 -your latent variable allows you to go - -0:49:44.319,0:49:48.800 -all around this potato +your latent variable allows you to go all around this potato 0:49:46.319,0:49:49.760 -so right now if you add if you ask me oh - -0:49:48.800,0:49:52.559 -is this +so right now if you add if you ask me oh is this 0:49:49.760,0:49:54.480 -point here on your manifold or not so if - -0:49:52.559,0:49:57.200 -this point is on my manifold +point here on your manifold or not so if this point is on my manifold 0:49:54.480,0:49:58.400 -i know that by going around my manifold - -0:49:57.200,0:50:01.599 -and find out if +i know that by going around my manifold and find out if 0:49:58.400,0:50:05.280 -oh i get there right and so if - -0:50:01.599,0:50:07.440 -the free energy of that point is zero +oh i get there right and so if the free energy of that point is zero 0:50:05.280,0:50:09.280 -therefore it means that that point you - -0:50:07.440,0:50:11.680 -are asking me about +therefore it means that that point you are asking me about 0:50:09.280,0:50:12.800 -leaves on the manifold that the model - -0:50:11.680,0:50:15.359
-has learned +lives on the manifold that the model has learned 0:50:12.800,0:50:16.720 -if your free energy is not zero then - -0:50:15.359,0:50:19.839 -it's gonna be simply +if your free energy is not zero then it's gonna be simply 0:50:16.720,0:50:20.720 -equal to the quadratic uh euclidean - -0:50:19.839,0:50:23.599 -distance +equal to the quadratic uh euclidean distance 0:50:20.720,0:50:24.240 -from that location from your point and - -0:50:23.599,0:50:28.960 -my +from that location from your point and my 0:50:24.240,0:50:33.680 -manifold right did i answer the question - -0:50:28.960,0:50:36.880 -yeah okay uh more questions for me +manifold right did i answer the question yeah okay uh more questions for me 0:50:33.680,0:50:40.160 -oh was everything clear i i this stuff i - -0:50:36.880,0:50:40.559 -really just digested it uh like in the +oh was everything clear i i this stuff i really just digested it uh like in the 0:50:40.160,0:50:43.119 -past - -0:50:40.559,0:50:45.839 -30 hours so again i might not have done +past 30 hours so again i might not have done 0:50:43.119,0:50:48.880 -a very good job - -0:50:45.839,0:50:52.640 -let's see how do we choose a function to +a very good job let's see how do we choose a function to 0:50:48.880,0:50:52.640 -represent the data manifold - -0:50:53.040,0:50:59.040 -in this case it seemed like we chose a +represent the data manifold in this case it seemed like we chose a 0:50:56.000,0:51:02.400 -ellipse based on the data but how about - -0:50:59.040,0:51:04.400 -other scenarios yeah definitely uh +ellipse based on the data but how about other scenarios yeah definitely uh 0:51:02.400,0:51:05.680 -there is a lot of you know research - -0:51:04.400,0:51:07.040 -going in uh +there is a lot of you know research going in uh 0:51:05.680,0:51:10.160 -architectures right network - -0:51:07.040,0:51:13.599 -architectures so +architectures right network architectures so 0:51:10.160,0:51:14.319 -but again uh right now yeah we we chose
- -0:51:13.599,0:51:16.720 -that +but again uh right now yeah we we chose that 0:51:14.319,0:51:18.559 -next time i'm gonna be trying to learn - -0:51:16.720,0:51:20.960 -the level of compatibility +next time i'm gonna be trying to learn the level of compatibility 0:51:18.559,0:51:22.559 -like i'm gonna try to learn this energy - -0:51:20.960,0:51:26.079 -for the x y +like i'm gonna try to learn this energy for the x y 0:51:22.559,0:51:27.599 -z the the triple right and so - -0:51:26.079,0:51:29.359 -we're gonna be just using neural nets +z the the triple right and so we're gonna be just using neural nets 0:51:27.599,0:51:31.599 -right even the sine and cosine - -0:51:29.359,0:51:33.359 -you can somehow approximate them with a +right even the sine and cosine you can somehow approximate them with a 0:51:31.599,0:51:34.319 -few layers right so instead of having - -0:51:33.359,0:51:38.319 -these +few layers right so instead of having these 0:51:34.319,0:51:40.720 -uh z function uh the g function - -0:51:38.319,0:51:41.760 -over here instead of having this very +uh z function uh the g function over here instead of having this very 0:51:40.720,0:51:43.440 -simple thing - -0:51:41.760,0:51:45.520 -we can think about having you know a few +simple thing we can think about having you know a few 0:51:43.440,0:51:47.920 -layers of a neural net right - -0:51:45.520,0:51:49.280 -so you can still use a few layers of a +layers of a neural net right so you can still use a few layers of a 0:51:47.920,0:51:51.280 -neural net - -0:51:49.280,0:51:52.720 -but you're not going to be using the +neural net but you're not going to be using the 0:51:51.280,0:51:54.559 -neural net to do vector - -0:51:52.720,0:51:56.160 -vector mapping you're going to be using +neural net to do vector vector mapping you're going to be using 0:51:54.559,0:51:59.599 -a neural net to do - -0:51:56.160,0:52:01.359 -a bunch of vectors to scalars right +a neural net to do a bunch of vectors to scalars right 
0:51:59.599,0:52:02.960 -so bunch of vectors to scalars is going - -0:52:01.359,0:52:05.520 -to be this +so bunch of vectors to scalars is going to be this 0:52:02.960,0:52:06.800 -energy-based you know way of thinking - -0:52:05.520,0:52:09.839 -about things +energy-based you know way of thinking about things 0:52:06.800,0:52:10.960 -uh because again how do you - -0:52:09.839,0:52:12.559 -let's say you want to translate +uh because again how do you let's say you want to translate 0:52:10.960,0:52:13.280 -something from one language to another - -0:52:12.559,0:52:15.280 -language right +something from one language to another language right 0:52:13.280,0:52:16.559 -so i have one sentence but i may - -0:52:15.280,0:52:18.240 -translate that sentence +so i have one sentence but i may translate that sentence 0:52:16.559,0:52:20.240 -in different ways in another language - -0:52:18.240,0:52:22.079 -right so how do you train this you +in different ways in another language right so how do you train this you 0:52:20.240,0:52:24.079 -cannot really say - -0:52:22.079,0:52:26.079 -i do soft marks because first of all +cannot really say i do softmax because first of all 0:52:24.079,0:52:29.520 -there is an infinite number - -0:52:26.079,0:52:31.200 -of sentences so you can't do that +there is an infinite number of sentences so you can't do that 0:52:29.520,0:52:33.280 -but then there might be even multiple - -0:52:31.200,0:52:35.440 -sentences that are correctly associated +but then there might be even multiple sentences that are correctly associated 0:52:33.280,0:52:37.839 -to your first sentence - -0:52:35.440,0:52:39.200 -so this energy based model allow you to +to your first sentence so this energy based model allow you to 0:52:37.839,0:52:41.680 -end up with a - -0:52:39.200,0:52:42.319 -score scoring mechanism which is this +end up with a score scoring mechanism which is this 0:52:41.680,0:52:46.720 -energy - -0:52:42.319,0:52:49.839 -which is telling you how
compatible are +energy which is telling you how compatible are 0:52:46.720,0:52:50.880 -points right so here x y and z are all - -0:52:49.839,0:52:52.640 -interchangeable +points right so here x y and z are all interchangeable 0:52:50.880,0:52:54.079 -given one i can find the other right so - -0:52:52.640,0:52:57.280 -if i have the energy +given one i can find the other right so if i have the energy 0:52:54.079,0:52:58.559 -if my model learned the energy i can - -0:52:57.280,0:53:00.960 -find x given y +if my model learned the energy i can find x given y 0:52:58.559,0:53:01.920 -i can find y given z i can find z given - -0:53:00.960,0:53:04.400 -x i can find +i can find y given z i can find z given x i can find 0:53:01.920,0:53:06.240 -all kind of combination those x y z i - -0:53:04.400,0:53:07.599 -don't even have to write them x y and z +all kind of combination those x y z i don't even have to write them x y and z 0:53:06.240,0:53:08.480 -i can just write all the components - -0:53:07.599,0:53:10.640 -right and then i can +i can just write all the components right and then i can 0:53:08.480,0:53:12.640 -as long as my model learns that right it - -0:53:10.640,0:53:16.000 -learns all the +as long as my model learns that right it learns all the 0:53:12.640,0:53:16.480 -uh how do you call them um interactions - -0:53:16.000,0:53:20.000 -that +uh how do you call them um interactions that 0:53:16.480,0:53:21.760 -exist in my data that's why uh jan likes - -0:53:20.000,0:53:23.040 -them so much and they're super powerful +exist in my data that's why uh jan likes them so much and they're super powerful 0:53:21.760,0:53:26.000 -because they don't make too many - -0:53:23.040,0:53:26.000 -assumptions i think +because they don't make too many assumptions i think 0:53:26.160,0:53:32.880 -uh did i okay i answer your question uh - -0:53:29.200,0:53:36.640 -we are over time so i think +uh did i okay i answer your question uh we are over time so i think 0:53:32.880,0:53:38.079 
-this lesson was kind of - -0:53:36.640,0:53:40.400 -fine i don't know you had to tell me +this lesson was kind of fine i don't know you had to tell me 0:53:38.079,0:53:42.640 -because i really don't know - -0:53:40.400,0:53:44.319 -i hope you like this yeah that was great +because i really don't know i hope you like this yeah that was great 0:53:42.640,0:53:47.040 -okay because people are very - -0:53:44.319,0:53:48.960 -quiet today i wanted to make also a +okay because people are very quiet today i wanted to make also a 0:53:47.040,0:53:49.599 -notebook but then the notebook is really - -0:53:48.960,0:53:52.000 -ugly +notebook but then the notebook is really ugly 0:53:49.599,0:53:52.720 -because i use the notebook to make very - -0:53:52.000,0:53:54.559 -pretty +because i use the notebook to make very pretty 0:53:52.720,0:53:56.640 -visualization but the code is really - -0:53:54.559,0:53:58.800 -ugly maybe next time +visualization but the code is really ugly maybe next time 0:53:56.640,0:54:00.000 -i can share with you a cleanup version - -0:53:58.800,0:54:02.880 -of the notebook for +i can share with you a cleanup version of the notebook for 0:54:00.000,0:54:03.440 -pedagogical purpose right and especially - -0:54:02.880,0:54:06.000 -going to be +pedagogical purpose right and especially going to be 0:54:03.440,0:54:07.599 -showing you this network which doesn't - -0:54:06.000,0:54:08.640 -have an input doesn't have a forward +showing you this network which doesn't have an input doesn't have a forward 0:54:07.599,0:54:10.720 -function - -0:54:08.640,0:54:12.240 -which is so funny uh and then we're +function which is so funny uh and then we're 0:54:10.720,0:54:14.559 -gonna be learning perhaps - -0:54:12.240,0:54:16.079 -what is the free energy without this +gonna be learning perhaps what is the free energy without this 0:54:14.559,0:54:18.880 -beta that goes to - -0:54:16.079,0:54:19.680 -um to infinity and we're gonna be +beta that goes to um to infinity and 
we're gonna be 0:54:18.880,0:54:21.760 -learning how to do - -0:54:19.680,0:54:23.920 -learning okay so today again we just +learning how to do learning okay so today again we just 0:54:21.760,0:54:25.359 -learned so let me get to the beginning - -0:54:23.920,0:54:28.319 -so we can end up +learned so let me get to the beginning so we can end up 0:54:25.359,0:54:29.760 -here so today we talk about inference - -0:54:28.319,0:54:32.000 -okay +here so today we talk about inference okay 0:54:29.760,0:54:33.440 -we do inference by doing minimization of - -0:54:32.000,0:54:34.960 -an energy function +we do inference by doing minimization of an energy function 0:54:33.440,0:54:36.559 -learning is something we're going to be - -0:54:34.960,0:54:37.040 -talking about next time they don't they +learning is something we're going to be talking about next time they don't they 0:54:36.559,0:54:39.280 -don't - -0:54:37.040,0:54:41.359 -they don't have anything to share well +don't they don't have anything to share well 0:54:39.280,0:54:43.520 -it's two different topics right - -0:54:41.359,0:54:45.280 -next time the other one and then the +it's two different topics right next time the other one and then the 0:54:43.520,0:54:45.760 -other part so it was inference for - -0:54:45.280,0:54:48.000 -latent +other part so it was inference for latent 0:54:45.760,0:54:49.119 -variable energy based model which allow - -0:54:48.000,0:54:51.920 -you to capture +variable energy based model which allow you to capture 0:54:49.119,0:54:53.040 -this multi multi modality of you know - -0:54:51.920,0:54:56.720 -multi +this multi multi modality of you know multi 0:54:53.040,0:54:58.000 -multi modality of coexistence of things - -0:54:56.720,0:54:59.200 -right you can you don't have simply +multi modality of coexistence of things right you can you don't have simply 0:54:58.000,0:55:01.760 -vector to vector you have - -0:54:59.200,0:55:02.960 -one too many options right and then we +vector to vector 
you have one-to-many options right and then we 0:55:01.760,0:55:06.400 -talk about - -0:55:02.960,0:55:09.599 -uh we talk about this stuff here now +talk about uh we talk about this stuff here now 0:55:06.400,0:55:12.799 -and how we can possibly try to learn - -0:55:09.599,0:55:15.359 -this combination of x y uh x y +and how we can possibly try to learn this combination of x y uh x y 0:55:12.799,0:55:18.000 -combination right - -0:55:15.359,0:55:20.480 -uh so there's a question so minimizing +combination right uh so there's a question so minimizing 0:55:18.000,0:55:24.480 -energy regarding to - -0:55:20.480,0:55:28.480 -train manifold basically means denoising +energy regarding to trained manifold basically means denoising 0:55:24.480,0:55:29.760 -uh i think you can think about that as - -0:55:28.480,0:55:33.680 -yeah in that way +uh i think you can think about that as yeah in that way 0:55:29.760,0:55:35.040 -so the real manifold - -0:55:33.680,0:55:37.119 -okay depends which one is the real +so the real manifold okay depends which one is the real 0:55:35.040,0:55:40.079 -manifold right so if the model - -0:55:37.119,0:55:42.240 -has learned the the real manifold then +manifold right so if the model has learned the the real manifold then 0:55:40.079,0:55:44.240 -you know by minimizing the energy - -0:55:42.240,0:55:46.079 -you can find what is the denoised +you know by minimizing the energy you can find what is the denoised 0:55:44.240,0:55:48.000 -version of your input - -0:55:46.079,0:55:49.200 -another option you have to denoise this +version of your input another option you have to denoise this 0:55:48.000,0:55:50.079 -stuff is going to be if you find - -0:55:49.200,0:55:51.520 -yourself here +stuff is going to be if you find yourself here 0:55:50.079,0:55:53.280 -you can compute this energy you can - -0:55:51.520,0:55:54.799 -follow the gradient and then here you +you can compute this energy you can follow the gradient and then here you 0:55:53.280,0:55:56.960
-can recompute the energy - -0:55:54.799,0:55:58.960 -you can still go and follow the energy +can recompute the energy you can still go and follow the energy 0:55:56.960,0:56:01.119 -the gradient so you can end up boom - -0:55:58.960,0:56:03.680 -down on the manifold right so you can +the gradient so you can end up boom down on the manifold right so you can 0:56:01.119,0:56:06.079 -make little steps so you can just go - -0:56:03.680,0:56:07.599 -uh you know you can find out where to go +make little steps so you can just go uh you know you can find out where to go 0:56:06.079,0:56:10.640 -or you can use the - -0:56:07.599,0:56:13.599 -uh you know the z check +or you can use the uh you know the z check 0:56:10.640,0:56:15.520 -to find out what is the uh best - -0:56:13.599,0:56:17.839 -approximation of your point over here +to find out what is the uh best approximation of your point over here 0:56:15.520,0:56:17.839 -right - -0:56:18.160,0:56:22.799 -okay all right so that was it um thank +right okay all right so that was it um thank 0:56:21.280,0:56:23.520 -you for listening you have a nice - -0:56:22.799,0:56:25.839 -evening +you for listening you have a nice evening 0:56:23.520,0:56:27.200 -i see on friday i feel free to ask jan - -0:56:25.839,0:56:30.240 -questions about this +i see on friday i feel free to ask jan questions about this 0:56:27.200,0:56:30.640 -uh this this practicum he he was you - -0:56:30.240,0:56:33.920 -know +uh this this practicum he he was you know 0:56:30.640,0:56:34.400 -helping a lot as well right have a good - -0:56:33.920,0:56:36.880 -night +helping a lot as well right have a good night 0:56:34.400,0:56:36.880 -bye bye - -0:56:37.760,0:56:41.520 -so how can you get more out of this +bye bye so how can you get more out of this 0:56:39.680,0:56:44.400 -lesson today - -0:56:41.520,0:56:44.960 -comprehension if something was not yet +lesson today comprehension if something was not yet 0:56:44.400,0:56:47.760 -clear - 
-0:56:44.960,0:56:49.440 -you should really ask me uh anything in +clear you should really ask me uh anything in 0:56:47.760,0:56:51.440 -the comment section below okay - -0:56:49.440,0:56:53.440 -i will answer every comment that you +the comment section below okay i will answer every comment that you 0:56:51.440,0:56:55.520 -write over there - -0:56:53.440,0:56:56.960 -news if you would like to keep up with +write over there news if you would like to keep up with 0:56:55.520,0:56:57.440 -everything i post online you should - -0:56:56.960,0:57:01.599 -check +everything i post online you should check 0:56:57.440,0:57:03.839 -my twitter account under alph cnz - -0:57:01.599,0:57:05.200 -if you also would like youtube to notify +my twitter account under alfcnz if you also would like youtube to notify 0:57:03.839,0:57:08.160 -about you the latest - -0:57:05.200,0:57:09.839 -videos i upload then press that +you about the latest videos i upload then press that 0:57:08.160,0:57:11.119 -subscribe button and turn on the - -0:57:09.839,0:57:12.640 -notification bell +subscribe button and turn on the notification bell 0:57:11.119,0:57:15.280 -such that you're not going to be missing - -0:57:12.640,0:57:19.599 -any content if you like this video +such that you're not going to be missing any content if you like this video 0:57:15.280,0:57:22.720 -put a like on it it means a lot to me - -0:57:19.599,0:57:25.200 -searching we have a companion website +put a like on it it means a lot to me searching we have a companion website 0:57:22.720,0:57:26.000 -where you can find each and every video - -0:57:25.200,0:57:29.200 -transcribed +where you can find each and every video transcribed 0:57:26.000,0:57:29.839 -by students that volunteered for example - -0:57:29.200,0:57:32.880 -here +by students that volunteered for example here 0:57:29.839,0:57:35.839 -you can see this lesson transcribed - -0:57:32.880,0:57:36.400 -as you can tell the titles are links +you can see this lesson
transcribed as you can tell the titles are links 0:57:35.839,0:57:38.880 -which are - -0:57:36.400,0:57:40.400 -redirecting you to the correct section +which are redirecting you to the correct section 0:57:38.880,0:57:43.599 -in the video - -0:57:40.400,0:57:47.359 -so here we have this lesson transcribed +in the video so here we have this lesson transcribed 0:57:43.599,0:57:49.359 -to you in english moreover - -0:57:47.359,0:57:51.119 -not only english is available as you can +to you in english moreover not only english is available as you can 0:57:49.359,0:57:55.040 -tell here there is the - -0:57:51.119,0:57:56.559 -english flag you can go up on top for +tell here there is the english flag you can go up on top for 0:57:55.040,0:57:59.680 -example and i show you - -0:57:56.559,0:58:01.920 -the home page here you can find that +example and i show you the home page here you can find that 0:57:59.680,0:58:03.040 -many languages are available arabic - -0:58:01.920,0:58:05.280 -spanish version +many languages are available arabic spanish version 0:58:03.040,0:58:06.240 -french italian japanese korean russian - -0:58:05.280,0:58:08.720 -turkish +french italian japanese korean russian turkish 0:58:06.240,0:58:10.400 -and chinese and more are coming if you - -0:58:08.720,0:58:11.119 -would like to contribute and add your +and chinese and more are coming if you would like to contribute and add your 0:58:10.400,0:58:13.520 -own language - -0:58:11.119,0:58:14.720 -don't hesitate to contact me on twitter +own language don't hesitate to contact me on twitter 0:58:13.520,0:58:17.839 -or by email - -0:58:14.720,0:58:19.280 -this is the language part moreover it +or by email this is the language part moreover it 0:58:17.839,0:58:21.599 -really really helps - -0:58:19.280,0:58:22.319 -if you implement things that we cover in +really really helps if you implement things that we cover in 0:58:21.599,0:58:24.960 -class - -0:58:22.319,0:58:25.920 -with file torch and you know a 
notebook +class with pytorch and you know a notebook 0:58:24.960,0:58:28.640 -perhaps - -0:58:25.920,0:58:30.160 -in some patient today class didn't have +perhaps in some patient today class didn't have 0:58:28.640,0:58:32.480 -a companion notebook - -0:58:30.160,0:58:33.440 -but nevertheless i would really +a companion notebook but nevertheless i would really 0:58:32.480,0:58:36.559 -recommend you to - -0:58:33.440,0:58:39.440 -try to put together a few trials +recommend you to try to put together a few trials 0:58:36.559,0:58:40.400 -yourself such that you can test your - -0:58:39.440,0:58:43.119 -knowledge +yourself such that you can test your knowledge 0:58:40.400,0:58:44.240 -finally if you find any bug in the - -0:58:43.119,0:58:46.640 -previous notebooks +finally if you find any bug in the previous notebooks 0:58:44.240,0:58:47.839 -in the website anywhere you're really - -0:58:46.640,0:58:50.160 -encouraged to +in the website anywhere you're really encouraged to 0:58:47.839,0:58:51.359 -point them out on github or if you find - -0:58:50.160,0:58:53.680 -yourself inclined +point them out on github or if you find yourself inclined 0:58:51.359,0:58:55.680 -you can also send a pull request such - -0:58:53.680,0:58:57.760 -that you can be an official contributor +you can also send a pull request such that you can be an official contributor 0:58:55.680,0:59:00.240 -to this project - -0:58:57.760,0:59:01.119 -and don't forget to like share and +to this project and don't forget to like share and 0:59:00.240,0:59:04.240 -subscribe - -0:59:01.119,0:59:06.319 -bye bye - -0:59:04.240,0:59:06.319 -you - +subscribe bye bye diff --git a/docs/en/week15/practicum15B.sbv b/docs/en/week15/practicum15B.sbv index 6b83859d3..a97a511e8 100644 --- a/docs/en/week15/practicum15B.sbv +++ b/docs/en/week15/practicum15B.sbv @@ -1,4794 +1,2396 @@ 0:00:01.920,0:00:08.160 -so we share the screen - -0:00:05.040,0:00:10.240 -and i'm opening the chat +so we share the screen and i'm opening
the chat 0:00:08.160,0:00:12.160 -all right so i have the chat open so you - -0:00:10.240,0:00:14.240 -can interact with me +all right so i have the chat open so you can interact with me 0:00:12.160,0:00:16.480 -and so a small recap from last time last - -0:00:14.240,0:00:19.359 -time we've been talking about energy +and so a small recap from last time last time we've been talking about energy 0:00:16.480,0:00:20.080 -uh and actually we've been talking about - -0:00:19.359,0:00:23.119 -inference +uh and actually we've been talking about inference 0:00:20.080,0:00:24.560 -how to find set how to find y check uh - -0:00:23.119,0:00:28.160 -how to compute y +how to find set how to find y check uh how to compute y 0:00:24.560,0:00:30.080 -f and e okay and so let me just start - -0:00:28.160,0:00:32.160 -i guess with the the last slide from +f and e okay and so let me just start i guess with the the last slide from 0:00:30.080,0:00:35.120 -last time so we - -0:00:32.160,0:00:35.920 -had computed this f infinity uh which is +last time so we had computed this f infinity uh which is 0:00:35.120,0:00:39.680 -called - -0:00:35.920,0:00:42.480 -uh zero temperature limit uh free energy +called uh zero temperature limit uh free energy 0:00:39.680,0:00:44.239 -uh as a function of my y and y is going - -0:00:42.480,0:00:46.399 -to be a two dimensional +uh as a function of my y and y is going to be a two dimensional 0:00:44.239,0:00:47.360 -vector right so whenever i'm going to be - -0:00:46.399,0:00:49.520 -plotting this f +vector right so whenever i'm going to be plotting this f 0:00:47.360,0:00:51.520 -infinity of y it's going to be a scalar - -0:00:49.520,0:00:55.120 -field means this a height +infinity of y it's going to be a scalar field means this a height 0:00:51.520,0:00:57.840 -over like a 2d region okay - -0:00:55.120,0:00:59.039 -so we saw already this stuff that since +over like a 2d region okay so we saw already this stuff that since 0:00:57.840,0:01:00.800 -it's 
gonna have different height i'm - -0:00:59.039,0:01:03.840 -gonna represent with the +it's gonna have different height i'm gonna represent with the 0:01:00.800,0:01:07.760 -color purple the height equals zero - -0:01:03.840,0:01:09.760 -and then color equal green for +color purple the height equals zero and then color equal green for 0:01:07.760,0:01:10.960 -equal one and then everything that is - -0:01:09.760,0:01:14.240 -above and beyond +equal one and then everything that is above and beyond 0:01:10.960,0:01:14.799 -the free energy equal tool is going to - -0:01:14.240,0:01:18.720 -be in +the free energy equal two is going to be in 0:01:14.799,0:01:22.159 -yellow okay and so - -0:01:18.720,0:01:23.920 -this is how this stuff looks i +yellow okay and so this is how this stuff looks i 0:01:22.159,0:01:26.240 -would like to remind you that this free - -0:01:23.920,0:01:29.280 -energy was the quadratic +would like to remind you that this free energy was the quadratic 0:01:26.240,0:01:31.600 -uh a cliton distance from the model - -0:01:29.280,0:01:35.119 -manifold right so all points that are +uh a euclidean distance from the model manifold right so all points that are 0:01:31.600,0:01:36.720 -within the um within the model manifold - -0:01:35.119,0:01:39.360 -they have zero cost right +within the um within the model manifold they have zero cost right 0:01:36.720,0:01:40.240 -this is sorry zero energy free energy - -0:01:39.360,0:01:41.759 -because again the +this is sorry zero energy free energy because again the 0:01:40.240,0:01:44.240 -the distance between them and the - -0:01:41.759,0:01:45.840 -manifold is zero so zero squared is zero +the distance between them and the manifold is zero so zero squared is zero 0:01:44.240,0:01:47.360 -and then as you move away it's gonna - -0:01:45.840,0:01:50.799 -it's gonna increase up +and then as you move away it's gonna it's gonna increase up 0:01:47.360,0:01:54.159 -uh quadratically so - -0:01:50.799,0:01:55.920 -uh so far
everything should be uh +uh quadratically so uh so far everything should be uh 0:01:54.159,0:01:58.000 -known understood and you know you you - -0:01:55.920,0:01:58.719 -took yeah you had one week to to go over +known understood and you know you you took yeah you had one week to to go over 0:01:58.000,0:02:01.759 -this stuff so - -0:01:58.719,0:02:04.079 -i i assume everyone is quite familiar +this stuff so i i assume everyone is quite familiar 0:02:01.759,0:02:06.079 -so something that you may notice right - -0:02:04.079,0:02:07.840 -now is gonna be in the side +so something that you may notice right now is gonna be in the side 0:02:06.079,0:02:11.039 -of these ellipse you're going to have - -0:02:07.840,0:02:13.200 -like a region that is slightly +of these ellipse you're going to have like a region that is slightly 0:02:11.039,0:02:14.879 -slightly lighter right you can see a - -0:02:13.200,0:02:17.680 -lighter degree of +slightly lighter right you can see a lighter degree of 0:02:14.879,0:02:18.879 -purple so what's going on over there so - -0:02:17.680,0:02:22.000 -let me show you this +purple so what's going on over there so let me show you this 0:02:18.879,0:02:23.440 -uh image here uh with the height - -0:02:22.000,0:02:25.280 -you know proportional to the actual +uh image here uh with the height you know proportional to the actual 0:02:23.440,0:02:28.160 -height of this um - -0:02:25.280,0:02:30.080 -of this free energy okay so i'm gonna +height of this um of this free energy okay so i'm gonna 0:02:28.160,0:02:32.319 -change the color map such that - -0:02:30.080,0:02:34.239 -uh you can clearly see what's going on +change the color map such that uh you can clearly see what's going on 0:02:32.319,0:02:34.720 -and i'm gonna be using this one which is - -0:02:34.239,0:02:37.840 -called +and i'm gonna be using this one which is called 0:02:34.720,0:02:40.400 -cold warm so cold means like - -0:02:37.840,0:02:42.080 -f infinity equals zero i'm going to be +cold 
warm so cold means like f infinity equals zero i'm going to be 0:02:40.400,0:02:45.120 -using the blue color - -0:02:42.080,0:02:45.680 -for f infinity equal 0.5 i'm gonna be +using the blue color for f infinity equal 0.5 i'm gonna be 0:02:45.120,0:02:48.400 -using - -0:02:45.680,0:02:50.480 -a gray color and then for everything +using a gray color and then for everything 0:02:48.400,0:02:55.120 -that is above and beyond - -0:02:50.480,0:02:57.680 -f infinity one is going to be in red +that is above and beyond f infinity one is going to be in red 0:02:55.120,0:02:59.680 -and so this is going to be um the the - -0:02:57.680,0:03:00.720 -image you saw before now that was like +and so this is going to be um the the image you saw before now that was like 0:02:59.680,0:03:03.120 -simply saw from - -0:03:00.720,0:03:04.080 -uh from top here i'm gonna show you the +simply saw from uh from top here i'm gonna show you the 0:03:03.120,0:03:07.360 -contour - -0:03:04.080,0:03:08.959 -so each uh line here they share the same +contour so each uh line here they share the same 0:03:07.360,0:03:11.519 -value of the free energy - -0:03:08.959,0:03:13.760 -okay so let me spin this little guy so +value of the free energy okay so let me spin this little guy so 0:03:11.519,0:03:16.560 -it's that you can see all around - -0:03:13.760,0:03:17.120 -as you can tell all the regions like the +it's that you can see all around as you can tell all the regions like the 0:03:16.560,0:03:20.400 -height - -0:03:17.120,0:03:22.239 -around the the the ellipse that is +height around the the the ellipse that is 0:03:20.400,0:03:23.440 -with the the manifold ellipse is gonna - -0:03:22.239,0:03:25.280 -have zero energy +with the the manifold ellipse is gonna have zero energy 0:03:23.440,0:03:27.440 -and as you move away from that you're - -0:03:25.280,0:03:29.599 -gonna have like a quadratic thing right +and as you move away from that you're gonna have like a quadratic thing right 
0:03:27.440,0:03:33.519 -so you're gonna have like a parabola - -0:03:29.599,0:03:35.120 -uh what you notice is that in the center +so you're gonna have like a parabola uh what you notice is that in the center 0:03:33.519,0:03:36.799 -so on the outside of course is going to - -0:03:35.120,0:03:38.400 -be like a parabola but in the center +so on the outside of course is going to be like a parabola but in the center 0:03:36.799,0:03:39.840 -those two things are going to be going - -0:03:38.400,0:03:43.200 -up on a peak +those two things are going to be going up on a peak 0:03:39.840,0:03:46.720 -right and this might - -0:03:43.200,0:03:48.879 -or might not be wanted and so the +right and this might or might not be wanted and so the 0:03:46.720,0:03:50.879 -this we're gonna start today lesson by - -0:03:48.879,0:03:53.920 -learning how to relax +this we're gonna start today lesson by learning how to relax 0:03:50.879,0:03:56.000 -this uh free energy this infinite zero - -0:03:53.920,0:03:58.879 -temperature limit free energy +this uh free energy this infinite zero temperature limit free energy 0:03:56.000,0:03:59.599 -to a more you know uh a free energy - -0:03:58.879,0:04:01.599 -without +to a more you know uh a free energy without 0:03:59.599,0:04:03.920 -local minima such that you know it's a - -0:04:01.599,0:04:06.080 -bit more smooth +local minima such that you know it's a bit more smooth 0:04:03.920,0:04:07.599 -let me take here a cross section of this - -0:04:06.080,0:04:10.159 -you know bathtub +let me take here a cross section of this you know bathtub 0:04:07.599,0:04:10.640 -for y one equals zero so i'm gonna be - -0:04:10.159,0:04:13.680 -chaff +for y one equals zero so i'm gonna be chaff 0:04:10.640,0:04:15.519 -chopping it in a correspondence of y one - -0:04:13.680,0:04:17.359 -equals zero +chopping it in a correspondence of y one equals zero 0:04:15.519,0:04:19.359 -so what we get is going to be the - -0:04:17.359,0:04:22.240 -following you're gonna see 
now +so what we get is going to be the following you're gonna see now 0:04:19.359,0:04:24.080 -that those two branches are gonna be my - -0:04:22.240,0:04:26.400 -parabolic branches right +that those two branches are gonna be my parabolic branches right 0:04:24.080,0:04:27.520 -so again what is this free energy free - -0:04:26.400,0:04:30.800 -energy +so again what is this free energy free energy 0:04:27.520,0:04:31.919 -was the square distance of your given - -0:04:30.800,0:04:34.720 -point +was the square distance of your given point 0:04:31.919,0:04:35.360 -to the closest point on the manifold - -0:04:34.720,0:04:37.600 -right +to the closest point on the manifold right 0:04:35.360,0:04:38.759 -so if you're on the manifold which is - -0:04:37.600,0:04:42.400 -like location +so if you're on the manifold which is like location 0:04:38.759,0:04:43.120 -0.4 for example then the distance - -0:04:42.400,0:04:45.840 -between you +0.4 for example then the distance between you 0:04:43.120,0:04:47.280 -and the manifold is going to be zero and - -0:04:45.840,0:04:48.160 -therefore the square of zero is going to +and the manifold is going to be zero and therefore the square of zero is going to 0:04:47.280,0:04:50.240 -be zero - -0:04:48.160,0:04:52.240 -as you move away let's say we move to +be zero as you move away let's say we move to 0:04:50.240,0:04:55.520 -the right hand side of this - -0:04:52.240,0:04:56.080 -0.4 as you move linearly to the right +the right hand side of this 0.4 as you move linearly to the right 0:04:55.520,0:04:57.759 -hand side - -0:04:56.080,0:04:59.360 -you're going to be increasing +hand side you're going to be increasing 0:04:57.759,0:05:00.320 -quadratically right that's why we - -0:04:59.360,0:05:02.960 -observe +quadratically right that's why we observe 0:05:00.320,0:05:04.320 -this energy free energy going up - -0:05:02.960,0:05:06.720 -quadratically +this energy free energy going up quadratically 0:05:04.320,0:05:07.919 -similarly what 
happens on the other side - -0:05:06.720,0:05:09.919 -of course +similarly what happens on the other side of course 0:05:07.919,0:05:11.919 -the same happens as you move towards the - -0:05:09.919,0:05:13.120 -zero right and so as you move towards +the same happens as you move towards the zero right and so as you move towards 0:05:11.919,0:05:16.560 -the zero you're gonna get - -0:05:13.120,0:05:19.120 -that you try to climb up that parabola +the zero you're gonna get that you try to climb up that parabola 0:05:16.560,0:05:19.840 -and we have this peak over here and so - -0:05:19.120,0:05:21.680 -in the next +and we have this peak over here and so in the next 0:05:19.840,0:05:24.160 -slide we're gonna be learning how to - -0:05:21.680,0:05:26.720 -smooth that peak +slide we're gonna be learning how to smooth that peak 0:05:24.160,0:05:27.840 -i'll let you i tell you later why we uh - -0:05:26.720,0:05:30.560 -what is very why +i'll let you i tell you later why we uh what is very why 0:05:27.840,0:05:31.039 -this is very useful like why we why my - -0:05:30.560,0:05:34.560 -why +this is very useful like why we why my why 0:05:31.039,0:05:38.479 -we might want to do so okay - -0:05:34.560,0:05:39.120 -so free energy we we know right the the +we might want to do so okay so free energy we we know right the the 0:05:38.479,0:05:42.400 -minimum - -0:05:39.120,0:05:45.440 -value of the energy e that +minimum value of the energy e that 0:05:42.400,0:05:47.759 -is spanning across y and z right so - -0:05:45.440,0:05:50.639 -you have this energy we saw that uh for +is spanning across y and z right so you have this energy we saw that uh for 0:05:47.759,0:05:53.600 -a given y we have like an energy over z - -0:05:50.639,0:05:55.199 -and then the free energy was the value +a given y we have like an energy over z and then the free energy was the value 0:05:53.600,0:05:58.000 -of the energy correspondent - -0:05:55.199,0:05:59.600 -to the location where we have the +of the energy 
correspondent to the location where we have the 0:05:58.000,0:06:00.479 -minimum value right so the minimum value - -0:05:59.600,0:06:04.080 -of this +minimum value right so the minimum value of this 0:06:00.479,0:06:05.759 -e is going to be my free energy - -0:06:04.080,0:06:08.720 -now i'm going to be introducing a +e is going to be my free energy now i'm going to be introducing a 0:06:05.759,0:06:12.479 -relaxed version which is going to be - -0:06:08.720,0:06:14.880 -this uh purple f so +relaxed version which is going to be this uh purple f so 0:06:12.479,0:06:15.600 -this purple f function parameterized by - -0:06:14.880,0:06:19.919 -beta +this purple f function parameterized by beta 0:06:15.600,0:06:21.199 -is going to be simply this expression uh - -0:06:19.919,0:06:24.400 -what is this beta +is going to be simply this expression uh what is this beta 0:06:21.199,0:06:26.479 -right so this beta it's in - -0:06:24.400,0:06:27.680 -physics it's called the inverse +right so this beta it's in physics it's called the inverse 0:06:26.479,0:06:29.759 -temperature - -0:06:27.680,0:06:31.360 -the thermo thermodynamic inverse +temperature the thermo thermodynamic inverse 0:06:29.759,0:06:34.560 -temperature or the - -0:06:31.360,0:06:37.360 -coldness and it's simply one over +temperature or the coldness and it's simply one over 0:06:34.560,0:06:40.080 -uh kb which is the boltzmann constant - -0:06:37.360,0:06:42.560 -multiplied by the temperature okay +uh kb which is the boltzmann constant multiplied by the temperature okay 0:06:40.080,0:06:44.720 -so again if t that capital t the - -0:06:42.560,0:06:46.479 -temperature is very very very high +so again if t that capital t the temperature is very very very high 0:06:44.720,0:06:48.080 -like it's very warm like you're on the - -0:06:46.479,0:06:52.160 -sun beta is gonna be +like it's very warm like you're on the sun beta is gonna be 0:06:48.080,0:06:54.639 -extremely small right it's gonna be zero - 
-0:06:52.160,0:06:55.919 -instead if temperature the temperature +extremely small right it's gonna be zero instead if temperature the temperature 0:06:54.639,0:06:58.400 -is cold like - -0:06:55.919,0:06:59.599 -zero kelvin then automatically you're +is cold like zero kelvin then automatically you're 0:06:58.400,0:07:03.520 -gonna get that beta - -0:06:59.599,0:07:06.720 -it's plus infinity right and so +gonna get that beta it's plus infinity right and so 0:07:03.520,0:07:10.240 -now you can understand why - -0:07:06.720,0:07:12.080 -i call my f infinity the zero +now you can understand why i call my f infinity the zero 0:07:10.240,0:07:15.039 -temperature limit - -0:07:12.080,0:07:16.160 -free energy so it's zero temperature +temperature limit free energy so it's zero temperature 0:07:15.039,0:07:18.800 -it's super cold right - -0:07:16.160,0:07:19.840 -so capital t is zero meaning beta is +it's super cold right so capital t is zero meaning beta is 0:07:18.800,0:07:22.080 -plus infinity - -0:07:19.840,0:07:24.240 -so again if you have this free energy +plus infinity so again if you have this free energy 0:07:22.080,0:07:25.680 -with so-called free energy - -0:07:24.240,0:07:28.160 -the free energy is going to be exactly +with so-called free energy the free energy is going to be exactly 0:07:25.680,0:07:30.160 -the minimum otherwise if you relax - -0:07:28.160,0:07:31.840 -this constraint as you warm up a little +the minimum otherwise if you relax this constraint as you warm up a little 0:07:30.160,0:07:33.840 -bit this free energy - -0:07:31.840,0:07:35.680 -the free energy is going to be a +bit this free energy the free energy is going to be a 0:07:33.840,0:07:38.800 -summation of multiple - -0:07:35.680,0:07:40.960 -things right so this s here is the s for +summation of multiple things right so this s here is the s for 0:07:38.800,0:07:43.680 -sum is a summation of all these - -0:07:40.960,0:07:46.720 -components here multiplied by the +sum is a summation of all 
these components here multiplied by the 0:07:43.680,0:07:48.879 -interval cool - -0:07:46.720,0:07:50.080 -this symbol over here it's simply the +interval cool this symbol over here it's simply the 0:07:48.879,0:07:53.280 -measure - -0:07:50.080,0:07:56.319 -of the domain of z so in our case +measure of the domain of z so in our case 0:07:53.280,0:07:56.960 -uh z goes from zero to two pi and - -0:07:56.319,0:08:00.000 -therefore +uh z goes from zero to two pi and therefore 0:07:56.960,0:08:00.560 -this item over here it simply means two - -0:08:00.000,0:08:03.680 -pi +this item over here it simply means two pi 0:08:00.560,0:08:07.520 -okay all right all right but who - -0:08:03.680,0:08:09.759 -who remembers what this kbt is right +okay all right all right but who who remembers what this kbt is right 0:08:07.520,0:08:11.199 -what is this kbt why are we talking - -0:08:09.759,0:08:14.160 -about energies right +what is this kbt why are we talking about energies right 0:08:11.199,0:08:14.960 -so again from physics no 101 you might - -0:08:14.160,0:08:18.000 -remember that +so again from physics no 101 you might remember that 0:08:14.960,0:08:20.319 -the average - -0:08:18.000,0:08:21.039 -translational kinetic energy was the two +the average translational kinetic energy was the two 0:08:20.319,0:08:25.039 -third - -0:08:21.039,0:08:25.840 -kbt no and therefore kbt or two third +third kbt no and therefore kbt or two third 0:08:25.039,0:08:29.199 -kbt - -0:08:25.840,0:08:30.080 -express the uh kinetic energy right of +kbt express the uh kinetic energy right of 0:08:29.199,0:08:33.200 -this - -0:08:30.080,0:08:35.680 -let's say gas with all those particles +this let's say gas with all those particles 0:08:33.200,0:08:36.959 -and so the temperature allows you to - -0:08:35.680,0:08:38.959 -express +and so the temperature allows you to express 0:08:36.959,0:08:41.360 -uh the energy right so you have - -0:08:38.959,0:08:44.560 -temperature and energy are connected +uh the 
energy right so you have temperature and energy are connected 0:08:41.360,0:08:46.720 -um so you can make uh - -0:08:44.560,0:08:48.800 -a quick you know check check here and +um so you can make uh a quick you know check check here and 0:08:46.720,0:08:51.920 -beta since it's going to be the inverse - -0:08:48.800,0:08:55.600 -of kbt it's going to be in one over +beta since it's going to be the inverse of kbt it's going to be in one over 0:08:51.920,0:08:58.800 -joule right and so here we have these - -0:08:55.600,0:09:00.800 -one over joule means that this stuff is +joule right and so here we have these one over joule means that this stuff is 0:08:58.800,0:09:01.839 -joule therefore f is going to be an - -0:09:00.800,0:09:04.080 -energy +joule therefore f is going to be an energy 0:09:01.839,0:09:04.880 -and then inside this exponential we have - -0:09:04.080,0:09:07.680 -one over +and then inside this exponential we have one over 0:09:04.880,0:09:09.279 -joule times the e which is joule and - -0:09:07.680,0:09:12.399 -then if you multiply the two you +joule times the e which is joule and then if you multiply the two you 0:09:09.279,0:09:12.800 -then the two you know units cancel out - -0:09:12.399,0:09:16.000 -so +then the two you know units cancel out so 0:09:12.800,0:09:18.560 -everything works just fine - -0:09:16.000,0:09:19.360 -all right all right all right um and +everything works just fine all right all right all right um and 0:09:18.560,0:09:21.760 -also yes - -0:09:19.360,0:09:23.279 -the the dimension of z cancel out with +also yes the the dimension of z cancel out with 0:09:21.760,0:09:24.080 -this dimension right so everything is - -0:09:23.279,0:09:27.360 -just pure +this dimension right so everything is just pure 0:09:24.080,0:09:29.040 -uh pure number okay again these are this - -0:09:27.360,0:09:30.640 -is not machine learning this is physics +uh pure number okay again these are this is not machine learning this is physics 0:09:29.040,0:09:32.320 
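The β = 1/(k_B·T) definition and the units check above can be sketched numerically. This is an editor's illustration, not the lecture's code: the constant value and function names are mine, and the mean kinetic energy uses the textbook value ⟨E_k⟩ = (3/2)·k_B·T.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, in joule per kelvin

def coldness(T):
    """Thermodynamic inverse temperature beta = 1 / (k_B * T), in 1/joule."""
    return 1.0 / (K_B * T)

def mean_kinetic_energy(T):
    """Textbook average translational kinetic energy (3/2) k_B T, in joule."""
    return 1.5 * K_B * T

beta_hot = coldness(5778.0)  # roughly the sun's surface temperature: tiny beta
beta_cold = coldness(1e-9)   # a nanokelvin above absolute zero: huge beta

# beta * E is dimensionless -- (1/joule) * joule -- so exp(-beta * E) is sound.
dimensionless = coldness(300.0) * mean_kinetic_energy(300.0)
```

Hot means small β, cold means large β, and the product β·E carries no units, exactly as the dimensional check in the lecture concludes.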
-just to give you a little bit of you - -0:09:30.640,0:09:34.800 -know uh +just to give you a little bit of you know uh 0:09:32.320,0:09:36.320 -overview about what this stuff where - -0:09:34.800,0:09:38.000 -this stuff comes from right so this is +overview about what this stuff where this stuff comes from right so this is 0:09:36.320,0:09:39.839 -just from our friends from the physics - -0:09:38.000,0:09:41.600 -department +just from our friends from the physics department 0:09:39.839,0:09:43.600 -all right all right all right so i want - -0:09:41.600,0:09:45.519 -to compute this free energy in this +all right all right all right so i want to compute this free energy in this 0:09:43.600,0:09:48.720 -relaxed version of this - -0:09:45.519,0:09:50.399 -uh free energy uh since i don't want to +relaxed version of this uh free energy uh since i don't want to 0:09:48.720,0:09:53.360 -compute this integral - -0:09:50.399,0:09:55.440 -i may not know how to do that i simply +compute this integral i may not know how to do that i simply 0:09:53.360,0:09:58.959 -use a simple discretization right - -0:09:55.440,0:10:01.519 -and so i replace this latin s with a +use a simple discretization right and so i replace this latin s with a 0:09:58.959,0:10:02.320 -greek s right and then i replace this - -0:10:01.519,0:10:05.040 -latin d +greek s right and then i replace this latin d 0:10:02.320,0:10:06.800 -with a greek t so everything else is - -0:10:05.040,0:10:09.440 -just the same so i go from the +with a greek t so everything else is just the same so i go from the 0:10:06.800,0:10:11.680 -time continuous to a discretization very - -0:10:09.440,0:10:13.600 -simple discretization it works +time continuous to a discretization very simple discretization it works 0:10:11.680,0:10:15.519 -in our case because z is like one - -0:10:13.600,0:10:18.720 -dimension i saw you know everything is +in our case because z is like one dimension i saw you know everything is 0:10:15.519,0:10:22.160 
-pretty easy uh - -0:10:18.720,0:10:24.480 -moreover here i will just define and +pretty easy uh moreover here i will just define and 0:10:22.160,0:10:25.760 -pay attention i am defining right now - -0:10:24.480,0:10:29.200 -for this class +pay attention i am defining right now for this class 0:10:25.760,0:10:32.880 -okay this thing uh has been the - -0:10:29.200,0:10:36.480 -soft mean of e so my +okay this thing uh as being the soft min of e so my 0:10:32.880,0:10:39.120 -free energy uh the purple one - -0:10:36.480,0:10:39.839 -it's simply the relaxation of the zero +free energy uh the purple one it's simply the relaxation of the zero 0:10:39.120,0:10:42.079 -temperature - -0:10:39.839,0:10:43.120 -limit is going to be simply this soft +temperature limit is going to be simply this soft 0:10:42.079,0:10:45.360 -mean so - -0:10:43.120,0:10:46.560 -the zero temperature the super cold one +min so the zero temperature the super cold one 0:10:45.360,0:10:50.000 -is simply the mean - -0:10:46.560,0:10:52.480 -okay am i n min whereas if i +is simply the min okay m i n min whereas if i 0:10:50.000,0:10:53.120 -compute if i relax if i turn on the - -0:10:52.480,0:10:55.680 -temperature +compute if i relax if i turn on the temperature 0:10:53.120,0:10:56.640 -like i increase the thermostat i'm gonna - -0:10:55.680,0:10:59.760 -have this +like i increase the thermostat i'm gonna have this 0:10:56.640,0:11:02.399 -soft mean which is this log - -0:10:59.760,0:11:03.360 -summation of exponential okay and i call +soft min which is this log summation of exponential okay and i call 0:11:02.399,0:11:05.920 -this actual - -0:11:03.360,0:11:07.279 -soft mean why do i call it actual soft +this actual soft min why do i call it actual soft 0:11:05.920,0:11:10.560 -meme because other people - -0:11:07.279,0:11:12.720 -uh most of the people outside this class +min because other people uh most of the people outside this class 0:11:10.560,0:11:14.000 -will call this the soft means
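The "actual soft min" defined here — minus 1/β times the log of the mean of exp(−βE) over the discretized z grid — can be sketched as below. A hedged illustration, not the lecture's code: `actual_softmin` is my name for the definition on the slide, and the shift by min(E) is the usual log-sum-exp stabilization.

```python
import numpy as np

def actual_softmin(E, beta):
    """Soft minimum: -(1/beta) * log( mean( exp(-beta * E) ) ).
    Shifting by min(E) keeps the exponentials from underflowing."""
    E = np.asarray(E, dtype=float)
    m = E.min()
    # mean(exp(-beta*E)) = exp(-beta*m) * mean(exp(-beta*(E - m)))
    return m - np.log(np.mean(np.exp(-beta * (E - m)))) / beta
```

For large β (super cold) this approaches the plain min; warming up (β → 0) relaxes it toward the mean, which is exactly the limit discussed next.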
something - -0:11:12.720,0:11:16.480 -else and i will let you +will call this the soft min something else and i will let you 0:11:14.000,0:11:18.480 -know a bit more about these in a few - -0:11:16.480,0:11:21.920 -slides okay +know a bit more about these in a few slides okay 0:11:18.480,0:11:25.680 -something that is super interesting is - -0:11:21.920,0:11:29.440 -computing the limit of this free energy +something that is super interesting is computing the limit of this free energy 0:11:25.680,0:11:31.440 -here for beta that goes to zero so - -0:11:29.440,0:11:33.120 -whenever you increase the temperature +here for beta that goes to zero so whenever you increase the temperature 0:11:31.440,0:11:35.279 -as the temperature on the sun like it's - -0:11:33.120,0:11:39.200 -super warm what is the most +as the temperature on the sun like it's super warm what is the most 0:11:35.279,0:11:41.920 -relaxed version of this min - -0:11:39.200,0:11:43.360 -and so if you do that you're gonna see +relaxed version of this min and so if you do that you're gonna see 0:11:41.920,0:11:46.480 -that - -0:11:43.360,0:11:48.320 -this stuff ends up being the average but +that this stuff ends up being the average but 0:11:46.480,0:11:50.720 -again this is just you know - -0:11:48.320,0:11:51.519 -um it's not relevant it's not too +again this is just you know um it's not relevant it's not too 0:11:50.720,0:11:54.720 -important - -0:11:51.519,0:11:55.120 -uh is the derivation just i can show you +important uh is the derivation just i can show you 0:11:54.720,0:11:56.959 -here - -0:11:55.120,0:11:58.240 -and i just show you so you can you have +here and i just show you so you can you have 0:11:56.959,0:12:01.279 -access later - -0:11:58.240,0:12:03.920 -uh the limit of this free energy +access later uh the limit of this free energy 0:12:01.279,0:12:04.399 -for beta that goes to zero so it's very - -0:12:03.920,0:12:06.959 -warm +for beta that goes to zero so it's very warm
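For reference, the β → 0⁺ limit shown on the slide can be reconstructed in a few lines, using the discretized free energy with N bins and first-order expansions in β (a sketch of the slide's derivation, not a verbatim copy):

```latex
\begin{aligned}
F_\beta(y) &= -\frac{1}{\beta}\,\log\!\Big(\frac{1}{N}\sum_{k=1}^{N} e^{-\beta E_k}\Big),
  \qquad E_k = E(y, z_k), \\
e^{-\beta E_k} &= 1 - \beta E_k + O(\beta^2)
  \;\Longrightarrow\;
  \frac{1}{N}\sum_{k=1}^{N} e^{-\beta E_k} = 1 - \beta \bar{E} + O(\beta^2),
  \qquad \bar{E} = \frac{1}{N}\sum_{k=1}^{N} E_k, \\
F_\beta(y) &= -\frac{1}{\beta}\log\!\big(1 - \beta \bar{E} + O(\beta^2)\big)
  = -\frac{1}{\beta}\big(-\beta \bar{E} + O(\beta^2)\big)
  = \bar{E} + O(\beta)
  \;\xrightarrow[\beta \to 0^{+}]{}\; \frac{1}{N}\sum_{k=1}^{N} E_k .
\end{aligned}
```

So the fully warmed-up free energy is just the average of the energies over the latent grid.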
0:12:04.399,0:12:07.920 -super warm it ends up being simply the - -0:12:06.959,0:12:11.839 -average +super warm it ends up being simply the average 0:12:07.920,0:12:15.279 -of the energy okay across those heads - -0:12:11.839,0:12:17.200 -again not to uh you don't have to get +of the energy okay across those heads again not to uh you don't have to get 0:12:15.279,0:12:19.440 -scared about that math - -0:12:17.200,0:12:20.800 -all right so let's compute this free +scared about that math all right so let's compute this free 0:12:19.440,0:12:23.040 -energy for - -0:12:20.800,0:12:24.079 -the cases we saw before right so we are +energy for the cases we saw before right so we are 0:12:23.040,0:12:25.920 -still doing inference - -0:12:24.079,0:12:27.760 -as last time but instead of using the +still doing inference as last time but instead of using the 0:12:25.920,0:12:29.360 -cold inference the cold free energy - -0:12:27.760,0:12:30.560 -we're going to use this you know relaxed +cold inference the cold free energy we're going to use this you know relaxed 0:12:29.360,0:12:34.000 -version - -0:12:30.560,0:12:37.279 -for the y equal 23 so if you remember +version for the y equal 23 so if you remember 0:12:34.000,0:12:39.440 -the y equal 23 was this x - -0:12:37.279,0:12:42.160 -the green x on the right hand side and +the y equal 23 was this x the green x on the right hand side and 0:12:39.440,0:12:44.800 -then here the free energy was the square - -0:12:42.160,0:12:46.560 -of the distance between the blue x and +then here the free energy was the square of the distance between the blue x and 0:12:44.800,0:12:49.440 -the green x right so the - -0:12:46.560,0:12:52.240 -the distance was 0.5 square would would +the green x right so the the distance was 0.5 square would would 0:12:49.440,0:12:53.519 -have been 0.25 and that would have been - -0:12:52.240,0:12:56.240 -the free energy +have been 0.25 and that would have been the free energy 0:12:53.519,0:12:57.040 -uh zero 
temperature zero zero - -0:12:56.240,0:12:59.680 -temperature +uh zero temperature zero zero temperature 0:12:57.040,0:13:01.440 -limit free energy but in this case we - -0:12:59.680,0:13:04.399 -have now to consider +limit free energy but in this case we have now to consider 0:13:01.440,0:13:05.120 -all these contributions and so i'm gonna - -0:13:04.399,0:13:08.959 -show you +all these contributions and so i'm gonna show you 0:13:05.120,0:13:12.560 -how all those little - -0:13:08.959,0:13:15.920 -z dz will contribute to this free energy +how all those little z dz will contribute to this free energy 0:13:12.560,0:13:19.519 -and so we choose a beta equal 1 - -0:13:15.920,0:13:21.760 -and we have now this so given that y +and so we choose a beta equal 1 and we have now this so given that y 0:13:19.519,0:13:23.760 -prime is going to be this x on the right - -0:13:21.760,0:13:27.600 -hand side +prime is going to be this x on the right hand side 0:13:23.760,0:13:31.040 -my free energy now comes from the - -0:13:27.600,0:13:31.920 -addition of all these uh terms here the +my free energy now comes from the addition of all these uh terms here the 0:13:31.040,0:13:35.120 -exponential - -0:13:31.920,0:13:38.399 -of you know minus the energy of +exponential of you know minus the energy of 0:13:35.120,0:13:40.000 -all of this right so all the squares - -0:13:38.399,0:13:41.920 -like the exponential of the negative +all of this right so all the squares like the exponential of the negative 0:13:40.000,0:13:44.560 -squares right - -0:13:41.920,0:13:45.440 -so as you can tell those points that are +squares right so as you can tell those points that are 0:13:44.560,0:13:48.560 -close to the - -0:13:45.440,0:13:50.240 -x will have like a +close to the x will have like a 0:13:48.560,0:13:51.920 -smaller energy and therefore the - -0:13:50.240,0:13:54.480 -exponential will be larger +smaller energy and therefore the exponential will be larger 0:13:51.920,0:13:55.920 -and that's why 
you can see them but for - -0:13:54.480,0:13:58.320 -energy that are you know +and that's why you can see them but for energy that are you know 0:13:55.920,0:14:00.240 -further away that are very high energy - -0:13:58.320,0:14:01.839 -you do not do the exponential of main +further away that are very high energy you do the exponential of 0:14:00.240,0:14:04.079 -minus and large number you're going to - -0:14:01.839,0:14:07.040 -get basically zero so they don't count +minus a large number you're going to get basically zero so they don't count 0:14:04.079,0:14:09.120 -in this summation in this integral okay - -0:14:07.040,0:14:12.560 -first question for people at home +in this summation in this integral okay first question for people at home 0:14:09.120,0:14:17.600 -to just check if you are following - -0:14:12.560,0:14:20.079 -how that where does 0.75 come from +to just check if you are following how that where does 0.75 come from 0:14:17.600,0:14:21.120 -so where does this value over here come - -0:14:20.079,0:14:23.760 -from +so where does this value over here come from 0:14:21.120,0:14:25.279 -and you're supposed to type on the chart - -0:14:23.760,0:14:27.920 -such that i can read +and you're supposed to type on the chat such that i can read 0:14:25.279,0:14:29.199 -aloud what you're saying so i'm asking - -0:14:27.920,0:14:33.680 -once again +aloud what you're saying so i'm asking once again 0:14:29.199,0:14:36.240 -where does this value over here 075 - -0:14:33.680,0:14:36.240 -come from +where does this value over here 075 come from 0:14:39.440,0:14:45.839 -and someone is to reply - -0:14:47.480,0:14:53.440 -contribution to the energy yes yes no +and someone is to reply contribution to the energy yes yes no 0:14:49.760,0:14:56.399 -the number 075 i i need you to tell me - -0:14:53.440,0:14:58.240 -how to compute 0.75 where does that +the number 075 i i need you to tell me how to compute 0.75 where does that 0:14:56.399,0:15:02.399 -number come
from - -0:14:58.240,0:15:04.560 -you have all these closest why +number come from you have all these closest why 0:15:02.399,0:15:05.600 -till then yeah tell me uh how do i how - -0:15:04.560,0:15:08.800 -do i compute +till then yeah tell me uh how do i how do i compute 0:15:05.600,0:15:08.800 -1 over 2 pi no - -0:15:10.880,0:15:17.600 -x okay x minus beta e okay so how much +1 over 2 pi no x okay x minus beta e okay so how much 0:15:15.199,0:15:17.600 -is e - -0:15:19.839,0:15:23.440 -e is the square distance right so how +is e e is the square distance right so how 0:15:22.240,0:15:26.639 -much is it - -0:15:23.440,0:15:30.240 -how much is okay e is 0 25 +much is it how much is okay e is 0 25 0:15:26.639,0:15:34.079 -correct and so e to the minus - -0:15:30.240,0:15:37.120 -0 25 is going to be 0 75 +correct and so e to the minus 0 25 is going to be 0 75 0:15:34.079,0:15:40.160 -correct okay - -0:15:37.120,0:15:43.360 -so jc got the right answer +correct okay so jc got the right answer 0:15:40.160,0:15:43.839 -good job so great now we know where that - -0:15:43.360,0:15:45.680 -number +good job so great now we know where that number 0:15:43.839,0:15:48.000 -comes from so every time you see this - -0:15:45.680,0:15:49.600 -diagram so although it looks very sparse +comes from so every time you see this diagram so although it looks very sparse 0:15:48.000,0:15:51.839 -and pretty and whatever you always have - -0:15:49.600,0:15:52.800 -to pay attention to the number i put on +and pretty and whatever you always have to pay attention to the number i put on 0:15:51.839,0:15:54.560 -this - -0:15:52.800,0:15:57.120 -on the screen right so those numbers are +this on the screen right so those numbers are 0:15:54.560,0:16:00.560 -not random number they are - -0:15:57.120,0:16:03.519 -computed by my computer and you always +not random number they are computed by my computer and you always 0:16:00.560,0:16:04.160 -always always have to check on a piece - -0:16:03.519,0:16:06.720 
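The arithmetic behind this exchange can be checked on a couple of lines (illustrative names, β = 1 as in the lecture): a candidate manifold point at distance d has energy E = d², and contributes a weight exp(−β·E) to the sum. Note that exp(−0.25) ≈ 0.78.

```python
import math

def contribution(d, beta=1.0):
    """Unnormalized weight exp(-beta * E) of a candidate point at distance d,
    with the squared-distance energy E = d ** 2."""
    return math.exp(-beta * d ** 2)

w_near = contribution(0.5)  # E = 0.25, weight exp(-0.25) ~ 0.78
w_far = contribution(1.1)   # E = 1.21, weight exp(-1.21) ~ 0.30
w_gone = contribution(5.0)  # E = 25: exp of a large negative number, ~ 0
```

Nearby points dominate the sum; points far from y' are exponentially suppressed and effectively do not count in the integral.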
-of paper +always always have to check on a piece of paper 0:16:04.160,0:16:08.240 -that these numbers make sense because if - -0:16:06.720,0:16:09.600 -they don't make sense +that these numbers make sense because if they don't make sense 0:16:08.240,0:16:11.600 -then you're not understanding what's - -0:16:09.600,0:16:12.560 -going on okay so you have to pay +then you're not understanding what's going on okay so you have to pay 0:16:11.600,0:16:15.600 -attention - -0:16:12.560,0:16:18.240 -to the numbers and you know +attention to the numbers and you know 0:16:15.600,0:16:19.519 -okay i'm a physicist right so you always - -0:16:18.240,0:16:22.480 -i always +okay i'm a physicist right so you always i always 0:16:19.519,0:16:23.839 -uh have in advance in my mind the answer - -0:16:22.480,0:16:26.079 -that my program my +uh have in advance in my mind the answer that my program my 0:16:23.839,0:16:28.000 -network my whatever is supposed to do - -0:16:26.079,0:16:29.600 -right if i make an electronic circuit i +network my whatever is supposed to do right if i make an electronic circuit i 0:16:28.000,0:16:31.360 -must understand - -0:16:29.600,0:16:32.720 -i must know in advance what is the you +must understand i must know in advance what is the you 0:16:31.360,0:16:34.720 -know voltage somewhere - -0:16:32.720,0:16:36.399 -here and there before i actually measure +know voltage somewhere here and there before i actually measure 0:16:34.720,0:16:39.600 -it otherwise uh - -0:16:36.399,0:16:41.759 -you know it don't go much ahead +it otherwise uh you know it don't go much ahead 0:16:39.600,0:16:43.680 -all right all right all right so let's - -0:16:41.759,0:16:46.480 -move on and let's now consider +all right all right all right so let's move on and let's now consider 0:16:43.680,0:16:47.519 -instead the case for when i have y y - -0:16:46.480,0:16:51.279 -prime +instead the case for when i have y y prime 0:16:47.519,0:16:52.240 -equal 10 right so the 10th item so which - 
-0:16:51.279,0:16:55.839 -is the +equal 10 right so the 10th item so which is the 0:16:52.240,0:16:58.560 -element on the top there so in this case - -0:16:55.839,0:16:59.839 -i'm gonna get that all those points here +element on the top there so in this case i'm gonna get that all those points here 0:16:58.560,0:17:02.160 -will contribute - -0:16:59.839,0:17:06.400 -to the free energy right in this case +will contribute to the free energy right in this case 0:17:02.160,0:17:09.520 -we're gonna have a number to 0.26 - -0:17:06.400,0:17:12.799 -27 okay someone else that is not jesse +we're gonna have a number to 0.26 27 okay someone else that is not jesse 0:17:09.520,0:17:15.039 -uh can write on the chat how much - -0:17:12.799,0:17:17.039 -where that number comes from so where +uh can write on the chat how much where that number comes from so where 0:17:15.039,0:17:18.799 -does 026 come from - -0:17:17.039,0:17:21.439 -i think you must have understand +does 026 come from i think you must have understand 0:17:18.799,0:17:21.439 -understood now - -0:17:22.559,0:17:30.160 -e to the minus 1 kind of yes so +understood now e to the minus 1 kind of yes so 0:17:26.559,0:17:31.600 -so the distance here is 1.1 1.1 square - -0:17:30.160,0:17:34.000 -is going to be 1.2 +so the distance here is 1.1 1.1 square is going to be 1.2 0:17:31.600,0:17:36.160 -and then you take e to the minus 1.2 - -0:17:34.000,0:17:38.640 -which is 0 26 yeah +and then you take e to the minus 1.2 which is 0 26 yeah 0:17:36.160,0:17:39.919 -that's correct all right all right all - -0:17:38.640,0:17:42.960 -right okay so +that's correct all right all right all right okay so 0:17:39.919,0:17:47.840 -next question uh what happens now - -0:17:42.960,0:17:47.840 -if my y prime is going to be the origin +next question uh what happens now if my y prime is going to be the origin 0:17:48.480,0:17:53.760 -so what happens if my y prime is the - -0:17:52.000,0:17:54.559 -origin for the zero temperature you're +so 
what happens if my y prime is the origin for the zero temperature you're 0:17:53.760,0:17:57.280 -going to get the - -0:17:54.559,0:17:59.440 -square the distance right from either +going to get the square the distance right from either 0:17:57.280,0:18:03.120 -side - -0:17:59.440,0:18:04.400 -in this case what's going to be the main +side in this case what's going to be the main 0:18:03.120,0:18:05.919 -difference if you warm up the - -0:18:04.400,0:18:06.880 -temperature right so you're not zero +difference if you warm up the temperature right so you're not zero 0:18:05.919,0:18:08.240 -temperature it's not - -0:18:06.880,0:18:09.520 -it's not freezing cold we are going to +temperature it's not it's not freezing cold we are going to 0:18:08.240,0:18:10.559 -be increasing a little bit the - -0:18:09.520,0:18:13.919 -temperature +be increasing a little bit the temperature 0:18:10.559,0:18:14.400 -and how is this free energy changing - -0:18:13.919,0:18:19.360 -from +and how is this free energy changing from 0:18:14.400,0:18:19.360 -before anyone can type on the chart - -0:18:20.000,0:18:24.400 -it's symmetric yeah that's perfect how +before anyone can type on the chat it's symmetric yeah that's perfect how 0:18:23.120,0:18:26.320 -do you know - -0:18:24.400,0:18:28.240 -you already saw the slides before oh you +do you know you already saw the slides before oh you 0:18:26.320,0:18:31.919 -actually got it right - -0:18:28.240,0:18:31.919 -okay i assume you got it right +actually got it right okay i assume you got it right 0:18:32.000,0:18:35.360 -all right okay that's perfect yes it's - -0:18:33.760,0:18:38.559 -symmetric right so +all right okay that's perfect yes it's symmetric right so 0:18:35.360,0:18:42.559 -uh a point now inside - -0:18:38.559,0:18:43.120 -oh okay yeah i don't know if it's a he +uh a point now inside oh okay yeah i don't know if it's a he 0:18:42.559,0:18:45.120 -or she - -0:18:43.120,0:18:46.720 -but studied physics in the undergrad +or she
but studied physics in the undergrad 0:18:45.120,0:18:49.919 -okay cool - -0:18:46.720,0:18:52.320 -all right uh so +okay cool all right uh so 0:18:49.919,0:18:54.080 -in this case you have again that all - -0:18:52.320,0:18:56.640 -those points on the top on the bottom +in this case you have again that all those points on the top on the bottom 0:18:54.080,0:18:58.720 -will contribute to the free energy uh - -0:18:56.640,0:19:01.760 -given that i choose that y +will contribute to the free energy uh given that i choose that y 0:18:58.720,0:19:04.880 -uh y prime to be in the center okay - -0:19:01.760,0:19:06.320 -all right so that's pretty much oh +uh y prime to be in the center okay all right so that's pretty much oh 0:19:04.880,0:19:09.520 -but why we are talking why are we - -0:19:06.320,0:19:12.640 -talking about this right so we came here +but why we are talking why are we talking about this right so we came here 0:19:09.520,0:19:14.720 -because we had that issue with the - -0:19:12.640,0:19:17.600 -picky picky center right i showed you +because we had that issue with the picky picky center right i showed you 0:19:14.720,0:19:19.360 -before that spinning bathtub - -0:19:17.600,0:19:21.280 -and then the cross-section here that we +before that spinning bathtub and then the cross-section here that we 0:19:19.360,0:19:23.120 -had this picky thing - -0:19:21.280,0:19:24.720 -which was coming from the cold free +had this picky thing which was coming from the cold free 0:19:23.120,0:19:26.799 -energy let's - -0:19:24.720,0:19:29.679 -let me show you what happens now if i +energy let's let me show you what happens now if i 0:19:26.799,0:19:32.880 -choose the warm free energy right - -0:19:29.679,0:19:35.919 -and so if i do that i'm gonna get if i +choose the warm free energy right and so if i do that i'm gonna get if i 0:19:32.880,0:19:35.919 -can scroll my screen - -0:19:37.600,0:19:40.880 -oh you don't see anything okay let me +can scroll my screen oh you don't see 
anything okay let me
0:19:39.679,0:19:43.840
-click click
-
-0:19:40.880,0:19:44.400
-okay all right and so the red one was
+click click okay all right and so the red one was
0:19:43.840,0:19:47.120
-the
-
-0:19:44.400,0:19:47.919
-super cold the beta is the coldness
+the super cold the beta is the coldness
0:19:47.120,0:19:51.120
-again so
-
-0:19:47.919,0:19:52.799
-large beta is called and then we
+again so large beta is cold and then we
0:19:51.120,0:19:54.880
-reduce the coldness so we increase the
-
-0:19:52.799,0:19:57.840
-temperature and as you can see the
+reduce the coldness so we increase the temperature and as you can see the
0:19:54.880,0:19:58.720
-the picky part becomes smooth smooth
-
-0:19:57.840,0:20:02.400
-smooth
+the peaky part becomes smooth smooth smooth
0:19:58.720,0:20:05.120
-until it becomes oh
-
-0:20:02.400,0:20:06.159
-becomes some a parabola with a single
+until it becomes oh becomes some a parabola with a single
0:20:05.120,0:20:09.200
-global
-
-0:20:06.159,0:20:11.600
-minima oh
+global minima oh
0:20:09.200,0:20:15.039
-this is coming out to be remember what
-
-0:20:11.600,0:20:17.200
-happens if beta goes to zero
+this is coming out to be remember what happens if beta goes to zero
0:20:15.039,0:20:21.200
-you get the average right so you
-
-0:20:17.200,0:20:24.559
-actually recover the msc
+you get the average right so you actually recover the mse
0:20:21.200,0:20:28.480
-okay i'm just giving like small uh
-
-0:20:24.559,0:20:31.360
-small like information bits
+okay i'm just giving like small uh small like information bits
0:20:28.480,0:20:31.919
-pills whatever but again yeah so
-
-0:20:31.360,0:20:33.679
-whenever
+pills whatever but again yeah so whenever
0:20:31.919,0:20:35.679
-we increase the temperature you're going
-
-0:20:33.679,0:20:36.799
-to be relaxing until you get just one
+we increase the temperature you're going to be relaxing until you get just one
0:20:35.679,0:20:38.320
-single minimum
-
-0:20:36.799,0:20:40.559 -and then there are no more latent +single minimum and then there are no more latent 0:20:38.320,0:20:43.840 -because we just average out everything - -0:20:40.559,0:20:46.480 -without those weights right anyhow +because we just average out everything without those weights right anyhow 0:20:43.840,0:20:47.440 -uh i i think now if you if you need to - -0:20:46.480,0:20:49.440 -implement this stuff +uh i i think now if you if you need to implement this stuff 0:20:47.440,0:20:52.000 -in pytorch you're gonna be be getting - -0:20:49.440,0:20:54.640 -like quite frustrated because +in pytorch you're gonna be be getting like quite frustrated because 0:20:52.000,0:20:55.280 -they use different names for the things - -0:20:54.640,0:20:58.000 -i just +they use different names for the things i just 0:20:55.280,0:20:59.440 -defined and someone say oh you should - -0:20:58.000,0:21:01.679 -have used their names no +defined and someone say oh you should have used their names no 0:20:59.440,0:21:04.159 -because those are wrong right so i use - -0:21:01.679,0:21:06.400 -the correct name so the one that is +because those are wrong right so i use the correct name so the one that is 0:21:04.159,0:21:07.520 -that makes sense i will try to sell it - -0:21:06.400,0:21:09.760 -to you this way +that makes sense i will try to sell it to you this way 0:21:07.520,0:21:12.159 -so let me explain to you a little bit of - -0:21:09.760,0:21:15.039 -you know what is the nomenclature i use +so let me explain to you a little bit of you know what is the nomenclature i use 0:21:12.159,0:21:15.440 -uh such that it makes sense at least to - -0:21:15.039,0:21:18.240 -me +uh such that it makes sense at least to me 0:21:15.440,0:21:19.760 -otherwise things don't make sense to me - -0:21:18.240,0:21:21.600 -so this is the actual +otherwise things don't make sense to me so this is the actual 0:21:19.760,0:21:23.440 -soft max right not the soft max that - -0:21:21.600,0:21:25.039 
-people talk outside this class this is
+soft max right not the soft max that people talk outside this class this is
0:21:23.440,0:21:27.840
-the actual soft max
-
-0:21:25.039,0:21:29.679
-which is this you know one over betas
+the actual soft max which is this you know one over betas
0:21:27.840,0:21:31.039
-log of blah blah blah some of the
-
-0:21:29.679,0:21:34.640
-exponentials
+log of blah blah blah some of the exponentials
0:21:31.039,0:21:38.000
-i just expanded these um the previous
-
-0:21:34.640,0:21:40.000
-uh i just expanded the the one over z
+i just expanded these um the previous uh i just expanded the the one over z
0:21:38.000,0:21:42.480
-i took it out right in the in the
-
-0:21:40.000,0:21:44.159
-logarithm so i just split the two things
+i took it out right in the in the logarithm so i just split the two things
0:21:42.480,0:21:46.480
-so how do we implement this stuff in
-
-0:21:44.159,0:21:49.360
-pytorch well you just use this function
+so how do we implement this stuff in pytorch well you just use this function
0:21:46.480,0:21:52.640
-which is called torch dot log sum x
-
-0:21:49.360,0:21:53.120
-which is this soft max actual softmax
+which is called torch.logsumexp which is this soft max actual softmax
0:21:52.640,0:21:55.600
-right
-
-0:21:53.120,0:21:56.880
-and then plus or minus that additional
+right and then plus or minus that additional
0:21:55.600,0:21:58.960
-constant over there
-
-0:21:56.880,0:22:00.240
-right so this is how you want to use how
+constant over there right so this is how you want to use how
0:21:58.960,0:22:03.440
-to implement that
-
-0:22:00.240,0:22:06.559
-because you know it's numerically stable
+to implement that because you know it's numerically stable
0:22:03.440,0:22:08.320
-moreover if you
-
-0:22:06.559,0:22:10.240
-this is the actual definition of the
+moreover if you this is the actual definition of the
0:22:08.320,0:22:12.640
-actual soft min
-
-0:22:10.240,0:22:14.240
-and you can see this is 
what i wrote
+actual soft min and you can see this is what i wrote
0:22:12.640,0:22:16.240
-before
-
-0:22:14.240,0:22:17.840
-you can think about that it's very
+before you can think about that it's very
0:22:16.240,0:22:19.360
-similar to the softmax right
-
-0:22:17.840,0:22:21.520
-the actual softmax what's the only
+similar to the softmax right the actual softmax what's the only
0:22:19.360,0:22:24.240
-difference there are two minuses right
-
-0:22:21.520,0:22:24.880
-and so you can do that you can get get
+difference there are two minuses right and so you can do that you can get get
0:22:24.240,0:22:26.640
-that away
-
-0:22:24.880,0:22:28.400
-with you know you put a minus in front
+that away with you know you put a minus in front
0:22:26.640,0:22:30.480
-so you cancel the first minus
-
-0:22:28.400,0:22:31.760
-and you put a minus inside so you cancel
+so you cancel the first minus and you put a minus inside so you cancel
0:22:30.480,0:22:34.320
-the other minus
-
-0:22:31.760,0:22:36.240
-and so you know the soft mean is simply
+the other minus and so you know the soft min is simply
0:22:34.320,0:22:38.400
-uh you can implement it as a
-
-0:22:36.240,0:22:40.559
-soft max with the two minuses okay
+uh you can implement it as a soft max with the two minuses okay
0:22:38.400,0:22:42.320
-against actual softmax
-
-0:22:40.559,0:22:43.760
-and then someone of course is going to
+again the actual softmax and then someone of course is going to
0:22:42.320,0:22:46.000
-be asking
-
-0:22:43.760,0:22:48.480
-but what is the softmax we use in class
+be asking but what is the softmax we use in class
0:22:46.000,0:22:51.840
-every time so that one
-
-0:22:48.480,0:22:55.120
-is actually the soft arc max right
+every time so that one is actually the soft arc max right
0:22:51.840,0:22:56.880
-why is that right because a arc max is
-
-0:22:55.120,0:23:00.000
-going to be like a one hot vector
+why is that right because an arc max is going to be like a one hot vector
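the softmax and softmin bookkeeping above can be sketched in a few lines of dependency-free python — a minimal sketch, not the course code: the function names follow the lecture's nomenclature, the example energies are made up, and the max-shift trick inlined below is what torch.logsumexp would do for you in pytorch.

```python
import math

# "actual" soft max, per the lecture:  (1/beta) * log(sum_k exp(beta * v_k)).
# Computed stably by factoring out the max (the same trick torch.logsumexp uses).
def softmax(values, beta):
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

# soft min = soft max with the two minuses: flip the argument, flip the result.
def softmin(values, beta):
    return -softmax([-v for v in values], beta)

# softmin plus the "additional constant" (the 1/Z normalization pulled out of
# the log).  beta -> 0 (hot) recovers the average of the energies; beta -> inf
# (cold) recovers the minimum.
def free_energy(energies, beta):
    return softmin(energies, beta) + math.log(len(energies)) / beta
```

with a toy set of latent energies [1, 2, 4], a very cold beta gives back the min (1) and the max (4), while a very warm beta gives back the average 7/3 — the two temperature limits discussed above.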
0:22:56.880,0:23:02.799
-and d1 tells you what is the index
-
-0:23:00.000,0:23:04.000
-of the element that has the maximum
+and the one tells you what is the index of the element that has the maximum
0:23:02.799,0:23:06.720
-value right
-
-0:23:04.000,0:23:08.000
-so the max gives you retrieves the
+value right so the max retrieves the
0:23:06.720,0:23:10.240
-maximum value
-
-0:23:08.000,0:23:12.400
-you know and then the arc max is going
+maximum value you know and then the arc max is going
0:23:10.240,0:23:14.320
-to tell you where is the index
-
-0:23:12.400,0:23:15.840
-pointing to that maximum value right so
+to tell you where is the index pointing to that maximum value right so
0:23:14.320,0:23:18.000
-this is like a vector
-
-0:23:15.840,0:23:19.200
-with a one hot vector and the other one
+this is like a vector with a one hot vector and the other one
0:23:18.000,0:23:21.440
-is a scalar
-
-0:23:19.200,0:23:23.600
-similarly whenever i compute this soft
+is a scalar similarly whenever i compute this soft
0:23:21.440,0:23:26.159
-max the softer version of the max
-
-0:23:23.600,0:23:28.400
-now this max is not just the max it's
+max the softer version of the max now this max is not just the max it's
0:23:26.159,0:23:30.880
-going to be like a summation of this
-
-0:23:28.400,0:23:32.640
-uh the logarithm of the summation of the
+going to be like a summation of this uh the logarithm of the summation of the
0:23:30.880,0:23:34.400
-exponential right
-
-0:23:32.640,0:23:36.080
-which you can change the temperature if
+exponential right which you can change the temperature if
0:23:34.400,0:23:37.200
-you get the temperature super cold you
-retrieve the max
+you get the temperature super cold you retrieve the max
0:23:37.200,0:23:41.039
-if you warm up the temperature you get
-
-0:23:38.720,0:23:43.840
-something like more
+if you warm up the temperature you get something like more
0:23:41.039,0:23:45.039
-like a weighted 
summation and for the
-
-0:23:43.840,0:23:46.880
-soft dark max
+like a weighted summation and for the soft arc max
0:23:45.039,0:23:48.960
-which was like the arc max is the one
-
-0:23:46.880,0:23:50.640
-hot if it's super cold
+which was like the arc max is the one hot if it's super cold
0:23:48.960,0:23:52.080
-it's gonna still be one hot but if you
-
-0:23:50.640,0:23:53.600
-warm up the temperature
+it's gonna still be one hot but if you warm up the temperature
0:23:52.080,0:23:55.679
-you're gonna get a distribution
-
-0:23:53.600,0:23:58.000
-probability distribution right
+you're gonna get a distribution probability distribution right
0:23:55.679,0:23:59.200
-so whenever someone says oh the soft max
-
-0:23:58.000,0:24:01.360
-gives you the probability distribution
+so whenever someone says oh the soft max gives you the probability distribution
0:23:59.200,0:24:04.159
-now that's the soft dark mask okay
-
-0:24:01.360,0:24:06.000
-uh arc max being the one hot or the zero
+now that's the soft arc max okay uh arc max being the one hot or the zero
0:24:04.159,0:24:06.720
-temperature limited limit gives you the
-
-0:24:06.000,0:24:08.080
-one hot
+temperature limit gives you the one hot
0:24:06.720,0:24:10.000
-if you increase the temperature you get
-
-0:24:08.080,0:24:12.799
-a distribution so finally
+if you increase the temperature you get a distribution so finally
0:24:10.000,0:24:14.080
-these are the correct names no one is
-
-0:24:12.799,0:24:18.159
-using but me
+these are the correct names no one is using but me
0:24:14.080,0:24:21.360
-so i hope i didn't create confusion
-
-0:24:18.159,0:24:24.960
-if i did sorry but still this is the
+so i hope i didn't create confusion if i did sorry but still this is the
0:24:21.360,0:24:27.840
-correct way of seeing these things okay
-
-0:24:24.960,0:24:28.799
-because it makes sense right so again uh
+correct way of seeing these things okay because it makes sense right so again uh
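the naming distinction above — the soft arc max is what most libraries ship under the name "softmax" — can be sketched the same way; a minimal dependency-free illustration with made-up scores (in pytorch this is what torch.softmax computes on beta-scaled inputs):

```python
import math

# soft arc max: exponentiate and normalize.  Cold temperature (large beta)
# gives back nearly a one-hot vector at the arg max; warm temperature
# (small beta) relaxes it into a spread-out probability distribution.
def soft_argmax(values, beta):
    m = max(values)  # shift by the max so the exponentials cannot overflow
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]
```

freezing it (beta = 100) concentrates almost all the mass on the largest entry, while warming it (beta near 0) spreads the mass out toward a uniform distribution — the two regimes described above.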
0:24:27.840,0:24:31.279
-if you have the
-
-0:24:28.799,0:24:32.000
-max if you have a function you want to
+if you have the max if you have a function you want to
0:24:31.279,0:24:34.640
-find the
-
-0:24:32.000,0:24:36.080
-max it's here right if you have this
+find the max it's here right if you have this
0:24:34.640,0:24:38.240
-function you want to find the mean
-
-0:24:36.080,0:24:39.679
-you can take the function you flip it
+function you want to find the min you can take the function you flip it
0:24:38.240,0:24:41.039
-you find the max
-
-0:24:39.679,0:24:42.559
-and then you flip it back again you get
+you find the max and then you flip it back again you get
0:24:41.039,0:24:43.039
-the mean right so that's what i show you
-
-0:24:42.559,0:24:45.600
-here
+the min right so that's what i show you here
0:24:43.039,0:24:46.880
-i show you that soft min is simply the
-
-0:24:45.600,0:24:50.799
-flipped version
+i show you that soft min is simply the flipped version
0:24:46.880,0:24:54.000
-the negative right of the max with a
-
-0:24:50.799,0:24:55.919
-flipped in argument okay
+the negative right of the max with a flipped in argument okay
0:24:54.000,0:24:57.039
-all right all right so enough me talking
-
-0:24:55.919,0:24:59.919
-about
+all right all right so enough me talking about
0:24:57.039,0:25:01.520
-mathematics and things i hope it was
-
-0:24:59.919,0:25:05.200
-fine
+mathematics and things i hope it was fine
0:25:01.520,0:25:08.640
-so this was the part
-
-0:25:05.200,0:25:09.440
-uh that was concluding the last lesson
+so this was the part uh that was concluding the last lesson
0:25:08.640,0:25:12.880
-right so
-
-0:25:09.440,0:25:14.880
-this is the end of the inference
+right so this is the end of the inference
0:25:12.880,0:25:16.000
-and we figured that there is the free
-
-0:25:14.880,0:25:18.880
-energy
+and we figured that there is the free energy
0:25:16.000,0:25:19.679
-there is a very cold one or there is a
-
-0:25:18.880,0:25:21.520
-warm
+there is a very cold one or there is a warm
0:25:19.679,0:25:23.120
-version or there is a very hot version
-
-0:25:21.520,0:25:24.080
-the hot version is going to be the
+version or there is a very hot version the hot version is going to be the
0:25:23.120,0:25:25.760
-average
-
-0:25:24.080,0:25:27.600
-the warm version is going to be like
+average the warm version is going to be like
0:25:25.760,0:25:30.400
-something you you may like
-
-0:25:27.600,0:25:31.200
-is like this marginalization of the of
+something you you may like is like this marginalization of the of
0:25:30.400,0:25:33.360
-the latent
-
-0:25:31.200,0:25:34.720
-and then the super cold version the zero
+the latent and then the super cold version the zero
0:25:33.360,0:25:37.200
-temperature limit is going to be
-
-0:25:34.720,0:25:39.200
-this exactly the minimum version minimum
+temperature limit is going to be this exactly the minimum version minimum
0:25:37.200,0:25:42.480
-value
-
-0:25:39.200,0:25:45.120
-uh what i showed you was the fact that
+value uh what i showed you was the fact that
0:25:42.480,0:25:45.919
-this model is a very poorly trained
-
-0:25:45.120,0:25:49.120
-model
+this model is a very poorly trained model
0:25:45.919,0:25:51.760
-because those low energy
-
-0:25:49.120,0:25:53.679
-regions were not you know happening
+because those low energy regions were not you know happening
0:25:51.760,0:25:56.480
-around this training set right so
-
-0:25:53.679,0:25:58.080
-let me show you once again uh the same
+around this training set right so let me show you once again uh the same
0:25:56.480,0:26:00.480
-and the same diagram i showed you
-
-0:25:58.080,0:26:02.080
-at the beginning of today's lesson which
+and the same diagram i showed you at the beginning of today's lesson which
0:26:00.480,0:26:04.320
-is this one over here
-
-0:26:02.080,0:26:05.520
-so here i show you with these white
+is this one over here so here i show you with these y
0:26:04.320,0:26:08.559
-checks a few
-
-0:26:05.520,0:26:12.240
-uh samples uh on the
+checks a few uh samples uh on the
0:26:08.559,0:26:14.000
-model manifold and then the y's the blue
-
-0:26:12.240,0:26:15.520
-eyes are the training sample but we
+model manifold and then the y's the blue y's are the training sample but we
0:26:14.000,0:26:17.200
-never use the training sample right i
-
-0:26:15.520,0:26:20.080
-just use the training sample to
+never use the training sample right i just use the training sample to
0:26:17.200,0:26:21.039
-compute the energy the free energy but
-
-0:26:20.080,0:26:22.799
-we never
+compute the energy the free energy but we never
0:26:21.039,0:26:24.480
-use them to learn because we didn't talk
-
-0:26:22.799,0:26:27.679
-about learning we talked about
+use them to learn because we didn't talk about learning we talked about
0:26:24.480,0:26:29.120
-inference so far right and so
-
-0:26:27.679,0:26:31.600
-guess what is going to be the next part
+inference so far right and so guess what is going to be the next part
0:26:29.120,0:26:34.080
-of today's lesson
-
-0:26:31.600,0:26:35.360
-you guessed it right training so now
+of today's lesson you guessed it right training so now
0:26:34.080,0:26:38.640
-we're going to be starting
-
-0:26:35.360,0:26:39.120
-uh to learn how to train learn how to
+we're going to be starting uh to learn how to train learn how to
0:26:38.640,0:26:41.440
-learn
-
-0:26:39.120,0:26:42.640
-train train how to learn no learn how to
+learn train train how to learn no learn how to
0:26:41.440,0:26:45.600
-train
-
-0:26:42.640,0:26:48.960
-energy based model okay unless there are
+train energy based model okay unless there are
0:26:45.600,0:26:48.960
-questions for me on the chat
-
-0:26:50.320,0:26:55.760
-no questions everything clear meta
+questions for me on the chat no questions everything clear meta
0:26:53.039,0:26:57.600
-learning yes
-
-0:26:55.760,0:26:59.919
-no that one is a different subject next
+learning yes no that one 
is a different subject next 0:26:57.600,0:26:59.919 -time - -0:27:00.159,0:27:04.720 -all right okay so i think yeah there is +time all right okay so i think yeah there is 0:27:02.960,0:27:06.880 -no big deal right so this is just - -0:27:04.720,0:27:08.799 -inference we didn't talk about any crazy +no big deal right so this is just inference we didn't talk about any crazy 0:27:06.880,0:27:09.279 -stuff and we talked in for inference - -0:27:08.799,0:27:11.039 -about +stuff and we talked in for inference about 0:27:09.279,0:27:12.559 -about the inference the whole last - -0:27:11.039,0:27:15.360 -lesson so +about the inference the whole last lesson so 0:27:12.559,0:27:17.840 -i guess we can move on move on and start - -0:27:15.360,0:27:20.320 -the training +i guess we can move on move on and start the training 0:27:17.840,0:27:21.279 -finding a well-behaved energy function - -0:27:20.320,0:27:24.080 -right +finding a well-behaved energy function right 0:27:21.279,0:27:25.200 -what does this mean so this means we - -0:27:24.080,0:27:27.840 -have to introduce +what does this mean so this means we have to introduce 0:27:25.200,0:27:30.240 -a loss functional what's a loss - -0:27:27.840,0:27:33.679 -functional +a loss functional what's a loss functional 0:27:30.240,0:27:35.440 -well it's a metric it's a scalar - -0:27:33.679,0:27:38.640 -function +well it's a metric it's a scalar function 0:27:35.440,0:27:42.000 -that is telling you how good your - -0:27:38.640,0:27:44.559 -energy function is right so we have an +that is telling you how good your energy function is right so we have an 0:27:42.000,0:27:45.520 -energy function which is this free - -0:27:44.559,0:27:47.360 -energy +energy function which is this free energy 0:27:45.520,0:27:49.279 -and then we're going to have a function - -0:27:47.360,0:27:51.039 -of my function +and then we're going to have a function of my function 0:27:49.279,0:27:53.840 -which is giving me a scalar which is - -0:27:51.039,0:27:56.880 
-just telling me how good this +which is giving me a scalar which is just telling me how good this 0:27:53.840,0:27:59.520 -energy function is right so - -0:27:56.880,0:28:00.240 -a loss functional gives me a scalar +energy function is right so a loss functional gives me a scalar 0:27:59.520,0:28:03.679 -given that i - -0:28:00.240,0:28:06.320 -feed a function +given that i feed a function 0:28:03.679,0:28:07.679 -and here i just show to you that if i - -0:28:06.320,0:28:10.720 -have this curly l +and here i just show to you that if i have this curly l 0:28:07.679,0:28:13.600 -as you know the loss functional for the - -0:28:10.720,0:28:15.840 -all the whole batch my whole data set i +as you know the loss functional for the all the whole batch my whole data set i 0:28:13.600,0:28:19.600 -can also express this as the average - -0:28:15.840,0:28:21.600 -of these per sample loss functionals +can also express this as the average of these per sample loss functionals 0:28:19.600,0:28:23.919 -okay so i just do the average of those - -0:28:21.600,0:28:26.640 -per sample those functions +okay so i just do the average of those per sample those functions 0:28:23.919,0:28:28.559 -cool so what the heck am i talking about - -0:28:26.640,0:28:30.720 -right so i'm just giving you i'm +cool so what the heck am i talking about right so i'm just giving you i'm 0:28:28.559,0:28:32.080 -making so much hype but i didn't tell - -0:28:30.720,0:28:33.520 -you anything so far +making so much hype but i didn't tell you anything so far 0:28:32.080,0:28:35.120 -and we already know this stuff from you - -0:28:33.520,0:28:37.840 -know machine learning and you know +and we already know this stuff from you know machine learning and you know 0:28:35.120,0:28:39.360 -previous lessons and so here we go with - -0:28:37.840,0:28:41.520 -the first loss function +previous lessons and so here we go with the first loss function 0:28:39.360,0:28:43.200 -which is the which is the energy loss - 
-0:28:41.520,0:28:46.240
-function
+which is the which is the energy loss function
0:28:43.200,0:28:49.360
-so this energy loss functional
-
-0:28:46.240,0:28:52.720
-it's simply the free energy
+so this energy loss functional it's simply the free energy
0:28:49.360,0:28:54.720
-f evaluated in my y
-
-0:28:52.720,0:28:56.840
-where y is the data point on the data
+f evaluated in my y where y is the data point on the data
0:28:54.720,0:28:59.840
-set right
-
-0:28:56.840,0:29:01.600
-so whenever whenever we train these
+set right so whenever whenever we train these
0:28:59.840,0:29:03.200
-models we're going to be minimizing the
-loss functionalities
+models we're going to be minimizing the loss functional
0:29:03.200,0:29:08.000
-right the loss function and so in this
-
-0:29:06.399,0:29:11.120
-case the loss functional
+right the loss function and so in this case the loss functional
0:29:08.000,0:29:14.399
-is actually the free energy at the
-
-0:29:11.120,0:29:17.440
-training point of course right i mean
+is actually the free energy at the training point of course right i mean
0:29:14.399,0:29:18.559
-what this energy function has to do with
-
-0:29:17.440,0:29:21.200
-the free energy
+what this energy function has to do with the free energy
0:29:18.559,0:29:22.559
-should be small for data that comes from
-
-0:29:21.200,0:29:25.279
-the training distribution
+should be small for data that comes from the training distribution
0:29:22.559,0:29:26.960
-large elsewhere right and so what is the
-
-0:29:25.279,0:29:28.480
-easiest way to do that well of course
+large elsewhere right and so what is the easiest way to do that well of course
0:29:26.960,0:29:31.679
-we're gonna just have the
-
-0:29:28.480,0:29:34.799
-loss functional being the free energy
+we're gonna just have the loss functional being the free energy
0:29:31.679,0:29:37.600
-evaluated at the training point
-
-0:29:34.799,0:29:39.440
-so if it's larger than zero then 
the
+evaluated at the training point so if it's larger than zero then the
0:29:37.600,0:29:41.039
-training of the network you know
-
-0:29:39.440,0:29:42.960
-changing the parameters such that we
+training of the network you know changing the parameters such that we
0:29:41.039,0:29:45.520
-minimize the loss functional
-
-0:29:42.960,0:29:47.120
-is going to be squeezing down the free
+minimize the loss functional is going to be squeezing down the free
0:29:45.520,0:29:48.640
-energy on those
-
-0:29:47.120,0:29:50.399
-points right so you have the point you
+energy on those points right so you have the point you
0:29:48.640,0:29:53.679
-have a free energy boom
-
-0:29:50.399,0:29:56.799
-point free energy bomb all right so
+have a free energy boom point free energy boom all right so
0:29:53.679,0:29:58.480
-we just small like clamp like we
-
-0:29:56.799,0:30:00.559
-we are reducing the free energy in
+we just small like clamp like we we are reducing the free energy in
0:29:58.480,0:30:04.000
-correspondence to all these
-
-0:30:00.559,0:30:06.000
-uh whites why is
+correspondence to all these uh y's why is
0:30:04.000,0:30:07.120
-there is a check there is a check
-because i
+there is a check there is a check because i
0:30:07.120,0:30:10.240
-want to emphasize the fact that we are
-
-0:30:08.960,0:30:12.240
-trying to push down
+want to emphasize the fact that we are trying to push down
0:30:10.240,0:30:13.440
-the energy at those locations right so i
-
-0:30:12.240,0:30:16.799
-push down there is the
+the energy at those locations right so i push down there is the
0:30:13.440,0:30:19.600
-arrow pointing down i push down
-
-0:30:16.799,0:30:20.960
-all right okay i might sound silly but
+arrow pointing down i push down all right okay i might sound silly but
0:30:19.600,0:30:24.880
-it doesn't matter i like
-
-0:30:20.960,0:30:26.320
-myself silly so instead now we're gonna
+it doesn't matter i like myself silly so instead now we're 
gonna
0:30:24.880,0:30:28.399
-be introducing these
-
-0:30:26.320,0:30:29.520
-uh contrastive methods what is a
+be introducing these uh contrastive methods what is a
0:30:28.399,0:30:32.080
-contrastive method
-
-0:30:29.520,0:30:32.720
-uh in this case this contrasting method
+contrastive method uh in this case this contrastive method
0:30:32.080,0:30:35.440
-will have
-
-0:30:32.720,0:30:36.399
-a white check which is blue why is blue
+will have a y check which is blue why is it blue
0:30:35.440,0:30:38.320
-because it's cold
-
-0:30:36.399,0:30:40.320
-right so we want to try to get low
+because it's cold right so we want to try to get low
0:30:38.320,0:30:41.600
-energy again the energy the temperature
-
-0:30:40.320,0:30:44.399
-are connected right
+energy again the energy the temperature are connected right
0:30:41.600,0:30:46.320
-so low energy is going to be cold blue
-
-0:30:44.399,0:30:50.159
-and then i have a white hot
+so low energy is going to be cold blue and then i have a y hat
0:30:46.320,0:30:51.520
-why is hot why is red why why hot is red
-
-0:30:50.159,0:30:53.679
-i want to increase the energy right
+why is it hot why is it red why why hot is red i want to increase the energy right
0:30:51.520,0:30:56.720
-that's why there is the the hot pointing
-
-0:30:53.679,0:31:00.080
-upwards and so in this case
+that's why there is the hat pointing upwards and so in this case
0:30:56.720,0:31:04.000
-uh given that m is a positive number
-
-0:31:00.080,0:31:08.480
-the difference f of y hat
+uh given that m is a positive number the difference f of y hat
0:31:04.000,0:31:10.880
-minus f of y check that the difference
-
-0:31:08.480,0:31:11.760
-it will the network will try to make it
+minus f of y check that the difference it will the network will try to make it
0:31:10.880,0:31:15.360
-larger than
-
-0:31:11.760,0:31:19.039
-m right so for as long as the difference
+larger than m right so for as long as the difference
0:31:15.360,0:31:22.159
-is 
smaller than m then these - -0:31:19.039,0:31:22.559 -you know this value over here will have +is smaller than m then these you know this value over here will have 0:31:22.159,0:31:25.919 -a - -0:31:22.559,0:31:29.519 -positive value whenever f y hat +a positive value whenever f y hat 0:31:25.919,0:31:32.799 -minus f y check will be larger than - -0:31:29.519,0:31:34.240 -m then you're gonna have that +minus f y check will be larger than m then you're gonna have that 0:31:32.799,0:31:36.000 -you know the output of this stuff is - -0:31:34.240,0:31:38.880 -gonna be zero +you know the output of this stuff is gonna be zero 0:31:36.000,0:31:39.360 -okay because there is a a positive part - -0:31:38.880,0:31:42.559 -so +okay because there is a a positive part so 0:31:39.360,0:31:42.880 -again this hinge loss will simply try to - -0:31:42.559,0:31:45.919 -get +again this hinge loss will simply try to get 0:31:42.880,0:31:46.720 -that second difference to be larger than - -0:31:45.919,0:31:50.960 -the +that second difference to be larger than the 0:31:46.720,0:31:50.960 -uh the first term the margin - -0:31:51.039,0:31:54.640 -in order to have like a smoother version +uh the first term the margin in order to have like a smoother version 0:31:53.120,0:31:56.960 -of this margin - -0:31:54.640,0:31:58.799 -this is like very binary right if you're +of this margin this is like very binary right if you're 0:31:56.960,0:32:01.279 -lower than the margin you push - -0:31:58.799,0:32:03.120 -larger than the margin you stop pushing +lower than the margin you push larger than the margin you stop pushing 0:32:01.279,0:32:07.679 -you can use this other version the - -0:32:03.120,0:32:10.799 -the loss log loss functional +you can use this other version the the loss log loss functional 0:32:07.679,0:32:13.279 -which is a smooth margin uh - -0:32:10.799,0:32:14.399 -you can you can see right whenever you +which is a smooth margin uh you can you can see right whenever you 
0:32:13.279,0:32:17.440
-have that
-
-0:32:14.399,0:32:18.240
-inside these in this parenthesis you
+have that inside these in this parenthesis you
0:32:17.440,0:32:20.399
-have a very
-
-0:32:18.240,0:32:21.679
-negative number so if this is very very
+have a very negative number so if this is very very
0:32:20.399,0:32:23.600
-large and this is zero
-
-0:32:21.679,0:32:25.360
-let's say you're gonna have the x of a
+large and this is zero let's say you're gonna have the exp of a
0:32:23.600,0:32:25.919
-very negative number which is roughly
-
-0:32:25.360,0:32:28.080
-zero
+very negative number which is roughly zero
0:32:25.919,0:32:29.279
-and i have the log of one which is you
-
-0:32:28.080,0:32:32.240
-know stop pushing
+and i have the log of one which is you know stop pushing
0:32:29.279,0:32:33.600
-there is no more instead if this value
-
-0:32:32.240,0:32:35.840
-here is large
+there is no more instead if this value here is large
0:32:33.600,0:32:37.600
-and this value is maybe negative or
-
-0:32:35.840,0:32:38.880
-whatever is zero
+and this value is maybe negative or whatever is zero
0:32:37.600,0:32:40.640
-you're gonna have the exponential of
-
-0:32:38.880,0:32:42.799
-this number which is gonna be very large
+you're gonna have the exponential of this number which is gonna be very large
0:32:40.640,0:32:44.240
-and then you're gonna have the one plus
-
-0:32:42.799,0:32:47.679
-this exponential
+and then you're gonna have the one plus this exponential
0:32:44.240,0:32:48.240
-uh which again the one gets neglected
-
-0:32:47.679,0:32:49.840
-you don't get
+uh which again the one gets neglected you don't get
0:32:48.240,0:32:51.840
-the log of this x but you're gonna get
-
-0:32:49.840,0:32:53.679
-basically the uh the
+the log of this exp but you're gonna get basically the uh the
0:32:51.840,0:32:57.279
-the loss is gonna be proportional to the
-
-0:32:53.679,0:33:00.880
-energy right if it's very large
+the loss is gonna be proportional to the energy 
right if it's very large 0:32:57.279,0:33:03.519 -cool cool but again for our case uh - -0:33:00.880,0:33:04.240 -we just have a very tiny one-dimensional +cool cool but again for our case uh we just have a very tiny one-dimensional 0:33:03.519,0:33:06.399 -latent - -0:33:04.240,0:33:07.760 -so we don't need to do this uh this +latent so we don't need to do this uh this 0:33:06.399,0:33:09.760 -contrastive sampling - -0:33:07.760,0:33:11.440 -uh contrastive learning it's necessary +contrastive sampling uh contrastive learning it's necessary 0:33:09.760,0:33:13.440 -whenever you have like a - -0:33:11.440,0:33:14.720 -um you know maybe like a high +whenever you have like a um you know maybe like a high 0:33:13.440,0:33:18.320 -dimensional latent - -0:33:14.720,0:33:20.080 -and so on um so let's just +dimensional latent and so on um so let's just 0:33:18.320,0:33:22.559 -you know let's just train this model - -0:33:20.080,0:33:26.080 -because i didn't train this model so far +you know let's just train this model because i didn't train this model so far 0:33:22.559,0:33:29.519 -with this energy loss functional - -0:33:26.080,0:33:31.360 -okay and so i train this model it takes +with this energy loss functional okay and so i train this model it takes 0:33:29.519,0:33:33.840 -one epoch to converge - -0:33:31.360,0:33:36.159 -it's ridiculously fast okay but it's a +one epoch to converge it's ridiculously fast okay but it's a 0:33:33.840,0:33:38.880 -toy example so you understand that - -0:33:36.159,0:33:39.200 -and i'm gonna start by uh showing you +toy example so you understand that and i'm gonna start by uh showing you 0:33:38.880,0:33:41.600 -the - -0:33:39.200,0:33:43.519 -zero temperature limit the super cold +the zero temperature limit the super cold 0:33:41.600,0:33:45.200 -free energy okay - -0:33:43.519,0:33:46.720 -uh on the left hand side i'm gonna show +free energy okay uh on the left hand side i'm gonna show 0:33:45.200,0:33:48.559 -you the untrained version 
which is the - -one we already saw before +you the untrained version which is the one we already saw before 0:33:48.559,0:33:54.559 -so in this case for every training - -point the blue point i have a +so in this case for every training point the blue point i have a 0:33:54.559,0:33:57.919 -corresponding - -x which is the location on the model +corresponding x which is the location on the model 0:33:57.919,0:34:04.559 -manifold that is the closest to that - -training point okay whenever i train +manifold that is the closest to that training point okay whenever i train 0:34:04.559,0:34:11.200 -i'm gonna be you know uh get a gradient - -that gradient is gonna be i just i told +i'm gonna be you know uh get a gradient that gradient is gonna be i just i told 0:34:11.200,0:34:13.520 -you before - -if you if you get the mean you're gonna +you before if you get the min you're gonna 0:34:13.520,0:34:17.280 -get one item and then if you do the - -derivative you're gonna get the +get one item and then if you do the derivative you're gonna get the 0:34:17.280,0:34:21.040 -argument which is just the one in - -correspondence to the +argmin which is just the one in correspondence to the 0:34:21.040,0:34:25.919 -lowest value and so that one is going to - -be represented here +lowest value and so that one is going to be represented here 0:34:25.919,0:34:31.359 -by uh that - -arrow over here right so this arrow here +by uh that arrow over here right so this arrow here 0:34:31.359,0:34:35.440 -is going to be - -the energy the derivative of the energy +is going to be the energy the derivative of the energy 0:34:35.440,0:34:39.599 -which is going to be - -just the distance like the
y minus +which is going to be just the distance like the y minus 0:34:39.599,0:34:43.440 -y check and then that's going to be - -multiplied by +y tilde and then that's going to be multiplied by 0:34:43.440,0:34:48.240 -you know the one in corresponding to the - -location that is closest to uh +you know the one corresponding to the location that is closest to uh 0:34:48.240,0:34:54.879 -to our point all right so - -what this means is that during training +to our point all right so what this means is that during training 0:34:54.879,0:34:59.040 -whenever we use the ztl the zero - -temperature limit you're gonna get +whenever we use the ztl the zero temperature limit you're gonna get 0:34:59.040,0:35:02.240 -the location on the manifold that is - -closest to your training point +the location on the manifold that is closest to your training point 0:35:02.240,0:35:05.520 -and then you're gonna get this point to - -be moving there +and then you're gonna get this point to be moving there 0:35:05.520,0:35:08.800 -you have this training point you get - -this location that is on the manifold +you have this training point you get this location that is on the manifold 0:35:08.800,0:35:11.200 -closer to this point - -and then you get a gradient that is +closer to this point and then you get a gradient that is 0:35:11.200,0:35:15.119 -making going up here - -same you have a training point here +making it go up here same you have a training point here 0:35:15.119,0:35:19.520 -close this point to the manifold here - -you get this point a gradient that goes +close this point to the manifold here you get this point a gradient that goes 0:35:19.520,0:35:23.359 -down here - -okay so this is the training
procedure +down here okay so this is the training procedure 0:35:23.359,0:35:28.079 -when using this zero temperature limit - -one epoch later +when using this zero temperature limit one epoch later 0:35:28.079,0:35:31.520 -on the right hand side the train version - -bam +on the right hand side the trained version bam 0:35:31.520,0:35:38.720 -all those axes automatically managed to - -arrive to destination finished +all those x's automatically managed to arrive to destination finished 0:35:38.720,0:35:42.400 -so this is like a well-trained model - -which i'll show you +so this is like a well-trained model which i'll show you 0:35:42.400,0:35:46.160 -where i show you the energy uh going to - -zero in +where i show you the energy uh going to zero 0:35:46.160,0:35:50.880 -in the all around like acro - -corresponding to all the locations +all around corresponding to all the locations 0:35:50.880,0:35:54.640 -corresponding to my training data set - -right the training points the +corresponding to my training data set right the training points the 0:35:54.640,0:36:01.359 -the blue points what happens if you - -have two closest point on a manifold if +the blue points what happens if you have two closest points on a manifold if 0:36:01.359,0:36:08.000 -for example if y is at zero zero - -um +for example if y is at zero zero um 0:36:09.280,0:36:14.240 -right so in the energy in the in the - -zero temperature limit you're going to +right so in the energy in the in the zero temperature limit you're going to 0:36:14.240,0:36:17.119 -get just one point it's going to be - -pulled there +get just one point it's going to be pulled there 0:36:17.119,0:36:22.480 -and this is very prone to
overfitting - -0:36:20.320,0:36:23.440 -let's say our z is not just one +and this is very prone to overfitting let's say our z is not just one 0:36:22.480,0:36:25.280 -dimensional - -0:36:23.440,0:36:27.119 -large it's larger right so instead of +dimensional large it's larger right so instead of 0:36:25.280,0:36:29.599 -having like a ellipse you're gonna have - -0:36:27.119,0:36:31.599 -like a potato +having like a ellipse you're gonna have like a potato 0:36:29.599,0:36:33.200 -if you haven't hold on let me finish the - -0:36:31.599,0:36:35.040 -answer if you have a potato +if you haven't hold on let me finish the answer if you have a potato 0:36:33.200,0:36:37.440 -or potato you're gonna get all these - -0:36:35.040,0:36:40.960 -locations on the potato to go +or potato you're gonna get all these locations on the potato to go 0:36:37.440,0:36:44.000 -to those training points and so if your - -0:36:40.960,0:36:45.599 -z is a high dimensional latent variable +to those training points and so if your z is a high dimensional latent variable 0:36:44.000,0:36:47.599 -you end up with a you start with a - -0:36:45.599,0:36:49.760 -potato and you end up with a porcupine +you end up with a you start with a potato and you end up with a porcupine 0:36:47.599,0:36:51.359 -with all those peaks going uh you know - -0:36:49.760,0:36:52.800 -going out and this is basically +with all those peaks going uh you know going out and this is basically 0:36:51.359,0:36:53.280 -overfitting you just memorize the - -0:36:52.800,0:36:55.440 -training +overfitting you just memorize the training 0:36:53.280,0:36:58.320 -set in our case this doesn't happen - -0:36:55.440,0:37:01.680 -because our latent is one dimensional so +set in our case this doesn't happen because our latent is one dimensional so 0:36:58.320,0:37:04.320 -you can't really pull spikes out - -0:37:01.680,0:37:04.320 -of that thing +you can't really pull spikes out of that thing 0:37:05.119,0:37:11.839 -but nevertheless we may 
want to figure - -out how to deal with this overfitting +but nevertheless we may want to figure out how to deal with this overfitting 0:37:11.839,0:37:18.240 -uh by using this you know temperature - -regularization thing right so before i +uh by using this you know temperature regularization thing right so before i 0:37:18.240,0:37:21.760 -show you there was a peak - -if there is a zero temperature limit +show you there was a peak if there is a zero temperature limit 0:37:21.760,0:37:25.760 -then if you increase the temperature you - -actually smooth out that peak +then if you increase the temperature you actually smooth out that peak 0:37:25.760,0:37:30.640 -and so here i'm going to show you uh - -then i answer the other question +and so here i'm going to show you uh then i answer the other question 0:37:30.640,0:37:34.079 -actually let me see what happens here - -how +actually let me see what happens here how 0:37:34.079,0:37:38.800 -do we update the energy function is it - -parametrized with uh +do we update the energy function is it parametrized with uh 0:37:38.800,0:37:41.359 -oh here this is definition from last - -time +oh here this is the definition from last time 0:37:42.000,0:37:47.440 -right so my energy function is this one - -right +right so my energy function is this one right 0:37:47.440,0:37:52.000 -where so my energy function is my model - -right +where so my energy function is my model right 0:37:52.000,0:37:56.000 -which is the square difference between - -the locations and because the laden for +which is the square difference between the locations and the sine of the latent for 0:37:56.000,0:37:58.640 -the first component and the code of the - -latent for the +the first component and the cosine of the latent for the second component so this 0:37:58.640,0:38:03.280 -is like - -this is how e is parametrized right +is like this is how e is parametrized right 0:38:03.359,0:38:08.079 -uh does the learning interpolate between - -the points +uh does the learning interpolate between the points 0:38:08.079,0:38:11.200 -uh it asked would this algorithm learn - -the +uh it asked would this algorithm learn the 0:38:11.200,0:38:14.880 -mod the whole ellipse or just the blue - -points +whole ellipse or just the blue points 0:38:14.880,0:38:18.640 -okay so i'm getting there okay is there - -a visualization +okay so i'm getting there okay is there a visualization 0:38:18.640,0:38:24.240 -for the spikes to talk about when - -overfitting yeah getting there as well +for the spikes to talk about when overfitting yeah getting there as well 0:38:24.240,0:38:28.160 -all right so we were telling like we - -were talking about +all right so we were talking about 0:38:28.160,0:38:32.079 -how we train this energy function right - -so this energy energy function +how we train this energy function right so this energy function 0:38:32.079,0:38:35.280 -is going to be this color thing i show - -you over here +is going to be this color thing i show you over here 0:38:35.280,0:38:39.280 -and this is you know a different - -representation it's simply the location +and this is you know a different representation it's simply the location 0:38:39.280,0:38:43.680 -of that - -uh violet ellipse +of that uh violet ellipse 0:38:43.680,0:38:48.400 -training for the zero temperature zero - -temperature limit
means you take that +training for the zero temperature zero temperature limit means you take that 0:38:48.400,0:38:51.760 -point - -0:38:49.280,0:38:53.520 -of these ellipse you try to pull it up +point of these ellipse you try to pull it up 0:38:51.760,0:38:55.760 -right how you pull it up - -0:38:53.520,0:38:56.800 -the only two parameters we had in this +right how you pull it up the only two parameters we had in this 0:38:55.760,0:39:00.240 -model were - -0:38:56.800,0:39:03.599 -w1 and w2 which were con controlling the +model were w1 and w2 which were con controlling the 0:39:00.240,0:39:04.960 -x radius and the y radius right so we - -0:39:03.599,0:39:07.359 -had two parameters +x radius and the y radius right so we had two parameters 0:39:04.960,0:39:08.079 -and with two parameters we try to fit - -0:39:07.359,0:39:11.119 -all these +and with two parameters we try to fit all these 0:39:08.079,0:39:11.440 -y's right and so basically the network - -0:39:11.119,0:39:13.599 -will +y's right and so basically the network will 0:39:11.440,0:39:15.200 -like the the training procedure gradient - -0:39:13.599,0:39:17.680 -descent will eventually +like the the training procedure gradient descent will eventually 0:39:15.200,0:39:19.200 -try to change the size of this ellipse - -0:39:17.680,0:39:20.880 -such that it +try to change the size of this ellipse such that it 0:39:19.200,0:39:22.240 -you know expands and they're going to be - -0:39:20.880,0:39:25.680 -matching all those +you know expands and they're going to be matching all those 0:39:22.240,0:39:29.040 -uh blue dots okay - -0:39:25.680,0:39:31.280 -the spiky thing was i was saying is that +uh blue dots okay the spiky thing was i was saying is that 0:39:29.040,0:39:33.200 -if you have a high dimensional z - -0:39:31.280,0:39:34.800 -like in this case z is one dimension so +if you have a high dimensional z like in this case z is one dimension so 0:39:33.200,0:39:36.960 -you have like one line - 
-0:39:34.800,0:39:38.800 -like that if that is two dimensional +you have like one line like that if that is two dimensional 0:39:36.960,0:39:41.119 -it's going to be the whole surface right - -0:39:38.800,0:39:42.800 -and so now it's trivial to overfit you +it's going to be the whole surface right and so now it's trivial to overfit you 0:39:41.119,0:39:44.160 -can move anywhere in the plane there is - -0:39:42.800,0:39:46.960 -no more constraint +can move anywhere in the plane there is no more constraint 0:39:44.160,0:39:47.359 -of living on that line and so we have to - -0:39:46.960,0:39:49.760 -see +of living on that line and so we have to see 0:39:47.359,0:39:51.200 -how we can avoid overfitting but in this - -0:39:49.760,0:39:53.359 -case it doesn't happen but +how we can avoid overfitting but in this case it doesn't happen but 0:39:51.200,0:39:54.720 -you know we can see now that by - -0:39:53.359,0:39:57.839 -increasing the temperature +you know we can see now that by increasing the temperature 0:39:54.720,0:39:59.599 -we no longer pick points individually - -0:39:57.839,0:40:02.720 -so we are using this marginalization +we no longer pick points individually so we are using this marginalization 0:39:59.599,0:40:06.160 -this vision thingy - -0:40:02.720,0:40:08.800 -so on the bottom part is marginalization +this vision thingy so on the bottom part is marginalization 0:40:06.160,0:40:11.200 -on the left hand side i show you how the - -0:40:08.800,0:40:14.160 -training uh works right +on the left hand side i show you how the training uh works right 0:40:11.200,0:40:15.440 -so you have that all those locations - -0:40:14.160,0:40:18.720 -contribute +so you have that all those locations contribute 0:40:15.440,0:40:21.119 -to these you know the gradient - -0:40:18.720,0:40:22.960 -are just the average of those arrows +to these you know the gradient are just the average of those arrows 0:40:21.119,0:40:26.079 -here so given that we pick - -0:40:22.960,0:40:29.680 
-one y that is this green x +here so given that we pick one y that is this green x 0:40:26.079,0:40:32.960 -over here you get these all these - -points on this manifold will contribute +over here you get these all these points on this manifold will contribute 0:40:32.960,0:40:37.760 -and will be attracted there here before - -we have only one point gets pulled up +and will be attracted there here before we have only one point gets pulled up 0:40:37.760,0:40:41.839 -here we have that all these points get - -pulled up right so it's much harder to +here we have that all these points get pulled up right so it's much harder to 0:40:41.839,0:40:45.280 -overfit something you want to pay - -attention here +overfit something you want to pay attention here 0:40:45.280,0:40:48.960 -is that how do i compute the gradient so - -the gradient +is that how do i compute the gradient so the gradient 0:40:48.960,0:40:52.000 -i'm computing the gradient of this soft - -mean +i'm computing the gradient of this soft min 0:40:52.000,0:40:57.040 -and so automatically we are gonna get a - -soft argument right so if you have a max +and so automatically we are gonna get a soft argmin right so if you have a max 0:40:57.040,0:41:00.560 -you do the gradient you're gonna get the - -arc max or if you have a mean +you do the gradient you're gonna get the argmax or if you have a min 0:41:00.560,0:41:04.240 -the gradient is gonna be the argument - -here we have a soft +the gradient is gonna be the argmin here we have a soft 0:41:04.240,0:41:08.400 -mean and therefore the gradient is going - -to be the soft argument +min and therefore the gradient is going to be the soft argmin 0:41:08.400,0:41:11.920 -multiplied by the the derivative of this -
-0:41:11.280,0:41:13.680 -energy +multiplied by the derivative of this energy 0:41:11.920,0:41:15.920 -and which is going to be simply this - -vector right so the energy is the square +and which is going to be simply this vector right so the energy is the square 0:41:15.920,0:41:18.079 -distance if you do the derivative you're - -going to get the vector which are +distance if you do the derivative you're going to get the vector which are 0:41:19.520,0:41:26.400 -here shown in white and then the height - -is gonna be uh basically given to you +here shown in white and then the height is gonna be uh basically given to you 0:41:26.400,0:41:29.440 -by the you know the the vector - -multiplied +by the you know the vector multiplied 0:41:29.440,0:41:36.560 -by this soft argument - -cool wow that's a lot to take i think +by this soft argmin cool wow that's a lot to take i think 0:41:36.560,0:41:39.599 -but - -it's it's i think it's just great uh +but it's it's i think it's just great uh 0:41:39.599,0:41:44.240 -finally i train the last one - -and i'm gonna get something like this on +finally i train the last one and i'm gonna get something like this on 0:41:44.240,0:41:48.800 -the right hand side - -okay so before i show you +the right hand side okay so before i show you 0:41:48.800,0:41:53.839 -the cross-section for the left-hand side - -the untrained version i'm going to show +the cross-section for the left-hand side the untrained version i'm going to show 0:41:53.839,0:41:56.800 -you now the cross-section for this train - -version +you now the cross-section for this trained version 0:41:56.800,0:42:01.359 -so the zero temperature limit the super - -cold one i'm gonna get this red one with +so
the zero temperature limit the super cold one i'm gonna get this red one with 0:42:01.359,0:42:05.280 -a spike - -0:42:02.640,0:42:05.839 -and then as you increase the temperature +a spike and then as you increase the temperature 0:42:05.280,0:42:08.400 -as you - -0:42:05.839,0:42:10.079 -reduce this beta we're moving up until +as you reduce this beta we're moving up until 0:42:08.400,0:42:14.400 -you get this you know average - -0:42:10.079,0:42:17.359 -version this parabolic uh blue one right +you get this you know average version this parabolic uh blue one right 0:42:14.400,0:42:18.800 -okay okay okay and so all of this was - -0:42:17.359,0:42:22.079 -about +okay okay okay and so all of this was about 0:42:18.800,0:42:23.680 -unsupervised learning right so far we - -0:42:22.079,0:42:28.400 -only have seen +unsupervised learning right so far we only have seen 0:42:23.680,0:42:30.880 -y's where are the x's - -0:42:28.400,0:42:32.319 -and so this is like yesterday night i'm +y's where are the x's and so this is like yesterday night i'm 0:42:30.880,0:42:34.560 -like okay maybe i don't talk about - -0:42:32.319,0:42:36.079 -supervised learning like i don't +like okay maybe i don't talk about supervised learning like i don't 0:42:34.560,0:42:38.000 -how long is going to take me to now - -0:42:36.079,0:42:39.040 -train a model with the x's and +how long is going to take me to now train a model with the x's and 0:42:38.000,0:42:42.400 -everything and - -0:42:39.040,0:42:44.800 -i don't want to do it but then +everything and i don't want to do it but then 0:42:42.400,0:42:45.680 -i just change one line of code and - -0:42:44.800,0:42:48.000 -everything just +i just change one line of code and everything just 0:42:45.680,0:42:49.920 -works so everything we have seen so far - -0:42:48.000,0:42:51.760 -is exactly the same +works so everything we have seen so far is exactly the same 0:42:49.920,0:42:53.200 -for the unconditional which is this - -0:42:51.760,0:42:56.240 
-unsupervised +for the unconditional which is this unsupervised 0:42:53.200,0:42:57.760 -learning way and it's gonna - -0:42:56.240,0:43:00.240 -like one line change you're gonna get +learning way and it's gonna like one line change you're gonna get 0:42:57.760,0:43:00.640 -the supervised like the self-supervised - -0:43:00.240,0:43:03.040 -the +the supervised like the self-supervised the 0:43:00.640,0:43:03.760 -conditional and so now in the last five - -0:43:03.040,0:43:04.880 -minutes +conditional and so now in the last five minutes 0:43:03.760,0:43:06.880 -we're gonna be talking about the - -0:43:04.880,0:43:08.240 -self-supervised learning or the +we're gonna be talking about the self-supervised learning or the 0:43:06.880,0:43:10.400 -conditional case - -0:43:08.240,0:43:12.480 -what does this mean so let's get back to +conditional case what does this mean so let's get back to 0:43:10.400,0:43:13.280 -the training data this is my training - -0:43:12.480,0:43:16.560 -data right +the training data this is my training data right 0:43:13.280,0:43:19.119 -we have we try to learn this horn - -0:43:16.560,0:43:21.040 -that is starting with a horizontal mouth +we have we try to learn this horn that is starting with a horizontal mouth 0:43:19.119,0:43:24.240 -like it's like a closed mouth ah - -0:43:21.040,0:43:27.680 -like that and then it goes like a very +like it's like a closed mouth ah like that and then it goes like a very 0:43:24.240,0:43:28.400 -you know tall and narrow and then the - -0:43:27.680,0:43:31.119 -profile +you know tall and narrow and then the profile 0:43:28.400,0:43:31.760 -the envelope is exponential right so - -0:43:31.119,0:43:34.880 -here +the envelope is exponential right so here 0:43:31.760,0:43:38.240 -the the the radius - -0:43:34.880,0:43:39.839 -goes from beta to alpha and in x +the the the radius goes from beta to alpha and in x 0:43:38.240,0:43:41.839 -in exp like it's multiplied by the - -0:43:39.839,0:43:43.599 -exponential of two 
times the x +it's multiplied by the exponential of two times the x 0:43:41.839,0:43:45.920 -and the other case the goes from alpha - -to beta and also it is multiplied by +and in the other case it goes from alpha to beta and also it is multiplied by 0:43:45.920,0:43:50.560 -this - -exponential so let's see if we can learn +this exponential so let's see if we can learn 0:43:50.560,0:43:54.960 -this stuff and i didn't know if it was - -easy or hard i thought it was hard +this stuff and i didn't know if it was easy or hard i thought it was hard 0:43:54.960,0:43:59.119 -it was very easy and so untrained model - -manifold +it was very easy and so untrained model manifold 0:43:59.119,0:44:03.119 -so let's give it a look how does my - -model look now +so let's give it a look how does my model look now 0:44:03.119,0:44:07.359 -so i have a z and since i have control - -over z i take +so i have a z and since i have control over z i take 0:44:07.359,0:44:12.079 -you know zero to two pi to pi excluded - -that's why the bracket is flipped +you know zero to two pi two pi excluded that's why the bracket is flipped 0:44:12.079,0:44:19.359 -with an interval of pi over 24. - -0:44:16.240,0:44:21.760 -so i get a line on over there i fit this +with an interval of pi over 24.
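the discretization just described, a z grid from 0 to 2 pi with 2 pi excluded in steps of pi over 24, together with an x grid from 0 to 1 in steps of 0.02, can be sketched in a few lines. this is a hedged sketch only: the decoder body and the alpha and beta envelope values below are my own assumptions, not the lecture's notebook code

```python
import numpy as np

# Minimal sketch of the sampling grids described in the lecture; the decoder
# body and the alpha/beta values are assumptions, not the lecture's code.
z = np.arange(0, 2 * np.pi, np.pi / 24)  # latent angle, 2*pi excluded
x = np.arange(0, 1, 0.02)                # observed input, 50 samples in [0, 1)

alpha, beta = 0.5, 1.5                   # assumed envelope radii

def decoder(x, z):
    # an ellipse in z whose radii grow with an exponential envelope in x,
    # loosely following "multiplied by the exponential of two times the x"
    rx = beta * np.exp(2 * x)
    ry = alpha * np.exp(2 * x)
    return np.stack([rx * np.cos(z), ry * np.sin(z)], axis=-1)

Z, X = np.meshgrid(z, x)                 # grid of (latent, input) pairs
manifold = decoder(X, Z)                 # one 2-d point per (x, z) pair
```

plotting manifold for each fixed x traces one ellipse, and sweeping x stretches the ellipses exponentially into the horn shape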
so i get a line on over there i feed this 0:44:19.359,0:44:22.480 -z on the decoder and then i'm gonna get - -my y +z on the decoder and then i'm gonna get my y 0:44:22.480,0:44:26.319 -tilde which is gonna be moving uh like - -going around +tilde which is gonna be moving uh like going around 0:44:26.319,0:44:32.960 -ellipses because that's how my network - -is routed inside the decoder right +ellipses because that's how my network is routed inside the decoder right 0:44:32.960,0:44:37.520 -moreover we're gonna have our y's our - -observer why is observed +moreover we're gonna have our y's our y is observed 0:44:37.520,0:44:41.280 -you can see is observed because there is - -a shade in that +you can see it is observed because there is a shade in that 0:44:41.280,0:44:45.200 -bubble there in the circle and now we - -have a predictor +bubble there in the circle and now we have a predictor 0:44:45.200,0:44:48.319 -and the decoder not only takes my latent - -z +and the decoder not only takes my latent z 0:44:48.319,0:44:51.760 -but also a predictor and the predictor - -is fed +but also a predictor and the predictor is fed 0:44:51.760,0:44:56.480 -with my observed x and since again if i - -have control over z +with my observed x and since again if i have control over x 0:44:56.480,0:45:02.960 -i can simply say it goes from 0 to 1 - -with 0.02 interval +i can simply say it goes from 0 to 1 with 0.02 interval 0:45:02.960,0:45:07.520 -let me show you how my untrained network - -manifold looks right so this is what +let me show you how my untrained network manifold looks right so this is what 0:45:07.520,0:45:13.599 -this untrained network manifold looks - -all right so
how do i train this well i +this untrained network manifold looks all right so how do i train this well i 0:45:13.599,0:45:19.359 -just do the zero temperature limit - -0:45:15.680,0:45:22.000 -uh free energy training so given my +just do the zero temperature limit uh free energy training so given my 0:45:19.359,0:45:22.640 -horn as before i take one point one y - -0:45:22.000,0:45:25.200 -point +horn as before i take one point one y point 0:45:22.640,0:45:25.839 -i find the closest point on my manifold - -0:45:25.200,0:45:27.839 -and then +i find the closest point on my manifold and then 0:45:25.839,0:45:29.440 -i try to pull it up i take this other - -0:45:27.839,0:45:31.440 -point i take the closest point +i try to pull it up i take this other point i take the closest point 0:45:29.440,0:45:33.280 -and i put it down there i take this - -0:45:31.440,0:45:34.240 -point over here i take the closest point +and i put it down there i take this point over here i take the closest point 0:45:33.280,0:45:36.000 -and then put it on - -0:45:34.240,0:45:38.560 -i take this point over here on the on +and then put it on i take this point over here on the on 0:45:36.000,0:45:39.200 -the horn i take the closest point on the - -0:45:38.560,0:45:41.760 -manifold +the horn i take the closest point on the manifold 0:45:39.200,0:45:42.240 -and i pull it down i do that for one - -0:45:41.760,0:45:45.119 -epoch +and i pull it down i do that for one epoch 0:45:42.240,0:45:47.119 -only i told you it was very easy to - -0:45:45.119,0:45:49.839 -train this model +only i told you it was very easy to train this model 0:45:47.119,0:45:51.599 -and we get actually i had to define - -0:45:49.839,0:45:53.359 -first what is the energy function right +and we get actually i had to define first what is the energy function right 0:45:51.599,0:45:56.240 -so my energy function - -0:45:53.359,0:45:57.920 -uh in this case is going to be this e of +so my energy function uh in this case is going to be this 
e of 0:45:56.240,0:46:01.200 -x y and z - -where again those two components uh like +x y and z where again those two components uh like 0:46:01.200,0:46:05.040 -it's going to be the sum of the square - -distances +it's going to be the sum of the square distances 0:46:05.040,0:46:10.560 -but in this case i have f and g right so - -we have a predictor f +but in this case i have f and g right so we have a predictor f 0:46:10.560,0:46:16.800 -which are both of them mapping r to r r2 - -and then f is going to be a neural net +which are both of them mapping r to r2 and then f is going to be a neural net 0:46:16.800,0:46:20.160 -mapping my input x through a linear - -layer and +mapping my input x through a linear layer and 0:46:20.160,0:46:23.440 -reload to a eight dimensional hidden - -layer +relu to an eight dimensional hidden layer 0:46:23.440,0:46:26.880 -then i go again through another linear - -layer and reload another eight +then i go again through another linear layer and relu another eight 0:46:26.880,0:46:29.680 -dimensional - -hidden layer and then i have my final +dimensional hidden layer and then i have my final 0:46:29.680,0:46:32.880 -linear layer to end up in two dimensions - -so i have a four layer +linear layer to end up in two dimensions so i have a four layer 0:46:32.880,0:46:38.560 -network input two hidden of size eight - -and then one output of size two +network input two hidden of size eight and then one output of size two 0:46:38.560,0:46:42.000 -and then my g function is simply what - -allows me to get +and then my g function is simply what allows me to get 0:46:42.000,0:46:47.839 -this z going in in loops okay - -and but then the point now is
that these +this z going in loops okay and but then the point now is that these 0:46:47.839,0:46:53.359 -two components are going to be scaled - -by the output of f +two components are going to be scaled by the output of f 0:46:53.359,0:46:58.000 -so this is my model very very tiny very - -tiny model +so this is my model very very tiny very tiny model 0:46:58.000,0:47:01.760 -and i'm going to be training it and then - -i show you +and i'm going to be training it and then i show you 0:47:01.760,0:47:05.040 -the model manifold so again i take the - -same +the model manifold so again i take the same 0:47:05.040,0:47:09.040 -discretization for the znx and this is - -how +discretization for the z and the x and this is how 0:47:09.040,0:47:16.240 -the training train model manifold looks - -it's awesome right i think it's just +the trained model manifold looks it's awesome right i think it's just 0:47:16.240,0:47:19.839 -great - -all right and this one took nothing no +great all right and this one took nothing no 0:47:19.839,0:47:25.359 -time to train - -so how can we move on how +time to train so how can we move on how 0:47:25.359,0:47:29.200 -how what do we do next right how do we - -move forward from here +how what do we do next right how do we move forward from here 0:47:29.200,0:47:34.079 -so there are a few more ways to scale - -this up not to toy example so +so there are a few more ways to scale this up to a non-toy example so 0:47:34.079,0:47:37.359 -so far i've been kind of cheating right - -i've been always +so far i've been kind of cheating right i've been always 0:47:37.359,0:47:41.520 -uh embedding into d decoder the fact - -that my +uh embedding into the decoder the
fact that my 0:47:41.520,0:47:45.359 -z goes around circles but i don't know - -0:47:44.960,0:47:48.160 -that +z goes around circles but i don't know that 0:47:45.359,0:47:49.920 -right so we don't know that and so we - -0:47:48.160,0:47:53.359 -may use something like this +right so we don't know that and so we may use something like this 0:47:49.920,0:47:56.559 -in this case my g function takes my - -0:47:53.359,0:47:58.720 -f and z +in this case my g function takes my f and z 0:47:56.559,0:47:59.839 -and then you know g can be a neural net - -0:47:58.720,0:48:01.680 -as well and +and then you know g can be a neural net as well and 0:47:59.839,0:48:02.960 -in this case i have to learn the fact - -0:48:01.680,0:48:06.240 -that this stuff +in this case i have to learn the fact that this stuff 0:48:02.960,0:48:09.920 -moves around circles so i in this case i - -0:48:06.240,0:48:13.119 -should be learning this sine and cosine +moves around circles so i in this case i should be learning this sine and cosine 0:48:09.920,0:48:15.839 -but then how do i know that actually - -0:48:13.119,0:48:18.000 -z is one dimensional well i know because +but then how do i know that actually z is one dimensional well i know because 0:48:15.839,0:48:20.559 -i generated my data right so - -0:48:18.000,0:48:22.480 -i am the owner of my data generation +i generated my data right so i am the owner of my data generation 0:48:20.559,0:48:25.680 -process so i knew that - -0:48:22.480,0:48:28.000 -uh theta was a one-dimensional item so +process so i knew that uh theta was a one-dimensional item so 0:48:25.680,0:48:29.359 -definitely i can just use a latent that - -0:48:28.000,0:48:31.040 -is one-dimensional but +definitely i can just use a latent that is one-dimensional but 0:48:29.359,0:48:33.599 -no one can tell me that for you know - -0:48:31.040,0:48:36.079 -natural images or whatever right so +no one can tell me that for you know natural images or whatever right so 0:48:33.599,0:48:37.760 
-that's the other big issue and so how
-
-0:48:36.079,0:48:38.720
-how would we deal with the fact that we
+that's the other big issue and so how how would we deal with the fact that we

0:48:37.760,0:48:42.240
-don't know
-
-0:48:38.720,0:48:44.160
-what is the correct size of my latent
+don't know what is the correct size of my latent

0:48:42.240,0:48:45.839
-uh because again if you choose a large
-
-0:48:44.160,0:48:48.319
-latent you're going to be
+uh because again if you choose a large latent you're going to be

0:48:45.839,0:48:49.839
-very easily overfitting everything and
-
-0:48:48.319,0:48:51.520
-so in this case
+very easily overfitting everything and so in this case

0:48:49.839,0:48:53.920
-what changes from the previous slide
-
-0:48:51.520,0:48:58.319
-which is this one is that now z
+what changes from the previous slide which is this one is that now z

0:48:53.920,0:49:00.400
-is a vector okay so z is a vector
-
-0:48:58.319,0:49:01.760
-no longer just a single line so actually
+is a vector okay so z is a vector no longer just a single line so actually

0:49:00.400,0:49:03.200
-this should be a vector and this should
-
-0:49:01.760,0:49:06.640
-be like whatever
+this should be a vector and this should be like whatever

0:49:03.200,0:49:09.680
-the shape and now my g goes from
-
-0:49:06.640,0:49:11.200
-the dimension of f and you know
+the shape and now my g goes from the dimension of f and you know

0:49:09.680,0:49:11.599
-cartesian product with the dimension of
-
-0:49:11.200,0:49:15.119
-set
+cartesian product with the dimension of z

0:49:11.599,0:49:18.240
-into r2 now the issue is that
-
-0:49:15.119,0:49:20.640
-we need to regularize
+into r2 now the issue is that we need to regularize

0:49:18.240,0:49:21.920
-this loss functional because otherwise
-
-0:49:20.640,0:49:23.040
-you are going to be drastically over
+this loss functional because otherwise you are going to be drastically over

0:49:21.920,0:49:25.280
-fitting right
-
-0:49:23.040,0:49:26.160
-and so this is what current research
+fitting right and so this is what current research

0:49:25.280,0:49:28.880
-with jan
-
-0:49:26.160,0:49:30.000
-uh is now what me and my students are
+with yann uh is now what me and my students are

0:49:28.880,0:49:32.559
-doing
-
-0:49:30.000,0:49:35.200
-with jan we are trying to figure out
+doing with yann we are trying to figure out

0:49:32.559,0:49:37.440
-ways to regularize the latent variable
-
-0:49:35.200,0:49:38.240
-such that we can you know make things
+ways to regularize the latent variable such that we can you know make things

0:49:37.440,0:49:42.480
-actually not
-
-0:49:38.240,0:49:44.640
-simply overfit and that was it
+actually not simply overfit and that was it

0:49:42.480,0:49:45.680
-that was all i had to tell you about
-
-0:49:44.640,0:49:49.040
-latent variable
+that was all i had to tell you about latent variable

0:49:45.680,0:49:51.680
-energy-based models inference training
-
-0:49:49.040,0:49:52.880
-zero temperature limit a bit warmer than
+energy-based models inference training zero temperature limit a bit warmer than

0:49:51.680,0:49:55.680
-free energy
-
-0:49:52.880,0:49:56.240
-uh and then we saw the unconditional
+free energy uh and then we saw the unconditional

0:49:55.680,0:49:57.920
-case
-
-0:49:56.240,0:50:00.640
-with unsupervised learning and then we
+case with unsupervised learning and then we

0:49:57.920,0:50:02.640
-have seen the conditional case with the
-
-0:50:00.640,0:50:04.240
-uh self-supervised learning right where
+have seen the conditional case with the uh self-supervised learning right where

0:50:02.640,0:50:07.280
-we have access to these
-
-0:50:04.240,0:50:09.760
-acts and the code to train
+we have access to these x's and the code to train

0:50:07.280,0:50:11.760
-these two models uh like the code that i
-
-0:50:09.760,0:50:14.319
-use for training the conditional case
+these two models uh like the code that i use for training the conditional 
case 0:50:11.760,0:50:16.480 -it's just the same code as i use for the - -0:50:14.319,0:50:18.160 -unsupervised in a supervised case but +it's just the same code as i use for the unsupervised in a supervised case but 0:50:16.480,0:50:20.079 -with one line change so - -0:50:18.160,0:50:22.559 -really really it doesn't take much +with one line change so really really it doesn't take much 0:50:20.079,0:50:25.680 -effort to put this together - -0:50:22.559,0:50:28.240 -what it took some effort was to draw +effort to put this together what it took some effort was to draw 0:50:25.680,0:50:30.640 -the slides but again that's just because - -0:50:28.240,0:50:33.680 -i like making things pretty +the slides but again that's just because i like making things pretty 0:50:30.640,0:50:37.119 -and that was it ah - -0:50:33.680,0:50:39.440 -thank you for listening questions please +and that was it ah thank you for listening questions please 0:50:37.119,0:50:41.520 -go on i mean it's done right class is - -0:50:39.440,0:50:44.720 -finished +go on i mean it's done right class is finished 0:50:41.520,0:50:44.720 -you can ask anything you want - -0:50:45.200,0:50:48.000 -are you still awake +you can ask anything you want are you still awake 0:50:51.280,0:50:55.119 -yes okay someone is awake can you - -0:50:53.040,0:50:57.359 -explain the input dimension of +yes okay someone is awake can you explain the input dimension of 0:50:55.119,0:50:59.200 -g again yes i can explain as much as you - -0:50:57.359,0:51:00.240 -want as you want so now it's office +g again yes i can explain as much as you want as you want so now it's office 0:50:59.200,0:51:02.880 -hours right - -0:51:00.240,0:51:04.400 -you can ask anything you want uh hold on +hours right you can ask anything you want uh hold on 0:51:02.880,0:51:05.839 -first question so can you explain the - -0:51:04.400,0:51:07.920 -input dimension of g +first question so can you explain the input dimension of g 0:51:05.839,0:51:09.680 -uh in 
this case let me go back to the - -0:51:07.920,0:51:13.200 -first case +uh in this case let me go back to the first case 0:51:09.680,0:51:15.839 -in this case g is 1 because it was - -0:51:13.200,0:51:18.400 -fed with z right and then the output was +in this case g is 1 because it was fed with z right and then the output was 0:51:15.839,0:51:20.559 -this g1 and g2 which were cosine and - -0:51:18.400,0:51:23.200 -sine +this g1 and g2 which were cosine and sine 0:51:20.559,0:51:24.559 -in the second case g is going to be the - -0:51:23.200,0:51:26.559 -input is going to be this +in the second case g is going to be the input is going to be this 0:51:24.559,0:51:28.160 -f which we don't know exactly the - -0:51:26.559,0:51:31.599 -dimension can be anything +f which we don't know exactly the dimension can be anything 0:51:28.160,0:51:33.839 -so the dimension of f and then - -0:51:31.599,0:51:34.960 -that given that i know that z is one +so the dimension of f and then that given that i know that z is one 0:51:33.839,0:51:37.599 -dimensional - -0:51:34.960,0:51:38.079 -finally which is the super you know norm +dimensional finally which is the super you know norm 0:51:37.599,0:51:41.520 -like - -0:51:38.079,0:51:41.920 -the actual case that the more realistic +like the actual case that the more realistic 0:51:41.520,0:51:44.640 -case - -0:51:41.920,0:51:45.520 -is this one where we don't necessarily +case is this one where we don't necessarily 0:51:44.640,0:51:48.079 -know what is - -0:51:45.520,0:51:48.720 -the supposed the dimension for the +know what is the supposed the dimension for the 0:51:48.079,0:51:51.359 -latent - -0:51:48.720,0:51:53.359 -and therefore now we're going to use a +latent and therefore now we're going to use a 0:51:51.359,0:51:54.800 -whatever dimensional variable latent - -0:51:53.359,0:51:58.000 -variable but +whatever dimensional variable latent variable but 0:51:54.800,0:51:59.280 -it's going to be necessary to regularize - 
-0:51:58.000,0:52:02.319 -the +it's going to be necessary to regularize the 0:51:59.280,0:52:04.400 -loss functional otherwise as i was - -0:52:02.319,0:52:07.599 -pointing out you can easily overfit +loss functional otherwise as i was pointing out you can easily overfit 0:52:04.400,0:52:09.920 -by using that zero temperature limit - -0:52:07.599,0:52:11.359 -uh nevertheless you can use you can warm +by using that zero temperature limit uh nevertheless you can use you can warm 0:52:09.920,0:52:13.599 -up the the temperature - -0:52:11.359,0:52:15.280 -and use that as a regularizer of course +up the the temperature and use that as a regularizer of course 0:52:13.599,0:52:18.880 -right - -0:52:15.280,0:52:21.520 -did you get it uh yeah +right did you get it uh yeah 0:52:18.880,0:52:23.200 -so next question how does this look - -0:52:21.520,0:52:24.960 -without a latent variable +so next question how does this look without a latent variable 0:52:23.200,0:52:26.880 -okay without a latent variable it's - -0:52:24.960,0:52:30.079 -exactly as turning +okay without a latent variable it's exactly as turning 0:52:26.880,0:52:33.920 -beta to zero okay - -0:52:30.079,0:52:37.280 -so beta to zero you just average +beta to zero okay so beta to zero you just average 0:52:33.920,0:52:39.680 -over all possible values how - -0:52:37.280,0:52:41.599 -does what does happen what what are you +over all possible values how does what does happen what what are you 0:52:39.680,0:52:44.319 -going to be ending up having - -0:52:41.599,0:52:44.800 -if you start here on the left side and +going to be ending up having if you start here on the left side and 0:52:44.319,0:52:47.119 -then - -0:52:44.800,0:52:47.920 -instead of having all these arrows that +then instead of having all these arrows that 0:52:47.119,0:52:50.400 -are shaped - -0:52:47.920,0:52:51.119 -now like that all these arrows will have +are shaped now like that all these arrows will have 0:52:50.400,0:52:54.000 -the same - 
-0:52:51.119,0:52:55.599
-length well actually these points over
+the same length well actually these points over

0:52:54.000,0:52:56.880
-here will be even longer now because
-
-0:52:55.599,0:53:00.960
-they are further away
+here will be even longer now because they are further away

0:52:56.880,0:53:03.280
-so these ellipse will be pulled
-
-0:53:00.960,0:53:04.559
-in every direction and the way to
+so this ellipse will be pulled in every direction and the way to

0:53:03.280,0:53:07.200
-minimize this energy
-
-0:53:04.559,0:53:08.480
-is actually to make it collapse in a
+minimize this energy is actually to make it collapse in a

0:53:07.200,0:53:12.079
-single point
-
-0:53:08.480,0:53:13.760
-center in zero and so that's the actual
+single point centered in zero and so that's the actual

0:53:12.079,0:53:16.160
-it's a very good question right so what
-
-0:53:13.760,0:53:20.000
-is the classical
+it's a very good question right so what is the classical

0:53:16.160,0:53:22.000
-failure mode in you know neural network
-
-0:53:20.000,0:53:24.000
-whenever you have multiple targets
+failure mode in you know neural networks whenever you have multiple targets

0:53:22.000,0:53:26.880
-associated to the same input
-
-0:53:24.000,0:53:29.440
-you end up predicting the average of all
+associated to the same input you end up predicting the average of all

0:53:26.880,0:53:31.520
-the possible targets
-
-0:53:29.440,0:53:32.960
-in this case the average of all possible
+the possible targets in this case the average of all possible

0:53:31.520,0:53:34.640
-targets that are all those
-
-0:53:32.960,0:53:36.960
-points in the ellipse is going to be
+targets that are all those points in the ellipse is going to be

0:53:34.640,0:53:39.680
-just the point in the origin
-
-0:53:36.960,0:53:41.440
-which is like the collapse of your model
+just the point in the origin which is like the collapse of your model

0:53:39.680,0:53:42.880
-right so that's a very good question and
-
-0:53:41.440,0:53:46.480
-the point is that
+right so that's a very good question and the point is that

0:53:42.880,0:53:49.760
-if you try to learn multi modal output
-
-0:53:46.480,0:53:54.079
-multimodal data set a data with
+if you try to learn multimodal output multimodal data set a data with

0:53:49.760,0:53:56.640
-a msc like without latent with zero
-
-0:53:54.079,0:53:57.440
-zero beta infinite temperature you're
+an mse like without latent with zero zero beta infinite temperature you're

0:53:56.640,0:54:00.559
-just you know
-
-0:53:57.440,0:54:03.599
-collapsing uh to the mean
+just you know collapsing uh to the mean

0:54:00.559,0:54:04.000
-the average right m e a and not m i n
-
-0:54:03.599,0:54:07.280
-mean
+the average right m e a n and not m i n mean

0:54:04.000,0:54:09.599
-average all right another question
-
-0:54:07.280,0:54:10.880
-uh to be clear at the zero temperature
+average all right another question uh to be clear at the zero temperature

0:54:09.599,0:54:14.000
-limit the loss
-
-0:54:10.880,0:54:18.079
-is only considering the energy
+limit the loss is only considering the energy

0:54:14.000,0:54:21.440
-of the nearest point yeah
-
-0:54:18.079,0:54:24.559
-and as we warm it up the loss is using
+of the nearest point yeah and as we warm it up the loss is using

0:54:21.440,0:54:26.640
-a weighted sum of all points and yes
-
-0:54:24.559,0:54:28.480
-and the weighting weights that you're
+a weighted sum of all points and yes and the weighting weights that you're

0:54:26.640,0:54:30.480
-using for the weight of the sum
-
-0:54:28.480,0:54:31.760
-is the are the weights that are coming
+using for the weight of the sum is the are the weights that are coming

0:54:30.480,0:54:34.559
-from the uh
-
-0:54:31.760,0:54:36.000
-soft argument right if you take the arg
+from the uh soft argmax right if you take the

0:54:34.559,0:54:38.400
-softening
-
-0:54:36.000,0:54:39.760
-you have soft mean of the energy right
+softmin you have the softmin 
of the energy right

0:54:38.400,0:54:41.920
-so that's what you get
-
-0:54:39.760,0:54:44.160
-you have the soft mean of the energy
+so that's what you get you have the softmin of the energy

0:54:41.920,0:54:45.280
-right so the f tilde it's soft mean of
-
-0:54:44.160,0:54:48.240
-the energy
+right so the f tilde it's the softmin of the energy

0:54:45.280,0:54:49.520
-you take the derivative of the softmin
-
-0:54:48.240,0:54:51.440
-you get the
+you take the derivative of the softmin you get the

0:54:49.520,0:54:53.680
-what you get you get the exponential
-
-0:54:51.440,0:54:56.160
-divided by the sum of exponential
+what you get you get the exponential divided by the sum of exponentials

0:54:53.680,0:54:56.799
-so that's the soft argument right
-
-0:54:56.160,0:55:00.160
-multiply
+so that's the soft argmax right multiply

0:54:56.799,0:55:03.359
-by e prime what is e prime e
-
-0:55:00.160,0:55:04.960
-was the square distance so if you take
+by e prime what is e prime e was the square distance so if you take

0:55:03.359,0:55:05.599
-the derivative of the square distance
-
-0:55:04.960,0:55:08.319
-you just get
+the derivative of the square distance you just get

0:55:05.599,0:55:10.079
-the vector which is now multiplied by
-
-0:55:08.319,0:55:12.559
-this soft argument
+the vector which is now multiplied by this soft argmax

0:55:10.079,0:55:13.359
-so exactly what you said uh which is
-
-0:55:12.559,0:55:15.839
-very good
+so exactly what you said uh which is very good

0:55:13.359,0:55:18.000
-summary i'm gonna just read it again and
-
-0:55:15.839,0:55:21.119
-i show the other chart
+summary i'm gonna just read it again and i show the other chart

0:55:18.000,0:55:22.720
-so i just read your comment
-
-0:55:21.119,0:55:25.119
-to be clear at the zero temperature
+so i just read your comment to be clear at the zero temperature

0:55:22.720,0:55:26.799
-limit the loss is only considering the
-
-0:55:25.119,0:55:28.480
-energy of the nearest point
+limit the 
loss is only considering the energy of the nearest point

0:55:26.799,0:55:29.839
-the distance the square distance to the
-
-0:55:28.480,0:55:31.599
-closest point yeah
+the distance the square distance to the closest point yeah

0:55:29.839,0:55:34.160
-and as you warm it up the loss is going
-
-0:55:31.599,0:55:37.200
-to be the weighted sum
+and as you warm it up the loss is going to be the weighted sum

0:55:34.160,0:55:42.480
-of not the points right what is sum uh
-
-0:55:37.200,0:55:45.599
-of all those um contributions right
+of not the points right what is sum uh of all those um contributions right

0:55:42.480,0:55:46.319
-the x uh this exponential of the minus
-
-0:55:45.599,0:55:49.119
-beta e
+the x uh this exponential of the minus beta e

0:55:46.319,0:55:50.319
-right that's what that was written here
-
-0:55:49.119,0:55:51.760
-on the top right
+right that's what that was written here on the top right

0:55:50.319,0:55:53.760
-so as you warm it up you're going to get
-
-0:55:51.760,0:55:56.160
-this exponential which is the soft mean
+so as you warm it up you're going to get this exponential which is the softmin

0:55:53.760,0:55:56.799
-so soft mean and then if you compute the
-
-0:55:56.160,0:55:59.280
-uh
+so softmin and then if you compute the uh

0:55:56.799,0:56:00.960
-the derivative you're going to get the
-
-0:55:59.280,0:56:02.640
-soft argument multiplied by the
+the derivative you're going to get the soft argmax multiplied by the

0:56:00.960,0:56:04.400
-derivative of the energy
-
-0:56:02.640,0:56:06.319
-which are the arrows multiplied by the
+derivative of the energy which are the arrows multiplied by the

0:56:04.400,0:56:09.359
-soft argument so cool
-
-0:56:06.319,0:56:10.799
-what happens if we allow z to move
+soft argmax so cool what happens if we allow z to move

0:56:09.359,0:56:13.839
-freely into the space
-
-0:56:10.799,0:56:15.839
-we're going to basically get
+freely into the space we're going to basically get 
a collapsed 0:56:13.839,0:56:17.119 -network this model can simply output - -0:56:15.839,0:56:20.640 -zero everywhere +network this model can simply output zero everywhere 0:56:17.119,0:56:21.520 -and that's where you may need to use the - -0:56:20.640,0:56:24.559 -contrastive +and that's where you may need to use the contrastive 0:56:21.520,0:56:26.319 -uh cases right so in that case uh you - -0:56:24.559,0:56:28.319 -know a very easy way to get +uh cases right so in that case uh you know a very easy way to get 0:56:26.319,0:56:31.200 -zero energy is gonna be just everything - -0:56:28.319,0:56:33.119 -zero right uh but in this in the +zero energy is gonna be just everything zero right uh but in this in the 0:56:31.200,0:56:35.599 -in this case you can use the contrastive - -0:56:33.119,0:56:37.760 -case you can say oh no in this case it +in this case you can use the contrastive case you can say oh no in this case it 0:56:35.599,0:56:40.720 -should be larger than some margin - -0:56:37.760,0:56:42.319 -and so that's how you can deal with this +should be larger than some margin and so that's how you can deal with this 0:56:40.720,0:56:45.760 -larger than uh - -0:56:42.319,0:56:48.880 -like z into d okay so +larger than uh like z into d okay so 0:56:45.760,0:56:49.520 -taking beta okay so taking beta uh to - -0:56:48.880,0:56:51.760 -zero +taking beta okay so taking beta uh to zero 0:56:49.520,0:56:53.839 -would defeat the purpose of having a - -0:56:51.760,0:56:54.319 -latent variable at all that's exactly +would defeat the purpose of having a latent variable at all that's exactly 0:56:53.839,0:56:56.720 -yeah - -0:56:54.319,0:56:58.480 -and so this is what i kind of briefly +yeah and so this is what i kind of briefly 0:56:56.720,0:57:02.400 -show you i didn't talk about - -0:56:58.480,0:57:04.160 -but this is like a a quick uh derivation +show you i didn't talk about but this is like a a quick uh derivation 0:57:02.400,0:57:06.559 -by showing you that if you go 
beta
-
-0:57:04.160,0:57:07.440
-equals zero like the limit for beta that
+by showing you that if you go beta equals zero like the limit for beta that

0:57:06.559,0:57:10.480
-tends to zero
-
-0:57:07.440,0:57:11.440
-you retrieve the average across all the
+tends to zero you retrieve the average across all the

0:57:10.480,0:57:14.640
-latent
-
-0:57:11.440,0:57:16.079
-and that's basically the you end up with
+latent and that's basically the you end up with

0:57:14.640,0:57:18.240
-having msc right
-
-0:57:16.079,0:57:19.920
-you end up throwing away all those kind
+having mse right you end up throwing away all those kind

0:57:18.240,0:57:23.760
-of uh the goodies
-
-0:57:19.920,0:57:26.319
-right and that was pretty much it
+of uh the goodies right and that was pretty much it

0:57:23.760,0:57:28.480
-how can you get more out of this lesson
-
-0:57:26.319,0:57:30.480
-firstly comprehension
+how can you get more out of this lesson firstly comprehension

0:57:28.480,0:57:33.119
-if anything was not clear ask me
-
-0:57:30.480,0:57:34.480
-anything in the comment section below
+if anything was not clear ask me anything in the comment section below

0:57:33.119,0:57:36.640
-if you would like to follow up with the
-
-0:57:34.480,0:57:39.760
-latest news follow me on twitter
+if you would like to follow up with the latest news follow me on twitter

0:57:36.640,0:57:42.160
-under the endl alph cnz
-
-0:57:39.760,0:57:44.000
-if you would like to be notified when i
+under the handle alfcnz if you would like to be notified when i

0:57:42.160,0:57:45.680
-upload the latest video
-
-0:57:44.000,0:57:47.839
-don't forget to subscribe to the channel
+upload the latest video don't forget to subscribe to the channel

0:57:45.680,0:57:50.079
-and turn on the notification bell
-
-0:57:47.839,0:57:52.000
-and if you like this video don't forget
+and turn on the notification bell and if you like this video don't forget

0:57:50.079,0:57:54.400
-to put a thumb up
-
-0:57:52.000,0:57:55.920
-this video has a transcript in english
+to put a thumb up this video has a transcript in english

0:57:54.400,0:57:57.359
-and if you would like to contribute to
-
-0:57:55.920,0:58:00.160
-the translation in your language
+and if you would like to contribute to the translation in your language

0:57:57.359,0:58:01.839
-please let me know so here as you can
-
-0:58:00.160,0:58:05.359
-see we have the
+please let me know so here as you can see we have the

0:58:01.839,0:58:07.920
-write up where we can see all these
-
-0:58:05.359,0:58:10.319
-video that has been transcribed here in
+write-up where we can see all these videos that have been transcribed here in

0:58:07.920,0:58:13.040
-plain english
-
-0:58:10.319,0:58:14.240
-and then again as i said before if we go
+plain english and then again as i said before if we go

0:58:13.040,0:58:16.240
-back to the homepage
-
-0:58:14.240,0:58:18.720
-we can see here in the english flag and
+back to the homepage we can see here the english flag and

0:58:16.240,0:58:20.799
-we can select different languages
-
-0:58:18.720,0:58:22.079
-now we have arabic spanish version
+we can select different languages now we have arabic spanish version

0:58:20.799,0:58:24.960
-french italian japanese
-
-0:58:22.079,0:58:25.920
-korean russian turkish and chinese and
+french italian japanese korean russian turkish and chinese and

0:58:24.960,0:58:28.640
-your language is just
-
-0:58:25.920,0:58:29.359
-waiting for you to be translated in
+your language is just waiting for you to be translated in

0:58:28.640,0:58:31.599
-finally
-
-0:58:29.359,0:58:32.960
-do play with notebook and by torch in
+finally do play with the notebook and pytorch in

0:58:31.599,0:58:35.520
-order to get yourself
-
-0:58:32.960,0:58:36.640
-more acquainted with all these new
+order to get yourself more acquainted with all these new

0:58:35.520,0:58:38.880
-topics
-
-0:58:36.640,0:58:41.040
-and then if you find any typo or
+topics and then if 
you find any typo or 0:58:38.880,0:58:41.520 -mistakes or anything just please let me - -0:58:41.040,0:58:43.760 -know +mistakes or anything just please let me know 0:58:41.520,0:58:46.079 -directly on github or if you feel brave - -0:58:43.760,0:58:48.960 -enough you can even send a pull request +directly on github or if you feel brave enough you can even send a pull request 0:58:46.079,0:58:51.280 -it will be gladly appreciated thank you - -0:58:48.960,0:58:54.240 -for listening and don't forget to like +it will be gladly appreciated thank you for listening and don't forget to like 0:58:51.280,0:58:54.240 -share and subscribe - -0:58:55.000,0:58:58.000 -bye - +share and subscribe bye
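The free-energy discussion in the Q&A above can be sketched numerically: the free energy is the softmin of the energies over the latent, at zero temperature (large beta) it reduces to the squared distance to the closest point of the ellipse, at infinite temperature (beta equal to zero) it reduces to the plain average of the energies (the MSE-style collapse to the mean), and the weights in the gradient are the softargmax of the negated energies. This is an illustrative sketch written for this transcript, not the notebook used in the lecture; the ellipse semi-axes (1.5 and 0.5), the grid size, and all names here are assumptions.

```python
import math

# Hedged sketch (not the lecture notebook): latent-variable EBM on the toy
# ellipse, with assumed semi-axes 1.5 and 0.5 and a uniform grid over z.

def energy(y, z):
    # E(y, z): squared distance from y to the decoder output g(z) on the ellipse
    gz = (1.5 * math.cos(z), 0.5 * math.sin(z))
    return (y[0] - gz[0]) ** 2 + (y[1] - gz[1]) ** 2

def free_energy(y, beta, n=360):
    # F_beta(y) = -1/beta * log( mean_z exp(-beta * E(y, z)) )  (a softmin)
    zs = [2 * math.pi * k / n for k in range(n)]
    es = [energy(y, z) for z in zs]
    if beta == 0:
        return sum(es) / n            # beta -> 0 limit: plain average of energies
    m = min(es)                        # shift by the minimum for numerical stability
    s = sum(math.exp(-beta * (e - m)) for e in es) / n
    return m - math.log(s) / beta      # beta -> inf: approaches min(es)

def softargmax_weights(y, beta, n=360):
    # exp(-beta * E) / sum exp(-beta * E): the weights in the gradient of F_beta
    zs = [2 * math.pi * k / n for k in range(n)]
    es = [energy(y, z) for z in zs]
    m = min(es)
    ws = [math.exp(-beta * (e - m)) for e in es]
    t = sum(ws)
    return [w / t for w in ws]

y = (2.0, 0.0)
print(free_energy(y, 0))        # average energy, approximately 5.25
print(free_energy(y, 1000.0))   # approximately the minimum energy (2 - 1.5)^2 = 0.25
print(max(softargmax_weights(y, 1000.0)))  # mass concentrates on the closest z
```

At low beta every z contributes almost equally, so the model is pulled toward the mean of the targets (the collapse to the origin described above), while at high beta the weight of the closest point dominates, matching the picture of the arrows being rescaled by the softargmax.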