CS 285 Lecture 1, Part 4.srt
1
00:00:01,360 --> 00:00:04,000
so for the last portion of this first lecture module
2
00:00:04,000 --> 00:00:09,840
i'd like to come back to a question that i asked earlier how do we build intelligent machines
3
00:00:09,840 --> 00:00:19,039
if we really want to think about this problem as engineers if we forgot everything that we know about learning reinforcement learning and so on
4
00:00:19,039 --> 00:00:22,640
and had to build an intelligent machine where would you start
5
00:00:22,640 --> 00:00:29,359
maybe a very logical place to start is to think about all of the things that the human brain has to do
6
00:00:29,359 --> 00:00:31,359
think of them as independent modules
7
00:00:31,359 --> 00:00:42,800
and try to program in modules that do that breaking up the brain into individual parts and thinking about their function is a very old endeavor one that goes back before even the modern scientific era
8
00:00:42,800 --> 00:00:46,879
of course today's understanding is a little more sophisticated than it was in the 19th century
9
00:00:46,879 --> 00:00:52,399
but there's still a lot of work trying to anatomically chop up the brain into parts with different functions
10
00:00:52,399 --> 00:00:59,039
so then it's very tempting as an engineer to think about building individual computer components to reproduce those functions
11
00:00:59,039 --> 00:01:00,719
but this quickly becomes very difficult
12
00:01:00,719 --> 00:01:05,680
because there are a lot of these parts and implementing each of them is very hard
13
00:01:05,680 --> 00:01:11,439
and implementing all of them might require an inordinate amount of work
14
00:01:11,439 --> 00:01:17,119
what if we instead consider whether learning could be seen as the basis of intelligence
15
00:01:17,119 --> 00:01:24,640
i'm going to make an argument for why this might be true this is by no means a universally accepted opinion
16
00:01:24,640 --> 00:01:29,040
but perhaps you know we can just entertain that notion and see where it leads us
17
00:01:29,040 --> 00:01:32,799
one way we could argue for this notion is to say well there are some things that we can all do like walking
18
00:01:32,799 --> 00:01:37,040
but there are some other things that clearly we have to learn
19
00:01:37,040 --> 00:01:42,240
because there's no way that humans could through evolution be prepared for things like driving a car
20
00:01:42,240 --> 00:01:45,360
cars weren't around when we evolved
21
00:01:45,360 --> 00:01:48,399
so the argument here would be that we can learn a huge variety of things
22
00:01:48,399 --> 00:01:53,439
so that second category the things we can only learn is a very large category there's a huge variety of things we can learn
23
00:01:53,439 --> 00:01:55,600
including extremely difficult things
24
00:01:55,600 --> 00:02:02,880
and while there might be some things that we could argue are innate the range of things we can learn is so broad that perhaps it basically captures everything that we care about
25
00:02:02,880 --> 00:02:07,920
and those things that are innate we could have learned them also if they weren't innate to begin with
26
00:02:07,920 --> 00:02:14,160
so therefore our learning mechanisms are likely powerful enough to do basically everything that we associate with intelligence
27
00:02:14,160 --> 00:02:18,080
maybe you agree or disagree with this notion but if you humor this notion for a second
28
00:02:18,080 --> 00:02:26,480
we could think about how we'd use it to refine our plan for building intelligence
29
00:02:26,480 --> 00:02:36,000
now we could simply take this recipe and instead of designing the functionality of each module design a learning algorithm for each module so we say well okay
30
00:02:36,000 --> 00:02:39,280
we're not going to implement the visual cortex or the motor cortex by hand
31
00:02:39,280 --> 00:02:42,319
what we'll instead implement is a learning algorithm for the visual cortex
32
00:02:42,384 --> 00:02:45,350
and a separate algorithm for the motor cortex
33
00:02:45,350 --> 00:02:51,680
this kind of thing was a dominant way of thinking about machine learning in the 90s and early 2000s
34
00:02:51,680 --> 00:02:56,959
but what if we hypothesize that perhaps there's a single flexible algorithm that can do all of these things
35
00:02:56,959 --> 00:03:00,159
perhaps not only is learning the basis of intelligence
36
00:03:00,159 --> 00:03:04,239
but in fact learning with a single powerful algorithm is the basis of intelligence
37
00:03:04,239 --> 00:03:07,360
it's a very provocative notion but it's also a very appealing one
38
00:03:07,360 --> 00:03:09,440
because it suggests that we could save ourselves a lot of work
39
00:03:09,440 --> 00:03:18,000
instead of designing a separate algorithm for each module we simply design one algorithm that is broad enough and flexible enough to acquire all these capabilities
40
00:03:18,000 --> 00:03:25,040
and there is a bit of evidence a bit of circumstantial evidence to suggest that something like this might in fact be close to what's going on in the real brain
41
00:03:25,040 --> 00:03:33,680
these pieces of evidence all have the general flavor of illustrating some degree of flexibility that is unusual or unexpected
42
00:03:33,680 --> 00:03:41,120
for instance you can acquire a degree of visual acuity by using your tongue you could take a camera with electrodes attached to it
43
00:03:41,120 --> 00:03:42,799
and place those electrodes on your tongue
44
00:03:42,799 --> 00:03:44,640
and then close your eyes
45
00:03:44,640 --> 00:03:47,200
or if you're blind try to use your tongue to see
46
00:03:47,200 --> 00:03:49,599
and then perform some tests of visual acuity
47
00:03:49,599 --> 00:03:54,720
and you will in fact with a fair bit of practice acquire a degree of visual acuity with your tongue
48
00:03:54,720 --> 00:04:10,319
a more extreme experiment sometimes referred to as the ferret rewiring experiment was performed on ferrets where the ferret's optic nerve was surgically disconnected from the visual cortex and reconnected to the auditory cortex when the ferret was very very young
49
00:04:10,319 --> 00:04:11,519
then the ferret grew up
50
00:04:11,519 --> 00:04:13,280
and over the course of its development
51
00:04:13,280 --> 00:04:16,239
it actually recovered a degree of visual acuity
52
00:04:16,239 --> 00:04:19,759
which means that its auditory cortex was essentially learning how to see
53
00:04:19,759 --> 00:04:23,759
so if different sensory cortices can be repurposed to perform each other's jobs
54
00:04:23,759 --> 00:04:28,320
perhaps in some sense they're all implementing the same flexible algorithm
55
00:04:28,320 --> 00:04:29,840
we could take this idea even further
56
00:04:29,840 --> 00:04:32,320
and hypothesize that not only sensory cortices
57
00:04:32,320 --> 00:04:36,880
but a lot of the functionality of the brain could in fact be performed by a single flexible algorithm
58
00:04:36,880 --> 00:04:39,919
we don't know if this is true but it's a very appealing notion
59
00:04:39,919 --> 00:04:43,680
if in fact this is true we could ask what kind of algorithm could it be
60
00:04:43,680 --> 00:04:46,800
what must the single algorithm be able to do
61
00:04:46,800 --> 00:04:53,520
it has to be able to interpret rich sensory inputs it has to deal with complex rich open world problems
62
00:04:53,520 --> 00:05:03,360
and it has to choose complex actions which means it has to reason about decision making and control where have we seen that before well these are the two parts of deep reinforcement learning
63
00:05:03,360 --> 00:05:16,000
the deep part deals with handling complex open world inputs and the reinforcement learning part provides the formalism for decision making and control
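as a rough illustration of how those two parts fit together, here is a minimal numpy sketch (my own, not something from the lecture): a small neural network (the deep part) maps a raw observation to action probabilities, and a REINFORCE-style update (the rl part) adjusts it using only a scalar reward. the network sizes, the toy observation, and the reward signal are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_actions, hidden = 8, 2, 32

# the "deep" part: a two-layer policy network over raw observations
W1 = rng.normal(0.0, 0.1, (hidden, obs_dim))
W2 = rng.normal(0.0, 0.1, (n_actions, hidden))

def policy(obs):
    h = np.tanh(W1 @ obs)
    logits = W2 @ h
    p = np.exp(logits - logits.max())   # softmax over actions
    return p / p.sum(), h

# the "rl" part: sample an action, then nudge the log-probability of the
# chosen action up or down in proportion to the reward it received
def reinforce_step(obs, reward, lr=1e-2):
    global W1, W2
    probs, h = policy(obs)
    a = rng.choice(n_actions, p=probs)
    dlogits = -probs                    # grad of log pi(a|obs) w.r.t. logits
    dlogits[a] += 1.0                   # equals one_hot(a) - probs
    dpre = (W2.T @ dlogits) * (1 - h**2)  # backprop through the tanh layer
    W2 += lr * reward * np.outer(dlogits, h)
    W1 += lr * reward * np.outer(dpre, obs)
    return a

# usage with a made-up observation and reward
action = reinforce_step(rng.normal(size=obs_dim), reward=1.0)
```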
64
00:05:16,000 --> 00:05:27,520
and in fact there is again some circumstantial evidence that both deep learning and reinforcement learning at least individually provide some sensible model of how the brain processes information
65
00:05:27,520 --> 00:05:40,960
so this is an older paper that's about a decade old at this point called unsupervised learning models of primary cortical receptive fields and receptive field plasticity that tries to analyze the kind of features that are learned by unsupervised neural network models
66
00:05:40,960 --> 00:06:03,600
and compares them to the kind of features that are observed in primate sensory cortices for example they take simple stimuli like these grating type stimuli which are known to stimulate individual receptive fields in the visual cortex
67
00:06:03,600 --> 00:06:08,720
and they analyze the kind of features that are learned from these stimuli by deep neural networks
68
00:06:08,720 --> 00:06:17,680
and then they compare the statistics of those features to features that are known to exist in the primate visual cortex from experiments on monkeys
69
00:06:17,680 --> 00:06:28,880
they do a similar experiment on auditory features so they expose a deep neural network to a range of auditory stimuli look at the statistics of the features that emerge and again compare those to the statistics of features in the brain
70
00:06:28,880 --> 00:06:38,800
and they even have kind of a funny experiment on the sense of touch where they take human subjects they have them manipulate an object with a glove that's been dusted with some white dust
71
00:06:38,800 --> 00:06:49,680
and they use where the dust was deposited on the glove to train a deep neural network to essentially represent touch sensing features
72
00:06:49,680 --> 00:06:57,919
and they compare them to features that are known to exist from experiments on monkeys where a monkey's hand is placed on a drum with indentations
73
00:06:57,919 --> 00:07:05,280
which rotates and then the monkey sense of touch is recorded from neurons in its brain and again they compare the statistics of these features
74
00:07:05,280 --> 00:07:08,960
and find that the neural network learned features with similar statistics
75
00:07:08,960 --> 00:07:12,400
now there are a few conclusions we might draw from these experiments
76
00:07:12,400 --> 00:07:18,880
for example we might conclude that the deep neural network works the same way the brain works
77
00:07:18,880 --> 00:07:23,360
but i think there's actually a simpler explanation
78
00:07:23,360 --> 00:07:25,440
it's probably not about the deep neural network per se
79
00:07:25,440 --> 00:07:34,000
it's probably about the observation that any large heavily overparameterized model will discover features with these statistics
80
00:07:34,000 --> 00:07:35,599
because they're just the right features for this data
81
00:07:35,599 --> 00:07:39,199
so the features in some sense are perhaps a property of the data itself
82
00:07:39,199 --> 00:07:46,319
and a powerful enough model regardless of its internal design would acquire those features because they're the right ones
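to see that "features are a property of the data" point in code, here is a minimal sketch (my illustration, not the paper's method) using plain PCA, which is about as model-agnostic a feature learner as it gets: run on real natural image patches, its leading components come out as oriented edge-like filters; the random data below is just a stand-in so the snippet runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for a dataset of 16x16 natural image patches
patches = rng.normal(size=(10_000, 16 * 16))
patches -= patches.mean(axis=0)         # center the data

# PCA via SVD: rows of Vt are the learned "features"
_, _, Vt = np.linalg.svd(patches, full_matrices=False)
features = Vt[:64].reshape(64, 16, 16)  # top 64 components viewed as filters
print(features.shape)                   # (64, 16, 16)
```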
83
00:07:46,319 --> 00:07:52,560
there's also quite a bit of evidence in favor of reinforcement learning as a model of how the brain learns
84
00:07:52,560 --> 00:07:59,280
and in fact reinforcement learning was studied in psychology and neuroscience long before it was a field of study in computer science
85
00:07:59,280 --> 00:08:06,000
so percepts that anticipate reward become associated with similar firing patterns as the reward itself this is a known observation
86
00:08:06,000 --> 00:08:09,360
an observation often attributed to reward prediction error signaling by dopamine neurons
87
00:08:09,360 --> 00:08:12,800
the basal ganglia appears to be related to the reward system in the brain
88
00:08:12,800 --> 00:08:22,000
and model-free rl-like adaptation is often a good fit for experimental data of animal adaptation but not always
89
00:08:22,000 --> 00:08:33,919
all right so we might conclude from this if we're so inclined that if in fact there is a single flexible algorithm that can acquire the broad range of behaviors that we associate with human intelligence
90
00:08:33,919 --> 00:08:37,839
that algorithm perhaps ought to look a little bit like a reinforcement learning algorithm
91
00:08:37,839 --> 00:08:43,519
and perhaps equipped with large high-capacity representations like deep models
92
00:08:43,519 --> 00:08:45,839
so then we could say well what can deep learning and rl do well now
93
00:08:45,839 --> 00:08:47,440
basically how close are we to that
94
00:08:47,440 --> 00:08:49,600
and what seems to be missing
95
00:08:49,600 --> 00:08:52,080
well current deep reinforcement learning algorithms are pretty good at some things
96
00:08:52,080 --> 00:08:55,440
they're good at acquiring a high degree of proficiency
97
00:08:55,440 --> 00:08:59,920
in domains governed by simple known rules like board games and video games
98
00:08:59,920 --> 00:09:03,200
they're good at learning simple skills with raw sensory input
99
00:09:03,200 --> 00:09:05,279
given enough experience
100
00:09:05,279 --> 00:09:10,560
and they're pretty good at learning to imitate given enough human-provided expert behavior
101
00:09:10,560 --> 00:09:16,480
but they still fall short in a few very important ways compared to human intelligence or even animal intelligence
102
00:09:16,480 --> 00:09:18,720
humans can learn incredibly quickly
103
00:09:18,720 --> 00:09:21,279
deep reinforcement learning algorithms are not usually known for their efficiency
104
00:09:21,279 --> 00:09:24,240
they usually require a very large amount of experience
105
00:09:24,240 --> 00:09:28,240
and it may be because humans are very good at reusing past knowledge
106
00:09:28,240 --> 00:09:30,240
so humans can adapt quickly
107
00:09:30,240 --> 00:09:34,399
for instance this is a commonly done motor control experiment where a person moves a slider
108
00:09:34,399 --> 00:09:37,680
a perturbation is introduced requiring the person to react
109
00:09:37,680 --> 00:09:46,800
and then the experimenter measures how many trials the person needs to learn how to overcome the perturbation introduced by the slider and it typically takes just a couple of trials
110
00:09:46,800 --> 00:09:55,600
but chances are humans aren't learning entirely from scratch they have past experience with physical manipulation moving their bodies and with responding to perturbation forces
111
00:09:55,600 --> 00:09:58,640
transfer learning in deep rl is still an open problem
112
00:09:58,640 --> 00:10:02,560
but its goal is to essentially address this capability
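a minimal sketch of what that capability would look like, with hypothetical names (`pretrained_params`, `adapt`, and `new_task` are stand-ins, not any real API): initialize from parameters trained on past experience and adapt for a handful of trials instead of learning from scratch.

```python
import copy

def transfer(pretrained_params, new_task, adapt, n_steps=10):
    """Start from prior knowledge and adapt briefly, rather than from scratch."""
    params = copy.deepcopy(pretrained_params)
    for _ in range(n_steps):              # a couple of trials, not millions
        params = adapt(params, new_task)  # one adaptation step on the new task
    return params
```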
113
00:10:02,560 --> 00:10:06,160
it's also often not clear in reality what the reward function should be
114
00:10:06,160 --> 00:10:14,560
so in classical reinforcement learning we typically assume the reward function is known and it's correct but in the real world it's far less clear
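here is a minimal sketch of the classical abstraction being questioned: in the standard mdp interface the environment simply hands back a scalar reward at every step, and that is exactly the part that is unclear in the real world. the toy dynamics and names below are my own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Transition:
    next_state: int
    reward: float   # classical rl assumes this is given and correct
    done: bool

def step(state: int, action: int) -> Transition:
    # toy dynamics with a hard-coded reward, which real problems rarely allow
    next_state = (state + action) % 10
    return Transition(next_state,
                      reward=1.0 if next_state == 0 else 0.0,
                      done=next_state == 0)
```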
115
00:10:14,560 --> 00:10:22,079
it's also not clear what the role of prediction should be so should we learn by modeling the entire world and then planning through that model
116
00:10:22,079 --> 00:10:24,079
or should we learn directly from trial and error or should
117
00:10:24,079 --> 00:10:27,760
we do a little bit of both
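to make the two options concrete, here is an illustrative sketch (not a specific algorithm from the lecture) of both: a model-based planner that imagines rollouts through a learned dynamics model, and a model-free update that reinforces actions directly from sampled returns. `dynamics_model`, `reward_fn`, and `grad_log_pi` are assumed to be supplied by the caller.

```python
import numpy as np

def plan_with_model(state, dynamics_model, reward_fn,
                    n_candidates=100, horizon=5):
    """Model-based: score random action sequences under the model, act greedily."""
    rng = np.random.default_rng(0)
    best_action, best_return = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, total = state, 0.0
        for a in actions:
            s = dynamics_model(s, a)    # predicted next state
            total += reward_fn(s, a)    # predicted reward
        if total > best_return:
            best_action, best_return = actions[0], total
    return best_action

def policy_gradient_update(theta, grad_log_pi, states, actions, returns, lr=1e-2):
    """Model-free: no model at all, just reinforce the actions that paid off."""
    for s, a, g in zip(states, actions, returns):
        theta = theta + lr * g * grad_log_pi(theta, s, a)
    return theta
```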
118
00:10:27,760 --> 00:10:30,800
but to sum all this up
119
00:10:30,800 --> 00:10:44,720
one of the things that deep reinforcement learning could potentially offer us is a way to think about the acquisition of intelligence in a more unified algorithmic way without having to think about designing individual algorithms or individual modules
120
00:10:44,720 --> 00:10:56,000
but this is not by any means a new idea going from this kind of modular very complex model to a simple learning driven model is something that is in some sense as old as computer science itself
121
00:10:56,000 --> 00:11:02,320
here's a quote that i like very much on this topic instead of trying to produce a program to simulate the adult mind
122
00:11:02,320 --> 00:11:11,680
why not rather try to produce one which simulates the child's if this were then subjected to an appropriate course of education one would obtain the adult brain
123
00:11:11,680 --> 00:11:13,600
who wrote this
124
00:11:13,600 --> 00:11:16,320
Alan Turing