<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>YesLiu的博客</title>
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:type" content="website">
<meta property="og:title" content="YesLiu的博客">
<meta property="og:url" content="https://zhizhou57.github.io/yes_liu.github.io/index.html">
<meta property="og:site_name" content="YesLiu的博客">
<meta property="og:locale" content="en_US">
<meta property="article:author" content="yes_liu">
<meta name="twitter:card" content="summary">
<link rel="alternate" href="/yes_liu.github.io/atom.xml" title="YesLiu的博客" type="application/atom+xml">
<link rel="shortcut icon" href="/yes_liu.github.io/favicon.png">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/index.min.css">
<link rel="stylesheet" href="/yes_liu.github.io/css/style.css">
<link rel="stylesheet" href="/yes_liu.github.io/fancybox/jquery.fancybox.min.css">
<meta name="generator" content="Hexo 6.3.0"></head>
<body>
<div id="container">
<div id="wrap">
<header id="header">
<div id="banner"></div>
<div id="header-outer" class="outer">
<div id="header-title" class="inner">
<h1 id="logo-wrap">
<a href="/yes_liu.github.io/" id="logo">YesLiu的博客</a>
</h1>
</div>
<div id="header-inner" class="inner">
<nav id="main-nav">
<a id="main-nav-toggle" class="nav-icon"><span class="fa fa-bars"></span></a>
<a class="main-nav-link" href="/yes_liu.github.io/">Home</a>
<a class="main-nav-link" href="/yes_liu.github.io/archives">Archives</a>
</nav>
<nav id="sub-nav">
<a class="nav-icon" href="/yes_liu.github.io/atom.xml" title="RSS Feed"><span class="fa fa-rss"></span></a>
<a class="nav-icon nav-search-btn" title="Search"><span class="fa fa-search"></span></a>
</nav>
<div id="search-form-wrap">
<form action="//google.com/search" method="get" accept-charset="UTF-8" class="search-form"><input type="search" name="q" class="search-form-input" placeholder="Search"><button type="submit" class="search-form-submit"></button><input type="hidden" name="sitesearch" value="https://zhizhou57.github.io/yes_liu.github.io"></form>
</div>
</div>
</div>
</header>
<div class="outer">
<section id="main">
<article id="post-可解释性-积分梯度算法" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/11/02/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/" class="article-date">
<time class="dt-published" datetime="2023-11-02T07:21:25.000Z" itemprop="datePublished">2023-11-02</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/11/02/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/">可解释性-积分梯度算法</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>I recently came across a paper on interpretability and traced it back to the Integrated Gradients algorithm. It is quite interesting, so I am writing this post to organize my thoughts.<br>Reference: Axiomatic Attribution for Deep Networks <a target="_blank" rel="noopener" href="https://arxiv.org/abs/1703.01365">https://arxiv.org/abs/1703.01365</a></p>
<h1 id="motivation"><a href="#motivation" class="headerlink" title="motivation"></a>motivation</h1><p>这篇文章研究的是如何将模型的预测归因到模型的输入上,以对模型进行debug、提取规则以更好的使用模型。<br>先前的基于经验的归因评估方法,例如通过归因挑选Top k个像素,随机变化其值然后衡量得分下降的幅度。但是这种方法不够自然,因为模型可能没见过变化后的图像因此给出一个较低的得分。其他的经验评估技术都无法区分源于扰动数据的伪影、行为不当的模型和行为不当的归因方法。<br>因此本篇文章基于两个基本公理:Sensitivity和Implementation Invariance来设计自己的归因方法。</p>
<h1 id="两个基本公理"><a href="#两个基本公理" class="headerlink" title="两个基本公理"></a>两个基本公理</h1><h2 id="Sensitivity"><a href="#Sensitivity" class="headerlink" title="Sensitivity"></a>Sensitivity</h2><p>对于每个输入而言,如果baseline和与其在一个特征和预测值上不相同,那么这个特征应该被给予一个非零的归因。<br>用梯度作为归因,和Sensitivity是相违背的,例如函数$$ f(x) = 1 - ReLU(1-x) = [ f(x) = \begin{cases} x & \text{if } x < 1 \ 0 & \text{otherwise} \end{cases} ] $$ 当x大于等于1时,尽管此时的x与baseline不同,但是其梯度(即归因)为零。因此以梯度作为归因会导致focus到一些不相关的特征</p>
<h2 id="Implementation-Invariance"><a href="#Implementation-Invariance" class="headerlink" title="Implementation Invariance"></a>Implementation Invariance</h2><p>对于两个在功能上等价的神经网络(输入相同时网络输出相同,但实现方法可能不同),他们对于同一输入的归因也必须是相同的。<br>而基于梯度的归因是依赖于网络具体实现的,不满足该性质。</p>
<h1 id="Our-Method:Integrated-Gradients"><a href="#Our-Method:Integrated-Gradients" class="headerlink" title="Our Method:Integrated Gradients"></a>Our Method:Integrated Gradients</h1><p>该方法不直接使用梯度,而是对梯度进行积分。对于输入x和baseline x’,沿着第i维的积分梯度定义如下:<br><img src="/yes_liu.github.io/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/ig1.png" alt="img.png"><br>实际计算时选择一个路径进行积分,该论文选择直线,即<br><img src="/yes_liu.github.io/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/ig2.png" alt="img.png"><br>比较容易证明,这个方法是满足上面两个公理的<br>实验也发现,效果挺不错的,不愧是被引4000+的文章<br><img src="/yes_liu.github.io/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/img.png" alt="img.png"></p>
<p>The core of the algorithm looks roughly like this:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_compute_ig</span>(<span class="params">sess, input_tensors, embedding_tensor,</span></span><br><span class="line"><span class="params"> gradient_tensor, output_tensor, transformed_input_df,</span></span><br><span class="line"><span class="params"> baseline_df, num_reps</span>):</span><br><span class="line"> batch_size = <span class="number">20</span> <span class="comment"># keep small enough to ensure that we do not run out of</span></span><br><span class="line"> <span class="comment"># memory</span></span><br><span class="line"> num_reps = num_reps</span><br><span class="line"></span><br><span class="line"> tensor_values = sess.run(embedding_tensor,</span><br><span class="line"> _get_feed_dict(input_tensors,</span><br><span class="line"> transformed_input_df))</span><br><span class="line"></span><br><span class="line"> tensor_baseline_values = sess.run(embedding_tensor,</span><br><span class="line"> _get_feed_dict(input_tensors, baseline_df))</span><br><span class="line"></span><br><span class="line"> <span class="comment"># 计算</span></span><br><span class="line"> scaled_embeddings = _get_scaled_inputs(tensor_values[<span class="number">0</span>],</span><br><span class="line"> tensor_baseline_values[<span class="number">0</span>],</span><br><span class="line"> batch_size, num_reps)</span><br><span class="line"> scaled_input_feed = {}</span><br><span class="line"> <span 
class="keyword">for</span> key, tensor_info <span class="keyword">in</span> input_tensors.items():</span><br><span class="line"> scaled_input_feed[</span><br><span class="line"> get_tensor(sess, tensor_info.name)] = _get_unscaled_inputs(</span><br><span class="line"> transformed_input_df[key][<span class="number">0</span>], batch_size)</span><br><span class="line"></span><br><span class="line"> scores = []</span><br><span class="line"> path_gradients = []</span><br><span class="line"></span><br><span class="line"> <span class="comment"># 积分值估计计算</span></span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(num_reps):</span><br><span class="line"> scaled_input_feed[embedding_tensor] = scaled_embeddings[i]</span><br><span class="line"> path_gradients_rep, scores_rep = sess.run(</span><br><span class="line"> [gradient_tensor, output_tensor[:, <span class="number">1</span>]], scaled_input_feed)</span><br><span class="line"> path_gradients.append(path_gradients_rep[<span class="number">0</span>])</span><br><span class="line"> scores.append(scores_rep)</span><br><span class="line"></span><br><span class="line"> baseline_prediction = scores[<span class="number">0</span>][</span><br><span class="line"> <span class="number">0</span>] <span class="comment"># first score is the baseline prediction</span></span><br><span class="line"> prediction = scores[-<span class="number">1</span>][-<span class="number">1</span>] <span class="comment"># last score is the input prediction</span></span><br><span class="line"></span><br><span class="line"> <span class="comment"># integrating the gradients and multiplying with the difference of the</span></span><br><span class="line"> <span class="comment"># baseline and input.</span></span><br><span class="line"> ig = np.concatenate(path_gradients, axis=<span class="number">0</span>)</span><br><span class="line"> integral = _calculate_integral(ig)</span><br><span class="line"> integrated_gradients = (tensor_values[<span class="number">0</span>] - tensor_baseline_values[</span><br><span class="line"> <span class="number">0</span>]) * integral</span><br><span class="line"> integrated_gradients = np.<span class="built_in">sum</span>(integrated_gradients, axis=-<span class="number">1</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> integrated_gradients, baseline_prediction, prediction</span><br><span class="line"></span><br><span class="line"><span class="comment"># 获取积分路径中各点</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">_get_scaled_inputs</span>(<span class="params">input_val, baseline_val, batch_size, num_reps</span>):</span><br><span class="line"> list_scaled_embeddings = []</span><br><span class="line"> scaled_embeddings = \</span><br><span class="line"> [baseline_val + (<span class="built_in">float</span>(i) / (num_reps * batch_size - <span class="number">1</span>)) *</span><br><span class="line"> (input_val - baseline_val) <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">0</span>, num_reps * batch_size)]</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> i <span class="keyword">in</span> <span class="built_in">range</span>(num_reps):</span><br><span class="line"> list_scaled_embeddings.append(</span><br><span class="line"> np.array(scaled_embeddings[i * batch_size:i * 
batch_size +</span><br><span class="line"> batch_size]))</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> list_scaled_embeddings</span><br></pre></td></tr></table></figure>
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/11/02/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/" data-id="clohdkyvz0002curad41aakk2" data-title="可解释性-积分梯度算法" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-NLP-Metric" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/10/23/NLP-Metric/" class="article-date">
<time class="dt-published" datetime="2023-10-23T10:38:27.000Z" itemprop="datePublished">2023-10-23</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/10/23/NLP-Metric/">NLP-Metric</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/10/23/NLP-Metric/" data-id="clohdkyvz0001curafgr7dx2o" data-title="NLP-Metric" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-GPT系列模型版本演进" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/10/23/GPT%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E7%89%88%E6%9C%AC%E6%BC%94%E8%BF%9B/" class="article-date">
<time class="dt-published" datetime="2023-10-23T07:00:06.000Z" itemprop="datePublished">2023-10-23</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/10/23/GPT%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E7%89%88%E6%9C%AC%E6%BC%94%E8%BF%9B/">GPT系列模型版本演进</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/10/23/GPT%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E7%89%88%E6%9C%AC%E6%BC%94%E8%BF%9B/" data-id="clohdkyvx0000curahiwbdi72" data-title="GPT系列模型版本演进" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-大模型评测综述" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/10/20/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AF%84%E6%B5%8B%E7%BB%BC%E8%BF%B0/" class="article-date">
<time class="dt-published" datetime="2023-10-20T01:55:02.000Z" itemprop="datePublished">2023-10-20</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/10/20/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AF%84%E6%B5%8B%E7%BB%BC%E8%BF%B0/">大模型评测综述</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<h1 id="1-引言"><a href="#1-引言" class="headerlink" title="1.引言"></a>1.引言</h1><p>大模型的evaluation很重要,体现在一下几点:</p>
<ol>
<li>It helps us better understand the strengths and weaknesses of the models.</li>
<li>Better evaluation can guide the future development of human-LLM interaction.</li>
<li>The wide deployment of LLMs highlights the importance of ensuring their safety and reliability, especially in safety-sensitive settings such as financial and medical institutions.</li>
<li>As models grow larger and more capable, existing evaluation methods can no longer cover their capability and safety requirements.</li>
</ol>
<h1 id="2-What-to-evaluate"><a href="#2-What-to-evaluate" class="headerlink" title="2. What to evaluate"></a>2. What to evaluate</h1><h2 id="自然语言理解任务"><a href="#自然语言理解任务" class="headerlink" title="自然语言理解任务"></a>自然语言理解任务</h2><h3 id="1-情感分析"><a href="#1-情感分析" class="headerlink" title="1.情感分析"></a>1.情感分析</h3><p>情感分析任务分析并解释文本中的情感倾向,通常是二分类(积极/消极)或三分类(积极/中性/消极)的经典分类问题。<br>ChatGPT在情感分析上的表现超过传统情感分析方法:在低资源场景下大模型比小模型表现好,但是理解低资源语言的能力依旧受限。</p>
<h3 id="2-文本分类"><a href="#2-文本分类" class="headerlink" title="2.文本分类"></a>2.文本分类</h3><p>文本分类与情感分析类似,不局限于情感而是所有的文本和任务。<br>大模型中分类任务上表现很好,包括一些非传统的任务。</p>
<h3 id="3-自然语言推理"><a href="#3-自然语言推理" class="headerlink" title="3.自然语言推理"></a>3.自然语言推理</h3><p>自然语言推理判断给定的“假设”是否遵从给定的“前提”。研究表明,大模型在NLI范围内表现不佳,并且在代表人类分歧方面进一步失败,这表明LLM在该领域仍有很大的改进空间。</p>
<h3 id="4-语义理解"><a href="#4-语义理解" class="headerlink" title="4.语义理解"></a>4.语义理解</h3><p>语义理解指对语言含义的理解及其相关概念,包含对词、词组、句子及其关系的理解和解释。大模型对单个事件有一定的理解,但他们感知事件之间语义相似性的能力受到限制。在推理任务中,大模型在因果关系和意向关系中表现出强大的推理能力,但在其他关系类型中的表现相对较弱。在预测任务中,大模型展示了对未来事件较好的预测能力。<br>有研究探索了大模型的语义能力,发现其评测基础词组的能力很差。更进一步的,GPT-3.5和BARD不能区分有意义和无意义的词组,而是将所有瞎编的词组都认为是有意义的。GPT-4提升了很多,但是表现依旧远不如人类。<br>在社会知识理解领域,有人衡量了模型在学习并识别社会知识的能力,发现有监督微调的方法及时参数笑得多也比zero-shot的LLMs效果要好。</p>
<h2 id="推理任务"><a href="#推理任务" class="headerlink" title="推理任务"></a>推理任务</h2><p>如今推理任务可以分粗略分为数学推理、常识推理以及领域知识推理。ChatGPT比GPT-3.5的算术能力要强很多,然而他在数学推理上的能力需要提升:在符号推理任务中,ChatGPT表现的比GPT-3.5要差,这可能是因为ChatGPT倾向于生成不确定的回答,导致性能不佳。由于大模型在反事实条件的任务上表现不佳,表明目前大模型在抽象推理能力方面存在一定的局限性。在逻辑推理任务中,有人证明ChatGPT和GPT-4超出了传统的微调的方法,但是两个模型都在面对新的和超出分布的数据时表现不佳。同时ChatGPT也不比其他大模型(GPT-3.5、BARD)表现更好。对于多步推理,PaLM和Claude2是两个仅有的和GPT性能相似的模型。一些论文研究了ChatGPT在因果推理中的表现:ChatGPT在常识推理任务上普遍表现不佳,但相对优于非文本语义推理。同时,ChatGPT也缺乏空间推理能力,但表现出更好的时间推理能力。ChatGPT也在多步推理能力上表现不佳,这与其他LLM在复杂推理上的弱点类似。在专家领域推理,zero-shot的InstructGPT和Codex能够完成复杂的医学推理任务,但仍需进一步提升。<br>值得注意的是,以上结论大部分是在特定数据集上测试的。总的来说大模型展现了推理上可持续提升的潜力,仍需深入研究和优化。</p>
<h2 id="自然语言生成"><a href="#自然语言生成" class="headerlink" title="自然语言生成"></a>自然语言生成</h2><p>自然语言生成衡量大模型生成特定文本的能力,包括摘要、对话生成、机器翻译、QA。</p>
<h3 id="1-文本总结"><a href="#1-文本总结" class="headerlink" title="1.文本总结"></a>1.文本总结</h3><p>摘要是意在生成给定句子的精确总结的生成任务。对于ChatGPT,比较失望的是他有时会生成比输入文本更长的摘要,而且更像提取式的摘要,仍需要更深的提升。</p>
<h3 id="2-对话系统"><a href="#2-对话系统" class="headerlink" title="2.对话系统"></a>2.对话系统</h3><p>在面向任务的对话中,ChatGPT的表现尚可;然而,它在面临以下挑战时容易出错:长期多轮依赖、基本推理失败和外在幻觉。</p>
<h3 id="3-QA"><a href="#3-QA" class="headerlink" title="3.QA"></a>3.QA</h3><p>ChatGPT在传统QA上表现良好,在社会知识、常识等问题上不如有监督模型。</p>
<h2 id="多语言任务"><a href="#多语言任务" class="headerlink" title="多语言任务"></a>多语言任务</h2><p>ChatGPT在非拉丁语系和低资源语言中表现较差。</p>
<h2 id="事实性任务"><a href="#事实性任务" class="headerlink" title="事实性任务"></a>事实性任务</h2><p>大模型的事实性,指模型提供的信息是否与现实世界的事实相匹配,这影响很多任务和下游应用。这些实验表明增大模型参数并不会改善事实性,提供了一些衡量的数据集。</p>
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/10/20/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AF%84%E6%B5%8B%E7%BB%BC%E8%BF%B0/" data-id="clohdkyw00003cura62i43i5x" data-title="大模型评测综述" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-旋转位置编码RoPE" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/" class="article-date">
<time class="dt-published" datetime="2023-09-24T08:10:10.000Z" itemprop="datePublished">2023-09-24</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/">旋转位置编码RoPE</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>Rotary Position Embedding (RoPE), from the paper "RoFormer: Enhanced Transformer with Rotary Position Embedding", is a technique that constructs a particular absolute position encoding so that relative position information enters the attention computation.<br>RoPE is used in models such as LLaMA and PaLM.<br>While comparing implementations, however, I found that the RoPE code in META's GitHub repository, in Huggingface LLAMA, and in Huggingface Roformer are all different from one another (whether the changes are intentional or accidental, I do not know). The details are below.</p>
<h1 id="RoPE原理"><a href="#RoPE原理" class="headerlink" title="RoPE原理"></a>RoPE原理</h1><p>引用Roformer中比较精髓的一句话:</p>
<blockquote>
<p>Specifically, the proposed RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in self-attention formulation.</p>
</blockquote>
<blockquote>
<p>That is, RoPE encodes the absolute position with a rotation matrix and, at the same time, incorporates the relative position dependency into the self-attention formulation.</p>
</blockquote>
<p>Concretely, start from the standard Transformer formulation: let x denote the token embedding and q, k, v the query, key, and value representations. f is the function that maps a token embedding together with its position m (or n) into the q/k/v representation.<br><img src="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/qkv.png" alt="img.png"></p>
<p>In the vanilla Transformer, f adds a sinusoidal absolute position encoding p to the token embedding and then projects the sum into the q/k/v space with a linear layer (a matrix W).<br><img src="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/positionEmb.png" alt="img.png"></p>
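<p>Reconstructed from the Roformer paper rather than the image above, this is roughly $$ f_{\{q,k,v\}}(x_m, m) = W_{\{q,k,v\}}\,(x_m + p_m) $$ where p_m is the sinusoidal position vector for position m.</p>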
<p>The motivation of the paper is honestly hard to evaluate. The author admits in the conclusion that it is difficult to explain why relative position encoding works, and the author, Su Jianlin, wrote the following on Zhihu:<br><img src="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/zhihu.png" alt="img.png"><br>The paper was never formally published, and even the author was unsure how to frame the motivation, but the method does seem to work, so everyone uses it.<br>(By this point I almost did not want to finish this post; an unpublished paper feels a bit hard to judge and not fully mature.)<br>See: <a target="_blank" rel="noopener" href="https://www.zhihu.com/question/450936573/answer/1797187035">https://www.zhihu.com/question/450936573/answer/1797187035</a></p>
<p>The author's idea is to change the form of f (effectively, every pair (x_i, x_{i+1}) is rotated in the complex plane by a position-dependent angle):<br><img src="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/f.png" alt="img.png"><br>I will fill in the details later; consider this a placeholder.<br>This f is then equivalent to the following form, which is RoPE:<br><img src="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/f_simple.png" alt="img.png"></p>
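<p>As a quick illustration (my own sketch, not any of the implementations compared below), the rotation can be written as complex multiplication: each consecutive pair of dimensions of q or k at position m is multiplied by e^{i·m·θ_j}.</p>
<pre><code class="python">import torch

def rope(x, m, base=10000.0):
    """Rotate each consecutive pair of dims of x by the angle m * theta_j."""
    d = x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2).float() / d)   # (d/2,) frequencies
    angles = m * theta                                      # rotation angle per pair
    pairs = torch.view_as_complex(x.float().reshape(-1, d // 2, 2))
    rot = torch.polar(torch.ones_like(angles), angles)      # e^{i * m * theta_j}
    return torch.view_as_real(pairs * rot).reshape(x.shape)

q = torch.randn(1, 8)   # one head with head_dim = 8, at position m = 3
print(rope(q, m=3))
</code></pre>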
<h1 id="RoPE实现"><a href="#RoPE实现" class="headerlink" title="RoPE实现"></a>RoPE实现</h1><p>我发现RoPE有三种不同的代码实现,很奇怪,下面一一介绍</p>
<p>The first is META's LLAMA ( <a target="_blank" rel="noopener" href="https://github.com/facebookresearch/llama">https://github.com/facebookresearch/llama</a> )</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"><span class="keyword">from</span> typing <span class="keyword">import</span> <span class="type">Tuple</span></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">precompute_freqs_cis</span>(<span class="params">dim: <span class="built_in">int</span>, end: <span class="built_in">int</span>, theta: <span class="built_in">float</span> = <span class="number">10000.0</span></span>):</span><br><span class="line"> freqs = <span class="number">1.0</span> / (theta ** (torch.arange(<span class="number">0</span>, dim, <span class="number">2</span>)[: (dim // <span class="number">2</span>)].<span class="built_in">float</span>() / dim))</span><br><span class="line"> t = torch.arange(end, device=freqs.device) <span class="comment"># type: ignore</span></span><br><span class="line"> freqs = torch.outer(t, freqs).<span class="built_in">float</span>() <span class="comment"># type: ignore</span></span><br><span class="line"> freqs_cis = torch.polar(torch.ones_like(freqs), freqs) <span class="comment"># complex64</span></span><br><span class="line"> <span class="keyword">return</span> freqs_cis</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">reshape_for_broadcast</span>(<span class="params">freqs_cis: torch.Tensor, x: torch.Tensor</span>):</span><br><span class="line"> ndim = x.ndim</span><br><span class="line"> <span class="keyword">assert</span> <span class="number">0</span> <= <span class="number">1</span> < ndim</span><br><span class="line"> <span class="keyword">assert</span> freqs_cis.shape == (x.shape[<span class="number">1</span>], x.shape[-<span class="number">1</span>])</span><br><span class="line"> shape = [d <span class="keyword">if</span> i == <span class="number">1</span> <span class="keyword">or</span> i == ndim - <span class="number">1</span> <span class="keyword">else</span> <span class="number">1</span> <span class="keyword">for</span> i, d <span class="keyword">in</span> <span class="built_in">enumerate</span>(x.shape)]</span><br><span class="line"> <span class="keyword">return</span> freqs_cis.view(*shape)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span 
class="keyword">def</span> <span class="title function_">apply_rotary_emb</span>(<span class="params"></span></span><br><span class="line"><span class="params"> xq: torch.Tensor,</span></span><br><span class="line"><span class="params"> xk: torch.Tensor,</span></span><br><span class="line"><span class="params"> freqs_cis: torch.Tensor,</span></span><br><span class="line"><span class="params"></span>) -> <span class="type">Tuple</span>[torch.Tensor, torch.Tensor]:</span><br><span class="line"> xq_ = torch.view_as_complex(xq.<span class="built_in">float</span>().reshape(*xq.shape[:-<span class="number">1</span>], -<span class="number">1</span>, <span class="number">2</span>))</span><br><span class="line"> xk_ = torch.view_as_complex(xk.<span class="built_in">float</span>().reshape(*xk.shape[:-<span class="number">1</span>], -<span class="number">1</span>, <span class="number">2</span>))</span><br><span class="line"> freqs_cis = reshape_for_broadcast(freqs_cis, xq_)</span><br><span class="line"> xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(<span class="number">3</span>)</span><br><span class="line"> xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(<span class="number">3</span>)</span><br><span class="line"> <span class="keyword">return</span> xq_out.type_as(xq), xk_out.type_as(xk)</span><br><span class="line"></span><br><span class="line"><span class="comment"># 获取修改后的query、key</span></span><br><span class="line">freqs = precompute_freqs_cis(head_dim, seq_len)</span><br><span class="line">xq_meta_llama, xk_meta_pyllama = apply_rotary_emb(xq, xk, freqs_cis=freqs)</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>This effectively keeps only the first term of the original formula, i.e. (x_1, …, x_d) * (cos(mθ_1), …, cos(mθ_{d/2}))</p>
<p>The second is the Huggingface LLAMA implementation</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> torch</span><br><span class="line"><span class="keyword">class</span> <span class="title class_">RotaryEmbedding</span>(torch.nn.Module):</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, dim, max_position_embeddings=<span class="number">2048</span>, base=<span class="number">10000</span>, device=<span class="literal">None</span></span>):</span><br><span class="line"> <span class="built_in">super</span>().__init__()</span><br><span class="line"> inv_freq = <span class="number">1.0</span> / (base ** (torch.arange(<span class="number">0</span>, dim, <span class="number">2</span>).<span class="built_in">float</span>().to(device) / dim))</span><br><span class="line"> self.register_buffer(<span class="string">"inv_freq"</span>, inv_freq)</span><br><span class="line"></span><br><span class="line"> <span class="comment"># Build here to make `torch.jit.trace` work.</span></span><br><span class="line"> self.max_seq_len_cached = max_position_embeddings</span><br><span class="line"> t = torch.arange(</span><br><span class="line"> self.max_seq_len_cached,</span><br><span class="line"> device=self.inv_freq.device,</span><br><span class="line"> dtype=self.inv_freq.dtype,</span><br><span class="line"> )</span><br><span class="line"> freqs = torch.einsum(<span class="string">"i,j->ij"</span>, t, self.inv_freq)</span><br><span class="line"> <span class="comment"># Different from paper, but it uses a different permutation in order to obtain the same calculation</span></span><br><span class="line"> emb = torch.cat((freqs, freqs), dim=-<span 
class="number">1</span>)</span><br><span class="line"> self.cos_cached = emb.cos()[<span class="literal">None</span>, <span class="literal">None</span>, :, :]</span><br><span class="line"> self.sin_cached = emb.sin()[<span class="literal">None</span>, <span class="literal">None</span>, :, :]</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x, seq_len=<span class="literal">None</span></span>):</span><br><span class="line"> <span class="comment"># x: [bs, num_attention_heads, seq_len, head_size]</span></span><br><span class="line"> <span class="comment"># This `if` block is unlikely to be run after we build sin/cos in `__init__`. Keep the logic here just in case.</span></span><br><span class="line"> <span class="keyword">if</span> seq_len > self.max_seq_len_cached:</span><br><span class="line"> self.max_seq_len_cached = seq_len</span><br><span class="line"> t = torch.arange(</span><br><span class="line"> self.max_seq_len_cached, device=x.device, dtype=self.inv_freq.dtype</span><br><span class="line"> )</span><br><span class="line"> freqs = torch.einsum(<span class="string">"i,j->ij"</span>, t, self.inv_freq)</span><br><span class="line"> <span class="comment"># Different from paper, but it uses a different permutation in order to obtain the same calculation</span></span><br><span class="line"> emb = torch.cat((freqs, freqs), dim=-<span class="number">1</span>).to(x.device)</span><br><span class="line"> self.cos_cached = emb.cos()[<span class="literal">None</span>, <span class="literal">None</span>, :, :].to(dtype=x.dtype)</span><br><span class="line"> self.sin_cached = emb.sin()[<span class="literal">None</span>, <span class="literal">None</span>, :, :].to(dtype=x.dtype)</span><br><span class="line"> <span class="keyword">return</span> (</span><br><span class="line"> self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype, device=x.device),</span><br><span class="line"> self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype, device=x.device),</span><br><span class="line"> )</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">rotate_half</span>(<span class="params">x</span>):</span><br><span class="line"> <span class="string">"""Rotates half the hidden dims of the input."""</span></span><br><span class="line"> x1 = x[..., : x.shape[-<span class="number">1</span>] // <span class="number">2</span>]</span><br><span class="line"> x2 = x[..., x.shape[-<span class="number">1</span>] // <span class="number">2</span> :]</span><br><span class="line"> <span class="built_in">print</span>(x1)</span><br><span class="line"> <span class="built_in">print</span>(x2)</span><br><span class="line"> <span class="built_in">print</span>(torch.cat((-x2, x1), dim=-<span class="number">1</span>))</span><br><span class="line"> <span class="keyword">return</span> torch.cat((-x2, x1), dim=-<span class="number">1</span>)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">apply_rotary_pos_emb</span>(<span class="params">q, k, cos, sin, offset: <span class="built_in">int</span> = <span class="number">0</span></span>):</span><br><span class="line"> cos = cos[..., offset : q.shape[-<span class="number">2</span>] + offset, :]</span><br><span class="line"> sin = sin[..., offset : q.shape[-<span class="number">2</span>] 
+ offset, :]</span><br><span class="line"> q_embed = (q * cos) + (rotate_half(q) * sin)</span><br><span class="line"> k_embed = (k * cos) + (rotate_half(k) * sin)</span><br><span class="line"> <span class="keyword">return</span> q_embed, k_embed</span><br></pre></td></tr></table></figure>
<p>Here the second term effectively becomes (x_1, …, x_{d/2}, -x_{d/2}, …, -x_d) * (cos(mθ_1), …, cos(mθ_{d/2}))</p>
<p>The last one is the Huggingface Roformer implementation, which matches the formula in the paper</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">apply_rotary_position_embeddings</span>(<span class="params">sinusoidal_pos, query_layer, key_layer, value_layer=<span class="literal">None</span></span>):</span><br><span class="line"> <span class="comment"># https://kexue.fm/archives/8265</span></span><br><span class="line"> <span class="comment"># sin [batch_size, num_heads, sequence_length, embed_size_per_head//2]</span></span><br><span class="line"> <span class="comment"># cos [batch_size, num_heads, sequence_length, embed_size_per_head//2]</span></span><br><span class="line"> sin, cos = sinusoidal_pos.chunk(<span class="number">2</span>, dim=-<span class="number">1</span>)</span><br><span class="line"> <span class="comment"># sin [θ0,θ1,θ2......θd/2-1] -> sin_pos [θ0,θ0,θ1,θ1,θ2,θ2......θd/2-1,θd/2-1]</span></span><br><span class="line"> sin_pos = torch.stack([sin, sin], dim=-<span class="number">1</span>).reshape_as(sinusoidal_pos)</span><br><span class="line"> <span class="comment"># cos [θ0,θ1,θ2......θd/2-1] -> cos_pos [θ0,θ0,θ1,θ1,θ2,θ2......θd/2-1,θd/2-1]</span></span><br><span class="line"> cos_pos = torch.stack([cos, cos], dim=-<span class="number">1</span>).reshape_as(sinusoidal_pos)</span><br><span class="line"> <span class="comment"># rotate_half_query_layer [-q1,q0,-q3,q2......,-qd-1,qd-2]</span></span><br><span class="line"> rotate_half_query_layer = torch.stack([-query_layer[..., <span class="number">1</span>::<span class="number">2</span>], query_layer[..., ::<span class="number">2</span>]], dim=-<span 
class="number">1</span>).reshape_as(</span><br><span class="line"> query_layer</span><br><span class="line"> )</span><br><span class="line"> query_layer = query_layer * cos_pos + rotate_half_query_layer * sin_pos</span><br><span class="line"> <span class="comment"># rotate_half_key_layer [-k1,k0,-k3,k2......,-kd-1,kd-2]</span></span><br><span class="line"> rotate_half_key_layer = torch.stack([-key_layer[..., <span class="number">1</span>::<span class="number">2</span>], key_layer[..., ::<span class="number">2</span>]], dim=-<span class="number">1</span>).reshape_as(key_layer)</span><br><span class="line"> key_layer = key_layer * cos_pos + rotate_half_key_layer * sin_pos</span><br><span class="line"> <span class="keyword">if</span> value_layer <span class="keyword">is</span> <span class="keyword">not</span> <span class="literal">None</span>:</span><br><span class="line"> <span class="comment"># rotate_half_value_layer [-v1,v0,-v3,v2......,-vd-1,vd-2]</span></span><br><span class="line"> rotate_half_value_layer = torch.stack([-value_layer[..., <span class="number">1</span>::<span class="number">2</span>], value_layer[..., ::<span class="number">2</span>]], dim=-<span class="number">1</span>).reshape_as(</span><br><span class="line"> value_layer</span><br><span class="line"> )</span><br><span class="line"> value_layer = value_layer * cos_pos + rotate_half_value_layer * sin_pos</span><br><span class="line"> <span class="keyword">return</span> query_layer, key_layer, value_layer</span><br><span class="line"> <span class="keyword">return</span> query_layer, key_layer</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">RoFormerSinusoidalPositionalEmbedding</span>(nn.Embedding):</span><br><span class="line"> <span class="string">"""This module produces sinusoidal positional embeddings of any length."""</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, num_positions: <span class="built_in">int</span>, embedding_dim: <span class="built_in">int</span>, padding_idx: <span class="type">Optional</span>[<span class="built_in">int</span>] = <span class="literal">None</span></span>) -> <span class="literal">None</span>:</span><br><span class="line"> <span class="built_in">super</span>().__init__(num_positions, embedding_dim)</span><br><span class="line"> self.weight = self._init_weight(self.weight)</span><br><span class="line"></span><br><span class="line"><span class="meta"> @staticmethod</span></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">_init_weight</span>(<span class="params">out: nn.Parameter</span>) -> nn.Parameter:</span><br><span class="line"> <span class="string">"""</span></span><br><span class="line"><span class="string"> Identical to the XLM create_sinusoidal_embeddings except features are not interleaved. The cos features are in</span></span><br><span class="line"><span class="string"> the 2nd half of the vector. 
[dim // 2:]</span></span><br><span class="line"><span class="string"> """</span></span><br><span class="line"> n_pos, dim = out.shape</span><br><span class="line"> position_enc = np.array(</span><br><span class="line"> [[pos / np.power(<span class="number">10000</span>, <span class="number">2</span> * (j // <span class="number">2</span>) / dim) <span class="keyword">for</span> j <span class="keyword">in</span> <span class="built_in">range</span>(dim)] <span class="keyword">for</span> pos <span class="keyword">in</span> <span class="built_in">range</span>(n_pos)]</span><br><span class="line"> )</span><br><span class="line"> out.requires_grad = <span class="literal">False</span> <span class="comment"># set early to avoid an error in pytorch-1.8+</span></span><br><span class="line"> sentinel = dim // <span class="number">2</span> <span class="keyword">if</span> dim % <span class="number">2</span> == <span class="number">0</span> <span class="keyword">else</span> (dim // <span class="number">2</span>) + <span class="number">1</span></span><br><span class="line"> out[:, <span class="number">0</span>:sentinel] = torch.FloatTensor(np.sin(position_enc[:, <span class="number">0</span>::<span class="number">2</span>]))</span><br><span class="line"> out[:, sentinel:] = torch.FloatTensor(np.cos(position_enc[:, <span class="number">1</span>::<span class="number">2</span>]))</span><br><span class="line"> out.detach_()</span><br><span class="line"> <span class="keyword">return</span> out</span><br><span class="line"></span><br><span class="line"><span class="meta"> @torch.no_grad()</span></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, input_ids_shape: torch.Size, past_key_values_length: <span class="built_in">int</span> = <span class="number">0</span></span>) -> torch.Tensor:</span><br><span class="line"> <span class="string">"""`input_ids_shape` is expected to be [bsz x seqlen]."""</span></span><br><span class="line"> bsz, seq_len = input_ids_shape[:<span class="number">2</span>]</span><br><span class="line"> positions = torch.arange(</span><br><span class="line"> past_key_values_length, past_key_values_length + seq_len, dtype=torch.long, device=self.weight.device</span><br><span class="line"> )</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">super</span>().forward(positions)</span><br></pre></td></tr></table></figure>
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/" data-id="clmxpf9s9000080raekrj6d0k" data-title="旋转位置编码RoPE" class="article-share-link"><span class="fa fa-share">Share</span></a>
<ul class="article-tag-list" itemprop="keywords"><li class="article-tag-list-item"><a class="article-tag-list-link" href="/yes_liu.github.io/tags/LLAMA/" rel="tag">LLAMA</a></li><li class="article-tag-list-item"><a class="article-tag-list-link" href="/yes_liu.github.io/tags/RoPE/" rel="tag">RoPE</a></li></ul>
</footer>
</div>
</article>
<article id="post-设计模式系列-桥接模式" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E6%A1%A5%E6%8E%A5%E6%A8%A1%E5%BC%8F/" class="article-date">
<time class="dt-published" datetime="2023-09-14T04:33:39.000Z" itemprop="datePublished">2023-09-14</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E6%A1%A5%E6%8E%A5%E6%A8%A1%E5%BC%8F/">设计模式系列-桥接模式</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>First, the distinction between a class's functional hierarchy and its implementation hierarchy:<br>A functional hierarchy extends functionality through inheritance. For example, a Car class has start, stop, and turn methods; if a special car also needs a rear-view camera, it simply inherits from Car and adds a rear-view-camera method in its own class.<br>An implementation hierarchy uses inheritance only to implement existing methods without adding new ones. For example, every car can honk, but each kind of car implements honking in its own way.</p>
<p>The bridge pattern fits a scenario like this: there is a parent class Car and several brands such as BMW, Mercedes, and Audi, each with its own brand information; this is the implementation hierarchy. At the same time, each brand comes in three categories (sports car, sedan, SUV), and each category adds different features such as a convertible top or off-road capability; this is the functional hierarchy. In this setup, adding one new brand requires adding three new classes for it, which quickly becomes unwieldy.<br><img src="/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E6%A1%A5%E6%8E%A5%E6%A8%A1%E5%BC%8F/bridge.png"><br>The bridge pattern decouples the functional hierarchy from the implementation hierarchy. The roles involved are:</p>
<ul>
<li>Abstraction: defines the abstraction and holds a reference to an Implementor object.</li>
<li>RefinedAbstraction: extends the Abstraction, changing or refining the parent's definition of the abstraction.</li>
<li>Implementor: declares the interface for the implementation classes without providing concrete implementations. Note that this interface need not match the Abstraction's interface; in fact the two can be quite different. The Implementor should expose only primitive operations, while the Abstraction provides higher-level operations built on top of them.</li>
<li>ConcreteImplementor: provides the concrete implementation of the Implementor interface.</li>
</ul>
<p>In plain words: the RefinedAbstraction adds methods through inheritance to build the functional hierarchy, while the Abstraction keeps a reference to an Implementor object; the ConcreteImplementors complete the implementation hierarchy by implementing the abstract methods. This composition bridges the two hierarchies and reduces the number of classes.</p>
<p>The code is as follows:</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 抽象化角色:品牌</span></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title class_">Brand</span>{</span><br><span class="line"> <span class="comment">/**</span></span><br><span class="line"><span class="comment"> * 品牌信息</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="keyword">void</span> <span class="title function_">info</span><span class="params">()</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">BMW</span> <span class="keyword">implements</span> <span class="title class_">Brand</span>{</span><br><span class="line"> </span><br><span class="line"> <span class="meta">@override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">info</span><span class="params">()</span>{</span><br><span class="line"> System.out.print(<span class="string">"宝马"</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">Audi</span> <span class="keyword">implements</span> <span class="title class_">Brand</span>{</span><br><span class="line"> </span><br><span class="line"> <span 
class="meta">@override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">info</span><span class="params">()</span>{</span><br><span class="line"> System.out.print(<span class="string">"奥迪"</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">abstract</span> <span class="keyword">class</span> <span class="title class_">Car</span>{</span><br><span class="line"></span><br><span class="line"> <span class="keyword">protected</span> Brand brand;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> <span class="title function_">Car</span><span class="params">(Brand brand)</span>{</span><br><span class="line"> <span class="built_in">this</span>.brand = brand;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">info</span><span class="params">()</span>{</span><br><span class="line"> brand.info();</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">SportsCar</span> <span class="keyword">extends</span> <span class="title class_">Car</span>{</span><br><span class="line"> <span class="keyword">public</span> <span class="title function_">SportsCar</span><span class="params">(Brand brand)</span>{</span><br><span class="line"> <span class="built_in">super</span>(brand);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">info</span><span class="params">()</span>{</span><br><span class="line"> <span class="built_in">super</span>.info();</span><br><span class="line"> System.out.println(<span class="string">"跑车"</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">convertile</span><span class="params">()</span>{</span><br><span class="line"> System.out.println(<span class="string">"打开敞篷"</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// 测试</span></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">Test</span>{</span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title function_">main</span><span class="params">(String[] args)</span>{</span><br><span class="line"> <span class="type">Car</span> <span class="variable">car</span> <span class="operator">=</span> <span class="keyword">new</span> <span class="title class_">SportsCar</span>(<span class="keyword">new</span> <span class="title class_">BWM</span>());</span><br><span class="line"> car.info();</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>References:<br><a target="_blank" rel="noopener" href="https://cloud.tencent.com/developer/article/1895612">https://cloud.tencent.com/developer/article/1895612</a><br><a target="_blank" rel="noopener" href="https://zhuanlan.zhihu.com/p/58903776">https://zhuanlan.zhihu.com/p/58903776</a></p>
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E6%A1%A5%E6%8E%A5%E6%A8%A1%E5%BC%8F/" data-id="clmn658po000f45ft8ahx33jo" data-title="设计模式系列-桥接模式" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-设计模式系列-代理模式" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E4%BB%A3%E7%90%86%E6%A8%A1%E5%BC%8F/" class="article-date">
<time class="dt-published" datetime="2023-09-14T01:57:15.000Z" itemprop="datePublished">2023-09-14</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E4%BB%A3%E7%90%86%E6%A8%A1%E5%BC%8F/">设计模式系列-代理模式</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>This post covers the static proxy pattern first. The dynamic proxy pattern relies on Java language features such as reflection and will be filled in later when time permits.</p>
<p>The proxy pattern has the following main roles.</p>
<p>Abstract Subject class (the business interface): declares, through an interface or abstract class, the business methods that both the real subject and the proxy implement.<br>Real Subject class (the business implementation): implements the concrete business logic declared by the abstract subject; it is the real object that the proxy stands in for and the object that is ultimately used.<br>Proxy class: exposes the same interface as the real subject and holds a reference to it internally, so it can access, control, or extend the real subject's functionality.</p>
<h1 id="静态代理"><a href="#静态代理" class="headerlink" title="静态代理"></a>静态代理</h1><p>程序员事先创建好代理类或特定工具自动生成源代码再对其编译,在程序运行前代理类的 .class 文件就已经存在了。</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//业务接口</span></span><br><span class="line"><span class="keyword">interface</span> <span class="title class_">DateService</span> {</span><br><span class="line"> <span class="keyword">void</span> <span class="title function_">add</span><span class="params">()</span>;</span><br><span class="line"> <span class="keyword">void</span> <span class="title function_">del</span><span class="params">()</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DateServiceImplA</span> <span class="keyword">implements</span> <span class="title class_">DateService</span> {</span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">add</span><span class="params">()</span> {</span><br><span class="line"> System.out.println(<span class="string">"成功添加!"</span>);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">del</span><span class="params">()</span> {</span><br><span class="line"> System.out.println(<span class="string">"成功删除!"</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DateServiceProxy</span> <span class="keyword">implements</span> <span class="title class_">DateService</span> {</span><br><span class="line"> DateService server;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> <span class="title function_">DateServiceProxy</span><span class="params">(DateService server)</span> {</span><br><span class="line"> <span 
class="built_in">this</span>.server = server;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">add</span><span class="params">()</span> {</span><br><span class="line"> server.add();</span><br><span class="line"> System.out.println(<span class="string">"程序执行add方法,记录日志."</span>);</span><br><span class="line"> }</span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">void</span> <span class="title function_">del</span><span class="params">()</span> {</span><br><span class="line"> server.del();</span><br><span class="line"> System.out.println(<span class="string">"程序执行del方法,记录日志."</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">//客户端</span></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">Test</span> {</span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title function_">main</span><span class="params">(String[] args)</span> {</span><br><span class="line"> <span class="type">DateService</span> <span class="variable">service</span> <span class="operator">=</span> <span class="keyword">new</span> <span class="title class_">DateServiceProxy</span>();</span><br><span class="line"> service.add();</span><br><span class="line"> service.del();</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>This extends the behavior without changing the original code, and one proxy class can serve any implementation of the same business interface. The drawback is that the client has to construct and pass in a concrete implementation, so it must obtain the target object itself and the target is exposed directly to the caller, which is unacceptable in some situations.</p>
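<p>The dynamic proxy mentioned at the beginning of this post is not covered in detail here, but as a rough, minimal sketch (not from the original post, and assuming the same DateService and DateServiceImplA classes as above): with the JDK's java.lang.reflect.Proxy, the proxy class is generated at runtime from the interface, so the logging code is written once for all methods instead of once per hand-written proxy class.</p>
<figure class="highlight java"><table><tr><td class="code"><pre><span class="line">// Illustrative sketch: JDK dynamic proxy for the DateService interface above.</span><br><span class="line">import java.lang.reflect.InvocationHandler;</span><br><span class="line">import java.lang.reflect.Proxy;</span><br><span class="line"></span><br><span class="line">public class DynamicProxyTest {</span><br><span class="line">    public static void main(String[] args) {</span><br><span class="line">        DateService target = new DateServiceImplA();</span><br><span class="line">        // The handler is invoked for every method call on the generated proxy.</span><br><span class="line">        InvocationHandler handler = (proxyObj, method, methodArgs) -> {</span><br><span class="line">            Object result = method.invoke(target, methodArgs);</span><br><span class="line">            System.out.println("程序执行" + method.getName() + "方法,记录日志.");</span><br><span class="line">            return result;</span><br><span class="line">        };</span><br><span class="line">        DateService proxy = (DateService) Proxy.newProxyInstance(</span><br><span class="line">                DateService.class.getClassLoader(),</span><br><span class="line">                new Class[]{DateService.class},</span><br><span class="line">                handler);</span><br><span class="line">        proxy.add();   // same output as the static proxy above</span><br><span class="line">        proxy.del();</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>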
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/09/14/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E4%BB%A3%E7%90%86%E6%A8%A1%E5%BC%8F/" data-id="clmn658pm000a45ftb2mm2bpd" data-title="设计模式系列-代理模式" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-设计模式系列-原型模式" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/09/13/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E5%8E%9F%E5%9E%8B%E6%A8%A1%E5%BC%8F/" class="article-date">
<time class="dt-published" datetime="2023-09-13T06:24:52.000Z" itemprop="datePublished">2023-09-13</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/09/13/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E5%8E%9F%E5%9E%8B%E6%A8%A1%E5%BC%8F/">设计模式系列-原型模式</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>A prototype-pattern class can create a clone of the current object, which avoids the cost of creating the object from scratch when that is expensive. (In some cases creating an object with new involves tedious steps such as preparing data and checking access permissions; the prototype pattern simplifies these operations.)<br>For example, if an object can only be created after an expensive database operation, we can cache that object, return a clone of it on the next request, and update the database only when necessary, thereby reducing database calls.</p>
<p>In Java, the prototype pattern is implemented in three steps:</p>
<p>Step 1: have the prototype class Prototype implement the Cloneable interface.<br>Step 2: override Object's clone() method, choosing a deep or shallow copy for each field.<br>Step 3: in the client class, here PrototypeTest, call Prototype's clone() method to copy the object.</p>
<figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">Prototype</span> <span class="keyword">implements</span> <span class="title class_">Cloneable</span>{</span><br><span class="line"></span><br><span class="line"> <span class="keyword">private</span> String name;</span><br><span class="line"> <span class="keyword">private</span> <span class="type">int</span> age;</span><br><span class="line"> <span class="keyword">private</span> String sex;</span><br><span class="line"> <span class="keyword">private</span> ArrayList<String> hobbies;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> <span class="title function_">Prototype</span><span class="params">(String name, <span class="type">int</span> age, String sex, ArrayList<String> hobbies)</span> {</span><br><span class="line"> <span class="built_in">this</span>.name = name;</span><br><span class="line"> <span class="built_in">this</span>.age = age;</span><br><span class="line"> <span class="built_in">this</span>. 
sex = sex;</span><br><span class="line"> <span class="built_in">this</span>.hobbies = hobbies;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">/**</span></span><br><span class="line"><span class="comment"> * 重写object的clone()方法, 并将其作用域设置为public</span></span><br><span class="line"><span class="comment"> * <span class="doctag">@return</span></span></span><br><span class="line"><span class="comment"> * <span class="doctag">@throws</span> CloneNotSupportedException</span></span><br><span class="line"><span class="comment"> */</span></span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> Prototype <span class="title function_">clone</span><span class="params">()</span> <span class="keyword">throws</span> CloneNotSupportedException {</span><br><span class="line"> <span class="type">Prototype</span> <span class="variable">clone</span> <span class="operator">=</span> (Prototype)<span class="built_in">super</span>.clone();</span><br><span class="line"> System.out.println(<span class="string">"浅拷贝:"</span> + (clone.hobbies == <span class="built_in">this</span>.hobbies));</span><br><span class="line"> clone.hobbies = (ArrayList<String>) (<span class="built_in">this</span>.hobbies).clone();</span><br><span class="line"> System.out.println(<span class="string">"深拷贝:"</span> + (clone.hobbies == <span class="built_in">this</span>.hobbies));</span><br><span class="line"> <span class="keyword">return</span> clone;</span><br><span class="line"> }</span><br><span class="line"> <span class="comment">// getter used by PrototypeTest below</span> <span class="keyword">public</span> ArrayList<String> <span class="title function_">getHobbies</span><span class="params">()</span> { <span class="keyword">return</span> hobbies; }</span><br><span class="line"> <span class="meta">@Override</span></span><br><span class="line"> <span class="keyword">public</span> String <span class="title function_">toString</span><span class="params">()</span> {</span><br><span class="line"> <span class="keyword">return</span> <span class="string">"Prototype{"</span> +</span><br><span class="line"> <span class="string">"name='"</span> + name + <span class="string">'\''</span> +</span><br><span class="line"> <span class="string">", age="</span> + age +</span><br><span class="line"> <span class="string">", sex='"</span> + sex + <span class="string">'\''</span> +</span><br><span class="line"> <span class="string">'}'</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title class_">PrototypeTest</span> {</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title function_">main</span><span class="params">(String[] args)</span> <span class="keyword">throws</span> CloneNotSupportedException {</span><br><span class="line"> <span class="type">ArrayList</span> <span class="variable">hobbies</span> <span class="operator">=</span> <span class="keyword">new</span> <span class="title class_">ArrayList</span>();</span><br><span class="line"> hobbies.add(<span class="string">"篮球"</span>);</span><br><span class="line"> hobbies.add(<span class="string">"排球"</span>);</span><br><span class="line"> <span class="type">Prototype</span> <span class="variable">prototype</span> <span class="operator">=</span> <span class="keyword">new</span> <span class="title class_">Prototype</span>(<span class="string">"张三"</span>, <span class="number">8</span>, <span class="string">"男"</span>, 
hobbies);</span><br><span class="line"> <span class="type">Prototype</span> <span class="variable">cloneObject</span> <span class="operator">=</span> (Prototype)prototype.clone();</span><br><span class="line"></span><br><span class="line"> System.out.println(<span class="string">"比较克隆前后的对象:"</span>+(prototype == cloneObject));</span><br><span class="line"> System.out.println(<span class="string">"比较克隆前后的List<String>属性:"</span> + (prototype.getHobbies() == cloneObject.getHobbies()));</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
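<p>To tie the code back to the database-caching motivation above, here is a small illustrative sketch (not from the original post; the PrototypeCache class and its get() method are made up for this example): the expensive Prototype is built once, cached, and every later request receives an independent clone.</p>
<figure class="highlight java"><table><tr><td class="code"><pre><span class="line">// Illustrative sketch: build the expensive object once, serve clones afterwards.</span><br><span class="line">import java.util.ArrayList;</span><br><span class="line"></span><br><span class="line">public class PrototypeCache {</span><br><span class="line">    private Prototype cached;</span><br><span class="line"></span><br><span class="line">    public Prototype get() throws CloneNotSupportedException {</span><br><span class="line">        if (cached == null) {</span><br><span class="line">            // Imagine this construction hides an expensive database call.</span><br><span class="line">            ArrayList<String> hobbies = new ArrayList<String>();</span><br><span class="line">            hobbies.add("篮球");</span><br><span class="line">            cached = new Prototype("张三", 8, "男", hobbies);</span><br><span class="line">        }</span><br><span class="line">        return cached.clone(); // every caller gets an independent copy</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>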
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/09/13/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E5%8E%9F%E5%9E%8B%E6%A8%A1%E5%BC%8F/" data-id="clmn658pm000b45ftdczb0tpr" data-title="设计模式系列-原型模式" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-设计模式系列-建造者模式" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/09/13/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E5%BB%BA%E9%80%A0%E8%80%85%E6%A8%A1%E5%BC%8F/" class="article-date">
<time class="dt-published" datetime="2023-09-13T02:55:22.000Z" itemprop="datePublished">2023-09-13</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/09/13/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E5%BB%BA%E9%80%A0%E8%80%85%E6%A8%A1%E5%BC%8F/">设计模式系列-建造者模式</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>When to use the builder pattern:<br>When a class's constructor takes more than four parameters and some of them are optional, the constructor overloads and long parameter lists make it easy to pass argument values incorrectly when instantiating the class.<br>Let's look at a code example:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Pizza</span>:</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, builder</span>):</span><br><span class="line"> self.garlic = builder.garlic</span><br><span class="line"> self.extra_cheese = builder.extra_cheese</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__str__</span>(<span class="params">self</span>):</span><br><span class="line"> garlic = <span class="string">'yes'</span> <span class="keyword">if</span> self.garlic <span class="keyword">else</span> <span class="string">'no'</span></span><br><span class="line"> cheese = <span class="string">'yes'</span> <span class="keyword">if</span> self.extra_cheese <span class="keyword">else</span> <span class="string">'no'</span></span><br><span class="line"> info = (<span class="string">'Garlic: {}'</span>.<span class="built_in">format</span>(garlic), <span class="string">'Extra cheese: {}'</span>.<span class="built_in">format</span>(cheese))</span><br><span class="line"> <span class="keyword">return</span> <span class="string">'\n'</span>.join(info)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">class</span> <span class="title class_">PizzaBuilder</span>:</span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self</span>):</span><br><span class="line"> self.extra_cheese = <span class="literal">False</span></span><br><span class="line"> self.garlic = <span class="literal">False</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">add_garlic</span>(<span class="params">self</span>):</span><br><span class="line"> self.garlic = <span class="literal">True</span></span><br><span class="line"> <span class="keyword">return</span> self</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">add_extra_cheese</span>(<span class="params">self</span>):</span><br><span class="line"> self.extra_cheese = <span class="literal">True</span></span><br><span class="line"> <span class="keyword">return</span> self</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span 
class="title function_">build</span>(<span class="params">self</span>):</span><br><span class="line"> <span class="keyword">return</span> Pizza(self)</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line"> pizza = Pizza.PizzaBuilder().add_garlic().add_extra_cheese().build()</span><br><span class="line"> <span class="built_in">print</span>(pizza)</span><br><span class="line"></span><br></pre></td></tr></table></figure>
<p>In this example we want to produce a Pizza that may have several optional toppings, such as garlic and cheese, and the nested PizzaBuilder class assembles it. Every builder method that sets an attribute ends with return self, which is what makes the chained (fluent) calls possible. The same idea in Java is sketched below.</p>
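<p>Since the other posts in this series use Java, here is a rough Java equivalent (not from the original post; the class and method names are made up for illustration). The static nested Builder plays the role of PizzaBuilder, and each setter returns this to allow chaining, which avoids the overloaded-constructor problem described above.</p>
<figure class="highlight java"><table><tr><td class="code"><pre><span class="line">// Illustrative sketch: a Java builder with chainable setters.</span><br><span class="line">public class Pizza {</span><br><span class="line">    private final boolean garlic;</span><br><span class="line">    private final boolean extraCheese;</span><br><span class="line"></span><br><span class="line">    private Pizza(Builder builder) {</span><br><span class="line">        this.garlic = builder.garlic;</span><br><span class="line">        this.extraCheese = builder.extraCheese;</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    @Override</span><br><span class="line">    public String toString() {</span><br><span class="line">        return "Garlic: " + (garlic ? "yes" : "no")</span><br><span class="line">                + "\nExtra cheese: " + (extraCheese ? "yes" : "no");</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    public static class Builder {</span><br><span class="line">        private boolean garlic;</span><br><span class="line">        private boolean extraCheese;</span><br><span class="line"></span><br><span class="line">        public Builder addGarlic() { this.garlic = true; return this; }</span><br><span class="line">        public Builder addExtraCheese() { this.extraCheese = true; return this; }</span><br><span class="line">        public Pizza build() { return new Pizza(this); }</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    public static void main(String[] args) {</span><br><span class="line">        Pizza pizza = new Pizza.Builder().addGarlic().addExtraCheese().build();</span><br><span class="line">        System.out.println(pizza);</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>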
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/09/13/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F%E7%B3%BB%E5%88%97-%E5%BB%BA%E9%80%A0%E8%80%85%E6%A8%A1%E5%BC%8F/" data-id="clmn658pn000c45ftgr43dtjv" data-title="设计模式系列-建造者模式" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-bert复现-详解PositionEmbedding" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/yes_liu.github.io/2023/09/02/bert%E5%A4%8D%E7%8E%B0-%E8%AF%A6%E8%A7%A3PositionEmbedding/" class="article-date">
<time class="dt-published" datetime="2023-09-02T02:30:12.000Z" itemprop="datePublished">2023-09-02</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/yes_liu.github.io/2023/09/02/bert%E5%A4%8D%E7%8E%B0-%E8%AF%A6%E8%A7%A3PositionEmbedding/">bert复现-详解PositionEmbedding</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>talk is cheap, show me the code</p>
<p>First, look at the definition in the original Transformer paper:<br><img src="/yes_liu.github.io/2023/09/02/bert%E5%A4%8D%E7%8E%B0-%E8%AF%A6%E8%A7%A3PositionEmbedding/embedding.png"></p>
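<p>In text form (the standard Transformer formulation, restated here because the original post shows it only as an image): PE(pos, 2i) = sin(pos / 10000<sup>2i/d_model</sup>) and PE(pos, 2i+1) = cos(pos / 10000<sup>2i/d_model</sup>), where d_model is the embedding dimension.</p>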
<p>PE is a tensor of shape (max_len, d_model) whose entries are defined by the formula above, where pos is the token position and i is the dimension index.<br>In code, first build position as the sequence torch.arange(0, max_len) and unsqueeze it to add a dimension. Then build the denominator term div_term = (torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)).exp(), which computes 1 / 10000^(2i/d_model) in log space for numerical stability.<br>Multiply the two, then fill the even columns with the sine values and the odd columns with the cosine values.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">PositionalEmbedding</span>(nn.Module):</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">__init__</span>(<span class="params">self, d_model, max_len=<span class="number">512</span></span>):</span><br><span class="line"> <span class="built_in">super</span>().__init__()</span><br><span class="line"></span><br><span class="line"> <span class="comment"># Compute the positional encodings once in log space.</span></span><br><span class="line"> pe = torch.zeros(max_len, d_model).<span class="built_in">float</span>()</span><br><span class="line"> pe.require_grad = <span class="literal">False</span></span><br><span class="line"></span><br><span class="line"> position = torch.arange(<span class="number">0</span>, max_len).<span class="built_in">float</span>().unsqueeze(<span class="number">1</span>)</span><br><span class="line"> div_term = (torch.arange(<span class="number">0</span>, d_model, <span class="number">2</span>).<span class="built_in">float</span>() * -(math.log(<span class="number">10000.0</span>) / d_model)).exp()</span><br><span class="line"></span><br><span class="line"> pe[:, <span class="number">0</span>::<span class="number">2</span>] = torch.sin(position * div_term)</span><br><span class="line"> pe[:, <span class="number">1</span>::<span class="number">2</span>] = torch.cos(position * div_term)</span><br><span class="line"></span><br><span class="line"> pe = pe.unsqueeze(<span class="number">0</span>)</span><br><span class="line"> self.register_buffer(<span class="string">'pe'</span>, pe)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">def</span> <span class="title function_">forward</span>(<span class="params">self, x</span>):</span><br><span class="line"> <span class="keyword">return</span> self.pe[:, :x.size(<span class="number">1</span>)]</span><br></pre></td></tr></table></figure>
</div>
<footer class="article-footer">
<a data-url="https://zhizhou57.github.io/yes_liu.github.io/2023/09/02/bert%E5%A4%8D%E7%8E%B0-%E8%AF%A6%E8%A7%A3PositionEmbedding/" data-id="clmn658pk000345ft8168eywl" data-title="bert复现-详解PositionEmbedding" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<nav id="page-nav">
<span class="page-number current">1</span><a class="page-number" href="/yes_liu.github.io/page/2/">2</a><a class="page-number" href="/yes_liu.github.io/page/3/">3</a><a class="extend next" rel="next" href="/yes_liu.github.io/page/2/">Next »</a>
</nav>
</section>
<aside id="sidebar">
<div class="widget-wrap">
<h3 class="widget-title">Tags</h3>
<div class="widget">
<ul class="tag-list" itemprop="keywords"><li class="tag-list-item"><a class="tag-list-link" href="/yes_liu.github.io/tags/LLAMA/" rel="tag">LLAMA</a></li><li class="tag-list-item"><a class="tag-list-link" href="/yes_liu.github.io/tags/RoPE/" rel="tag">RoPE</a></li></ul>
</div>
</div>
<div class="widget-wrap">
<h3 class="widget-title">Tag Cloud</h3>
<div class="widget tagcloud">
<a href="/yes_liu.github.io/tags/LLAMA/" style="font-size: 10px;">LLAMA</a> <a href="/yes_liu.github.io/tags/RoPE/" style="font-size: 10px;">RoPE</a>
</div>
</div>
<div class="widget-wrap">
<h3 class="widget-title">Archives</h3>
<div class="widget">
<ul class="archive-list"><li class="archive-list-item"><a class="archive-list-link" href="/yes_liu.github.io/archives/2023/11/">November 2023</a></li><li class="archive-list-item"><a class="archive-list-link" href="/yes_liu.github.io/archives/2023/10/">October 2023</a></li><li class="archive-list-item"><a class="archive-list-link" href="/yes_liu.github.io/archives/2023/09/">September 2023</a></li><li class="archive-list-item"><a class="archive-list-link" href="/yes_liu.github.io/archives/2023/08/">August 2023</a></li></ul>
</div>
</div>
<div class="widget-wrap">
<h3 class="widget-title">Recent Posts</h3>
<div class="widget">
<ul>
<li>
<a href="/yes_liu.github.io/2023/11/02/%E5%8F%AF%E8%A7%A3%E9%87%8A%E6%80%A7-%E7%A7%AF%E5%88%86%E6%A2%AF%E5%BA%A6%E7%AE%97%E6%B3%95/">可解释性-积分梯度算法</a>
</li>
<li>
<a href="/yes_liu.github.io/2023/10/23/NLP-Metric/">NLP-Metric</a>
</li>
<li>
<a href="/yes_liu.github.io/2023/10/23/GPT%E7%B3%BB%E5%88%97%E6%A8%A1%E5%9E%8B%E7%89%88%E6%9C%AC%E6%BC%94%E8%BF%9B/">GPT系列模型版本演进</a>
</li>
<li>
<a href="/yes_liu.github.io/2023/10/20/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AF%84%E6%B5%8B%E7%BB%BC%E8%BF%B0/">大模型评测综述</a>
</li>
<li>
<a href="/yes_liu.github.io/2023/09/24/%E6%97%8B%E8%BD%AC%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81RoPE/">旋转位置编码RoPE</a>
</li>
</ul>
</div>
</div>
</aside>
</div>
<footer id="footer">
<div class="outer">
<div id="footer-info" class="inner">
© 2023 yes_liu<br>
Powered by <a href="https://hexo.io/" target="_blank">Hexo</a>
</div>
</div>
</footer>
</div>
<nav id="mobile-nav">
<a href="/yes_liu.github.io/" class="mobile-nav-link">Home</a>
<a href="/yes_liu.github.io/archives" class="mobile-nav-link">Archives</a>
</nav>
<script src="/yes_liu.github.io/js/jquery-3.6.4.min.js"></script>
<script src="/yes_liu.github.io/fancybox/jquery.fancybox.min.js"></script>
<script src="/yes_liu.github.io/js/script.js"></script>
</div>
</body>
</html>