run_1030_1100.log
Python 3.12.5
['▁<', '|', 'endo', 'f', 'text', '|', '>']
[39, 19, 37, 35, 38, 19, 40]
['▁Before', '▁we', '▁pro', 'ce', 'ed', '▁any', '▁further', ',', '▁hear', '▁me', '▁speak', '.']
torch.Size([512, 128]) torch.Size([512, 128])
The <|endoftext|> id is : 1
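The '▁'-prefixed pieces above are the signature of a SentencePiece tokenizer, and the paired torch.Size([512, 128]) prints look like input/target id tensors of 512 sequences by 128 tokens (an inference from the log, not confirmed by it). A minimal sketch of how the tokenizer prints could be produced, assuming a trained model file at tokenizer.model:

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # assumed path
    print(sp.encode("<|endoftext|>", out_type=str))   # pieces, e.g. ['▁<', '|', 'endo', ...]
    print(sp.encode("<|endoftext|>", out_type=int))   # the matching ids
    print(sp.encode("Before we proceed any further, hear me speak.", out_type=str))
    # one way the id line could be obtained, if the token is registered as a piece:
    print("The <|endoftext|> id is :", sp.piece_to_id("<|endoftext|>"))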
total params: 12,597,504
model size: 48.243MB
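As a sanity check, the two model-stats lines can be reproduced generically in PyTorch; a sketch assuming model is the instantiated network (12,597,504 fp32 weights is about 48.05 MB, so the logged 48.243MB plausibly also counts buffers):

    import torch

    def model_stats(model: torch.nn.Module) -> None:
        n_params = sum(p.numel() for p in model.parameters())
        n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
        n_bytes += sum(b.numel() * b.element_size() for b in model.buffers())
        print(f"total params: {n_params:,}")
        print(f"model size: {n_bytes / 1024**2:.3f}MB")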
Training on cuda.
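The device line is the usual availability check; a one-line sketch (assumed, not taken from the repo's code):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Training on {device}.")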
Epoch 1/2:
Step 1/953240 - LR:0.0016 - train_loss: 8.461Step 11/953240 - LR:0.0016 - train_loss: 5.991Step 21/953240 - LR:0.0017 - train_loss: 5.930Step 31/953240 - LR:0.0016 - train_loss: 5.904Step 41/953240 - LR:0.0017 - train_loss: 5.869Step 51/953240 - LR:0.0016 - train_loss: 5.687Step 61/953240 - LR:0.0017 - train_loss: 5.554Step 71/953240 - LR:0.0016 - train_loss: 5.549Step 81/953240 - LR:0.0017 - train_loss: 5.382Step 91/953240 - LR:0.0016 - train_loss: 5.237Step 101/953240 - LR:0.0016 - train_loss: 5.171Step 111/953240 - LR:0.0016 - train_loss: 5.132Step 121/953240 - LR:0.0017 - train_loss: 5.105Step 131/953240 - LR:0.0016 - train_loss: 5.046Step 141/953240 - LR:0.0017 - train_loss: 5.023Step 151/953240 - LR:0.0016 - train_loss: 5.060Step 161/953240 - LR:0.0017 - train_loss: 5.077Step 171/953240 - LR:0.0016 - train_loss: 5.105Step 181/953240 - LR:0.0017 - train_loss: 4.973Step 191/953240 - LR:0.0016 - train_loss: 5.101Step 201/953240 - LR:0.0016 - train_loss: 5.027Step 211/953240 - LR:0.0016 - train_loss: 4.978Step 221/953240 - LR:0.0016 - train_loss: 4.911Step 231/953240 - LR:0.0016 - train_loss: 4.900Step 241/953240 - LR:0.0017 - train_loss: 4.855Step 251/953240 - LR:0.0016 - train_loss: 4.862Step 261/953240 - LR:0.0017 - train_loss: 4.829Step 271/953240 - LR:0.0016 - train_loss: 4.780Step 281/953240 - LR:0.0017 - train_loss: 4.754Step 291/953240 - LR:0.0016 - train_loss: 4.768Step 301/953240 - LR:0.0017 - train_loss: 4.719Step 311/953240 - LR:0.0016 - train_loss: 5.057Step 321/953240 - LR:0.0017 - train_loss: 4.878Step 331/953240 - LR:0.0016 - train_loss: 4.756Step 341/953240 - LR:0.0017 - train_loss: 4.633Step 351/953240 - LR:0.0016 - train_loss: 4.598Step 361/953240 - LR:0.0017 - train_loss: 4.528Step 371/953240 - LR:0.0016 - train_loss: 4.592Step 381/953240 - LR:0.0017 - train_loss: 4.459Step 391/953240 - LR:0.0016 - train_loss: 4.523Step 401/953240 - LR:0.0017 - train_loss: 4.391Step 411/953240 - LR:0.0016 - train_loss: 4.362Step 421/953240 - LR:0.0017 - train_loss: 4.291Step 431/953240 - LR:0.0016 - train_loss: 4.323Step 441/953240 - LR:0.0017 - train_loss: 4.254Step 451/953240 - LR:0.0016 - train_loss: 4.292Step 461/953240 - LR:0.0017 - train_loss: 4.209Step 471/953240 - LR:0.0016 - train_loss: 4.217Step 481/953240 - LR:0.0017 - train_loss: 4.162Step 491/953240 - LR:0.0016 - train_loss: 4.244Step 501/953240 - LR:0.0017 - train_loss: 4.177Step 511/953240 - LR:0.0016 - train_loss: 4.161Step 521/953240 - LR:0.0017 - train_loss: 4.063Step 531/953240 - LR:0.0016 - train_loss: 4.180Step 541/953240 - LR:0.0017 - train_loss: 4.094Step 551/953240 - LR:0.0016 - train_loss: 4.038Step 561/953240 - LR:0.0017 - train_loss: 3.944Step 571/953240 - LR:0.0016 - train_loss: 3.970Step 581/953240 - LR:0.0016 - train_loss: 3.922Step 591/953240 - LR:0.0016 - train_loss: 4.036Step 601/953240 - LR:0.0016 - train_loss: 3.937Step 611/953240 - LR:0.0016 - train_loss: 3.946Step 621/953240 - LR:0.0016 - train_loss: 3.831Step 631/953240 - LR:0.0016 - train_loss: 3.929Step 641/953240 - LR:0.0017 - train_loss: 3.840Step 651/953240 - LR:0.0016 - train_loss: 3.850Step 661/953240 - LR:0.0017 - train_loss: 3.807Step 671/953240 - LR:0.0016 - train_loss: 3.784Step 681/953240 - LR:0.0017 - train_loss: 3.765Step 691/953240 - LR:0.0016 - train_loss: 3.788Step 701/953240 - LR:0.0017 - train_loss: 3.765Step 711/953240 - LR:0.0016 - train_loss: 3.753Step 721/953240 - LR:0.0017 - train_loss: 3.680Step 731/953240 - LR:0.0016 - train_loss: 3.694Step 741/953240 - LR:0.0017 - train_loss: 3.633Step 751/953240 - LR:0.0016 - 
train_loss: 3.814Step 761/953240 - LR:0.0017 - train_loss: 3.739Step 771/953240 - LR:0.0016 - train_loss: 3.689Step 781/953240 - LR:0.0016 - train_loss: 3.648Step 791/953240 - LR:0.0016 - train_loss: 3.635Step 801/953240 - LR:0.0017 - train_loss: 3.648Step 811/953240 - LR:0.0016 - train_loss: 3.692Step 821/953240 - LR:0.0017 - train_loss: 3.606Step 831/953240 - LR:0.0016 - train_loss: 3.575Step 841/953240 - LR:0.0017 - train_loss: 3.562Step 851/953240 - LR:0.0016 - train_loss: 3.607Step 861/953240 - LR:0.0017 - train_loss: 3.492Step 871/953240 - LR:0.0016 - train_loss: 3.695Step 881/953240 - LR:0.0017 - train_loss: 3.550Step 891/953240 - LR:0.0016 - train_loss: 3.575Step 901/953240 - LR:0.0017 - train_loss: 3.498Step 911/953240 - LR:0.0016 - train_loss: 3.551Step 921/953240 - LR:0.0017 - train_loss: 3.533Step 931/953240 - LR:0.0016 - train_loss: 3.617Step 941/953240 - LR:0.0017 - train_loss: 3.490Step 951/953240 - LR:0.0016 - train_loss: 3.502Step 961/953240 - LR:0.0017 - train_loss: 3.462Step 971/953240 - LR:0.0016 - train_loss: 3.477Step 981/953240 - LR:0.0017 - train_loss: 3.414Step 991/953240 - LR:0.0016 - train_loss: 3.459Step 1001/953240 - LR:0.0017 - train_loss: 3.435Step 1011/953240 - LR:0.0016 - train_loss: 3.448Step 1021/953240 - LR:0.0017 - train_loss: 3.419Step 1031/953240 - LR:0.0016 - train_loss: 3.489Step 1041/953240 - LR:0.0017 - train_loss: 3.384Step 1051/953240 - LR:0.0016 - train_loss: 3.426Step 1061/953240 - LR:0.0017 - train_loss: 3.355Step 1071/953240 - LR:0.0016 - train_loss: 3.420Step 1081/953240 - LR:0.0017 - train_loss: 3.352Step 1091/953240 - LR:0.0016 - train_loss: 3.384Step 1101/953240 - LR:0.0017 - train_loss: 3.330Step 1111/953240 - LR:0.0016 - train_loss: 3.347Step 1121/953240 - LR:0.0017 - train_loss: 3.310Step 1131/953240 - LR:0.0016 - train_loss: 3.400Step 1141/953240 - LR:0.0017 - train_loss: 3.308Step 1151/953240 - LR:0.0016 - train_loss: 3.416Step 1161/953240 - LR:0.0017 - train_loss: 3.271Step 1171/953240 - LR:0.0016 - train_loss: 3.326Step 1181/953240 - LR:0.0017 - train_loss: 3.295Step 1191/953240 - LR:0.0016 - train_loss: 3.363Step 1201/953240 - LR:0.0017 - train_loss: 3.247Step 1211/953240 - LR:0.0016 - train_loss: 3.305Step 1221/953240 - LR:0.0017 - train_loss: 3.230Step 1231/953240 - LR:0.0016 - train_loss: 3.284Step 1241/953240 - LR:0.0017 - train_loss: 3.216Step 1251/953240 - LR:0.0016 - train_loss: 3.318Step 1261/953240 - LR:0.0017 - train_loss: 3.211Step 1271/953240 - LR:0.0016 - train_loss: 3.279Step 1281/953240 - LR:0.0017 - train_loss: 3.205Step 1291/953240 - LR:0.0016 - train_loss: 3.236Step 1301/953240 - LR:0.0017 - train_loss: 3.171Step 1311/953240 - LR:0.0016 - train_loss: 3.209Step 1321/953240 - LR:0.0017 - train_loss: 3.212Step 1331/953240 - LR:0.0016 - train_loss: 3.267Step 1341/953240 - LR:0.0016 - train_loss: 3.168Step 1351/953240 - LR:0.0016 - train_loss: 3.242Step 1361/953240 - LR:0.0017 - train_loss: 3.147Step 1371/953240 - LR:0.0016 - train_loss: 3.165Step 1381/953240 - LR:0.0017 - train_loss: 3.109Step 1391/953240 - LR:0.0016 - train_loss: 3.138Step 1401/953240 - LR:0.0017 - train_loss: 3.120Step 1411/953240 - LR:0.0016 - train_loss: 3.147Step 1421/953240 - LR:0.0016 - train_loss: 3.079Step 1431/953240 - LR:0.0016 - train_loss: 3.108Step 1441/953240 - LR:0.0017 - train_loss: 3.099Step 1451/953240 - LR:0.0016 - train_loss: 3.151Step 1461/953240 - LR:0.0017 - train_loss: 3.018Step 1471/953240 - LR:0.0016 - train_loss: 3.187Step 1481/953240 - LR:0.0017 - train_loss: 3.087Step 1491/953240 - LR:0.0016 - train_loss: 3.100Step 
1501/953240 - LR:0.0017 - train_loss: 3.039Step 1511/953240 - LR:0.0016 - train_loss: 3.049Step 1521/953240 - LR:0.0017 - train_loss: 3.041Step 1531/953240 - LR:0.0016 - train_loss: 3.090Step 1541/953240 - LR:0.0017 - train_loss: 3.028Step 1551/953240 - LR:0.0016 - train_loss: 3.055Step 1561/953240 - LR:0.0016 - train_loss: 3.024Step 1571/953240 - LR:0.0016 - train_loss: 3.089Step 1581/953240 - LR:0.0017 - train_loss: 2.978Step 1591/953240 - LR:0.0016 - train_loss: 3.020Step 1601/953240 - LR:0.0017 - train_loss: 3.007Step 1611/953240 - LR:0.0016 - train_loss: 3.017Step 1621/953240 - LR:0.0017 - train_loss: 2.944Step 1631/953240 - LR:0.0016 - train_loss: 3.051Step 1641/953240 - LR:0.0016 - train_loss: 2.975Step 1651/953240 - LR:0.0016 - train_loss: 3.040Step 1661/953240 - LR:0.0017 - train_loss: 2.975Step 1671/953240 - LR:0.0016 - train_loss: 3.005Step 1681/953240 - LR:0.0017 - train_loss: 2.959Step 1691/953240 - LR:0.0016 - train_loss: 2.981Step 1701/953240 - LR:0.0017 - train_loss: 2.933Step 1711/953240 - LR:0.0016 - train_loss: 3.070Step 1721/953240 - LR:0.0016 - train_loss: 2.936Step 1731/953240 - LR:0.0016 - train_loss: 3.060Step 1741/953240 - LR:0.0017 - train_loss: 2.926Step 1751/953240 - LR:0.0016 - train_loss: 2.947Step 1761/953240 - LR:0.0017 - train_loss: 2.938Step 1771/953240 - LR:0.0016 - train_loss: 2.990Step 1781/953240 - LR:0.0017 - train_loss: 2.906Step 1791/953240 - LR:0.0016 - train_loss: 2.941Step 1801/953240 - LR:0.0017 - train_loss: 2.885Step 1811/953240 - LR:0.0016 - train_loss: 2.916Step 1821/953240 - LR:0.0017 - train_loss: 2.870Step 1831/953240 - LR:0.0016 - train_loss: 2.920Step 1841/953240 - LR:0.0017 - train_loss: 2.906Step 1851/953240 - LR:0.0016 - train_loss: 2.953Step 1861/953240 - LR:0.0017 - train_loss: 2.881Step 1871/953240 - LR:0.0016 - train_loss: 2.936Step 1881/953240 - LR:0.0017 - train_loss: 2.863Step 1891/953240 - LR:0.0016 - train_loss: 2.936Step 1901/953240 - LR:0.0017 - train_loss: 2.835Step 1911/953240 - LR:0.0016 - train_loss: 2.900Step 1921/953240 - LR:0.0017 - train_loss: 2.810Step 1931/953240 - LR:0.0016 - train_loss: 2.917Step 1941/953240 - LR:0.0016 - train_loss: 2.818Step 1951/953240 - LR:0.0016 - train_loss: 2.868Step 1961/953240 - LR:0.0017 - train_loss: 2.791Step 1971/953240 - LR:0.0016 - train_loss: 2.911Step 1981/953240 - LR:0.0017 - train_loss: 2.802Step 1991/953240 - LR:0.0016 - train_loss: 2.858Step 2001/953240 - LR:0.0017 - train_loss: 2.790Step 2011/953240 - LR:0.0016 - train_loss: 2.839Step 2021/953240 - LR:0.0016 - train_loss: 2.786Step 2031/953240 - LR:0.0016 - train_loss: 2.818Step 2041/953240 - LR:0.0017 - train_loss: 2.785Step 2051/953240 - LR:0.0016 - train_loss: 2.804Step 2061/953240 - LR:0.0017 - train_loss: 2.799Step 2071/953240 - LR:0.0016 - train_loss: 2.823Step 2081/953240 - LR:0.0017 - train_loss: 2.750Step 2091/953240 - LR:0.0016 - train_loss: 2.816Step 2101/953240 - LR:0.0017 - train_loss: 2.771Step 2111/953240 - LR:0.0016 - train_loss: 2.781Step 2121/953240 - LR:0.0017 - train_loss: 2.719Step 2131/953240 - LR:0.0016 - train_loss: 2.775Step 2141/953240 - LR:0.0017 - train_loss: 2.719Step 2151/953240 - LR:0.0016 - train_loss: 2.779Step 2161/953240 - LR:0.0017 - train_loss: 2.713Step 2171/953240 - LR:0.0016 - train_loss: 2.756Step 2181/953240 - LR:0.0017 - train_loss: 2.748Step 2191/953240 - LR:0.0016 - train_loss: 2.767Step 2201/953240 - LR:0.0017 - train_loss: 2.703Step 2211/953240 - LR:0.0016 - train_loss: 2.705Step 2221/953240 - LR:0.0017 - train_loss: 2.725Step 2231/953240 - LR:0.0016 - train_loss: 2.751Step 
2241/953240 - LR:0.0017 - train_loss: 2.690Step 2251/953240 - LR:0.0016 - train_loss: 2.777Step 2261/953240 - LR:0.0017 - train_loss: 2.730Step 2271/953240 - LR:0.0016 - train_loss: 2.726Step 2281/953240 - LR:0.0017 - train_loss: 2.671Step 2291/953240 - LR:0.0016 - train_loss: 2.761Step 2301/953240 - LR:0.0017 - train_loss: 2.690Step 2311/953240 - LR:0.0016 - train_loss: 2.739Step 2321/953240 - LR:0.0016 - train_loss: 2.693Step 2331/953240 - LR:0.0016 - train_loss: 2.741Step 2341/953240 - LR:0.0017 - train_loss: 2.670Step 2351/953240 - LR:0.0016 - train_loss: 2.722Step 2361/953240 - LR:0.0017 - train_loss: 2.611Step 2371/953240 - LR:0.0016 - train_loss: 2.691Step 2381/953240 - LR:0.0017 - train_loss: 2.671Step 2391/953240 - LR:0.0016 - train_loss: 2.671Step 2401/953240 - LR:0.0017 - train_loss: 2.641Step 2411/953240 - LR:0.0016 - train_loss: 2.659Step 2421/953240 - LR:0.0017 - train_loss: 2.631Step 2431/953240 - LR:0.0016 - train_loss: 2.666Step 2441/953240 - LR:0.0017 - train_loss: 2.644Step 2451/953240 - LR:0.0016 - train_loss: 2.707Step 2461/953240 - LR:0.0017 - train_loss: 2.622Step 2471/953240 - LR:0.0016 - train_loss: 2.639Step 2481/953240 - LR:0.0017 - train_loss: 2.614Step 2491/953240 - LR:0.0016 - train_loss: 2.674Step 2501/953240 - LR:0.0017 - train_loss: 2.634Step 2511/953240 - LR:0.0016 - train_loss: 2.700Step 2521/953240 - LR:0.0017 - train_loss: 2.598Step 2531/953240 - LR:0.0016 - train_loss: 2.645Step 2541/953240 - LR:0.0017 - train_loss: 2.607Step 2551/953240 - LR:0.0016 - train_loss: 2.625Step 2561/953240 - LR:0.0017 - train_loss: 2.580Step 2571/953240 - LR:0.0016 - train_loss: 2.607Step 2581/953240 - LR:0.0017 - train_loss: 2.617Step 2591/953240 - LR:0.0016 - train_loss: 2.614Step 2601/953240 - LR:0.0017 - train_loss: 2.536Step 2611/953240 - LR:0.0016 - train_loss: 2.619Step 2621/953240 - LR:0.0017 - train_loss: 2.538Step 2631/953240 - LR:0.0016 - train_loss: 2.620Step 2641/953240 - LR:0.0017 - train_loss: 2.552Step 2651/953240 - LR:0.0016 - train_loss: 2.601Step 2661/953240 - LR:0.0017 - train_loss: 2.550Step 2671/953240 - LR:0.0016 - train_loss: 2.588Step 2681/953240 - LR:0.0017 - train_loss: 2.517Step 2691/953240 - LR:0.0016 - train_loss: 2.622Step 2701/953240 - LR:0.0017 - train_loss: 2.537Step 2711/953240 - LR:0.0016 - train_loss: 2.560Step 2721/953240 - LR:0.0017 - train_loss: 2.528Step 2731/953240 - LR:0.0016 - train_loss: 2.542Step 2741/953240 - LR:0.0017 - train_loss: 2.527Step 2751/953240 - LR:0.0016 - train_loss: 2.616Step 2761/953240 - LR:0.0017 - train_loss: 2.519Step 2771/953240 - LR:0.0016 - train_loss: 2.567Step 2781/953240 - LR:0.0017 - train_loss: 2.536Step 2791/953240 - LR:0.0016 - train_loss: 2.526Step 2801/953240 - LR:0.0017 - train_loss: 2.490Step 2811/953240 - LR:0.0016 - train_loss: 2.528Step 2821/953240 - LR:0.0016 - train_loss: 2.525Step 2831/953240 - LR:0.0016 - train_loss: 2.598Step 2841/953240 - LR:0.0017 - train_loss: 2.527Step 2851/953240 - LR:0.0016 - train_loss: 2.561Step 2861/953240 - LR:0.0017 - train_loss: 2.494Step 2871/953240 - LR:0.0016 - train_loss: 2.533Step 2881/953240 - LR:0.0017 - train_loss: 2.510Step 2891/953240 - LR:0.0016 - train_loss: 2.495Step 2901/953240 - LR:0.0017 - train_loss: 2.489Step 2911/953240 - LR:0.0016 - train_loss: 2.490Step 2921/953240 - LR:0.0017 - train_loss: 2.462Step 2931/953240 - LR:0.0016 - train_loss: 2.457Step 2941/953240 - LR:0.0017 - train_loss: 2.465Step 2951/953240 - LR:0.0016 - train_loss: 2.505Step 2961/953240 - LR:0.0017 - train_loss: 2.474Step 2971/953240 - LR:0.0016 - train_loss: 2.493Step 
2981/953240 - LR:0.0016 - train_loss: 2.438Step 2991/953240 - LR:0.0016 - train_loss: 2.513Step 3001/953240 - LR:0.0017 - train_loss: 2.436Step 3011/953240 - LR:0.0016 - train_loss: 2.494Step 3021/953240 - LR:0.0017 - train_loss: 2.446Step 3031/953240 - LR:0.0016 - train_loss: 2.476Step 3041/953240 - LR:0.0017 - train_loss: 2.423Step 3051/953240 - LR:0.0016 - train_loss: 2.492Step 3061/953240 - LR:0.0017 - train_loss: 2.428Step 3071/953240 - LR:0.0016 - train_loss: 2.449Step 3081/953240 - LR:0.0017 - train_loss: 2.399Step 3091/953240 - LR:0.0016 - train_loss: 2.495Step 3101/953240 - LR:0.0017 - train_loss: 2.411Step 3111/953240 - LR:0.0016 - train_loss: 2.476Step 3121/953240 - LR:0.0017 - train_loss: 2.427Step 3131/953240 - LR:0.0016 - train_loss: 2.461Step 3141/953240 - LR:0.0017 - train_loss: 2.404Step 3151/953240 - LR:0.0016 - train_loss: 2.413Step 3161/953240 - LR:0.0017 - train_loss: 2.357Step 3171/953240 - LR:0.0016 - train_loss: 2.411Step 3181/953240 - LR:0.0017 - train_loss: 2.420Step 3191/953240 - LR:0.0016 - train_loss: 2.441Step 3201/953240 - LR:0.0016 - train_loss: 2.374Step 3211/953240 - LR:0.0016 - train_loss: 2.433Step 3221/953240 - LR:0.0017 - train_loss: 2.400Step 3231/953240 - LR:0.0016 - train_loss: 2.449Step 3241/953240 - LR:0.0017 - train_loss: 2.392Step 3251/953240 - LR:0.0016 - train_loss: 2.472Step 3261/953240 - LR:0.0017 - train_loss: 2.373Step 3271/953240 - LR:0.0016 - train_loss: 2.402Step 3281/953240 - LR:0.0017 - train_loss: 2.370Step 3291/953240 - LR:0.0016 - train_loss: 2.418Step 3301/953240 - LR:0.0017 - train_loss: 2.398Step 3311/953240 - LR:0.0016 - train_loss: 2.416Step 3321/953240 - LR:0.0017 - train_loss: 2.380Step 3331/953240 - LR:0.0016 - train_loss: 2.427Step 3341/953240 - LR:0.0017 - train_loss: 2.368Step 3351/953240 - LR:0.0016 - train_loss: 2.419Step 3361/953240 - LR:0.0016 - train_loss: 2.383Step 3371/953240 - LR:0.0016 - train_loss: 2.386Step 3381/953240 - LR:0.0017 - train_loss: 2.327Step 3391/953240 - LR:0.0016 - train_loss: 2.327Step 3401/953240 - LR:0.0017 - train_loss: 2.354Step 3411/953240 - LR:0.0016 - train_loss: 2.385Step 3421/953240 - LR:0.0017 - train_loss: 2.335Step 3431/953240 - LR:0.0016 - train_loss: 2.351Step 3441/953240 - LR:0.0017 - train_loss: 2.324Step 3451/953240 - LR:0.0016 - train_loss: 2.334Step 3461/953240 - LR:0.0017 - train_loss: 2.307Step 3471/953240 - LR:0.0016 - train_loss: 2.350Step 3481/953240 - LR:0.0017 - train_loss: 2.352Step 3491/953240 - LR:0.0016 - train_loss: 2.396Step 3501/953240 - LR:0.0017 - train_loss: 2.291Step 3511/953240 - LR:0.0016 - train_loss: 2.344Step 3521/953240 - LR:0.0017 - train_loss: 2.356Step 3531/953240 - LR:0.0016 - train_loss: 2.355Step 3541/953240 - LR:0.0017 - train_loss: 2.353Step 3551/953240 - LR:0.0016 - train_loss: 2.312Step 3561/953240 - LR:0.0017 - train_loss: 2.314Step 3571/953240 - LR:0.0016 - train_loss: 2.328Step 3581/953240 - LR:0.0016 - train_loss: 2.268Step 3591/953240 - LR:0.0016 - train_loss: 2.372Step 3601/953240 - LR:0.0017 - train_loss: 2.304Step 3611/953240 - LR:0.0016 - train_loss: 2.329Step 3621/953240 - LR:0.0017 - train_loss: 2.314Step 3631/953240 - LR:0.0016 - train_loss: 2.338Step 3641/953240 - LR:0.0017 - train_loss: 2.326Step 3651/953240 - LR:0.0016 - train_loss: 2.362Step 3661/953240 - LR:0.0017 - train_loss: 2.335Step 3671/953240 - LR:0.0016 - train_loss: 2.287Step 3681/953240 - LR:0.0017 - train_loss: 2.298Step 3691/953240 - LR:0.0016 - train_loss: 2.300Step 3701/953240 - LR:0.0017 - train_loss: 2.267Step 3711/953240 - LR:0.0016 - train_loss: 2.315Step 
3721/953240 - LR:0.0017 - train_loss: 2.245Step 3731/953240 - LR:0.0016 - train_loss: 2.292Step 3741/953240 - LR:0.0017 - train_loss: 2.305Step 3751/953240 - LR:0.0016 - train_loss: 2.354Step 3761/953240 - LR:0.0017 - train_loss: 2.295Step 3771/953240 - LR:0.0016 - train_loss: 2.267Step 3781/953240 - LR:0.0017 - train_loss: 2.260Step 3791/953240 - LR:0.0016 - train_loss: 2.316Step 3801/953240 - LR:0.0017 - train_loss: 2.250Step 3811/953240 - LR:0.0016 - train_loss: 2.315Step 3821/953240 - LR:0.0017 - train_loss: 2.231Step 3831/953240 - LR:0.0016 - train_loss: 2.294Step 3841/953240 - LR:0.0017 - train_loss: 2.287Step 3851/953240 - LR:0.0016 - train_loss: 2.291Step 3861/953240 - LR:0.0017 - train_loss: 2.243Step 3871/953240 - LR:0.0016 - train_loss: 2.281Step 3881/953240 - LR:0.0017 - train_loss: 2.271Step 3891/953240 - LR:0.0016 - train_loss: 2.290Step 3901/953240 - LR:0.0017 - train_loss: 2.272Step 3911/953240 - LR:0.0016 - train_loss: 2.320Step 3921/953240 - LR:0.0017 - train_loss: 2.257Step 3931/953240 - LR:0.0016 - train_loss: 2.260Step 3941/953240 - LR:0.0017 - train_loss: 2.283Step 3951/953240 - LR:0.0016 - train_loss: 2.292Step 3961/953240 - LR:0.0016 - train_loss: 2.250Step 3971/953240 - LR:0.0016 - train_loss: 2.290Step 3981/953240 - LR:0.0017 - train_loss: 2.221Step 3991/953240 - LR:0.0016 - train_loss: 2.250Step 4001/953240 - LR:0.0017 - train_loss: 2.258Step 4011/953240 - LR:0.0016 - train_loss: 2.265Step 4021/953240 - LR:0.0017 - train_loss: 2.228Step 4031/953240 - LR:0.0016 - train_loss: 2.238Step 4041/953240 - LR:0.0017 - train_loss: 2.242Step 4051/953240 - LR:0.0016 - train_loss: 2.275Step 4061/953240 - LR:0.0017 - train_loss: 2.208Step 4071/953240 - LR:0.0016 - train_loss: 2.268Step 4081/953240 - LR:0.0017 - train_loss: 2.252Step 4091/953240 - LR:0.0016 - train_loss: 2.231Step 4101/953240 - LR:0.0017 - train_loss: 2.205Step 4111/953240 - LR:0.0016 - train_loss: 2.270Step 4121/953240 - LR:0.0017 - train_loss: 2.217Step 4131/953240 - LR:0.0016 - train_loss: 2.225Step 4141/953240 - LR:0.0017 - train_loss: 2.201Step 4151/953240 - LR:0.0016 - train_loss: 2.223Step 4161/953240 - LR:0.0017 - train_loss: 2.177Step 4171/953240 - LR:0.0016 - train_loss: 2.249Step 4181/953240 - LR:0.0017 - train_loss: 2.201Step 4191/953240 - LR:0.0016 - train_loss: 2.258Step 4201/953240 - LR:0.0017 - train_loss: 2.169Step 4211/953240 - LR:0.0016 - train_loss: 2.244Step 4221/953240 - LR:0.0017 - train_loss: 2.199Step 4231/953240 - LR:0.0016 - train_loss: 2.225Step 4241/953240 - LR:0.0017 - train_loss: 2.191Step 4251/953240 - LR:0.0016 - train_loss: 2.224Step 4261/953240 - LR:0.0017 - train_loss: 2.196Step 4271/953240 - LR:0.0016 - train_loss: 2.219Step 4281/953240 - LR:0.0017 - train_loss: 2.171Step 4291/953240 - LR:0.0016 - train_loss: 2.212Step 4301/953240 - LR:0.0017 - train_loss: 2.147Step 4311/953240 - LR:0.0016 - train_loss: 2.225Step 4321/953240 - LR:0.0017 - train_loss: 2.197Step 4331/953240 - LR:0.0016 - train_loss: 2.208Step 4341/953240 - LR:0.0017 - train_loss: 2.233Step 4351/953240 - LR:0.0016 - train_loss: 2.250Step 4361/953240 - LR:0.0017 - train_loss: 2.229Step 4371/953240 - LR:0.0016 - train_loss: 2.206Step 4381/953240 - LR:0.0017 - train_loss: 2.151Step 4391/953240 - LR:0.0016 - train_loss: 2.215Step 4401/953240 - LR:0.0017 - train_loss: 2.154Step 4411/953240 - LR:0.0016 - train_loss: 2.197Step 4421/953240 - LR:0.0017 - train_loss: 2.159Step 4431/953240 - LR:0.0016 - train_loss: 2.149Step 4441/953240 - LR:0.0017 - train_loss: 2.188Step 4451/953240 - LR:0.0016 - train_loss: 2.187Step 
4461/953240 - LR:0.0017 - train_loss: 2.163Step 4471/953240 - LR:0.0016 - train_loss: 2.155Step 4481/953240 - LR:0.0017 - train_loss: 2.138Step 4491/953240 - LR:0.0016 - train_loss: 2.196Step 4501/953240 - LR:0.0017 - train_loss: 2.183Step 4511/953240 - LR:0.0016 - train_loss: 2.174Step 4521/953240 - LR:0.0017 - train_loss: 2.148Step 4531/953240 - LR:0.0016 - train_loss: 2.201Step 4541/953240 - LR:0.0017 - train_loss: 2.158Step 4551/953240 - LR:0.0016 - train_loss: 2.181Step 4561/953240 - LR:0.0017 - train_loss: 2.143Step 4571/953240 - LR:0.0016 - train_loss: 2.162Step 4581/953240 - LR:0.0017 - train_loss: 2.118Step 4591/953240 - LR:0.0016 - train_loss: 2.151Step 4601/953240 - LR:0.0017 - train_loss: 2.162Step 4611/953240 - LR:0.0016 - train_loss: 2.162Step 4621/953240 - LR:0.0017 - train_loss: 2.136Step 4631/953240 - LR:0.0016 - train_loss: 2.136Step 4641/953240 - LR:0.0017 - train_loss: 2.152Step 4651/953240 - LR:0.0016 - train_loss: 2.159Step 4661/953240 - LR:0.0017 - train_loss: 2.170Step 4671/953240 - LR:0.0016 - train_loss: 2.176Step 4681/953240 - LR:0.0017 - train_loss: 2.132Step 4691/953240 - LR:0.0016 - train_loss: 2.205Step 4701/953240 - LR:0.0017 - train_loss: 2.134Step 4711/953240 - LR:0.0016 - train_loss: 2.168Step 4721/953240 - LR:0.0017 - train_loss: 2.151Step 4731/953240 - LR:0.0016 - train_loss: 2.187Step 4741/953240 - LR:0.0017 - train_loss: 2.164Step 4751/953240 - LR:0.0016 - train_loss: 2.197Step 4761/953240 - LR:0.0017 - train_loss: 2.154Step 4771/953240 - LR:0.0016 - train_loss: 2.162Step 4781/953240 - LR:0.0017 - train_loss: 2.143Step 4791/953240 - LR:0.0016 - train_loss: 2.144Step 4801/953240 - LR:0.0017 - train_loss: 2.110Step 4811/953240 - LR:0.0016 - train_loss: 2.176Step 4821/953240 - LR:0.0017 - train_loss: 2.140Step 4831/953240 - LR:0.0016 - train_loss: 2.186Step 4841/953240 - LR:0.0017 - train_loss: 2.107Step 4851/953240 - LR:0.0016 - train_loss: 2.151Step 4861/953240 - LR:0.0017 - train_loss: 2.136Step 4871/953240 - LR:0.0016 - train_loss: 2.129Step 4881/953240 - LR:0.0017 - train_loss: 2.114Step 4891/953240 - LR:0.0016 - train_loss: 2.139Step 4901/953240 - LR:0.0017 - train_loss: 2.122Step 4911/953240 - LR:0.0016 - train_loss: 2.145Step 4921/953240 - LR:0.0017 - train_loss: 2.120Step 4931/953240 - LR:0.0016 - train_loss: 2.136Step 4941/953240 - LR:0.0017 - train_loss: 2.107Step 4951/953240 - LR:0.0016 - train_loss: 2.117Step 4961/953240 - LR:0.0017 - train_loss: 2.077Step 4971/953240 - LR:0.0016 - train_loss: 2.143Step 4981/953240 - LR:0.0017 - train_loss: 2.086Step 4991/953240 - LR:0.0016 - train_loss: 2.135Step 5001/953240 - LR:0.0017 - train_loss: 2.154
Epoch 2/2:
Step 1/953240 - LR:0.0003 - train_loss: 2.119Step 11/953240 - LR:0.0030 - train_loss: 2.098Step 21/953240 - LR:0.0003 - train_loss: 2.148Step 31/953240 - LR:0.0030 - train_loss: 2.097Step 41/953240 - LR:0.0003 - train_loss: 2.110Step 51/953240 - LR:0.0030 - train_loss: 2.118Step 61/953240 - LR:0.0003 - train_loss: 2.149Step 71/953240 - LR:0.0030 - train_loss: 2.152Step 81/953240 - LR:0.0003 - train_loss: 2.126Step 91/953240 - LR:0.0030 - train_loss: 2.129Step 101/953240 - LR:0.0003 - train_loss: 2.105Step 111/953240 - LR:0.0030 - train_loss: 2.099Step 121/953240 - LR:0.0003 - train_loss: 2.127Step 131/953240 - LR:0.0030 - train_loss: 2.129Step 141/953240 - LR:0.0003 - train_loss: 2.103Step 151/953240 - LR:0.0030 - train_loss: 2.131Step 161/953240 - LR:0.0003 - train_loss: 2.125Step 171/953240 - LR:0.0030 - train_loss: 2.149Step 181/953240 - LR:0.0003 - train_loss: 2.096Step 191/953240 - LR:0.0030 - train_loss: 2.128Step 201/953240 - LR:0.0003 - train_loss: 2.118Step 211/953240 - LR:0.0030 - train_loss: 2.122Step 221/953240 - LR:0.0003 - train_loss: 2.132Step 231/953240 - LR:0.0030 - train_loss: 2.063Step 241/953240 - LR:0.0003 - train_loss: 2.137Step 251/953240 - LR:0.0030 - train_loss: 2.104Step 261/953240 - LR:0.0003 - train_loss: 2.102Step 271/953240 - LR:0.0030 - train_loss: 2.092Step 281/953240 - LR:0.0003 - train_loss: 2.107Step 291/953240 - LR:0.0030 - train_loss: 2.109Step 301/953240 - LR:0.0003 - train_loss: 2.090Step 311/953240 - LR:0.0030 - train_loss: 2.159Step 321/953240 - LR:0.0003 - train_loss: 2.095Step 331/953240 - LR:0.0030 - train_loss: 2.088Step 341/953240 - LR:0.0003 - train_loss: 2.102Step 351/953240 - LR:0.0030 - train_loss: 2.135Step 361/953240 - LR:0.0003 - train_loss: 2.110Step 371/953240 - LR:0.0030 - train_loss: 2.083Step 381/953240 - LR:0.0003 - train_loss: 2.091Step 391/953240 - LR:0.0030 - train_loss: 2.102Step 401/953240 - LR:0.0003 - train_loss: 2.100Step 411/953240 - LR:0.0030 - train_loss: 2.107Step 421/953240 - LR:0.0003 - train_loss: 2.107Step 431/953240 - LR:0.0030 - train_loss: 2.118Step 441/953240 - LR:0.0003 - train_loss: 2.096Step 451/953240 - LR:0.0030 - train_loss: 2.110Step 461/953240 - LR:0.0003 - train_loss: 2.055Step 471/953240 - LR:0.0030 - train_loss: 2.082Step 481/953240 - LR:0.0003 - train_loss: 2.110Step 491/953240 - LR:0.0030 - train_loss: 2.124Step 501/953240 - LR:0.0003 - train_loss: 2.109Step 511/953240 - LR:0.0030 - train_loss: 2.108Step 521/953240 - LR:0.0003 - train_loss: 2.095Step 531/953240 - LR:0.0030 - train_loss: 2.086Step 541/953240 - LR:0.0003 - train_loss: 2.065Step 551/953240 - LR:0.0030 - train_loss: 2.091Step 561/953240 - LR:0.0003 - train_loss: 2.105Step 571/953240 - LR:0.0030 - train_loss: 2.090Step 581/953240 - LR:0.0003 - train_loss: 2.073Step 591/953240 - LR:0.0030 - train_loss: 2.064Step 601/953240 - LR:0.0003 - train_loss: 2.068Step 611/953240 - LR:0.0030 - train_loss: 2.063Step 621/953240 - LR:0.0003 - train_loss: 2.098Step 631/953240 - LR:0.0030 - train_loss: 2.078Step 641/953240 - LR:0.0003 - train_loss: 2.063Step 651/953240 - LR:0.0030 - train_loss: 2.086Step 661/953240 - LR:0.0003 - train_loss: 2.091Step 671/953240 - LR:0.0030 - train_loss: 2.079Step 681/953240 - LR:0.0003 - train_loss: 2.101Step 691/953240 - LR:0.0030 - train_loss: 2.079Step 701/953240 - LR:0.0003 - train_loss: 2.068Step 711/953240 - LR:0.0030 - train_loss: 2.048Step 721/953240 - LR:0.0003 - train_loss: 2.052Step 731/953240 - LR:0.0030 - train_loss: 2.068Step 741/953240 - LR:0.0003 - train_loss: 2.070Step 751/953240 - LR:0.0030 - 
train_loss: 2.052Step 761/953240 - LR:0.0003 - train_loss: 2.098Step 771/953240 - LR:0.0030 - train_loss: 2.120Step 781/953240 - LR:0.0003 - train_loss: 2.093Step 791/953240 - LR:0.0030 - train_loss: 2.065Step 801/953240 - LR:0.0003 - train_loss: 2.081Step 811/953240 - LR:0.0030 - train_loss: 2.057Step 821/953240 - LR:0.0003 - train_loss: 2.078Step 831/953240 - LR:0.0030 - train_loss: 2.048Step 841/953240 - LR:0.0003 - train_loss: 2.084Step 851/953240 - LR:0.0030 - train_loss: 2.056Step 861/953240 - LR:0.0003 - train_loss: 2.086Step 871/953240 - LR:0.0030 - train_loss: 2.068Step 881/953240 - LR:0.0003 - train_loss: 2.069Step 891/953240 - LR:0.0030 - train_loss: 2.052Step 901/953240 - LR:0.0003 - train_loss: 2.038Step 911/953240 - LR:0.0030 - train_loss: 2.056Step 921/953240 - LR:0.0003 - train_loss: 2.074Step 931/953240 - LR:0.0030 - train_loss: 2.077Step 941/953240 - LR:0.0003 - train_loss: 2.088Step 951/953240 - LR:0.0030 - train_loss: 2.096Step 961/953240 - LR:0.0003 - train_loss: 2.066Step 971/953240 - LR:0.0030 - train_loss: 2.080Step 981/953240 - LR:0.0003 - train_loss: 2.051Step 991/953240 - LR:0.0030 - train_loss: 2.047Step 1001/953240 - LR:0.0003 - train_loss: 2.061Step 1011/953240 - LR:0.0030 - train_loss: 2.088Step 1021/953240 - LR:0.0003 - train_loss: 2.092Step 1031/953240 - LR:0.0030 - train_loss: 2.066Step 1041/953240 - LR:0.0003 - train_loss: 2.061Step 1051/953240 - LR:0.0030 - train_loss: 2.057Step 1061/953240 - LR:0.0003 - train_loss: 2.042Step 1071/953240 - LR:0.0030 - train_loss: 2.012Step 1081/953240 - LR:0.0003 - train_loss: 2.070Step 1091/953240 - LR:0.0030 - train_loss: 2.048Step 1101/953240 - LR:0.0003 - train_loss: 2.037Step 1111/953240 - LR:0.0030 - train_loss: 2.058Step 1121/953240 - LR:0.0003 - train_loss: 2.054Step 1131/953240 - LR:0.0030 - train_loss: 2.019Step 1141/953240 - LR:0.0003 - train_loss: 2.052Step 1151/953240 - LR:0.0030 - train_loss: 2.027Step 1161/953240 - LR:0.0003 - train_loss: 2.015Step 1171/953240 - LR:0.0030 - train_loss: 2.016Step 1181/953240 - LR:0.0003 - train_loss: 2.035Step 1191/953240 - LR:0.0030 - train_loss: 2.044Step 1201/953240 - LR:0.0003 - train_loss: 2.049Step 1211/953240 - LR:0.0030 - train_loss: 2.035Step 1221/953240 - LR:0.0003 - train_loss: 2.058Step 1231/953240 - LR:0.0030 - train_loss: 2.022Step 1241/953240 - LR:0.0003 - train_loss: 2.032Step 1251/953240 - LR:0.0030 - train_loss: 2.022Step 1261/953240 - LR:0.0003 - train_loss: 2.024Step 1271/953240 - LR:0.0030 - train_loss: 2.027Step 1281/953240 - LR:0.0003 - train_loss: 2.042Step 1291/953240 - LR:0.0030 - train_loss: 2.078Step 1301/953240 - LR:0.0003 - train_loss: 2.065Step 1311/953240 - LR:0.0030 - train_loss: 2.039Step 1321/953240 - LR:0.0003 - train_loss: 2.041Step 1331/953240 - LR:0.0030 - train_loss: 2.065Step 1341/953240 - LR:0.0003 - train_loss: 2.012Step 1351/953240 - LR:0.0030 - train_loss: 2.035Step 1361/953240 - LR:0.0003 - train_loss: 2.023Step 1371/953240 - LR:0.0030 - train_loss: 2.031Step 1381/953240 - LR:0.0003 - train_loss: 2.045Step 1391/953240 - LR:0.0030 - train_loss: 2.041Step 1401/953240 - LR:0.0003 - train_loss: 2.026Step 1411/953240 - LR:0.0030 - train_loss: 2.024Step 1421/953240 - LR:0.0003 - train_loss: 2.025Step 1431/953240 - LR:0.0030 - train_loss: 2.048Step 1441/953240 - LR:0.0003 - train_loss: 2.055Step 1451/953240 - LR:0.0030 - train_loss: 2.066Step 1461/953240 - LR:0.0003 - train_loss: 2.039Step 1471/953240 - LR:0.0030 - train_loss: 2.072Step 1481/953240 - LR:0.0003 - train_loss: 2.059Step 1491/953240 - LR:0.0030 - train_loss: 2.021Step 
1501/953240 - LR:0.0003 - train_loss: 2.006Step 1511/953240 - LR:0.0030 - train_loss: 2.021Step 1521/953240 - LR:0.0003 - train_loss: 2.038Step 1531/953240 - LR:0.0030 - train_loss: 2.036Step 1541/953240 - LR:0.0003 - train_loss: 2.014Step 1551/953240 - LR:0.0030 - train_loss: 2.026Step 1561/953240 - LR:0.0003 - train_loss: 2.026Step 1571/953240 - LR:0.0030 - train_loss: 2.027Step 1581/953240 - LR:0.0003 - train_loss: 2.018Step 1591/953240 - LR:0.0030 - train_loss: 2.002Step 1601/953240 - LR:0.0003 - train_loss: 2.044Step 1611/953240 - LR:0.0030 - train_loss: 2.020Step 1621/953240 - LR:0.0003 - train_loss: 2.030Step 1631/953240 - LR:0.0030 - train_loss: 2.034Step 1641/953240 - LR:0.0003 - train_loss: 2.012Step 1651/953240 - LR:0.0030 - train_loss: 2.014Step 1661/953240 - LR:0.0003 - train_loss: 2.023Step 1671/953240 - LR:0.0030 - train_loss: 1.993Step 1681/953240 - LR:0.0003 - train_loss: 1.995Step 1691/953240 - LR:0.0030 - train_loss: 2.024Step 1701/953240 - LR:0.0003 - train_loss: 2.013Step 1711/953240 - LR:0.0030 - train_loss: 2.014Step 1721/953240 - LR:0.0003 - train_loss: 2.036Step 1731/953240 - LR:0.0030 - train_loss: 2.015Step 1741/953240 - LR:0.0003 - train_loss: 1.998Step 1751/953240 - LR:0.0030 - train_loss: 1.989Step 1761/953240 - LR:0.0003 - train_loss: 2.038Step 1771/953240 - LR:0.0030 - train_loss: 2.059Step 1781/953240 - LR:0.0003 - train_loss: 2.038Step 1791/953240 - LR:0.0030 - train_loss: 2.007Step 1801/953240 - LR:0.0003 - train_loss: 1.986Step 1811/953240 - LR:0.0030 - train_loss: 2.046Step 1821/953240 - LR:0.0003 - train_loss: 1.999Step 1831/953240 - LR:0.0030 - train_loss: 1.994Step 1841/953240 - LR:0.0003 - train_loss: 2.021Step 1851/953240 - LR:0.0030 - train_loss: 2.008Step 1861/953240 - LR:0.0003 - train_loss: 2.026Step 1871/953240 - LR:0.0030 - train_loss: 2.009Step 1881/953240 - LR:0.0003 - train_loss: 1.947Step 1891/953240 - LR:0.0030 - train_loss: 2.013Step 1901/953240 - LR:0.0003 - train_loss: 1.992Step 1911/953240 - LR:0.0030 - train_loss: 2.010Step 1921/953240 - LR:0.0003 - train_loss: 2.010Step 1931/953240 - LR:0.0030 - train_loss: 2.021Step 1941/953240 - LR:0.0003 - train_loss: 2.016Step 1951/953240 - LR:0.0030 - train_loss: 2.015Step 1961/953240 - LR:0.0003 - train_loss: 2.013Step 1971/953240 - LR:0.0030 - train_loss: 2.044Step 1981/953240 - LR:0.0003 - train_loss: 1.972Step 1991/953240 - LR:0.0030 - train_loss: 2.027Step 2001/953240 - LR:0.0003 - train_loss: 2.021Step 2011/953240 - LR:0.0030 - train_loss: 1.992Step 2021/953240 - LR:0.0003 - train_loss: 2.003Step 2031/953240 - LR:0.0030 - train_loss: 2.030Step 2041/953240 - LR:0.0003 - train_loss: 2.034Step 2051/953240 - LR:0.0030 - train_loss: 2.027Step 2061/953240 - LR:0.0003 - train_loss: 2.020Step 2071/953240 - LR:0.0030 - train_loss: 2.007Step 2081/953240 - LR:0.0003 - train_loss: 2.011Step 2091/953240 - LR:0.0030 - train_loss: 1.979Step 2101/953240 - LR:0.0003 - train_loss: 1.996Step 2111/953240 - LR:0.0030 - train_loss: 2.013Step 2121/953240 - LR:0.0003 - train_loss: 2.000Step 2131/953240 - LR:0.0030 - train_loss: 1.987Step 2141/953240 - LR:0.0003 - train_loss: 1.994Step 2151/953240 - LR:0.0030 - train_loss: 2.008Step 2161/953240 - LR:0.0003 - train_loss: 1.997Step 2171/953240 - LR:0.0030 - train_loss: 2.007Step 2181/953240 - LR:0.0003 - train_loss: 1.989Step 2191/953240 - LR:0.0030 - train_loss: 1.955Step 2201/953240 - LR:0.0003 - train_loss: 2.008Step 2211/953240 - LR:0.0030 - train_loss: 2.032Step 2221/953240 - LR:0.0003 - train_loss: 1.996Step 2231/953240 - LR:0.0030 - train_loss: 2.009Step 
2241/953240 - LR:0.0003 - train_loss: 2.071Step 2251/953240 - LR:0.0030 - train_loss: 1.998Step 2261/953240 - LR:0.0003 - train_loss: 2.013Step 2271/953240 - LR:0.0030 - train_loss: 1.990Step 2281/953240 - LR:0.0003 - train_loss: 2.034Step 2291/953240 - LR:0.0030 - train_loss: 1.991Step 2301/953240 - LR:0.0003 - train_loss: 1.992Step 2311/953240 - LR:0.0030 - train_loss: 2.020Step 2321/953240 - LR:0.0003 - train_loss: 1.986Step 2331/953240 - LR:0.0030 - train_loss: 1.979Step 2341/953240 - LR:0.0003 - train_loss: 1.983Step 2351/953240 - LR:0.0030 - train_loss: 1.986Step 2361/953240 - LR:0.0003 - train_loss: 2.002Step 2371/953240 - LR:0.0030 - train_loss: 1.958Step 2381/953240 - LR:0.0003 - train_loss: 1.983Step 2391/953240 - LR:0.0030 - train_loss: 1.964Step 2401/953240 - LR:0.0003 - train_loss: 1.988Step 2411/953240 - LR:0.0030 - train_loss: 1.988Step 2421/953240 - LR:0.0003 - train_loss: 1.990Step 2431/953240 - LR:0.0030 - train_loss: 1.990Step 2441/953240 - LR:0.0003 - train_loss: 1.991Step 2451/953240 - LR:0.0030 - train_loss: 1.983Step 2461/953240 - LR:0.0003 - train_loss: 2.002Step 2471/953240 - LR:0.0030 - train_loss: 1.975Step 2481/953240 - LR:0.0003 - train_loss: 1.960Step 2491/953240 - LR:0.0030 - train_loss: 1.981Step 2501/953240 - LR:0.0003 - train_loss: 1.980Step 2511/953240 - LR:0.0030 - train_loss: 1.976Step 2521/953240 - LR:0.0003 - train_loss: 1.972Step 2531/953240 - LR:0.0030 - train_loss: 1.974Step 2541/953240 - LR:0.0003 - train_loss: 1.966Step 2551/953240 - LR:0.0030 - train_loss: 2.000Step 2561/953240 - LR:0.0003 - train_loss: 1.984Step 2571/953240 - LR:0.0030 - train_loss: 2.012Step 2581/953240 - LR:0.0003 - train_loss: 1.970Step 2591/953240 - LR:0.0030 - train_loss: 1.964Step 2601/953240 - LR:0.0003 - train_loss: 2.011Step 2611/953240 - LR:0.0030 - train_loss: 1.986Step 2621/953240 - LR:0.0003 - train_loss: 1.992Step 2631/953240 - LR:0.0030 - train_loss: 1.945Step 2641/953240 - LR:0.0003 - train_loss: 1.953Step 2651/953240 - LR:0.0030 - train_loss: 1.992Step 2661/953240 - LR:0.0003 - train_loss: 1.972Step 2671/953240 - LR:0.0030 - train_loss: 1.972Step 2681/953240 - LR:0.0003 - train_loss: 1.969Step 2691/953240 - LR:0.0030 - train_loss: 1.967Step 2701/953240 - LR:0.0003 - train_loss: 1.959Step 2711/953240 - LR:0.0030 - train_loss: 1.987Step 2721/953240 - LR:0.0003 - train_loss: 2.006Step 2731/953240 - LR:0.0030 - train_loss: 1.962Step 2741/953240 - LR:0.0003 - train_loss: 1.945Step 2751/953240 - LR:0.0030 - train_loss: 1.953Step 2761/953240 - LR:0.0003 - train_loss: 1.956Step 2771/953240 - LR:0.0030 - train_loss: 1.957Step 2781/953240 - LR:0.0003 - train_loss: 1.976Step 2791/953240 - LR:0.0030 - train_loss: 1.977Step 2801/953240 - LR:0.0003 - train_loss: 1.939Step 2811/953240 - LR:0.0030 - train_loss: 1.996Step 2821/953240 - LR:0.0003 - train_loss: 1.986Step 2831/953240 - LR:0.0030 - train_loss: 1.996Step 2841/953240 - LR:0.0003 - train_loss: 1.979Step 2851/953240 - LR:0.0030 - train_loss: 1.976Step 2861/953240 - LR:0.0003 - train_loss: 1.943Step 2871/953240 - LR:0.0030 - train_loss: 1.968Step 2881/953240 - LR:0.0003 - train_loss: 2.000Step 2891/953240 - LR:0.0030 - train_loss: 1.988Step 2901/953240 - LR:0.0003 - train_loss: 1.971Step 2911/953240 - LR:0.0030 - train_loss: 1.945Step 2921/953240 - LR:0.0003 - train_loss: 1.958Step 2931/953240 - LR:0.0030 - train_loss: 1.975Step 2941/953240 - LR:0.0003 - train_loss: 1.979Step 2951/953240 - LR:0.0030 - train_loss: 1.960Step 2961/953240 - LR:0.0003 - train_loss: 1.977Step 2971/953240 - LR:0.0030 - train_loss: 1.978Step 
2981/953240 - LR:0.0003 - train_loss: 1.970Step 2991/953240 - LR:0.0030 - train_loss: 1.951Step 3001/953240 - LR:0.0003 - train_loss: 1.971Step 3011/953240 - LR:0.0030 - train_loss: 1.961Step 3021/953240 - LR:0.0003 - train_loss: 1.933Step 3031/953240 - LR:0.0030 - train_loss: 1.959Step 3041/953240 - LR:0.0003 - train_loss: 1.967Step 3051/953240 - LR:0.0030 - train_loss: 1.996Step 3061/953240 - LR:0.0003 - train_loss: 1.972Step 3071/953240 - LR:0.0030 - train_loss: 1.961Step 3081/953240 - LR:0.0003 - train_loss: 1.970Step 3091/953240 - LR:0.0030 - train_loss: 1.973Step 3101/953240 - LR:0.0003 - train_loss: 1.969Step 3111/953240 - LR:0.0030 - train_loss: 1.961Step 3121/953240 - LR:0.0003 - train_loss: 1.967Step 3131/953240 - LR:0.0030 - train_loss: 1.962Step 3141/953240 - LR:0.0003 - train_loss: 1.975Step 3151/953240 - LR:0.0030 - train_loss: 1.964Step 3161/953240 - LR:0.0003 - train_loss: 1.973Step 3171/953240 - LR:0.0030 - train_loss: 1.961Step 3181/953240 - LR:0.0003 - train_loss: 1.956Step 3191/953240 - LR:0.0030 - train_loss: 1.986Step 3201/953240 - LR:0.0003 - train_loss: 1.975Step 3211/953240 - LR:0.0030 - train_loss: 1.934Step 3221/953240 - LR:0.0003 - train_loss: 1.966Step 3231/953240 - LR:0.0030 - train_loss: 1.946Step 3241/953240 - LR:0.0003 - train_loss: 1.976Step 3251/953240 - LR:0.0030 - train_loss: 1.978Step 3261/953240 - LR:0.0003 - train_loss: 1.945Step 3271/953240 - LR:0.0030 - train_loss: 1.981Step 3281/953240 - LR:0.0003 - train_loss: 1.950Step 3291/953240 - LR:0.0030 - train_loss: 1.950Step 3301/953240 - LR:0.0003 - train_loss: 1.942Step 3311/953240 - LR:0.0030 - train_loss: 1.957Step 3321/953240 - LR:0.0003 - train_loss: 1.953Step 3331/953240 - LR:0.0030 - train_loss: 1.948Step 3341/953240 - LR:0.0003 - train_loss: 1.945Step 3351/953240 - LR:0.0030 - train_loss: 1.976Step 3361/953240 - LR:0.0003 - train_loss: 1.960Step 3371/953240 - LR:0.0030 - train_loss: 1.954Step 3381/953240 - LR:0.0003 - train_loss: 1.948Step 3391/953240 - LR:0.0030 - train_loss: 1.957Step 3401/953240 - LR:0.0003 - train_loss: 1.971Step 3411/953240 - LR:0.0030 - train_loss: 1.985Step 3421/953240 - LR:0.0003 - train_loss: 1.936Step 3431/953240 - LR:0.0030 - train_loss: 1.923Step 3441/953240 - LR:0.0003 - train_loss: 1.993Step 3451/953240 - LR:0.0030 - train_loss: 1.963Step 3461/953240 - LR:0.0003 - train_loss: 1.920Step 3471/953240 - LR:0.0030 - train_loss: 1.965Step 3481/953240 - LR:0.0003 - train_loss: 1.940Step 3491/953240 - LR:0.0030 - train_loss: 1.948Step 3501/953240 - LR:0.0003 - train_loss: 1.953Step 3511/953240 - LR:0.0030 - train_loss: 1.983Step 3521/953240 - LR:0.0003 - train_loss: 1.941Step 3531/953240 - LR:0.0030 - train_loss: 1.975Step 3541/953240 - LR:0.0003 - train_loss: 1.952Step 3551/953240 - LR:0.0030 - train_loss: 1.914Step 3561/953240 - LR:0.0003 - train_loss: 1.940Step 3571/953240 - LR:0.0030 - train_loss: 1.956Step 3581/953240 - LR:0.0003 - train_loss: 1.977Step 3591/953240 - LR:0.0030 - train_loss: 1.954Step 3601/953240 - LR:0.0003 - train_loss: 1.966Step 3611/953240 - LR:0.0030 - train_loss: 1.967Step 3621/953240 - LR:0.0003 - train_loss: 1.962Step 3631/953240 - LR:0.0030 - train_loss: 1.968Step 3641/953240 - LR:0.0003 - train_loss: 1.953Step 3651/953240 - LR:0.0030 - train_loss: 1.954Step 3661/953240 - LR:0.0003 - train_loss: 1.972Step 3671/953240 - LR:0.0030 - train_loss: 1.944Step 3681/953240 - LR:0.0003 - train_loss: 1.932Step 3691/953240 - LR:0.0030 - train_loss: 1.942Step 3701/953240 - LR:0.0003 - train_loss: 1.908Step 3711/953240 - LR:0.0030 - train_loss: 1.966Step 
3721/953240 - LR:0.0003 - train_loss: 1.916Step 3731/953240 - LR:0.0030 - train_loss: 1.957Step 3741/953240 - LR:0.0003 - train_loss: 1.938Step 3751/953240 - LR:0.0030 - train_loss: 1.958Step 3761/953240 - LR:0.0003 - train_loss: 1.958Step 3771/953240 - LR:0.0030 - train_loss: 1.969Step 3781/953240 - LR:0.0003 - train_loss: 1.925Step 3791/953240 - LR:0.0030 - train_loss: 1.963Step 3801/953240 - LR:0.0003 - train_loss: 1.946Step 3811/953240 - LR:0.0030 - train_loss: 1.958Step 3821/953240 - LR:0.0003 - train_loss: 1.939Step 3831/953240 - LR:0.0030 - train_loss: 1.930Step 3841/953240 - LR:0.0003 - train_loss: 1.952Step 3851/953240 - LR:0.0030 - train_loss: 1.941Step 3861/953240 - LR:0.0003 - train_loss: 1.933Step 3871/953240 - LR:0.0030 - train_loss: 1.928Step 3881/953240 - LR:0.0003 - train_loss: 1.963Step 3891/953240 - LR:0.0030 - train_loss: 1.972Step 3901/953240 - LR:0.0003 - train_loss: 1.943Step 3911/953240 - LR:0.0030 - train_loss: 1.920Step 3921/953240 - LR:0.0003 - train_loss: 1.947Step 3931/953240 - LR:0.0030 - train_loss: 1.946Step 3941/953240 - LR:0.0003 - train_loss: 1.960Step 3951/953240 - LR:0.0030 - train_loss: 1.940Step 3961/953240 - LR:0.0003 - train_loss: 1.974Step 3971/953240 - LR:0.0030 - train_loss: 1.950Step 3981/953240 - LR:0.0003 - train_loss: 1.960Step 3991/953240 - LR:0.0030 - train_loss: 1.934Step 4001/953240 - LR:0.0003 - train_loss: 1.955Step 4011/953240 - LR:0.0030 - train_loss: 1.949Step 4021/953240 - LR:0.0003 - train_loss: 1.936Step 4031/953240 - LR:0.0030 - train_loss: 1.947Step 4041/953240 - LR:0.0003 - train_loss: 1.954Step 4051/953240 - LR:0.0030 - train_loss: 1.956Step 4061/953240 - LR:0.0003 - train_loss: 1.932Step 4071/953240 - LR:0.0030 - train_loss: 1.943Step 4081/953240 - LR:0.0003 - train_loss: 1.945Step 4091/953240 - LR:0.0030 - train_loss: 1.934Step 4101/953240 - LR:0.0003 - train_loss: 1.955Step 4111/953240 - LR:0.0030 - train_loss: 1.975Step 4121/953240 - LR:0.0003 - train_loss: 1.980Step 4131/953240 - LR:0.0030 - train_loss: 1.959Step 4141/953240 - LR:0.0003 - train_loss: 1.924Step 4151/953240 - LR:0.0030 - train_loss: 1.980Step 4161/953240 - LR:0.0003 - train_loss: 1.937Step 4171/953240 - LR:0.0030 - train_loss: 1.935Step 4181/953240 - LR:0.0003 - train_loss: 1.936Step 4191/953240 - LR:0.0030 - train_loss: 1.948Step 4201/953240 - LR:0.0003 - train_loss: 1.944Step 4211/953240 - LR:0.0030 - train_loss: 1.940Step 4221/953240 - LR:0.0003 - train_loss: 1.948Step 4231/953240 - LR:0.0030 - train_loss: 1.946Step 4241/953240 - LR:0.0003 - train_loss: 1.912Step 4251/953240 - LR:0.0030 - train_loss: 1.946Step 4261/953240 - LR:0.0003 - train_loss: 1.919Step 4271/953240 - LR:0.0030 - train_loss: 1.933Step 4281/953240 - LR:0.0003 - train_loss: 1.950Step 4291/953240 - LR:0.0030 - train_loss: 1.948Step 4301/953240 - LR:0.0003 - train_loss: 1.917Step 4311/953240 - LR:0.0030 - train_loss: 1.936Step 4321/953240 - LR:0.0003 - train_loss: 1.945Step 4331/953240 - LR:0.0030 - train_loss: 1.967Step 4341/953240 - LR:0.0003 - train_loss: 1.909Step 4351/953240 - LR:0.0030 - train_loss: 1.931Step 4361/953240 - LR:0.0003 - train_loss: 1.940Step 4371/953240 - LR:0.0030 - train_loss: 1.963Step 4381/953240 - LR:0.0003 - train_loss: 1.947Step 4391/953240 - LR:0.0030 - train_loss: 1.918Step 4401/953240 - LR:0.0003 - train_loss: 1.946Step 4411/953240 - LR:0.0030 - train_loss: 1.949Step 4421/953240 - LR:0.0003 - train_loss: 1.935Step 4431/953240 - LR:0.0030 - train_loss: 1.969Step 4441/953240 - LR:0.0003 - train_loss: 1.957Step 4451/953240 - LR:0.0030 - train_loss: 1.959Step 
4461/953240 - LR:0.0003 - train_loss: 1.907Step 4471/953240 - LR:0.0030 - train_loss: 1.937Step 4481/953240 - LR:0.0003 - train_loss: 1.908Step 4491/953240 - LR:0.0030 - train_loss: 1.930Step 4501/953240 - LR:0.0003 - train_loss: 1.941Step 4511/953240 - LR:0.0030 - train_loss: 1.942Step 4521/953240 - LR:0.0003 - train_loss: 1.936Step 4531/953240 - LR:0.0030 - train_loss: 1.907Step 4541/953240 - LR:0.0003 - train_loss: 1.874Step 4551/953240 - LR:0.0030 - train_loss: 1.909Step 4561/953240 - LR:0.0003 - train_loss: 1.896Step 4571/953240 - LR:0.0030 - train_loss: 1.904Step 4581/953240 - LR:0.0003 - train_loss: 1.906Step 4591/953240 - LR:0.0030 - train_loss: 1.887Step 4601/953240 - LR:0.0003 - train_loss: 1.949Step 4611/953240 - LR:0.0030 - train_loss: 1.898Step 4621/953240 - LR:0.0003 - train_loss: 1.933Step 4631/953240 - LR:0.0030 - train_loss: 1.963Step 4641/953240 - LR:0.0003 - train_loss: 1.922Step 4651/953240 - LR:0.0030 - train_loss: 1.933Step 4661/953240 - LR:0.0003 - train_loss: 1.894Step 4671/953240 - LR:0.0030 - train_loss: 1.952Step 4681/953240 - LR:0.0003 - train_loss: 1.927Step 4691/953240 - LR:0.0030 - train_loss: 1.907Step 4701/953240 - LR:0.0003 - train_loss: 1.922Step 4711/953240 - LR:0.0030 - train_loss: 1.974Step 4721/953240 - LR:0.0003 - train_loss: 1.946Step 4731/953240 - LR:0.0030 - train_loss: 1.929Step 4741/953240 - LR:0.0003 - train_loss: 1.942Step 4751/953240 - LR:0.0030 - train_loss: 1.939Step 4761/953240 - LR:0.0003 - train_loss: 1.964Step 4771/953240 - LR:0.0030 - train_loss: 1.957Step 4781/953240 - LR:0.0003 - train_loss: 1.960Step 4791/953240 - LR:0.0030 - train_loss: 1.948Step 4801/953240 - LR:0.0003 - train_loss: 1.927Step 4811/953240 - LR:0.0030 - train_loss: 1.919Step 4821/953240 - LR:0.0003 - train_loss: 1.941Step 4831/953240 - LR:0.0030 - train_loss: 1.932Step 4841/953240 - LR:0.0003 - train_loss: 1.960Step 4851/953240 - LR:0.0030 - train_loss: 1.924Step 4861/953240 - LR:0.0003 - train_loss: 1.951Step 4871/953240 - LR:0.0030 - train_loss: 1.939Step 4881/953240 - LR:0.0003 - train_loss: 1.957Step 4891/953240 - LR:0.0030 - train_loss: 1.914Step 4901/953240 - LR:0.0003 - train_loss: 1.926Step 4911/953240 - LR:0.0030 - train_loss: 1.943Step 4921/953240 - LR:0.0003 - train_loss: 1.917Step 4931/953240 - LR:0.0030 - train_loss: 1.935Step 4941/953240 - LR:0.0003 - train_loss: 1.908Step 4951/953240 - LR:0.0030 - train_loss: 1.923Step 4961/953240 - LR:0.0003 - train_loss: 1.957Step 4971/953240 - LR:0.0030 - train_loss: 1.961Step 4981/953240 - LR:0.0003 - train_loss: 1.957Step 4991/953240 - LR:0.0030 - train_loss: 1.912Step 5001/953240 - LR:0.0003 - train_loss: 1.908
exit train!
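The step entries in the two epoch blocks above run together because the trainer evidently rewrites a single status line in place every 10 steps, so a captured log concatenates the updates; a minimal sketch of that logging pattern, with all names assumed:

    import sys

    def log_step(step: int, total_steps: int, lr: float, loss: float) -> None:
        # '\r' returns the cursor to column 0, so each call overwrites the
        # previous status on a terminal; redirected to a file, the entries
        # simply run together as seen above
        sys.stdout.write(f"\rStep {step}/{total_steps} - LR:{lr:.4f} - train_loss: {loss:.3f}")
        sys.stdout.flush()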
Once upon a time, Tom is very hungry, but he doesn't help the jug in need. One day, a foil came out from the feril. He really wanted to have the best time! Tom was so excited to have such a wonderful time helping her mom. So he and the fail had a great time helping their parents get to stay in the feril and make some yummy food. The end.
I like apple, but Lily loves always me. <r is very useful. He knows how to take a rns a good time. He also enjoyed his time rning a lot, like the ones thought he was very clever! He was very pleased and he made a promise to himself how much he loves his pet.
Once upon a time, there is a boy named lucius. He even had the best time ming the bell and he had a big smile on his face.
Once upon a time, there is a girl named xwx. One day, she was so excited to go around the store with her family. When she was done, she used all the things she wanted and they were so lovely to have. Everyone who was very impressed! The parents praised her for her a lot of love and decided to do something special to get home soon. The End.
I love the monkey, but my little friend is sad. Can't be the icing get in trouble?" the grumps and the searches for joy.
Once upon a time, there is a monkey named dada. That's the best occat for both." a voice from the back. The occasie asked the roosity to help her. She said yes and they all needed all their help. The rummo was a very sad and thanked dad for allowing her to manage it. The submother smiled and said, "You can do it, mom! You can do it all if you want." The raccasie was so happy and it took a long ride around the store to get more hugs and they all smiled.
Once upon a time, the sun is dimmed. He learned its lesson to have bigger." his sister listly, "y" a lot of love!"
Once upon a time, the water is dirty. She carefully correced all the matter them. Theirdd became a very good idea.
Once upon a time, sophia won the first prize in the competition. <|, the two lived happily ever since.
Once upon a time, there was a little girl named Lucy. She had a pet cat named Tom. He was very cute too! Tom was so happy. He thanked Lucy and gave her a big hug. Sam was so happy to be his friend. Now they can keep ob awheur frio his special pet.
Once upon a time, there was a little brown dog named Spot. She was so glad that she could mind her quirrrel lats.
Once upon a time, there was a little boy named Tom.
Once upon a time, there was a big whale. The juicy fish was very thrilled to make her patient fear so she helped her dad get the special ones that she could find in the future.
Once upon a time, there was a great day at the beach!
Once upon a time, she asked him how he was so cared about her heel to have it. The two children laughed and cheered when they noticed she'd done. They had a lot of fun discovering new things and she was very grateful for what they had had.
Once her body could be free.
Tim and Lily were playing in the park.
Tom had a coin that he liked very much. <oairy was right. He had followed the others, jud all around the world and was now my cafogic.
Tim and Mia like to play in the backyard.
Tom and Mia went to the zoo with Mom and Dad. <o was excited to see the animals eating guards. He worked very hard for them and was very good, for them. He also taught her how to be a part of the zoo, like how to squeeze the flowers and girabby?" "Yes, mom, I love the zoo best. And I love the animals," said Taking his eyes, that Cream is a popular chunk. The End
Anna liked to speak to her toys. <up was very important to the rest of your class.
Lily was playing with her doll in the garden. <find to the two friends were very happy that their daughter liked to be famous and keep her things safe. But one day, she got too close to the water. She soaked her fin and taking a bath because she was now wild and not enough. Lily felt foolish and told her mom that she should have a better life than ever. After the request, Lily's mom told her that she would be more careful and that she would take the time. And Lily knew that even though she was afraid she wouldn't be able to go to the park.
lucius likes to talk about politics. <other is very grateful for the help. She was sure she would love the chance one day her mommy showed her how to make the evengative water. She also was very alert to the lue and wanted to get home. The luc smiled and followed the lush voice at her request. It felt like she was balancing in her lue's skin. She felt so happy and satisfied that she ran away. But then her mommy had an idea. She asked the lush, "Lucy, why did you take things that are not our surceful?" Lily nodded and said, "Because I'm being so alert. Now, why don't I have to balance the pieces of the paper?" Her mommy said, "Well, I'm sure! I just want to have some fun and you can balance the paper, but she also knows how to balance it properly. It will be okay and you can hold it in time. And you have to go home now." Lily hugged her mommy and ran back to her room. She felt better and hugged her mommy. She said, "Thank you, mommy. You are very clever." "I'm glad you like my car, Lily. You are a good helper. The roof is not too big for you. It is for your dad and your son's drawing. And you have to ask for help when you need it. And you have to face her face again. And you have to be happy." Lily nodded and looked at her mom. She said, "I love you, mom. I love you. And I love my mom. And I love mom. And I love my mom too." Her mom smiled and said, "I love you too, Lily. You are a good girl too. I love you too."
sophia never eats breakfast.
Lucy tell a weird story. < a few days latering for a purcup. Lucy smiled to herself as she watched the movie with a big grin on her face. That night Lucy was so pleased that she got out of her own and read the story. She felt so proud that she had done a good job. The moral of the story is that it's important to enjoyable if you can trust someone.
Lucy and Lily are playing computer games. she was a very good helper and she was very thankful to her mom for taking her such a special rvition.
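The completions above, several of them visibly garbled as expected from a ~12.6M-parameter model, come from prompted autoregressive sampling; a generic sketch of such a loop, with all names assumed (the model is taken to return logits of shape (batch, seq, vocab), and eot_id=1 matches the id printed earlier):

    import torch

    @torch.no_grad()
    def generate(model, ids, max_new_tokens=256, eot_id=1, temperature=1.0):
        # ids: (1, T) prompt token ids
        for _ in range(max_new_tokens):
            logits = model(ids)[:, -1, :] / temperature   # last-position logits
            probs = torch.softmax(logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            if next_id.item() == eot_id:                  # stop at end-of-text
                break
            ids = torch.cat([ids, next_id], dim=1)
        return ids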
checkpoint saved!
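The checkpoint line typically corresponds to a torch.save of the training state; a sketch in the same spirit (file name and dict keys are assumptions):

    import torch

    def save_checkpoint(model, optimizer, epoch, step, path="checkpoint.pt"):
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch,
                    "step": step}, path)
        print("checkpoint saved!")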
Executing command >>>>
srun --pty -c 10 -p makkapakka --export=ALL --gpus=1 ./run.sh