at() method was rewritten in C. #145

naitoh · 2019-10-01T00:58:45Z

This pull request rewrites the at() method in C.
Please review.

This is a pull request with the same content as ruby-numo/numo-narray#138.

In the benchmark below, at() is up to 21.8 times faster.

cumo benchmark

cumo 0.5.0: this PR (556c53a)

$ benchmark-driver diagonal_at_fp32_diff.yaml
Calculating -------------------------------------
                                             cumo 0.4.3  cumo 0.5.0 
a.at(Array, Array)                              159.340      1.523k i/s -      1.000k times in 6.275874s 0.656575s
b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)    250.335      5.457k i/s -      1.000k times in 3.994653s 0.183240s

Comparison:
             a.at(Array, Array)                          
                                  cumo 0.5.0:      1523.1 i/s 
                                  cumo 0.4.3:       159.3 i/s - 9.56x  slower

             b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)
                                  cumo 0.5.0:      5457.3 i/s 
                                  cumo 0.4.3:       250.3 i/s - 21.80x  slower

cumo benchmark code

$ cat diagonal_at_fp32_diff.yaml
contexts:
  - gems: { cumo: 0.4.3 }
    require: false
    prelude: |
      require 'cumo/narray'
  - gems: { cumo: 0.5.0 }
    require: false
    prelude: |
      require 'cumo/narray'
loop_count: 1000
prelude: |
  require 'cumo/narray'
  X = 10000
  a = Cumo::SFloat.new(X, X).seq(0)
  b = Cumo::SFloat.new(X, X).seq(0)
  x = Cumo::Int32.new(X).seq.to_a   # Array Index
  y = Cumo::Int32.new(X).seq        # Cumo  Index
  sleep 30

benchmark:
  'a.at(Array, Array)                          '      : a.at(x, x)
  'b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)'      : b.at(y, y)
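
For readers unfamiliar with at(): as I understand it, a.at(x, x) picks one element per index tuple, that is a[x[0], x[0]], a[x[1], x[1]], and so on. Since both index lists here are 0...X, the benchmark reads the diagonal. Below is a minimal plain-Ruby sketch of that selection on nested Arrays. `at_model` is a hypothetical helper for illustration only; it is not Cumo's API or its C implementation.

```ruby
# Plain-Ruby model of element-wise index selection (hypothetical helper,
# not part of Cumo). rows[i] and cols[i] together address one element.
def at_model(matrix, rows, cols)
  raise ArgumentError, "index lists must have the same length" unless rows.size == cols.size
  rows.zip(cols).map { |r, c| matrix[r][c] }
end

n = 4
# Row-major 0, 1, 2, ... to mimic SFloat.new(n, n).seq(0) on a small scale.
matrix = Array.new(n) { |r| Array.new(n) { |c| (r * n + c).to_f } }
idx = (0...n).to_a

diagonal = at_model(matrix, idx, idx)
p diagonal  # => [0.0, 5.0, 10.0, 15.0]
```

The benchmark does the same thing with X = 10000, once with Ruby Array indices and once with Cumo::Int32 indices.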

red-chainer benchmark

1 epoch

cumo 0.4.3 & red-chainer : 11.9917 sec
cumo 0.4.3 & red-chainer patch : 15.0544 sec (slower than unpatched)
cumo 0.5.0 & red-chainer patch : 7.49423 sec (1.6 times faster than unpatched cumo 0.4.3)

20 epoch

cumo 0.4.3 & red-chainer : 267.318 sec
cumo 0.4.3 & red-chainer patch : 315.147 sec (slower than unpatched)
cumo 0.5.0 & red-chainer patch : 162.756 sec (1.64 times faster than unpatched cumo 0.4.3)
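
The speedup factors can be recomputed from the elapsed times listed here. The sketch below is only arithmetic on those numbers; the hash keys and the `speedup` helper are names made up for this example.

```ruby
# Elapsed times in seconds, copied from the red-chainer results in this PR.
# Keys are illustrative labels, not anything defined by cumo or red-chainer.
times = {
  1  => { base: 11.9917, patch_043: 15.0544, patch_050: 7.49423 },
  20 => { base: 267.318, patch_043: 315.147, patch_050: 162.756 },
}

# Ratio of the slower run to the faster run, rounded to two decimals.
def speedup(before, after)
  (before / after).round(2)
end

times.each do |epochs, t|
  puts "#{epochs} epoch: #{speedup(t[:base], t[:patch_050])}x vs unpatched 0.4.3, " \
       "#{speedup(t[:patch_043], t[:patch_050])}x vs patched 0.4.3"
end
```

Measured against the patched red-chainer on cumo 0.4.3 instead of the unpatched one, the improvement is closer to 2 times.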

cumo 0.4.3 & red-chainer

$ ruby /home/naitoh/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/red-chainer-0.4.1/examples/mnist/mnist.rb  -gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190761    0.100618              0.941867       0.9706                    11.9917       
2           0.0748588   0.0785425             0.976449       0.976                     24.0079       
3           0.0510539   0.0706977             0.983498       0.9794                    37.6504       
4           0.0352464   0.0739257             0.988898       0.9792                    51.6359       
5           0.0292694   0.0786618             0.990448       0.9786                    65.7654       
6           0.0235765   0.0791803             0.991965       0.9811                    79.3966       
7           0.0202646   0.0802431             0.993048       0.9792                    93.043        
8           0.0178057   0.08063               0.994199       0.9803                    106.871       
9           0.0149522   0.0946839             0.994782       0.9781                    120.783       
10          0.0135423   0.095245              0.995582       0.9817                    134.396       
11          0.0137168   0.115822              0.995765       0.9766                    147.898       
12          0.0177419   0.10042               0.994782       0.9809                    161.486       
13          0.0109511   0.0905653             0.996515       0.981                     174.801       
14          0.0128614   0.0994406             0.996015       0.9808                    188.122       
15          0.00937718  0.0889525             0.997266       0.9827                    201.41        
16          0.00660626  0.106492              0.998132       0.9824                    214.687       
17          0.0115017   0.104797              0.996616       0.9819                    227.905       
18          0.015004    0.101159              0.995632       0.9815                    241.306       
19          0.0071712   0.0883342             0.997732       0.9848                    254.264       
20          0.00715317  0.0924275             0.998082       0.9845                    267.318

cumo 0.4.3 & red-chainer patch (to_a replaced with the at() method)

$ ruby /home/naitoh/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/red-chainer-0.4.1/examples/mnist/mnist.rb  -gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190813    0.0878515             0.9423         0.9736                    15.0544       
2           0.0736858   0.0739383             0.977383       0.9767                    30.6157       
3           0.0493456   0.0701066             0.984165       0.9801                    46.6689       
4           0.0356385   0.0748536             0.988565       0.9778                    62.4075       
5           0.0299261   0.0756134             0.990282       0.9799                    78.0029       
6           0.0222692   0.0754724             0.992731       0.9806                    93.5294       
7           0.0203081   0.110029              0.993448       0.9749                    109.225       
8           0.0176529   0.0795398             0.994582       0.9812                    124.921       
9           0.01684     0.0765245             0.994715       0.9828                    140.777       
10          0.0171627   0.0796931             0.994615       0.9818                    156.823       
11          0.0138622   0.0975792             0.995432       0.9809                    172.766       
12          0.0139532   0.0924991             0.995699       0.9806                    188.383       
13          0.00907505  0.101457              0.996949       0.9822                    204.072       
14          0.00982056  0.0851002             0.996882       0.9827                    219.74        
15          0.0135048   0.116922              0.996382       0.9771                    235.661       
16          0.0103016   0.0916616             0.997066       0.9822                    251.905       
17          0.00553294  0.0957612             0.998149       0.9831                    267.832       
18          0.0144366   0.0891243             0.996066       0.9825                    283.717       
19          0.0066627   0.0833053             0.99805        0.9856                    299.417       
20          0.00939514  0.0947659             0.997266       0.9839                    315.147

cumo 0.5.0 & red-chainer patch (to_a replaced with the at() method)

GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.192961    0.0995875             0.941716       0.9687                    7.49423       
2           0.0743639   0.089644              0.977199       0.9742                    15.2887       
3           0.050121    0.0650511             0.983916       0.9806                    23.5483       
4           0.035414    0.0653418             0.988781       0.9811                    31.6473       
5           0.0268108   0.104871              0.991181       0.9734                    38.2204       
6           0.0227885   0.0784725             0.992765       0.9802                    46.1436       
7           0.0220536   0.08113               0.992932       0.9794                    54.0929       
8           0.0179059   0.099458              0.994449       0.9763                    62.1249       
9           0.015757    0.0838891             0.994515       0.9806                    70.0504       
10          0.0143032   0.0926978             0.995415       0.9813                    78.0471       
11          0.0133268   0.0882717             0.995715       0.9794                    86.431        
12          0.014313    0.0871509             0.995499       0.9807                    95.0837       
13          0.0114782   0.0762304             0.996516       0.9848                    103.539       
14          0.010762    0.0889503             0.996732       0.9827                    111.849       
15          0.0103842   0.0969899             0.997016       0.9815                    120.288       
16          0.0110824   0.0930894             0.996815       0.982                     128.688       
17          0.00883565  0.10691               0.997415       0.982                     137.201       
18          0.0100846   0.116562              0.997149       0.9802                    145.888       
19          0.0112479   0.111252              0.996749       0.9806                    154.257       
20          0.00860258  0.112223              0.997482       0.9813                    162.756

red-chainer patch: to_a replaced with the at() method

Same as [WIP] Replaced to_a processing to improve Cumo's performance. red-data-tools/red-chainer#76

$ diff lib/chainer/functions/loss/softmax_cross_entropy.rb_org  lib/chainer/functions/loss/softmax_cross_entropy.rb -u
--- lib/chainer/functions/loss/softmax_cross_entropy.rb_org	2019-10-01 00:05:38.262983104 +0000
+++ lib/chainer/functions/loss/softmax_cross_entropy.rb	2019-10-01 00:06:53.986914527 +0000
@@ -119,7 +119,8 @@
           if y.ndim == 2
             gx = y
             # TODO(sonots): Avoid to_a especially in Cumo to improve performance
-            t.class.new(t.shape[0]).seq(0).to_a.zip(t.class.maximum(t, 0).to_a).each{|v| gx[*v] -= 1}
+#            t.class.new(t.shape[0]).seq(0).to_a.zip(t.class.maximum(t, 0).to_a).each{|v| gx[*v] -= 1}
+            gx.at(t.class.new(t.shape[0]).seq, t.class.maximum(t, 0)).inplace - 1
 
             if @class_weight
               shape = x.ndim.times.map { |d| d == 1 ? true : 1 }
@@ -141,7 +142,8 @@
             fst_index = xm::Int32.new(t.size).seq(0) / n_unit
             trd_index = xm::Int32.new(t.size).seq(0) % n_unit
             # TODO(sonots): Avoid to_a especially in Cumo to improve performance
-            fst_index.to_a.zip(t.class.maximum(t.flatten.dup, 0).to_a, trd_index.to_a).each{|v| gx[*v] -= 1}
+#            fst_index.to_a.zip(t.class.maximum(t.flatten.dup, 0).to_a, trd_index.to_a).each{|v| gx[*v] -= 1}
+            gx.at(fst_index, t.class.maximum(t.flatten.dup, 0), trd_index).inplace - 1
             if @class_weight
               shape = x.ndim.times.map{|d| d == 1 ? true : 1}
               c = Chainer::Utils::Array.broadcast_to(@class_weight.reshape(*shape), x.shape)
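
To make clear which elements the patched lines touch, here is a plain-Ruby model of the 2-D case: for each row i, 1 is subtracted from gx[i, t[i]]. Nested Arrays stand in for the NArray, `subtract_one_at!` is a hypothetical helper, and treating a negative label as "clamped to class 0" is my reading of maximum(t, 0), so take it as an assumption. The model cannot show the performance difference. It only illustrates that the to_a/zip loop and the at(...).inplace - 1 form are meant to update the same positions.

```ruby
# Hypothetical plain-Ruby model of the gradient update; not red-chainer code.
# Subtracts 1 from gx[r][c] for every (r, c) pair, in place.
def subtract_one_at!(gx, rows, cols)
  rows.zip(cols).each { |r, c| gx[r][c] -= 1 }
  gx
end

y = [[0.25, 0.75], [0.5, 0.5]]     # stands in for the softmax output (batch 2, 2 classes)
t = [1, -1]                        # labels; the negative one is clamped below (assumption)
rows = (0...t.size).to_a           # models t.class.new(t.shape[0]).seq
cols = t.map { |v| [v, 0].max }    # models t.class.maximum(t, 0)

gx = subtract_one_at!(y.map(&:dup), rows, cols)
p gx  # => [[0.25, -0.25], [-0.5, 0.5]]
```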

at() method was rewritten in C.

556c53a
