at() method was rewritten in C. #145

Open: wants to merge 1 commit into master

@naitoh (Collaborator) commented on Oct 1, 2019

This pull request rewrites the at() method in C.
Please review.

It contains the same change as ruby-numo/numo-narray#138.

With this change, at() is up to 21.8x faster than in cumo 0.4.3.
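For context, at() performs point-wise ("fancy") indexing. As we understand Numo::NArray#at, whose API Cumo follows, `a.at(rows, cols)` returns `a[rows[k], cols[k]]` for each k, whereas `a[rows, cols]` selects the whole rows-by-cols block. The sketch below models both behaviours on plain nested Ruby Arrays; `pointwise_at` and `block_select` are hypothetical helpers for illustration only and say nothing about how the C implementation works.

```ruby
# Hypothetical model of point-wise indexing (the behaviour of at()),
# using nested Ruby Arrays instead of Numo/Cumo arrays.
def pointwise_at(matrix, rows, cols)
  raise ArgumentError, "index arrays must have equal length" unless rows.size == cols.size
  # Pair the k-th row index with the k-th column index.
  rows.zip(cols).map { |i, j| matrix[i][j] }
end

# Contrast: ordinary a[rows, cols] selects the full rows-by-cols sub-block.
def block_select(matrix, rows, cols)
  rows.map { |i| cols.map { |j| matrix[i][j] } }
end

m = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]

p pointwise_at(m, [0, 1, 2], [0, 1, 2]) # => [0, 4, 8]
p block_select(m, [0, 2], [1, 2])       # => [[1, 2], [7, 8]]
```

Point-wise indexing returns one element per index pair, which is exactly what the diagonal benchmark and the red-chainer patch below need.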

cumo benchmark

$ benchmark-driver diagonal_at_fp32_diff.yaml
Calculating -------------------------------------
                                             cumo 0.4.3  cumo 0.5.0 
a.at(Array, Array)                              159.340      1.523k i/s -      1.000k times in 6.275874s 0.656575s
b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)    250.335      5.457k i/s -      1.000k times in 3.994653s 0.183240s

Comparison:
             a.at(Array, Array)                          
                                  cumo 0.5.0:      1523.1 i/s 
                                  cumo 0.4.3:       159.3 i/s - 9.56x  slower

             b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)
                                  cumo 0.5.0:      5457.3 i/s 
                                  cumo 0.4.3:       250.3 i/s - 21.80x  slower
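Because both versions ran the same 1,000 iterations, the speedup in the Comparison section is simply the ratio of the two wall-clock times. A small Ruby check using the timings copied from the output above:

```ruby
# Seconds taken for 1,000 iterations, copied from the benchmark-driver output above.
TIMINGS = {
  "a.at(Array, Array)"                           => { old: 6.275874, new: 0.656575 },
  "b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)" => { old: 3.994653, new: 0.183240 }
}.freeze

# Same loop count for both versions, so speedup = old time / new time.
def speedup(timing)
  (timing[:old] / timing[:new]).round(2)
end

TIMINGS.each do |name, timing|
  puts format("%-46s %.2fx faster on cumo 0.5.0", name, speedup(timing))
end
```

This prints 9.56x and 21.80x, matching the i/s ratios reported by benchmark-driver.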

cumo benchmark code

$ cat diagonal_at_fp32_diff.yaml
contexts:
  - gems: { cumo: 0.4.3 }
    require: false
    prelude: |
      require 'cumo/narray'
  - gems: { cumo: 0.5.0 }
    require: false
    prelude: |
      require 'cumo/narray'
loop_count: 1000
prelude: |
  require 'cumo/narray'
  X = 10000
  a = Cumo::SFloat.new(X, X).seq(0)
  b = Cumo::SFloat.new(X, X).seq(0)
  x = Cumo::Int32.new(X).seq.to_a   # Array Index
  y = Cumo::Int32.new(X).seq        # Cumo  Index
  sleep 30

benchmark:
  'a.at(Array, Array)                          '      : a.at(x, x)
  'b.at(Cumo::Int32(X).seq, Cumo::Int32(X).seq)'      : b.at(y, y)
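Since `x` and `y` both hold 0 through X-1, each benchmarked call gathers the main diagonal of the X-by-X matrix. `seq(0)` fills element (i, j) with i*X + j, so the diagonal holds i*(X + 1). The plain-Ruby sketch below checks this on a small, arbitrary size, using nested Integer Arrays in place of `Cumo::SFloat`; the helper names are hypothetical.

```ruby
# Plain-Ruby model of the benchmark input: an n x n matrix filled row-major
# with 0, 1, 2, ... (what seq(0) produces), stored as nested Integer Arrays.
def seq_matrix(n)
  Array.new(n) { |i| Array.new(n) { |j| i * n + j } }
end

# Point-wise gather with identical index vectors, i.e. the role of a.at(x, x).
def diagonal_via_at(matrix, idx)
  idx.zip(idx).map { |i, j| matrix[i][j] }
end

n = 4                  # small stand-in for X = 10000
idx = (0...n).to_a     # like Cumo::Int32.new(n).seq.to_a
diag = diagonal_via_at(seq_matrix(n), idx)
p diag                 # => [0, 5, 10, 15]

# Closed form: element (i, i) equals i * (n + 1).
raise "mismatch" unless diag == idx.map { |i| i * (n + 1) }
```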

red-chainer benchmark

1 epoch

  • cumo 0.4.3 & red-chainer: 11.9917 sec
  • cumo 0.4.3 & red-chainer patch: 15.0544 sec (1.26x slower)
  • cumo 0.5.0 & red-chainer patch: 7.49423 sec (1.60x faster than cumo 0.4.3 & red-chainer)

20 epochs

  • cumo 0.4.3 & red-chainer: 267.318 sec
  • cumo 0.4.3 & red-chainer patch: 315.147 sec (1.18x slower)
  • cumo 0.5.0 & red-chainer patch: 162.756 sec (1.64x faster than cumo 0.4.3 & red-chainer)
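The ratios above compare cumulative elapsed_time against unpatched red-chainer on cumo 0.4.3. A short Ruby check, with the times copied from the training logs that follow:

```ruby
# Cumulative elapsed_time in seconds, copied from the training logs that follow.
ELAPSED = {
  1  => { baseline: 11.9917, patch_043: 15.0544, patch_050: 7.49423 },
  20 => { baseline: 267.318, patch_043: 315.147, patch_050: 162.756 }
}.freeze

# Returns [slowdown of the patch on cumo 0.4.3, speedup of the patch on cumo 0.5.0],
# both measured against unpatched red-chainer on cumo 0.4.3.
def ratios(t)
  [(t[:patch_043] / t[:baseline]).round(2), (t[:baseline] / t[:patch_050]).round(2)]
end

ELAPSED.each do |epochs, t|
  slower, faster = ratios(t)
  puts "#{epochs} epoch(s): 0.4.3 + patch #{slower}x slower, 0.5.0 + patch #{faster}x faster"
end
```

The patch alone slows training on cumo 0.4.3; the gain only appears once at() runs in C in cumo 0.5.0.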

cumo 0.4.3 & red-chainer

$ ruby /home/naitoh/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/red-chainer-0.4.1/examples/mnist/mnist.rb  -gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190761    0.100618              0.941867       0.9706                    11.9917       
2           0.0748588   0.0785425             0.976449       0.976                     24.0079       
3           0.0510539   0.0706977             0.983498       0.9794                    37.6504       
4           0.0352464   0.0739257             0.988898       0.9792                    51.6359       
5           0.0292694   0.0786618             0.990448       0.9786                    65.7654       
6           0.0235765   0.0791803             0.991965       0.9811                    79.3966       
7           0.0202646   0.0802431             0.993048       0.9792                    93.043        
8           0.0178057   0.08063               0.994199       0.9803                    106.871       
9           0.0149522   0.0946839             0.994782       0.9781                    120.783       
10          0.0135423   0.095245              0.995582       0.9817                    134.396       
11          0.0137168   0.115822              0.995765       0.9766                    147.898       
12          0.0177419   0.10042               0.994782       0.9809                    161.486       
13          0.0109511   0.0905653             0.996515       0.981                     174.801       
14          0.0128614   0.0994406             0.996015       0.9808                    188.122       
15          0.00937718  0.0889525             0.997266       0.9827                    201.41        
16          0.00660626  0.106492              0.998132       0.9824                    214.687       
17          0.0115017   0.104797              0.996616       0.9819                    227.905       
18          0.015004    0.101159              0.995632       0.9815                    241.306       
19          0.0071712   0.0883342             0.997732       0.9848                    254.264       
20          0.00715317  0.0924275             0.998082       0.9845                    267.318   

cumo 0.4.3 & red-chainer patch (to_a replaced with the at() method)

$ ruby /home/naitoh/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/red-chainer-0.4.1/examples/mnist/mnist.rb  -gpu 0
GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.190813    0.0878515             0.9423         0.9736                    15.0544       
2           0.0736858   0.0739383             0.977383       0.9767                    30.6157       
3           0.0493456   0.0701066             0.984165       0.9801                    46.6689       
4           0.0356385   0.0748536             0.988565       0.9778                    62.4075       
5           0.0299261   0.0756134             0.990282       0.9799                    78.0029       
6           0.0222692   0.0754724             0.992731       0.9806                    93.5294       
7           0.0203081   0.110029              0.993448       0.9749                    109.225       
8           0.0176529   0.0795398             0.994582       0.9812                    124.921       
9           0.01684     0.0765245             0.994715       0.9828                    140.777       
10          0.0171627   0.0796931             0.994615       0.9818                    156.823       
11          0.0138622   0.0975792             0.995432       0.9809                    172.766       
12          0.0139532   0.0924991             0.995699       0.9806                    188.383       
13          0.00907505  0.101457              0.996949       0.9822                    204.072       
14          0.00982056  0.0851002             0.996882       0.9827                    219.74        
15          0.0135048   0.116922              0.996382       0.9771                    235.661       
16          0.0103016   0.0916616             0.997066       0.9822                    251.905       
17          0.00553294  0.0957612             0.998149       0.9831                    267.832       
18          0.0144366   0.0891243             0.996066       0.9825                    283.717       
19          0.0066627   0.0833053             0.99805        0.9856                    299.417       
20          0.00939514  0.0947659             0.997266       0.9839                    315.147 

cumo 0.5.0 & red-chainer patch (to_a replaced with the at() method)

GPU: 0
# unit: 1000
# Minibatch-size: 100
# epoch: 20

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.192961    0.0995875             0.941716       0.9687                    7.49423       
2           0.0743639   0.089644              0.977199       0.9742                    15.2887       
3           0.050121    0.0650511             0.983916       0.9806                    23.5483       
4           0.035414    0.0653418             0.988781       0.9811                    31.6473       
5           0.0268108   0.104871              0.991181       0.9734                    38.2204       
6           0.0227885   0.0784725             0.992765       0.9802                    46.1436       
7           0.0220536   0.08113               0.992932       0.9794                    54.0929       
8           0.0179059   0.099458              0.994449       0.9763                    62.1249       
9           0.015757    0.0838891             0.994515       0.9806                    70.0504       
10          0.0143032   0.0926978             0.995415       0.9813                    78.0471       
11          0.0133268   0.0882717             0.995715       0.9794                    86.431        
12          0.014313    0.0871509             0.995499       0.9807                    95.0837       
13          0.0114782   0.0762304             0.996516       0.9848                    103.539       
14          0.010762    0.0889503             0.996732       0.9827                    111.849       
15          0.0103842   0.0969899             0.997016       0.9815                    120.288       
16          0.0110824   0.0930894             0.996815       0.982                     128.688       
17          0.00883565  0.10691               0.997415       0.982                     137.201       
18          0.0100846   0.116562              0.997149       0.9802                    145.888       
19          0.0112479   0.111252              0.996749       0.9806                    154.257       
20          0.00860258  0.112223              0.997482       0.9813                    162.756

red-chainer patch: replace to_a with the at() method

$ diff lib/chainer/functions/loss/softmax_cross_entropy.rb_org  lib/chainer/functions/loss/softmax_cross_entropy.rb -u
--- lib/chainer/functions/loss/softmax_cross_entropy.rb_org	2019-10-01 00:05:38.262983104 +0000
+++ lib/chainer/functions/loss/softmax_cross_entropy.rb	2019-10-01 00:06:53.986914527 +0000
@@ -119,7 +119,8 @@
           if y.ndim == 2
             gx = y
             # TODO(sonots): Avoid to_a especially in Cumo to improve performance
-            t.class.new(t.shape[0]).seq(0).to_a.zip(t.class.maximum(t, 0).to_a).each{|v| gx[*v] -= 1}
+#            t.class.new(t.shape[0]).seq(0).to_a.zip(t.class.maximum(t, 0).to_a).each{|v| gx[*v] -= 1}
+            gx.at(t.class.new(t.shape[0]).seq, t.class.maximum(t, 0)).inplace - 1
 
             if @class_weight
               shape = x.ndim.times.map { |d| d == 1 ? true : 1 }
@@ -141,7 +142,8 @@
             fst_index = xm::Int32.new(t.size).seq(0) / n_unit
             trd_index = xm::Int32.new(t.size).seq(0) % n_unit
             # TODO(sonots): Avoid to_a especially in Cumo to improve performance
-            fst_index.to_a.zip(t.class.maximum(t.flatten.dup, 0).to_a, trd_index.to_a).each{|v| gx[*v] -= 1}
+#            fst_index.to_a.zip(t.class.maximum(t.flatten.dup, 0).to_a, trd_index.to_a).each{|v| gx[*v] -= 1}
+            gx.at(fst_index, t.class.maximum(t.flatten.dup, 0), trd_index).inplace - 1
             if @class_weight
               shape = x.ndim.times.map{|d| d == 1 ? true : 1}
               c = Chainer::Utils::Array.broadcast_to(@class_weight.reshape(*shape), x.shape)
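Both the removed `to_a` loops and their `at()` replacements subtract 1 from `gx` at one (row, label) position per sample, with labels clamped by `maximum(t, 0)`. The plain-Ruby sketch below models only that arithmetic on nested Arrays; `grad_2d!` and `grad_nd!` are hypothetical names, and the higher-dimensional case assumes `gx` is laid out as [batch, classes, n_unit], which is what the `fst_index` / `trd_index` arithmetic in the diff implies. In the patch itself, the whole update is a single `gx.at(...).inplace - 1` on Cumo arrays instead of a Ruby loop over `to_a` results.

```ruby
# Plain-Ruby model of the gradient update in softmax_cross_entropy.rb.
# Hypothetical helpers; nested Arrays stand in for Numo/Cumo arrays.

# y.ndim == 2: gx is [batch, classes]; t holds one class label per sample.
def grad_2d!(gx, t)
  t.each_with_index do |label, i|
    gx[i][[label, 0].max] -= 1 # maximum(t, 0): negative labels are clamped to 0
  end
  gx
end

# Higher-dimensional case: gx modelled as [batch, classes, n_unit], t flattened.
# Flat position k maps to (k / n_unit, label, k % n_unit), like fst_index and trd_index.
def grad_nd!(gx, t_flat, n_unit)
  t_flat.each_with_index do |label, k|
    gx[k / n_unit][[label, 0].max][k % n_unit] -= 1
  end
  gx
end

probs = [[0.5, 0.25, 0.25], [0.125, 0.125, 0.75]]
p grad_2d!(probs, [0, 2]) # => [[-0.5, 0.25, 0.25], [0.125, 0.125, -0.25]]
```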
