KnowledgePerformance


Introduction

MADARA Knowledge Bases do not operate in isolation. The performance of distributed knowledge sharing between agents depends on operating system functions, compiler optimizations, and the configuration of buffers and quality-of-service. In this wiki, we discuss some of the tools available in MADARA to gauge knowledge performance related to latency and throughput within a host (intrahost) and between hosts (interhost). To gain access to the tests mentioned in this wiki, compile MADARA with the tests feature enabled, either through base_build.sh or directly through the mwc.pl process.


TLDR Summary

  • For intrahost performance, share a single knowledge base between threads wherever possible. This is orders of magnitude faster than using a network transport between processes, and it holds across all operating systems and architectures
  • For multi-process performance, the smaller the data packets, the more messages can be transferred reliably between knowledge bases
  • Quality-of-service settings like TransportSettings::queue_length (the buffer size for the OS and transport layer to use) can be extremely important to performance. If possible, use a queue_length big enough to hold at least 1 s of the maximum expected data throughput, and 5-10 s if you want maximum throughput and reliability (see the sketch after this list)
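
As a rough illustration of the queue_length guidance above, the following C++ sketch configures a multicast transport with a deep buffer. This is a minimal sketch: the multicast address and the sizing arithmetic are assumptions for illustration, not values prescribed by MADARA.

```cpp
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/transport/QoSTransportSettings.h"

int main()
{
  madara::transport::QoSTransportSettings settings;
  settings.type = madara::transport::MULTICAST;

  // illustrative multicast group; substitute your deployment's address
  settings.hosts.push_back("239.255.0.1:4150");

  // size the buffer for >1 s of expected peak traffic, e.g. assuming
  // ~80 msgs/s at 500 KB each (~40 MB/s), 50 MB holds a bit over 1 s
  settings.queue_length = 50000000;

  // attach the transport to a knowledge base and publish one update
  madara::knowledge::KnowledgeBase kb("", settings);
  kb.set("agent.0.status", "ready");

  return 0;
}
```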

Intrahost Performance

There are two major considerations for judging intrahost performance: 1) multi-threaded performance and 2) multi-processed performance. The former is mostly gated by time spent in OS critical sections but can also be affected by CPU load and memory latency. The latter is bottlenecked almost entirely by OS prioritization and handling of network protocols, file pipes, and sockets.


test_reasoning_throughput

The main test for multi-threaded performance can be found in $MADARA_ROOT/bin/test_reasoning_throughput. This test primarily exercises function calls on the knowledge base and common data abstractions, such as Integer containers. A sketch of the kinds of operations being timed is shown next, followed by example final output.
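
This minimal sketch, assuming a default (transport-less) knowledge base, shows three of the idioms the test measures: uncompiled KaRL evaluation, compiled evaluation, and container access. The variable name .count is an arbitrary example.

```cpp
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/knowledge/containers/Integer.h"

int main()
{
  madara::knowledge::KnowledgeBase kb;

  // uncompiled evaluation: the KaRL expression is parsed on every call
  kb.evaluate("++.count");

  // compiled evaluation: parse once, evaluate many times
  madara::knowledge::CompiledExpression inc = kb.compile("++.count");
  for (int i = 0; i < 100000; ++i)
    kb.evaluate(inc);

  // container abstraction: typed access to a variable with no parsing
  madara::knowledge::containers::Integer count(".count", kb);
  ++count;

  return 0;
}
```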

Command: $MADARA_ROOT/bin/test_reasoning_throughput

Average time taken per rule evaluation was:
=========================================================================
 KaRL: Simple Increments           		                   348 ns
 KaRL: Multiple Increments         		                    84 ns
 KaRL: Simple Ternary Increments   		                   381 ns
 KaRL: Multiple Ternary Increments 		                   110 ns
 KaRL: Compiled Simple Increments  		                   207 ns
 KaRL: Compiled Multiple Inc       		                    80 ns
 KaRL: Compiled Simple Tern Inc    		                   224 ns
 KaRL: Compiled Multiple Tern Inc  		                   103 ns
 KaRL: Compiled Single Assign      		                   195 ns
 KaRL: Compiled Multiple Assign    		                    81 ns
 KaRL: Extern Function Call        		                   158 ns
 KaRL: Compiled Extern Inc Func    		                   234 ns
 KaRL: Compiled Extern Multi Calls 		                   105 ns
 KaRL: Looped Simple Increments    		                   185 ns
 KaRL: Optimized Loop              		                     0 ns
 KaRL: Looped Simple Ternary Inc   		                   196 ns
 KaRL: Looped Multiple Ternary Inc 		                   197 ns
 KaRL: Get Variable Reference      		                    54 ns
 KaRL: Get Expanded Reference      		                   640 ns
 KaRL: Normal Set Operation        		                   200 ns
 KaRL: Variable Reference Set      		                   145 ns
 KaRL: Variables Inc Var Ref       		                   208 ns
 KaRL container: Assignment        		                    51 ns
 KaRL container: Increments        		                    62 ns
 KaRL staged container: Assignment 		                     0 ns
 KaRL staged container: Increments 		                     0 ns
 C++: Optimized Assignments        		                     1 ns
 C++: Optimized Increments         		                     0 ns
 C++: Optimized Ternary Increments 		                     0 ns
 C++: Virtual Increments           		                     2 ns
 C++: Volatile Assignments         		                     0 ns
 C++: Volatile Increments          		                     1 ns
 C++: Volatile Ternary Increments  		                     1 ns
 C++: STL Atomic Increments        		                     6 ns
 C++: STL Recursive Increments     		                    21 ns
 C++: STL Mutex Increments         		                    20 ns
=========================================================================


Hertz for each test with 100000 iterations * 10 tests was:
=========================================================================
 KaRL: Simple Increments           		                 2.87 mhz
 KaRL: Multiple Increments         		                11.82 mhz
 KaRL: Simple Ternary Increments   		                 2.62 mhz
 KaRL: Multiple Ternary Increments 		                 9.07 mhz
 KaRL: Compiled Simple Increments  		                 4.82 mhz
 KaRL: Compiled Multiple Inc       		                12.45 mhz
 KaRL: Compiled Simple Tern Inc    		                 4.45 mhz
 KaRL: Compiled Multiple Tern Inc  		                 9.65 mhz
 KaRL: Compiled Single Assign      		                 5.11 mhz
 KaRL: Compiled Multiple Assign    		                12.28 mhz
 KaRL: Extern Function Call        		                 6.29 mhz
 KaRL: Compiled Extern Inc Func    		                 4.27 mhz
 KaRL: Compiled Extern Multi Calls 		                 9.46 mhz
 KaRL: Looped Simple Increments    		                 5.38 mhz
 KaRL: Optimized Loop              		               109.77 ghz
 KaRL: Looped Simple Ternary Inc   		                 5.10 mhz
 KaRL: Looped Multiple Ternary Inc 		                 5.06 mhz
 KaRL: Get Variable Reference      		                18.43 mhz
 KaRL: Get Expanded Reference      		                 1.56 mhz
 KaRL: Normal Set Operation        		                 4.98 mhz
 KaRL: Variable Reference Set      		                 6.85 mhz
 KaRL: Variables Inc Var Ref       		                 4.80 mhz
 KaRL container: Assignment        		                19.28 mhz
 KaRL container: Increments        		                15.89 mhz
 KaRL staged container: Assignment 		                 2.77 ghz
 KaRL staged container: Increments 		                 2.85 ghz
 C++: Optimized Assignments        		               722.60 mhz
 C++: Optimized Increments         		                 2.89 ghz
 C++: Optimized Ternary Increments 		                 2.87 ghz
 C++: Virtual Increments           		               411.79 mhz
 C++: Volatile Assignments         		                 1.45 ghz
 C++: Volatile Increments          		               527.58 mhz
 C++: Volatile Ternary Increments  		               530.15 mhz
 C++: STL Atomic Increments        		               160.61 mhz
 C++: STL Recursive Increments     		                46.64 mhz
 C++: STL Mutex Increments         		                48.18 mhz
=========================================================================

Takeaway: intrahost multi-threaded performance can reach megahertz rates (1M+ operations per second), even when accessing large data structures through the shared_ptr system. Multi-threading is the best possible way to hit throughput and latency needs in mission-critical systems.
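
The multi-threaded pattern is simply to share one knowledge base by reference. A minimal sketch, assuming standard C++11 threads and illustrative variable names:

```cpp
#include <thread>
#include "madara/knowledge/KnowledgeBase.h"

int main()
{
  // one knowledge base shared between threads: no transport and no
  // serialization, just mutex-guarded access to the same context
  madara::knowledge::KnowledgeBase kb;

  std::thread producer([&kb]() {
    for (madara::knowledge::KnowledgeRecord::Integer i = 0; i < 100000; ++i)
      kb.set(".sensor.value", i);
  });

  std::thread consumer([&kb]() {
    for (int i = 0; i < 100000; ++i)
      kb.get(".sensor.value").to_integer();
  });

  producer.join();
  consumer.join();

  return 0;
}
```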


Intrahost network_profiler

The $MADARA_ROOT/bin/network_profiler tool can be used for testing most supported knowledge base transports, including UDP unicast, broadcast, multicast, ZeroMQ, and DDS. The tool comes with built-in help (--help or -h options) and can exercise both inter- and intra-process communication between knowledge bases, on one host or several.

To run network_profiler on the same host for intrahost tests, open two terminals and launch the tool in each window. At least one network_profiler should have id 0 (-i 0, the publisher and default id), and at least one should have a non-zero id (e.g., -i 1, a subscriber). The publisher publishes data of a user-specified size and frequency (the default is to publish as fast as possible). The subscriber receives the data and reports latency and throughput for the configured QoS. This tool is invaluable for understanding real-world performance.

Below are some example runs on a dedicated Ubuntu 16.04 host for intrahost testing.


Intrahost Multicast Performance Small

Publisher: $MADARA_ROOT/bin/network_profiler
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1

Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
  Transport type: UDP Multicast
  Data size: 128 B
  Test time: 60 s
Latency:
  Min: 5586 ns
  Avg: 19676 ns
  Max: 880765 ns
Throughput:
  Messages received: 2392105
  Message rate: 39868.4 packets/s
  Data received: 306189440 B
  Data rate: 5.10316e+06 B/s

Intrahost Multicast Performance Medium (64KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 64000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1

Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
  Transport type: UDP Multicast
  Data size: 64000 B
  Test time: 60 s
Latency:
  Min: 50287 ns
  Avg: 146411 ns
  Max: 873667 ns
Throughput:
  Messages received: 49110
  Message rate: 818.5 packets/s
  Data received: 3143040000 B
  Data rate: 5.2384e+07 B/s

Intrahost Multicast Performance Large (500KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1

Receiving for 60 s on UDP Multicast transport
Subscriber received no data.
Test: FAIL.


Intrahost Multicast Performance Large (500KB) Deep (50MB buffer)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000

Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
  Transport type: UDP Multicast
  Data size: 500000 B
  Test time: 60 s
Latency:
  Min: 4693015 ns
  Avg: 12484017 ns
  Max: 24725457 ns
Throughput:
  Messages received: 4662
  Message rate: 77.7 packets/s
  Data received: 2331000000 B
  Data rate: 3.885e+07 B/s


Intrahost Unicast Performance Small

Publisher: $MADARA_ROOT/bin/network_profiler -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001

Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
  Transport type: UDP
  Data size: 128 B
  Test time: 60 s
Latency:
  Min: 5353 ns
  Avg: 872228 ns
  Max: 3367879 ns
Throughput:
  Messages received: 9417890
  Message rate: 156965 packets/s
  Data received: 1205489920 B
  Data rate: 2.00915e+07 B/s

Intrahost Unicast Performance Medium (64KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 64000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001

Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
  Transport type: UDP
  Data size: 64000 B
  Test time: 60 s
Latency:
  Min: 36171 ns
  Avg: 153123 ns
  Max: 1124442 ns
Throughput:
  Messages received: 1800516
  Message rate: 30008.6 packets/s
  Data received: 115233024000 B
  Data rate: 1.92055e+09 B/s



Intrahost Unicast Performance Large (500KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001

Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
  Transport type: UDP
  Data size: 500000 B
  Test time: 60 s
Latency:
  Min: 183101 ns
  Avg: 282962 ns
  Max: 863708 ns
Throughput:
  Messages received: 267621
  Message rate: 4460.35 packets/s
  Data received: 133810500000 B
  Data rate: 2.23018e+09 B/s



Intrahost Unicast Performance Large (500KB) Deep (50MB buffer)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000 -u 127.0.0.1:30001

Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
  Transport type: UDP
  Data size: 500000 B
  Test time: 60 s
Latency:
  Min: 184327 ns
  Avg: 292714 ns
  Max: 810309 ns
Throughput:
  Messages received: 253207
  Message rate: 4220.12 packets/s
  Data received: 126603500000 B
  Data rate: 2.11006e+09 B/s


Intrahost ZeroMQ IPC Performance Small

Publisher: $MADARA_ROOT/bin/network_profiler --zmq ipc:///tmp/network_profiler_0_0 --zmq ipc:///tmp/network_profiler_0_1
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 --zmq ipc:///tmp/network_profiler_0_1 --zmq ipc:///tmp/network_profiler_0_0

Receiving for 60 s on 0MQ transport
Test: SUCCESS
Settings:
  Transport type: 0MQ
  Data size: 128 B
  Test time: 60 s
Latency:
  Min: 190909 ns
  Avg: 5656424 ns
  Max: 12335760 ns
Throughput:
  Messages received: 15594000
  Message rate: 259900 packets/s
  Data received: 1996032000 B
  Data rate: 3.32672e+07 B/s
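
The ipc:// endpoints in these commands map directly to the ZeroMQ transport settings. A minimal C++ sketch of the publisher side, assuming a MADARA build with ZeroMQ support and that the first endpoint is the process's own (the remainder being peers):

```cpp
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/transport/QoSTransportSettings.h"

int main()
{
  madara::transport::QoSTransportSettings settings;
  settings.type = madara::transport::ZMQ;

  // our own endpoint first, then the subscriber's endpoint
  settings.hosts.push_back("ipc:///tmp/network_profiler_0_0");
  settings.hosts.push_back("ipc:///tmp/network_profiler_0_1");

  madara::knowledge::KnowledgeBase kb("", settings);
  kb.set("agent.0.ready", madara::knowledge::KnowledgeRecord::Integer(1));

  return 0;
}
```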

Intrahost ZeroMQ IPC Performance Medium (64KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 64000 --zmq ipc:///tmp/network_profiler_1_0 --zmq ipc:///tmp/network_profiler_1_1
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 --zmq ipc:///tmp/network_profiler_1_1 --zmq ipc:///tmp/network_profiler_1_0

Receiving for 60 s on 0MQ transport
Test: SUCCESS
Settings:
  Transport type: 0MQ
  Data size: 64000 B
  Test time: 60 s
Latency:
  Min: 169837 ns
  Avg: 12425523 ns
  Max: 29418164 ns
Throughput:
  Messages received: 3170000
  Message rate: 52833.3 packets/s
  Data received: 202880000000 B
  Data rate: 3.38133e+09 B/s

Intrahost ZeroMQ IPC Performance Large (500KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 --zmq ipc:///tmp/network_profiler_2_0 --zmq ipc:///tmp/network_profiler_2_1
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 --zmq ipc:///tmp/network_profiler_2_1 --zmq ipc:///tmp/network_profiler_2_0

Receiving for 60 s on 0MQ transport
Test: SUCCESS
Settings:
  Transport type: 0MQ
  Data size: 500000 B
  Test time: 60 s
Latency:
  Min: 196656 ns
  Avg: 49504619 ns
  Max: 118398998 ns
Throughput:
  Messages received: 691040
  Message rate: 11517.3 packets/s
  Data received: 345520000000 B
  Data rate: 5.75867e+09 B/s

Intrahost ZeroMQ IPC Performance Large (500KB) Deep (50MB buffer)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000 --zmq ipc:///tmp/network_profiler_3_0 --zmq ipc:///tmp/network_profiler_3_1
Subscriber: $MADARA_ROOT/bin/network_profiler -q 50000000 -i 1 --zmq ipc:///tmp/network_profiler_3_1 --zmq ipc:///tmp/network_profiler_3_0

Receiving for 60 s on 0MQ transport
Test: SUCCESS
Settings:
  Transport type: 0MQ
  Data size: 500000 B
  Test time: 60 s
Latency:
  Min: 199942 ns
  Avg: 60923178 ns
  Max: 107387418 ns
Throughput:
  Messages received: 684496
  Message rate: 11408.3 packets/s
  Data received: 342248000000 B
  Data rate: 5.70413e+09 B/s

Intrahost Summary

  • Both UDP unicast and multicast can be used for interprocess communication on the same host, but unicast tends to provide better latency and throughput due to the copy cost of multicast as implemented by the operating system
  • There is no real comparison between multi-threaded performance and networked multi-process performance. Use multi-threading with a single knowledge base wherever possible for maximum performance. The difference becomes even more drastic once packets cross the UDP datagram boundary (64KB)

Interhost Performance

Interhost performance focuses on the capability of the operating system and network to handle knowledge sharing between knowledge bases on two or more hosts. Interhost performance is facilitated by knowledge transports such as UDP unicast, broadcast, multicast, DDS, and ZeroMQ.
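
Configuring a transport for interhost use looks the same in code as the intrahost case; only the endpoints change. A minimal sketch of a UDP unicast setup, with hypothetical IP addresses and the assumption that the first host entry is the local endpoint:

```cpp
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/transport/QoSTransportSettings.h"

int main()
{
  madara::transport::QoSTransportSettings settings;
  settings.type = madara::transport::UDP;

  // hypothetical addresses: this host first, then each remote peer
  settings.hosts.push_back("192.168.1.10:30000");  // local endpoint
  settings.hosts.push_back("192.168.1.11:30000");  // remote knowledge base

  // deep buffer for large payloads (see the queue_length guidance above)
  settings.queue_length = 50000000;

  madara::knowledge::KnowledgeBase kb("", settings);
  kb.set("agent.0.position", "40.4406,-79.9959");

  return 0;
}
```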


Interhost network_profiler

As with intrahost testing, the $MADARA_ROOT/bin/network_profiler tool can be used for testing most supported knowledge base transports, including UDP unicast, broadcast, multicast, ZeroMQ, and DDS. See the intrahost section above for general usage, or the built-in help (--help or -h options).

To run network_profiler on two hosts for interhost tests, open one terminal on each host and launch the tool in each. At least one network_profiler should have id 0 (-i 0, the publisher and default id), and at least one should have a non-zero id (e.g., -i 1, a subscriber). The publisher publishes data of a user-specified size and frequency (the default is to publish as fast as possible), and the subscriber reports latency and throughput for the configured QoS. For the unicast tests below, replace the 127.0.0.1 loopback addresses with the actual IP addresses of the participating hosts, since loopback traffic never leaves a host.

Below are some example runs in an Ubuntu 16.04 virtual machine for interhost testing.


Interhost Multicast Performance Small

Publisher: $MADARA_ROOT/bin/network_profiler
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1



Interhost Multicast Performance Medium (64KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 64000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1




Interhost Multicast Performance Large (500KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1




Interhost Multicast Performance Large (500KB) Deep (50MB buffer)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000




Interhost Unicast Performance Small

Publisher: $MADARA_ROOT/bin/network_profiler -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001




Interhost Unicast Performance Medium (64KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 64000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001




Interhost Unicast Performance Large (500KB)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001




Interhost Unicast Performance Large (500KB) Deep (50MB buffer)

Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000 -u 127.0.0.1:30001




Interhost Summary

  • Both UDP unicast and multicast can be used for interhost communication, but unicast tends to provide better latency and throughput due to the copy cost of multicast as implemented by the operating system
  • There is no real comparison between multi-threaded performance and networked multi-process performance. Wherever possible, keep agents that must share knowledge at high rates on the same host and use multi-threading with a single knowledge base. The difference becomes even more drastic once packets cross the UDP datagram boundary (64KB)

More Information

For performance-related tuning, you may want to check out the OptimizingKarl wiki.