KnowledgePerformance
MADARA Knowledge Bases do not operate in isolation. Performance of distributed knowledge sharing between agents depends on operating system functions, compiler optimizations, and configuration of buffers and quality-of-service. In this wiki, we discuss some of the tools available in MADARA to gauge knowledge performance related to latency and throughput within a host (intrahost) and between hosts (interhost). To gain access to the tests mentioned in this wiki, you need to compile MADARA with the tests feature enabled, either with base_build.sh or through the direct mwc.pl process.
- Introduction
- Table of Contents
- TLDR Summary
- Intrahost Performance
  - test_reasoning_throughput
  - Intrahost network_profiler
    - Intrahost Multicast Performance Small
    - Intrahost Multicast Performance Medium (64KB)
    - Intrahost Multicast Performance Large (500KB)
    - Intrahost Multicast Performance Large (500KB) Deep (50MB buffer)
    - Intrahost Unicast Performance Small
    - Intrahost Unicast Performance Medium (64KB)
    - Intrahost Unicast Performance Large (500KB)
    - Intrahost Unicast Performance Large (500KB) Deep (50MB buffer)
  - Intrahost Summary
- Interhost Performance
TLDR Summary
- For intrahost performance, try to use the same knowledge base between threads. This is orders of magnitude faster than using a network transport between processes on the operating system, and it holds for all operating systems and architectures.
- For multi-process performance, the smaller the data packets, the more messages that can be transferred reliably between knowledge bases.
- Quality-of-service settings like TransportSettings::queue_length (the buffer size for the OS and transport layer to use) can be extremely important to performance. If possible, use a queue_length big enough to hold at least 1 s of the maximum expected data throughput, and possibly 5-10 s if you want maximum throughput and reliability (see the configuration sketch after this list).
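As a rough illustration of the queue_length advice, the sketch below sizes the buffer for about 5 s of traffic at an assumed 10 MB/s. The multicast address and variable name are illustrative assumptions, not requirements.

```cpp
// Minimal sketch (not from the wiki): a knowledge base with a multicast
// transport whose queue_length is sized for several seconds of expected data.
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/transport/QoSTransportSettings.h"

int main()
{
  madara::transport::QoSTransportSettings settings;
  settings.type = madara::transport::MULTICAST;
  settings.hosts.push_back("239.255.0.1:4150");  // assumed multicast address

  // ~5 s of buffer at an assumed 10 MB/s expected throughput
  settings.queue_length = 50000000;

  // knowledge base that sends/receives over the configured transport
  madara::knowledge::KnowledgeBase knowledge("", settings);

  // global variables (no leading '.') are published to other knowledge bases
  knowledge.set("example.ready", madara::knowledge::KnowledgeRecord::Integer(1));

  return 0;
}
```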
Intrahost Performance
There are two major considerations for judging intrahost performance: 1) multi-threaded performance and 2) multi-process performance. The former is mostly gated by time spent in OS critical sections but can also be affected by CPU load and memory latency. The latter is bottlenecked almost entirely by OS prioritization and handling of network protocols, file pipes, and sockets.
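To make the multi-threaded case concrete, here is a minimal sketch (assuming standard C++ threads; variable names are illustrative) of two threads sharing one knowledge base with no network transport attached. The knowledge base context is internally synchronized, so concurrent access from threads is safe.

```cpp
// Minimal sketch: two threads sharing one KnowledgeBase in the same process.
// No transport is attached, so nothing leaves the process.
#include <thread>
#include <iostream>
#include "madara/knowledge/KnowledgeBase.h"

int main()
{
  madara::knowledge::KnowledgeBase knowledge;

  // producer: increment a local variable (leading '.' means it is never sent)
  std::thread producer([&knowledge]() {
    for (int i = 0; i < 100000; ++i)
      knowledge.evaluate("++.count");
  });

  // consumer: poll the shared variable from another thread
  std::thread consumer([&knowledge]() {
    while (knowledge.get(".count").to_integer() < 100000)
      std::this_thread::yield();
  });

  producer.join();
  consumer.join();

  std::cout << "final count: " << knowledge.get(".count").to_integer() << "\n";
  return 0;
}
```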
test_reasoning_throughput
The main test for multi-threaded performance can be found in $MADARA_ROOT/bin/test_reasoning_throughput. This test mainly exercises function calls on the knowledge base and common data abstractions, such as Integer containers. Example final output for such a call is shown below.
Command: $MADARA_ROOT/bin/test_reasoning_throughput
Average time taken per rule evaluation was:
=========================================================================
KaRL: Simple Increments 1067 ns
KaRL: Multiple Increments 258 ns
KaRL: Simple Ternary Increments 1140 ns
KaRL: Multiple Ternary Increments 348 ns
KaRL: Compiled Simple Increments 584 ns
KaRL: Compiled Multiple Inc 250 ns
KaRL: Compiled Simple Tern Inc 628 ns
KaRL: Compiled Multiple Tern Inc 335 ns
KaRL: Compiled Single Assign 544 ns
KaRL: Compiled Multiple Assign 262 ns
KaRL: Extern Function Call 426 ns
KaRL: Compiled Extern Inc Func 637 ns
KaRL: Compiled Extern Multi Calls 338 ns
KaRL: Looped Simple Increments 555 ns
KaRL: Optimized Loop 0 ns
KaRL: Looped Simple Ternary Inc 608 ns
KaRL: Looped Multiple Ternary Inc 619 ns
KaRL: Get Variable Reference 214 ns
KaRL: Get Expanded Reference 2177 ns
KaRL: Normal Set Operation 564 ns
KaRL: Variable Reference Set 353 ns
KaRL: Variables Inc Var Ref 574 ns
KaRL container: Assignment 97 ns
KaRL container: Increments 138 ns
KaRL staged container: Assignment 0 ns
KaRL staged container: Increments 0 ns
C++: Optimized Assignments 4 ns
C++: Optimized Increments 0 ns
C++: Optimized Ternary Increments 0 ns
C++: Virtual Increments 13 ns
C++: Volatile Assignments 3 ns
C++: Volatile Increments 1 ns
C++: Volatile Ternary Increments 1 ns
C++: STL Atomic Increments 5 ns
C++: STL Recursive Increments 35 ns
C++: STL Mutex Increments 28 ns
=========================================================================
Hertz for each test with 100000 iterations * 10 tests was:
=========================================================================
KaRL: Simple Increments 936.76 khz
KaRL: Multiple Increments 3.86 mhz
KaRL: Simple Ternary Increments 877.13 khz
KaRL: Multiple Ternary Increments 2.87 mhz
KaRL: Compiled Simple Increments 1.71 mhz
KaRL: Compiled Multiple Inc 3.99 mhz
KaRL: Compiled Simple Tern Inc 1.59 mhz
KaRL: Compiled Multiple Tern Inc 2.98 mhz
KaRL: Compiled Single Assign 1.84 mhz
KaRL: Compiled Multiple Assign 3.81 mhz
KaRL: Extern Function Call 2.35 mhz
KaRL: Compiled Extern Inc Func 1.57 mhz
KaRL: Compiled Extern Multi Calls 2.95 mhz
KaRL: Looped Simple Increments 1.80 mhz
KaRL: Optimized Loop 53.68 ghz
KaRL: Looped Simple Ternary Inc 1.64 mhz
KaRL: Looped Multiple Ternary Inc 1.61 mhz
KaRL: Get Variable Reference 4.66 mhz
KaRL: Get Expanded Reference 459.30 khz
KaRL: Normal Set Operation 1.77 mhz
KaRL: Variable Reference Set 2.83 mhz
KaRL: Variables Inc Var Ref 1.74 mhz
KaRL container: Assignment 10.27 mhz
KaRL container: Increments 7.21 mhz
KaRL staged container: Assignment 2.38 ghz
KaRL staged container: Increments 2.89 ghz
C++: Optimized Assignments 219.30 mhz
C++: Optimized Increments 2.24 ghz
C++: Optimized Ternary Increments 3.11 ghz
C++: Virtual Increments 71.51 mhz
C++: Volatile Assignments 328.91 mhz
C++: Volatile Increments 579.30 mhz
C++: Volatile Ternary Increments 561.99 mhz
C++: STL Atomic Increments 167.39 mhz
C++: STL Recursive Increments 28.44 mhz
C++: STL Mutex Increments 34.71 mhz
Takeaway: intrahost multi-threaded performance can reach megahertz rates (1M+ operations per second), even when accessing large data structures through the shared_ptr support. Multi-threading is the best possible way to hit throughput and latency requirements in mission-critical systems.
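The table above points at two of the faster paths through the API: compiling a KaRL expression once instead of reparsing it, and using containers bound to the knowledge base. A minimal sketch of both follows; the variable names are illustrative.

```cpp
// Minimal sketch of the faster access paths measured above: a compiled KaRL
// expression and an Integer container bound to the same knowledge base.
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/knowledge/containers/Integer.h"

int main()
{
  madara::knowledge::KnowledgeBase knowledge;

  // compile once, evaluate many times (avoids reparsing the KaRL string)
  madara::knowledge::CompiledExpression inc = knowledge.compile("++.count");
  for (int i = 0; i < 100000; ++i)
    knowledge.evaluate(inc);

  // containers bypass expression evaluation entirely and are faster still
  madara::knowledge::containers::Integer counter(".counter", knowledge);
  for (int i = 0; i < 100000; ++i)
    counter += 1;

  return 0;
}
```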
Intrahost network_profiler
The $MADARA_ROOT/bin/network_profiler tool can be used for testing most supported knowledge base transports, including UDP unicast, broadcast, multicast, ZeroMQ, and DDS. The tool comes with built-in help (--help or -h options) and can be used for communication between knowledge bases on the same host or across multiple hosts.
To run network_profiler on the same host for intrahost tests, open two terminals and launch the tool in each window. At least one network_profiler should be id 0 (-i 0, the publisher and default id), and at least one should have a non-zero id (e.g., -i 1, a subscriber). The publisher publishes data of a user-specified size and frequency (the default is to publish as fast as possible). The subscriber receives the data and reports latency and throughput information for the configured QoS. This tool is very valuable for understanding performance.
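Conceptually, the publisher side resembles the following sketch: a knowledge base with a multicast transport that repeatedly writes a fixed-size payload. The variable name, payload size, and multicast address are illustrative assumptions, not the profiler's actual internals.

```cpp
// Rough sketch of a publisher over a multicast transport. With default
// settings, each set() on a global variable is sent to subscribing
// knowledge bases.
#include <string>
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/transport/QoSTransportSettings.h"

int main()
{
  madara::transport::QoSTransportSettings settings;
  settings.type = madara::transport::MULTICAST;
  settings.hosts.push_back("239.255.0.1:4150");  // assumed multicast address

  madara::knowledge::KnowledgeBase knowledge("", settings);

  // analogous to the profiler's -s option (128 B payload here)
  std::string payload(128, 'x');

  for (int i = 0; i < 100000; ++i)
    knowledge.set("profiler.payload", payload);

  return 0;
}
```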
Below are some example runs on an Ubuntu 16.04 Virtual Machine for intrahost testing.
Intrahost Multicast Performance Small
Publisher: $MADARA_ROOT/bin/network_profiler
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1
Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
Transport type: UDP Multicast
Data size: 128 B
Test time: 60 s
Latency:
Min: 6904 ns
Avg: 1142090 ns
Max: 31139447 ns
Throughput:
Messages received: 1267771
Message rate: 21129.5 packets/s
Data received: 162274688 B
Data rate: 2.70458e+06 B/s
Intrahost Multicast Performance Medium (64KB)
Publisher: $MADARA_ROOT/bin/network_profiler -s 64000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1
Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
Transport type: UDP Multicast
Data size: 64000 B
Test time: 60 s
Latency:
Min: 68003 ns
Avg: 1889606 ns
Max: 7002278 ns
Throughput:
Messages received: 29882
Message rate: 498.033 packets/s
Data received: 1912448000 B
Data rate: 3.18741e+07 B/s
Intrahost Multicast Performance Large (500KB)
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1
Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
Transport type: UDP Multicast
Data size: 500000 B
Test time: 60 s
Latency:
Min: 6348807 ns
Avg: 16113732 ns
Max: 20839996 ns
Throughput:
Messages received: 3616
Message rate: 60.2667 packets/s
Data received: 1808000000 B
Data rate: 3.01333e+07 B/s
Intrahost Multicast Performance Large (500KB) Deep (50MB buffer)
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000
Receiving for 60 s on UDP Multicast transport
Test: SUCCESS
Settings:
Transport type: UDP Multicast
Data size: 500000 B
Test time: 60 s
Latency:
Min: 4693015 ns
Avg: 12484017 ns
Max: 24725457 ns
Throughput:
Messages received: 4662
Message rate: 77.7 packets/s
Data received: 2331000000 B
Data rate: 3.885e+07 B/s
Intrahost Unicast Performance Small
Publisher: $MADARA_ROOT/bin/network_profiler -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001
Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
Transport type: UDP
Data size: 128 B
Test time: 60 s
Latency:
Min: 10708 ns
Avg: 3291926 ns
Max: 10518131 ns
Throughput:
Messages received: 4691469
Message rate: 78191.1 packets/s
Data received: 600508032 B
Data rate: 1.00085e+07 B/s
Intrahost Unicast Performance Medium (64KB)
Publisher: $MADARA_ROOT/bin/network_profiler -s 64000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001
Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
Transport type: UDP
Data size: 64000 B
Test time: 60 s
Latency:
Min: 40491 ns
Avg: 208246 ns
Max: 10450103 ns
Throughput:
Messages received: 1035884
Message rate: 17264.7 packets/s
Data received: 66296576000 B
Data rate: 1.10494e+09 B/s
Intrahost Unicast Performance Large (500KB)
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001
Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
Transport type: UDP
Data size: 500000 B
Test time: 60 s
Latency:
Min: 242553 ns
Avg: 375276 ns
Max: 5181377 ns
Throughput:
Messages received: 205562
Message rate: 3426.03 packets/s
Data received: 102781000000 B
Data rate: 1.71302e+09 B/s
Intrahost Unicast Performance Large (500KB) Deep (50MB buffer)
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000 -u 127.0.0.1:30001
Receiving for 60 s on UDP transport
Test: SUCCESS
Settings:
Transport type: UDP
Data size: 500000 B
Test time: 60 s
Latency:
Min: 246819 ns
Avg: 374523 ns
Max: 5452111 ns
Throughput:
Messages received: 208807
Message rate: 3480.12 packets/s
Data received: 104403500000 B
Data rate: 1.74006e+09 B/s
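Before the summary, here is a minimal sketch of the kind of unicast configuration the -u runs above use, assuming the usual MADARA UDP convention that the first host listed is the local address and later entries are peers. The addresses and queue_length simply mirror the command lines above.

```cpp
// Minimal sketch of a UDP unicast configuration mirroring the -u/-q examples.
// Assumes the convention that hosts[0] is this process's own address.
#include "madara/knowledge/KnowledgeBase.h"
#include "madara/transport/QoSTransportSettings.h"

int main()
{
  madara::transport::QoSTransportSettings settings;
  settings.type = madara::transport::UDP;
  settings.hosts.push_back("127.0.0.1:30000");  // this process (publisher side)
  settings.hosts.push_back("127.0.0.1:30001");  // peer (subscriber side)

  // deep buffer, as in the -q 50000000 runs
  settings.queue_length = 50000000;

  madara::knowledge::KnowledgeBase knowledge("", settings);
  knowledge.set("example.ready", madara::knowledge::KnowledgeRecord::Integer(1));

  return 0;
}
```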
Intrahost Summary
- Both UDP unicast and multicast can be used for intrahost communication between processes, but unicast tends to have better latency and throughput because of the copy cost of multicast as implemented by the operating system.
- There is no real comparison between multi-threaded performance and networked multi-process performance. Use multi-threading with a single knowledge base wherever possible for maximum performance. The difference becomes even more drastic once message sizes cross the UDP datagram boundary (64KB) and must be fragmented.
Interhost Performance
Interhost performance focuses on the capability of the operating system and network to handle knowledge sharing between knowledge bases on two or more hosts. Interhost performance is facilitated by knowledge transports such as UDP unicast, broadcast, multicast, DDS, and ZeroMQ.
As with intrahost testing, the $MADARA_ROOT/bin/network_profiler tool is used to profile the supported transports. To run network_profiler on two hosts for interhost tests, open one terminal on each host and launch the tool in each terminal window. At least one network_profiler should be id 0 (-i 0, the publisher and default id), and at least one should have a non-zero id (e.g., -i 1, a subscriber). The publisher publishes data of a user-specified size and frequency (the default is to publish as fast as possible), and the subscriber receives the data and reports latency and throughput information for the configured QoS.
Below are some example runs for interhost testing on Ubuntu 16.04 virtual machines. For the unicast examples, replace the 127.0.0.1 addresses with the actual IP addresses of the participating hosts.
Publisher: $MADARA_ROOT/bin/network_profiler
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1
Publisher: $MADARA_ROOT/bin/network_profiler -s 64000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000
Publisher: $MADARA_ROOT/bin/network_profiler -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001
Publisher: $MADARA_ROOT/bin/network_profiler -s 64000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -u 127.0.0.1:30001
Publisher: $MADARA_ROOT/bin/network_profiler -s 500000 -q 50000000 -u 127.0.0.1:30000 -u 127.0.0.1:30001
Subscriber: $MADARA_ROOT/bin/network_profiler -i 1 -q 50000000 -u 127.0.0.1:30001
For performance-related tuning, you may want to check out the OptimizingKarl Wiki.