LKL latency/throughput #357
I was reading this awesome paper written by Jerry: https://netdevconf.org/1.2/papers/jerry_chu.pdf
It mentions that the performance of LKL in user space did not beat that of the host OS. Is that still true?
Has anyone done more performance benchmarking to find out more such latency/throughput numbers?
Curious to know what to expect from a TCP proxy service based on LKL.
I don't think there has been much change in performance since then. If the number of TCP connections is small, the performance of LKL is OK (still a bit slower than the host). But if the number is large (I'm talking about hundreds), there is a big gap versus the host due to LKL's single-threaded nature.
How would you compare the performance of LKL vs mTCP?
@speedingdaemon It uses the raw socket backend, with some modifications. OpenVZ's venet0 network interface is a cooked interface: its raw packets carry no MAC layer (14 bytes), so the stock backend can't be used as-is. But CPU usage is high. Recently I have been looking at using TPACKET_V2 (packet mmap) to achieve zero copy and reduce CPU usage.
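For reference, a minimal sketch of the TPACKET_V2 (packet mmap) receive path mentioned above. This is generic AF_PACKET code, not LKL's backend; ring sizing and error handling are simplified.

```c
/*
 * Minimal TPACKET_V2 RX sketch: the kernel fills frames in a ring that is
 * mmap()ed into the process, so packets are read without a copy and
 * without one syscall per packet. Generic AF_PACKET code, error handling
 * elided; not LKL's backend.
 */
#include <stdio.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    int ver = TPACKET_V2;
    setsockopt(fd, SOL_PACKET, PACKET_VERSION, &ver, sizeof(ver));

    struct tpacket_req req = {
        .tp_block_size = 4096,
        .tp_block_nr   = 64,
        .tp_frame_size = 2048,
        .tp_frame_nr   = 4096 / 2048 * 64,  /* frames must exactly fill the blocks */
    };
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    char *ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    for (unsigned i = 0; ; i = (i + 1) % req.tp_frame_nr) {
        struct tpacket2_hdr *hdr =
            (struct tpacket2_hdr *)(ring + (size_t)i * req.tp_frame_size);

        /* Sleep only when the kernel has not filled this slot yet. */
        while (!(hdr->tp_status & TP_STATUS_USER)) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            poll(&pfd, 1, -1);
        }

        printf("frame %u: %u bytes\n", i, hdr->tp_len);
        hdr->tp_status = TP_STATUS_KERNEL;  /* hand the slot back to the kernel */
    }
}
```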
@linhua55 @thehajime @tavip
Do these numbers sound reasonable? I would have thought that an LKL-based server would be much faster than it turned out to be... If the throughput found above in experiment number 3 is true, then why would anyone use LKL for HFT applications? I am really hoping there is some initialization step I am missing when using LKL. I wish someone could help.
I expect LKL to perform a bit worse than the host, so the host/LKL ratio seems right. I am not sure why you think LKL would be useful for HFT applications.
Oh, sorry, my bad. I thought I had read that somewhere.
Does this perf output look fine to folks?
Why do I see kernel.kallsyms references in the top 4 entries? I thought that with LKL I shouldn't be seeing kernel.kallsyms symbols at the top. Also, what is all this raid stuff? Is it from the 8 lkl_netdev_raw_create() calls I made in myserver?
The raw socket (AF_PACKET) backend uses host kernel system calls, which are invoked frequently when sending and receiving packets. That cost can be amortized by several techniques (e.g., bulk processing, segmentation offloading), but this requires modifying the backend itself.
See #301. This is due to the btrfs benchmark code (I guess) that runs during the boot phase, so you won't see better performance/throughput even if you disable the feature.
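To illustrate the bulk-processing idea from the previous comment, a hedged sketch (not LKL's actual backend code) that uses recvmmsg(2) to drain up to a batch of packets from an AF_PACKET socket with a single host syscall:

```c
/*
 * Sketch: amortize host syscall overhead by batching. One recvmmsg(2)
 * call can return up to BATCH packets, instead of paying one
 * recvfrom(2) per packet. Generic AF_PACKET code, not LKL's backend.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

#define BATCH 64
#define FRAME 2048

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    static char bufs[BATCH][FRAME];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH] = { 0 };

    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = FRAME;
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    for (;;) {
        int n = recvmmsg(fd, msgs, BATCH, 0, NULL);  /* one syscall, n packets */
        if (n < 0)
            break;
        for (int i = 0; i < n; i++)
            printf("pkt %d: %u bytes\n", i, msgs[i].msg_len);
    }
    return 0;
}
```

sendmmsg(2) provides the same batching on the transmit path.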
I don't mind modifying the backend. Can you please point me to the places that need to be modified? Also, I saw code in lkl_netdev_tap_init() that takes an offload argument to offload certain functionality.
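For what it's worth, a sketch of what enabling those offloads might look like. The function name lkl_netdev_tap_create() and the LKL_VIRTIO_NET_F_* feature bits are written from memory and should be verified against tools/lkl/include/lkl.h in your tree:

```c
/*
 * Hedged sketch of enabling offloads on the tap backend. Verify
 * lkl_netdev_tap_create() and the LKL_VIRTIO_NET_F_* names against the
 * LKL headers before relying on this.
 */
#include <lkl.h>
#include <lkl_host.h>

static struct lkl_netdev *tap_with_offloads(const char *ifname)
{
    /* Offloads are virtio-net feature bits: let checksumming and TCP
     * segmentation happen once at the edges instead of per MTU-sized
     * packet inside LKL. */
    int offload = (1 << LKL_VIRTIO_NET_F_CSUM) |
                  (1 << LKL_VIRTIO_NET_F_HOST_TSO4) |
                  (1 << LKL_VIRTIO_NET_F_GUEST_CSUM) |
                  (1 << LKL_VIRTIO_NET_F_GUEST_TSO4);

    return lkl_netdev_tap_create(ifname, offload);
}
```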
I spent some time poking at this. It might be faster to use a pthread spinlock instead of a mutex, but there's currently a dependency on having a recursive mutex which I haven't been able to disentangle. At the extreme, it may be possible to run LKL on top of Xenomai, which does in fact support recursive mutexes. It would be nice if this particular design decision were optional, as there are quite a few threading libraries worth exploring that don't implement this feature.
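For context, the recursive behavior being discussed maps onto PTHREAD_MUTEX_RECURSIVE in plain pthreads. A generic host-ops-style sketch (not LKL's actual posix host code) of a mutex that is optionally recursive:

```c
/*
 * Generic host-ops-style mutex sketch (plain pthreads, not LKL's actual
 * posix host code): the "recursive" knob under discussion corresponds to
 * PTHREAD_MUTEX_RECURSIVE.
 */
#include <stdlib.h>
#include <pthread.h>

struct host_mutex {
    pthread_mutex_t mtx;
};

static struct host_mutex *host_mutex_alloc(int recursive)
{
    struct host_mutex *m = malloc(sizeof(*m));
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    if (recursive)
        /* The same thread may re-acquire without deadlocking -- the
         * property the kernel boot path currently seems to depend on. */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&m->mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    return m;
}

static void host_mutex_lock(struct host_mutex *m)
{
    pthread_mutex_lock(&m->mtx);
}

static void host_mutex_unlock(struct host_mutex *m)
{
    pthread_mutex_unlock(&m->mtx);
}
```

Note that POSIX spinlocks (pthread_spin_lock) have no recursive variant, which is why the recursive dependency blocks that swap.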
I have been working on moving the threading stuff out to host ops, which removes a lot of dependencies (including mutexes); see: https://github.com/tavip/linux/commits/the-expanse The main problem I am still facing is the latency optimizations (direct IRQs and syscalls), for which I have not yet found a model that allows moving them to host ops.
I tried removing the recursive flag from master and kernel boot just froze, so something is certainly using it. I did see you were trying to let the host declare a jump instruction. Where do the latency optimizations live? In my mental model, syscalls just become function calls, and IRQs are either blocking tasks that interrupted the host in a way we caught (userfaultfd, a signal) or non-blocking tasks in a list that can be serviced at the host's discretion (perhaps between functions via -finstrument-functions). But I haven't quite grokked what needs to be locked at this layer.