Perf improvements #237
Comments
Hrm. So I've known for a while that a number of my early decisions in Chipmunk were fairly sub-optimal. In the past I did experiments where I rewrote large sections of Chipmunk to be SoA ordered data, and that helped a lot. The problem is that a lot of those structs are part of the public API and I can't really change them now. I'd never really considered the impact of just reordering fields though... I'm kind of surprised that inlining the contacts into the arbiter helped as the contacts are already linearly packed into memory in the order they are accessed.
I've considered Chipmunk to be "stable" for quite a few years now where I don't really have any big or breaking changes to make. I'm not sure how I feel about this, but could maybe be convinced. I certainly wouldn't mind if you made a "turbo" fork or something.
I'm currently making a new game, and I actually wrote a new (but very very simple) physics engine for it to finally try out some new ideas. It's vaguely ECS based, and it heavily uses SoA. I knew it would be faster, but I was shocked to find it running several times faster. (Though it's hard to make a very direct comparison.) https://github.com/slembcke/veridian-expanse/blob/master/src/drift_physics.c
Anyway... not really relevant, but it's interesting to think where Chipmunk could go in the future.
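For readers unfamiliar with the term, here is a minimal sketch of what "SoA ordered data" means compared to the usual array-of-structs layout. The struct and function names are hypothetical and are not taken from Chipmunk or from drift_physics.c; it only illustrates the memory layout idea, not how either engine is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-structs (AoS): one struct per body. A position update also
   drags the unused mass/inertia fields through the cache. */
typedef struct {
    float px, py;   /* position */
    float vx, vy;   /* velocity */
    float mass;     /* not needed for the position update */
    float inertia;  /* not needed for the position update */
} BodyAoS;

/* Struct-of-arrays (SoA): one tightly packed array per field, so the
   update loop only reads and writes the data it actually uses. */
typedef struct {
    float *px, *py;
    float *vx, *vy;
    size_t count;
} BodiesSoA;

/* Position update over the AoS layout. */
static void integrate_aos(BodyAoS *bodies, size_t n, float dt) {
    assert(bodies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bodies[i].px += bodies[i].vx * dt;
        bodies[i].py += bodies[i].vy * dt;
    }
}

/* The same update over the SoA layout. */
static void integrate_soa(BodiesSoA *b, float dt) {
    assert(b != NULL);
    for (size_t i = 0; i < b->count; i++) {
        b->px[i] += b->vx[i] * dt;
        b->py[i] += b->vy[i] * dt;
    }
}
```

Both functions compute the same result. Any speed difference comes from how much unrelated data shares cache lines with the fields being updated, and it would need to be measured for a real workload.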
I was also surprised that I got a clear effect from my very limited changes, especially from just rearranging stuff, which is easy and safe to do (unless you care about binary compatibility, of course).
I can see your desire to keep it stable. I have had some users complain when moving from Pymunk 4.x to 5.0, which broke several things, and it's a difficult tradeoff. At least it's better that it's stable than that things change for the worse or more bugs are introduced.
One thing Pymunk is used for is different kinds of research (maybe this is even the most common usage). It's a quick and easy way to simulate an environment and then try different things (for example reinforcement learning, motion prediction and a lot of other stuff). Some years ago I had a discussion about the future of Pymunk with the people behind a toolkit for developing AI algorithms (i.e. simulating an environment for a robot and other stuff). They were using a Python library built on top of Box2D, which was not maintained.
Now, I just do Pymunk dev sometimes in the evening, so whether I will really be able to make some big breaking performance improvements is still open. If nothing else I will make a PR of rearranged structs, but it needs some more testing to see what was useful first.
Btw, have you looked at GPU acceleration of Chipmunk? Or maybe it's not really relevant for games when the GPU is busy doing graphics anyway?
Yeah... Like there's a lot I want to do to modernize Chipmunk. At this point it's 17 years old! Like there's a lot of data changes I could do to vastly improve performance, and the whole API itself could use a pretty big "modern C" upgrade. On the other hand I have 2 big projects for work, and a host of other hobby projects. :( I just don't see myself being able to pull that off without making something much simpler and more focused like the physics I made for Veridian Expanse.
On the other (other) hand, Erin Catto has been working towards Box2D 3.0, and it's absolutely a wishlist of everything I would put in a Chipmunk refresh. Modern C API (not C++!), heavy multi-threading focus from the start, object handles instead of direct pointers, etc. It sounds pretty great.
Pymunk for research: That reminds me! Not sure if I shared this story with you before. A few years back I helped mentor high school students for the FIRST Robotics competition. One of those students is now getting his PhD in robotics actually. I ended up sitting next to him at a LAN party of all places and we talked a lot about game tech. He didn't realize that I worked in games, and I mentioned a bunch of our projects including Chipmunk. Apparently he used Pymunk in his PhD research to do his initial simulations to develop the control systems they were working on. Small world! :) I don't think I've said this in a while, but it's awesome that you made Pymunk. It really seems to help a lot of people.
GPU acceleration: Not really. I've read a bit about how people have approached some of the algorithms involved, but that's about it. I think realistically Chipmunk's OO-ish API is a terrible fit for how GPU accelerated data needs to be organized and you'd need to copy a lot of data around. I've done just enough related work too to know that there are a lot of subtle mistakes you can make when trying to optimize at that level that can destroy performance too.
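As a rough illustration of "object handles instead of direct pointers", here is a minimal sketch of an index-plus-generation handle scheme. All names here (BodyHandle, BodyPool, pool_create and so on) are hypothetical and are not the Box2D or Chipmunk API; it only shows the general idea that user code holds a small ID while the library stays free to lay out its data however it likes internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BODIES 4 /* tiny fixed capacity, just for the sketch */

/* What user code holds instead of a pointer. */
typedef struct { uint32_t index; uint32_t generation; } BodyHandle;

typedef struct { float x, y; } BodyData;

typedef struct {
    BodyData data[MAX_BODIES];
    uint32_t generation[MAX_BODIES]; /* bumped each time a slot is freed */
    bool     alive[MAX_BODIES];
} BodyPool;

/* Claims the first free slot; returns false if the pool is full. */
static bool pool_create(BodyPool *pool, BodyHandle *out) {
    assert(pool != NULL && out != NULL);
    for (uint32_t i = 0; i < MAX_BODIES; i++) {
        if (!pool->alive[i]) {
            pool->alive[i] = true;
            pool->data[i] = (BodyData){0.0f, 0.0f};
            out->index = i;
            out->generation = pool->generation[i];
            return true;
        }
    }
    return false;
}

/* Resolves a handle to its data, or NULL if the handle is stale or invalid. */
static BodyData *pool_get(BodyPool *pool, BodyHandle h) {
    assert(pool != NULL);
    if (h.index >= MAX_BODIES) return NULL;
    if (!pool->alive[h.index]) return NULL;
    if (pool->generation[h.index] != h.generation) return NULL;
    return &pool->data[h.index];
}

/* Frees the slot and invalidates every outstanding handle to it. */
static void pool_destroy(BodyPool *pool, BodyHandle h) {
    if (pool_get(pool, h) == NULL) return;
    pool->alive[h.index] = false;
    pool->generation[h.index]++;
}
```

With this kind of scheme a handle kept around after its body was destroyed resolves to NULL instead of pointing at freed or reused memory, and the internal storage can be reorganized without breaking user code.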
Yeah, the lack of time/focus is always a biggie. There are so many things that would be interesting to try! This is also why I made the post here in the first place: best to share the small thing while it's fresh, even if nothing more is done.
Small world indeed :) As always, cool to hear about someone using Pymunk!
GPU: Ah, yeah that is what I guessed. Difficult/impossible to shoehorn into an existing design, and it can have wide-ranging side-effects.
This is not really an issue, more of an FYI and a question about whether someone else has looked into these things. It could maybe fit on the forum, but I feel it's more visible to put it here.
Anyway, just a couple of days ago I added a batch API to Pymunk (a Python 2D physics library built on Chipmunk) to get some data quicker, since it's quite expensive to call C code from Python. In my simple test case I was mainly bottlenecked by Chipmunk performance and not by Python, as is the usual case. So I started to look into how I could increase the performance of Chipmunk (on desktop) if possible. This is a short report so far:
I did all tests on Windows 11 in WSL (Ubuntu) on my ThinkPad X1 gen 7 laptop with an i5-8265U CPU (~Skylake). To compare performance I used bench.c, but shortened it to 1000 steps.
I tried to reorder the structs:
- [...] `cpArbiterApplyImpulse` if I run the demo bench.c through `perf record`.
- [...] `pahole` I could see that both the `cpArbiter` struct and the `cpBody` struct are not cache line aligned for how they are used in the apply impulse function.
- [...] `cpArbiterApplyImpulse` first in those two structs.
I also tried to compile with `-march=skylake`. Not sure how I would use this in a real case with Pymunk, but worth testing at least. It saved another 5% of the remaining time, for a total saving of 9%.
These 2 things were the easiest I could think of to try (after I researched a bit how easy SIMD for x86 would be).
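To make the struct reordering idea concrete, here is a small sketch with made-up structs. BodyBefore, BodyAfter and their fields are hypothetical and are not the real `cpBody` or `cpArbiter` layouts, and the 64-byte cache line size is an assumption that holds for typical x86-64 CPUs. It checks with `offsetof` whether the fields a hot function reads fit in the first cache line of the struct, which is roughly the kind of thing `pahole` reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical "before" layout: a rarely used field sits between
   the two fields that the hot loop reads. */
typedef struct {
    double hot_a;          /* read on every solver iteration */
    char   cold_name[120]; /* rarely touched */
    double hot_b;          /* read on every solver iteration */
} BodyBefore;

/* Hypothetical "after" layout: hot fields grouped at the front. */
typedef struct {
    double hot_a;
    double hot_b;
    char   cold_name[120];
} BodyAfter;

/* True if a field at `offset` with `size` bytes lies entirely inside the
   first cache line, assuming the struct itself starts on a cache line
   boundary. */
static bool fits_in_first_cache_line(size_t offset, size_t size) {
    return offset + size <= CACHE_LINE;
}

/* Compile-time guard so a later edit cannot silently push the hot
   fields out of the first cache line. */
static_assert(offsetof(BodyAfter, hot_b) + sizeof(double) <= CACHE_LINE,
              "hot fields of BodyAfter must share the first cache line");
```

The check only says something about the layout inside the struct. Whether an instance actually starts on a cache line boundary depends on how it is allocated, so the effect still has to be confirmed by benchmarking.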
Some other things I thought about, to put all the data needed closer in memory (whether they help or not I do not know yet):
- [...] `cpArbiterApplyImpulse` [...]
I should note that I had 0 experience optimizing C code before this. Actually I have almost 0 experience writing C code at all.
Any input is welcome!