Redo a pass on memory allocations #169
Reminder: keep track of what has been tried in the performance Discussion |
@PierreMartinon should more or less be subsumed by #188 |
Ha, you wish :D
with
An annoying point is that, probably due to the very small size of our vectors, there is no consistent performance gain when using dot operators and/or views. Right now both versions have similar allocations, and in_place seems a bit faster. Still working on it. |
@PierreMartinon When you pass something like
julia> f!(y, x) = begin
y[:] .= [sum(cos.(x[1:2])), x[1] * x[3]]
end
f! (generic function with 1 method)
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(y[1:2], x))
end
320
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), x))
end
240
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(y[1:2], x[1:3]))
end
400
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), x[1:3]))
end
320
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), @view(x[1:3])))
end
240
Also note that more allocations can be saved using
julia> f!(y, x) = begin
@views y[:] .= [sum(cos.(x[1:2])), x[1] * x[3]]
end
f! (generic function with 1 method)
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(y[1:2], x))
end
240
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), x))
end
160
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(y[1:2], x[1:3]))
end
320
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), x[1:3]))
end
240
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), @view(x[1:3])))
end
160
So as a rule, the good combination seems to be:
julia> f!(y, x) = begin
@views y[:] .= [sum(cos.(x[1:2])), x[1] * x[3]]
end
f! (generic function with 1 method)
julia> let x = ones(3)
y = zeros(4)
println(@allocated f!(@view(y[1:2]), x))
end
160
Moreover, you should be able to write code that is uniform and works both for vectors and scalars. |
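A possible further step, sketched here under the assumption that the output can be filled component by component: the remaining bytes in the sessions above plausibly come from the temporary vector built on the right-hand side and from `cos.(...)`. The name `f_noalloc!` and the wrapper `measure` are hypothetical, and the reported count is worth checking on your own Julia version.

```julia
# Sketch: write each output component directly instead of building
# the temporary vector [..., ...] and broadcasting it into y.
function f_noalloc!(y, x)
    @views y[1] = sum(cos, x[1:2])  # sum(f, itr) avoids the cos.(...) temporary
    y[2] = x[1] * x[3]
    return nothing
end

# Measure inside a function so that non-constant globals do not add allocations.
function measure()
    x = ones(3)
    y = zeros(4)
    f_noalloc!(view(y, 1:2), x)                      # first call compiles the method
    bytes = @allocated f_noalloc!(view(y, 1:2), x)   # second call is the one measured
    return bytes, y
end

bytes, y = measure()
println(bytes)
```

The same body works when `y` is a plain vector or a view, since it only relies on `setindex!`.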
@amontoison any further comments on the post above? |
@PierreMartinon comments on #169 (comment) above? |
In the example x was already a view, and adding another view in _x gives slightly worse results. For very small sizes, views are just not better than slices, due to the cost of creating the view (which is not just a pointer). Regarding types, we still have some unstable ones due to the unions from CTBase (mayer for instance is now a Union{Nothing,Mayer,Mayer!}), but it's hard to tell the actual impact of these. Same with the scalar / vector case, I ended up having similar performance using either the 'converters' (_x, _u) or getters that directly return scalar or vector values. I'll probably merge the current inplace branch once it's cleaned up a bit. Performance is slightly better than current main. |
I think we can still improve the allocations in the constraints: compared to the We can probably use work arrays better too. Started to look at Profile.Allocs with PProf, flamegraphs are nice, however the 'source' view is not always accurate. More handy than the .mem files though. |
@PierreMartinon ready to merge and test inplace code? (sorry if I missed sth!) |
Hi @jbcaillau ! Almost. The branch is similar in performance to main so merging is not a regression. In CTBase could we have the option to enable/disable inplace for abstract OCPs ? |
Hi
well, performance is supposed to be much better by reducing the number of allocations!
it is planned to keep only the memory efficient code, that is inplace alone. what would be the point of keeping out of place? |
Following our discussion this morning, a few details on the current version, more specifically the constraints.
Here is the loop core function, here for midpoint since it is a bit simpler than trapeze (no save/reuse of dynamics). First we have the discretized dynamics
followed by the path constraints (it is the same function, I split the quote for clarity).*
*Here you can see it still accepts the out of place format for comparison purposes. This is also true for the dynamics part, although the encapsulating function dynamics_ext! is always inplace. The getter for x (u is similar)
JET gives a number of runtime dispatch alerts, unfortunately the output is not comfortable to navigate. |
The getters for the x's and u's seem to allocate 80 bytes for the views (a view with constant indices is 48 bytes), regardless of the size. NB. @jbcaillau the size 1 view is still a SubArray and is not necessarily compatible with scalar syntax in OCP functions, did I miss something? Copying values for slices takes 96, 112, 112, 128, 128, etc. bytes (16-byte alignment), so views always seem better in terms of allocations. For whatever reason, when testing the constraints evaluation, we see slightly fewer allocations for the slice version (299kB vs 308kB for views), but performance is similar. The main drawback of views is that they are allocated at each getter call, while the slice version might work inplace with a reusable work array? I'll try the variant without getters at all first to check what could be gained. |
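To make the "reusable work array" idea concrete, here is a minimal sketch. It assumes a hypothetical layout where the state at time step `i` is stored contiguously in one long vector `xu`; the names `get_state_view`, `get_state!` and the layout itself are illustrative and not taken from the actual code.

```julia
# View-based getter: creates a new SubArray wrapper at each call.
get_state_view(xu, i, n) = @view xu[(i - 1) * n + 1:i * n]

# Work-array getter: copies the n values into a preallocated buffer,
# so repeated calls reuse the same memory.
function get_state!(work::Vector{Float64}, xu::Vector{Float64}, i::Int, n::Int)
    offset = (i - 1) * n
    copyto!(work, 1, xu, offset + 1, n)   # copy n entries starting at xu[offset + 1]
    return work
end

xu = collect(1.0:6.0)     # three time steps of a 2-dimensional state
work = zeros(2)           # allocated once, outside the loop
for i in 1:3
    get_state!(work, xu, i, 2)
    println(work, "  ", get_state_view(xu, i, 2))
end
```

The trade-off is a small copy per call versus a wrapper per call; which one wins for very small sizes is exactly what the measurements above are probing.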
A |
Hi @amontoison ,
|
Inplace version, the constraints still allocate apparently:
The last one goes to
I defined the constraint for the ocp as
with
(basic c[1] = x[3] is similar) |
Did you try |
I think so too. Warntype was mostly ok on the constraints but it did not descend into the subfunctions. JET gave runtime dispatch at all OCP function calls, but I find the output harder to read. I'll try warntype on the subfunctions, thanks ! |
Some progress. Fixed a type instability on the times, and got rid of the scalar/vector Union for the state/control/optim variables. Reduction of allocations is not as big as hoped though, still looking into it. Update: getters for t,x,u,v are now type-stable and show 0 allocations. For the times we split the Union and use 2 distinct variables for fixed time vs free time index. For the scalar/vector x,u,v we now have 3 parametric types in DOCP and the getters dispatch along these types. Each getter only has 2 methods (scalar / vector) for its own variable, regardless of the others. The vast majority of the remaining allocations occurs when calling OCP functions: mayer, dynamics, lagrange, and all constraints. JET indicates runtime dispatch at all these calls, and code_warntype flags them as 'Any'. I'll start with mayer. |
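A rough sketch of what getters dispatching on parametric types could look like. The type names (`ScalarVariable`, `VectorVariable`, `DOCPSketch`), the field names and the storage layout are assumptions made for illustration; they are not the actual DOCP definitions.

```julia
# Tag types encoding whether a variable is scalar or vector valued
abstract type VariableKind end
struct ScalarVariable <: VariableKind end
struct VectorVariable <: VariableKind end

# Hypothetical discretized problem: one type parameter per variable (state, control)
struct DOCPSketch{X<:VariableKind,U<:VariableKind}
    dim_x::Int
    dim_u::Int
end

# Assumed layout: at each time step, the state is followed by the control
step_size(d::DOCPSketch) = d.dim_x + d.dim_u

# Two methods per getter, selected by the parameter of its own variable only
get_state(d::DOCPSketch{ScalarVariable}, xu, i) = xu[(i - 1) * step_size(d) + 1]
function get_state(d::DOCPSketch{VectorVariable}, xu, i)
    o = (i - 1) * step_size(d)
    return @view xu[o + 1:o + d.dim_x]
end

get_control(d::DOCPSketch{<:VariableKind,ScalarVariable}, xu, i) =
    xu[(i - 1) * step_size(d) + d.dim_x + 1]
function get_control(d::DOCPSketch{<:VariableKind,VectorVariable}, xu, i)
    o = (i - 1) * step_size(d) + d.dim_x
    return @view xu[o + 1:o + d.dim_u]
end

d = DOCPSketch{ScalarVariable,VectorVariable}(1, 2)
xu = collect(1.0:6.0)
println(get_state(d, xu, 2))     # scalar state at step 2
println(get_control(d, xu, 2))   # vector control at step 2
```

Because the scalar/vector choice lives in the type parameters, each call resolves to a single method at compile time and the return type is concrete, without any Union.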
@PierreMartinon thanks for the update (and @amontoison for the feedback):
|
Yes it certainly seems so. Older code used single methods with
See below. Maybe we'll add some post-processing in the OCP to define functions with more fixed types ? |
Looking at the mayer cost which is one of the simpler calls, and also the main part of the objective. 'Local Mayer' is the (inplace) locally defined version
while 'OCP Mayer' refers to
We see that the local Mayer does not allocate, either with hardcoded views / scalar arguments or the new getters (while old getters added some allocations there).
Code_warntype looks clean except for the F.f! function that is tagged as 'Any'.
JET indicates 2 runtime dispatches for the call to F.f!, the second one seems related to the return type ?
Profile locates the 3 allocations at the f! call, and below that the trace looks very low level.
|
@jbcaillau @ocots Question: |
This confirms what we saw yesterday with @ocots. Your main problem everywhere (here and in CTBase) is fields with abstract types. For more performance tips, as well as a faster way of profiling ( |
Hi there @jbcaillau @ocots @gdalle @joseph-gergaud, After reworking the code a bit, the state equation part is no longer allocating :-) |
To illustrate, here are the results of the tests in
a)
outputs
b)
gives (not showing the Body part which details each line / operation and is quite verbose)
and
gives
There are no 'Any' types anymore with the parametrized getters handling the scalar/vector distinction. c) Another tool is JET that signals possible runtime dispatch cases in particular (I had to force the display for whatever reason):
gives a runtime dispatch when calling the Mayer cost in
For the constraints
we get something similar with 1 runtime dispatch for the calls to the dynamics and lagrange cost, and 2 for the control/state/mixed constraints, and boundary/variable constraints.
It seems each call to an OCP function is flagged with 2 possible runtime dispatches, except dynamics and lagrange cost with only 1. d) To conclude, we show the profiler output for the constraints, to see where the allocations occur
collects the profiling data and gives a link to a local webpage with the visualization from PProf
I find the flamegraph and source views the most useful, even if they tend to be cluttered with a lot of noise from the julia internals (maybe the depth of analysis can be limited or something). The source view matches allocations with lines, such as
which shows that the discretized dynamics in
|
Yes, that is due to field annotations like
struct MyType
    f::Function
end
which should be
struct MyType{F}
    f::F
end |
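The difference between the two annotations can be checked directly. This is a small sketch with hypothetical names (`AbstractFieldCost`, `ConcreteFieldCost`, `mayer_sketch`, `evaluate`); it does not reproduce the CTBase types.

```julia
# Field annotated with the abstract type Function: the compiler
# cannot know which function is stored, so calls go through runtime dispatch.
struct AbstractFieldCost
    f::Function
end

# Field typed by a parameter: the concrete function type is part of the struct type.
struct ConcreteFieldCost{F}
    f::F
end

mayer_sketch(x0, xf) = xf[1] - x0[1]     # stand-in for a user-defined cost
evaluate(c, x0, xf) = c.f(x0, xf)

a = AbstractFieldCost(mayer_sketch)
b = ConcreteFieldCost(mayer_sketch)

println(isconcretetype(fieldtype(typeof(a), :f)))   # false
println(isconcretetype(fieldtype(typeof(b), :f)))   # true

# Inferred return types of the call, for comparison between the two wrappers
println(Base.return_types(evaluate, (typeof(a), Vector{Float64}, Vector{Float64})))
println(Base.return_types(evaluate, (typeof(b), Vector{Float64}, Vector{Float64})))
```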
back on track: 🚀 @PierreMartinon for cleaning up this and thanks @gdalle for the input. current refactoring by @ocots should take care of the remaining issues 🤞🏾 |
Todo: check the .mem files thingy (julia --track-allocation=user). Also recheck code_warntype for type instabilities.
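As a complement to the .mem files, the allocation profiler mentioned earlier in the thread can be driven from the standard library. The sketch below assumes Julia 1.8 or later (where `Profile.Allocs` is available) and uses a hypothetical function `constraints_sketch` as a stand-in for the real constraints evaluation.

```julia
using Profile

# Hypothetical allocating function standing in for a constraints evaluation
function constraints_sketch(x)
    c = [sum(cos.(x[1:2])), x[1] * x[3]]   # slices and temporaries allocate
    return sum(c)
end

x = ones(3)
constraints_sketch(x)                      # warm up so compilation is not profiled

Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 constraints_sketch(x)   # record every allocation
results = Profile.Allocs.fetch()

# Each recorded allocation carries its type, size and stack trace
for a in results.allocs
    println(a.type, "  ", a.size, " bytes")
end

# With the PProf package installed, the recorded profile can then be visualized
# as flamegraph / source views (see the PProf documentation for the exact call).
```

A `sample_rate` of 1 records everything, which is fine for a tiny function but may need lowering for a full solve.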