Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize transfer operator estimation #425

Open
rusandris opened this issue Aug 18, 2024 · 5 comments
Open

Optimize transfer operator estimation #425

rusandris opened this issue Aug 18, 2024 · 5 comments

Comments

@rusandris
Copy link
Contributor

We were looking into reducing the overlap between our package StateTransitionNetworks.jl and ComplexityMeasures.

Since the transition matrix or transfer (Perron-Frobenius) operator plays a central role in our package, we also wrote code for it and we noticed that the approach here used in transferoperator can be made more minimal and more performant.

This measures the time to get the transition matrix from a fairly long ($10^7$) orbit of the Hénon map using the two ways: transferoperator from ComplexityMeasures and calculate_transition_matrix from StateTransitionNetworks. 2D binning is used to discretize in both.

using DynamicalSystems
using BenchmarkTools

#Henon system orbit
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 10^7; Ttr = 10^4)
grid_size = 10 #10x10 grid binning
#grid_size = 100 #100x100 grid binning

#-----------------STN way-------------------
using StateTransitionNetworks

@btime begin
symbolic_ts = timeseries_to_grid(orbit,grid_size) #get the symbolic time series
P_stn,Q_stn,x_stn = calculate_transition_matrix(symbolic_ts) #calculate transition matrix
P_stn 
end
#for grid_size = 10 ->  1.480 s (20000065 allocations: 1.64 GiB)
#for grid_size = 100 -> 1.602 s (20000083 allocations: 1.64 GiB)
#----------------CM way----------------------

using ComplexityMeasures

@btime begin
binning = RectangularBinning(grid_size)
to = ComplexityMeasures.transferoperator(orbit, binning);
P_cm = to.transfermatrix
end
#for grid_size = 10 ->  2.098 s (1545 allocations: 828.16 MiB)
#for grid_size = 100 -> 3.333 s (19205 allocations: 830.85 MiB)

#are these the same?
println(P_stn.nzval[1:3]) #[0.5980376275657787, 0.7666443626630981, 0.6996943940406837]
println(P_cm.nzval[1:3]) #[5.894488653109343e-6, 0.5980376275657788, 0.766644362663098]
#yes except one additional value in P_cm which is almost zero

all(isapprox.(P_stn.nzval, P_cm.nzval[2:end];atol=1e-3)) #true

I think one of the reasons for these differences in runtime and scaling is that the visitors list used in transferoperator isn't actually necessary if we just want the transition matrix. Right now a list is created (that contains all visitors of each bin) and transition probabilities are computed by looping through the visitors list.

This is what I mean by making more minimal. If visitors isn't actually used anywhere else, it can be skipped and the transition probabilities can be calculated by looping through the symbolic orbit (series of outcomes, pts) once.

I've thrown together an initial PR just to show the difference: #423 (very rough, but some features can be added back later like boundary conditions, stochasticity checking etc. )

To illustrate this point further, I timed (using simple @time inside transferoperator) how much is actually taken up by creating visitors and the calculation of probabilities:

#---------------------with visitors (original)-------------------
@time ComplexityMeasures.transferoperator(pts, binning);
visitors list time:
  0.787510 seconds 
loop time (compute probabilities):
  0.257154 seconds 

Total  1.747746 seconds 

#---------------------without visitors (PR)-------------------
@time ComplexityMeasures.transferoperator(pts, binning);
loop time compute probabilities): 
  0.157713 seconds 

Total: 0.730922 seconds 

Let me know if this is worth the effort. There was a discussion about this on Slack with @Datseris and @sbulcsu but we decided to move it here. Also see #424 about generalizing the transfer operator because the design here might also change if we decide to introduce a transferoperator that's not tied to binning.

@kahaaga
Copy link
Member

kahaaga commented Aug 18, 2024

Since the transition matrix or transfer (Perron-Frobenius) operator plays a central role in our package, we also wrote code for it and we noticed that the approach here used in transferoperator can be made more minimal and more performant.

That is absolutely true! Thanks for bringing this up.

@rusandris It's been some time since I had a perfect mental model of the code here, so I have to dive a bit deeper to comment on implementation detail. What I can say in general is that the visitors were explicitly computed because of some algorithms where we did some refinement of the binning and needed to explicitly track where points and bins ended up. I still have some unpublished papers depending on this, so it would be nice to not remove it completely. However, If this ends up being in the final implementation after #424 is finished, we can let the user decide using a keyword whether they need the visitors. The default can be to not compute visitors, since that obviously slows down the estimation (as your investigation above demonstrates).

Is the idea to use the implementation in ComplexityMeasures.jl in StateTransitionNetworks.jl, or will there exist separate implementations? We should probably think about merging efforts, so that the community can benefit from one "source of truth".

At the moment, I am swamped with work, so I won't be very useful at least until mid September when it comes to actually coding something here and in #424. I'm happy, however, to discuss possible designs! It seems like you have a pretty good grasp of the methods and on the coding part, @rusandris, so perhaps you'd like to take charge on #424 once we decide on a design?

Maybe we can both sketch a rough design for #424 that we can perhaps discuss over a quick Zoom/Teams talk or something.

@Datseris
Copy link
Member

Datseris commented Aug 20, 2024

Is the idea to use the implementation in ComplexityMeasures.jl in StateTransitionNetworks.jl, or will there exist separate implementations? We should probably think about merging efforts, so that the community can benefit from one "source of truth".

As far as I can tell the idea is to update the Transfer operator here with the best code, that is what is in StateTransitionNetworks.jl. Then StateTransitionNetworks.jl can add ComplexityMeasures.jl as a dependency and re-use the transfer operator.


@rusandris do you mind finishing PR #424 from a draft to an open PR? @kahaaga the good thing with using GitHub is that everything remains in history. Nothing is truly removed ever. If you need the old code for some old paper, then you can check out older versions of ComplexityMeasures.jl. In my eyes, this does not constitute a reason to complicate the source code of newer versions by having multiple versions if for the API provided some things are not actually needed (here the visitors).

@kahaaga
Copy link
Member

kahaaga commented Aug 20, 2024

@rusandris Is there any application you can think of in StateTransitionsNetworks.jl where you'll need the visitors?

@rusandris
Copy link
Contributor Author

Then StateTransitionNetworks.jl can add ComplexityMeasures.jl as a dependency and re-use the transfer operator.

Exactly, this is what I had in mind as well. If the design we decide on here allows easy access to simple objects (for ex. a wrapper method can be written either here or in StateTransitionNetworks that returns a simple sparse matrix instead of more complex structs), it can work.

Is there any application you can think of in StateTransitionsNetworks.jl where you'll need the visitors?

At this time, not really. We've only ever needed the number of each transition so far, not the concrete list.

@rusandris
Copy link
Contributor Author

@rusandris do you mind finishing PR #424 from a draft to an open PR?

#423 is now an open PR, see the details there.
For now, it's only for this issue, I tried to keep it separate from #424 (that would be another, bigger PR some time in the near future I hope) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants