Any way you can boost performance of building the process_map? #13
Comments
I did some work on improving the performance of the data preparation. There are certainly some parts which could be further optimised. To debug your performance problem, we would need some more information on where exactly the bottleneck is. Could you try executing your code with the RStudio profiler activated and upload the saved profvis file? There are some options in using bupaR that can lead to performance degradation.
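For example, something along these lines (a minimal sketch using the `eventdataR` example data; swap in your own eventlog and adjust the file name):

```r
# A minimal sketch, assuming the example data from eventdataR;
# replace `patients` with your own eventlog.
library(profvis)
library(processmapR)
library(eventdataR)

p <- profvis({
  process_map(patients)
})

# Save the profile as an HTML widget so it can be attached here.
htmlwidgets::saveWidget(p, "process_map_profile.html")
```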
My data input for the `process_map` function is a data.frame converted to an eventlog. PS: I am trying to fix my profvis problem and will get back with the result as soon as possible.
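For reference, the conversion looks roughly like this (a sketch; the column names are placeholders for my data):

```r
library(bupaR)

# Column names below are placeholders; they map my data.frame
# columns onto the eventlog fields.
log <- eventlog(
  df,
  case_id              = "case",
  activity_id          = "activity",
  activity_instance_id = "activity_instance",
  lifecycle_id         = "status",
  timestamp            = "timestamp",
  resource_id          = "resource"
)
```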
I looked at the profvis log you sent me: it looks like R is spending most of the time on garbage collection. That suggests you have too little memory to keep the full data (plus the computation) in RAM. I will compare it with a normal situation tomorrow, but I think this cannot easily be improved, except by adding more memory.
That is probably a good idea, since it would take a lot of time to validate the event log. I see that the available memory should not be the issue. I just realised that I forgot to ask: do you use the current development version (installed from GitHub master) or the CRAN version? There are some improvements in the development version.
I use the CRAN version; I will test the development version.
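For anyone following along, installing the development version should look roughly like this (the repository path is my assumption):

```r
# install.packages("remotes")
# Repository path assumed; adjust if the package lives elsewhere.
remotes::install_github("gertjanssenswillen/processmapR")
```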
After taking a closer look, it was actually the SVG export that takes most of the time. `grf` is a DiagrammeR graph structure.
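If the SVG rendering is the bottleneck, one workaround is to build the graph without rendering it and only render when needed; a sketch, assuming the `render` argument of `process_map`:

```r
library(processmapR)

# Build the DiagrammeR graph object without rendering it.
grf <- process_map(log, render = FALSE)

# Render (SVG export) only when you actually need to view it.
DiagrammeR::render_graph(grf)
```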
Hello! Considering the replacement of …
This tool is great, very great actually.
But is it possible to add multi-processing to the process_map function?
Here you can see (in this photo) that during the build of the nodes and edges data frames, only one processor is used. If you add a multiprocessing part it will be very fast, and we can deploy this package on our servers.
I have nearly 3 million rows (300 unique).
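To illustrate the kind of parallelism meant here, a rough sketch (not processmapR's actual code) of counting the directly-follows pairs behind the edges data frame in parallel with the base parallel package; the column names `case`, `activity`, and `timestamp` are placeholders:

```r
library(parallel)

# Illustrative sketch only: compute directly-follows (edge) pairs per
# case in parallel. mclapply forks, so this runs sequentially on Windows.
count_edges <- function(df) {
  df <- df[order(df$case, df$timestamp), ]
  pair_list <- mclapply(split(df, df$case), function(x) {
    acts <- as.character(x$activity)
    if (length(acts) < 2) return(NULL)
    data.frame(from = head(acts, -1), to = tail(acts, -1))
  }, mc.cores = detectCores())
  pairs <- do.call(rbind, pair_list)
  pairs$n <- 1L
  aggregate(n ~ from + to, data = pairs, FUN = sum)
}
```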