Improving Performance #8
The performance is slightly improved in commits 5d69fa1 and 4bd8740. Several benchmarks are provided in tbparse/profiling. To further accelerate the parsing process, there are two potential solutions: Numba (supported by pandas) and cuDF. When parsing a single event file, the bottleneck is located in get_cols(...) and grouped.aggregate(self._merge_values).
So the next step is to re-write these functions. Update (2022/11/17): Similar to Numba, cuDF also does not support the required operations.
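For illustration, here is a minimal sketch of the groupby-aggregate pattern named above, using a toy (tag, step, value) DataFrame. Note that merge_values below is only a stand-in for tbparse's internal _merge_values, not its actual implementation:

```python
import numpy as np
import pandas as pd

# Toy stand-in for parsed scalar events: (tag, step, value) rows.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "tag": rng.choice(["loss", "accuracy", "lr"], size=n),
    "step": rng.integers(0, 10_000, size=n),
    "value": rng.random(n),
})

def merge_values(values: pd.Series):
    # Stand-in merge rule: just keep the first value in each group.
    return values.iloc[0]

grouped = df.groupby(["tag", "step"])["value"]

# Slow path: the Python callable runs once per (tag, step) group.
merged_slow = grouped.aggregate(merge_values)

# Fast path, when the merge rule maps onto a built-in (C-level) aggregation:
merged_fast = grouped.first()
```

The gap between the two paths is exactly why the Python-level aggregate callable shows up as the bottleneck.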
When parsing many event files inside a deep filesystem hierarchy, the parsing speed might be very slow. This is due to the use of a recursive tree-parsing logic (bad design) that combines the DataFrames constructed in each subroutine, making the worst-case time complexity O(n^2). The solution is to remove the recursive parsing logic and combine all DataFrames at once, improving the worst-case time complexity to O(n).
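A minimal sketch of the complexity difference, with hypothetical small DataFrames standing in for the per-directory results:

```python
import pandas as pd

# Hypothetical per-directory results (500 small DataFrames).
dfs = [pd.DataFrame({"step": range(100), "value": range(100)})
       for _ in range(500)]

# Recursive-style merging: every concat re-copies all rows accumulated
# so far, so total work grows quadratically with the number of rows.
out = pd.DataFrame()
for df in dfs:
    out = pd.concat([out, df], ignore_index=True)

# Combining all DataFrames at once: each row is copied a constant
# number of times, so total work is linear.
out = pd.concat(dfs, ignore_index=True)
```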
@j3soon Hello,

For a final DataFrame of 750k rows, this code takes 22 min on my machine with an i9-12900H:

```python
import pathlib

import tbparse

# list_id_hash and path_xp (run identifiers and experiment root) are
# defined earlier in the script.
list_df_run_tb_data = []
for name_id in list_id_hash:
    path_run_config_folder = (
        f"{path_xp}/{name_id}/generated_data/trainer_data/ode_trainer"
    )
    # Get the files which have "tfevents" in their name
    list_files = [
        path.name for path in pathlib.Path(path_run_config_folder).glob("*tfevents*")
    ]
    assert len(list_files) == 1, f"Expected exactly one event file in {path_run_config_folder}"
    path_run_config_file = f"{path_run_config_folder}/{list_files[0]}"
    # Load the scalar data with tbparse
    # noinspection PyPackageRequirements
    tb_reader = tbparse.SummaryReader(path_run_config_file)
    df_run_tb_data = tb_reader.scalars
    list_df_run_tb_data.append(df_run_tb_data)
```

Is it expected?
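As a quick sanity check, timing a single file would show whether the cost is dominated by per-file parsing or by the number of files (a sketch, reusing path_run_config_file from the snippet above):

```python
import time

import tbparse

# Time one parse in isolation before looping over all runs.
start = time.perf_counter()
df = tbparse.SummaryReader(path_run_config_file).scalars
print(f"one file: {time.perf_counter() - start:.1f} s, {len(df)} rows")
```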
Let's say we have an event file containing 10^6 scalar events, and compare the loading time between `pivot=False` and `pivot=True`: the results are 11 seconds and 24 seconds respectively on my Intel i7-9700 CPU and Seagate ST8000DM004 HDD. Using `pivot=True` costs twice the time of `pivot=False`, and the performance is much worse when parsing multiple files.

If we profile the code with `cProfile`, we can see that the bottleneck is located in the `_merge_values` function called here, which is not executed when `pivot=False`. I believe the `_merge_values` function can be optimized to improve the performance when using `pivot=True`.

Moreover, it would be nice to provide some benchmarks and document the performance analysis in the README file, which will be useful for future optimizations.
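For reference, a minimal profiling sketch along these lines (log_dir is a placeholder for the directory holding the event file; the top-10 cumulative entries should surface _merge_values in the pivot=True run):

```python
import cProfile
import pstats

from tbparse import SummaryReader

log_dir = "run"  # placeholder: directory holding the 10^6-scalar event file

for pivot in (False, True):
    profiler = cProfile.Profile()
    profiler.enable()
    df = SummaryReader(log_dir, pivot=pivot).scalars
    profiler.disable()
    print(f"pivot={pivot}: {len(df)} rows")
    # Show the 10 functions with the largest cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```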