Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: File conversion progress bars #774

Closed
CodyCBakerPhD opened this issue May 6, 2024 · 6 comments
Closed

[Feature]: File conversion progress bars #774

CodyCBakerPhD opened this issue May 6, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@CodyCBakerPhD
Copy link
Collaborator

CodyCBakerPhD commented May 6, 2024

What would you like to see added to the NWB GUIDE?

Next step after #676 is to do parallel file write and sub bars for the buffers of each file

Here is some basic code that should accomplish this (I'll have to open a small PR to HDMF and HDMF-Zarr to facilitate proper progress bar class specification)

    futures = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        for session_to_nwb_kwargs in session_to_nwb_kwargs_per_session:

            # Errors in subprocesses will not propagate to top level stderr
            # So dump them to a file instead
            # Might want dedicated subfolder for this?
            exception_file_path = data_dir_path / f"ERROR_<nwbfile_name>.txt"

           # This is where we inject sub par progress info
           # This is passed via the iterator options of relevant interfaces through conversion options of the converter
           # For example
           # conversion_options = dict()
           # for interface_key, interface in data_formats_page.items():  
           #     if isinstance(interface, BaseRecordingInterface):
           #         conversion_options.update({interface_key: dict(iterator_opts=dict(display_progress=True, progress_bar_class=TQDMPublisher, progress_bar_options=dict(...)))})

            futures.append(
                executor.submit(
                    safe_session_to_nwb,
                    session_to_nwb_kwargs=session_to_nwb_kwargs,
                    exception_file_path=exception_file_path,
                )
            )
        for _ in TQDMPublisher(as_completed(futures), total=len(futures), ...):
            pass

where safe_session_to_nwb looks like this

def safe_session_to_nwb(*, session_to_nwb_kwargs: dict, exception_file_path: Union[Path, str]):
    """Convert a session to NWB while handling any errors by recording error messages to the exception_file_path.
    Parameters
    ----------
    session_to_nwb_kwargs : dict
        The arguments for session_to_nwb.
    exception_file_path : Path
        The path to the file where the exception messages will be saved.
    """
    exception_file_path = Path(exception_file_path)
    try:
        session_to_nwb(**session_to_nwb_kwargs)
    except Exception as e:
        with open(exception_file_path, mode="w") as f:
            f.write(f"session_to_nwb_kwargs: \n {pformat(session_to_nwb_kwargs)}\n\n")
            f.write(traceback.format_exc())

where session_to_nwb is whatever current function we currently use in the GUIDE to convert a single session to create a single file

Inspired by @pauladkisson's contribution on catalystneuro/cookiecutter-my-lab-to-nwb-template#23, which is based on strategies we've used many times for past conversions

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

Yes

Did you confirm this feature was not already reported?

Yes

@CodyCBakerPhD CodyCBakerPhD added the enhancement New feature or request label May 6, 2024
@CodyCBakerPhD
Copy link
Collaborator Author

This branch should work for HDF5 at least: hdmf-dev/hdmf#1110

I don't think NeuroConv changes should be needed since they just dynamically pass everything down the chain

@garrettmflynn
Copy link
Member

Just flagging that adding Paul's safe conversion strategy to #778 would change a good amount of our error handling for the GUIDE in general, and might be better in a separate Issue / PR

@CodyCBakerPhD
Copy link
Collaborator Author

We can probably just use our own existing endpoint for per-file conversions; the main goal with any kind of 'safe' way of doing that is just that the traceback error stack gets dumped to a persistent file on disk since stdout/stderr pipes are not easily accessible during multiprocessing

@garrettmflynn
Copy link
Member

Just implemented a general log system in #778 with a specific endpoint for registering errors from parallel processes! Should give us something useful to work with generally for remote debugging :)

Here's an example file:
2024-05-15_16-03-22.log

As shown by the Exception on /neuroconv/convert line, this will automatically log errors from any failing endpoint in addition to the process-specific errors

@CodyCBakerPhD
Copy link
Collaborator Author

Just implemented a general log system in #778 with a specific endpoint for registering errors from parallel processes! Should give us something useful to work with generally for remote debugging :)

Very cool!

@CodyCBakerPhD
Copy link
Collaborator Author

Added in #778

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants