Improve reading from tar archives #178

schlegelp · 2025-01-02T18:54:46Z

Addresses #173 by reading tar archives as a sequential file stream instead of using random access to individual members.
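For readers unfamiliar with the distinction, here is a minimal sketch of sequential streaming using only the standard library's `tarfile` module. It is not the code in this PR: the helper names (`build_demo_archive`, `stream_members`), the `.swc` suffix filter and the in-memory demo archive are assumptions made for illustration.

```python
import io
import tarfile


def build_demo_archive(n_files=3):
    """Create a small in-memory tar archive with dummy SWC-like text files."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for i in range(n_files):
            payload = f"# neuron {i}\n1 1 0.0 0.0 0.0 1.0 -1\n".encode()
            info = tarfile.TarInfo(name=f"skeletons/{i}.swc")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def stream_members(fileobj, suffix=".swc"):
    """Yield (name, content) for matching files in one forward pass."""
    # "r|*" opens the archive as a forward-only stream of blocks, so members
    # are visited in archive order and each one is read as it is reached.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = tar.extractfile(member)
            yield member.name, handle.read()


contents = dict(stream_members(build_demo_archive()))
print(sorted(contents))
```

In stream mode each member has to be read while it is the current one, which is why the content is consumed inside the loop rather than collected as handles for later.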

Because of the way this is implemented now, reading from tar archives does not use parallel processes (i.e. the `parallel` parameter is ignored). We could group files that are adjacent in the archive into chunks and split those chunks across multiple processes. However, that in turn would cause issues with the progress bar.
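As a rough sketch of the chunking idea (not implemented in this PR), the member names could be split into contiguous slices in archive order, one slice per worker. The function name `contiguous_chunks` and the example file names are hypothetical.

```python
def contiguous_chunks(items, n_chunks):
    """Split `items` into at most `n_chunks` contiguous, near-equal slices."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    size, remainder = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks get one extra item each.
        stop = start + size + (1 if i < remainder else 0)
        if stop > start:  # skip empty chunks when there are few items
            chunks.append(items[start:stop])
        start = stop
    return chunks


# Hypothetical member names, listed in archive order.
members = [f"skeletons/{i}.swc" for i in range(10)]
chunks = contiguous_chunks(members, 3)
print([len(chunk) for chunk in chunks])
```

Keeping each slice contiguous means every worker would only ever move forward through its part of the archive; reporting combined progress across workers is the part that would still need solving.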

Ultimately, the file streaming seems to be very performant (possibly because we're not having to open/close individual files?) and I'm not too worried about performance. On my machine I can read the tar archive with 97k hemibrain skeletons in around 3 minutes which isn't too shabby.

In addition to the above, this PR contains:

- making `read_swc` more robust against an unexpected number of columns
- following URL changes in two of the tutorials (download.brainlib.org:8811 -> download.brainimagelibrary.org)
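To illustrate what tolerating additional columns can look like, here is a minimal standalone sketch. It is not the `read_swc` implementation: the function `parse_swc_text` and the field names are assumptions, and it relies only on the standard seven SWC columns (node ID, label, x, y, z, radius, parent ID).

```python
SWC_COLUMNS = ("node_id", "label", "x", "y", "z", "radius", "parent_id")


def parse_swc_text(text):
    """Parse SWC text, ignoring any columns beyond the standard seven."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        if len(fields) < len(SWC_COLUMNS):
            raise ValueError(
                f"line {line_no}: expected at least {len(SWC_COLUMNS)} "
                f"columns, got {len(fields)}"
            )
        # Keep the first seven fields; extra trailing columns are dropped.
        node_id, label, x, y, z, radius, parent_id = fields[: len(SWC_COLUMNS)]
        rows.append(
            {
                "node_id": int(node_id),
                "label": int(label),
                "x": float(x),
                "y": float(y),
                "z": float(z),
                "radius": float(radius),
                "parent_id": int(parent_id),
            }
        )
    return rows


sample = "# demo\n1 1 0.0 0.0 0.0 1.0 -1\n2 3 1.0 0.0 0.0 0.5 1 extra 42\n"
rows = parse_swc_text(sample)
print(len(rows), rows[1]["parent_id"])
```

Dropping extra columns while still raising on too few is one possible policy; whether missing columns should raise or be filled in is a separate design choice.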


schlegelp added 5 commits January 2, 2025 18:42

improve reading from tar files

7e18234

read_swc: deal with potential additional columns + formatting

6d4390d

BaseReader: fix typo in parameter name

44e4c31

tutorials: change URL download.brainlib.org:8811 -> download.brainimagelibrary.org

4940a1e

i/o base: remove left-over on_error parameters, use self.errors instead

5be8f96

schlegelp changed the title ~~Fix reading from tar archives~~ Improve reading from tar archives Jan 3, 2025

schlegelp merged commit ab4de9e into master Jan 3, 2025
20 of 21 checks passed

schlegelp deleted the read_tar_fix branch January 3, 2025 10:58

schlegelp mentioned this pull request Jan 3, 2025

read_swc importing compressed file #173

Closed
