rdsquashfs either hangs or is very slow #118

goverp · 2023-07-25T07:01:46Z

I'm using squashfs-tools-ng v1.2.0 on Gentoo on an amd64 machine with lots of memory and a Zen 3 chip. I have a 70 MB file that's an LZO-compressed squashfs image of a 190 MB directory tree with very many small files (146,000 inodes). For some testing I wanted the original uncompressed tree, so I ran "rdsquashfs -qu / foo".

It appeared to run very slowly (without the -q, the screen filled rapidly with the names of files, as expected, but there are rather a lot of them). After an age I killed it with Ctrl-C. The top-level directories all appeared to exist; I don't know if they were fully populated. I repeated the extraction, assuming I hadn't given it enough time, but it was still running after more than an hour. "top" showed no significant CPU usage; "iotop" showed rdsquashfs was the heaviest I/O consumer, but only doing 100-200 KB/sec (my 5-disk RAID10 system can achieve 400 MB/sec, so the disks are not what's holding it up).

At this point I realised I could do what I wanted by mounting the squash image and reading it as input (Doh!) - I didn't need to run rdsquashfs at all. This was goodness, as I could read and process the entire directory tree in less than a second! But that leaves something weird in rdsquashfs!

I don't know how the squashfs image was created - it's a Gentoo portage snapshot from a Gentoo mirror, for example:
https://www.mirrorservice.org/sites/distfiles.gentoo.org/snapshots/squashfs/gentoo-20230713.lzo.sqfs

AgentD · 2023-07-25T11:04:31Z

Hi,

If you are unpacking the entire image, that is going to be slower than mounting it and accessing it. rdsquashfs essentially does the following:

1. The entire directory tree is scanned and reconstructed in memory.
2. It is sorted and sanity-checked (i.e. no two files with the same name in a directory; if one of them was a symlink, this could be used for directory traversal, a well-known issue with archiving programs).
3. The directory tree is recursively created on the output filesystem.
4. The files are sorted so that the image is accessed mostly sequentially and tail-end blocks don't have to be unpacked several times over.
5. The files are unpacked.
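Steps 2 and 4 can be sketched roughly as follows. This is a Python illustration, not rdsquashfs code (rdsquashfs is written in C): `Node` is a hypothetical in-memory tree node, and its `start_block` field is an assumed stand-in for where a file's data lives in the image. The real sort key, which also has to account for tail-end fragment blocks, may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical in-memory node built in step 1.
    name: str
    kind: str                  # "dir", "file", "slink", ...
    start_block: int = 0       # assumed: offset of the file's first data block in the image
    children: list = field(default_factory=list)


def sanity_check(node, path=""):
    """Step 2: reject any directory containing two entries with the same name."""
    seen = set()
    for child in node.children:
        if child.name in seen:
            raise ValueError(f"duplicate entry {path}/{child.name}")
        seen.add(child.name)
        if child.kind == "dir":
            sanity_check(child, f"{path}/{child.name}")


def files_in_disk_order(root):
    """Step 4: collect regular files and sort them by where their data lives."""
    found = []

    def walk(node, path):
        for child in node.children:
            child_path = f"{path}/{child.name}"
            if child.kind == "dir":
                walk(child, child_path)
            elif child.kind == "file":
                found.append((child.start_block, child_path))

    walk(root, "")
    found.sort()
    return [p for _, p in found]


root = Node("", "dir", children=[
    Node("b", "file", start_block=4096),
    Node("a", "file", start_block=8192),
    Node("sub", "dir", children=[Node("c", "file", start_block=0)]),
])
sanity_check(root)
print(files_in_disk_order(root))  # ['/sub/c', '/b', '/a']
```

Unpacking in that order means the image is read front to back instead of seeking back and forth, which is the point of step 4.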

In contrast, if you mount the image, only step 1 happens. It also happens asynchronously, on demand, as you start traversing directories. If you don't access the file contents, no file blocks have to be unpacked either, only the metadata blocks from the inode and directory tables. The SquashFS kernel driver furthermore has a multi-threaded decompressor queue and caches metadata blocks.
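The on-demand behaviour of a mounted image can be illustrated with a toy lazily-loaded directory. This is only an analogy, not how the kernel driver is implemented: `LazyDir` and its `loader` callable are hypothetical stand-ins for decoding a directory's metadata blocks, which happens at most once and only when someone actually lists that directory.

```python
class LazyDir:
    """Hypothetical directory node that decodes its listing only on first access."""

    decodes = 0  # class-wide count of simulated metadata reads

    def __init__(self, loader):
        self._loader = loader   # stands in for reading this directory's metadata blocks
        self._entries = None    # cache; None means "not decoded yet"

    def entries(self):
        if self._entries is None:
            LazyDir.decodes += 1
            self._entries = self._loader()
        return self._entries


# Listing the root twice costs one decode, and "sub" is never decoded
# because nobody looked inside it.
sub = LazyDir(lambda: {})
root = LazyDir(lambda: {"sub": sub})
root.entries()
root.entries()
print(LazyDir.decodes)  # 1
```

rdsquashfs, by contrast, has to walk the whole tree up front (step 1) before it can create or unpack anything.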

If you are only interested in inspecting directory listings, rdsquashfs -l <path> <image> produces a tar-style listing of a selected directory.

Alternatively, rdsquashfs -d <image> produces a listing of the entire image, intended to be compatible with the input format for gensquashfs, i.e. you'll get lines of the shape <type> <path> <mode> <uid> <gid> <extra>. For the image you linked to, producing such a listing takes about a second of pre-processing time on my 6-year-old laptop, as it recurses through the directory tree.
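If you want to post-process such a listing, a minimal line parser might look like this. It assumes the `<type> <path> <mode> <uid> <gid> <extra>` shape described above, with the mode written in octal and paths containing no whitespace; any quoting or escaping the real format may use is not handled. `Entry` and `parse_line` are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    kind: str   # e.g. "dir", "file", "slink"
    path: str
    mode: int
    uid: int
    gid: int
    extra: str  # type-specific remainder (e.g. a symlink target); may be empty


def parse_line(line: str) -> Entry:
    # Split into at most 6 fields so <extra> keeps any internal spaces.
    parts = line.split(maxsplit=5)
    if len(parts) < 5:
        raise ValueError(f"malformed line: {line!r}")
    kind, path, mode, uid, gid = parts[:5]
    extra = parts[5] if len(parts) > 5 else ""
    return Entry(kind, path, int(mode, 8), int(uid), int(gid), extra)


print(parse_line("slink /lib 0777 0 0 usr/lib"))
# Entry(kind='slink', path='/lib', mode=511, uid=0, gid=0, extra='usr/lib')
```

For example, `collections.Counter(parse_line(l).kind for l in lines)` would give a quick per-type summary of the image.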

Dr-Emann · 2023-07-25T14:25:39Z

Over an hour for a 190 MB directory tree seems excessive though.

unsquashfs unpacks the same image in about 3 seconds, whereas I gave up on rdsquashfs after a few minutes; something seems off.

Dr-Emann · 2023-07-25T14:32:07Z

Ah, I just needed to wait a little longer. I'm not seeing over an hour here, but it's still pretty long:

```
Executed in  198.97 secs    fish           external
   usr time    3.35 secs    0.00 micros    3.35 secs
   sys time   16.46 secs  780.00 micros   16.46 secs
```
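Putting the `time` figures above in proportion: user plus system time comes to only about 10% of the wall-clock time. That is consistent with goverp's observation that top showed little CPU usage, and suggests the process spent most of its time waiting (for example on I/O), although these numbers alone don't say what it was waiting on.

```python
# Figures copied from the fish `time` output above, in seconds.
wall, usr, sys_time = 198.97, 3.35, 16.46
cpu = usr + sys_time
print(f"CPU busy: {cpu:.2f} s of {wall:.2f} s ({cpu / wall:.1%})")
# CPU busy: 19.81 s of 198.97 s (10.0%)
```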

Gottox · 2023-08-08T20:29:43Z

> It is sorted and sanity checked (i.e. no two files with the same name in a directory; if one of them was a symlink, this could be used for directory traversal, a well known issue with archiving programs)

Do you have a testcase for this issue or a malformed sqfs archive?

AgentD · 2023-08-10T06:52:51Z

@Gottox there is an intentionally broken archive in https://github.com/AgentD/squashfs-tools-ng/blob/master/bin/rdsquashfs/test/pathtraversal.sqfs, along with a script that runs rdsquashfs to unpack it and checks if the file in question was created. This test is run by make check along with all the other unit & integration tests.

unsquashfs from squashfs-tools also guards against this kind of issue. There allegedly are "extensive tests" run before releases, but I'm not aware of any publicly available test suite.

Other archivers guard against this as well (e.g. GNU tar, BusyBox tar, etc.), as this kind of problem plagues pretty much every format that supports symlinks.
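For comparison, a common generic guard in extractors is to resolve each output path and refuse anything that lands outside the destination directory. The Python sketch below shows that idea; it is not how rdsquashfs does it (rdsquashfs rejects duplicate names before extracting), `safe_join` is a hypothetical helper, and a check-then-write like this is still racy if something else can modify the output tree concurrently.

```python
import os


def safe_join(root: str, member: str) -> str:
    """Map an archive member path under root, refusing anything that escapes it."""
    # Reject obviously hostile names outright.
    if os.path.isabs(member) or ".." in member.split("/"):
        raise ValueError(f"unsafe member path: {member!r}")
    root_real = os.path.realpath(root)
    # realpath() follows symlinks that already exist on disk, so a symlink
    # extracted earlier that points outside root is caught here.
    target = os.path.realpath(os.path.join(root_real, member))
    if os.path.commonpath([root_real, target]) != root_real:
        raise ValueError(f"{member!r} resolves outside {root!r}")
    return target
```

The symlink case is exactly the one from the thread: if an earlier entry planted `link -> /somewhere/else`, a later entry `link/evil` resolves outside the root and is rejected.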

Gottox · 2023-09-06T15:33:14Z

Thanks @AgentD!

libsqsh does sanity checking while extracting, not beforehand. I guess that's a faster approach, at the cost of accepting some malformed archives, so in that regard it's just as secure as tar. sqsh-unpack uses mkstemp-extract-rename semantics to prevent writing through symlinks, which means doing the check in the library isn't needed.
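The mkstemp-extract-rename pattern can be sketched in Python like this (an illustration of the general technique, not sqsh-unpack's code; `write_file_atomically` is a hypothetical name). The file is created under a fresh random name with exclusive-create semantics and then renamed over the final name. Because rename replaces the directory entry itself, a symlink already sitting at the target path is replaced rather than followed. Note that this only protects the last path component; symlinked parent directories are still followed, and `mkstemp` creates the file with mode 0600, so a real extractor would then set the archived mode.

```python
import os
import tempfile


def write_file_atomically(path: str, data: bytes) -> None:
    directory = os.path.dirname(path) or "."
    # mkstemp() creates a new file exclusively, so it never opens an existing one.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace() swaps the directory entry; a symlink at `path` is
        # replaced, not written through.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing into the same directory as the target matters: rename is only atomic within one filesystem.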

Personally, I doubt that squashfs-tools has a decent test suite.
