
rdsquashfs either hangs or is very slow #118

Open
goverp opened this issue Jul 25, 2023 · 6 comments

Comments

@goverp

goverp commented Jul 25, 2023

I'm using squashfs-tools-ng v1.2.0 on Gentoo on an amd64 machine with lots of memory and a Zen 3 chip. I have a 70 MB file that's a squashfs version (lzo compressed) of a 190 MB directory tree with very many small files (146,000 inodes). For some testing I wanted the original uncompressed tree, so I ran "rdsquashfs -qu / foo".

It appeared to run very slowly (without the -q, the screen filled rapidly with the names of files, as expected, but there are rather a lot). After an age I killed it with Ctrl-C. The top level directories appeared to all exist - I don't know if they were fully populated. I repeated the extract, assuming I hadn't given enough time, or something, but it was still running after more than an hour. "top" showed no significant processing; "iotop" showed rdsquashfs was the heaviest I/O consumer, but only doing 100-200 KB/sec (my 5-disk RAID10 system can achieve 400 MB/sec, so it's not that holding it up).

At this point I realised I could do what I wanted by mounting the squash image and reading it as input (Doh!) - I didn't need to run rdsquashfs at all. This was goodness, as I could read and process the entire directory tree in less than a second! But that leaves something weird in rdsquashfs!

I don't know how the squashfs image was created - it's a Gentoo portage snapshot from a Gentoo mirror, for example:
https://www.mirrorservice.org/sites/distfiles.gentoo.org/snapshots/squashfs/gentoo-20230713.lzo.sqfs

@AgentD
Owner

AgentD commented Jul 25, 2023

Hi,

if you are unpacking the entire image, that is going to be slower than mounting it and accessing it. rdsquashfs essentially does the following:

  1. The entire directory tree is scanned and reconstructed in memory
  2. It is sorted and sanity checked (i.e. no two files with the same name in a directory; if one of them was a symlink, this could be used for directory traversal, a well known issue with archiving programs)
  3. The directory tree is recursively created on the output filesystem
  4. The files are sorted so that the image is accessed mostly sequentially and tail-end blocks don't have to be unpacked several times over
  5. The files are unpacked.
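Step 4 can be illustrated with a minimal sketch. It assumes a hypothetical `struct file_entry` holding each file's path and the byte offset of its first data block, and sorts the extraction list by that offset so the image is read front to back. This is not the squashfs-tools-ng data structure or sort order; a fuller version would also group files that share a tail-end (fragment) block so that block is decompressed only once.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-file record, for illustration only. */
struct file_entry {
    const char *path;      /* path inside the image */
    uint64_t start_block;  /* byte offset of the file's first data block */
};

/* qsort comparator: ascending by start_block. */
static int cmp_start_block(const void *a, const void *b)
{
    const struct file_entry *fa = a;
    const struct file_entry *fb = b;

    if (fa->start_block < fb->start_block)
        return -1;
    if (fa->start_block > fb->start_block)
        return 1;
    return 0;
}

/* Order the extraction list so reads from the image are mostly sequential
 * instead of seeking back and forth. */
void sort_for_extraction(struct file_entry *files, size_t count)
{
    qsort(files, count, sizeof(files[0]), cmp_start_block);
}
```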

In contrast, if you mount the image, only step 1 happens. It also happens asynchronously, on demand as you start traversing directories. If you don't access the file contents, no file blocks have to be unpacked either, only the metadata blocks from the inode and directory tables. The SquashFS kernel driver furthermore has a multi-threaded decompressor queue, and caches metadata blocks.
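Why metadata caching helps can be sketched with a tiny direct-mapped cache. This is a generic illustration, not the kernel driver's implementation: `get_meta_block`, `decompress_meta_block`, `decompress_calls` and `CACHE_SLOTS` are hypothetical names, and the "decompression" is faked so the example is self-contained. Only the 8 KiB metadata block size is taken from the SquashFS format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 8        /* hypothetical cache size */
#define META_BLOCK_SIZE 8192 /* SquashFS metadata blocks hold up to 8 KiB */

/* Counts how often a block actually had to be decompressed. */
unsigned long decompress_calls = 0;

/* Hypothetical stand-in for reading and decompressing a metadata block. */
static void decompress_meta_block(uint64_t offset, unsigned char *out)
{
    decompress_calls++;
    memset(out, (int)(offset & 0xff), META_BLOCK_SIZE);
}

struct cache_slot {
    int valid;
    uint64_t offset;
    unsigned char data[META_BLOCK_SIZE];
};

static struct cache_slot cache[CACHE_SLOTS];

/* Return the decompressed block at `offset`, decompressing only on a miss
 * (empty slot, or the slot holds a different block). */
const unsigned char *get_meta_block(uint64_t offset)
{
    struct cache_slot *slot = &cache[offset % CACHE_SLOTS];

    if (!slot->valid || slot->offset != offset) {
        decompress_meta_block(offset, slot->data);
        slot->offset = offset;
        slot->valid = 1;
    }
    return slot->data;
}
```

Repeatedly listing the same directories then mostly hits the cache, which is one reason traversing a mounted image feels nearly free.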

If you are only interested in inspecting directory listings, rdsquashfs -l <path> <image> produces a tar-style listing of a selected directory.

Alternatively, rdsquashfs -d <image> produces a listing of the entire image, intended to be compatible with the input format for gensquashfs, i.e. you'll get lines of the shape <type> <path> <mode> <uid> <gid> <extra>. For the image you linked to, producing such a listing takes about a second of pre-processing time on my 6-year-old laptop, as it recurses through the directory tree.
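A consumer of such a listing could split each line into its fields roughly as follows. This is a minimal sketch: `struct listing_entry` and `parse_listing_line` are hypothetical, and it assumes the path is a single token without spaces or quoting (the real gensquashfs format is richer than this), the mode is octal, and anything after the gid is kept verbatim as the type-specific extra field.

```c
#include <stdio.h>
#include <string.h>

/* One parsed listing line: <type> <path> <mode> <uid> <gid> [<extra>].
 * Assumption: <path> is a single token (no spaces, no quoting). */
struct listing_entry {
    char type[16];
    char path[4096];
    unsigned int mode;   /* permission bits, written in octal in the listing */
    unsigned int uid;
    unsigned int gid;
    char extra[4096];    /* type specific, e.g. a symlink target; may be empty */
};

/* Returns 0 on success, -1 if one of the five mandatory fields is missing. */
int parse_listing_line(const char *line, struct listing_entry *out)
{
    int consumed = 0;
    const char *rest;

    memset(out, 0, sizeof(*out));
    if (sscanf(line, "%15s %4095s %o %u %u%n", out->type, out->path,
               &out->mode, &out->uid, &out->gid, &consumed) != 5)
        return -1;

    /* Everything after the gid, minus leading blanks and the line ending,
     * is kept verbatim as the extra field. */
    rest = line + consumed;
    while (*rest == ' ' || *rest == '\t')
        rest++;
    strncpy(out->extra, rest, sizeof(out->extra) - 1);
    out->extra[strcspn(out->extra, "\r\n")] = '\0';
    return 0;
}
```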

@Dr-Emann
Contributor

Over an hour for a 190 MB directory tree seems excessive though.

unsquashfs unpacks the same image in about 3 seconds, and I gave up after a few minutes with rdsquashfs, something seems off.

@Dr-Emann
Contributor

Ah, I needed to wait a little longer. Not seeing over an hour here, but it's still pretty long:

Executed in  198.97 secs    fish           external
   usr time    3.35 secs    0.00 micros    3.35 secs
   sys time   16.46 secs  780.00 micros   16.46 secs
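A quick reading of the timing above, using only the three reported numbers (it does not by itself show where the waiting happens):

```latex
\begin{aligned}
\text{CPU share of wall time} &= \frac{t_{\text{usr}} + t_{\text{sys}}}{t_{\text{wall}}}
  = \frac{3.35 + 16.46}{198.97} = \frac{19.81}{198.97} \approx 0.0996 \approx 10\,\% \\
\text{kernel share of CPU time} &= \frac{t_{\text{sys}}}{t_{\text{usr}} + t_{\text{sys}}}
  = \frac{16.46}{19.81} \approx 0.83
\end{aligned}
```

So for roughly 90% of the wall-clock time the process was off the CPU, presumably blocked (for example on I/O), and of the CPU time it did use, about 83% was spent in the kernel. That is consistent with the earlier observation that top showed no significant processing.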

@Gottox
Contributor

Gottox commented Aug 8, 2023

  2. It is sorted and sanity checked (i.e. no two files with the same name in a directory; if one of them was a symlink, this could be used for directory traversal, a well known issue with archiving programs)

Do you have a test case for this issue, or a malformed sqfs archive?

@AgentD
Owner

AgentD commented Aug 10, 2023

@Gottox there is an intentionally broken archive in https://github.com/AgentD/squashfs-tools-ng/blob/master/bin/rdsquashfs/test/pathtraversal.sqfs, along with a script that runs rdsquashfs to unpack it and checks if the file in question was created. This test is run by make check along with all the other unit & integration tests.

unsquashfs from squashfs-tools also guards against this kind of issue. There allegedly are "extensive tests" run before releases, but I'm not aware of any publicly available test suites.

Other archivers guard against this as well (e.g. GNU tar, BusyBox tar, ...), as this kind of problem plagues pretty much every format that supports symlinks.
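The core of the duplicate-name check from step 2 can be sketched as "sort the names of one directory, then compare neighbours". The function names here are hypothetical and this is not the squashfs-tools-ng code. The danger it guards against: if a directory lists `evil` twice, once as a symlink pointing outside the target and once as a directory with children, a naive extractor may create those children through the symlink.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_names(const void *a, const void *b)
{
    const char *const *na = a;
    const char *const *nb = b;

    return strcmp(*na, *nb);
}

/* Sort the entry names of one directory in place and report whether any
 * name occurs more than once. Returns 1 if a duplicate exists, 0 otherwise.
 * After sorting, equal names are adjacent, so one linear pass suffices. */
int has_duplicate_names(const char **names, size_t count)
{
    size_t i;

    qsort(names, count, sizeof(names[0]), cmp_names);
    for (i = 1; i < count; ++i) {
        if (strcmp(names[i - 1], names[i]) == 0)
            return 1;
    }
    return 0;
}
```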

@Gottox
Contributor

Gottox commented Sep 6, 2023

Thanks @AgentD!

libsqsh does sanity checking while extracting, not beforehand. I guess that's a faster approach, at the cost of accepting some malformed archives. So, in that regard, it's just as secure as tar. sqsh-unpack uses mkstemp-extract-rename semantics to prevent writing through symlinks, which means doing the check in the library isn't needed.
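The general mkstemp-extract-rename technique looks roughly like this. It is a generic POSIX sketch, not sqsh-unpack's code; `extract_file` and the `.extract-XXXXXX` template are hypothetical. The data goes into a freshly created temporary file in the destination directory, which is then renamed over the final name: `mkstemp` refuses to open an existing path (including a symlink), and `rename` replaces a symlink at the destination instead of following it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write `len` bytes to `dir`/`name` without writing through a symlink that
 * may already exist at that name. Returns 0 on success, -1 on error.
 * Permissions and ownership of the result are not handled here. */
int extract_file(const char *dir, const char *name, const void *data, size_t len)
{
    char tmp[4096], dst[4096];
    const char *p = data;
    int fd;

    if (snprintf(tmp, sizeof(tmp), "%s/.extract-XXXXXX", dir) >= (int)sizeof(tmp) ||
        snprintf(dst, sizeof(dst), "%s/%s", dir, name) >= (int)sizeof(dst))
        return -1;

    /* mkstemp opens with O_CREAT | O_EXCL, so it never reuses an existing entry. */
    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* rename replaces whatever entry `dst` names, symlink or not. */
    if (close(fd) != 0 || rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Note that this only protects the last path component; if a parent directory on the way to `dir` is itself a symlink, it is still followed.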

Personally, I doubt that squashfs-tools has a decent test suite.


4 participants