Lower down the number of disk operations #94

Open
AndyWatterman opened this issue Nov 15, 2021 · 4 comments

@AndyWatterman

Have a nice day!

I found that during process scanning, "pe-sieve" as well as "mal_unpack" perform a huge number of disk operations. This is fine when you are inside a VM on a physical machine. However, in a sandbox environment that analyzes every disk operation, it causes a problem.

I did not analyze the code, but I think the problem is the comparison between the mapped image and the original one. Would it be better to have a kind of cache of the most often used libraries? Or at least to map library images to avoid disk operations? Might that be an option?

@hasherezade
Owner

Hi!
Caching could indeed be worth it when we scan multiple processes (i.e. in the case of Hollows Hunter or mal_unpack). This way we could reuse previously loaded libraries if they occur in the next process. In fact, plenty of libraries repeat in the majority of processes (ntdll, kernel32 etc.), so there would be a performance gain.
That would benefit HollowsHunter the most, because PE-sieve (the standalone version) scans just one process at a time, so preloading the common libraries, or keeping them in memory after the scan, would not make much sense.
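
The reuse described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not PE-sieve's actual implementation: `ModuleCache`, `ModuleBytes`, and `Loader` are hypothetical names, and the loader callback stands in for whatever manual loading and preprocessing the scanner really performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not PE-sieve's real classes.
using ModuleBytes = std::vector<std::uint8_t>;
// Stand-in for "read the file from disk and preprocess it".
using Loader = std::function<ModuleBytes(const std::string&)>;

class ModuleCache {
public:
    explicit ModuleCache(Loader loader) : loader_(std::move(loader)) {}

    // Returns the image for `path`. The loader (i.e. the disk read) is
    // invoked only the first time a given path is requested.
    std::shared_ptr<const ModuleBytes> get(const std::string& path) {
        auto it = cache_.find(path);
        if (it != cache_.end()) {
            return it->second;  // reused across process scans
        }
        ++disk_reads_;
        std::shared_ptr<const ModuleBytes> image =
            std::make_shared<ModuleBytes>(loader_(path));
        cache_.emplace(path, image);
        return image;
    }

    // Number of times the loader was actually invoked.
    std::size_t disk_reads() const { return disk_reads_; }

private:
    Loader loader_;
    std::unordered_map<std::string, std::shared_ptr<const ModuleBytes>> cache_;
    std::size_t disk_reads_ = 0;
};
```

With one such cache shared across all scanned processes, a library like ntdll.dll would be read once rather than once per process. The cost is that the cached images stay in memory for as long as the cache lives.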

Yet the question arises: is it worth it? By minimizing reads from the disk, I could increase the speed, but with the current model the speed is still satisfactory, so I don't see it as a high priority. I see more benefit in low memory consumption, and in keeping in RAM only what is necessary at the time.

I can understand that it is an issue when you use PE-sieve in a sandbox, but to be very honest, sandboxes are not the target environment for this tool. Not only because of the problem that you described, but also because a sandbox environment can generate many in-memory artifacts that will be picked up by PE-sieve unnecessarily, and generate noise. Also, the sandbox already monitors your API calls, so it can help you pick up the implants in other ways.

@AndyWatterman
Author

AndyWatterman commented Nov 15, 2021

Thank you for your kind reply.

Let me clarify a bit. To be more precise, it is about 250k+ disk events in a few seconds. Is it a lot? Maybe not. However, if we take a common browser, for example, it generates about 20k+ operations in the first minute, then much less. Browsers are quite heavy today. If we take some kind of search tool, it will be similar to "mal_unpack". Usually we run an unpacker for up to 10 minutes, so it is going to be about 100000k disk events. That includes reading, writing, and spending physical resources.

Answering your question:

> is it worth it?

It seems so, considering that (1) system DLLs are loaded only once and "all programs share the same in-memory copy of code" (link), with a new page allocated only when something changes. Moreover, (2) malware usually uses a limited number of DLLs, so you don't need to allocate a lot of memory.

So caching DLLs might not lead to high memory consumption, but it would waste fewer resources.

> artifacts that will be picked up by PE-sieve unnecessarily

This is not necessarily the case. It depends on the sandbox.

Anyway, I highly appreciate your great efforts in developing this project. Thank you.

@hasherezade
Owner

Thank you for your remarks.

> It seems so, considering that (1) system DLLs are loaded only once and "all programs share the same in-memory copy of code" (link), with a new page allocated only when something changes.

It is not that simple in this case. Each DLL, before it can be compared, has to be loaded manually and preprocessed. The modules that I am scanning cannot be loaded into PE-sieve just by LoadLibrary, for various reasons. And even if it was fine to load them like this, they would still take up space within the process memory, and you would still need to read the file to load them.

> Moreover, (2) malware usually uses a limited number of DLLs, so you don't need to allocate a lot of memory.

PE-sieve scans various processes, and not all of them use a small number of DLLs, so this argument doesn't really hold. Also, I can't really agree that malware uses a small number of DLLs. Malware usually has a small number of DLLs in its import table, but it can load a lot of DLLs as it runs, and PE-sieve has to scan all of them.

Yet, I do understand the problem:

> To be more precise, it is about 250k+ disk events in a few seconds. Is it a lot? Maybe not. However, if we take a common browser, for example, it generates about 20k+ operations in the first minute, then much less. Browsers are quite heavy today. If we take some kind of search tool, it will be similar to "mal_unpack". Usually we run an unpacker for up to 10 minutes, so it is going to be about 100000k disk events. That includes reading, writing, and spending physical resources.

I will try to minimize it, first by refactoring the existing code, then eventually by adding caching.
So far my priority has been low memory consumption, so I was freeing memory as soon as I stopped using it, even if it meant re-reading some files later.
I know there are places in the code where the number of reads/writes could be reduced. Until now I didn't treat it as a high priority, but I will change that.

Also, I can agree to cache some of the most often used DLLs, such as:

  • ntdll.dll
  • kernel32.dll
  • user32.dll
  • maybe some other system DLLs
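
Restricting the cache to a short list like the one above could be sketched as follows. This is a hedged illustration only: `base_name_lower` and `should_cache` are hypothetical helpers, and the list contains just the three DLLs named in this comment.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper: lower-cased file name component of a path that
// uses either backslashes or forward slashes as separators.
std::string base_name_lower(const std::string& path) {
    const std::size_t pos = path.find_last_of("\\/");
    std::string name = (pos == std::string::npos) ? path : path.substr(pos + 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical policy: only these frequently shared system DLLs are
// kept in the cache; every other module is freed after use.
bool should_cache(const std::string& path) {
    static const std::unordered_set<std::string> common = {
        "ntdll.dll", "kernel32.dll", "user32.dll"};
    return common.count(base_name_lower(path)) != 0;
}
```

Matching on the file name alone is a simplification. A DLL with the same name in a different directory would also match, so a real policy would more likely compare the full normalized path.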

Honestly speaking, I didn't plan for PE-sieve to be used in sandboxes, but I always try to accommodate users' needs.

@hasherezade hasherezade self-assigned this Nov 17, 2021
@hasherezade hasherezade changed the title from "A lot of disk operations" to "Lower down the number of disk operations" Nov 17, 2021
hasherezade added a commit that referenced this issue Jan 9, 2022
@hasherezade
Owner

hasherezade commented Jan 9, 2022

@AndyWatterman - I implemented the caching, please check it out and share your opinion! It can be enabled in hollows_hunter and mal_unpack by adding the parameter /cache.
You can get the latest builds from the AppVeyor server (as described here).
