Added support for voidtools everything DB #515
I've now added support for all filesystem types supported by Everything stable (currently NTFS/REFS/EFU/Folder), along with tests for each. When I have some more time I'll add support for more versions (Everything 1.5.0alpha currently uses version 1.7.49 and also supports FAT, network drives, and network indexes).
Really cool PR! Since this is another big one, please give it some time for us to do the review :). Stay tuned!
tests/_data/plugins/os/windows/everything/Everything_NTFS_ONLY.db
Hey, thanks for the review. The only request I haven't worked on yet is the one about using dissect.cstruct. I'll have to think a bit about how to implement it, because the structs differ between versions. I'd be happy to hear thoughts about how I handled different versions in the code (I'm not quite happy with the version handling).
Hey! I haven't been following any project changes for a while, so if I need to change anything let me know (the tests pass 🤷). I'm resolving the previous comments as they are no longer relevant (and are all fixed in this implementation).
@cobyge yes, it has been a while! Thanks for the time you took to change your implementation. We plan to make time so we can properly review it soon :)
I had some time to review your code; my comments are below.
I also found an inconsistency: after running it multiple times on the same static data, the number of records kept changing, so the output does not seem to be deterministic.
from .everything import EverythingPlugin

__all__ = ["EverythingPlugin"]
Could you change the line endings in this file? It currently uses Windows-style (CRLF) line endings.
If we can merge #788 first, this could be done a lot nicer.
for path in self.find_user_files():
    self.configs.append(path)
- for path in self.find_user_files():
-     self.configs.append(path)
+ self.configs.extend(self.find_user_files())
for path in self.target.fs.path().glob(path_option):
    if path.exists():
        self.configs.append(path)
I believe glob only returns paths that actually match your pattern, so the check whether the path exists is a bit redundant.
- for path in self.target.fs.path().glob(path_option):
-     if path.exists():
-         self.configs.append(path)
+ self.configs.extend(self.target.fs.path().glob(path_option))
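The point about glob can be checked with plain pathlib. This is only a sketch: it assumes the path object returned by self.target.fs.path() behaves like pathlib.Path for globbing, and the helper name, directory layout, and file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def collect_configs(root: Path, pattern: str) -> list[Path]:
    """Collect glob matches without a per-path exists() check (hypothetical helper)."""
    configs: list[Path] = []
    # glob() walks the directory tree, so it yields entries it actually found
    configs.extend(root.glob(pattern))
    return configs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "a" / "Everything.db").write_bytes(b"")
    (root / "b").mkdir()  # no database in this directory

    found = collect_configs(root, "*/Everything.db")
    found_names = [p.relative_to(root).as_posix() for p in found]
    all_exist = all(p.exists() for p in found)
```

Only the directory that contains a database shows up in the result, and every returned path exists, so the extra check adds nothing in this case.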
for db in user_details.home_path.glob(self.USER_PATH):
    if db.exists():
        yield db
Same thing about the if statement as above. You can also reduce the amount of indentation here with:
- for db in user_details.home_path.glob(self.USER_PATH):
-     if db.exists():
-         yield db
+ yield from user_details.home_path.glob(self.USER_PATH)
        _target=self.target,
    )
except (NotImplementedError, ValueError) as e:
    logger.warning("Invalid EverythingDB %s: %s", path, e)
For plugins, we try to emit log messages through the target the plugin is executed on. Hence we do:
- logger.warning("Invalid EverythingDB %s: %s", path, e)
+ self.target.log.warning("Invalid EverythingDB %s: %s", path, e)
folder.date_accessed = self.read_u64()
if self.header.flag_has_attributes:
    folder.attributes = self.read_u32()
if isinstance(self.filesystem_list[folder.fs_index], EverythingREFS):
- if isinstance(self.filesystem_list[folder.fs_index], EverythingREFS):
+ if isinstance(self.filesystem_list[folder.fs_index], EverythingREFS):
if folder.parent_index is None:
    # The EFU format does not contain the root drive, so it just puts random data into
    # the metadata. This will cause errors if passed to flow.record, so we remove it here
    folder.date_accessed = None
    folder.date_modified = None
    folder.date_created = None
    folder.size = None
- if folder.parent_index is None:
-     # The EFU format does not contain the root drive, so it just puts random data into
-     # the metadata. This will cause errors if passed to flow.record, so we remove it here
-     folder.date_accessed = None
-     folder.date_modified = None
-     folder.date_created = None
-     folder.size = None
+ if folder.parent_index:
+     continue
+ # The EFU format does not contain the root drive, so it just puts random data into
+ # the metadata. This will cause errors if passed to flow.record, so we remove it here
+ folder.date_accessed = None
+ folder.date_modified = None
+ folder.date_created = None
+ folder.size = None
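One thing worth double-checking in the suggestion above: a truthiness test on parent_index is not equivalent to the original "is None" test if 0 can be a valid index, since 0 is falsy in Python. A minimal sketch, assuming 0 is a possible parent index; the Folder class is a hypothetical stand-in for EverythingIndexObj and only carries the fields needed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Folder:  # hypothetical stand-in for EverythingIndexObj
    parent_index: Optional[int] = None
    size: Optional[int] = 123


def clear_root_metadata(folders: list[Folder]) -> None:
    for folder in folders:
        # Skip every folder that has a parent; index 0 still counts as a parent
        if folder.parent_index is not None:
            continue
        folder.size = None


folders = [Folder(parent_index=None), Folder(parent_index=0), Folder(parent_index=5)]
clear_root_metadata(folders)
sizes = [f.size for f in folders]
# What a plain truthiness check would see for the same three folders
truthy = [bool(f.parent_index) for f in folders]
```

With the explicit "is not None" guard only the first folder is cleared, while a truthiness check would also treat the folder whose parent index is 0 as a root.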
# index of filesystem
# This is later used to build a hierarchy for folders
folder_list = [EverythingIndexObj() for _ in range(self.header.number_of_folders)]
for i, folder in enumerate(folder_list):
i is currently unused, so I would remove the enumerate here too.
trunc_from_prev = self.read_byte_or_4()
if trunc_from_prev > len(temp_buf):
    raise ValueError(f"Error while parsing file name {trunc_from_prev} > {len(temp_buf)}")
temp_buf = temp_buf[: len(temp_buf) - trunc_from_prev]
Is it correct that it keeps reusing temp_buf here? Wasn't it already filled in repeatedly at line 412?
Is there no chance that this will corrupt the data across these iterations?
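If each stored name is encoded as a difference from the previous name (drop N bytes from the end, then append a new tail), carrying temp_buf across iterations would be intentional rather than a bug. That is an assumption about the format and is not confirmed in this thread. A minimal sketch under that assumption; the (trunc_from_prev, suffix) tuple layout and the decode_names helper are invented for illustration.

```python
def decode_names(entries: list[tuple[int, bytes]]) -> list[str]:
    """Rebuild names from (bytes_to_drop, new_suffix) pairs (hypothetical layout)."""
    names = []
    temp_buf = b""  # deliberately kept between iterations: it holds the previous name
    for trunc_from_prev, suffix in entries:
        if trunc_from_prev > len(temp_buf):
            raise ValueError(f"Error while parsing file name {trunc_from_prev} > {len(temp_buf)}")
        # Keep the prefix shared with the previous name, then append the new tail
        temp_buf = temp_buf[: len(temp_buf) - trunc_from_prev] + suffix
        names.append(temp_buf.decode("utf-8"))
    return names


entries = [(0, b"report.txt"), (4, b".csv"), (10, b"notes.md")]
decoded = decode_names(entries)
```

Here the second entry reuses "report" from the first, and the third drops the whole previous name. In such a scheme the buffer only stays consistent if every entry is decoded in order, which may be a useful angle when looking at the changing record counts.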
// Flags
uint32_t flag_has_file_size:1;
uint32_t flag_has_date_created:1;
uint32_t flag_has_date_modified:1;
uint32_t flag_has_date_accessed:1;
uint32_t flag_has_attributes:1;
uint32_t flag_has_folder_size:1;
Maybe instead of doing it like this, look into the cstruct flags.
That way you can interact with these flags as you would with a Python flag type, which would eventually reduce the number of lines of code.
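To illustrate what interacting with a Python flag type looks like, here is a sketch using the standard library's enum.IntFlag. It is not the dissect.cstruct API itself. The class name HeaderFlags is hypothetical, and the bit positions assume the bitfields above are assigned from the least significant bit in declaration order.

```python
from enum import IntFlag


class HeaderFlags(IntFlag):  # hypothetical name; bit positions are an assumption
    HAS_FILE_SIZE = 1 << 0
    HAS_DATE_CREATED = 1 << 1
    HAS_DATE_MODIFIED = 1 << 2
    HAS_DATE_ACCESSED = 1 << 3
    HAS_ATTRIBUTES = 1 << 4
    HAS_FOLDER_SIZE = 1 << 5


raw = 0b010101  # example value: file size, date modified, attributes
flags = HeaderFlags(raw)
has_attributes = HeaderFlags.HAS_ATTRIBUTES in flags
has_date_created = HeaderFlags.HAS_DATE_CREATED in flags
```

The whole flag word is read once as an integer, and each check becomes a membership test instead of a separate one-bit field.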
Inspired by #505, I remembered I had some code lying around to parse the database of Voidtools Everything, which is very similar to mlocate/plocate, but for Windows.
I updated the code and added it to the codebase.
Because Everything is closed source, this is based entirely on reverse-engineering it. I haven't found any reference implementation on the internet to help (AFAIK this is the only parser), so it all rests on my (not too great) reversing skills.
I've tested this on ~10 random database files I had lying around, from multiple computers. All of them gave exactly the same results as Everything itself (checked by exporting to CSV and comparing md5sums).
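The md5sum comparison can be sketched roughly as follows. The file names and CSV contents are made up for the example, and the md5sum helper is hypothetical; the actual exports and tooling used for the test are not shown in this PR.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the export from Everything and the export from the parser
    reference = Path(tmp) / "everything_export.csv"
    parsed = Path(tmp) / "parser_export.csv"
    rows = b"Name,Path,Size\r\nnotes.txt,C:\\Users\\demo,42\r\n"
    reference.write_bytes(rows)
    parsed.write_bytes(rows)
    exports_match = md5sum(reference) == md5sum(parsed)
```

A digest comparison like this only passes when both exports are byte-identical, so row order, line endings, and field formatting all have to agree.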
It should support any DB created since 2017, and if given a file that fails to parse, I'm willing to add support for earlier versions as well.
All comments are mine, written while reversing the code.
This is relatively slow code (it takes 4.5 seconds for a DB with 126828 files).
I have a version written in Rust which is 22 times faster, and if that's something you are interested in, I'm happy to try creating bindings with PyO3.