Added support for voidtools everything DB #515

Open
wants to merge 1 commit into
base: main
from

Conversation

cobyge
Contributor

@cobyge cobyge commented Jan 26, 2024

Inspired by #505, I remembered I had some code lying around to parse the database of Voidtools Everything, very similar to mlocate/plocate, but for Windows.

I updated the code and added it to the codebase.
Because Everything is closed source, this is based entirely on reverse engineering, and I haven't found any reference implementation on the internet to help (AFAIK this is the only parser), so it all rests on my (not too great) reversing skills.
I've tested this on ~10 random database files I had lying around, from multiple computers, and all of them gave exactly the same results as Everything itself (checked by exporting to CSV and comparing md5sums).
It should support any DB created since 2017, and if given a file that fails to parse, I'm willing to add support for earlier versions as well.
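
The CSV comparison described above could be reproduced roughly like this. It is a minimal sketch, not the script actually used: the helper names `md5sum` and `exports_match` are hypothetical, and it assumes both exports are written with the same column order, row order, and line endings.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exports_match(parser_csv: Path, everything_csv: Path) -> bool:
    """True when both CSV exports are byte-for-byte identical."""
    return md5sum(parser_csv) == md5sum(everything_csv)
```

A byte-level digest is strict: any difference in quoting or ordering counts as a mismatch, which is what makes it a useful regression check.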

All comments are mine, written while reversing the code.

This is relatively slow code (it takes 4.5 seconds for a DB with 126828 files).
I have a version written in Rust which is 22 times faster, and if that's something you are interested in, I'm happy to try creating bindings with PyO3.

@cobyge cobyge force-pushed the feature/add-everything-plugin branch 3 times, most recently from 90979c3 to a228c74 Compare January 27, 2024 23:14
@cobyge
Contributor Author

cobyge commented Jan 27, 2024

I've now added support for all filesystem types supported by Everything stable (Currently NTFS/REFS/EFU/Folder), along with tests for each.

When I have some more time I'll add support for more versions (Everything 1.5.0alpha currently uses version 1.7.49 and also supports FAT, network drives, and network indexes)

@Horofic Horofic self-requested a review January 28, 2024 16:12
@Horofic
Contributor

Horofic commented Jan 29, 2024

Really cool PR! Since this is another big one, please give it some time for us to do the review :). Stay tuned!

@Schamper Schamper self-requested a review January 29, 2024 23:00
pyproject.toml Outdated
dissect/target/helpers/locate/everything.py Outdated
dissect/target/helpers/locate/everything.py Outdated
dissect/target/helpers/locate/everything.py Outdated
dissect/target/helpers/locate/everything.py Outdated
dissect/target/plugins/os/windows/everything.py Outdated
dissect/target/plugins/os/windows/everything.py Outdated
dissect/target/plugins/os/windows/everything.py Outdated
dissect/target/plugins/os/windows/everything.py Outdated
@cobyge
Contributor Author

cobyge commented Mar 2, 2024

Hey, thanks for the review.
I updated the code according to your request, and I've also added support for a previous version of Everything, in order to show what supporting multiple versions might look like.

The only request I haven't worked on yet is the request regarding using dissect.cstruct. I'll have to think a bit about how to implement it, because of differences between structs for multiple versions.

I'd be happy to hear thoughts about how I handled different versions in the code (I'm not quite happy with the version handling).

@Horofic Horofic requested a review from Miauwkeru July 19, 2024 08:45
@Horofic Horofic removed their request for review August 1, 2024 14:56
@EinatFox EinatFox linked an issue Aug 6, 2024 that may be closed by this pull request
@cobyge cobyge force-pushed the feature/add-everything-plugin branch from f5d0fb9 to b101283 Compare January 17, 2025 23:27
@cobyge
Contributor Author

cobyge commented Jan 17, 2025

Hey!
It's been a while, but I finally got around to rewriting this with a simpler (better?) implementation, using cstruct.

I haven't been following any project changes for a while, so if I need to change anything let me know (the tests pass 🤷).

I'm resolving the previous comments as they are no longer relevant (and are all fixed in this implementation)
Hope we can get this pushed sometime soon 😄

@Miauwkeru
Contributor

@cobyge yes, it has been a while! Thanks for the time you took to change your implementation. We plan to make time so we can properly review it soon :)

@Miauwkeru Miauwkeru left a comment

I had some time to review your code, and here you go.

I also found a weird inconsistency. After running it multiple times on static data, the number of records kept changing. So there is something weird going on there.

Comment on lines +1 to +3
from .everything import EverythingPlugin

__all__ = ["EverythingPlugin"]
Contributor

Could you change the line endings inside this file? It currently uses Windows-style (CRLF) line endings.
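
One way to normalize the endings is a small byte-level conversion. This is only a sketch; the helper name `to_unix_line_endings` is hypothetical, and an editor setting or a Git attribute would work just as well.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows-style CRLF line endings to Unix-style LF."""
    return data.replace(b"\r\n", b"\n")
```

Working on bytes rather than text avoids any implicit newline translation when reading and writing the file.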

Member

If we can merge #788 first, this could be done a lot nicer.

Comment on lines +50 to +51
for path in self.find_user_files():
    self.configs.append(path)
Contributor

Suggested change
for path in self.find_user_files():
    self.configs.append(path)
self.configs.extend(self.find_user_files())

Comment on lines +46 to +48
for path in self.target.fs.path().glob(path_option):
    if path.exists():
        self.configs.append(path)
Contributor

I believe the glob only returns paths that actually match your pattern, so the check for whether each path exists is redundant.

Suggested change
for path in self.target.fs.path().glob(path_option):
    if path.exists():
        self.configs.append(path)
self.configs.extend(self.target.fs.path().glob(path_option))
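
The same idea can be tried with the standard library's `pathlib`. This is a sketch under the assumption that the target filesystem's `glob` behaves like `pathlib.Path.glob`; the function name `collect_matches` is hypothetical.

```python
from pathlib import Path


def collect_matches(root: Path, pattern: str) -> list[Path]:
    """Collect every path under root that matches pattern, with no exists() check."""
    found: list[Path] = []
    # glob() walks real directory entries, so the results come from the filesystem itself
    found.extend(root.glob(pattern))
    return found
```

Passing the generator straight to `extend` also removes the explicit loop.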

Comment on lines +55 to +57
for db in user_details.home_path.glob(self.USER_PATH):
    if db.exists():
        yield db
Contributor

The same applies to the if statement here as above. You can also reduce the amount of indentation in this instance with:

Suggested change
for db in user_details.home_path.glob(self.USER_PATH):
    if db.exists():
        yield db
yield from user_details.home_path.glob(self.USER_PATH)
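
For reference, `yield from` delegates to another iterable and produces the same items as an explicit loop. A minimal standalone comparison, with hypothetical function names:

```python
from typing import Iterable, Iterator


def explicit_loop(items: Iterable[str]) -> Iterator[str]:
    """Yield each item with a hand-written loop."""
    for item in items:
        yield item


def delegated(items: Iterable[str]) -> Iterator[str]:
    """Yield each item by delegating to the iterable."""
    yield from items
```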

_target=self.target,
)
except (NotImplementedError, ValueError) as e:
logger.warning("Invalid EverythingDB %s: %s", path, e)
Contributor

For plugins, we try to emit log messages through the target the plugin is executed on. Hence we do:

Suggested change
logger.warning("Invalid EverythingDB %s: %s", path, e)
self.target.log.warning("Invalid EverythingDB %s: %s", path, e)
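
The benefit of a per-target logger can be illustrated with the standard library alone. This sketch does not show how the project implements `target.log`; `PrefixingAdapter` and `DemoTarget` are hypothetical names used only to show why a message is easier to trace when it carries the target's identity.

```python
import logging


class PrefixingAdapter(logging.LoggerAdapter):
    """Prefix every message with the name of the target it belongs to."""

    def process(self, msg, kwargs):
        return f"{self.extra['target']}: {msg}", kwargs


class DemoTarget:
    """Stand-in for a target object that owns its own logger."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log = PrefixingAdapter(logging.getLogger("demo.target"), {"target": name})
```

With several targets processed in one run, a module-level `logger` cannot say which target produced a warning, while the adapter can.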

folder.date_accessed = self.read_u64()
if self.header.flag_has_attributes:
folder.attributes = self.read_u32()
if isinstance(self.filesystem_list[folder.fs_index], EverythingREFS):
Contributor

Suggested change
if isinstance(self.filesystem_list[folder.fs_index], EverythingREFS):
if isinstance(self.filesystem_list[folder.fs_index], EverythingREFS):

Comment on lines +436 to +442
if folder.parent_index is None:
    # The EFU format does not contain the root drive, so it just puts random data into
    # the metadata. This will cause errors if passed to flow.record, so we remove it here
    folder.date_accessed = None
    folder.date_modified = None
    folder.date_created = None
    folder.size = None
Contributor

Suggested change
if folder.parent_index is None:
    # The EFU format does not contain the root drive, so it just puts random data into
    # the metadata. This will cause errors if passed to flow.record, so we remove it here
    folder.date_accessed = None
    folder.date_modified = None
    folder.date_created = None
    folder.size = None
if folder.parent_index:
    continue
# The EFU format does not contain the root drive, so it just puts random data into
# the metadata. This will cause errors if passed to flow.record, so we remove it here
folder.date_accessed = None
folder.date_modified = None
folder.date_created = None
folder.size = None
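
One detail worth checking with a guard clause like this: a truthiness test and an `is None` test are not exact inverses when an index can be 0. Whether 0 is a valid `parent_index` in this parser is an assumption here, and the helper names below are hypothetical.

```python
from typing import Optional


def is_root_by_identity(parent_index: Optional[int]) -> bool:
    """Treat only a missing parent (None) as the root."""
    return parent_index is None


def is_root_by_truthiness(parent_index: Optional[int]) -> bool:
    """Treat any falsy parent, including index 0, as the root."""
    return not parent_index
```

If index 0 can occur, `if folder.parent_index is not None: continue` keeps the original behaviour while still using the early exit.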

# index of filesystem
# This is later used to build a hierarchy for folders
folder_list = [EverythingIndexObj() for _ in range(self.header.number_of_folders)]
for i, folder in enumerate(folder_list):
Contributor

i is currently unused, so I would remove the enumerate here too.

trunc_from_prev = self.read_byte_or_4()
if trunc_from_prev > len(temp_buf):
    raise ValueError(f"Error while parsing file name {trunc_from_prev} > {len(temp_buf)}")
temp_buf = temp_buf[: len(temp_buf) - trunc_from_prev]
Contributor

Is it correct that temp_buf keeps being reused here? Wasn't it already filled in repeatedly at line 412?
Is there no chance that this corrupts the data across these iterations?
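
For context, the quoted lines resemble incremental (front) encoding, where each name is stored as "drop N bytes from the previous name, then append a suffix". Under that scheme, carrying the buffer over between entries is intentional. The sketch below shows the general technique only; it assumes, without confirmation from the PR, that Everything's format works this way, and `decode_front_coded` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def decode_front_coded(entries: Iterable[Tuple[int, bytes]]) -> List[str]:
    """Decode (bytes_to_drop_from_previous, new_suffix) pairs into full names."""
    names: List[str] = []
    buf = b""  # deliberately kept across iterations: each entry builds on the last
    for trunc_from_prev, suffix in entries:
        if trunc_from_prev > len(buf):
            raise ValueError(f"cannot drop {trunc_from_prev} bytes from {len(buf)}")
        buf = buf[: len(buf) - trunc_from_prev] + suffix
        names.append(buf.decode("utf-8"))
    return names
```

In a scheme like this, corruption would come from resetting the buffer at the wrong point (for example between the folder pass and the file pass), not from reuse as such.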

Comment on lines +54 to +60
// Flags
uint32_t flag_has_file_size:1;
uint32_t flag_has_date_created:1;
uint32_t flag_has_date_modified:1;
uint32_t flag_has_date_accessed:1;
uint32_t flag_has_attributes:1;
uint32_t flag_has_folder_size:1;
Contributor

Maybe instead of doing it like this, look into the cstruct flag type. That way you can interact with these flags as you would with a Python flag type, which would eventually reduce the number of lines of code.
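
The Python-side analogue of such a flag type is `enum.IntFlag`. This sketch uses only the standard library rather than cstruct itself, and the bit positions are an assumption (first bitfield member mapped to the lowest bit); `EntryFlags` is a hypothetical name.

```python
import enum


class EntryFlags(enum.IntFlag):
    """One bit per optional field; bit order assumed, not taken from the format."""

    HAS_FILE_SIZE = 1 << 0
    HAS_DATE_CREATED = 1 << 1
    HAS_DATE_MODIFIED = 1 << 2
    HAS_DATE_ACCESSED = 1 << 3
    HAS_ATTRIBUTES = 1 << 4
    HAS_FOLDER_SIZE = 1 << 5


def has_attributes(raw: int) -> bool:
    """Check a raw flags value for the attributes bit."""
    return EntryFlags.HAS_ATTRIBUTES in EntryFlags(raw)
```

A check such as `EntryFlags.HAS_ATTRIBUTES in flags` then replaces a separate one-bit field per flag.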

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Added support for voidtools everything DB PR#515
4 participants