feat(reporting): report meta-data information about chunks. #557
base: main
@e3krisztian implemented the changes we talked about and introduced a test.
@@ -181,6 +181,7 @@ class ChunkReport(Report):
     end_offset: int
     size: int
     is_encrypted: bool
+    metadata: dict = attr.ib(factory=dict)
Just wondering whether we want to validate the metadata dict: do we want to enforce that keys are strings and values are of a certain type, or are we OK with anything, even nested metadata?
Maybe it would be great to somehow have a "namespace" or at least some convention on metadata variable naming?
What if we want to push data from multiple headers, or permissions, etc?
I'm ok with enforcing a convention on metadata variable naming. Having a namespace would be too complicated since we can't foresee the metadata field names used by handlers.
I would enforce that metadata is a dict without nested data, keys must be strings and values must be base types.
I would convey information about files created (timestamps, permissions, owner) with something different since it involves way more complex structures.
Added a validator. See 99944e3
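To make the rule agreed above concrete (a flat dict, string keys, base-type values), here is a minimal sketch of such a check. It is not the code from 99944e3: the name `validate_metadata` and the `BASE_TYPES` tuple are assumptions made for illustration. The `(instance, attribute, value)` signature follows the convention attrs uses for validators, so a function of this shape could be passed to `attr.ib(validator=...)`.

```python
from typing import Any

# Assumption: "base types" means these scalar types; the real validator may differ.
BASE_TYPES = (str, int, float, bool, bytes, type(None))


def validate_metadata(instance: Any, attribute: Any, value: Any) -> None:
    """Reject anything that is not a flat dict of str keys and base-type values."""
    if not isinstance(value, dict):
        raise ValueError("metadata must be a dict")
    for key, item in value.items():
        if not isinstance(key, str):
            raise ValueError(f"metadata key {key!r} must be a string")
        # Nested dicts and lists are not in BASE_TYPES, so they are rejected here.
        if not isinstance(item, BASE_TYPES):
            raise ValueError(f"metadata value for {key!r} must be a base type")


# Called directly here; instance and attribute are unused by this sketch.
validate_metadata(None, None, {"load_address": 0x8000, "arch": "arm"})
```

Raising on the first offending entry keeps the error message specific, which helps handler writers find the field that needs flattening.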
unblob/handlers/archive/sevenzip.py
Outdated
@@ -70,4 +70,6 @@ def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]
         # We read the signature header here to get the offset to the header database
         first_db_header = start_offset + len(header) + header.next_header_offset
         end_offset = first_db_header + header.next_header_size
-        return ValidChunk(start_offset=start_offset, end_offset=end_offset)
+        return ValidChunk(
+            start_offset=start_offset, end_offset=end_offset, metadata=header
do we want to pass all attributes from the header as metadata?
This point came up when discussing with @e3krisztian yesterday. I think it's better to only pass the most relevant header attributes rather than the whole instance.
See 75548d2
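As a rough illustration of passing only the relevant header attributes rather than the whole instance, the sketch below copies a chosen set of fields into a flat dict. It is not the change made in 75548d2: `pick_metadata` is a hypothetical helper, and the header is a `SimpleNamespace` stand-in rather than a real parsed structure. Only `next_header_offset` and `next_header_size` come from the diff above; the `reserved` field is invented for the example.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def pick_metadata(header: Any, fields: Iterable[str]) -> Dict[str, Any]:
    """Copy only the named attributes from a parsed header into a flat dict."""
    return {name: getattr(header, name) for name in fields}


# Stand-in for a parsed 7-Zip signature header (the real one comes from the parser).
header = SimpleNamespace(
    next_header_offset=0x120,
    next_header_size=0x40,
    reserved=b"\x00" * 4,  # hypothetical field a handler might not want to report
)

# Only the explicitly listed fields end up in the metadata dict.
metadata = pick_metadata(header, ["next_header_offset", "next_header_size"])
```

Listing the fields explicitly keeps the reported metadata stable even if the underlying header structure gains new members.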
Allow handlers to provide a dict value as part of a ValidChunk metadata attribute. That dictionary can contain any relevant metadata information from the perspective of the handler, but we advise handler writers to report parsed information such as header values.
This metadata dict is later reported as part of our ChunkReports and available in the JSON report file if the user requested one.
The idea is to expose metadata to further analysis steps through the unblob report. For example, a binary analysis toolkit would read the load address and architecture from a uImage chunk to analyze the file extracted from that chunk with the right settings.
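A downstream tool could consume that metadata roughly as sketched below. This is only an illustration under stated assumptions: the report layout (a list of tasks, each with a `reports` list tagged by `__typename__`) and the `load_address` and `arch` keys are made up for the example and are not taken from unblob's actual report schema, and `iter_chunk_metadata` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterator

# Illustrative report; the layout and key names are assumptions, not unblob's schema.
SAMPLE_REPORT = """
[
  {
    "path": "firmware.bin",
    "reports": [
      {
        "__typename__": "ChunkReport",
        "start_offset": 0,
        "end_offset": 4096,
        "metadata": {"load_address": 32768, "arch": "arm"}
      },
      {"__typename__": "StatReport", "size": 4096}
    ]
  }
]
"""


def iter_chunk_metadata(report: Any) -> Iterator[Dict[str, Any]]:
    """Yield the metadata dict of every chunk report that carries one."""
    for task in report:
        for entry in task.get("reports", []):
            if entry.get("__typename__") == "ChunkReport" and entry.get("metadata"):
                yield entry["metadata"]


# An analysis step would pick its settings from the reported values.
for metadata in iter_chunk_metadata(json.loads(SAMPLE_REPORT)):
    print(hex(metadata["load_address"]), metadata["arch"])
```

Because the metadata dict is flat and holds only base types, it survives the round trip through JSON without any custom encoding.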
A note on the 'as_dict' implementation.
The initial idea was to implement it in dissect.cstruct (see fox-it/dissect.cstruct#29), but due to expected changes in the project's API I chose to implement it in unblob so we're not dependent on another project.
Related to #16 and initial discussion in #16 (comment)
You can observe the changes like this: