RzShell: refactor string, regex and byte search #4762

Open: wants to merge 146 commits into base: dev
Commits
2a0b3e9
Sorry mister, we don't do search no longer.
Rot127 Dec 10, 2024
2893fde
Add RzShell handlers for all search commands.
Rot127 Dec 10, 2024
4fc8622
Import new RzSearch from reference branch.
Rot127 Dec 11, 2024
542c0b5
Reimplement hex bytes (/x) search.
Rot127 Dec 11, 2024
2c726fd
Add search test within debug maps.
Rot127 Jan 1, 2025
976bdd4
Make BytesPattern implementation private.
Rot127 Jan 1, 2025
5fb3ca9
Use RzBuffer for byte search.
Rot127 Jan 2, 2025
edfd934
Hex byte search: Add test for search of all file content.
Rot127 Jan 3, 2025
9e9afdd
Add string search implementation.
Rot127 Jan 2, 2025
b08ccef
Clean up error handling in cmd_string_search_generic()
Rot127 Jan 4, 2025
cc130d5
Add encoding details to help
Rot127 Jan 4, 2025
3a195ed
Make the default encoding value 'guess'
Rot127 Jan 4, 2025
28999d4
Replace if with switch
Rot127 Jan 4, 2025
e168f6d
Fix some logic bugs
Rot127 Jan 4, 2025
4a1dbc4
Make an assumption in process_one_string explicit.
Rot127 Jan 4, 2025
cbfd30c
Add string scan function using the whole buffer.
Rot127 Jan 4, 2025
0852426
Use rz_scan_strings_whole_buf() instead of other scan methods.
Rot127 Jan 4, 2025
656a486
Move the strbuf buffer into the function to reduce complexity.
Rot127 Jan 4, 2025
e81384a
Fix types and includes.
Rot127 Jan 20, 2025
2796766
Base string search on regex expressions.
Rot127 Jan 20, 2025
be2d846
Add test cases
Rot127 Jan 20, 2025
857f588
Add UTF8 search tests
Rot127 Jan 21, 2025
466b1af
Add UTF-16-be tests.
Rot127 Jan 21, 2025
5320942
Use renamed files
Rot127 Jan 21, 2025
e9913d8
Fix typo
Rot127 Jan 21, 2025
95cc6e8
Clarify in regex header that flags are only used for compilation.
Rot127 Jan 21, 2025
c88077b
Add utf16le tests
Rot127 Jan 21, 2025
f0e2d53
Add utf32 string search tests.
Rot127 Jan 22, 2025
ff6df9c
Fix utf16 encode to allow codepoints up to 10ffff.
Rot127 Jan 22, 2025
4decb92
Only print until non-printable character.
Rot127 Jan 22, 2025
da8851f
Add test for UTF8 to check offset of string.
Rot127 Jan 23, 2025
d0ce01b
Move error case test to the top
Rot127 Jan 23, 2025
a55674a
Fix offsets and length of detected strings.
Rot127 Jan 23, 2025
cad4fe3
Some clean ups of doxygen
Rot127 Jan 23, 2025
ca1aa2a
Add dots to doxygen
Rot127 Jan 24, 2025
06518af
Add test for ibm037 string
Rot127 Jan 24, 2025
4ee7a7f
Add test for very small string search.
Rot127 Jan 24, 2025
830fe24
Fix rebase mistakes.
Rot127 Jan 27, 2025
fb23d86
Add psu command to force UTF-8 string printing.
Rot127 Jan 27, 2025
2781f1f
Add function to check for valid UTF-32 code points.
Rot127 Jan 27, 2025
2ad24f2
Add test for case insensitive UTF8.
Rot127 Jan 27, 2025
478f2f0
Add EBCDIC code point validator.
Rot127 Jan 28, 2025
6ade200
Add look ahead ability to utf32 check.
Rot127 Jan 29, 2025
7279215
Add a validator function for UTF-16 chars.
Rot127 Jan 29, 2025
ff91a44
Fix check
Rot127 Jan 29, 2025
39efb54
Change UTF-16 code point check to printable check.
Rot127 Jan 29, 2025
98f432d
Change UTF-16 detection method to check at least 2 characters.
Rot127 Jan 29, 2025
39df427
Remove unnecessary code now that string detection is more precise.
Rot127 Jan 29, 2025
b0561cd
Add macro for last unicode code point.
Rot127 Jan 29, 2025
030ae28
Fix alignment of UTF-8 strings to their memory representation.
Rot127 Jan 29, 2025
d2bd918
Add extended regex string search on UTF-32LE strings.
Rot127 Jan 29, 2025
459bbb8
Make process_one_string independent of buffer size.
Rot127 Jan 29, 2025
35461e9
Skip offset map if strings are expected to be UTF-8 anyways.
Rot127 Jan 29, 2025
66f7ea6
Set thread number to maximum CPU cores, not threads.
Rot127 Jan 30, 2025
725af0d
Remove duplicate if condition.
Rot127 Jan 30, 2025
817554e
Remove unused str.search.max_threads altogether.
Rot127 Jan 30, 2025
d841437
Fix OOB write for UTF8 strings.
Rot127 Jan 30, 2025
baaf7c7
Add ability to print any supported string encoding with 'ps'.
Rot127 Jan 30, 2025
57599ad
Fix: Rename last RzRune occurrences.
Rot127 Jan 30, 2025
6f6198a
Print whole block buffer, if the user wants it.
Rot127 Jan 30, 2025
2806a33
Fix leaks
Rot127 Jan 31, 2025
bd22020
Check empty condition before sorting vectors.
Rot127 Jan 31, 2025
e30943d
Fix UTF-32 decoding.
Rot127 Feb 1, 2025
698b140
Rename cp -> code_point
Rot127 Feb 2, 2025
676272a
Revert string detection heuristics to the ASCII-only checks.
Rot127 Feb 2, 2025
104d69d
Fix leaks
Rot127 Feb 2, 2025
2c28595
Remove dead code (after fixing the decode functions.)
Rot127 Feb 2, 2025
627e227
Fix tests with updated binaries
Rot127 Feb 2, 2025
559e724
Move all search command help messages to RzShell.
Rot127 Feb 2, 2025
eba1afc
Fix always true condition.
Rot127 Feb 2, 2025
04211c4
Fix: Add command handler for /as, /af, /at
Rot127 Feb 3, 2025
da33a2a
Cut '/' from input string because the legacy handler expects this.
Rot127 Feb 3, 2025
a87ab32
Make offset and size arguments optional for /F.
Rot127 Feb 3, 2025
7a6580f
Move RzSearchHit free() to IPI
Rot127 Feb 3, 2025
b5408e3
Document rz_strbuf_drain.
Rot127 Feb 3, 2025
efb2f69
Clarify that the cmd.hit command is not run when the hit is found.
Rot127 Feb 3, 2025
128c830
Move search hit flag building into function
Rot127 Feb 3, 2025
049452d
Give the byte search a hit description.
Rot127 Feb 3, 2025
3041e71
Clang-format
Rot127 Feb 3, 2025
4138723
Fix leaks
Rot127 Feb 3, 2025
73f06e4
Add min/max values to /v and /V commands.
Rot127 Feb 3, 2025
5a5d743
Fix tests: Add 0x prefix and test for escaped strings.
Rot127 Feb 3, 2025
25ac7cc
Fix search tests with latest commands
Rot127 Feb 3, 2025
576bf36
Fix race condition and double frees.
Rot127 Feb 3, 2025
a5e18cd
Enhance error message if string offset map is faulty.
Rot127 Feb 3, 2025
7c7206d
Enhance doxygen of RzDetectedString.
Rot127 Feb 3, 2025
ef60f86
Add JSON output to legacy /aa
Rot127 Feb 3, 2025
bd63614
Document behavior of byte search for odd number of nibbles.
Rot127 Feb 3, 2025
8ef27ce
Remove progress output
Rot127 Feb 3, 2025
e67d596
Fix UB in case LSHIFT(1) == 31.
Rot127 Feb 3, 2025
d22eac6
Set minimal string length to 3 to allow finding ELF
Rot127 Feb 3, 2025
cdcfe43
Add more json outputs
Rot127 Feb 3, 2025
c2b73b5
Fix some tests which were invalidly formatted.
Rot127 Feb 3, 2025
1243363
Fix, don't append a second j to the command
Rot127 Feb 3, 2025
94305af
Fix two NULL passes to rz_hex_str2bin
Rot127 Feb 3, 2025
f78b5f9
Move string search settings: str.search -> search.str
Rot127 Feb 4, 2025
88683b3
Add /xr regex byte search.
Rot127 Feb 4, 2025
d8a636a
Fix unit test for project loading.
Rot127 Feb 4, 2025
4aa571e
Add string encoding to hit flag.
Rot127 Feb 4, 2025
76c347f
Change to new hit flag format
Rot127 Feb 4, 2025
06121f9
Fix tests which emit more due to increased block size for search.
Rot127 Feb 4, 2025
89d60cf
Fixup cmd_search_hint
Rot127 Feb 4, 2025
4aad948
Fix: Add global address offset to hit.
Rot127 Feb 4, 2025
00c434b
Use new string search commands.
Rot127 Feb 4, 2025
9511172
Rename search.str.buf_size to max_length.
Rot127 Feb 6, 2025
64d98e0
Move duplicated code into static function.
Rot127 Feb 6, 2025
b89bb09
Reduce block size again to old value.
Rot127 Feb 6, 2025
5fe207c
Set block size in tests instead of increasing the default one.
Rot127 Feb 6, 2025
1227e46
By default search for literal string, not for regex and for the encod…
Rot127 Feb 7, 2025
3c215f6
Add unit test for ibm290 scanning.
Rot127 Feb 7, 2025
92498fd
Rename search.str.encoding to str.encoding.
Rot127 Feb 7, 2025
9ab8528
Set default encoding to UTF-8
Rot127 Feb 7, 2025
648483e
Fix project migration
Rot127 Feb 7, 2025
0734c35
Fix tests
Rot127 Feb 7, 2025
2ee205c
Sort result because it might change order due to multithreading
Rot127 Feb 7, 2025
e40b819
Add example for extended regular expressions.
Rot127 Feb 7, 2025
476ec46
Remove unused min_uni_blocks setting and move the number into a const…
Rot127 Feb 7, 2025
1a439dc
Document that raw_alignment is not used for normal search.
Rot127 Feb 7, 2025
c477670
Implement alignment matching of search hits (incomplete).
Rot127 Feb 7, 2025
30d97a5
Finish alignment search.
Rot127 Feb 8, 2025
98669df
Fix some more tests.
Rot127 Feb 8, 2025
203d1c5
Add EBCDIC tests
Rot127 Feb 8, 2025
00a261d
Fix error case where hits is undefined.
Rot127 Feb 8, 2025
e7ca9df
Remove addrmod member.
Rot127 Feb 9, 2025
7980f09
Fix: Don't print non-printable strings (e.g. only NUL byte strings).
Rot127 Feb 9, 2025
efa98fb
Check if flag at refaddr is a string before printing it as such.
Rot127 Feb 9, 2025
80ed949
Revert default string encoding setting to UTF-8 (breaks many tests).
Rot127 Feb 9, 2025
aa19d6c
Fix rebase changes
Rot127 Feb 10, 2025
7c71568
Revert "Check if flag at refaddr is a string before printing it as su…
Rot127 Feb 10, 2025
073c6ed
Fix check setting invalid encoding.
Rot127 Feb 10, 2025
db6c783
Fix tests which showed incorrect encoding escaping prefix.
Rot127 Feb 10, 2025
a75965b
Fix uninitialized condition for hits.
Rot127 Feb 10, 2025
fe8a4b7
Forbid setting 'settings' as the encoding.
Rot127 Feb 10, 2025
d30d8ce
Handle the 'settings' case in code search.
Rot127 Feb 10, 2025
288ba04
Don't show search progress by default for tests.
Rot127 Feb 10, 2025
e831bee
Respect match_overlap option.
Rot127 Feb 10, 2025
921e5ae
Fix all simple tests which mostly rename things.
Rot127 Feb 10, 2025
ab49885
Remove duplicate overlap setting
Rot127 Feb 10, 2025
c1e68af
Fix now completely detected string.
Rot127 Feb 10, 2025
dd9aa63
Fix docs
Rot127 Feb 10, 2025
05ea931
Fix regex flags parsing.
Rot127 Feb 10, 2025
4362520
Remove legacy regex search.
Rot127 Feb 10, 2025
7e0381d
Apply review suggestions
Rot127 Feb 10, 2025
b9d2d3b
Move regex flag utility function to rz_regex.
Rot127 Feb 10, 2025
506727c
Update Unicode to version 16.0.0.
Rot127 Feb 11, 2025
2b945d1
Add ps option to print until the first non-printable code point.
Rot127 Feb 11, 2025
1 change: 1 addition & 0 deletions binrz/rz-test/run.c
@@ -119,6 +119,7 @@ static RzSubprocessOutput *run_rz_test(RzTestRunConfig *config, ut64 timeout_ms,
 	rz_pvector_push(&args, "-escr.color=0");
 	rz_pvector_push(&args, "-escr.interactive=0");
 	rz_pvector_push(&args, "-eflirt.sigdb.load.system=false");
+	rz_pvector_push(&args, "-esearch.show_progress=false");
 	rz_pvector_push(&args, "-eflirt.sigdb.load.home=false");
 	rz_pvector_push(&args, "-N");
 	RzListIter *it;
1 change: 1 addition & 0 deletions librz/arch/asm.c
@@ -655,6 +655,7 @@ static Ase findAssembler(RzAsm *a, const char *kw) {
 	if (assemblerMatches(a, h)) {
 		if (kw) {
 			if (strstr(h->name, kw)) {
+				rz_iterator_free(iter);
 				return h->assemble;
 			}
 		} else {
3 changes: 1 addition & 2 deletions librz/arch/data.c
@@ -10,8 +10,7 @@ static bool get_string(const ut8 *buf, int size, RzDetectedString **dstr, RzStrE
 	}

 	RzUtilStrScanOptions opt = {
-		.buf_size = size,
-		.max_uni_blocks = 4,
+		.max_str_length = size,
 		.min_str_length = 4,
 		.prefer_big_endian = big_endian,
 		.check_ascii_freq = false,
2 changes: 1 addition & 1 deletion librz/arch/p/analysis/analysis_arm_cs.c
@@ -1671,7 +1671,7 @@ jmp $$ + 4 + ( [delta] * 2 )
 #endif
 	// 0x000082a8 28301be5 ldr r3, [fp, -0x28]
 	if (INSOP(1).mem.scale != -1) {
-		op->scale = INSOP(1).mem.scale << LSHIFT(1);
+		op->scale = (ut64)INSOP(1).mem.scale << LSHIFT(1);
 	}
 	op->ireg = cs_reg_name(handle, REGBASE(1));
 	op->disp = MEMDISP(1);
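The cast in this hunk matches commit e67d596 ("Fix UB in case LSHIFT(1) == 31."). As a minimal sketch of why widening before the shift helps, assuming the scale is a 32-bit signed int and the shift amount stays below 32: shifting a positive `int` left so that the result no longer fits in `int` is undefined behavior in C, while the same shift on a 64-bit unsigned value is well defined. The helper name `scale_shifted` is hypothetical and not part of Rizin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not Rizin API): apply a left shift to a 32-bit scale
 * value without overflowing a signed 32-bit int. */
static uint64_t scale_shifted(int32_t scale, unsigned shift) {
	/* Assumption for this sketch: the shift amount is in the range 0..31. */
	assert(shift < 32);
	/* Widen to 64-bit unsigned first, so the shift is done in uint64_t
	 * arithmetic. `scale << shift` on the int itself would be undefined
	 * for e.g. scale == 1 and shift == 31. */
	return (uint64_t)scale << shift;
}
```

With this, a scale of 1 shifted by 31 yields 0x80000000 as a 64-bit value instead of overflowing a signed int.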
9 changes: 3 additions & 6 deletions librz/bin/bfile_string.c
@@ -83,8 +83,7 @@ static RzList /*<RzDetectedString *>*/ *string_scan_range(SharedData *shared, co
 	size_t buffer_size = RZ_MIN(shared->buffer_size, interval_size);

 	RzUtilStrScanOptions scan_opt = {
-		.buf_size = buffer_size,
-		.max_uni_blocks = shared->max_uni_blocks,
+		.max_str_length = buffer_size,
 		.min_str_length = shared->min_str_length,
 		.prefer_big_endian = shared->prefer_big_endian,
 		.check_ascii_freq = shared->check_ascii_freq,
@@ -312,8 +311,7 @@ RZ_API void rz_bin_string_search_opt_init(RZ_NONNULL RzBinStringSearchOpt *opt)
 	rz_return_if_fail(opt);
 	opt->max_threads = RZ_THREAD_N_CORES_ALL_AVAILABLE;
 	opt->min_length = RZ_BIN_STRING_SEARCH_MIN_STRING;
-	opt->buffer_size = RZ_BIN_STRING_SEARCH_BUFFER_SIZE;
-	opt->max_uni_blocks = RZ_BIN_STRING_SEARCH_MAX_UNI_BLOCKS;
+	opt->max_length = RZ_BIN_STRING_SEARCH_BUFFER_SIZE;
 	opt->max_region_size = RZ_BIN_STRING_SEARCH_MAX_REGION_SIZE;
 	opt->raw_alignment = RZ_BIN_STRING_SEARCH_RAW_FILE_ALIGNMENT;
 	opt->string_encoding = RZ_STRING_ENC_GUESS;
@@ -464,9 +462,8 @@ RZ_API RZ_OWN RzPVector /*<RzBinString *>*/ *rz_bin_file_strings(RZ_NONNULL RzBi
 		.lock = lock,
 		.bf = bf,
 		.strings_db = strings_db,
-		.buffer_size = opt->buffer_size,
+		.buffer_size = opt->max_length,
 		.string_encoding = opt->string_encoding,
-		.max_uni_blocks = opt->max_uni_blocks,
 		.min_str_length = opt->min_length,
 		.check_ascii_freq = opt->check_ascii_freq,
 		.prefer_big_endian = prefer_big_endian,
3 changes: 1 addition & 2 deletions librz/core/canalysis.c
@@ -75,8 +75,7 @@ static bool find_string_at(RzCore *core, RzBinObject *bobj, ut64 pointer, char *

 	RzStrEnc strenc = bin->str_search_cfg.string_encoding;
 	RzUtilStrScanOptions scan_opt = {
-		.buf_size = sizeof(buffer),
-		.max_uni_blocks = bin->str_search_cfg.max_uni_blocks,
+		.max_str_length = sizeof(buffer),
 		.min_str_length = bin->str_search_cfg.min_length,
 		.prefer_big_endian = core->analysis->big_endian,
 		.check_ascii_freq = bin->str_search_cfg.check_ascii_freq,
6 changes: 6 additions & 0 deletions librz/core/casm.c
@@ -279,23 +279,28 @@ RZ_API RzList /*<RzCoreAsmHit *>*/ *rz_core_asm_strsearch(RzCore *core, const ch
 	ut64 usrimm2 = inp_arg ? rz_num_math(core->num, inp_arg) : usrimm;
 	if (usrimm > usrimm2) {
 		RZ_LOG_ERROR("core: Invalid range [0x%08" PFMT64x ":0x%08" PFMT64x "]\n", usrimm, usrimm2);
+		free(inp);
 		return NULL;
 	}

 	if (core->blocksize < 8) {
 		RZ_LOG_ERROR("core: block size is too small\n");
+		free(inp);
 		return NULL;
 	}
 	if (!(buf = (ut8 *)calloc(core->blocksize, 1))) {
+		free(inp);
 		return NULL;
 	}
 	if (!(ptr = rz_str_dup(input))) {
 		free(buf);
+		free(inp);
 		return NULL;
 	}
 	if (!(hits = rz_core_asm_hit_list_new())) {
 		free(buf);
 		free(ptr);
+		free(inp);
 		return NULL;
 	}
 	tokens[0] = NULL;
@@ -481,6 +486,7 @@ RZ_API RzList /*<RzCoreAsmHit *>*/ *rz_core_asm_strsearch(RzCore *core, const ch
 	rz_cons_break_pop();
 	rz_asm_set_pc(core->rasm, toff);
 beach:
+	free(inp);
 	free(buf);
 	free(ptr);
 	free(code);
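The casm.c hunks add a `free(inp)` to each early return and to the shared `beach:` exit. For illustration only, here is a minimal sketch of the related single-exit cleanup pattern, where every failure path jumps to one label instead of repeating the frees. The function `count_byte` and its parameters are hypothetical and unrelated to the Rizin API. The pattern relies on `free(NULL)` being a no-op, so all owned pointers start as NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example (not Rizin code): copy the input, allocate a scratch
 * block, and count occurrences of `needle`. Returns -1 on any error. */
static int count_byte(const char *input, size_t block_size, char needle) {
	int result = -1; /* error unless we reach the success path */
	char *inp = NULL; /* owned copy of the input */
	unsigned char *buf = NULL; /* owned scratch block (unused beyond allocation here) */

	size_t len = strlen(input);
	inp = malloc(len + 1);
	if (!inp) {
		goto cleanup;
	}
	memcpy(inp, input, len + 1);

	if (block_size < 8) {
		/* Block too small: inp is still released at the cleanup label. */
		goto cleanup;
	}
	buf = calloc(block_size, 1);
	if (!buf) {
		goto cleanup;
	}

	result = 0;
	for (size_t i = 0; i < len; i++) {
		if (inp[i] == needle) {
			result++;
		}
	}

cleanup:
	/* free(NULL) is a no-op, so this is safe on every path. */
	free(buf);
	free(inp);
	return result;
}
```

Adding a new owned allocation then needs only one extra `free` at the label, rather than one per early return.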