Apparent bug with for loops #94

micsthepick · 2024-09-11T08:48:28Z

I only know a little about how the fuzzing process works, and almost nothing about how atheris instruments bytecode, but I've noticed the following discrepancy:

import atheris

with atheris.instrument_imports():
    import sys

@atheris.instrument_func
def Fuzz(data: bytes):
    string = 'thisisalongstringtotestatherisandtomakesurethatithandlesaforloopcorrectly'

    if len(data) < 1:
        return

    fdp = atheris.FuzzedDataProvider(data)

    data_unicode = fdp.ConsumeUnicode(len(data))

    if len(data_unicode) <= 0 or data_unicode[0] != "t":
        return
    elif len(data_unicode) <= 1 or data_unicode[1] != "h":
        return
    # ... <repetitive source code elided for brevity> ...
    elif len(data_unicode) <= 71 or data_unicode[71] != "l":
        return
    elif len(data_unicode) <= 72 or data_unicode[72] != "y":
        return
    raise ValueError("BOOM!")


if __name__ == '__main__':
    atheris.Setup(sys.argv, Fuzz)
    atheris.Fuzz()

works fine, but when I condense it into a for loop:

import atheris

with atheris.instrument_imports():
    import sys


@atheris.instrument_func
def Fuzz(data: bytes):
    string = 'thisisalongstringtotestatherisandtomakesurethatithandlesaforloopcorrectly'

    if len(data) < 1:
        return

    fdp = atheris.FuzzedDataProvider(data)

    data_unicode = fdp.ConsumeUnicode(len(data))

    for i in range(len(string)):
        if len(data_unicode) <= i or data_unicode[i] != string[i]:
            break
    else:
        raise ValueError("BOOM!")


if __name__ == '__main__':
    atheris.Setup(sys.argv, Fuzz)
    atheris.Fuzz()

Expected behaviour:
Both examples take a comparable amount of time (allowing for the unrolled version probably being faster) and finish with the complete string as the crashing input.

Observed behaviour:
The for loop version doesn't finish, and only ever gets a few characters right.

Further notes:
Because the two versions are functionally equivalent, I wouldn't expect the for loop to take so much longer (at this stage it's looking heat-death-of-the-universe slow).

micsthepick · 2024-09-11T09:26:19Z

Running with -reduce_inputs=0 certainly makes the unrolled version run faster, but the second (for loop) version still doesn't finish, with or without that flag. Here is a log from the rolled-up for loop version:

python ./atheris_test.py -reduce_inputs=0
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2647317036
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED cov: 5 ft: 5 corp: 1/1b exec/s: 0 rss: 36Mb
#11     NEW    cov: 6 ft: 6 corp: 2/3b lim: 4 exec/s: 0 rss: 36Mb L: 2/2 MS: 4 InsertByte-ChangeBit-ChangeBit-ChangeByte-
#1521   NEW    cov: 6 ft: 8 corp: 3/5b lim: 17 exec/s: 0 rss: 36Mb L: 2/2 MS: 5 InsertByte-ShuffleBytes-ChangeByte-ChangeBinInt-ShuffleBytes-
#1551   NEW    cov: 6 ft: 9 corp: 4/8b lim: 17 exec/s: 0 rss: 36Mb L: 3/3 MS: 5 EraseBytes-CopyPart-ChangeBinInt-ChangeByte-InsertByte-
#17804  NEW    cov: 6 ft: 12 corp: 5/43b lim: 177 exec/s: 0 rss: 36Mb L: 35/35 MS: 3 ChangeBit-CrossOver-InsertRepeatedBytes-
#20742  NEW    cov: 6 ft: 15 corp: 6/79b lim: 205 exec/s: 0 rss: 36Mb L: 36/36 MS: 3 CrossOver-CrossOver-ChangeBit-
#524288 pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 262144 rss: 36Mb
#1048576        pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 262144 rss: 36Mb
#2097152        pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 233016 rss: 36Mb
#4194304        pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 209715 rss: 36Mb
#8388608        pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 204600 rss: 36Mb
#16777216       pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 202135 rss: 36Mb
#33554432       pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 199728 rss: 36Mb
#67108864       pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 197961 rss: 36Mb
#134217728      pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 200624 rss: 36Mb
#268435456      pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 209388 rss: 37Mb
#536870912      pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 212790 rss: 37Mb

tybug · 2024-09-14T05:17:12Z

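This is an expected consequence of how nearly all coverage-guided fuzzers decide which inputs to keep and mutate 🙂. libFuzzer (the engine behind atheris) saves an input for further mutation (the NEW lines in your log) when it reaches new coverage, roughly a line or branch that no earlier input executed. In the unrolled version each character check is a separate line, so every additional correct character reaches new code and the fuzzer can keep making incremental progress. In a rolled loop every iteration runs the same lines, so matching one more character produces no new coverage and the fuzzer has no reason to keep that input.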
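To make this concrete, here is a toy Python simulation (not atheris or libFuzzer code). It models coverage as a set of hypothetical "line ids", keeps a mutated input only when it reaches a line no earlier input reached, and compares the two shapes of the check. `TARGET`, `ALPHABET`, the seed input and the single-character mutator are illustrative assumptions; real libFuzzer uses many more mutation strategies and richer feedback.

```python
import random

TARGET = "atheris"  # hypothetical short target so the toy run finishes quickly
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def unrolled_coverage(data: str) -> frozenset:
    """Like the unrolled if/elif chain: one distinct line per character position."""
    lines = set()
    for i, expected in enumerate(TARGET):
        lines.add(("check", i))  # each position has its own line id
        if len(data) <= i or data[i] != expected:
            return frozenset(lines)
    lines.add(("boom",))
    return frozenset(lines)


def rolled_coverage(data: str) -> frozenset:
    """Like the for loop: every iteration executes the same line ids."""
    lines = set()
    for i, expected in enumerate(TARGET):
        lines.add(("loop_check",))  # same line id on every iteration
        if len(data) <= i or data[i] != expected:
            lines.add(("break",))
            return frozenset(lines)
    lines.add(("boom",))
    return frozenset(lines)


def toy_fuzz(coverage_fn, max_steps: int, seed: int = 0):
    """Return the step at which TARGET was hit, or None if the budget ran out."""
    rng = random.Random(seed)
    corpus = ["z" * len(TARGET)]  # seed input that matches no character of TARGET
    seen = set(coverage_fn(corpus[0]))
    for step in range(1, max_steps + 1):
        parent = rng.choice(corpus)
        pos = rng.randrange(len(parent))
        child = parent[:pos] + rng.choice(ALPHABET) + parent[pos + 1:]
        cov = coverage_fn(child)
        if ("boom",) in cov:
            return step  # "crash" found
        if not cov <= seen:  # new coverage: save it, like libFuzzer's NEW
            seen |= cov
            corpus.append(child)
    return None  # budget exhausted


if __name__ == "__main__":
    print("unrolled:", toy_fuzz(unrolled_coverage, 100_000))
    print("rolled:  ", toy_fuzz(rolled_coverage, 100_000))
```

In this model the unrolled version is expected to find `TARGET` well within the budget, because each newly matched character adds a fresh line id and the input is saved. The rolled version always returns `None`: every non-matching input produces the identical coverage set, so nothing beyond the seed is ever saved, and a single mutation of the seed can never spell out the whole target.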

I believe libFuzzer does take high hit counts on loop iterations into account (the ft, i.e. features, count in your log), but only at widening intervals, so progress is minimal and tails off as the required hit count increases roughly exponentially.
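As far as I know, libFuzzer folds each edge's execution count into eight buckets (1, 2, 3, 4–7, 8–15, 16–31, 32–127, 128+), and a count only yields a new feature when it lands in a bucket not seen before for that edge. The sketch below is a Python re-statement of that bucketing, not library code; treating "loop-body hit count" as equal to "iterations executed" is a simplifying assumption, and the helper names are hypothetical.

```python
# Lower bound of each hit-count bucket (assumed to mirror libFuzzer's bucketing).
BUCKET_FLOORS = [1, 2, 3, 4, 8, 16, 32, 128]


def hit_count_bucket(count: int) -> int:
    """Map an edge's hit count (>= 1) to a bucket index 0..7."""
    if count < 1:
        raise ValueError("only edges executed at least once produce a feature")
    bucket = 0
    for index, floor in enumerate(BUCKET_FLOORS):
        if count >= floor:
            bucket = index
    return bucket


def counts_that_add_a_feature(max_iterations: int) -> list:
    """Hit counts (1..max_iterations) at which a previously unseen bucket first appears."""
    seen = set()
    new_feature_counts = []
    for n in range(1, max_iterations + 1):
        bucket = hit_count_bucket(n)
        if bucket not in seen:
            seen.add(bucket)
            new_feature_counts.append(n)
    return new_feature_counts


if __name__ == "__main__":
    # The target string in this issue is 73 characters long.
    print(counts_that_add_a_feature(73))
```

For a loop of up to 73 iterations this model registers new features only at counts 1, 2, 3, 4, 8, 16 and 32. Between those points the fuzzer must guess several more correct characters at once with no intermediate reward, which is why the progress stalls.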
