Apparent bug with for loops #94

Open
micsthepick opened this issue Sep 11, 2024 · 2 comments

micsthepick commented Sep 11, 2024

I only know a little about how the fuzzing process works, and almost nothing about how atheris instruments the bytecode, but I've noticed the following discrepancy:

```python
import atheris

with atheris.instrument_imports():
    import sys

@atheris.instrument_func
def Fuzz(data: bytes):
    string = 'thisisalongstringtotestatherisandtomakesurethatithandlesaforloopcorrectly'

    if len(data) < 1:
        return

    fdp = atheris.FuzzedDataProvider(data)

    data_unicode = fdp.ConsumeUnicode(len(data))

    if len(data_unicode) <= 0 or data_unicode[0] != "t":
        return
    elif len(data_unicode) <= 1 or data_unicode[1] != "h":
        return
    # ... repetitive elif checks for indices 2 through 70 elided for brevity ...
    elif len(data_unicode) <= 71 or data_unicode[71] != "l":
        return
    elif len(data_unicode) <= 72 or data_unicode[72] != "y":
        return
    raise ValueError("BOOM!")


if __name__ == '__main__':
    atheris.Setup(sys.argv, Fuzz)
    atheris.Fuzz()
```

This works fine, but when I condense it into a for loop:

```python
import atheris

with atheris.instrument_imports():
    import sys


@atheris.instrument_func
def Fuzz(data: bytes):
    string = 'thisisalongstringtotestatherisandtomakesurethatithandlesaforloopcorrectly'

    if len(data) < 1:
        return

    fdp = atheris.FuzzedDataProvider(data)

    data_unicode = fdp.ConsumeUnicode(len(data))

    for i in range(len(string)):
        if len(data_unicode) <= i or data_unicode[i] != string[i]:
            break
    else:
        raise ValueError("BOOM!")


if __name__ == '__main__':
    atheris.Setup(sys.argv, Fuzz)
    atheris.Fuzz()
```

Expected behaviour:
Both examples take a comparable amount of time (allowing for the unrolled version probably being somewhat faster) and finish with the complete string as the crashing input.

Observed behaviour:
The for-loop version never finishes, and only ever gets a few characters right.

Further notes:
Because both are functionally equivalent, I wouldn't expect the for loop to take so much longer (at this stage it's looking like a heat death kind of slow).
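A quick pure-Python sanity check of that equivalence (no atheris involved) might look like the sketch below. TARGET, ALPHABET and the helper functions are hypothetical names, and the target is shortened to "this" so the hand-unrolled chain stays short.

```python
import random

# Hypothetical, shortened target so the unrolled chain stays readable.
TARGET = "this"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def unrolled_matches(data: str) -> bool:
    # Hand-unrolled if/elif chain, in the style of the first Fuzz() above.
    if len(data) <= 0 or data[0] != "t":
        return False
    elif len(data) <= 1 or data[1] != "h":
        return False
    elif len(data) <= 2 or data[2] != "i":
        return False
    elif len(data) <= 3 or data[3] != "s":
        return False
    return True


def rolled_matches(data: str) -> bool:
    # The same check as a for/else loop, in the style of the second Fuzz() above.
    for i in range(len(TARGET)):
        if len(data) <= i or data[i] != TARGET[i]:
            break
    else:
        return True  # the loop ran to completion without a mismatch
    return False


def random_input(rng: random.Random) -> str:
    # A random-length prefix of TARGET followed by a few random letters,
    # so that every branch of both versions gets exercised.
    prefix = TARGET[: rng.randint(0, len(TARGET))]
    tail = "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 3)))
    return prefix + tail


rng = random.Random(0)
samples = [random_input(rng) for _ in range(10_000)]
mismatches = [s for s in samples if unrolled_matches(s) != rolled_matches(s)]
print(f"{len(samples)} samples, {len(mismatches)} disagreements")  # prints "10000 samples, 0 disagreements"
```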

micsthepick (author) commented Sep 11, 2024

Running with -reduce_inputs=0 certainly helps the unrolled version run faster, but the for-loop version still doesn't finish, whether that option is on or off. Here is a log from the rolled-up for-loop version:

```
python ./atheris_test.py -reduce_inputs=0
INFO: Using built-in libfuzzer
WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2647317036
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2         INITED cov: 5 ft: 5 corp: 1/1b exec/s: 0 rss: 36Mb
#11        NEW    cov: 6 ft: 6 corp: 2/3b lim: 4 exec/s: 0 rss: 36Mb L: 2/2 MS: 4 InsertByte-ChangeBit-ChangeBit-ChangeByte-
#1521      NEW    cov: 6 ft: 8 corp: 3/5b lim: 17 exec/s: 0 rss: 36Mb L: 2/2 MS: 5 InsertByte-ShuffleBytes-ChangeByte-ChangeBinInt-ShuffleBytes-
#1551      NEW    cov: 6 ft: 9 corp: 4/8b lim: 17 exec/s: 0 rss: 36Mb L: 3/3 MS: 5 EraseBytes-CopyPart-ChangeBinInt-ChangeByte-InsertByte-
#17804     NEW    cov: 6 ft: 12 corp: 5/43b lim: 177 exec/s: 0 rss: 36Mb L: 35/35 MS: 3 ChangeBit-CrossOver-InsertRepeatedBytes-
#20742     NEW    cov: 6 ft: 15 corp: 6/79b lim: 205 exec/s: 0 rss: 36Mb L: 36/36 MS: 3 CrossOver-CrossOver-ChangeBit-
#524288    pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 262144 rss: 36Mb
#1048576   pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 262144 rss: 36Mb
#2097152   pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 233016 rss: 36Mb
#4194304   pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 209715 rss: 36Mb
#8388608   pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 204600 rss: 36Mb
#16777216  pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 202135 rss: 36Mb
#33554432  pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 199728 rss: 36Mb
#67108864  pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 197961 rss: 36Mb
#134217728 pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 200624 rss: 36Mb
#268435456 pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 209388 rss: 37Mb
#536870912 pulse  cov: 6 ft: 15 corp: 6/79b lim: 4096 exec/s: 212790 rss: 37Mb
```


tybug commented Sep 14, 2024

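This is an expected consequence of how ~all coverage-guided fuzzers decide which inputs to mutate 🙂. libFuzzer (the engine behind atheris) saves an input for further mutation (the NEW entries in your log) when it uncovers new coverage, such as a new line. Unrolling the loop turns each character check into its own line, so matching one more character reaches new code and the fuzzer can keep making incremental progress. With a rolled loop this can't happen: every iteration runs the same lines, so a longer match uncovers nothing new.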
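The mechanism can be illustrated with a toy, pure-Python model of a coverage-guided fuzzer. This is a sketch, not libFuzzer or atheris: the "features" are hand-written labels standing in for instrumented lines, the mutator only overwrites one character, and ALPHABET, TARGET, SEED, unrolled_features, rolled_features and fuzz are all hypothetical names. In the unrolled model each character check carries its own label, so every extra matched character counts as new coverage; in the rolled model it does not.

```python
import random

# Hypothetical setup for a toy model; none of this is libFuzzer or atheris.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
TARGET = "fuzzme"           # the "crash" string; note it contains no "a"
SEED = "a" * len(TARGET)    # the single starting corpus entry


def unrolled_features(data: str) -> frozenset:
    # Models the unrolled version: each check is tagged with its index,
    # as if every character comparison were a separate line of code.
    features = set()
    for i, expected in enumerate(TARGET):
        if len(data) <= i or data[i] != expected:
            features.add(("fail", i))
            return frozenset(features)
        features.add(("pass", i))
    features.add("boom")
    return frozenset(features)


def rolled_features(data: str) -> frozenset:
    # Models the for-loop version: every iteration runs the same lines,
    # so the features do not depend on how many characters matched.
    for i, expected in enumerate(TARGET):
        if len(data) <= i or data[i] != expected:
            return frozenset({"loop", "break"})
    return frozenset({"loop", "boom"})


def fuzz(features_of, iterations: int = 200_000, seed: int = 0):
    """Toy coverage-guided loop; returns (mutation count, crashing input) or (None, None)."""
    rng = random.Random(seed)
    corpus = [SEED]
    seen = set(features_of(SEED))
    for n in range(1, iterations + 1):
        parent = rng.choice(corpus)
        pos = rng.randrange(len(parent))
        # Mutation: overwrite one character with a random letter.
        child = parent[:pos] + rng.choice(ALPHABET) + parent[pos + 1:]
        features = features_of(child)
        if "boom" in features:
            return n, child
        if not features <= seen:
            # New coverage: keep the input for later mutation (like libFuzzer's "NEW").
            seen |= features
            corpus.append(child)
    return None, None


unrolled_n, unrolled_found = fuzz(unrolled_features)
rolled_n, rolled_found = fuzz(rolled_features)
print("unrolled:", unrolled_found, "after", unrolled_n, "mutations")
print("rolled:", rolled_found, "after", rolled_n, "mutations")
```

In this model the unrolled version reaches TARGET after a few thousand mutations in expectation, while the rolled version never does: its corpus never grows past SEED, and a single mutation of SEED cannot spell a six-letter word containing no "a". Real fuzzers have richer mutators (the CrossOver and InsertRepeatedBytes entries in the log, for example) and extra feedback, so the real rolled case is very slow rather than strictly impossible.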

I believe libFuzzer does account for high hit counts on loop iterations (ft, aka features, in your log), but only at coarse intervals, so the progress is minimal and tails off as the hit count needed to reach the next interval grows roughly exponentially.
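A minimal sketch of that coarse hit-count bucketing, assuming libFuzzer's counter-to-feature thresholds (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+) and assuming the loop body is tracked by a counter that is bumped once per iteration. THRESHOLDS, hit_count_bucket and new_bucket_points are hypothetical names, not atheris or libFuzzer APIs.

```python
# Lower bound of each hit-count bucket, modelled on libFuzzer's
# counter-to-feature mapping: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
THRESHOLDS = (1, 2, 3, 4, 8, 16, 32, 128)


def hit_count_bucket(count: int) -> int:
    """Bucket index for a raw hit count; 0 means the edge was never hit."""
    return sum(1 for lower in THRESHOLDS if count >= lower)


def new_bucket_points(max_hits: int) -> list:
    """Hit counts (up to max_hits) at which a counter first enters a new bucket."""
    points, last = [], 0
    for hits in range(1, max_hits + 1):
        bucket = hit_count_bucket(hits)
        if bucket != last:
            points.append(hits)
            last = bucket
    return points


# Assumption: the loop body's counter is bumped once per iteration, and the
# loop in the issue runs at most len(string) == 73 times.
print(new_bucket_points(73))  # prints [1, 2, 3, 4, 8, 16, 32]
```

Under those assumptions, once the loop body has run 32 times, matching further characters (up to the full 73) produces no new feature at all, so the fuzzer gets no signal for the rest of the string.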


2 participants