Refuse to map procfs files on Linux (VirusTotal#1848) #4

Merged
merged 38 commits into from
Jan 2, 2023

Conversation

fengjixuchui
Owner

No description provided.

plusvic and others added 30 commits September 6, 2022 11:23
Fix infinite recursion when parsing a malformed binary, discovered by
clusterfuzz.

Fixes #1793.
* Implement text string sets.

Add support for text string sets into the grammar. They look like this:

for any s in ("a", "b"): (pe.imphash() == s)

This requires changing integer_set and integer_enumeration to just be set and
enumeration, and adding a new type (YR_ENUMERATION) that tracks the type of
enumeration (integer or otherwise) and the number of items in the enumeration.

The compiler now checks that every item in the enumeration has the same type
and raises a compiler error if the types are inconsistent. For example, this is
an error:

for any s in ("a", 0): (s)

Also, fix the build when using --with-debug-verbose option as it was missing the
assert.h include.

* Add docs and adjust constants layout.
* Add OP_OF_FOUND_AT.

Add support for "any of them at 0" constructs to the language. This allows users
to avoid long "or" chains like "$a at 0 or $b at 0" and is also a nicer way
to write "for any of them: ($ at 0)".
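
The equivalence described above can be sketched in C. This is only an illustration of the semantics, not the OP_OF_FOUND_AT implementation: `string_matches_t`, `found_at` and `any_of_at` are hypothetical names, and match offsets are assumed to be available as plain arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the offsets at which one string matched. */
typedef struct
{
  const int64_t* offsets;
  size_t count;
} string_matches_t;

/* True if this string has a match starting exactly at `offset`. */
static bool found_at(const string_matches_t* s, int64_t offset)
{
  for (size_t i = 0; i < s->count; i++)
    if (s->offsets[i] == offset)
      return true;

  return false;
}

/* "any of them at X": the same as "$a at X or $b at X or ...". */
bool any_of_at(const string_matches_t* set, size_t n, int64_t offset)
{
  assert(set != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    if (found_at(&set[i], offset))
      return true;

  return false;
}
```

With `$a` matching at offsets 4 and 10 and `$b` matching at offset 0, `any_of_at(set, 2, 0)` is true because `$b` starts at 0, while `any_of_at(set, 2, 7)` is false.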

* Add short blurb to docs about new string set at offset.
* Print module names.

Add a -M option to the cli which will print the module names to stdout. As the
number of modules has grown it can become confusing to know what modules are
available, so this option will display them on stdout.

NOTE: I'm not sure I like how I implemented this. It seems hackish to expose an
API that just prints things on stdout but it felt even more hackish to walk the
yr_modules_table array from outside of the module code. ;)

Fixes #737.

* Expose access to modules in the API.

This adds a yr_modules_get_table to the libyara API so callers can get access to
the modules. This is currently only used for printing the available module names
in the yara cli. With this change it is now possible to expose this information
in yara-python too.
* Add warnings for edge cases.

When using "all of them at 0" it is now a warning if you have more than 1 string
defined in "them".

"N of them" (where N > 1) is also a warning.

* Use IS_UNDEFINED when checking integer expression.
Add the missing trailing backtick that was missed in the original commit for
these docs.
…cking (#1833)

According to Windows documentation, deletion/renaming of the file will only occur when
all the handles opened before the `DeleteFile(A|W)` call are closed, so yara won't have
any problems with reading the file and doing its work because of this.

On the other hand, this makes the behavior much more intuitive for users who neither know
nor care that some software is running Yara on the file they are targeting: the deletion will
take a second or two instead of being instant, which is much better than being denied
deletion repeatedly.

This problem notably surfaces when antivirus software using Yara is installed on dev machines:
compiling sometimes creates and deletes lots of small executables, which are picked up and then
checked by Yara. Currently this causes lots of errors because the files cannot be deleted.
* Fix `pe_rva_to_offset`

* Fix: checking if RVA is inside section
* Fix: real pointer to raw data is aligned down to sector size
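
A minimal sketch of the two fixes listed above, under stated assumptions: `section_t`, `rva_to_offset` and `SECTOR_SIZE` are hypothetical names rather than libyara's own, the sector size is assumed to be 512 bytes (0x200), and only the virtual size is used for the bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified section record; not libyara's structure. */
typedef struct
{
  uint32_t virtual_address;      /* RVA where the section starts */
  uint32_t virtual_size;         /* size of the section in memory */
  uint32_t pointer_to_raw_data;  /* file offset as written in the header */
} section_t;

#define SECTOR_SIZE 0x200u /* assumed 512-byte sector */

/* Returns the file offset for `rva`, or -1 if no section contains it. */
int64_t rva_to_offset(const section_t* sections, size_t count, uint32_t rva)
{
  assert(sections != NULL || count == 0);

  for (size_t i = 0; i < count; i++)
  {
    const section_t* s = &sections[i];

    /* First fix: the RVA must actually fall inside the section. */
    if (rva >= s->virtual_address &&
        rva - s->virtual_address < s->virtual_size)
    {
      /* Second fix: round the raw data pointer down to a sector boundary. */
      uint32_t raw = s->pointer_to_raw_data & ~(SECTOR_SIZE - 1);
      return (int64_t) raw + (rva - s->virtual_address);
    }
  }

  return -1;
}
```

For a section starting at RVA 0x1000 with size 0x500 and a raw pointer of 0x3FF, the pointer is rounded down to 0x200, so RVA 0x1010 maps to offset 0x210 and RVA 0x1500 is rejected.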

* Add pe.import_rva() functions.

Add pe.import_rva("foo.dll", "func1") which returns the RVA of the imported
function. Also add pe.import_rva("foo.dll", 1) which does the same but the
import is done by ordinal.

* Implement delayed import RVA and add docs.

* Add math.length()

Add a math.length() which will return the length of the sequence of bytes,
including any NULL bytes.

Fixes #1778.

* Add string module.

Move the math.to_int() functions and math.length() over to the new string
module. I decided to move the to_int() because it seems logical to convert from
a string to an integer using a string module rather than the math module.

You still use math.to_string() to convert an integer to a string, and use
string.to_int() to convert a string to an integer.

Add tests and docs for string module. Move the appropriate tests from the math
module over to the new string tests.

While here, also add the console module to the bazel build as it was apparently
missing (this is untested).

* Remove unused include.

* Add tests for #1561

* Fix copyright year in string module.

Co-authored-by: Peter Babka <[email protected]>
* Add tests/test-string.c and update .gitignore

* Add missing test file to BUILD.bazel
Benefits
- Changes that result in a rebuild of libyara will also cause test
  programs to be rebuilt
- Avoid repeating flag definitions
As of file 5.44, some PE-related strings and MIME types have been
updated, causing the test to fail.

See [Debian bug#1027031](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1027031)
As of OpenSSL 3.0, {MD5,SHA1,SHA256}_{Init,Update,Final} are
deprecated. Stop using them.
wxsBSD and others added 8 commits December 29, 2022 11:59
This is now illegal syntax:

for 3.14159 in (1): (1)

It doesn't make sense to allow floating point in the primary expression used in
for expressions. This is an extension of the work done in a5903e1 which banned
negative integers, strings and regexp from being used here too.
* fix error handling in string.to_int()

Error handling in the to_int method of the new string module was not
properly done.

- errno was not reset prior to the call to strtoll, so YR_UNDEFINED
  could be returned if a prior call failed, but not the current one.
- overflow/underflow was not properly checked, as strtoll returns
  LLONG_MIN or LLONG_MAX in that case and not 0.

Instead, do what is recommended in the man page for strtol:
- set errno to 0 prior to the call
- check errno after the call
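
The pattern recommended by the man page looks roughly like this. It is a standalone sketch, not the code in the string module; `parse_ll` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse `text` and report failure through errno only. */
bool parse_ll(const char* text, int base, long long* out)
{
  assert(text != NULL && out != NULL);

  /* Reset errno so a failure left over from an earlier call is not
     mistaken for a failure of this one. */
  errno = 0;

  long long value = strtoll(text, NULL, base);

  /* On overflow or underflow strtoll returns LLONG_MAX or LLONG_MIN and
     sets errno to ERANGE, so the return value alone is not enough. */
  if (errno != 0)
    return false;

  *out = value;
  return true;
}
```

Note that this check alone still accepts input such as "foo" (parsed as 0) or "10p" (parsed as 10), because strtoll does not treat either as an errno-level error.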

* Flesh out invalid cases for string.to_int

Ensure the behavior of string.to_int is the least surprising and
consistent across all OSes:

- return an error when the string is not entirely parsed (so "10p"
  fails to parse).
- return an error when the string has no digit whatsoever (avoiding
  parsing "foo" as 0).
- remove the evaluation differences on different libc implementations.

Fixes #1843

* validate value of base in string.to_int(string, base)

There seem to be implementation differences in how invalid base values
are handled, especially with cygwin. So validate that the base has a
valid value first.
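
A strict parser along the lines described above might look like the following. This is a sketch under assumptions: `strict_to_int` is a hypothetical name, and the accepted bases (0, or 2 through 36) are simply the ones the C standard defines for strtoll, which may not be exactly the set the module allows.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: succeed only if the whole string is a valid number. */
bool strict_to_int(const char* text, int base, long long* out)
{
  assert(text != NULL && out != NULL);

  /* Reject bases strtoll does not define, instead of relying on how a
     particular libc reacts to them. */
  if (base != 0 && (base < 2 || base > 36))
    return false;

  char* end = NULL;
  errno = 0;
  long long value = strtoll(text, &end, base);

  if (errno != 0)    /* overflow, underflow or another libc-reported error */
    return false;

  if (end == text)   /* no digits were consumed, e.g. "foo" */
    return false;

  if (*end != '\0')  /* trailing characters remain, e.g. "10p" */
    return false;

  *out = value;
  return true;
}
```

With these checks "10" and "ff" (base 16) parse, while "10p", "foo" and any call with base 1 fail. strtoll still skips leading whitespace, so " 10" is accepted by this sketch.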
Also restore the comment that was lost with #1811.
Compilers and code analysis tools like valgrind raise warnings when `break` statements are missing at the end of each case in a switch statement. Even though the code was correct, let's write it in a way that doesn't raise the alarms.
* Make -X print the plaintext.

Make the -X argument print the plaintext along with the xor key.

Output looks like this:

```
wxs@mbp yara % cat rules/xor.yara
rule a {
  strings:
    $a = "This program cannot"
    $b = "This program cannot" xor(1-255)
  condition:
    any of them
}
wxs@mbp yara % ./yara -s -X rules/xor.yara tests/data/xorwideandascii.out | head -5
a tests/data/xorwideandascii.out
0x4:$a:xor(0x00,This program cannot): This program cannot
0x1c:$b:xor(0x01,This program cannot): Uihr!qsnfs`l!b`oonu
0x34:$b:xor(0x02,This program cannot): Vjkq"rpmepco"acllmv
0x4c:$b:xor(0x03,This program cannot): Wkjp#sqldqbn#`bmmlw
wxs@mbp yara % ./yara -s -X rules/xor.yara tests/data/xorwideandascii.out | tail -5
0x1878:$b:xor(0xfb,This program cannot): \xAF\x93\x92\x88\xDB\x8B\x89\x94\x9C\x89\x9A\x96\xDB\x98\x9A\x95\x95\x94\x8F
0x1891:$b:xor(0xfc,This program cannot): \xA8\x94\x95\x8F\xDC\x8C\x8E\x93\x9B\x8E\x9D\x91\xDC\x9F\x9D\x92\x92\x93\x88
0x18aa:$b:xor(0xfd,This program cannot): \xA9\x95\x94\x8E\xDD\x8D\x8F\x92\x9A\x8F\x9C\x90\xDD\x9E\x9C\x93\x93\x92\x89
0x18c3:$b:xor(0xfe,This program cannot): \xAA\x96\x97\x8D\xDE\x8E\x8C\x91\x99\x8C\x9F\x93\xDE\x9D\x9F\x90\x90\x91\x8A
0x18dc:$b:xor(0xff,This program cannot): \xAB\x97\x96\x8C\xDF\x8F\x8D\x90\x98\x8D\x9E\x92\xDF\x9C\x9E\x91\x91\x90\x8B
wxs@mbp yara %
```
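
The relationship between the plaintext, the key and the printed match in the output above can be reproduced with a few lines of C. This is an illustration only, not the code behind -X: `xor_render` is a hypothetical helper, and the escaping rule (printable ASCII as-is, everything else as `\xNN`) is inferred from the sample output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* XOR `len` bytes of `plain` with `key` and render the result into `out`.
   Returns the number of characters written (excluding the NUL), or -1 if
   `out` is too small. */
int xor_render(
    const char* plain, size_t len, uint8_t key, char* out, size_t out_size)
{
  assert(plain != NULL && out != NULL);

  if (out_size == 0)
    return -1;

  size_t pos = 0;

  for (size_t i = 0; i < len; i++)
  {
    unsigned char c = (unsigned char) ((unsigned char) plain[i] ^ key);
    size_t need = (c >= 0x20 && c < 0x7F) ? 1 : 4;

    /* Keep room for this byte plus the terminating NUL. */
    if (pos + need + 1 > out_size)
      return -1;

    if (need == 1)
      out[pos] = (char) c;
    else
      snprintf(out + pos, 5, "\\x%02X", (unsigned int) c);

    pos += need;
  }

  out[pos] = '\0';
  return (int) pos;
}

/* Convenience wrapper for NUL-terminated plaintext. */
int xor_render_cstr(const char* plain, uint8_t key, char* out, size_t out_size)
{
  return xor_render(plain, strlen(plain), key, out, out_size);
}
```

Rendering "This program cannot" with key 0x01 gives `Uihr!qsnfs`l!b`oonu`, matching the second line of the output, and the first byte with key 0xfb renders as `\xAF`. The length is passed explicitly in `xor_render` so that wide strings containing NUL bytes can be rendered too.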

* Update argument description.

* Fix #1851.

This is actually a combination of two bugs. The first bug is that if a string
fits in an atom we never calculated the xor key. The second bug is that if you
specify the xor modifier we were NOT setting the ascii flag on the string, so we
could not calculate the xor key properly once we fixed the first bug.

Tested with:

wxs@mbp yara % cat rules/xor.yara
rule a {
  strings:
    $c = "This" xor
  condition:
    any of them
}
wxs@mbp yara % ./yara -s -X rules/xor.yara tests/data/xorwideandascii.out | head -5
a tests/data/xorwideandascii.out
0x4:$c:xor(0x00,This): This
0x1c:$c:xor(0x01,This): Uihr
0x34:$c:xor(0x02,This): Vjkq
0x4c:$c:xor(0x03,This): Wkjp
wxs@mbp yara %

An interesting side effect of this bug is that you could also trigger it with a
"xor wide" combination of modifiers if the string was short enough to fit in an
atom, which is now also fixed.

wxs@mbp yara % cat rules/xor.yara
rule a {
  strings:
    $c = "Th" xor wide
  condition:
    any of them
}
wxs@mbp yara % ./yara -s -X rules/xor.yara tests/data/xorwideandascii.out | head -5
a tests/data/xorwideandascii.out
0x18f4:$c:xor(0x00,T\x00h\x00): T\x00h\x00
0x191f:$c:xor(0x01,T\x00h\x00): U\x01i\x01
0x194a:$c:xor(0x02,T\x00h\x00): V\x02j\x02
0x1975:$c:xor(0x03,T\x00h\x00): W\x03k\x03
wxs@mbp yara %

Noticed by @melomac, who provided a nice writeup to help point out that this
only happens on "short" strings (which led me to realize it was only on strings
that fit in an atom).

* Add test case for #1851.
It makes no sense to try to mmap files for which the filesystem lies
about the size (0).

Close #1838
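
One way to recognise such files before mapping them is to ask the kernel which filesystem the descriptor lives on. The sketch below assumes that approach (fstatfs plus PROC_SUPER_MAGIC); the commit message itself only states that procfs files report a size of 0, and `path_is_on_procfs` is a hypothetical helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__linux__)
#include <linux/magic.h> /* PROC_SUPER_MAGIC */
#include <sys/vfs.h>     /* fstatfs */
#endif

/* Returns 1 if `path` lives on procfs, 0 if it does not, and -1 if the
   file cannot be opened or queried. On non-Linux systems it never
   returns 1. */
int path_is_on_procfs(const char* path)
{
  assert(path != NULL);

  int fd = open(path, O_RDONLY);

  if (fd < 0)
    return -1;

  int result = 0;

#if defined(__linux__)
  struct statfs fs;

  if (fstatfs(fd, &fs) != 0)
    result = -1;
  else if (fs.f_type == PROC_SUPER_MAGIC)
    result = 1; /* size is reported as 0, so mmap makes no sense here */
#endif

  close(fd);
  return result;
}
```

A caller would check this before mmap and fail early, for example `path_is_on_procfs("/proc/self/status")` returns 1 on a Linux system with /proc mounted.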
@fengjixuchui fengjixuchui merged commit dc44f75 into fengjixuchui:master Jan 2, 2023