
add lint to avoid duplicate features #2250

Closed
mr-tz opened this issue Aug 2, 2024 · 13 comments · Fixed by #2573
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

@mr-tz (Collaborator) commented Aug 2, 2024:

e.g., duplicate API features such as those removed in mandiant/capa-rules#916

mr-tz added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Aug 2, 2024
@v1bh475u (Contributor) commented:

Hello @mr-tz, I want to work on this issue. Can you please assign it to me?
Also, I wanted to ask which Python libraries I cannot use in the lint (e.g., can I use collections, or is that not desired?).

@williballenthin (Collaborator) commented:

you can use anything in the standard library (that is, anything that doesn't require installation from PyPI), as well as anything we already rely upon (see pyproject.toml or requirements.txt). we can also add dependencies as necessary, though we'd prefer a good reason to do so, since they add a little risk and maintenance burden.

@williballenthin (Collaborator) commented:

also, go ahead and work on it!

@v1bh475u (Contributor) commented:

To what extent do we want to remove duplicate features? We can detect duplicate features under the same statement (and, or, not, some, etc.). But do we want it to analyze complex cases, e.g., simplifying the boolean expression a|(a&<some_other_expression>) into a, where a is some feature? Do we also want to add a lint for ors nested under one or, and something similar for the other such statements?
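(For reference, the complex case mentioned here is the boolean absorption law, a ∨ (a ∧ b) = a. A quick, purely illustrative check with sympy, which is not a capa dependency:)

```python
from sympy import symbols
from sympy.logic.boolalg import simplify_logic

# absorption law: a | (a & b) reduces to just a
a, b = symbols("a b")
print(simplify_logic(a | (a & b)))  # prints: a
```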

@williballenthin (Collaborator) commented:

let's start with the most trivial duplicates, such as the same value under AND or OR. that's what we fixed in the linked PR.
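(A minimal sketch of the trivial check described here, assuming the lint receives each statement's direct children as (line number, feature text) pairs; the function name and input shape are hypothetical, not capa's actual lint API:)

```python
from collections import defaultdict

def find_duplicate_features(children):
    """Map each feature to the line numbers where it appears under a
    single `and`/`or` statement, and report features seen more than once.

    `children` is a hypothetical (line number, feature text) list; the
    real lint would walk capa's parsed statement tree instead.
    """
    occurrences = defaultdict(list)
    for lineno, feature in children:
        occurrences[feature].append(lineno)
    return {feature: lines for feature, lines in occurrences.items() if len(lines) > 1}

# mirrors the duplicate import reported in the lint output below;
# the middle entry is made-up filler
children = [
    (17, "import: mscoree._cordllmain"),
    (18, "import: kernel32.LoadLibraryA"),
    (19, "import: mscoree._cordllmain"),
]
print(find_duplicate_features(children))  # {'import: mscoree._cordllmain': [17, 19]}
```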

@v1bh475u (Contributor) commented:

INFO     lint: collecting potentially referenced samples                                                                                                   lint.py:1034
⠹ linting rule: compiled to the .NET platform        0% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/1 rule • 0:00:00 < -:--:-- • 00.0 rule/s  
 compiled to the .NET platform
  FAIL: rule contains a duplicate feature under `or`/`and` statement: remove the duplicate features
        duplicate line: "      - import: mscoree._cordllmain" : line numbers: 17, 19


rules with FAIL:
  - compiled to the .NET platform

My current implementation will print in this format; is it fine? It takes care of duplicate features under a single statement (one of or, and, not, optional, and some) and can be extended to other statements.

@williballenthin (Collaborator) commented:

that's great!

@v1bh475u (Contributor) commented:

@williballenthin I have made the PR, but there are some issues with it. We cannot use the rule passed as an argument for detecting duplicate features: when the Rule class retrieves features from the YAML file, it simply ignores the redundancies. Hence, we have to rely on definition, which contains the entire unparsed YAML file as a string. What I have done is iterate through the lines, collect the features under the same statement, and, if there is any duplicate feature, record its line number in a list and add it to the recommendation for fixing. It works and has detected quite a few duplicate features in some rules. However, it fails for certain multi-line features. For example:

        - string: /dbghelp\.dll/i
          description: WindBG
        - string: /dbghelp\.dll/i
          description: WINE

This is because the feature spans multiple lines. So I wanted to ask what your suggestions are for proceeding with this issue.
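(One possible way around this, sketched under the assumption that we scan the raw definition string line by line: match only dashed `key: value` lines as features, skip description entries, both the indented property form above and `- description:` list items, and compare feature values only, so two features differing solely in description still collide. This is illustrative, not the PR's actual code:)

```python
import re
from collections import defaultdict

# matches dashed feature lines like `- string: /dbghelp\.dll/i` or
# `- number: 0x65 = e`; statement lines such as `- and:` carry no
# value after the colon and therefore do not match
FEATURE = re.compile(r"^\s*- (\w[\w ]*): *(.+?)\s*$")

def find_duplicates(definition: str) -> dict[str, list[int]]:
    occurrences: dict[str, list[int]] = defaultdict(list)
    for lineno, line in enumerate(definition.splitlines(), start=1):
        m = FEATURE.match(line)
        if not m:
            # comments and continuation lines like `description: WINE`
            # are skipped, so features differing only in description
            # still compare as duplicates
            continue
        key, value = m.groups()
        if key == "description":
            continue  # `- description: ...` list items are not features
        # naive normalization: drop a trailing ` = X` annotation so that
        # `number: 0x65 = e` and `number: 0x65` compare as equal
        value = value.split(" = ")[0]
        occurrences[f"{key}: {value}"].append(lineno)
    # note: for brevity this ignores statement scoping (indentation);
    # the real lint must group features per enclosing statement
    return {feat: lines for feat, lines in occurrences.items() if len(lines) > 1}

definition = """\
- string: /dbghelp\\.dll/i
  description: WinDbg
- string: /dbghelp\\.dll/i
  description: WINE
"""
print(find_duplicates(definition))  # {'string: /dbghelp\\.dll/i': [1, 3]}
```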

@v1bh475u (Contributor) commented:

In create-tcp-socket-via-raw-afd-driver.yml,

        # in the example code, in debug mode, the array is constructed bytewise on the stack
        - basic block:
          - and:
            - description: bExtendedAttributes for IPv4 TCP on stack, bytewise
            # i've kept the values approximately in order while removing some duplicates for clarity
            - number: 0x00
            - number: 0x0F
            - number: 0x1E
            - number: 0x41 = A
            - number: 0x66 = f
            - number: 0x64 = d
            - number: 0x4F = O
            - number: 0x70 = p
            - number: 0x65 = e
            - number: 0x6E = n
            - number: 0x50 = P
            - number: 0x61 = a
            - number: 0x63 = c
            - number: 0x6B = k
            - number: 0x65 = e
            - number: 0x74 = t
            - number: 0x58 = X
            - number: 0x02
            - number: 0x01
            - number: 0x06
            - number: 0x00
            - number: 0x60
            - number: 0xEF
            - number: 0x3D
            - number: 0x47
            - number: 0xFE

Are `- number: 0x65 = e` and `- number: 0x00` to be considered redundant?

@mr-tz (Collaborator, Author) commented Jan 25, 2025:

Yes

@v1bh475u (Contributor) commented:

OK, then. My updated lint has found some duplicate features in the current set of rules, and because of that my commits are unable to pass the rule_linter check. So, shall I make a PR on capa-rules with the duplicate features removed?

@mr-tz (Collaborator, Author) commented Jan 25, 2025:

Yes, that would be great 👍🏼

@v1bh475u (Contributor) commented:

I have marked my PR for this issue as ready for review and also opened a PR in capa-rules that removes the duplicate features. Please review them and suggest changes if needed.
