Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Flag nested .semgrepignore files that aren't used #1

Draft
wants to merge 1 commit into
base: feature/semgrep
Choose a base branch
from

Conversation

huonw
Copy link
Owner

@huonw huonw commented May 5, 2023

Broken out of pantsbuild#18593

huonw pushed a commit that referenced this pull request May 11, 2023
The Pants native client which was introduced in pantsbuild#11922 has so far only
been usable in the Pants repo when the `USE_NATIVE_PANTS` environment
variable was set.

To make the native client available to end users, this change begins
distributing the native-client binary in Pants wheels. A corresponding
change in the `scie-pants` repo
(pantsbuild/scie-pants#172) uses the native
client to run `pants`.

To reduce the surface area of `scie-pants` (rather than having it be
responsible for handling the special `75` exit code similar to the
`pants` script integration), this PR also moves to having the
native-client execute its own fallback (via `execv`) to the Pants
entrypoint. In future, the `pantsbuild/pants` `pants` script could also
use that facility (e.g. by specifying a separate `pants_server` bash
script as the entrypoint to use as the `_PANTS_SERVER_EXE`).

----

As originally demonstrated on pantsbuild#11831, the native client is still much
faster than the legacy client. Using
pantsbuild/scie-pants#172, the timings look
like:
```
Benchmark #1: PANTS_NO_NATIVE_CLIENT=true PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help
  Time (mean ± σ):      1.161 s ±  0.067 s    [User: 830.6 ms, System: 79.2 ms]
  Range (min … max):    1.054 s …  1.309 s    10 runs

Benchmark #2: PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help
  Time (mean ± σ):     271.0 ms ±  30.6 ms    [User: 8.9 ms, System: 6.9 ms]
  Range (min … max):   241.5 ms … 360.6 ms    12 runs

Summary
  'PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help' ran
    4.29 ± 0.54 times faster than 'PANTS_NO_NATIVE_CLIENT=true PANTS_SHA=836cceb74e6af042e7c82677f3ceb4927efce20e scie-pants-macos-x86_64 help'
```

Fixes pantsbuild#11831.
huonw pushed a commit that referenced this pull request Jun 5, 2023
pantsbuild#19243)

…9047)"

This reverts commit 0aedb6b.

contains breaking change:
```
Error: 3.20 [ERROR] Failed to load the pants.backend.experimental.javascript.register backend: ImportError("cannot import name 'UserChosenNodeJSResolveAliases' from 'pants.backend.javascript.subsystems.nodejs' (/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/javascript/subsystems/nodejs.py)")
Traceback (most recent call last):
  File "/home/gha/actions-runner1/_work/pants/pants/src/python/pants/init/extension_loader.py", line 141, in load_backend
    module = importlib.import_module(backend_module)
  File "/home/gha/.pyenv/versions/3.9.13/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/experimental/javascript/register.py", line 9, in <module>
    from pants.backend.javascript.goals import lockfile, tailor, test
  File "/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/javascript/goals/lockfile.py", line 9, in <module>
    from pants.backend.javascript import nodejs_project_environment
  File "/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/javascript/nodejs_project_environment.py", line 9, in <module>
    from pants.backend.javascript import package_json, resolve
  File "/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/javascript/resolve.py", line 9, in <module>
    from pants.backend.javascript import nodejs_project
  File "/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/javascript/nodejs_project.py", line 19, in <module>
    from pants.backend.javascript.subsystems.nodejs import NodeJS, UserChosenNodeJSResolveAliases
ImportError: cannot import name 'UserChosenNodeJSResolveAliases' from 'pants.backend.javascript.subsystems.nodejs' (/home/gha/actions-runner1/_work/pants/pants/src/python/pants/backend/javascript/subsystems/nodejs.py)
```
huonw added a commit to pantsbuild/pants that referenced this pull request Jul 23, 2024
…21126)

Three things:

Allow configuring the config location; because of how the digest is 
built, running `semgrep` via pants doesn't actually  allow you to 
configure this by passing `--config`, like the tool does. There is now
a `config_name` option which will be auto-discovered and hierarchically
applied to each relevant partition. This will potentially evolve into
a `config_paths` option which will allow users to specify a list of
globs.

`semgrep` searches for rules recursively within a rules directory, so 
reflect that in the globs. This isn't obvious in the documentation, but
it's the way it behaves, and also is explicit in the code.

https://github.com/semgrep/semgrep/blob/45699ace83dd61f19d6ea1e7452f32a0c1996735/cli/src/semgrep/config_resolver.py#L402


Finally, I flattened the glob for `.semgrepignore`. Currently, the glob
recursively matches all matching file names, but ultimately `semgrep` 
just reads only what's in the current dir. In my testing of
pants+semgrep
I did indeed notice the same behavior, that the `.semgrepignore`s in
lower
dirs were not being respected. An integration test confirmed the
same.[1]
Supporting multi-level `.semgrepignores` is going to be
a significant effort, since it would involve actually parsing and
interpreting
the config files, and also navigating the pitfalls of managing overlays
of
ignore patterns. Instead, I chose to force it to being a singular file 
path.

Here're some historical discussions about .semgrepignore:
huonw#1,
#18593 (comment)

https://github.com/semgrep/semgrep/blob/45699ace83dd61f19d6ea1e7452f32a0c1996735/src/targeting/Semgrepignore.ml#L124
https://semgrep.dev/docs/ignoring-files-folders-code#reference-summary

[1] The integration test which proves hierarchical semgrepignore doesn't
work:
```py
def test_semgrepignore_not_partitioned(rule_runner: RuleRunner) -> None:
    rule_runner.write_files({
        f"{DIR}/BUILD": "",
        f"{DIR}/.semgrep.yml": RULES,
        f"{DIR}/.semgrepignore": FILE,
        
        f"{DIR}/a/.semgrepignore": FILE,
        f"{DIR}/a/BUILD": SINGLE_FILE_BUILD,
        f"{DIR}/a/{FILE}": BAD_FILE,
    })
    tgt = rule_runner.get_target(Address(f"{DIR}/a", target_name="f"))

    # run with these two semgrepignores in place
    results = run_semgrep(rule_runner, [tgt])
    assert len(results) == 1
    result = results[0]
    # doesn't get ignored
    assert "Ran 1 rule on 1 file: 1 finding" in result.stderr

    # run with a root semgrepignore
    rule_runner.write_files({
        ".semgrepignore": FILE,
    })
    results = run_semgrep(rule_runner, [tgt])
    assert len(results) == 1
    result = results[0]
    # finally, it's ignored
    assert "Ran 1 rule on 0 files: 0 findings" in result.stderr
```

---------

Co-authored-by: Huon Wilson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant