Make tartufo multi-threaded #325
Comments
I think this would be a great feature. I ran tartufo on a local repo with 4k commits and it took about 112 seconds. I don't necessarily think the
I am interested in the development of this feature, a
Testing
I wrote a small script to determine the legitimacy of these claims.

import time
import tracemalloc
from multiprocessing import Process
from threading import Thread
from typing import Type


def do_work() -> None:
    """Simulates CPU intensive work."""
    x = [0]
    for i in range(10_000_000):
        j = x.pop()
        k = i + j - 100
        x.append(k)


def run_worker(worker: Type[Thread | Process]) -> None:
    """Creates 10 workers and runs the `do_work` function in each."""
    tracemalloc.start()
    start = time.perf_counter()
    workers: list[Thread | Process] = []
    for _ in range(10):
        w = worker(target=do_work)
        workers.append(w)
        w.start()
    for w in workers:
        w.join()
    end = time.perf_counter()
    duration = end - start
    current_mem, peak_mem = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"Method: {worker.__name__}")
    print(f"Completion time: {duration:.2f} seconds")
    print(f"Current mem usage: {current_mem} bytes")
    print(f"Peak mem usage: {peak_mem} bytes")
    print()


if __name__ == "__main__":
    run_worker(Thread)
    run_worker(Process)
As you can see, the CPU usage with
I noticed the current implementation of
$ find ./tartufo -type f -not -path "*.pyc" -exec grep -iHn "threading" {} \;
./tartufo/scanner.py:11:import threading
./tartufo/scanner.py:146: _scan_lock: threading.Lock = threading.Lock()
Feature Request
Add threading or concurrency to tartufo
Is your feature request related to a problem? Please describe.
Not a problem per se, but on a monorepo with a long history, running tartufo can take a LONG time.
I happened to open htop and noticed only a single CPU core was being used (essentially wasting 11 cores). I would guess making tartufo multi-threaded would decrease not only the time for a scan, but also the amount of $$$ spent on CI/CD instances on AWS.
Describe the solution you'd like
Preferably the number of threads would either be discovered automatically (e.g. via /proc) or set via a command-line option.
I have not done any Python coding for a loooong time so I am not sure what is available or if this is possible.
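As a rough illustration of that idea, here is a minimal sketch of how a worker count could be resolved from either an explicit option or the detected CPU count. The `--workers` flag and the `resolve_worker_count` and `parse_args` helpers are hypothetical names chosen for this example; they are not existing tartufo options or functions.

```python
import argparse
import os
from typing import Optional, Sequence


def resolve_worker_count(requested: Optional[int]) -> int:
    """Return the requested count if given, otherwise the detected CPU count."""
    if requested is not None:
        if requested < 1:
            raise ValueError("worker count must be at least 1")
        return requested
    # os.cpu_count() may return None when the count cannot be determined,
    # so fall back to a single worker in that case.
    return os.cpu_count() or 1


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="hypothetical scanner CLI")
    # "--workers" is an illustrative flag name, not a real tartufo option.
    parser.add_argument("--workers", type=int, default=None)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Using {resolve_worker_count(args.workers)} worker(s)")
```

On Linux, `os.cpu_count()` reports the number of logical CPUs in the system, which avoids parsing /proc by hand.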
I would assume there would be a parent thread which perhaps farms out chunks of commits to child threads or something like that.
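The "parent farms out chunks of commits" idea could look roughly like the sketch below. It is only an assumption about the shape of such a design, not how tartufo works: `chunked`, `scan_chunk`, and `scan_all` are hypothetical helpers, the commit ids are fake strings, and the "scan" is a placeholder that flags ids ending in "0".

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def scan_chunk(commits: Sequence[str]) -> List[str]:
    """Placeholder for real scanning: flags commit ids ending in '0'."""
    return [c for c in commits if c.endswith("0")]


def scan_all(commits: Sequence[str], workers: int, chunk_size: int) -> List[str]:
    """Parent side: hand each chunk to a worker and merge the findings."""
    findings: List[str] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map() returns results in the same order as the input chunks.
        for result in pool.map(scan_chunk, chunked(commits, chunk_size)):
            findings.extend(result)
    return findings


if __name__ == "__main__":
    fake_commits = [f"commit{i}" for i in range(100)]
    print(scan_all(fake_commits, workers=4, chunk_size=25))
```

The sketch uses worker processes rather than threads because, in standard CPython builds, the global interpreter lock keeps CPU-bound threads from running Python code in parallel. Whether the real scanning work is CPU-bound enough to benefit would need to be measured.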
Describe alternatives you've considered
I suppose it might be possible to run separate instances of tartufo on different branches, but in my specific case I'm running on a bare mirror of a repo so branches are not available.
Teachability, Documentation, Adoption, Migration Strategy
This should be backwards compatible. It may be good to make multi-threading the default option so everyone benefits.