Assess fidelity of secret pattern matching #209
Comments
@lukehinds plan for roadmap
Doing some testing: Jupyter notebook with the following cells containing GPT-4o-generated secrets:

x = 'AIzaSyD-EXAMPLEk72b3gHs4TTGEXAMPLEKEY'
my_stripe_secret_key = 'sk_test_4eC39HqLyjWDarjtT1zdp7dcEXAMPLE'
gh_token = 'ghp_1a2B3c4D5eF6G7H8I9J0KLmnopQrSTuVWxyzEXAMPLE'
slack_token = 'xoxb-123456789012-9876543210987-abcdefGHIJKLMNOPQRSTUvwxYz'
btc_address = '1BoatSLRHtKNngkdXEeobR76b53LETtpyT'
eth_addresses = ['0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359', '0x32Be343B94f860124dC4fEe278FDCBD38C102D88']
xrp_address = 'rEb8TK3gBgk5auZkwc6sHnwrGVJH8DuaLh'
solana_address = '4o5YuSJ2dhJkVzUVAsxP2hbr4R8RLyX5NXSTFcHbSjaT'
btc_wif = '5HueCGU8rMjxEXAMPLEGonPnLC5EXAMPL3dENwRYEXAMPLEKKn9W'
my_jwt = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c'

The
Two secrets on the same line (in a list) were detected separately, which is great! Ones missed:
We can look at this again later when revamping the code, once #209 gets underway.
It perhaps makes sense to fold this into the refactor as well: #423
A considerable number of the patterns in signatures.yml will match on secret variable names, such as OPENAI_API_KEY, in addition to or in place of the secret key or token itself. This is an artefact of the previous secret blocking implementation.

Now, with on-the-fly encryption, we must be precise about which strings are encrypted: if we obfuscate an entire line in a user's code prompt, including the variable name, it could cause the LLM to produce mangled output. We also want to avoid adding spurious claims to the response about having encrypted some number of nonexistent secrets.
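To make the over-counting concern concrete, here is a minimal sketch comparing a pattern that fires on the variable name alone with one that also requires a value. Both regexes and the sample prompt are hypothetical illustrations, not entries from signatures.yml.

```python
import re

# Hypothetical patterns for illustration only (not taken from signatures.yml).
# NAME_ONLY fires on the variable name even when no secret value is present.
NAME_ONLY = re.compile(r"OPENAI_API_KEY")
# NAME_AND_VALUE only fires when an assignment with a key-like value follows.
NAME_AND_VALUE = re.compile(
    r"OPENAI_API_KEY\s*[:=]\s*['\"]?(?P<secret>sk-[A-Za-z0-9]{20,})"
)

# Made-up prompt: one line mentions the name only, one line holds a fake key.
PROMPT = '''import os
key = os.environ["OPENAI_API_KEY"]  # no secret on this line
OPENAI_API_KEY = "sk-EXAMPLEabcdefghijklmnop1234"
'''


def count_matches(pattern, text):
    """Count non-overlapping matches of pattern in text."""
    return sum(1 for _ in pattern.finditer(text))


name_only_hits = count_matches(NAME_ONLY, PROMPT)
value_hits = count_matches(NAME_AND_VALUE, PROMPT)
print(name_only_hits, value_hits)
```

With the name-only pattern, the lookup via os.environ is counted as a secret even though no key appears on that line, which is the kind of spurious claim described above.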
This task will focus on assessing the changes needed to the way the patterns are matched in order to improve the matching fidelity. For example, we can still detect on OPENAI_API_KEY : <key>, but the key itself should be within a separate matching group so it can be extracted and encrypted exclusively.
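One way the separate matching group could work is sketched below, assuming a regex with a named group. The pattern, the group name `secret`, and the `placeholder_encrypt` helper are all hypothetical, and the helper is only a stand-in for whatever encryption the project actually applies.

```python
import re

# Hypothetical signature: the variable name anchors the match, while the
# value sits in its own named group so it can be handled separately.
PATTERN = re.compile(
    r"OPENAI_API_KEY\s*[:=]\s*['\"]?(?P<secret>[A-Za-z0-9_\-]{16,})"
)


def placeholder_encrypt(value):
    """Stand-in for real encryption (assumption): returns a length marker."""
    return "REDACTED<" + str(len(value)) + ">"


def redact(text, pattern=PATTERN, encrypt=placeholder_encrypt):
    """Replace only the span of the 'secret' group; return (text, count)."""
    pieces = []
    last = 0
    count = 0
    for match in pattern.finditer(text):
        start, end = match.span("secret")
        pieces.append(text[last:start])  # keep everything before the secret
        pieces.append(encrypt(match.group("secret")))
        last = end
        count += 1
    pieces.append(text[last:])  # keep the remainder unchanged
    return "".join(pieces), count


line = "OPENAI_API_KEY : sk-EXAMPLE0123456789abcdef"
redacted, n = redact(line)
print(redacted, n)
```

Because only the group's span is rewritten, the variable name and surrounding code reach the LLM unchanged, and the returned count reflects values actually replaced rather than name-only matches.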