Assess fidelity of secret pattern matching #209

Open
poppysec opened this issue Dec 5, 2024 · 3 comments

@poppysec
Member

poppysec commented Dec 5, 2024

A considerable number of the patterns in signatures.yml match on secret variable names, such as OPENAI_API_KEY, in addition to or in place of the secret key or token itself. This is an artefact of the previous secret-blocking implementation.

- Amazon:
  - Access Key: (?:A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA|ABIA|ACCA)[A-Z0-9]{16}
  - Secret Access Key Variable: (?i)(amazon|amz|aws)[-_]{0,1}(secret)[-_]{0,1}((access)[-_]{0,1}){0,1}key
  # - Cognito User Pool ID: (?i)us-[a-z]{2,}-[a-z]{4,}-\d{1,}
  - RDS Password: (?i)(rds\-master\-password|db\-password)
  - S3 Private Key Variable: (?i)AWS_S3_PRIVATE_KEY|s3_key|S3_PRIVATE_KEY
  - Security Token Header Variable: (?i)X-Amz-Security-Token
  - API Gateway Key Source Header Variable: (?i)x-amazon-apigateway-api-key-source
  - S3 Bucket: (?i)AWS_S3_BUCKET|s3_bucket
  - SNS Confirmation URL: (?i)https:\/\/sns\.[a-z0-9-]+\.amazonaws\.com\/?Action=ConfirmSubscription&Token=[a-zA-Z0-9-=_]+
  - SES SMTP Password Variable: (?i)ses_smtp_password
  - AWS Private Key Variable: (?i)ec2\-private\-key|EC2_PRIVATE_KEY
  - MWS Token: (amzn\.mws\.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})
  - AppSync GraphQL Key: \bda2-[a-z0-9]{26}

- Microsoft:
  - Azure API Key Variable: (?i)Ocp-Apim-Subscription-Key
  - Azure Functions Key Header Variable: (?i)x-functions-key

Now that secrets are encrypted on the fly, we must be precise about which strings get encrypted. If we obfuscate an entire line in a user's code prompt, including the variable name, the LLM could produce mangled output. We also want to avoid spurious claims in the response that some number of secrets were encrypted when no real secret was present.
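
To make the problem concrete, here is a minimal sketch that runs the `Secret Access Key Variable` pattern quoted above against a made-up line of user code. Only the regex comes from the list above; the sample line and its value are invented for illustration.

```python
import re

# Pattern copied from the "Secret Access Key Variable" entry quoted above.
AWS_SECRET_VAR = re.compile(
    r"(?i)(amazon|amz|aws)[-_]{0,1}(secret)[-_]{0,1}((access)[-_]{0,1}){0,1}key"
)

# Hypothetical line of user code; the value is a made-up placeholder.
line = 'aws_secret_access_key = "not-a-real-secret-value-0000000000"'

match = AWS_SECRET_VAR.search(line)
matched_text = match.group(0)

# The match covers the variable name only, never the quoted value.
print(matched_text)
```

Encrypting `matched_text` here would rewrite the variable name and leave the actual value untouched, which is the opposite of what we want.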

This task will assess what changes are needed in the way patterns are matched in order to improve matching fidelity. For example, we can still detect on OPENAI_API_KEY : <key>, but the key itself should be in a separate matching group so that it alone can be extracted and encrypted.
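
A rough sketch of the separate-group idea, assuming a named group called `secret`. The pattern, the `redact_secret` helper, and the `REDACTED` placeholder (standing in for the encrypted value) are all hypothetical and not taken from signatures.yml or the CodeGate code base.

```python
import re

# Hypothetical pattern: the variable name anchors the detection, and the value
# sits in its own named group so that it alone can be replaced.
OPENAI_ASSIGNMENT = re.compile(
    r"""(?i)\bOPENAI_API_KEY\b\s*[:=]\s*["']?(?P<secret>[A-Za-z0-9_\-]{16,})["']?"""
)


def redact_secret(line: str, placeholder: str = "REDACTED") -> str:
    """Replace only the captured secret value, leaving the variable name intact."""
    match = OPENAI_ASSIGNMENT.search(line)
    if match is None:
        return line
    start, end = match.span("secret")
    return line[:start] + placeholder + line[end:]


# Made-up sample value, not a real key.
line = 'OPENAI_API_KEY = "sk-EXAMPLEEXAMPLEEXAMPLE1234"'
redacted = redact_secret(line)
print(redacted)
```

Because the replacement uses the span of the `secret` group rather than the span of the whole match, the variable name, the operator, and the quotes all survive unchanged.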

@lukehinds
Contributor

@lukehinds plan for roadmap

@poppysec
Member Author

Doing some testing: a Jupyter notebook with the following cells, using GPT-4o-generated secrets:

x = 'AIzaSyD-EXAMPLEk72b3gHs4TTGEXAMPLEKEY'
my_stripe_secret_key = 'sk_test_4eC39HqLyjWDarjtT1zdp7dcEXAMPLE'
gh_token = 'ghp_1a2B3c4D5eF6G7H8I9J0KLmnopQrSTuVWxyzEXAMPLE'
slack_token = 'xoxb-123456789012-9876543210987-abcdefGHIJKLMNOPQRSTUvwxYz'
btc_address = '1BoatSLRHtKNngkdXEeobR76b53LETtpyT'
eth_addresses = ['0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359', '0x32Be343B94f860124dC4fEe278FDCBD38C102D88']
xrp_address = 'rEb8TK3gBgk5auZkwc6sHnwrGVJH8DuaLh'
solana_address = '4o5YuSJ2dhJkVzUVAsxP2hbr4R8RLyX5NXSTFcHbSjaT'
btc_wif = '5HueCGU8rMjxEXAMPLEGonPnLC5EXAMPL3dENwRYEXAMPLEKKn9W'
my_jwt = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c'

The EXAMPLE ones might cause issues with pattern matching; we will see.

🛡️ Codegate prevented 9 secrets from being leaked by redacting them.

CodeGate Security Analysis

The provided code snippet contains several security concerns related to sensitive information handling. Here are the key issues and recommendations for addressing them:

  1. API Keys and Secrets:

    • x: An API key that appears to be related to Google services (e.g., Google Maps API).
    • my_stripe_secret_key: A Stripe secret key, which should be kept confidential and not exposed in public code repositories.
    • gh_token: A GitHub token. Tokens should not be hard-coded in the source code as they can be used to access repositories and perform actions on behalf of the user.

    Recommendations:

    • Use environment variables or a secure vault to store and access API keys, secret keys, and tokens.
    • Implement access controls to ensure that only authorized personnel can view or modify these credentials.
  2. Cryptocurrency Addresses:

    • btc_address: A Bitcoin address.
    • eth_addresses: Ethereum addresses.
    • xrp_address: XRP (Ripple) address.
    • solana_address: Solana address.

    Recommendations:

    • Ensure that cryptocurrency addresses are used securely and are not exposed in public repositories to prevent unauthorized transactions.
    • Consider using a secure wallet or service to manage cryptocurrency addresses and transactions.
  3. Private Keys:

    • btc_wif: A Bitcoin private key in Wallet Import Format (WIF).

    Recommendations:

    • Never hard-code private keys in your source code. They should be handled securely using environment variables or secure storage solutions.
    • Consider using a hardware wallet or a secure key management service to store private keys.
  4. JWT Tokens:

    • my_jwt: A JSON Web Token (JWT).

    Recommendations:

    • JWTs should be stored securely and not hard-coded in the source code. They should be transmitted over secure channels (e.g., HTTPS) and validated server-side.
    • Ensure that JWTs have appropriate expiration times and are refreshed as needed.

Best Practices for Secure Implementation:

  • Avoid hard-coding sensitive information such as API keys, secret keys, tokens, and private keys in the source code.
  • Use environment variables or secure vaults to manage sensitive information.
  • Implement proper access controls and authentication mechanisms to protect sensitive data.
  • Regularly review and update security policies and procedures to ensure compliance with industry standards and best practices.


Two secrets on the same line (in a list) were detected separately, which is great!
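
For reference, a small sketch of how several secrets on one line can be picked up as separate spans. The pattern is an assumed shape for Ethereum addresses (`0x` plus 40 hex characters), not necessarily the one in signatures.yml.

```python
import re

# Assumed shape of an Ethereum address: "0x" followed by 40 hex characters.
ETH_ADDRESS = re.compile(r"\b0x[a-fA-F0-9]{40}\b")

line = (
    "eth_addresses = ['0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359', "
    "'0x32Be343B94f860124dC4fEe278FDCBD38C102D88']"
)

# finditer yields every non-overlapping match along with its own span,
# so each address can be replaced independently of the other.
spans = [m.span() for m in ETH_ADDRESS.finditer(line)]
found = [line[start:end] for start, end in spans]
print(found)
```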

Ones missed:
XRP - rEb8TK3gBgk5auZkwc6sHnwrGVJH8DuaLh
Stripe secret key - sk_test_4eC39HqLyjWDarjtT1zdp7dcEXAMPLE
Google API key - AIzaSyD-EXAMPLEk72b3gHs4TTGEXAMPLEKEY
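
One possible explanation for a miss, in line with the EXAMPLE concern above, is that a padded sample no longer has the length a strict pattern expects. The sketch below is a tiny coverage check under that assumption. The patterns are illustrative shapes (a Google-style key as `AIza` plus 35 characters, an Ethereum address as `0x` plus 40 hex characters) and are NOT copied from signatures.yml, so it does not prove why the real patterns missed.

```python
import re

# Illustrative patterns only; these are assumptions, not signatures.yml entries.
PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "eth_address": re.compile(r"0x[a-fA-F0-9]{40}"),
}

# Sample values taken from the notebook cells above.
SAMPLES = {
    "google_api_key": "AIzaSyD-EXAMPLEk72b3gHs4TTGEXAMPLEKEY",
    "eth_address": "0xfB6916095ca1df60bB79Ce92cE3Ea74c37c5d359",
}

# A sample counts as missed when its pattern does not match the whole value.
missed = [
    name
    for name, sample in SAMPLES.items()
    if PATTERNS[name].fullmatch(sample) is None
]

for name in missed:
    print(f"missed {name}: sample length {len(SAMPLES[name])}")
```

Under these assumed patterns the Google sample is missed because it is 37 characters long, while the pattern requires 39 (`AIza` plus 35).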

lukehinds added a commit that referenced this issue Jan 6, 2025
We can look at this again later when revamping the code based on,
when #209 gets underway
@lukehinds
Contributor

Makes sense to perhaps fold this into the refactor as well: #423
