Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix check bypass via square brackets Issue #857 #860

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tsigouris007
Copy link

@tsigouris007 tsigouris007 commented Jun 20, 2024

  • Please check if the PR fulfills these requirements
  • Tests for the changes have been added
  • Docs have been added / updated
  • All CI checks are green
  • Tests pass
  • What kind of change does this PR introduce?

Fixes behavior on missed secrets that have square brackets before their default value.
Example:

"<%= ENV['CLIENT_ACCESS_KEY_ID'].presence || 'AKIA123456789ABCDEF1' %>"
  • What is the current behavior?

Issue: #857

  • What is the new behavior (if this is a feature change)?

Escapes the square brackets and catches secrets properly.

  • Does this PR introduce a breaking change?

No breaking changes.

Copy link
Contributor

@lorenzodb1 lorenzodb1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tsigouris007, thank you for your PR!

Could you explain what the bug was?

@tsigouris007
Copy link
Author

Hi @tsigouris007, thank you for your PR!

Could you explain what the bug was?

Hi,
You can visit the issue for the compete write-up.

@lorenzodb1
Copy link
Contributor

Hi @tsigouris007, #857 describes the issue, I was looking for an explanation around what's causing said issue in the code

@tsigouris007
Copy link
Author

tsigouris007 commented Jun 21, 2024

Hi @tsigouris007, #857 describes the issue, I was looking for an explanation around what's causing said issue in the code

I haven't quite had the time to find the exact location that brakes this.
The combined opening/closing square brackets [] seem to be causing the lines to not be fetched at all for checking.

For example I got AWSKeyDetector's code into a separate standalone python file and the following secret was caught properly by the regex:

access_key_id: "<%= ENV['CLIENT_ACCESS_KEY_ID'].presence || 'AKIA123456789ABCDEF1' %>"

I am taking wild a guess here and I most probably think the problem is how the checked file lines are fetched (and that's why I thought of "escaping" the square brackets earlier).
As far as I can help atm you can go to detect_secrets/core/scan.py method _process_line_based_plugins and the following condition is returning True and is quitting early:

if _is_filtered_out(
            required_filter_parameters=['line'],
            filename=filename,
            line=line,
            context=code_snippet,
):

If you remove it or add a pass instead of continue these secrets are caught because it hits the else clause in _is_filtered_out loop returning a debug message of:

Skipping secret due to `detect_secrets.filters.heuristic.is_indirect_reference`.

I can take a closer look but it will take me at least a couple of days.

@lorenzodb1
Copy link
Contributor

Understanding what exactly is causing the bug will help us finding a good solution moving forward rather than a workaround that could potentially add another set of issues, hence why I'm asking this. Take your time and please let me know what you'll find!

@tsigouris007
Copy link
Author

Understanding what exactly is causing the bug will help us finding a good solution moving forward rather than a workaround that could potentially add another set of issues, hence why I'm asking this. Take your time and please let me know what you'll find!

It's understandable.
I 'll try to get back to this asap.

@tsigouris007
Copy link
Author

Hi @lorenzodb1 ,
so I found where the problem occurs.
In file detect_secrets/filters/heuristic.py there is _get_indirect_reference_regex with a regex of:

return re.compile(r'([^\v=!:]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])')

So the choice is to either change this regex a bit or escape earlier the square brackets.
The latter could be done as is in my PR or we could do it also in is_indirect_reference function inside the heuristic.py file right before the boolean return as shown:

def is_indirect_reference(line: str) -> bool:
    """
    Filters secrets that take the form of:

        secret = get_secret_key()

    or

        secret = request.headers['apikey']
    """
    # Constrain line length as the heuristic's intention is to target lines that resemble
    # function calls. The constraint avoids catastrophic backtracking failures of the regex.
    if len(line) > 1000:
        return False
    line = line.replace('[', '\[').replace(']', '\]')
    return bool(_get_indirect_reference_regex().search(line))

This produces the correct results.

After playing around with the regex I came up with the following that seems to be working (tested on a huge repo with ~1.2k secrets):

def _get_indirect_reference_regex() -> Pattern:
    # Regex details:
    #   ([^\v=!:]*)     ->  Something before the assignment or comparison
    #   \s*             ->  Some optional whitespaces
    #   (:=?|[!=]{1,3}) ->  Assignment or comparison: :=, =, ==, ===, !=, !==
    #   \s*             ->  Some optional whitespaces
    #   (
    #       [\w.-]+     ->  Some alphanumeric character, dot or -
    #       [\[\(]      ->  Start of indirect reference: [ or (
    #       [^\v]*      ->  Something except line breaks
    #       [\]\)]      ->  End of indirect reference: ] or )
    #   )
    # return re.compile(r'([^\v=!:]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])') # Left the old one for reference
    return re.compile(r'([^\v=!:"<%>]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])')

How do you want to go through with this?
I have no strong opinion.

@lorenzodb1
Copy link
Contributor

Hi @tsigouris007, sorry for the delay in replying. I personally prefer the second option. Thanks for investigating this and coming up with a solution!

@tsigouris007
Copy link
Author

Hi @tsigouris007, sorry for the delay in replying. I personally prefer the second option. Thanks for investigating this and coming up with a solution!

Hi @lorenzodb1 ,
I have updated the PR.
Give it a test if you want, for sanity reasons.

@@ -197,7 +197,7 @@ def _get_indirect_reference_regex() -> Pattern:
# [^\v]* -> Something except line breaks
# [\]\)] -> End of indirect reference: ] or )
# )
return re.compile(r'([^\v=!:]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])')
return re.compile(r'([^\v=!:"<%>]*)\s*(:=?|[!=]{1,3})\s*([\w.-]+[\[\(][^\v]*[\]\)])')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsigouris007 could you add a test for this change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay.
I added two test cases for this change.
Are we good to go?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that what you were trying to fix with this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was catching the first case as that was my initial issue as described above.
So, I beefed up the regex and now it catches both cases (broke it in multiple lines for readability, it's not that far from the original one).
I also added a few more test cases that should be caught following this logic and added a brief explanation for each case.
Let me know if this works for you now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants