Tartufo does not scan files with alternative encodings, such as UTF-16 LE.
This matters because PowerShell writes redirected standard output as UTF-16 LE by default (I discovered this by accident). A command like openssl genrsa > private_key.pem therefore saves private_key.pem as a UTF-16 LE-encoded text file, and Tartufo will NOT scan such files at all.
If you create a UTF-16 LE-encoded file (for example, by re-encoding a file in VS Code) and open it in a hex editor, you'll see two extra bytes at the beginning of the file (the Byte-Order Mark / BOM), and, for ASCII text, every other byte will be null (0x00). In the eyes of Git, the null bytes make this file count as a "binary" file. Tartufo ignores binary files, so this file is never scanned.
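To make the byte layout concrete, here is a small sketch that builds the same kind of content in memory and applies a rough stand-in for Git's binary check. The helper name git_would_treat_as_binary and the 8000-byte window are my assumptions for illustration; this approximates the heuristic and is not Git's or tartufo's actual code.

```python
import codecs


def git_would_treat_as_binary(data: bytes, window: int = 8000) -> bool:
    """Hypothetical helper: approximate the 'NUL byte near the start' heuristic."""
    return b"\x00" in data[:window]


text = "-----BEGIN RSA PRIVATE KEY-----\n"
utf8_bytes = text.encode("utf-8")
# Explicit BOM plus little-endian payload, like a file saved as "UTF-16 LE".
utf16_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(utf16_bytes[:8].hex(" "))               # ff fe 2d 00 2d 00 2d 00
print(git_would_treat_as_binary(utf8_bytes))   # False
print(git_would_treat_as_binary(utf16_bytes))  # True
```

The first two bytes are the BOM, and each ASCII character is followed by a null byte, which is what trips the binary detection.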
Non-Solution
Do not use the GIT_DIFF_FORCE_TEXT flag to fix this. It causes problems when a file really is binary, and files that really are text still end up as Python strings containing aberrant characters, so they will not match regular expressions and may not trigger entropy checks either.
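A quick illustration of why forcing text does not help. This sketch assumes the forced-text bytes end up decoded as UTF-8 with replacement characters; I have not confirmed that this is exactly what tartufo does, and the pattern below is an illustrative regex, not one of tartufo's rules.

```python
import codecs
import re

# Illustrative pattern only; not taken from tartufo's rule set.
PEM_HEADER = re.compile(r"-----BEGIN RSA PRIVATE KEY-----")

raw = codecs.BOM_UTF16_LE + "-----BEGIN RSA PRIVATE KEY-----".encode("utf-16-le")

# Assumption: the UTF-16 bytes are interpreted as UTF-8, replacing invalid bytes.
forced = raw.decode("utf-8", errors="replace")
# Decoding with the right codec consumes the BOM and yields the real text.
decoded = raw.decode("utf-16")

print(repr(forced[:6]))                         # '\ufffd\ufffd-\x00-\x00'
print(PEM_HEADER.search(forced) is None)        # True: interleaved NULs break the match
print(PEM_HEADER.search(decoded) is not None)   # True
```

The mis-decoded string has a null character between every pair of visible characters, so a pattern written for normal text never matches.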
Possible solution
Attempt to detect the text encoding of a file. If it is text but not UTF-8, re-encode the chunk to be scanned (not the original file itself!) into UTF-8 before scanning it. I think the place to change is here:
# Run regex scans first to trigger a potential fast fail for bad config
if self.global_options.regex and self.rules_regexes:
    for issue in self.scan_regex(chunk):
        self.store_issue(issue)
        yield issue
if self.global_options.entropy:
    for issue in self.scan_entropy(chunk):
        self.store_issue(issue)
        yield issue
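As a starting point, the detection step could look something like the sketch below. It is BOM-based only, and decode_for_scanning is a hypothetical helper name, not an existing tartufo function. It also assumes the raw bytes of the file or chunk are available at that point, which the snippet above does not show; BOM-less UTF-16 would need a real charset detector and is not handled here.

```python
import codecs
from typing import Optional

# UTF-32 is tested before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_for_scanning(raw: bytes) -> Optional[str]:
    """Hypothetical helper: return text for scanning, or None if it looks binary."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            try:
                # These codecs consume the BOM and pick the byte order from it.
                return raw.decode(encoding)
            except UnicodeDecodeError:
                return None
    if b"\x00" in raw:
        # No BOM but NUL bytes present: treat as genuinely binary.
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

The returned Python string could then be handed to the regex and entropy scans in place of the mis-decoded chunk, while files that return None would continue to be skipped as binary.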
To Reproduce
In PowerShell, run openssl genrsa > private_key.pem. This will generate a UTF-16 LE-encoded private key, which should get flagged by tartufo, but does not.
Run tartufo in the repo. It will not catch this.
Open private_key.pem in VS Code. On the bottom right, you'll see "UTF-16 LE." Click on this and save it with "UTF-8" encoding instead.
Run tartufo again and observe that it scans this file.
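For anyone without PowerShell handy, a comparable UTF-16 LE file can be produced with a few lines of Python. The function name write_utf16le_sample and the placeholder contents are mine; the file holds only a PEM-style header and filler, not a real key, and I am assuming that such a header is something tartufo would normally flag in a UTF-8 file.

```python
import codecs
from pathlib import Path

# Placeholder contents: a PEM-style header and footer with filler, not a real key.
SAMPLE_TEXT = (
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "placeholder-not-a-real-key\n"
    "-----END RSA PRIVATE KEY-----\n"
)


def write_utf16le_sample(path: Path) -> bytes:
    """Hypothetical helper: write SAMPLE_TEXT as UTF-16 LE with a BOM."""
    data = codecs.BOM_UTF16_LE + SAMPLE_TEXT.encode("utf-16-le")
    path.write_bytes(data)
    return data
```

Calling write_utf16le_sample(Path("private_key.pem")) inside a test repository, committing the file, and then running tartufo should exercise the same code path as the PowerShell steps above.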
Expected Behavior
Tartufo should be able to scan files that are text but are not encoded as UTF-8.
Environment
A Windows machine.
Code referenced under "Possible solution": tartufo/tartufo/scanner.py, lines 571 to 580 in 3f075ab