Skip to content

Commit

Permalink
Make dmesg parsing non-UTF-tolerant
Browse files Browse the repository at this point in the history
When Python's bytes.decode() method encounters encounters a byte sequence
that cannot be decoded, it will take an action dependent on its second argument:
'strict': raise UnicodeDecodeError exception (default)
'replace': insert U+FFFD
'ignore': skip to next character

While most inputs appear to be sanitized, dmesg output is passed to
_parse_dmesg() as is, and can contain data that escapes to invalid Unicode.
When the parser attempts to decode this data it immediately raises an exception
and dies, as seen in xrmx#77.

To prevent this issue, I set the error handling method to 'replace', as 'ignore' can
hide decoding errors from developers working with *really* broken dmesg logs.
The parts of dmesg the parser looks at should be in a standard format anyway,
so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be
too harmful.
  • Loading branch information
AzureCrimson authored Apr 28, 2018
1 parent 331ada0 commit 13af822
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion pybootchartgui/parsing.py
Original file line number Diff line number Diff line change
Expand Up @@ -526,7 +526,7 @@ def _parse_dmesg(writer, file):
processMap['k-boot'] = kernel
base_ts = False
max_ts = 0
for line in file.read().decode('utf-8').split('\n'):
for line in file.read().decode('utf-8', 'replace').split('\n'):
t = timestamp_re.match (line)
if t is None:
# print "duff timestamp " + line
Expand Down

0 comments on commit 13af822

Please sign in to comment.