Make dmesg parsing non-UTF-tolerant

When Python's bytes.decode() method encounters a byte sequence that cannot be decoded, it takes an action that depends on its second argument (errors):

  'strict':  raise a UnicodeDecodeError exception (default)
  'replace': insert U+FFFD
  'ignore':  drop the undecodable bytes and skip to the next character

While most inputs appear to be sanitized, dmesg output is passed to _parse_dmesg() as is, and can contain data that escapes to invalid Unicode. When the parser attempts to decode this data, it immediately raises an exception and dies, as seen in xrmx#77.

To prevent this issue, I set the error handling method to 'replace', as 'ignore' can hide decoding errors from developers working with *really* broken dmesg logs. The parts of dmesg the parser looks at should be in a standard format anyway, so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be too harmful.
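As a rough illustration of the three error handlers described above, the sketch below decodes a made-up dmesg capture that contains one invalid byte. The sample bytes and the timestamp pattern are assumptions for demonstration only; the pattern is not the project's actual timestamp_re.

```python
import re

# Hypothetical dmesg capture: the second line ends with a stray 0xff byte,
# which is never valid in UTF-8.
raw = (
    b"[    0.000000] Linux version 4.15.0\n"
    b"[    0.512345] usb 1-1: Product: cam\xff\n"
)

# 'strict' (the default) raises on the first undecodable byte.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD, so the damage stays visible in the text.
replaced = raw.decode('utf-8', 'replace')

# 'ignore' silently drops the bad byte, leaving no trace of the problem.
ignored = raw.decode('utf-8', 'ignore')

# Illustrative stand-in for the parser's timestamp matching (assumed pattern).
timestamp_re = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

# Line splitting and timestamp matching still work on the 'replace' output.
timestamps = []
for line in replaced.split('\n'):
    t = timestamp_re.match(line)
    if t is None:
        continue
    timestamps.append(float(t.group(1)))

print(strict_failed, repr(replaced[-5:]), repr(ignored[-5:]), timestamps)
```

With 'replace', both lines still yield a timestamp, and the replacement character lands in the message text rather than in the part the parser inspects.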
AzureCrimson · Apr 28, 2018 · 13af822
1 parent 331ada0
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/pybootchartgui/parsing.py b/pybootchartgui/parsing.py
@@ -526,7 +526,7 @@ def _parse_dmesg(writer, file):
     processMap['k-boot'] = kernel
     base_ts = False
     max_ts = 0
-    for line in file.read().decode('utf-8').split('\n'):
+    for line in file.read().decode('utf-8', 'replace').split('\n'):
         t = timestamp_re.match (line)
         if t is None:
 #                       print "duff timestamp " + line