Pybootchartgui: UnicodeDecodeError #77

AzureCrimson · 2018-04-28T13:49:57Z

Cause of problem

My dmesg output contains bytes that are not valid UTF-8 (they show up as escape sequences in the excerpt below), which prevents pybootchartgui from processing bootchart logs on my machine.

Suggested fix

Replace

bootchart/pybootchartgui/parsing.py

Line 529 in 331ada0

for line in file.read().decode('utf-8').split('\n'):

with

for line in file.read().decode('utf-8', 'ignore').split('\n'):

or use another of Python's decode error handlers.
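A minimal sketch of what the one-argument change does, assuming the dmesg member is read as bytes from a file object (as `tf.extractfile(name)` returns). `read_dmesg_lines` is a hypothetical stand-in for the loop in `_parse_dmesg()`, not pybootchartgui's actual code.

```python
import io


def read_dmesg_lines(fileobj, errors="strict"):
    """Hypothetical stand-in for the loop in _parse_dmesg().

    `errors` is forwarded to bytes.decode(): "strict" (the default) raises
    UnicodeDecodeError, "ignore" silently drops undecodable bytes.
    """
    return fileobj.read().decode("utf-8", errors).split("\n")


# Raw bytes presumably behind the escaped BGRT line in the dmesg excerpt.
SAMPLE = b"[ 0.000000] ACPI: BGRT 0x000000007885BEC8 000038 (v00 \xf3\xee 01072009 AMI 00010013)\n"

# The suggested fix: pass "ignore" as the error handler.
for line in read_dmesg_lines(io.BytesIO(SAMPLE), errors="ignore"):
    print(repr(line))
```

With "ignore" the BGRT line comes back with the two bad bytes simply missing; with the default "strict" the same call raises, just as in the stack trace.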

Failing case

stripped dmesg output

[ 0.000000] ACPI: BGRT 0x000000007885BEC8 000038 (v00 \xfffffff3\xffffffee 01072009 AMI 00010013)
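Why this particular line fails: the `\xfffffff3\xffffffee` escapes are presumably the raw bytes 0xF3 and 0xEE printed as sign-extended 32-bit values (an assumption; the traceback does confirm byte 0xf3). The sketch below checks the UTF-8 bit patterns and locates the first undecodable byte; `first_bad_byte` is a hypothetical helper.

```python
# Fragment of the BGRT line as raw bytes (assumed to be 0xF3 0xEE).
LINE = b"(v00 \xf3\xee 01072009 AMI 00010013)"

lead, following = 0xF3, 0xEE
# 0xF3 = 0b11110011: the 11110 prefix announces a 4-byte sequence, so the
# next byte must be a continuation byte of the form 0b10xxxxxx.
assert lead >> 3 == 0b11110
# 0xEE = 0b11101110 starts with 1110: a lead byte, not a continuation byte.
assert following >> 6 != 0b10


def first_bad_byte(data):
    """Return (offset, byte value, reason) for the first undecodable byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.object[err.start], err.reason
    return None


print(first_bad_byte(LINE))
```

This matches the "invalid continuation byte" reason reported for 0xf3 in the stack trace.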

pybootchartgui stack trace

Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.4/pybootchartgui", line 23, in <module>
    sys.exit(main())
  File "/usr/lib64/python3.4/site-packages/pybootchartgui/main.py", line 124, in main
    trace = parsing.Trace(writer, args, options)
  File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 52, in __init__
    parse_paths (writer, self, paths)
  File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 716, in parse_paths
    state = _do_parse(writer, state, name, tf.extractfile(name))
  File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 674, in _do_parse
    state.kernel = _parse_dmesg(writer, file)
  File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 529, in _parse_dmesg
    for line in file.read().decode('utf-8').split('\n'):
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 57: invalid continuation byte
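The call chain above (`parse_paths` opens the tarball, `tf.extractfile(name)` hands raw bytes to `_parse_dmesg`, which decodes them strictly) can be reproduced without a real bootchart log. This is a sketch under assumptions: the archive is built in memory, the member name `dmesg` is assumed to be the one bootchart uses, and `make_tar` / `decode_dmesg_member` are hypothetical helpers.

```python
import io
import tarfile

# Bytes as they presumably appear in the raw dmesg log (0xF3 0xEE in the BGRT line).
BAD_DMESG = b"ACPI: BGRT 0x000000007885BEC8 000038 (v00 \xf3\xee 01072009 AMI 00010013)\n"


def make_tar(members):
    """Build an in-memory .tgz from a {name: bytes} mapping (fixture only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def decode_dmesg_member(archive, name):
    """Mimic the failing path: extractfile() yields bytes, decoded strictly."""
    with tarfile.open(fileobj=archive, mode="r:gz") as tf:
        raw = tf.extractfile(name).read()
    return raw.decode("utf-8")  # strict by default, as in the unpatched parser


try:
    decode_dmesg_member(make_tar({"dmesg": BAD_DMESG}), "dmesg")
except UnicodeDecodeError as exc:
    print("reproduced:", exc)
```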

stripped failing bootchart.tgz (base64 encoded, use base64 -d to decode)

H4sIANpz5FoAA+3RwQsBQRTH8Tn7K96Ri97sLLvcrCQHJbnJQWxyQO0i+aNd/ANmSVHKSaLvp1ev
mXmHN/3mqzRfmM9Srx6GRbdRTR/7lQ2tsS6IrPpy1hQX1hnRD+91tcu300zETI+7LJ1ly1W+Wb+a
e/f+o8biafUWxERa7UGvKUl3OBI93POJ4riWdNqxFAcXS3mvKueTPFGfXqDakFa/J8Wk9eUqpW9/
EAAAAAAAAAAAAAAAAAD+1AVFIhhQACgAAA==
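If `base64 -d` is inconvenient, the archive can also be unpacked from Python. A minimal sketch; `open_b64_tgz` is a hypothetical helper, and it relies on `base64.b64decode()` discarding non-alphabet characters (its default `validate=False`), so spaces or line breaks left over from copy/paste do no harm.

```python
import base64
import io
import tarfile


def open_b64_tgz(b64_text):
    """Decode base64 text (line breaks or spaces allowed) into an open TarFile."""
    # validate=False (the default) discards characters outside the base64
    # alphabet before decoding, so stray whitespace is ignored.
    raw = base64.b64decode(b64_text)
    return tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz")
```

Usage: `with open_b64_tgz(text) as tf: print(tf.getnames())`, where `text` holds the base64 block above.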

Link to above files (as GitHub attachments are broken)

https://drive.google.com/drive/folders/12bXkurAEv3kntzk5ZPk55Jyibn3o3-Kr


xrmx · 2018-04-28T18:00:29Z

Care to open a PR please?

When Python's bytes.decode() method encounters a byte sequence that cannot be decoded, what it does depends on its second argument, the error handler: 'strict' (the default) raises a UnicodeDecodeError, 'replace' inserts U+FFFD, and 'ignore' silently drops the undecodable bytes and continues.

While most inputs appear to be sanitized, dmesg output is passed to _parse_dmesg() as is and can contain bytes that are not valid UTF-8. When the parser tries to decode such data, it immediately raises an exception and dies, as seen in xrmx#77.

To prevent this, I set the error handler to 'replace', because 'ignore' can hide decoding errors from developers working with *really* broken dmesg logs. The parts of dmesg the parser looks at should be in a standard format anyway, so a U+FFFD (REPLACEMENT CHARACTER) or two after the timestamp shouldn't be too harmful.
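To compare the three error handlers on the failing BGRT line from this issue, here is a short sketch. It assumes the escaped characters are the raw bytes 0xF3 0xEE; `decode_with` is a hypothetical helper.

```python
LINE = b"[ 0.000000] ACPI: BGRT 0x000000007885BEC8 000038 (v00 \xf3\xee 01072009 AMI 00010013)"


def decode_with(handler):
    """Decode LINE with the given bytes.decode() error handler."""
    try:
        return LINE.decode("utf-8", handler)
    except UnicodeDecodeError as exc:
        # Only the "strict" handler ends up here.
        return "UnicodeDecodeError: " + exc.reason


for handler in ("strict", "replace", "ignore"):
    print(f"{handler:<8} {decode_with(handler)!r}")
```

Under both lenient handlers the `[ 0.000000]` timestamp prefix is untouched. For this input 'replace' leaves two U+FFFD characters where the bytes were rejected, while 'ignore' leaves no trace at all, which is the trade-off weighed above.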

AzureCrimson mentioned this issue Apr 28, 2018

Make dmesg parsing non-UTF-tolerant #78

Merged
