Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pybootchartgui: UnicodeDecodeError #77

Open
AzureCrimson opened this issue Apr 28, 2018 · 1 comment
Open

Pybootchartgui: UnicodeDecodeError #77

AzureCrimson opened this issue Apr 28, 2018 · 1 comment

Comments

@AzureCrimson
Copy link
Contributor

Cause of problem

My dmesg contains escape characters, preventing pybootchartgui from processing bootchart logs on my machine.


Suggested fix

Replace

for line in file.read().decode('utf-8').split('\n'):

with

for line in file.read().decode('utf-8', 'ignore').split('\n'):

or another Python UnicodeDecodeError handling method.


Failing case

stripped dmesg output

[ 0.000000] ACPI: BGRT 0x000000007885BEC8 000038 (v00 \xfffffff3\xffffffee 01072009 AMI 00010013)

pybootchartgui stack trace

Traceback (most recent call last): File "/usr/lib/python-exec/python3.4/pybootchartgui", line 23, in <module> sys.exit(main()) File "/usr/lib64/python3.4/site-packages/pybootchartgui/main.py", line 124, in main trace = parsing.Trace(writer, args, options) File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 52, in __init__ parse_paths (writer, self, paths) File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 716, in parse_paths state = _do_parse(writer, state, name, tf.extractfile(name)) File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 674, in _do_parse state.kernel = _parse_dmesg(writer, file) File "/usr/lib64/python3.4/site-packages/pybootchartgui/parsing.py", line 529, in _parse_dmesg for line in file.read().decode('utf-8').split('\n'): UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 57: invalid continuation byte

stripped failing bootchart.tgz (base64 encoded, use base64 -d to decode)

H4sIANpz5FoAA+3RwQsBQRTH8Tn7K96Ri97sLLvcrCQHJbnJQWxyQO0i+aNd/ANmSVHKSaLvp1ev mXmHN/3mqzRfmM9Srx6GRbdRTR/7lQ2tsS6IrPpy1hQX1hnRD+91tcu300zETI+7LJ1ly1W+Wb+a e/f+o8biafUWxERa7UGvKUl3OBI93POJ4riWdNqxFAcXS3mvKueTPFGfXqDakFa/J8Wk9eUqpW9/ EAAAAAAAAAAAAAAAAAD+1AVFIhhQACgAAA==

Link to above files (as Github attachments are broken)

https://drive.google.com/drive/folders/12bXkurAEv3kntzk5ZPk55Jyibn3o3-Kr

@xrmx
Copy link
Owner

xrmx commented Apr 28, 2018

Care to open a PR please?

AzureCrimson added a commit to AzureCrimson/bootchart that referenced this issue Apr 28, 2018
When Python's bytes.decode() method encounters encounters a byte sequence
that cannot be decoded, it will take an action dependent on its second argument:
'strict': raise UnicodeDecodeError exception (default)
'replace': insert U+FFFD
'ignore': skip to next character

While most inputs appear to be sanitized, dmesg output is passed to
_parse_dmesg() as is, and can contain data that escapes to invalid Unicode.
When the parser attempts to decode this data it immediately raises an exception
and dies, as seen in xrmx#77.

To prevent this issue, I set the error handling method to 'replace', as 'ignore' can
hide decoding errors from developers working with *really* broken dmesg logs.
The parts of dmesg the parser looks at should be in a standard format anyway,
so a U+FFFD (Replacement Character) or two after the timestamp shouldn't be
too harmful.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants