-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Support for different CSV Encodings in Import_Scripts/Populate_Metadata.py #198
Open
JulianHn
wants to merge
16
commits into
ome:develop
Choose a base branch
from
JulianHn:develop
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 2 commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
3ccb2a5
Added support for different csv encodings in Populate_Metadata.py. Re…
JulianHn 10f73a0
Added Test for latin-1 and cp1252 encoding in Populate_Metadata.py
JulianHn e21a065
Added try-except block around the get_original_file_data() code to pr…
JulianHn f75a8a4
Rewrote encoding test case to check all available encodings on the ma…
JulianHn 9921b3a
Added checks for correct omero-py version, that support encodings dif…
JulianHn 779cfd5
Change detection of omero-py version over to a method that directly
JulianHn 3627a2a
Added skip condition to encoding test, if omero-py does not support it
JulianHn 2490122
Keep encoding variable untouched in populate_metadata if EncSup = False
JulianHn dc54b13
Fixed pytest complaints except two of the "line too long" PEP8ify
JulianHn 7f8e019
More flake8 fixes
JulianHn 074acba
Fixed flake8 messages except for some line too longs, that are caused…
JulianHn a971f2f
Added try/except to catch the type-error thrown by Populate_Metadata.…
JulianHn b1b92a3
Added try around the string creation facility to skip test cases
JulianHn 0d8672e
More flake8 fixes
JulianHn 9055d76
More flake8 fixes
JulianHn c322eb4
Changed Caught Exception type, as omero-py upstream changed
JulianHn File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably this line fails if you use the wrong encoding? Or only if you use
utf-8
?A try/except that returns a useful message (and/or prints it to stdout) could be helpful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no. I discovered this issue because one of our clients was using german Umlauts in their iso-8859-1 encoded CSV, which indeed raise an UnicodeDecodeError at this position. Unfortunately this does not hold true, if it would be the other way around, i.e. specifying 'iso-8859-1' on a Unicode encoded CSV. This will not return an error, but the string is non-sensical. I presume that this can lead to failures down the line in the script (e.g. when the image name is not correct due to the mis-read string), but it could also just lead to nonsensical annotations.
I'm not sure, how one would go about catching this behaviour in general.
You can see the effect for yourself with this test code:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so it's not guaranteed to fail, but if it does raise a
UnicodeDecodeError
that could be worth catching?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I will try to find ways to break the other encodings as well and check what kind of errors they will give. Might not be 100% but better catching some of the errors than none.