Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panoptes_aggregation reduce tries to eval text fields #34

Open
bogden1 opened this issue Nov 21, 2022 · 2 comments
Open

panoptes_aggregation reduce tries to eval text fields #34

bogden1 opened this issue Nov 21, 2022 · 2 comments
Labels
bug Something isn't working

Comments

@bogden1
Copy link
Collaborator

bogden1 commented Nov 21, 2022

The reduction step produces warning of these forms:

<string>:1: SyntaxWarning: 'int' object is not subscriptable; perhaps you missed a comma?
<string>:1: SyntaxWarning: list indices must be integers or slices, not ellipsis; perhaps you missed a comma?
<string>:1: SyntaxWarning: 'list' object is not callable; perhaps you missed a comma?

This appears to be due to text that contains [ or { getting eval'd during the reduction process, through this call chain:

reduce_panoptes_csv.py:reduce_subject -> csv_utils.py:unflatten_data

I guess the intent is to correctly translate text-ified Python data back into data, but this seems not right for pure text fields.

@bogden1 bogden1 added the bug Something isn't working label Nov 21, 2022
@bogden1
Copy link
Collaborator Author

bogden1 commented Nov 21, 2022

Running an instrumented version of panoptes_aggregation, with tag v4.0.0 (matching our requirements.txt) I have learned which strings are successfully eval'd.

I'm not sure what the consequences of this are. It looks like it could be worse if someone happens to write valid Python dictionary syntax in a text field. Some things get translated into strings, which is what they were in the first place. Sometimes a string gets converted into a list, e.g. [ ] becomes an empty list, [00] becomes a list with the single element 0.

The most common case seems to be '[...]', which gets converted into a list containing an instance of class Ellipsis, which is not ideal. The same goes for curlies ({...} gives a set containing an Ellipsis instance). The Ellipsis class does at least print as the word "Ellipsis", but I have no idea what the consequences may be of having unexpected Ellipsis objects sloshing around.

On the whole, it looks like the impact is not too bad. In phase1 I have (with counts):

  1 [  ]
112 [...]
  7 [[...]]
  1 []
  3 {...}
  7 [00]

This is for the text workflows only -- in phase 1 we also have a couple of dropdowns, where I think that this eval behaviour is actually what we want.

and in phase2 (which is incomplete) I have the following strings that are successfully eval'd:

  1   [    ]
  1   [ ]
395   [...]
  1   {...}
 52   [00]
  1   "For [...] to do as the is ordered by the Doctor"

That makes totals of:

131/1,231,500 cells for phase 1 (0.01% of transcribed cells)
451/1,292,564 cells for phase 2 (0.03% of transcribed cells)

A quick look at the phase 2 cases suggests that, other than ... being replaced with Ellipsis, these eval'd strings tend to make it through to the reduction output more or less as though they had not been eval'd.

Other strings are not successfully eval'd, which sometimes produces the SyntaxWarnings that alerted me to this problem in the first place. Unsuccessful eval results in the original value being retained, which I believe is the behaviour that we want.

I have confirmed that all of the SyntaxWarnings are produced by the attempted evals.

@bogden1 bogden1 changed the title panoptes_aggregation reduce tries to eval text strings panoptes_aggregation reduce tries to eval text strings Nov 21, 2022
@bogden1 bogden1 changed the title panoptes_aggregation reduce tries to eval text strings panoptes_aggregation reduce tries to eval text fields Nov 21, 2022
@bogden1
Copy link
Collaborator Author

bogden1 commented Nov 22, 2022

Given that the impact appears to be minimal, I think that this can be closed. However, we may want to draw Zooniverse's attention to it.

Update Reported it, they are tracking it here: zooniverse/aggregation-for-caesar#654

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Done
Development

No branches or pull requests

1 participant