Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

astropy 5 not compatible with focalplane calibration ecsv #157

Open
sbailey opened this issue Feb 17, 2022 · 13 comments
Open

astropy 5 not compatible with focalplane calibration ecsv #157

sbailey opened this issue Feb 17, 2022 · 13 comments

Comments

@sbailey
Copy link
Contributor

sbailey commented Feb 17, 2022

Focal plane calibration ecsv files used for updating the desimodel focalplane model are not compatible with astropy 5.0 (thanks to @kfanning for reporting).

Using current main environment based on astropy 5.0:

x = Table.read('/global/cfs/cdirs/desi/engineering/focalplane/calibration/20220216T170044+0000_fp_calibs.ecsv')
...
ValueError: datatype 'object' of column 'POS_NEIGHBORS' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string')

This works fine with astropy 4.0.1.post1:

x = Table.read('/global/cfs/cdirs/desi/engineering/focalplane/calibration/20220216T170044+0000_fp_calibs.ecsv') 
print(x['POS_NEIGHBORS'].dtype)
print(type(x['POS_NEIGHBORS'][0]))
print(x['POS_NEIGHBORS'][0])

Output:
object
<class 'str'>
{'M04109', 'M04110', 'M03064'}

In the input file, there are two "object" columns:

# - {name: POS_NEIGHBORS, datatype: object, description: neighboring positioners}
# - {name: FIXED_NEIGHBORS, datatype: object, description: neighboring fixed boundaries}

that are dumps as string representations of arrays in the table rows:

2 7 M00703 31 2.8362 3.0768 61.28 -9.06874965873426 85.3871 50.4355 393.332167989971 195.995177091664 1.0 1.0 0.0 0.0 0.0 0.0 False False True True -31.8049446343454 159.716336663025 84.77343350581931 51.82079290229973 -98.95183412887425 7.80823591846599 190.6660839949855 -193.6660839949855 182.0 -4.995177091664004 196.6660839949855 -196.6660839949855 185.0 -10.995177091664004 "{'M04109', 'M04110', 'M03064'}" set() "[[0.8140000000000004, 2.083, 2.613, 4.154, 4.695, 5.094, 5.094, 4.831, 4.235, 3.4320000000000004, 2.28, -1.9020000000000001, -2.007, -1.1390000000000002, -0.17, 0.8140000000000004], [-3.2360000000000007, -2.7070000000000003, -2.665, -2.759, -1.6800000000000002, -0.54, 0.54, 1.2910000000000001, 1.933, 2.283, 2.283, -0.9350000000000002, -2.665, -3.137, -3.332, -3.2360000000000007]]" "[[3.88, 3.831, 3.182, -1.1720000000000002, -1.1720000000000002, 3.182, 3.831, 3.88], [0.0, 1.0140000000000002, 1.5830000000000002, 1.0369999999999997, -1.0369999999999997, -1.5830000000000002, -1.0140000000000002, 0.0]]" 85.38109795747424 -98.62699133507905 50.43195478045504 9.288985006831506

Other snippets from the header:

# %ECSV 0.9
...
# schema: astropy-2.0

@weaverba137 could you investigate if this is actually invalid ecsv that astropy 4.0.1 was more forgiving about, or whether this is an astropy 5.0 parsing bug?

@kfanning could you confirm which version of astropy is used to dump these, or whether they are being constructed by hand outside of astropy?

@sbailey
Copy link
Contributor Author

sbailey commented Feb 17, 2022

From @kfanning in Slack while I was filing ticket:

Looks like the mountain is using astropy 4.2.1 to write these files (they are astropy.table objects in the code). I tried to do a quick search in the astropy docs and it doesn't look like it is forbidden to have "object" columns (if anything they point out these are allowed) but nevertheless astropy 5.0 raises a value error ValueError: datatype 'object' of column 'POS_NEIGHBORS' is not in allowed values ('bool', 'int8', 'int16', 'int32', 'int64', 'uint8', 'uint16', 'uint32', 'uint64', 'float16', 'float32', 'float64', 'float128', 'string') when reading in the ecsv file with astropy.table.Table.read

@sbailey
Copy link
Contributor Author

sbailey commented Feb 17, 2022

If one version of astropy can write a table to ecsv, it seems reasonable for future versions of astropy to be able to read it, so this looks like an astropy feature regression to me, or otherwise an astropy 4.x table writing bug if it is genuinely writing invalid ecsv.

Mini test:

t = Table()
t['a'] = [1,2,3]      
t['b'] = np.array([[1,], [1,2], [1,2,3]], dtype=object)
t.write('blat.ecsv')
tx = Table.read('blat.ecsv')

both astropy 4 and 5 are able to internally round-trip this case, and astropy 4 can read the astropy 5 file but not vice-a-versa (i.e. the opposite of the typical case: astropy 4 is future compatible, but astropy 5 is not backwards compatible...)

Astropy 5 produces:

# %ECSV 1.0
# ---
# datatype:
# - {name: a, datatype: int64}
# - {name: b, datatype: string, subtype: json}
# schema: astropy-2.0
a b
1 [1]
2 [1,2]
3 [1,2,3]

while astropy 4 produces:

# %ECSV 0.9
# ---
# datatype:
# - {name: a, datatype: int64}
# - {name: b, datatype: object}
# schema: astropy-2.0
a b
1 [1]
2 "[1, 2]"
3 "[1, 2, 3]"

[I probably should have posted that to astropy instead, but I've also probably already spent too much time on this today so I'm going to go back to Fuji for now...]

@weaverba137
Copy link
Member

weaverba137 commented Feb 17, 2022

For reference, this change originates in Astropy 4.3, https://docs.astropy.org/en/stable/changelog.html#id51.

Actually, I think between that and the example above, we have a workaround:

  1. Change ECSV 0.9 to ECSV 1.0.
  2. Change object to string, subtype:json.
  3. Remove quotes around the affected columns.

Still, is it really the intention of the originating code to write out a JSON-valued column?

@weaverba137
Copy link
Member

weaverba137 commented Feb 17, 2022

Found the relevant astropy issue: astropy/astropy#12840. The eventual fix will be to convert the exception into a warning, but that is still an open PR.

@weaverba137
Copy link
Member

@sbailey, could you let me know what the level of urgency is on this? I have no familiarity with these files or even whether they are used by the pipeline at all.

@weaverba137
Copy link
Member

See also https://github.com/astropy/astropy-APEs/blob/main/APE6.rst#object-columns for the official documentation of object columns in ECSV 1.0. The short version is that object columns are expected to be JSON-encoded.

A string like "{'M03938', 'M03908', 'M04218', 'M04170', 'M02941', 'M01389'}" is not valid JSON. It appears someone was trying to encode a set. So even if astropy adds backward-compatibility, it still might not be able to decode that properly. set() is also not valid JSON.

@kfanning
Copy link
Member

These files are used in the daily focal plane calibration sync for ops planning. At the moment this isn't causing any issues while that process uses the desi 21.7e environment.

It sounds like the format of the file should be changed before running this process in a newer environment. I've looked at Joe's script making these and I think I can change it. I think it will be the least intrusive to save these as JSON encoded lists of strings, rather than saving these columns with sets in them and hoping for the best.

I need to know if the daily sync uses these columns at all to make sure I don't break anything by changing the format.

Do we want older files updated to the newer format? Do we have a location to store the originally formatted old files? Do we even care about the old files?

@tskisner
Copy link
Member

Sorry I am late to this discussion, just a couple points:

  • The focalplane sync loads the calibration dump as an astropy table, but it does not actually use the two columns in question (POS_NEIGHBORS and FIXED_NEIGHBORS).
  • The desimodel data svn is the record of the changing focalplane state with time used by fiberassign. I don't think we would ever "re-sync" all the calibration dumps into a fresh, empty svn repo.

@tskisner
Copy link
Member

For additional reference, the function which takes the calibration dump table and extracts the information is here

@schlafly
Copy link
Contributor

Sorry to let this hang. I think an ideal solution would be one where we update the format of the FP dumps to a new format which breaks neither astropy 4 or astropy 5. Ted has confirmed that we don't use these two columns, so it sounds like it may be possible to produce an astropy 4 & 5 compatible file? Then we can migrate the FP sync script install to astropy 5.

@weaverba137
Copy link
Member

That sounds fine with me.

@weaverba137
Copy link
Member

@schlafly, did this ever get resolved?

@schlafly
Copy link
Contributor

It looks like astropy was updated to change this to a warning, so reading these files with astropy 5 today is possible with that warning (for now). I don't know who is maintaining the FP dumping code but that is really where any updates need to happen. @ClairePpt , any idea who would be best to update the focal plane dump output? The issue is that the files currently contain some object value columns which astropy is trying to deprecate or change. We don't actually use them downstream though the focal plane team may well use them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants