Just a general question: is there something we can or should be doing to make ourselves less brittle against passing NaNs from the search result table to other functions and operations, or are we fine as is? NaN does seem a sensible missing-data value, but we want to be as maintainable and robust as we reasonably can.
In the process of working on #13 I discovered that the NaNs added to the pandas DataFrame search result table broke things in some unexpected ways when they got passed to astroquery, particularly when a NaN dataURI was passed to astroquery.mast.observations.get_cloud_uris. These NaNs get introduced when query columns are empty and we use pd.concat() to join tables together; we do this as an outer join to preserve as much information in the columns as possible.
My most recent example wasn't an issue in mainline lightkurve, since it comes from where we concatenate a DataFrame of TESSCut information to our main self.table. However, that main self.table also contains many NaNs from a similar operation, where we concatenate the tables from astroquery.mast.observations.query_criteria and astroquery.mast.observations.get_product_list. So this may be representative of future concerns if we expand to run operations on more columns and make our tables more generic than in current lightkurve.
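To illustrate the mechanism described above, here is a minimal sketch of how an outer pd.concat() introduces NaNs. The column names and values are hypothetical stand-ins for the query_criteria / get_product_list results, not the actual table schema:

```python
import pandas as pd

# Hypothetical stand-ins for two query results with different columns.
obs = pd.DataFrame({"obs_id": ["a", "b"], "target": ["T1", "T2"]})
products = pd.DataFrame({"obs_id": ["a"], "dataURI": ["mast:TESS/a.fits"]})

# An outer concat keeps every column from both tables, but any row
# missing a column is filled with NaN.
table = pd.concat([obs, products], join="outer", ignore_index=True)

# Two of the three rows now carry a NaN dataURI, which is what breaks
# downstream consumers that expect a string.
print(table["dataURI"].isna().sum())
```

Passing `table["dataURI"]` on to a downstream call without filtering is exactly the failure mode above: the NaN rows reach code that expects a URI string.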
I think the NaNs should be handled on a case-by-case basis. Where they break functionality, either the NaNs should be filled appropriately for that specific application, or, if there's no appropriate substitute, the affected rows should be dropped and the available results returned, with a warning about the data that didn't return results.
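The drop-and-warn branch of that strategy might look something like the sketch below. The helper name `valid_uris` is hypothetical, and it assumes a `dataURI` column like the one that broke `get_cloud_uris`:

```python
import warnings

import pandas as pd

def valid_uris(table: pd.DataFrame) -> list:
    """Return the non-NaN dataURI values, warning about dropped rows.

    Hypothetical sketch of the case-by-case handling: drop rows whose
    dataURI is missing rather than pass NaN to a downstream call.
    """
    missing = table["dataURI"].isna()
    if missing.any():
        warnings.warn(
            f"Skipping {missing.sum()} row(s) with no dataURI; "
            "no results will be returned for them."
        )
    return table.loc[~missing, "dataURI"].tolist()

example = pd.DataFrame({"dataURI": ["mast:TESS/a.fits", None]})
print(valid_uris(example))  # the NaN row is dropped with a warning
```

The fill variant would be analogous, replacing the drop with a column-specific `fillna()` value that the consuming function can tolerate.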