Error attempting to pull meteorite data from NASA #3

Open
simonw opened this issue Feb 25, 2019 · 2 comments
Labels
bug Something isn't working

Comments

simonw commented Feb 25, 2019

https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh

/tmp $ socrata2sql insert data.nasa.gov gh4g-9sfh
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.

Connecting to database
  ▶ Using default SQLite database "sqlite:///meteorite_landings.sqlite".
  ▶ Query "SELECT PostGIS_version();" failed. Geometry columns will be skipped.

Setting up new table, "meteorite_landings", from Socrata API fields
  ▶ "geolocation" is a location column but your database doesn't support PostGIS so it'll be skipped.
  ▶ Loading from API ◉◉◉◉◉◉◉◉◉◉◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯◯ 32%Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/socrata2sql/parsers.py", line 11, in parse_datetime
    return datetime.strptime(str_val, "%Y-%m-%dT%H:%M:%S.%f")
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
    (data_string, format))
ValueError: time data '-0300-01-01T00:00:00' does not match format '%Y-%m-%dT%H:%M:%S.%f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/socrata2sql", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/socrata2sql/cli.py", line 299, in main
    to_insert.append(Binding(**parse_row(row, Binding)))
  File "/usr/local/lib/python3.7/site-packages/socrata2sql/cli.py", line 258, in parse_row
    parsed[col_name] = parsers[mapper_col_type](col_val)
  File "/usr/local/lib/python3.7/site-packages/socrata2sql/parsers.py", line 14, in parse_datetime
    return datetime.strptime(str_val, "%Y-%m-%dT%H:%M:%S")
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
    (data_string, format))
ValueError: time data '-0300-01-01T00:00:00' does not match format '%Y-%m-%dT%H:%M:%S'

It looks like this is due to invalid data, but it would be nice if socrata2sql could either show a nicer error message or have an option to report and skip invalid rows.
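To make the "report and skip" idea concrete, here is a rough sketch. It is not socrata2sql's code: `parse_rows_skipping_invalid` and its arguments are hypothetical names, and `parse_datetime` only mirrors the two format strings visible in the traceback above.

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)


def parse_datetime(str_val):
    """Try the two timestamp formats that appear in the traceback."""
    try:
        return datetime.strptime(str_val, "%Y-%m-%dT%H:%M:%S.%f")
    except ValueError:
        return datetime.strptime(str_val, "%Y-%m-%dT%H:%M:%S")


def parse_rows_skipping_invalid(rows, column):
    """Hypothetical helper: parse `column` in each row, skipping bad rows.

    Returns (parsed_rows, skipped), where skipped holds (index, row) pairs
    so they can be reported after the import finishes.
    """
    parsed, skipped = [], []
    for index, row in enumerate(rows):
        try:
            parsed.append({**row, column: parse_datetime(row[column])})
        except ValueError as err:
            # Report the problem and carry on instead of aborting the import.
            logger.warning("Skipping row %d: %s", index, err)
            skipped.append((index, row))
    return parsed, skipped
```

With something like this, the import would finish and the skipped rows could be listed at the end rather than surfacing as a traceback at 32%.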

Contributor

achavez commented Feb 25, 2019

I went down a rabbit hole on this one and I'm a bit stumped as to how to handle this ...

I think this is actually good data (e.g. https://www.lpi.usra.edu/meteor/metbull.php?code=24259), and it appears that Socrata validates it on upload, so we ought to be able to expect one of their two documented timestamp formats. Unfortunately, Python's standard-library datetime doesn't support anything before year 1. The database backends do support dates with negative years, but SQLAlchemy requires a datetime from the standard library, so there's no easy way to get one into the DB via SQLAlchemy.
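A quick way to see the standard-library limit for yourself. This is only an illustration; `fits_in_datetime` is a made-up helper name, not part of any library.

```python
from datetime import MAXYEAR, MINYEAR, datetime


def fits_in_datetime(year):
    """Return True if `year` is within the range datetime can represent."""
    return MINYEAR <= year <= MAXYEAR


# The meteorite row that failed has year -300, which is below MINYEAR.
print(MINYEAR, fits_in_datetime(-300), fits_in_datetime(1880))

try:
    datetime(-300, 1, 1)
except ValueError as err:
    # The constructor rejects the year before any parsing is involved.
    print("datetime() refused:", err)
```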

So I'm tempted to import them as NULLs as the best worst option for now and raise a loud warning in the console. Thoughts?
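Roughly what that could look like, as a sketch rather than the actual fix: `parse_datetime_or_null` and the `NEGATIVE_YEAR` pattern are hypothetical, and the pattern assumes negative years arrive in the `-0300-01-01T00:00:00` shape seen in the traceback.

```python
import re
import warnings
from datetime import datetime

# The two formats tried in the traceback, in the same order.
FORMATS = ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S")

# Assumed shape of a pre-year-1 timestamp, e.g. "-0300-01-01T00:00:00".
NEGATIVE_YEAR = re.compile(r"^-\d{4}-\d{2}-\d{2}T")


def parse_datetime_or_null(str_val):
    """Parse a timestamp; return None (NULL) with a warning for negative years."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(str_val, fmt)
        except ValueError:
            continue
    if NEGATIVE_YEAR.match(str_val):
        warnings.warn(
            f"{str_val!r} is before year 1 and cannot be stored as a "
            "Python datetime; importing it as NULL."
        )
        return None
    # Anything else is still treated as genuinely malformed input.
    raise ValueError(f"time data {str_val!r} does not match a known format")
```

Values that are neither parseable nor negative-year still raise, so real data problems would not be silently turned into NULLs.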

achavez added the bug label Feb 25, 2019
Author

simonw commented Mar 10, 2019

Oh that's nasty! It's a real date but it's prior to 1 AD... yeah, I guess NULL with a warning is the best option then.
