
bug: JSON serialization of date types causes schema validation to fail in target-postgres #1831

Closed
dluftspring opened this issue Jul 11, 2023 · 8 comments · Fixed by #2580
Labels
kind/Bug Something isn't working valuestream/SDK

Comments


dluftspring commented Jul 11, 2023

Singer SDK Version

0.30.0

Meltano version 2.19.1

Is this a regression?

  • Yes

Python Version

3.10

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

Linux, Mac OS

Description

When syncing data between tap-snowflake and target-postgres, this line causes JSON schema validation to fail: the JSON schema inferred from Snowflake declares the column as a date, but this code coerces the value into a datetime.

To summarize

  • Column is of type date in Snowflake
  • Target column is of type date in Postgres
  • Because of the type conformance code, schema validation fails in the target: it expects a value matching type: string, format: date but instead receives one conforming to type: string, format: date-time.

Why is that extra timestamp string being added on?

Additionally, you can verify the problem by syncing any Snowflake date column into target-jsonl and observing that it is serialized as a timestamp instead of a plain date, even though

json.dumps(datetime.date(<year>,<month>,<day>).isoformat())

produces perfectly valid JSON. There's probably a reason that little string is added where it is, but it does cause some problems for Meltano.
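For illustration, here is a minimal, standalone check (not SDK code; the date value and column schema are taken from the log below) showing that a bare ISO date round-trips through JSON and passes validation, while the conformed value does not:

import datetime
import json

import jsonschema

# Column schema exactly as reported in the target-postgres log below.
schema = {"format": "date", "type": ["string", "null"]}

# A plain ISO date is perfectly valid JSON and a valid 'date'.
plain_date = json.loads(json.dumps(datetime.date(2019, 8, 31).isoformat()))
jsonschema.validate(plain_date, schema, format_checker=jsonschema.FormatChecker())

# The value the SDK actually emits carries a time component and is rejected:
# jsonschema.exceptions.ValidationError: '2019-08-31T00:00:00+00:00' is not a 'date'
jsonschema.validate(
    "2019-08-31T00:00:00+00:00", schema, format_checker=jsonschema.FormatChecker()
)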

Code

2023-07-10T09:25:22.412171Z [info     ] jsonschema.exceptions.ValidationError: '2019-08-31T00:00:00+00:00' is not a 'date' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412300Z [info     ]                                cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412421Z [info     ] Failed validating 'format' in schema['properties']['expiration_date']: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412552Z [info     ]     {'format': 'date', 'type': ['string', 'null']} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412667Z [info     ]                                cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412781Z [info     ] On instance['expiration_date']: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412895Z [info     ]     '2019-08-31T00:00:00+00:00' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.476401Z [error    ] Loader failed
2023-07-10T09:25:22.476859Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False
dluftspring added the kind/Bug and valuestream/SDK labels on Jul 11, 2023
@tayloramurphy
Collaborator

@edgarrmondragon is this related / similar to the Decimal bugs we were seeing?

@edgarrmondragon
Collaborator

> @edgarrmondragon is this related / similar to the Decimal bugs we were seeing?

@tayloramurphy nope, this is a decision made very early in the SDK's history to append T00:00:00+00:00 to all date values:

return f"{elem.isoformat()}T00:00:00+00:00"

I don't recall why that's the case, and I don't know if removing it would result in a breaking change for any taps/targets.

@tayloramurphy
Collaborator

@edgarrmondragon this is a real Chesterton's fence situation... it doesn't really make sense to me why it would coerce a date to a datetime, particularly when those are distinct types. Thoughts on what a possible mitigation would be? Should this be configurable by the end user?

@visch
Contributor

visch commented Jul 12, 2023

https://github.com/MeltanoLabs/tap-postgres/pull/172/files One possible implementation to fix it :D

@pnadolny13
Contributor

@tayloramurphy @edgarrmondragon I also just ran into this when using tap-snowflake and created MeltanoLabs/tap-snowflake#26 before I knew this issue existed. It looks like this is the old MR that originally added it: https://gitlab.com/meltano/sdk/-/merge_requests/42, but it's a giant release MR so it's hard to grok.

@edgarrmondragon
Collaborator

I think a reasonable solution inspired by Derek's implementation would be to encapsulate all the conforming methods into a dedicated class, add it to the base stream class, and allow tap developers to override the primitive-level conforming method.
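Sketching that idea (all class and method names here are hypothetical, not an agreed-on SDK API):

import datetime
import typing as t


class PropertyConformer:
    """Hypothetical home for the conforming logic."""

    def conform_primitive(self, value: t.Any) -> t.Any:
        # Check datetime before date: datetime.datetime subclasses datetime.date.
        if isinstance(value, datetime.datetime):
            return value.isoformat()
        if isinstance(value, datetime.date):
            # Current default behavior: dates become date-time strings.
            return f"{value.isoformat()}T00:00:00+00:00"
        return value


class DateAwareConformer(PropertyConformer):
    """A tap could override only the primitive-level method."""

    def conform_primitive(self, value: t.Any) -> t.Any:
        if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
            return value.isoformat()  # keep plain dates as plain dates
        return super().conform_primitive(value)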

@edgarrmondragon
Collaborator

I think I have a solution for this that

  1. Maintains backwards compatibility with whatever relied on this originally (Chesterton's fence)
  2. Makes type coercion easily extensible, so extractors can manage their own coercion logic

The implementation consists of refactoring _conform_primitive_property using functools.singledispatch.
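Roughly along these lines (a sketch of the idea only; the eventual SDK implementation may differ in names and signatures):

import datetime
import functools
import typing as t


@functools.singledispatch
def conform_primitive_property(value: t.Any) -> t.Any:
    """Fallback: pass unrecognized primitives through unchanged."""
    return value


@conform_primitive_property.register
def _(value: datetime.datetime) -> str:
    return value.isoformat()


@conform_primitive_property.register
def _(value: datetime.date) -> str:
    # Default keeps the historical behavior (point 1 above); singledispatch
    # resolves to the most specific registered type, so datetime.datetime
    # values still use the handler above even though it subclasses date.
    return f"{value.isoformat()}T00:00:00+00:00"


# Point 2: an extractor that wants real dates can re-register its own handler,
# which replaces the default one for that type.
conform_primitive_property.register(datetime.date, lambda value: value.isoformat())

print(conform_primitive_property(datetime.date(2019, 8, 31)))  # 2019-08-31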

@nidhi-akkio

I ran into this issue as we are trying to migrate our integrations at Akkio to Meltano. While migrating Snowflake, I got an error when loading from tap-snowflake into target-s3, since a date type is expected. For now I am considering a patch similar to https://github.com/MeltanoLabs/tap-postgres/pull/172/files, where we override _conform_primitive_property.
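If anyone else needs a stopgap, an override along these lines may work. This is only a rough sketch: the helper is a private SDK function, its module path is assumed to be singer_sdk.helpers._typing, and extra arguments are forwarded blindly because its exact signature varies across SDK versions; the linked tap-postgres PR takes a similar approach.

import datetime
import typing as t

from singer_sdk.helpers import _typing as sdk_typing

_original_conform = sdk_typing._conform_primitive_property


def _conform_keeping_dates(elem: t.Any, *args: t.Any, **kwargs: t.Any) -> t.Any:
    # Keep plain dates as plain ISO dates; defer everything else to the SDK.
    if isinstance(elem, datetime.date) and not isinstance(elem, datetime.datetime):
        return elem.isoformat()
    return _original_conform(elem, *args, **kwargs)


# Patch the private helper before the tap starts syncing records.
sdk_typing._conform_primitive_property = _conform_keeping_dates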
