Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date/Timestamp SQL Insert format #10

Open
gallejesus opened this issue Dec 26, 2023 · 5 comments
Open

Date/Timestamp SQL Insert format #10

gallejesus opened this issue Dec 26, 2023 · 5 comments

Comments

@gallejesus
Copy link

Hi!

I'm having issues using target when inserting Date and Timestamp format. I believe the target is not overwriting the SQL for insrts and the SQL statement singer is building is of this format:

f"""\
INSERT INTO {full_table_name}
({", ".join(property_names)})
VALUES ({", ".join([f":{name}" for name in property_names])})
"""

Which for oracle fails as it is inseting dates with plain strings not casting to dates o adding the date keyword before the str.

Am I missing some config to tune this insert statement?

Thank you!

@radbrt
Copy link
Owner

radbrt commented Dec 28, 2023

Hi @gallejesus, you are surely right. I haven't tested this on dates yet, I can take a look at it in the next few days.

@radbrt
Copy link
Owner

radbrt commented Dec 30, 2023

@gallejesus I tried to replicate this locally without success - meaning that without any changes to the code, my date column in the postgres source was written as a date to Oracle. If you have anything to help me replicate it, let me know and I can hopefully identify the problem. The output from the postgres tap was as follows (singer message records):

{"type": "STATE", "value": {}}
{"type": "SCHEMA", "stream": "public-yay_datea", "schema": {"properties": {"xid": {"type": ["integer", "null"]}, "val": {"type": ["string", "null"]}, "dt": {"format": "date", "type": ["string", "null"]}}, "type": "object"}, "key_properties": []}
{"type": "RECORD", "stream": "public-yay_datea", "record": {"xid": 1, "val": "YAY", "dt": "2023-01-01"}, "time_extracted": "2023-12-28T14:31:00.035630+00:00"}
{"type": "RECORD", "stream": "public-yay_datea", "record": {"xid": 2, "val": "HAY", "dt": "2023-01-01"}, "time_extracted": "2023-12-28T14:31:00.036644+00:00"}
{"type": "RECORD", "stream": "public-yay_datea", "record": {"xid": 3, "val": "WAY", "dt": "2023-01-01"}, "time_extracted": "2023-12-28T14:31:00.036749+00:00"}
{"type": "STATE", "value": {"bookmarks": {"public-yay_datea": {}}}}

Note the "format": "date" key in the schema line. I haven't tested it with datetime/timestamp.

The result in Oracle:
date_column_result

You can test this yourself by saving the singer records to a file and run cat <singer_records.jsonl> | meltano invoke target-oracle.

@gallejesus
Copy link
Author

Hi @radbrt,

Yes, I can replicate your case and it works fine, so there must be something with how records are extracted and passed into the target.
Originally I was using Clickhouse as origin database using the extractor from CH, but I can replicate it with postgres as well.

So I'm running postgres and oracle (free version) in docker containers:

docker run -d --network host --name oracle container-registry.oracle.com/database/free:latest
docker run -d --network host -e POSTGRES_PASSWORD=pwd postgres

And this is my meltano.yml file:

version: 1
default_environment: dev
project_id: bac4eb1b-c776-4cfc-a62c-e081fb552a56
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-postgres
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
    config:
      batch_config:
        encoding:
          compression: gzip
          format: jsonl
        storage:
          prefix: ''
          root: /tmp
      database: postgres
      host: localhost
      user: postgres
    select:
    - public-YAY_DATEA.*
  loaders:
  - name: target-oracle
    variant: radbrt
    pip_url: git+https://github.com/radbrt/target-oracle.git
    config:
      database: FREE
      host: localhost
      port: '1521'
      username: system
      sqlalchemy_url: oracle+cx_oracle://system:oracle@localhost:1521/FREE
      prefer_float_over_numeric: false

The table in postgres has been created using this DDL and some examples inserts:

CREATE TABLE public."YAY_DATEA" (
    "XID" INTEGER,
    "VAL" VARCHAR(4000),
    "DT" DATE
);

INSERT INTO public."YAY_DATEA" ("XID", "VAL", "DT")
VALUES
    (1, 'Value1', '2024-01-02'),
    (2, 'Value2', '2024-01-03'),
    (3, 'Value3', '2024-01-04');

Executing this pipeline in this way:

meltano run tap-postgres target-oracle

This is the error message I'm getting (that is the one refering about the format for the dates):

2024-01-02T11:49:21.198073Z [info     ]     cursor.executemany(statement, parameters) cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.198482Z [info     ] sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-01861: literal does not match format string cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.198882Z [info     ] [SQL: INSERT INTO yay_datea    cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.199306Z [info     ] (xid, val, dt)                 cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.199748Z [info     ] VALUES (:xid, :val, :dt)]      cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.200139Z [info     ] [parameters: [{'xid': 1, 'val': 'Value1', 'dt': '2024-01-02'}, {'xid': 2, 'val': 'Value2', 'dt': '2024-01-03'}, {'xid': 3, 'val': 'Value3', 'dt': '2024-01-04'}]] cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.200509Z [info     ] (Background on this error at: https://sqlalche.me/e/14/4xp6) cmd_type=elb consumer=True name=target-oracle producer=False stdio=stderr string_id=target-oracle
2024-01-02T11:49:21.327667Z [error    ] Loader failed
2024-01-02T11:49:21.328221Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False

Let me know if with this you can replicate the scenario and get the same issue. Thank you!

@radbrt
Copy link
Owner

radbrt commented Jan 2, 2024

Thank you @gallejesus. I was able to replicate it now. The error message is indeed very confusing and I haven't gotten to the bottom of it yet, but it seems to be related to batch messages. If I removed the batch config on the tap, it did not trigger an error. I'll look into it more, batch messages are great, but I have no idea how long it will take to fix this.

@gallejesus
Copy link
Author

Yes! It is true it works without batching, I can work with this and continue playing with it.

If I take a deeper look and find something that may be helpful to fix it I'll let you know here. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants