Unable to write SUPER type column to Redshift using redshift.copy #3095

Open
duarteocarmo opened this issue Feb 14, 2025 · 6 comments · May be fixed by #3104
Labels
bug Something isn't working

Comments

@duarteocarmo

Describe the bug

Description

I'm having issues writing data to a Redshift table that contains a SUPER type column using awswrangler.redshift.copy. Even with serialize_to_json=True, the SUPER column is not handled correctly.

Environment

  • awswrangler version: 3.11.0
  • Python version: 3.12.4
  • Operating System: Darwin

Table Schema

CREATE TABLE IF NOT EXISTS bss_dv.free_text_translation
(
    md5_hash VARCHAR(32) NOT NULL ENCODE RAW,
    translation SUPER ENCODE RAW,
    translation_date TIMESTAMP WITHOUT TIME ZONE ENCODE az64
)
DISTSTYLE KEY
DISTKEY (md5_hash)
SORTKEY (md5_hash)

My function

import awswrangler as wr
import pandas as pd
import polars as pl


def write_table(
    table: pl.DataFrame,
    config: JobConfig,
    dest_table: str = "free_text_translation",
    dest_schema: str = "bss_dv",
) -> None:
    pdf = table.to_pandas()
    pdf["translation_date"] = pd.to_datetime(pdf["translation_date"])
    # example data from here

    with wr.redshift.connect(
        secret_id=config.REDSHIFT_SECRET_ID,
        dbname=config.REDSHIFT_DB,
        timeout=3600,
    ) as con:
        wr.redshift.copy(
            df=pdf,
            path=config.TEMP_BUCKET,
            con=con,
            table=dest_table,
            schema=dest_schema,
            mode="append",
            serialize_to_json=True,
        )

Example data:

{
    'md5_hash': '00066f9e57abf748924808004fc504a7',
    'translation': '{"text": "...", "source_language": "de", "target_language": "en", "success": true, "position": 0, "should_translate": true, "translated_text": "...", "message": null, "version": "2025-01-06", "text_len_chars": 235, "manual": false}',
    'translation_date': Timestamp('2025-02-14 13:00:59')
}

When I query the data in Redshift, the translation column comes back as a plain string rather than a SUPER JSON value.

How to Reproduce

See above.

Expected behavior

The column in Redshift should contain a SUPER JSON value.

Your project

No response

Screenshots

No response

OS

Mac

Python version

3.12.4

AWS SDK for pandas version

3.11.0

Additional context

No response

@duarteocarmo added the bug label Feb 14, 2025
@Rutuja2506

Rutuja2506 commented Feb 24, 2025

Hi @kukushking, could you please try explicitly serializing the 'translation' column as JSON using json.dumps? This ensures the column is in the correct format:

   # Explicitly serialize 'translation' column as JSON
   import json
   pdf["translation"] = pdf["translation"].apply(json.dumps)

@duarteocarmo
Author

@Rutuja2506 - that worked for me!

@kukushking removed their assignment Feb 24, 2025
@misteliy

@kukushking, could you please reopen this issue? The original problem persists as serialize_to_json=True is not functioning correctly.

To address this, please either:

1. Remove the option, if it's not feasible to fix at this time, or
2. Correct the implementation, so that it works as intended.

While we acknowledge the workaround is effective, it's not an ideal long-term solution.

Thanks!

@duarteocarmo reopened this Feb 24, 2025
@kukushking
Contributor

@misteliy @duarteocarmo what is the type of the column in your source data frame: object or a string?

One thing the code below ensures is that the column holds strings serialized to JSON, which can then be written into a SUPER type column.

   pdf["translation"] = pdf["translation"].apply(json.dumps)

@misteliy

Please have a look at the documentation:

[screenshot: serialize_to_json parameter documentation]

So there should be no need to serialize!

@kukushking
Contributor

kukushking commented Feb 27, 2025

@misteliy just to be clear, SERIALIZETOJSON only alters the COPY command so that it can load columns into a SUPER type column in Redshift from Parquet. The data in your dataframe, and in the resulting Parquet, must still be serialized.
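
For example, a minimal end-to-end sketch; the secret, database, and bucket names here are placeholders, not values from the issue:

   import json

   import awswrangler as wr
   import pandas as pd

   # Placeholder data modeled on the issue; the dict is not yet JSON.
   df = pd.DataFrame(
       {
           "md5_hash": ["00066f9e57abf748924808004fc504a7"],
           "translation": [{"success": True, "position": 0}],
           "translation_date": [pd.Timestamp("2025-02-14 13:00:59")],
       }
   )

   # SERIALIZETOJSON does not serialize for you: convert dicts to JSON strings first.
   df["translation"] = df["translation"].apply(json.dumps)

   with wr.redshift.connect(secret_id="my-secret", dbname="my-db") as con:
       wr.redshift.copy(
           df=df,
           path="s3://my-temp-bucket/staging/",
           con=con,
           table="free_text_translation",
           schema="bss_dv",
           mode="append",
           serialize_to_json=True,
       )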

I have added a test case and clarified the docs.
