Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allows AnyUrl pydantic types #2286

Open
wants to merge 4 commits into
base: devel
Choose a base branch
from
Open

Conversation

joscha
Copy link
Contributor

@joscha joscha commented Feb 11, 2025

Without this change, serializing a model with AnyUrl fails.

A sample pydantic model would be:

class Resources(BaseModel):
    website: AnyUrl | None = None
    """
    Website URL.
    """

The error with such an entity yielded from a resource looks like this:

TypeError: Type is not JSON serializable: AnyUrl

Details:

    self._extract_single_source(
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/extract/extract.py", line 349, in _extract_single_source
    with self.manage_writers(load_id, source):
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nix/store/ss2y42j0bjh7j2vlnxys606r3wz9r931-python3-3.12.8-env/lib/python3.12/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/extract/extract.py", line 387, in manage_writers
    self.extract_storage.close_writers(load_id)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/extract/storage.py", line 78, in close_writers
    storage.close_writers(load_id, skip_flush=skip_flush)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/storages/data_item_storage.py", line 85, in close_writers
    writer.close(skip_flush=skip_flush)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/data_writers/buffered.py", line 176, in close
    self._flush_and_close_file(skip_flush=skip_flush)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/data_writers/buffered.py", line 260, in _flush_and_close_file
    self._flush_items(allow_empty_file)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/data_writers/buffered.py", line 250, in _flush_items
    self._writer.write_data(self._buffered_items)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/data_writers/writers.py", line 184, in write_data
    json.typed_dump(items, self._f)
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/json/_orjson.py", line 32, in typed_dump
    fp.write(typed_dumpb(obj, pretty=pretty))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/json/_orjson.py", line 36, in typed_dumpb
    return _dumps(obj, sort_keys, pretty, custom_pua_encode, orjson.OPT_PASSTHROUGH_DATETIME)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/joscha/dev/dlt-source-morphais/.devenv/state/venv/lib/python3.12/site-packages/dlt/common/json/_orjson.py", line 24, in _dumps
    return orjson.dumps(obj, default=default, option=options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Type is not JSON serializable: AnyUrl

Please let me know if this is an acceptable fix, then I will add a test

Without this change, serializing a model with `AnyUrl` fails
Copy link

netlify bot commented Feb 11, 2025

Deploy Preview for dlt-hub-docs canceled.

Name Link
🔨 Latest commit fadd8e6
🔍 Latest deploy log https://app.netlify.com/sites/dlt-hub-docs/deploys/67ac7ff8e674d900085a1390

dlt/common/json/__init__.py Outdated Show resolved Hide resolved
try:
from pydantic import AnyUrl as PydanticAnyUrl
except ImportError:
PydanticAnyUrl = None # type: ignore[misc]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cargo-culted this from above - I am assuming this is for the case when the Pydantic classes are not available and meant to gracefully degrade?

dlt/common/json/__init__.py Outdated Show resolved Hide resolved
@rudolfix rudolfix self-requested a review February 13, 2025 16:05
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are too many different types that we could add this way. my take for this PR would be to add pluggable custom_encde_extension that is NULL by default and that you can set in json module if you need special types to be serialized. so you could add your special type handling you just plug in your encoder. such thing we are happy to merge. so

  1. at the end of cutom_encode call custom_encde_extension (if not None) and return wha it returns or if it returns None, raise TypeError like previously
  2. add tests along other json tests

Also mind that maybe this is not the place you really want to plug your encoders.

between extract and normalize (where all weird types happen) we use custom_pua_encode so 90% you want to plug your encoder there. lmk. glad to help

@joscha
Copy link
Contributor Author

joscha commented Feb 14, 2025

there are too many different types that we could add this way. my take for this PR would be to add pluggable custom_encde_extension that is NULL by default and that you can set in json module if you need special types to be serialized. so you could add your special type handling you just plug in your encoder. such thing we are happy to merge. so

  1. at the end of cutom_encode call custom_encde_extension (if not None) and return wha it returns or if it returns None, raise TypeError like previously
  2. add tests along other json tests

Also mind that maybe this is not the place you really want to plug your encoders.

between extract and normalize (where all weird types happen) we use custom_pua_encode so 90% you want to plug your encoder there. lmk. glad to help

I think that makes sense. Do you want to give me an interface signature of your choosing with a 3 line example on how to use it and I'll implement it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants