Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type hints to the generic database handler #1858

Merged

Conversation

DavidMStraub
Copy link
Member

This is work in progress adding type hints to gramps.gen and defining Gramps-specific types in the process.

@dsblank
Copy link
Member

dsblank commented Jan 22, 2025

@DavidMStraub, one complication is that when these methods are overloaded in a proxy (like LivingProxyDb, or PrivateProxyDb) then a method might return None. This is the number one issue in bugs throughout the code base: we don't check to see if the return value is None and then we end up with crashes.

I think with the correct type hints, we could catch such errors.

@DavidMStraub
Copy link
Member Author

Exactly! I have been bitten by this before. I suspect there are similar, more hidden cases.

@DavidMStraub DavidMStraub force-pushed the types_generic_get_from_handle branch from 2b130b1 to 2508db8 Compare January 26, 2025 13:53
@dsblank dsblank mentioned this pull request Jan 30, 2025
@kulath
Copy link
Member

kulath commented Jan 31, 2025

@DavidMStraub In Use orjson + encoder @dsblank suggests something like:

@singledispatch
def data_to_object(data):
    print("Default implementation for unknown type:", type(data))

@data_to_object.register(PersonData)
def _(data):
    obj: Person = data_to_object(data)
    return obj

@data_to_object.register(FamilyData)
def _(data):
    obj: Family = data_to_object(data)
    return obj

If I were to make a call like
myfamily = data_to_object(myfamilydata)
that would be correct.

But if I were to call it with
myperson = data_to_object(myfamilydata)

assuming myperson is an object of type Person
would the error be reported by mypy?

@DavidMStraub
Copy link
Member Author

No, this will not work and not throw an error. The correct way to tell the type checker would be using overload

from typing import overload

def data_to_object(data):
    # actual implementation here

@overload
def data_to_object(data: PersonData) -> Person:
    ...

@overload
def data_to_object(data: FamilyData) -> Family:
    ...

Possibly this could be combined with singledispatch, but then it's getting complicated.

@DavidMStraub
Copy link
Member Author

But the question is what you are trying to achieve. I think it boils down to type narrowing for a factory method. So I would do:

from typing import TypeVar, Type

T = TypeVar("T", bound="BasicPrimaryObject")

class BasicPrimaryObject:
    @classmethod
    def from_data(cls: Type[T], data) -> T:
        raise NotImplementedError


class Person(BasicPrimaryObject):
    @classmethod
    def from_data(cls: Type["Person"], data: PersonData) -> "Person":
        # implement


class Family(BasicPrimaryObject):
    @classmethod
    def from_data(cls: Type["Family"], data: FamilyData) -> "Family":
        # implement

Now, Person.from_data(some_family_data) will throw an error during type checking. Note that I've intentionally left data unannotated on the base class to allow the child classes to define specific types.

@kulath
Copy link
Member

kulath commented Jan 31, 2025

What I am trying to achieve is clarity and explicitness and type checking in the code, like there is in gen/db/generic get_person_from_handle. I would much prefer replicated get_person_from_data, consistently like all the other get_foo_from_bar in gen/db/generic.

I wanted to understand whether the single dispatch approach would work, because that is being advocated elsewhere, but I don't like it because it is not as clear and explicit as get_person-from_data etc.

I would write the overload approach as myfamily = get_object_from_data(somefamilydata), but this does not make it explicit in the call that we know that the type of object is Family..

As far as the Family.get_object_from_data(somefamilydata) approach is concerned, AIUI this would distribute the code among the different primary objects, which may not be as desirable as having the db related code in one place. And it is not as consistent as myfamily = get_object_from_data(somefamilydata) for db conversion.

Overall though presumably there is some overhead in having Python resolve which method to apply, as opposed to writing in the call which method to use!

@DavidMStraub
Copy link
Member Author

DavidMStraub commented Jan 31, 2025

What I am trying to achieve is clarity and explicitness and type checking in the code, like there is in gen/db/generic get_person_from_handle. I would much prefer replicated get_person_from_data, consistently like all the other get_foo_from_bar in gen/db/generic.

I am with you on this. Explicit is better than implicit, and if something is difficult to type-hint, it probably means that it's not explicit enough.

singledispatch is useful when you have a function that has different implementations based on the input type, but have the same output type.

But an alternative to having separate functions is to have class methods like in my example. These are just factory methods, so it's natural they live with the classes.

Overall though presumably there is some overhead in having Python resolve which method to apply, as opposed to writing in the call which method to use!

That is something I would absolutely not worry about. Let's have a consistent and maintainable design. This will not be the speed bottleneck.

@DavidMStraub DavidMStraub marked this pull request as ready for review January 31, 2025 11:00
@DavidMStraub
Copy link
Member Author

I marked this PR as ready for review - I think we should merge our type hints bit by bit, otherwise we will run into the issue again that we have a huge PR that will take a long time to merge and will be difficult to disentangle from other changes.

There is no problem with having only some modules annotated, as long as the type checker passes (which is assured by CI).

@Nick-Hall
Copy link
Member

What I am trying to achieve is clarity and explicitness and type checking in the code, like there is in gen/db/generic get_person_from_handle. I would much prefer replicated get_person_from_data, consistently like all the other get_foo_from_bar in gen/db/generic.

The get_person_from_handle and associated methods actually retrieve objects from the database. There is one method for each primary/table object.

A get_person_from_data method would be equivalent to one of our existing unserialize methods. In this case, an object is created from data already retrieved from the database. Secondary objects can also be created from data.

We could consider creating to_data/from_data methods as newer forms of the serialize/unserialize methods. There is already a create class method which could be updated to use from_data.

@Nick-Hall Nick-Hall added this to the v6.0 milestone Feb 1, 2025
@Nick-Hall Nick-Hall force-pushed the types_generic_get_from_handle branch from 12c3e9a to b71e3f2 Compare February 2, 2025 18:01
@Nick-Hall Nick-Hall changed the title [WIP] Add type hints to gramps.gen Add type hints to the generic database handler Feb 2, 2025
@Nick-Hall Nick-Hall merged commit b71e3f2 into gramps-project:master Feb 2, 2025
2 checks passed
@kulath kulath mentioned this pull request Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants