-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Type hints #203
base: main
Are you sure you want to change the base?
Type hints #203
Conversation
One quick note: |
|
Absolutely, you can copy the section from tmt's pre-commit configuration, it served there well. |
# JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None | ||
DataType: TypeAlias = Union[RawDataType, ListDataType, DictDataType] | ||
TreeData: TypeAlias = dict[str, DataType] | ||
TreeDataPath: TypeAlias = Union[TreeData, str] # Either TreeData or path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the naming can be confusing here between TreeData
, TreeDataPath
and DataType
. Any suggestions?
MyPy is up and there are issues to resolve |
All tests are green here right now. Any review please? |
fmf/base.py
Outdated
return | ||
# Use the special merge for merging dictionaries | ||
data_val = data[key] | ||
if type(data_val) == type(value) == dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the original code, but this is not the pythonic way of checking object classes. if isinstance(data_val, dict) and isinstance(value, dict)
will work better, and supportes dict
subclasses - there's no reason to not allow e.g. OrderedDict
to be the source for a node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about str
, do we hard-check str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which condition? I see elif type(data[key]) == type(value) == type(""):
a couple of lines below, here I would also use isinstance
: elif isinstance(data[key], str) and isinstance(value, str)
- is it this one you're asking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in those kind of situations. I think it is better to hard-check for str
type in those cases, and implement a base type if we want other merge-able string subtypes. Dict one, could also get a similar treatment eventually, but I can see a benefit for changing the type there for the time being (i.e. before converting all raw dictionary types to attrs
dataclasses that would include the adjust
and other fmf
directives).
But on that note, do you have a small snippet to confirm that isinstance
works with OrderedDict
and Mapping
types? I don't think it would work with the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in those kind of situations. I think it is better to hard-check for
str
type in those cases, and implement a base type if we want other merge-able string subtypes. Dict one, could also get a similar treatment eventually, but I can see a benefit for changing the type there for the time being (i.e. before converting all raw dictionary types toattrs
dataclasses that would include theadjust
and otherfmf
directives).
IUIC, a hard check would be type(value) == type('')
, but what would be the benefit over isinstance(value, str)
? I'm not sure we understand each other :)
But on that note, do you have a small snippet to confirm that
isinstance
works withOrderedDict
andMapping
types? I don't think it would work with the latter.
isinstance(OrderedDict(), dict)
is true, isinstance(AMappingSubclass(), dict)
is not. isinstance(AMappingSubclass(), (dict, collections.abc.Mapping))
should cover this base class as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IUIC, a hard check would be
type(value) == type('')
, but what would be the benefit overisinstance(value, str)
? I'm not sure we understand each other :)
Yes, though I prefer type(value) == str
. That would be to enforce we are only checking for str
. Mostly as a sanity check since I don't think we should be calling _mergeplus()
as data[key] += value
in those cases, but instead check for specialized overloads.
isinstance(OrderedDict(), dict)
is true,isinstance(AMappingSubclass(), dict)
is not.isinstance(AMappingSubclass(), (dict, collections.abc.Mapping))
should cover this base class as well.
Hmm, how about if I implement the general case for isinstance(value)
, but keep type(data_val) == dict
for sanity check until we implement attrs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IUIC, a hard check would be
type(value) == type('')
, but what would be the benefit overisinstance(value, str)
? I'm not sure we understand each other :)Yes, though I prefer
type(value) == str
. That would be to enforce we are only checking forstr
. Mostly as a sanity check since I don't think we should be calling_mergeplus()
asdata[key] += value
in those cases, but instead check for specialized overloads.
isinstance(OrderedDict(), dict)
is true,isinstance(AMappingSubclass(), dict)
is not.isinstance(AMappingSubclass(), (dict, collections.abc.Mapping))
should cover this base class as well.Hmm, how about if I implement the general case for
isinstance(value)
, but keeptype(data_val) == dict
for sanity check until we implementattrs
?
But that's what I don't understand: why should we enforce str
, or dict
, exactly? :) Why bother? As long as value
is an instance of str
(or one of its subclasses), it is string-like enough for our use. We can measure its length, we can call its join()
, and we can add it to another string with +=
.
I see no benefit in sticking to the exact type only, TBH. From my POV, if isinstance(value, (dict, Mapping))
is true, then value
is simply a dictionary, maybe in a trenchcoat, but it still can be iterated over. We perform no black magic with dict
internals - we copy keys and values between dictionaries, and from time to time we call get()
, that's all :) It's actually quite common, BTW, e.g. depending on the parser type, ruamel.yaml
may materialize a YAML file as a pile of OrderedDict
instances - by allowing dict
only, we would disqualify otherwise perfectly fine dict
-like objects. If it walks like a dict
, if it quacks like a dict
, then we can merge it with another dict
-like object.
The same applies to list
checks, right? We don't inspect list
type internals, all we do is stuff like data[key] = [item for item in data[key] if item not in value]
- as long as value
is list-like, it should be absolutely acceptable.
I don't understand what issues would be solved by allowing the exact type only, or what issues would it prevent. On the other hand, checking for being list-like, dict-like, or string-like opens the door to a wider array of compatible input types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About data_val
, it will always be a dict/list/str, etc., unless we open the yaml to beyond safe_loader
, at which point ambiguity enters (e.g. the original type might no longer be a dict due to a yaml tag, but with fmf
we want to add to the constructor). If we keep it strict on data_val
, we would at least catch any issues if something else altered the yaml loader or anything else.
For value
indeed they have the same interface so the operations will allow us to do the operations. It's just that we are assuming it is what the author of the dict-like, str-like object intends. I think I have an example of a possible confusion in this same PR.
Consider Tree
. It has a dict-like structure through __getitem__
, get()
, etc., but how I have it implemented is that keys
, values
, etc. are wrappers to Tree.data
(because it is easier to implement and because child
is more natural to access using file-api), while additional child
or nested items are gathered using __getitem__
and get()
. This could create a confusion when using _merge_plus()
.
For the time being though and until there's a plugin system to add more specialized rules for this, it would be helpful to add the isinstance
checks for value
. But I'm not sure how to keep track or denote the places where we might want to change it to have more specialized rules. Due to type-hinting there are quite alot of isinstance
that we would have to filter through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider
Tree
. It has a dict-like structure through__getitem__
,get()
, etc., but how I have it implemented is thatkeys
,values
, etc. are wrappers toTree.data
(because it is easier to implement and becausechild
is more natural to access using file-api), while additionalchild
or nested items are gathered using__getitem__
andget()
. This could create a confusion when using_merge_plus()
.
I think this example goes further than I wanted: yes, Tree
might look similar to a dictionary, it does have some methods with similar semantics, but it's not a dict
subclass. Therefore it would not pass the isinstance(value, dict)
test, and our code would not try to combine it with other values.
My point was to allow subclasses of dict
or list
, e.g. ruamel.yaml.comments.CommentedMap
, not classes that look like dict
or list
, i.e. class tree, not class API. To replace type(data_val) == type(value) == dict
with isinstance(data_val, dict) and isinstance(value, dict)
- this would not open the door to dict-like classes like Tree
. If by any chance a Tree
instance would be passed to _merge_plus()
, I would expect none of these tests to match, and such a value
would be reported as something fmf does not know how to merge with the existing tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about isinstance(value, (dict, Mapping))
? I think it's useful for Tree
to have type-hint of Mapping for further down the line, e.g. if it's used in a generic map input.
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
Signed-off-by: Cristian Le <[email protected]>
I don't like that the data dict type is an
Any
, but this should workTree
Context
filter