-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variables cannot have a metadata: dict
attribute, unlike parameters
#1071
Comments
This would help us on the UK and US systems, where we'd like to use the metadata dictionary on variables in order to link them directly to PolicyEngine. Currently, I've just monkey-patched Core since it's a trivial change (here). I've just got a working implementation and test case for this on my fork - if it's agreeable, let me know and I'll file a PR. Thanks! |
Hi @nikhilwoodruff , and thank you for this contribution proposal 😃 . In terms of the feature itself, I'm at the same time happy to see @RamParameswaran proposed a pull request implementing it, and sad to see that I/we didn't give a timely review. @nikhilwoodruff Does #983 cover the same use-case / feature-set? Also, I see @RamParameswaran you mention a comeback problem: the tight coupling with the country-template for testing. Does it mean we have a "catch-22" problem (feature can't be —at least easily— tested before it being shipped)?. In terms of coherence, and I guess this applies both to this issue and to #983, @benjello @sandcha @MattiSG do you see any precaution to be attentive to? Contrary to parameter's, which's metadata are discreet, compatibility with both proposals for variables seems to require them to be rather arbitrary (the use-cases are different). So I wander if:
Please do not hesitate to chime in! |
Thanks @maukoquiroga - I hadn't noticed the already existing PR. #983 would cover the same use-case here for us. On your questions, from our point of view:
|
From my point of view, the point that needs to be fully defined before introducing the ability to add arbitrary metadata is how do we normalise, accept and expose them in shared codebases. I find the possibility of exploring use cases of metadata very attractive, but without a clear path towards normalisation, I am afraid it too easily leads to a situation where each contributor group adds their own data to the shared model and where some tools would rely on additional, nonstandard metadata, thus making them creep into other tax and benefit systems without being normalised. Of course, each tax and benefit system stays free to do what they want, but it is in my view crucial that Core and its tooling ecosystem minimise the risk of community fragmentation through forks. Arbitrary metadata, in my opinion, opens a high risk of de facto forks where some tax and benefit systems, especially at their infancy when they most often rely on one specific actor, would come to depend strongly on some metadata. #983 is in my view a good example of that: if legal references are given in the suggested (admittedly clean and appealing) Australian-specific format, then the Australian tax and benefit system is very likely to not use the standard While the two options (improving We had a glimpse of that already in France (see openfisca/openfisca-france#1618 and openfisca/openfisca-france#1672), and it takes a huge effort to normalise things after the fact. I'm not even done facilitating the process for France, and there will then be some RFC process to bring up to Core… We're probably talking about a total of 4-5 months to normalise 16 metadata entries. I would prefer that we have a clear path towards normalisation of metadata through RFCs rather than adding support for arbitrary metadata. As a second choice, I believe we need to be very clear about drawing the line between:
…and have a clear path towards normalisation for the exploratory ones and clear recommendations on vendor-specific metadata. |
As a reference, the
We will see when RFC openfisca/openfisca-france#1672 closes how many are normalised vs how many will have to be cleaned up to assess how beneficial the exploration offered by the |
Thanks for the insight of what's happening in France, @MattiSG. So, if understand correctly, metadata are being used sort of surreptitiously to accomplish two things:
So in the #983 example, if we apply a "domain-driven" lens, what we're actually adding are attributes, not just data —adding an attribute taken as adding a data model, or data schemas, explicit or implicitly—, we could for example model it as: from typing import Sequence, Any
from dataclasses import dataclass
from openfisca_core.periods import Instant
@dataclass
class ReferenceValidity:
commencement: Instant
expiry: Instant
def __post_init__(self) -> None:
self.validate_contract()
def validate_contract(self) -> None:
if self.commencement > self.expiry:
# fail
@dataclass
class Reference:
part: int
schedule: str
labels: Sequence[Any] # actual metadata
dates: ReferenceValidity Possible scenario 1Taken sous that lens —domain—, at first glimpse, the problem may seem unsolvable:
In both cases, or in any additional one, I cannot but see the risk of overt or surreptitious "ontological forking" remain constant, hence the problem we're trying to deal with. Possible scenario 2On the other hand, if we take a more "cuack-driven", or exposure-driven, approach, we may for example relax 1. and at the same time render 2. stricter. Concretely, decoupling domain models from data schemas: for example, a Even more concretely, that what's going to be served and checked-for by the Public API for Extending my previous code example: import abc
import eli # an invented schema validator
class SupportsELI(abc.ABC): # An abstract class just to make the point
@abc.abstractmethod
def to_json(self) -> str:
... # it has to be implemented by any metadata modelling a reference
@dataclass
class Reference(SupportsELI):
part: int
schedule: str
labels: Sequence[Any] # actual metadata
dates: ReferenceValidity
def to_json(self) -> str:
return eli.represent(self) # or something like that In this 2nd scenario, the discussion would shift to which shared "data schemas" we expose, while letting systems sort of arbitrarily adding models to their APIs. Which seems like a middle ground between just improving Just two scenarios out of my head, there are definitely more, maybe simpler. |
Thanks @maukoquiroga for this higher-order rephrasing 🙂 I quite agree with your framing of the current situation and the abuses it can lead to. The “convert to a shared representation” version (your scenario 2) is appealing in the freedom it gives to each TBS while ensuring shared tooling stays interoperable. If I understand correctly, that means switching from declarative attributes to agreeing upon an API: each Variable should then respond to getter methods instead of directly exposing attributes. These getter methods could by default (in Core) be simple attribute exposure, and every TBS could surcharge them to compute them from finer-grained attributes. This could indeed be a solution: even though it will lead to some fragmentation (learning the specific attributes from each TBS), we could make sure that the differences are always declared in a consolidated way, just like the Entities are currently left to each TBS to define. This would also enable a higher-order standard that could be used for interoperability with other rules-as-code systems than OpenFisca. |
I suggest we use this issue to consolidate use cases and learn from them, in order to assess how realistic the different options are. For example, metadata that #983 intended to expose covers Australian-specific legislative references, enabling structured representation. @RamParameswaran could you share the use cases you had for consuming that structured data? 🙂 {
"regulation_reference": {
"part": 5,
"clause": "A",
"schedule": "34b",
"notes": ["foo", "bar"],
}
} @nikhilwoodruff could you please share the metadata output you would like to expose in the Web API “to link [variables] directly to PolicyEngine” and how you would consume it? 🙂 |
Thanks for the discussion here, I can see the case for not wanting fragmentation. We're mainly using this to inform variable metadata in PolicyEngine, and the fields we need/are about to implement for that are:
There's actually a few other attributes, but I've just noticed they're redundant and we should use existing fields ( class self_employment_income(Variable):
value_type = float
entity = Person
label = u"Self-employment income"
documentation = "Income from self-employment profits"
definition_period = YEAR
reference = "Income Tax (Trading and Other Income) Act 2005 s. 1(1)(a)"
metadata = dict(
policyengine=dict(
roles=dict(
adult=dict(
max=80000,
),
child=dict(
hidden=True,
),
)
)
) On the PolicyEngine household input page, you can see that this is shown for the default adult, and if you add a child, it's not shown. Footnotes
|
Yes. The getter methods is a solution. Another one could be to have a standard variable schema to represent a Variable. Etc., But it is the same idea yes. |
Thank you very much @nikhilwoodruff for this detailed answer! 😃 As a case study, in my opinion:
I fully understand the convenience of having one single data and metadata source for reusers, but OpenFisca is about modelling legislation in a generic enough way that all sorts of reuses can be created. This is what makes it powerful enough to aggregate communities around a single model 🙂 |
We can add in some place min and max values. But some of them evolves with legislation so it may be cumbersome to store them. Al so the sign may depend on convention (that might be unstable). That's why I never felt that this subject is a priority. @MattiSG : I agree with your comment on metadata, but we should find a way to ease downstream enrichment of metadata since it is a recurring need of many reusers. |
By the way an extract of #1034 (comment) I think could be pertinent here:
|
@maukoquiroga : yes we try to validate data values before injection. I also confirm that the performance of OpenFisca relies on vector computation so if we add checks that are not vectorizable it puts a huge penalty on performance. |
Hi there!
I really enjoy OpenFisca, but I recently encountered an issue.
Here is what I did:
Added a
metadata: dict
attribute to a Variable class.Here is what I expected to happen:
The attribute to be silently accepted.
Here is what actually happened:
A
ValueError
was thrown, specifying thatmetadata
is not in the allowed attribute list.Context
I identify more as a:
The text was updated successfully, but these errors were encountered: