Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified NominalComposition to include basis, default unknown #13

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

kroenlein
Copy link
Collaborator

As per discussion on #data-platform-dojo, proposed fix for ambiguity in basis for procured material specs.

@kroenlein kroenlein requested a review from maxhutch November 3, 2019 20:28
@kroenlein
Copy link
Collaborator Author

Copy link
Contributor

@maxhutch maxhutch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that compositions with different bases are not comparable, does it make sense to put the basis as a required member in the CompositionBounds? If so, how do we deal with "other" (see inline).

----------------|------------|---------|-----------
`type` | String | Req. | "nominal\_composition"
`quantities` | Map[String, Number]| Empty set | | Map[String, Number]
`basis` | `mass`, `volume`, `number`, `unknown` | `unknown` | The type of measurement that informed the quantity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "other"? I'm thinking about some of the exotic formulations use cases like dry basis.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dry basis is still a mass (or molar or volume) basis. Compositions are not guaranteed to be normalized, so a user filter or Attribute Template could meet that need.

More complex expressions like molarity are so complex that explicit attributes are probably the only reasonable solution.

@kroenlein
Copy link
Collaborator Author

Given that compositions with different bases are not comparable, does it make sense to put the basis as a required member in the CompositionBounds? If so, how do we deal with "other" (see inline).

At present, we only validate component names in a CompositionBounds, not values. I'd presume the CompositionBounds would have a set of allowed composition types and maintain that lack of opinion on numeric bounds.

@maxhutch
Copy link
Contributor

Given that compositions with different bases are not comparable, does it make sense to put the basis as a required member in the CompositionBounds? If so, how do we deal with "other" (see inline).

At present, we only validate component names in a CompositionBounds, not values. I'd presume the CompositionBounds would have a set of allowed composition types and maintain that lack of opinion on numeric bounds.

Rather than a set I'd expect exactly one. Just like we don't let a RealBounds be "kg or ml" since we cannot compare them, I'd expect these to enforce comparability.

This is the root on my ambivalence on basis: it seems like there are too many options / too much complexity for us to actually use it programatically so why elevate it too a first class concept beyond the existing templating system, e.g.:

{
  "type": "property_template",
  "name": "Solution composition (volume)",
  "description": "Composition of the solution on a volume basis",
  "bounds": {
    "type": "composition_bounds",
    "components": ["water", "ethanol", "oils"]
  }
}

What is gained by having it as an explicit enum?

@kroenlein
Copy link
Collaborator Author

Rather than a set I'd expect exactly one. Just like we don't let a RealBounds be "kg or ml" since we cannot compare them, I'd expect these to enforce comparability.

That's an interesting point. I'm comfortable with the comparability being the necessary criterion for Bounds, and thus enforcing a single value. The thought in making it a set was meeting users where they are, but requiring a second Attribute Template once you are changing from legacy data to living data and then asking users to XOR if they really mean it has merit.

This is the root on my ambivalence on basis: it seems like there are too many options / too much complexity for us to actually use it programatically so why elevate it too a first class concept beyond the existing templating system, e.g.:
...
What is gained by having it as an explicit enum?

There are two things empowered by this change:

  1. It provides an important check at storage time. Lack of this information is actually a huge challenge in interpreting data about alloys in the literature. Parallel examples in the data structure incude DiscreteCategorical, where it's explicitly defined as a probability, and Ingredient Quantities, where the basis is explicit in the field name. As well, it has been explicitly requested by customer teams.

  2. It allows automated mass/mole/volume conversions, given lookup tables. The basis you are moving from and to can constrain what the units on the conversion factors needs to be, all in an automated way. This parallels similar requests from customer teams.

@maxhutch
Copy link
Contributor

maxhutch commented Jun 25, 2020

That's an interesting point. I'm comfortable with the comparability being the necessary criterion for Bounds, and thus enforcing a single value. The thought in making it a set was meeting users where they are, but requiring a second Attribute Template once you are changing from legacy data to living data and then asking users to XOR if they really mean it has merit.

👍

  1. It provides an important check at storage time. Lack of this information is actually a huge challenge in interpreting data about alloys in the literature. Parallel examples in the data structure incude DiscreteCategorical, where it's explicitly defined as a probability, and Ingredient Quantities, where the basis is explicit in the field name. As well, it has been explicitly requested by customer teams.

This makes sense, but won't the default value (unknown) make it just as easy for this information to be omitted as it was before? EDIT to answer my own question: if the basis is specified in the template, then it won't be optional anymore because the default wouldn't fit the template.

I agree there is some additional value just to having it be a first class concept (to be weighted against the risk that it is misinterpreted or misused).

  1. It allows automated mass/mole/volume conversions, given lookup tables. The basis you are moving from and to can constrain what the units on the conversion factors needs to be, all in an automated way. This parallels similar requests from customer teams.

You previously said "Compositions are not guaranteed to be normalized". If that's the case, then I don't know that we can do automated conversions. The dry basis is a good example. If you're using a dry basis, then you have an implicit ingredient included in the calculation (moisture) that would also have to be converted between mass/volume/number. If you just know it is a mass basis but not specifically a dry mass basis, then the automatic conversion will be wrong (right?).

@kroenlein
Copy link
Collaborator Author

If you just know it is a mass basis but not specifically a dry mass basis, then the automatic conversion will be wrong (right?).

If you have an un-normalized set and you convert to another un-normalized set using densities, you haven't lost anything. The information a customer wants is almost assuredly the ratios in a certain basis. If it's a dry-basis mass fraction, you just wouldn't be able to generate a wet-basis mass fraction, but that's baked into not knowing anything about the fractions for water. Generating the dry-basis mole fraction just requires molar masses.

@sparadiso
Copy link

@kroenlein are you still working on this PR?

@kroenlein kroenlein marked this pull request as draft August 27, 2020 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants