
#539 Spark schema generation works with record relations #556

Open · cwoods-cpointe wants to merge 1 commit into dev from 539-spark-schema-relation-records

Conversation

@cwoods-cpointe (Contributor):

Updated Spark Schema MDA generation to account for relations between records. Updated the relation MDA to include the new column and required fields.

Added record relation unit tests.

Spark Schema to/from POJO conversions work with records that have relations.

Updated documentation to account for the new relation fields.

Implemented validation for relations, except for the one-to-many (1-M) multiplicity.

@cwoods-cpointe cwoods-cpointe force-pushed the 539-spark-schema-relation-records branch from 04f5983 to d35a20f Compare January 31, 2025 19:43
@@ -247,13 +247,22 @@ namespacing (e.g., package in Java, namespace in XSD).
| `relations/relation/documentation`
| No
| None
-| A description of the field.
+| A description of the relation.
@ewilkins-csi (Contributor), Jan 31, 2025:

Q: Do we define what we mean by "relation" anywhere?

@cwoods-cpointe (Contributor, PR author):

It is referenced in the root record docs here.

/**
* Class for common Spark-related attributes.
*/
public class SparkRelationAttributes {
Contributor:

Q: Why not add these methods to BaseRecordRelationDecorator? Specifically, it seems reasonable to include the defaulting logic of column and required/nullable so that we use the same defaults everywhere.
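
A minimal sketch of what that consolidation might look like (the wrapped element, the accessor names, and the default rules themselves are assumptions for illustration, not the PR's actual implementation):

// Sketch: centralize the defaulting on the decorator so every generator
// resolves column/required the same way. The default rules below are assumed.
public class BaseRecordRelationDecorator {

    private final RelationElement wrapped;

    public BaseRecordRelationDecorator(RelationElement wrapped) {
        this.wrapped = wrapped;
    }

    /** Assumed default: fall back to the relation's name when no column is set. */
    public String getColumn() {
        return wrapped.getColumn() != null ? wrapped.getColumn() : wrapped.getName();
    }

    /** Assumed default: relations are nullable/optional unless explicitly required. */
    public boolean isRequired() {
        return Boolean.TRUE.equals(wrapped.getRequired());
    }
}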

Contributor:

Related: it may be worth adding some unit tests asserting how the defaulting logic works, like we do with other metamodel options, to help remind us that it's a breaking change if we update it.
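
A sketch of such a test, assuming JUnit 5 and the constructor/setter names from the previous sketch (all illustrative):

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;

import org.junit.jupiter.api.Test;

public class RelationDefaultsTest {

    /**
     * Pins the assumed defaults so that changing them fails a test,
     * making any breaking change visible and deliberate.
     */
    @Test
    public void unsetColumnAndRequiredFallBackToDefaults() {
        RelationElement relation = new RelationElement();
        relation.setName("mayor");

        BaseRecordRelationDecorator decorated = new BaseRecordRelationDecorator(relation);

        assertEquals("mayor", decorated.getColumn());
        assertFalse(decorated.isRequired());
    }
}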

@cwoods-cpointe (Contributor, PR author):

The column name is really just used for the Spark save logic, but I agree it makes sense to pull these up to a higher level in case they need to be used elsewhere.

| multiplicity | record |
| 1-1 | Mayor |
| 1-M | Street |
| M-1 | State |
Contributor:

Texarkana? 🙃

#set($hasOneToManyRelation = 0)
#foreach($relation in $record.relations)
#if($relation.isOneToManyRelation())
#set($hasOneToManyRelation = 1)
Contributor:

A: This might be a little more maintainable to move into the Java object or context variables. It's pretty straightforward logic, but in general I find Velocity template logic pretty hard to parse/update, so I prefer doing as much as possible in Java.
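
A sketch of what that hoisting could look like on the record's Java model object (getRelations() is assumed from $record.relations in the template; isOneToManyRelation() already appears there):

/** Sketch: compute the flag in Java so the template only has to test it. */
public boolean hasOneToManyRelation() {
    return getRelations().stream().anyMatch(Relation::isOneToManyRelation);
}

The template condition would then collapse to #if($record.hasOneToManyRelation()).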

@cwoods-cpointe cwoods-cpointe force-pushed the 539-spark-schema-relation-records branch 2 times, most recently from f649b77 to 2334a2a Compare January 31, 2025 20:11
| `relations/relation/column`
| No
| None
| The name of the storage field for data persistence.
@carter-cundiff (Contributor), Jan 31, 2025:

Q: If each relation refers to another record type, should we be able to use the column attribute from the field within the relation record?

Contributor:

A relation is basically just a field on the record with a type that corresponds to another record, so there is no specific field in the related record that ties back to the relation definition in this record. Hope that makes sense.
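
For illustration, in hand-written POJO terms (hypothetical classes; Mayor comes from the documentation table above):

// A relation is just a field whose type is another record.
class City {
    String name;   // an ordinary field with a simple/dictionary type
    Mayor mayor;   // a 1-1 relation: a field typed as another record
}

class Mayor {
    String name;
}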

@cwoods-cpointe cwoods-cpointe force-pushed the 539-spark-schema-relation-records branch from 2334a2a to 34571a6 Compare January 31, 2025 20:56
@cwoods-cpointe cwoods-cpointe force-pushed the 539-spark-schema-relation-records branch from 34571a6 to a51526d Compare January 31, 2025 21:05
Update Spark Schema MDA generation to account for relations between
records. Updated the relation MDA to include column and required fields.

Added record relation unit tests.

Spark Schema casting and to/from POJO conversions work with records that have relations.

Updated documentation to account for the new relation fields.

Implemented validation for relations except one-to-many.
@cwoods-cpointe cwoods-cpointe force-pushed the 539-spark-schema-relation-records branch from 2edccb2 to edd99bd Compare January 31, 2025 23:06
@@ -34,6 +34,11 @@ public class RelationElement extends NamespacedMetamodelElement implements Relat
@JsonInclude(Include.NON_NULL)
protected Multiplicity multiplicity;

@JsonInclude(Include.NON_NULL)
protected Boolean required;
@csun-cpointe (Contributor), Jan 31, 2025:

Q: Should we default this to false? Since Boolean allows a null value, the isRequired() function will return null by default if it is not set.
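
A minimal sketch of one way to address that, assuming the nullable Boolean is kept so @JsonInclude(Include.NON_NULL) can still omit an unset value during serialization:

/** Raw value for (de)serialization; may be null when not set in the metamodel. */
public Boolean getRequired() {
    return required;
}

/** Null-safe accessor for generators; assumed default is "not required". */
public boolean isRequired() {
    return Boolean.TRUE.equals(required);
}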
