Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NodeGroup.grouping_node #11613 #11614

Merged
merged 7 commits into from
Jan 10, 2025
Merged

Conversation

jacobtylerwalls
Copy link
Member

@jacobtylerwalls jacobtylerwalls commented Nov 11, 2024

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Description of Change

Before
From a node, to find the grouping node of that node's nodegroup, you would need to prefetch the sibling nodes and compare the results to the id of the nodegroup, or else just issue another Node.objects.get() to individually fetch the wanted node.

After
There is now a self-referencing foreign key to the other node that serves as the nodegroup root.
Per discussion, there is now a OneToOneField on the NodeGroup model pointing to the root node for the nodegroup.

Example query:

In [22]: from django.db import connection, reset_queries

In [23]: reset_queries()

In [24]: ngs = NodeGroup.objects.filter(grouping_node__isnull=False).select_related("grouping_node").distinct()

In [25]: for ng in ngs:
    ...:     print(ng.grouping_node.alias)
    ...: 
app_names
basic_search_settings
sparql_endpoint_providers
system_settings
map_settings
project_extent
map_zoom
saved_searches
mapbox
temporal_search
analytics
search_results_grid

In [26]: len(connection.queries)
Out[26]: 1

Issues Solved

Closes #11613

Checklist

  • I targeted one of these branches:
    • dev/8.0.x (under development): features, bugfixes not covered below
    • dev/7.6.x (main support): regressions, crashing bugs, security issues, major bugs in new features
    • dev/6.2.x (extended support): major security issues, data loss issues
  • I added a changelog in arches/releases
  • I submitted a PR to arches-docs (if appropriate)
  • Unit tests pass locally with my changes
  • I added tests that prove my fix is effective or that my feature works
  • My test fails on the target branch

Ticket Background

@jacobtylerwalls jacobtylerwalls changed the title Add Node.nodegroup_root #11613 Add Node.grouping_node #11613 Nov 14, 2024
@chrabyrd chrabyrd self-requested a review November 14, 2024 17:17
Copy link
Contributor

@chrabyrd chrabyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm 🚀 ! One question in the body of code.

On a more meta note, if it's possible to add the Node relation to NodeGroup instead of this change and still power what you're trying to do I think there'd be an additional win gained there. There have been quite a few times I've needed to get the node from the nodegroup and end up running to the db again 💀

If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

arches/app/models/graph.py Outdated Show resolved Hide resolved
@jacobtylerwalls
Copy link
Member Author

Thanks for taking a look. I'm going to pump the brakes for a couple days since I just got a few 👍 's on Slack for the idea of just factoring out a subquery into a queryset method. If you have to remember .select_related('grouping_node') then .with_grouping_node() calling out to a subquery isn't much worse, as long as it performs roughly the same?


If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

I first had nodegroup_root and called this branch node_for_nodegroup, so I was thinking along the same lines, but @chiatt encouraged me to consider grouping node. @chiatt, any thoughts?

@chrabyrd
Copy link
Contributor

Thanks for taking a look. I'm going to pump the brakes for a couple days since I just got a few 👍 's on Slack for the idea of just factoring out a subquery into a queryset method. If you have to remember .select_related('grouping_node') then .with_grouping_node() calling out to a subquery isn't much worse, as long as it performs roughly the same?

If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

I first had nodegroup_root and called this branch node_for_nodegroup, so I was thinking along the same lines, but @chiatt encouraged me to consider grouping node. @chiatt, any thoughts?

Cool cool! One thing re slack convo: with_grouping_node_alias() -> with_grouping_node() ? What if I wanted other info from the nodegroup node besides alias? And if you're going that route would it be a big lift to add with_node() to the nodegroup? For the same win mentioned earlier 😄

And ++ for having the same thoughts about naming. Hopefully that makes it in 🤞

@jacobtylerwalls
Copy link
Member Author

Yeah I think if we want other stuff from the node then it should go back to an FK, but I can definitely explore putting the column on the nodegroup. I'll give it some thought.

@chiatt
Copy link
Member

chiatt commented Nov 15, 2024

Thanks for taking a look. I'm going to pump the brakes for a couple days since I just got a few 👍 's on Slack for the idea of just factoring out a subquery into a queryset method. If you have to remember .select_related('grouping_node') then .with_grouping_node() calling out to a subquery isn't much worse, as long as it performs roughly the same?

If that can't be accommodated, how do you feel about changing grouping_node to nodegroup_node? idk grouping_node isn't really referenced elsewhere in code and IMO it's more descriptive to say what the thing actually is.

I first had nodegroup_root and called this branch node_for_nodegroup, so I was thinking along the same lines, but @chiatt encouraged me to consider grouping node. @chiatt, any thoughts?

I don't have a strong opinion on this. My preference is grouping_node because it describes the node's role in the group. It's also a term already used in the label based graph tests. nodegroup_node would be fine, but it seems a bit awkward to me. I realize that's pretty subjective, so maybe others would like to weigh in (@apeters, @robgaston?)

@jacobtylerwalls
Copy link
Member Author

Also, root could be confused with the root node of the graph or the grouping node of the root nodegroup (in graph.py, this is defined as "(the first nodegroup that doesn't have a parentnodegroup)"

@jacobtylerwalls
Copy link
Member Author

On a more meta note, if it's possible to add the Node relation to NodeGroup instead of this change and still power what you're trying to do I think there'd be an additional win gained there. There have been quite a few times I've needed to get the node from the nodegroup and end up running to the db again 💀

I gave it some thought, and although it makes intuitive sense to put it on NodeGroup, I think there are some considerations going the other way:

  • the relation would have to be nullable given the order we create nodegroups before nodes, plus the fact that nodegroups without nodes are allowed at a db level (although we do have a manage.py validate command to delete them). I'd rather it be not nullable.
  • with the information in a different table, you can no longer have a database check constraint to enforce that the FK has a certain kind of relationship (e.g. you could enforce that the grouping_node pk == the pk of the nodegroup, but I don't think you could enforce that every node has a grouping node, e.g. that every not-root-node has a nodegroup with a grouping node set).
  • In a simpler application, one could use django signals to keep tables in sync, but you know how much code doesn't go through django these days.

get the node from the nodegroup and end up running to the db again 💀

This is actually my main motivation, and now there's a way to do that when fetching the data. On the other proposal (column on NodeGroup) you'd have to select_related() the grouping node when fetching data, so it's not like the developer gets a permission slip to not think about this ;)

# Make sure to get grouping nodes before you need them
In [49]: with_grouping_nodes = NodeGroup.objects.filter(node__gt=0).prefetch_related('node_set__grouping_node').distinct()

In [50]: for ng in with_grouping_nodes:
    ...:     print(ng.node_set.all()[0].grouping_node.alias)

@apeters
Copy link
Member

apeters commented Nov 19, 2024

I don't have a strong opinion on this. My preference is grouping_node because it describes the node's role in the group. It's also a term already used in the label based graph tests. nodegroup_node would be fine, but it seems a bit awkward to me. I realize that's pretty subjective, so maybe others would like to weigh in (@apeters, @robgaston?)

I agree with @chiatt. I think grouping_node describes its function nicely.

@apeters
Copy link
Member

apeters commented Nov 19, 2024

I would also say that from a "what makes the most sense" perspective (and not to throw a monkey wrench 🐵 🔧 into this discussion) that having this on the "Nodegroup" table makes the most sense. So, in that case you'd end with something like this Node.nodegroup.rootnode to get to the "grouping_node" (but in this case, I think "rootnode" makes most sense when it's hanging off of the Nodegroup table).

Now pragmatically speaking this may not net us less db calls if that's what we're trying to achieve with this, but it does feel very natural.

@jacobtylerwalls jacobtylerwalls changed the title Add Node.grouping_node #11613 Add NodeGroup.root_node #11613 Nov 19, 2024
@jacobtylerwalls
Copy link
Member Author

Yeah, it makes good sense to move it. I can handle the missing safety with a Django system check.

@jacobtylerwalls
Copy link
Member Author

@chiatt ready for another look :-)

arches/app/models/models.py Outdated Show resolved Hide resolved
Copy link
Contributor

@chrabyrd chrabyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OOO today but saw these interesting updates and wanted to stick my nose in 👃

@jacobtylerwalls jacobtylerwalls changed the title Add NodeGroup.root_node #11613 Add NodeGroup.grouping_node #11613 Nov 22, 2024
Copy link
Contributor

@chrabyrd chrabyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, just a few questions 👍

constraints = [
models.CheckConstraint(
condition=Q(grouping_node=models.F("pk"))
| Q(grouping_node__isnull=True),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to potentially rehash but it's been a bit since I've looked at this. What's the condition for a NodeGroup to exist without a Node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purely for convenience when creating them -- we have to create one (NodeGroup) and then the other (Node), unless we want to require that everybody create them in a single transaction and then run this as a deferred constraint, but that's a breaking change I didn't want to impose.

There's a validation check in manage.py validate that checks for nodeless nodegroups. Apparently we're still creating some on setup_db:

Arches integrity report
Prepared by Arches 8.0.0a0 on Thu Jan  2 15:41:29 2025

        Error   Rows    Fixable?        Description

PASS    1005    0       N/A             Nodes with ontologies found in graphs without ontologies

FAIL    1012    1       Yes             Node Groups without contained nodes

PASS    1013    0       N/A             Node Groups without a grouping node

PASS    1014    0       N/A             Publication missing for language

arches/app/models/models.py Show resolved Hide resolved
arches/app/models/graph.py Outdated Show resolved Hide resolved
Copy link
Member

@chiatt chiatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep getting the following error when attempting to open an existing model in the graph designer or when trying to save a tile (on save the error is logged, but it doesn't prevent the save):

"arches/app/models/graph.py", line 224, in __init__
    nodes = [models.Node(**node_slug) for node_slug in node_slugs]
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "arches/app/models/models.py", line 890, in __init__
    super(Node, self).__init__(*args, **kwargs)
TypeError: Node() got unexpected keyword arguments: 'grouping_node_id'

And this error when trying to publish an existing model:

g_node_id'
Traceback (most recent call last):
  File "/arches/app/views/graph.py", line 688, in post
    source_graph.update_from_editable_future_graph()
  File "/arches/app/models/graph.py", line 2650, in update_from_editable_future_graph
    source_node.nodegroup.grouping_node_id = source_node.nodegroup_id
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'grouping_node_id'

I double checked my migrations, and they seem to have run successfully.

@jacobtylerwalls
Copy link
Member Author

For the first error, that looks like a graph publication dating from an earlier version of this PR when grouping_node_id was on Node. Unfortunately I think you'll need to edit it by hand. (Perhaps you could adjust def remove_grouping_node in the migration to also act on Nodes, and then run the migration backwards and forwards.)

For the second error, nice catch -- I'm going to remove the code that apparently I thought I didn't need two months ago 😄 . Graph publication works for me now.

Comment on lines 23 to 24
for node_dict in published_graph.serialized_graph["nodes"]:
node_dict["grouping_node_id"] = node_dict["nodegroup_id"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error that I am seeing is caused here. The "grouping_node_id" is getting assigned back to the node. Shouldn't it be assigned to the nodegroup?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, should have retested this. Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that did the trick!

One other thing. I noticed that if I create a new graph and then create a new node, my new node does not have the groupingnodeid assigned. That may be fine because it's a nodegroup of 1, so there really isn't a group to be concerned about. However, if that first node is semantic, and I add another node, then the first semantic node still doesn't have a groupingnodeid. This seems like it could be an issue because it really does have a child node in its group.

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, taking a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@chiatt chiatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jacobtylerwalls
Copy link
Member Author

Thanks for the reviews!

@jacobtylerwalls jacobtylerwalls merged commit 4d198a8 into dev/8.0.x Jan 10, 2025
7 checks passed
@jacobtylerwalls jacobtylerwalls deleted the jtw/node-for-nodegroup branch January 10, 2025 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add grouping_node column to NodeGroup
4 participants