Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate defined_list/initial_list/frozen_list into entity_list #11

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

bradenmacdonald
Copy link
Member

@bradenmacdonald bradenmacdonald commented Feb 21, 2025

So far in all the test cases I've written including test_snapshots_of_published_unit (viewing all the historic versions of a unit), there has been no need to reference initial_list nor frozen_list. So in this PR I'm just removing them entirely, consolidating so that each container only has a single entity_list. All of the new "units" tests continue to pass, unchanged.

Question 1: Any objections, or am I missing anything? I know we'll need to update the ADRs as well, but they're not included in this branch.

Question 2: Do we still need the EntityList at all? Right now we have ContainerEntityVersion which is 1:1 with EntityList and 1:N with EntityListRow. It would be simpler to just have ContainerEntityVersion which is 1:N with EntityListRow directly; no need for an EntityList middleman.

The code does say:

sometimes we'll want the same kind of data structure for things that we
dynamically generate for individual students (e.g. Variants). EntityLists are
anonymous in a sense–they're pointed to by ContainerEntityVersions and
other models, rather than being looked up by their own identifiers.

However, EntityLists as implemented in the prototype are part of the authoring suite, so it doesn't seem to me like it would make sense to use them to store anything related to individual students, even if they represent the right kind of data structure for doing so. We could move them out into a common app if that is the intention. But perhaps we don't know enough about those use cases yet.

Anyhow, let me know your thoughts!

@bradenmacdonald
Copy link
Member Author

@ormsbee @mariajgrimaldi @kdmccormick Can I get your thoughts on this please?

@kdmccormick
Copy link

kdmccormick commented Feb 21, 2025

@bradenmacdonald great question, I'm currently refreshing myself on why we had 3 separate lists in the first place. In the mean time, can you close this issue and re-create it in the upstream openedx-learning repo rather than the OC fork?

@kdmccormick
Copy link

^ Nevermind, I see it's a PR again a branch in you fork.

Anyway, my surface-level understanding is that the three lists exist to simultaneously support (using Unit/Component as the specific Container/Entity type):

  1. Units with pinned component versions
  2. Units with unpinned ("use latest") components
  3. Ability to recreate a Unit at any arbitrary version given its edit history.

I recall us discussing that maybe (1) and (2) shouldn't be supported in the same data model. And if that's the case, then we can collapse down to a single defined_list in order to support (1) and (3).

Still thinking this through...

@bradenmacdonald
Copy link
Member Author

bradenmacdonald commented Feb 21, 2025

@kdmccormick As you can see in the test cases, our existing prototype code can already handle all three of those things using only a single list:

@ormsbee
Copy link

ormsbee commented Feb 21, 2025

I think it's okay to consolidate down to one list, but I would like to keep the separate model for EntityList. I have a longer write up for it this evening.

Also, I should have a PR into your branch on Mon or Tues with the bookkeeping models. I think doing the tests like this was a great idea.

Thank you.

Copy link

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, got it--so, to summarize and generalize, we had thought we'd need all three lists in order to implement get_entities_in_container_as_of. But instead, it turns out that the container can have just a single entity_list (formerly defined_list), and we can implement get_entities_in_container_as_of by:

  • looking up the containerversion and its entity_list using the publish_log_id,
  • for the pinned entities in entity_list, return each one's pinned version (duh), and
  • for the unpinned entities in the entity_list, use the same publish_log_id to look up what each entity's latest publish version was at the time the containerversion as published, and return that.

I don't know why we hadn't resolved just to use the PublishLog to create versions before. Maybe there was an efficiency concern? Or maybe we were just too bogged down in the details to realize this solution? In any case, awesome simplification, and thank you for the very helpful unit tests 💯

@bradenmacdonald
Copy link
Member Author

@kdmccormick Exactly!

I don't know why we hadn't resolved just to use the PublishLog to create versions before.

I got the idea from @ormsbee - when I read his docstring on the PublishLog model. So I guess he had already been thinking along those lines at some point.

I think it's okay to consolidate down to one list, but I would like to keep the separate model for EntityList. I have a longer write up for it this evening.

Sounds good. And that's what I'd done in the PR anyways.

I should have a PR into your branch on Mon or Tues with the bookkeeping models.

Awesome :)

@bradenmacdonald bradenmacdonald merged commit e09a20e into braden/containers-units Feb 21, 2025
8 checks passed
@bradenmacdonald bradenmacdonald deleted the braden/monolist branch February 21, 2025 22:17
@bradenmacdonald
Copy link
Member Author

for the unpinned entities in the entity_list, use the same publish_log_id to look up what each entity's latest publish version was at the time the containerversion as published, and return that.

I don't know why we hadn't resolved just to use the PublishLog to create versions before. Maybe there was an efficiency concern?

Yeah, just to be clear: I implemented this and it works well, but I just went with a very naive implementation for now that does a query inside a loop over the unit contents, so it's not very efficient. It's not clear if looking up historic snapshots is something we even need to optimize, but if we do there's lots of ways to do that, such as lifting that query outside of the loop or using the bookkeeping models @ormsbee is working on.

@ormsbee
Copy link

ormsbee commented Feb 22, 2025

The extra lists were trying to handle weird edge cases that I think we've simplified. For instance, how to handle unpinned references to components from units when those components are deleted (and those deletes are published). There was the question of whether that should force the creation of a new version of the Unit or not, and how to handle reverts. Some of that thrashing is in this ticket, though honestly it's probably not worth reading through at this point.

Also, while the PublishLog provides the ability to reconstruct what the published view of this Unit was across time, there was no way to do that for the Draft because there's no history of the Draft. We know when versions were created, but we don't track soft deletes or "revert to published", so we can't reliably reconstruct that history.

I think we've come to some simplifying modifications since then. We'll accept null references to components in Units (i.e. don't force container updates) to simplify publishes and reverts. I'm creating a DraftChange model so we can reconstruct that history. (We'll want it for capturing "child element has changed" information anyway.) Etc.


With respect to keeping EntityList separate, it's a few things:

  1. I want to avoid re-writing the list every time there's a metadata change to the container, like a name change or other metadata we're going to hang off of Unit over time.
  2. It's also nice to just be able to separate out metadata changes vs. structural changes in an easy way.
  3. There are places where I think having raw EntityLists is useful on the authoring side. For instance, Split Tests would want to encode multiple separate EntityLists as their different options.

@kdmccormick
Copy link

thanks for the background @ormsbee :)

We'll accept null references to components in Units (i.e. don't force container updates) to simplify publishes and reverts.

I don't follow (not out of disagreement, I think I'm just missing some context). Can you explain or link me to an issue/model?

Regarding EntityList, I'm aligned with you. My sense is that EntityList feels redundant currently because we've only been modeling static content trees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants