Consolidate defined_list/initial_list/frozen_list into entity_list #11

bradenmacdonald · 2025-02-21T18:00:51Z

So far in all the test cases I've written including test_snapshots_of_published_unit (viewing all the historic versions of a unit), there has been no need to reference initial_list nor frozen_list. So in this PR I'm just removing them entirely, consolidating so that each container only has a single entity_list. All of the new "units" tests continue to pass, unchanged.

Question 1: Any objections, or am I missing anything? I know we'll need to update the ADRs as well, but they're not included in this branch.

Question 2: Do we still need the EntityList at all? Right now we have ContainerEntityVersion which is 1:1 with EntityList and 1:N with EntityListRow. It would be simpler to just have ContainerEntityVersion which is 1:N with EntityListRow directly; no need for an EntityList middleman.

The code does say:

sometimes we'll want the same kind of data structure for things that we
dynamically generate for individual students (e.g. Variants). EntityLists are
anonymous in a sense–they're pointed to by ContainerEntityVersions and
other models, rather than being looked up by their own identifiers.

However, EntityLists as implemented in the prototype are part of the authoring suite, so it doesn't seem to me like it would make sense to use them to store anything related to individual students, even if they represent the right kind of data structure for doing so. We could move them out into a common app if that is the intention. But perhaps we don't know enough about those use cases yet.

Anyhow, let me know your thoughts!

…ty_list

bradenmacdonald · 2025-02-21T18:05:42Z

@ormsbee @mariajgrimaldi @kdmccormick Can I get your thoughts on this please?

kdmccormick · 2025-02-21T18:20:10Z

@bradenmacdonald great question, I'm currently refreshing myself on why we had 3 separate lists in the first place. In the mean time, can you close this issue and re-create it in the upstream openedx-learning repo rather than the OC fork?

kdmccormick · 2025-02-21T18:55:45Z

^ Nevermind, I see it's a PR again a branch in you fork.

Anyway, my surface-level understanding is that the three lists exist to simultaneously support (using Unit/Component as the specific Container/Entity type):

Units with pinned component versions
Units with unpinned ("use latest") components
Ability to recreate a Unit at any arbitrary version given its edit history.

I recall us discussing that maybe (1) and (2) shouldn't be supported in the same data model. And if that's the case, then we can collapse down to a single defined_list in order to support (1) and (3).

Still thinking this through...

bradenmacdonald · 2025-02-21T18:56:41Z

@kdmccormick As you can see in the test cases, our existing prototype code can already handle all three of those things using only a single list:

a unit with pinned component versions
a unit with unpinned components
ability to recreate a Unit at any arbitrary ~~version given its edit history~~ point in publishing history, which is not quite the same as "edit history" but means "at any point when changes were published" (any changes in the learning package create a new point in the publishing history). i.e. we can't retrieve arbitrary drafts that were never published.

ormsbee · 2025-02-21T20:07:33Z

I think it's okay to consolidate down to one list, but I would like to keep the separate model for EntityList. I have a longer write up for it this evening.

Also, I should have a PR into your branch on Mon or Tues with the bookkeeping models. I think doing the tests like this was a great idea.

Thank you.

kdmccormick

Okay, got it--so, to summarize and generalize, we had thought we'd need all three lists in order to implement get_entities_in_container_as_of. But instead, it turns out that the container can have just a single entity_list (formerly defined_list), and we can implement get_entities_in_container_as_of by:

looking up the containerversion and its entity_list using the publish_log_id,
for the pinned entities in entity_list, return each one's pinned version (duh), and
for the unpinned entities in the entity_list, use the same publish_log_id to look up what each entity's latest publish version was at the time the containerversion as published, and return that.

I don't know why we hadn't resolved just to use the PublishLog to create versions before. Maybe there was an efficiency concern? Or maybe we were just too bogged down in the details to realize this solution? In any case, awesome simplification, and thank you for the very helpful unit tests 💯

bradenmacdonald · 2025-02-21T22:16:39Z

@kdmccormick Exactly!

I don't know why we hadn't resolved just to use the PublishLog to create versions before.

I got the idea from @ormsbee - when I read his docstring on the PublishLog model. So I guess he had already been thinking along those lines at some point.

I think it's okay to consolidate down to one list, but I would like to keep the separate model for EntityList. I have a longer write up for it this evening.

Sounds good. And that's what I'd done in the PR anyways.

I should have a PR into your branch on Mon or Tues with the bookkeeping models.

Awesome :)

bradenmacdonald · 2025-02-21T22:21:03Z

for the unpinned entities in the entity_list, use the same publish_log_id to look up what each entity's latest publish version was at the time the containerversion as published, and return that.

I don't know why we hadn't resolved just to use the PublishLog to create versions before. Maybe there was an efficiency concern?

Yeah, just to be clear: I implemented this and it works well, but I just went with a very naive implementation for now that does a query inside a loop over the unit contents, so it's not very efficient. It's not clear if looking up historic snapshots is something we even need to optimize, but if we do there's lots of ways to do that, such as lifting that query outside of the loop or using the bookkeeping models @ormsbee is working on.

ormsbee · 2025-02-22T07:31:47Z

The extra lists were trying to handle weird edge cases that I think we've simplified. For instance, how to handle unpinned references to components from units when those components are deleted (and those deletes are published). There was the question of whether that should force the creation of a new version of the Unit or not, and how to handle reverts. Some of that thrashing is in this ticket, though honestly it's probably not worth reading through at this point.

Also, while the PublishLog provides the ability to reconstruct what the published view of this Unit was across time, there was no way to do that for the Draft because there's no history of the Draft. We know when versions were created, but we don't track soft deletes or "revert to published", so we can't reliably reconstruct that history.

I think we've come to some simplifying modifications since then. We'll accept null references to components in Units (i.e. don't force container updates) to simplify publishes and reverts. I'm creating a DraftChange model so we can reconstruct that history. (We'll want it for capturing "child element has changed" information anyway.) Etc.

With respect to keeping EntityList separate, it's a few things:

I want to avoid re-writing the list every time there's a metadata change to the container, like a name change or other metadata we're going to hang off of Unit over time.
It's also nice to just be able to separate out metadata changes vs. structural changes in an easy way.
There are places where I think having raw EntityLists is useful on the authoring side. For instance, Split Tests would want to encode multiple separate EntityLists as their different options.

kdmccormick · 2025-02-24T15:40:24Z

thanks for the background @ormsbee :)

We'll accept null references to components in Units (i.e. don't force container updates) to simplify publishes and reverts.

I don't follow (not out of disagreement, I think I'm just missing some context). Can you explain or link me to an issue/model?

Regarding EntityList, I'm aligned with you. My sense is that EntityList feels redundant currently because we've only been modeling static content trees.

refactor: consolidate defined_list/initial_list/frozen_list into enti…

e09a20e

…ty_list

kdmccormick approved these changes Feb 21, 2025

View reviewed changes

bradenmacdonald merged commit e09a20e into braden/containers-units Feb 21, 2025
8 checks passed

bradenmacdonald deleted the braden/monolist branch February 21, 2025 22:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consolidate defined_list/initial_list/frozen_list into entity_list #11

Consolidate defined_list/initial_list/frozen_list into entity_list #11

bradenmacdonald commented Feb 21, 2025 •

edited

Loading

bradenmacdonald commented Feb 21, 2025

kdmccormick commented Feb 21, 2025 •

edited

Loading

kdmccormick commented Feb 21, 2025

bradenmacdonald commented Feb 21, 2025 •

edited

Loading

ormsbee commented Feb 21, 2025

kdmccormick left a comment

bradenmacdonald commented Feb 21, 2025

bradenmacdonald commented Feb 21, 2025

ormsbee commented Feb 22, 2025 •

edited

Loading

kdmccormick commented Feb 24, 2025

Consolidate defined_list/initial_list/frozen_list into entity_list #11

Consolidate defined_list/initial_list/frozen_list into entity_list #11

Conversation

bradenmacdonald commented Feb 21, 2025 • edited Loading

bradenmacdonald commented Feb 21, 2025

kdmccormick commented Feb 21, 2025 • edited Loading

kdmccormick commented Feb 21, 2025

bradenmacdonald commented Feb 21, 2025 • edited Loading

ormsbee commented Feb 21, 2025

kdmccormick left a comment

Choose a reason for hiding this comment

bradenmacdonald commented Feb 21, 2025

bradenmacdonald commented Feb 21, 2025

ormsbee commented Feb 22, 2025 • edited Loading

kdmccormick commented Feb 24, 2025

bradenmacdonald commented Feb 21, 2025 •

edited

Loading

kdmccormick commented Feb 21, 2025 •

edited

Loading

bradenmacdonald commented Feb 21, 2025 •

edited

Loading

ormsbee commented Feb 22, 2025 •

edited

Loading