Optimise external secondary instance memory use #6454

seadowg · 2024-10-14T13:29:45Z

As we did for entities with #5623, we should optimise all external CSV secondary instances (that aren't entity lists) so that they do not need to be loaded into memory for simple eq filters (<child> = '<value>').

Unlike with entities, we can get away with creating the optimised representation (probably a SQLite DB) on form load as long as this only has to happen once for each form version.

The result of this should be that, for an expression like age = 25:

- The full secondary instance does not need to be in memory to evaluate the expression (other than temporarily on first form version load).
- The expression can be evaluated in a reasonable time (under 5s) on mainstream devices for datasets as large as 100k items (like with entities).
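
To make the goal concrete, here is a minimal sketch of the idea in Python (chosen only for brevity; Collect itself is Kotlin/Java). It assumes the optimised representation is a SQLite table built from the CSV once per form version, with one index per column so that an eq filter becomes an indexed lookup. The function names, the `meta` table, and the version string are hypothetical and not part of Collect or JavaRosa.

```python
import csv
import io
import sqlite3


def build_index(conn, table, csv_text, version):
    """Build an indexed table from CSV text unless this version was already built."""
    conn.execute("CREATE TABLE IF NOT EXISTS meta (tbl TEXT PRIMARY KEY, version TEXT)")
    row = conn.execute("SELECT version FROM meta WHERE tbl = ?", (table,)).fetchone()
    if row is not None and row[0] == version:
        return False  # nothing to do: already built for this form version

    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Assumption: column names contain no double quotes.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    for name in header:
        conn.execute(f'CREATE INDEX "idx_{table}_{name}" ON "{table}" ("{name}")')
    conn.execute("INSERT OR REPLACE INTO meta VALUES (?, ?)", (table, version))
    conn.commit()
    return True


def query_eq(conn, table, child, value):
    """Evaluate <child> = <value> without loading the whole dataset into memory."""
    known = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
    if child not in known:
        raise KeyError(child)
    cursor = conn.execute(f'SELECT * FROM "{table}" WHERE "{child}" = ?', (str(value),))
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, r)) for r in cursor.fetchall()]
```

With a table built from a CSV that has `name` and `age` columns, `query_eq(conn, "people", "age", 25)` would return only the matching rows; a second call to `build_index` with the same version returns `False` and does no work.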

Questions

~~Should this be multiple issues?~~
- Yes, probably, to handle the pulldata clean-up
~~How do we deal with search? Should we reimplement it as sugar like with pulldata or do we just remove it?~~
- Discussed at Don't create a search/pulldata database when pulldata is used over an Entity List #6471
What expressions should we support?
- We want to replace pulldata so just need enough to do that. That means <child> = <value>.

Notes

Implementing this should be fairly similar to entities, with the exception of creating the optimised representation on load rather than on form download. This probably means creating a custom FileInstanceParser that Collect can configure JavaRosa to use. When the instance is parsed, the parser would create the optimised representation if it doesn't already exist and then return the instance as a TreeElement. Like with entities, this will have to handle "partial" and "full" parses to allow the low memory footprint for large datasets. There will also need to be some kind of FilterStrategy that (again like entities) handles the optimised expressions and deals with replacing partial elements in the instance.
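
The partial/full distinction can be illustrated with a small sketch, again in Python for brevity. Everything here is hypothetical: `Item`, `partial_parse`, `full_parse`, and `filter_eq` are illustrative names and do not mirror the JavaRosa `TreeElement`, `FileInstanceParser`, or `FilterStrategy` signatures. The `store` list stands in for the optimised representation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """One item of the secondary instance; children is None while it is partial."""
    index: int
    children: Optional[Dict[str, str]] = None

    @property
    def partial(self) -> bool:
        return self.children is None


def partial_parse(store: List[Dict[str, str]]) -> List[Item]:
    """Return placeholder items only, so memory use does not grow with row size."""
    return [Item(index=i) for i in range(len(store))]


def full_parse(store: List[Dict[str, str]]) -> List[Item]:
    """Return fully populated items, for expressions that cannot be optimised."""
    return [Item(index=i, children=dict(row)) for i, row in enumerate(store)]


def filter_eq(items: List[Item], store: List[Dict[str, str]], child: str, value: str) -> List[Item]:
    """Handle <child> = <value>: populate only matching items and return them."""
    matched = []
    # A real store would answer this with an indexed query rather than a scan.
    for i, row in enumerate(store):
        if row.get(child) == value:
            items[i].children = dict(row)  # replace the partial element
            matched.append(items[i])
    return matched
```

After `filter_eq` runs, only the items that matched the predicate carry their children; every other item stays partial.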

It might be possible to share the entities FilterStrategy (and potentially even the FileInstanceParser/Instance Provider with some rework) by generalizing LocalEntitiesInstanceAdapter.

Another thing to point out is that we'll most likely want to create the optimised representation at download time in the future, so the implementation should account for making that easy.


seadowg added the needs discussion label Oct 14, 2024

seadowg added this to the v2024.4 milestone Oct 14, 2024

seadowg added this to ODK Collect Oct 14, 2024

github-project-automation bot moved this to not ready in ODK Collect Oct 14, 2024

seadowg mentioned this issue Oct 14, 2024

Add support for pulldata with local entities #6451

Merged

6 tasks

lognaturel mentioned this issue Oct 23, 2024

Don't create a search/pulldata database when pulldata is used over an Entity List #6471

Closed

seadowg mentioned this issue Dec 17, 2024

Replace pulldata implementation with sugar #6552

Open

seadowg removed the needs discussion label Dec 17, 2024

seadowg moved this from not ready to ready in ODK Collect Dec 17, 2024
