Background
The CU requires a local store of message data in order to deduplicate incoming messages for evaluation. As a result, this local database must be copied over when creating a new CU to run certain processes. The CU also contains a postgres implementation, which can use postgres for deduplication on an ongoing basis in order to create horizontally scalable reader units.
The CU should not be reliant on postgres. Instead, it should store the message data in postgres via a worker that can be decoupled from the main CU code. Then, when a new CU receives a request for a process, it can signal the worker to rehydrate its sqlite database from this remote postgres store for that process, and continue operation using sqlite.
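The rehydrate-on-demand flow above can be sketched as follows. This is a minimal illustration, assuming plain Maps as stand-ins for the remote postgres store and the local sqlite database; all names here are hypothetical and not part of the CU codebase.

```javascript
// Stand-ins for the remote postgres store and the local sqlite database
const remotePostgres = new Map([['process-123', ['msg-1', 'msg-2']]])
const localSqlite = new Map()

// If the local database has no data for this process, copy the
// deduplication rows down from the remote store, then serve locally.
function rehydrate (processId) {
  if (!localSqlite.has(processId)) {
    const rows = remotePostgres.get(processId) ?? []
    localSqlite.set(processId, [...rows])
  }
  return localSqlite.get(processId)
}
```

After the first call for a process, all reads continue against the local copy, which is the behavior the issue describes.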
Solution
Create a worker which can be signaled. This worker will read the sqlite database and populate a postgres database with all of the message deduplication data for a process. The worker should be a black box: the CU should not know which database it is running on. It should be modeled after the evaluation worker at https://github.com/permaweb/ao/blob/main/servers/cu/src/effects/worker
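One possible shape for that worker's message handler, patterned loosely after the evaluation worker's dispatch-on-message style: the main thread posts `{ type, processId }` messages and the worker acts on the type. `readSqlite` and `writePostgres` are hypothetical injected dependencies, so no real database is assumed.

```javascript
// Hypothetical hydration-worker handler: dispatches on message type,
// keeping the CU side unaware of which databases are behind it.
function createHydrationWorker ({ readSqlite, writePostgres }) {
  return {
    async onMessage ({ type, processId }) {
      if (type === 'flush') {
        // Copy one process's deduplication rows from sqlite into postgres
        const rows = await readSqlite(processId)
        await writePostgres(processId, rows)
        return { ok: true, flushed: rows.length }
      }
      throw new Error(`Unknown message type: ${type}`)
    }
  }
}
```

Injecting the database accessors keeps the worker a black box and makes it trivial to exercise without a live postgres instance.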
Within the CU, create a new environment variable called ENABLE_HYDRATOR. If this is set to true, when the CU receives a request for a result, it should signal the hydration worker to hydrate sqlite for that process. Once hydration is complete, continue with the pipeline.
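The gate described above can be sketched as a small wrapper: hydrate before running the result pipeline only when the flag is on. `hydrate` and `runPipeline` are hypothetical injected functions, not actual CU exports.

```javascript
// Minimal sketch of the ENABLE_HYDRATOR gate around the result pipeline
const withHydration = ({ enabled, hydrate, runPipeline }) => async (processId) => {
  if (enabled) {
    // Signal the hydration worker and wait for sqlite to be populated
    await hydrate(processId)
  }
  // Continue with the normal result pipeline either way
  return runPipeline(processId)
}
```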
Create a way in the worker to flush the data for a process from sqlite to postgres upon receiving a signal. Build the flush-one-process function to be composable, and create a flush-all-available-processes function composed on top of it.
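The composition asked for above might look like the following sketch, where flush-all is built on top of flush-one. `copyRows` and `listProcesses` are hypothetical injected dependencies standing in for the sqlite-to-postgres copy and the process listing.

```javascript
// Flush one process's dedup data from sqlite to postgres
const flushOneProcessWith = ({ copyRows }) => async (processId) => {
  await copyRows(processId)
  return processId
}

// Flush every known process, composed on top of the single-process flush
const flushAllProcessesWith = ({ listProcesses, flushOneProcess }) => async () => {
  const ids = await listProcesses()
  return Promise.all(ids.map((id) => flushOneProcess(id)))
}
```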
Enable the CU to receive a signal which tells the hydration worker to flush a single process to postgres.
Enable the CU to receive another signal which tells the hydration worker to store everything it has, for all of its processes, in postgres.
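The issue leaves open whether these "signals" are OS signals or HTTP endpoints; either way, the two handlers just need to post distinct messages to the hydration worker. In this sketch `send` is any postMessage-like function, and all names are hypothetical.

```javascript
// Map the two CU-level signals onto hydration-worker messages
function flushSignalHandlers ({ send }) {
  return {
    // Flush a single process to postgres
    flushOne: (processId) => send({ type: 'flush', processId }),
    // Flush everything the CU has, for all of its processes
    flushAll: () => send({ type: 'flush-all' })
  }
}
```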
Process flushing could potentially be incorporated into the existing checkpointing interval: for example, when a CU gets a signal to checkpoint, it also flushes its processes. This is not a requirement, however.
Create a new environment variable called HYDRATOR_DB_URL which will hold the database URL for this purpose. The worker will use this database to hydrate to and from.
Developer Notes
A CU should still operate normally whether or not ENABLE_HYDRATOR and HYDRATOR_DB_URL are set; it should not require a postgres database to be present. The CU already has a postgres implementation that could be utilized.
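One way to honor that requirement is to derive the hydrator state from the environment so that it only turns on when both variables are present. The config shape below is an assumption, not existing CU code.

```javascript
// Hydrator is only enabled when ENABLE_HYDRATOR is 'true' AND a
// database URL is provided; otherwise the CU runs as before.
function hydratorConfig (env) {
  const enabled = env.ENABLE_HYDRATOR === 'true' && Boolean(env.HYDRATOR_DB_URL)
  return { enabled, dbUrl: enabled ? env.HYDRATOR_DB_URL : undefined }
}
```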