feat: page level caching #3158

Open · wants to merge 24 commits into main

Conversation

@elevatebart (Contributor) commented Feb 19, 2025

πŸ”— Linked issue

closes #3151

❓ Type of change

  • πŸ“– Documentation (updates to the documentation or readme)
  • 🐞 Bug fix (a non-breaking change that fixes an issue)
  • πŸ‘Œ Enhancement (improving an existing functionality like performance)
  • ✨ New feature (a non-breaking change that adds functionality)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)

πŸ“š Description

This PR moves the caching level to each page/record instead of each collection.

This allows for faster iteration, since unchanged records no longer need to be re-inserted on every build.

  • Build time
    • Only take the table definition into account in the table hash
    • Add a __hash__ field on each content table to store the hash of the current record
      • It should have a UNIQUE constraint
    • At build time, calculate the hash of the record values, then append that hash at the end of the values before generating the query
    • Store the list of hashes as the first line of the database dump
      • Store it in a SQL comment to keep the dump valid SQL
      • Store it as a JSON array to make parsing easier and avoid problematic characters
  • At runtime (a rough sketch follows this list)
    • Start by getting the list of existing hashes from the database
    • From the database dump, get the list of hashes that should be present after init
    • Compute the list of hashes that need to be removed and delete them (splitting into multiple queries if necessary)
    • Run the needed inserts from the dump
      • Since the queries and the hashes are in the same order, use the array to know which inserts to run
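
A rough sketch of this runtime flow in TypeScript (not the actual module code: the Database interface, the applyDump name and the parsing details are assumptions for illustration, a single _content_content table is assumed, and structure changes are ignored):

interface Database {
  all<T>(sql: string): Promise<T[]>
  exec(sql: string): Promise<void>
}

async function applyDump(db: Database, dump: string) {
  const lines = dump.split('\n').filter(Boolean)
  // 1. The first line is a SQL comment holding the JSON array of row hashes.
  const dumpHashes: string[] = JSON.parse(lines[0].replace(/^--\s*/, ''))
  // 2. Hashes already present in the database.
  const rows = await db.all<{ __hash__: string }>('SELECT __hash__ FROM _content_content')
  const existing = new Set(rows.map(r => r.__hash__))
  // 3. Delete records whose hash no longer appears in the dump (deletes run first).
  const obsolete = [...existing].filter(h => !dumpHashes.includes(h))
  if (obsolete.length > 0) {
    await db.exec(`DELETE FROM _content_content WHERE __hash__ IN (${obsolete.map(h => `'${h}'`).join(', ')})`)
  }
  // 4. Run only the inserts whose hash is not already in the database;
  //    inserts and hashes are kept in the same order at build time.
  const inserts = lines.filter(l => l.startsWith('INSERT INTO _content_content'))
  for (const [i, statement] of inserts.entries()) {
    if (!existing.has(dumpHashes[i])) {
      await db.exec(statement)
    }
  }
}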

πŸ“ Checklist

  • I have linked an issue or discussion.
  • I have updated the documentation accordingly.

@elevatebart marked this pull request as draft February 19, 2025 12:38
@elevatebart (Contributor, Author)

This PR still needs some extra tests for updating an existing db.

pkg-pr-new bot commented Feb 19, 2025

npm i https://pkg.pr.new/@nuxt/content@3158

commit: 53198e2

@elevatebart changed the title from "feat: page level caching (WIP)" to "feat: page level caching" Feb 19, 2025
@elevatebart marked this pull request as ready for review February 19, 2025 19:44
@elevatebart (Contributor, Author)

Live testing has been done on this PR.

I would love some help with E2E testing of updates.
I can't find a way to set up a project, build it, and serve it... especially since I am trying to build/serve multiple times with file changes.

If you have the time to show me the ropes, I am all ears.
If not, I'd rather have you write the E2E tests here.

Test scenarios (roughly outlined below):

  • initialization works on an empty database
  • when the collection structure changes (e.g. a new field is added), the table is dropped
  • when adding a md doc, the new doc is added to the DB
  • when removing a doc, it is removed from the DB
  • when updating a doc, it is updated in the DB
  • All 3 together (add some, remove some, update some)

To be complete, docs should live in multiple directories so that re-ordering has an effect.
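
A rough outline of these scenarios in Vitest (assuming the project's Vitest setup; the plumbing to build and serve the fixture multiple times is exactly the part that still needs figuring out):

import { describe, it } from 'vitest'

describe('page level caching', () => {
  it.todo('initializes an empty database from the dump')
  it.todo('drops and recreates the table when the collection structure changes')
  it.todo('inserts only the new record when a md doc is added')
  it.todo('deletes only the matching record when a doc is removed')
  it.todo('replaces only the matching record when a doc is updated')
  it.todo('handles add + remove + update across multiple directories in one rebuild')
})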

@farnabaz (Member) left a comment

Generally, I like the idea.
I have concerns about the solution. Removing drop queries is a breaking change for the full dump and breaks NuxtHub deployment. (NuxtHub uses this full dump to refill the database)

What if we keep the drop query and prefix it with the structure checksum (/* checksum */ DROP ...), then ignore these lines if the checksum has not changed?

We can add similar prefixes for insert and update queries. This would simplify the hash detection here:

if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {

The dump could look like:

/* structure:{HASH} */ DROP TABLE ......
/* structure:{HASH} */ CREATE TABLE ...
/* hash:{ROW_HASH} */ INSERT ...
/* hash:{ROW_HASH} */ UPDATE ...

With these comments, the dump stays executable outside of our logic, and we still optimize our import.

WDYT?
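
A rough sketch of the line filtering this prefix scheme would enable (hypothetical shouldRun helper; assumes every structure and record statement carries the corresponding prefix):

function shouldRun(line: string, structureChanged: boolean, existingHashes: Set<string>): boolean {
  const structure = line.match(/^\/\* structure:(\w+) \*\//)
  if (structure) {
    // DROP/CREATE lines only run when the structure checksum changed
    return structureChanged
  }
  const row = line.match(/^\/\* hash:(\w+) \*\//)
  if (row) {
    // INSERT/UPDATE lines are skipped when the row hash is already in the database,
    // unless the table was just dropped because the structure changed
    return structureChanged || !existingHashes.has(row[1])
  }
  // Lines without a prefix always run
  return true
}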

// If the structure has not changed,
// skip any insert/update line whose hash is already in the database.
// If not, since we dropped the table, no record is skipped, insert them all again.
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
Member

The values of indexesToInsert do not match the indexes of the queries in the dump, therefore indexesToInsert.includes(index) is not a valid condition to check.

Contributor Author

The hashes indexed here come directly from the dump:

  1. So they are ordered the same way as the insert queries; I make sure of that at build time.
  2. The array of hashes is padded at the beginning with the exact number of lines in the init: if the init has 3 lines, we insert 3 empty strings at the beginning.
  3. That is why I would have loved a few e2e tests.
  4. I like your idea of having the hash at the beginning of the line, so the code is simpler to understand.

@elevatebart (Contributor, Author) commented Feb 25, 2025

I have tweaked the system a little:

  • restored the DROP TABLE statements in the dump
  • added the hash as a comment in front of updates and inserts
  • ignored all structure queries if the structure has not changed
  • made sure that info version inserts and updates are always committed

Example of the resulting dump:

-- ["VwBBpsSOHZ","EO7IcoosJP"]
CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE);
/* starting_init */ INSERT INTO _content_info VALUES ('checksum_content', false, 'fr2JfCq2gl', 'v3.2.3--dE9pgeuPBt', '3hvYqGvzZ8');
DROP TABLE IF EXISTS _content_content;
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "body" TEXT, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE);
/* VwBBpsSOHZ */ INSERT INTO _content_content VALUES ('content/about.md', 'About', '{"type":"minimal","value":[["h1",{"id":"about"},"About"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":false,"numberField":123,"arrayField":["item3","item4"]}', 'true', '/about', '{"title":"About","description":""}', 'about', 'VwBBpsSOHZ');
/* EO7IcoosJP */ INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":true,"numberField":1,"arrayField":["item1","item2"]}', 'true', '/', '{"title":"Home page","description":""}', 'index', 'EO7IcoosJP');
/* successful_init */ UPDATE _content_info SET ready = true WHERE id = 'checksum_content';

NOTA: I kept the first line of hashes as an index that I do not need to rebuild as I run through the dump.
The deletes have to be done first to avoid primary key conflicts.
To delete the records that need cleaning, I need the list of hashes.
To get that list of hashes, I would otherwise need to roll through the whole dump, which could be costly.

Still, I will optimize for readability and remove that first line.

@elevatebart (Contributor, Author) commented Feb 25, 2025

In that last commit I changed the format of the dump once more:

CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_info VALUES ('checksum_content', false, 'RYgZDcy6sk', 'v3.2.3--DZOppTJAd2', 'synekhxovA');
DROP TABLE IF EXISTS _content_content; /* structure */
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "arrayField" TEXT, "body" TEXT, "booleanField" BOOLEAN, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "numberField" INT, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '["item1","item2"]', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', true, '', 'md', '{}', 'true', 1, '/', '{"title":"Home page","description":""}', 'index', 'JApZF0Dncz'); /* checksum: JApZF0Dncz */
UPDATE _content_info SET ready = true WHERE id = 'checksum_content'; 

This feels more readable to both human beings and SQL scripts.

All comments are at the end of the statement to avoid changing the tests.
Structure-related statements are flagged with a /* structure */ suffix.
Version-related statements must never be ignored, so they have no comment.
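
A sketch of how this final format could be consumed (the classify helper is hypothetical; names are illustrative only):

function classify(statement: string): { kind: 'structure' | 'record' | 'always', hash?: string } {
  // "... /* structure */" -> run only when the structure hash changed
  if (statement.trimEnd().endsWith('/* structure */')) {
    return { kind: 'structure' }
  }
  // "... /* checksum: X */" -> run only when hash X is not already in the database
  const match = statement.match(/\/\* checksum: (\w+) \*\/\s*$/)
  if (match) {
    return { kind: 'record', hash: match[1] }
  }
  // No trailing comment (info/version statements) -> always run
  return { kind: 'always' }
}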

Successfully merging this pull request may close these issues.

Caching on page level