feat: page level caching #3158
Conversation
This PR still needs some extra tests for updating an existing db.
Live testing has been done on this PR. I would love some help with E2E testing of updates; if you have the time to show me the ropes, I am all ears. Test scenarios:
To be complete, docs should be in multiple directories so that re-ordering has an effect.
Generally, I like the idea.
I have concerns about the solution: removing the drop queries is a breaking change for the full dump and breaks NuxtHub deployment (NuxtHub uses this full dump to refill the database).
What if we keep the drop query and prefix it with the structure checksum, as in /* checksum */ DROP ..., then ignore these lines if the checksum has not changed?
We can add similar prefixes for insert and update queries. This would simplify the hash detection here:
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
The dump could look like this:
/* structure:{HASH} */ DROP TABLE ......
/* structure:{HASH} */ CREATE TABLE ...
/* hash:{ROW_HASH} */ INSERT ...
/* hash:{ROW_HASH} */ UPDATE ...
With these comments, the dump stays executable outside of our logic and we can optimize our import.
WDYT?
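To make this concrete, here is a minimal sketch of the import loop such a prefixed dump would allow; the names applyDump, knownHashes and exec are hypothetical, not the module's actual API, and it assumes the structure checksum comparison has already been done.

```ts
// Hypothetical sketch of the checksum-prefixed import proposed above.
// `dump` holds the SQL lines, `knownHashes` the row hashes already present in the
// database, and `exec` runs a single statement.
function applyDump(
  dump: string[],
  knownHashes: Set<string>,
  unchangedStructure: boolean,
  exec: (sql: string) => void,
): void {
  for (const line of dump) {
    const match = line.match(/^\/\* (structure|hash):(\S+) \*\/ /)
    if (match && unchangedStructure) {
      const [, kind, value] = match
      // Structure unchanged: DROP/CREATE lines are unnecessary, and any row whose
      // hash is already in the database can be skipped.
      if (kind === 'structure') continue
      if (kind === 'hash' && knownHashes.has(value)) continue
    }
    exec(line)
  }
}
```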
// If the structure has not changed,
// skip any insert/update line whose hash is already in the database.
// Otherwise, since we dropped the table, no record is skipped; insert them all again.
if (unchangedStructure && (sql.startsWith('INSERT ') || sql.startsWith('UPDATE ')) && !indexesToInsert.includes(index)) {
The values of indexesToInsert do not match the indexes of the queries in the dump, therefore indexesToInsert.includes(index) is not a valid condition to check.
The hashes indexed here come directly from the dump:
- They are ordered the same way as the insert queries; I make sure of it at build time.
- The table of hashes is padded at the beginning with the exact number of lines in the init: if there are 3 init lines, we insert 3 empty strings at the beginning (see the sketch below).
- That is why I would have loved a few E2E tests.
- I like your idea of having a hash at the beginning of the line; it makes the code simpler to understand.
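A minimal sketch of that alignment, with illustrative values only (the real list and counts come from the build step):

```ts
// Illustrative only: align the hash list with the dump by prepending one empty
// entry per init line, so dump index i and hash index i refer to the same statement.
const initLineCount = 3 // e.g. CREATE _content_info, its INSERT, DROP _content_content
const rowHashes = ['VwBBpsSOHZ', 'EO7IcoosJP'] // ordered exactly like the content INSERTs
const indexedHashes: string[] = [...Array<string>(initLineCount).fill(''), ...rowHashes]
// indexedHashes[i] is the row hash for dump line i, or '' for init/structural lines.
```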
I have tweaked the system a little:
Example of the resulting dump:

-- ["VwBBpsSOHZ","EO7IcoosJP"]
CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE);
/* starting_init */ INSERT INTO _content_info VALUES ('checksum_content', false, 'fr2JfCq2gl', 'v3.2.3--dE9pgeuPBt', '3hvYqGvzZ8');
DROP TABLE IF EXISTS _content_content;
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "body" TEXT, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE);
/* VwBBpsSOHZ */ INSERT INTO _content_content VALUES ('content/about.md', 'About', '{"type":"minimal","value":[["h1",{"id":"about"},"About"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":false,"numberField":123,"arrayField":["item3","item4"]}', 'true', '/about', '{"title":"About","description":""}', 'about', 'VwBBpsSOHZ');
/* EO7IcoosJP */ INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', '', 'md', '{"booleanField":true,"numberField":1,"arrayField":["item1","item2"]}', 'true', '/', '{"title":"Home page","description":""}', 'index', 'EO7IcoosJP');
/* successful_init */ UPDATE _content_info SET ready = true WHERE id = 'checksum_content';

Note: I kept the first line of hashes as an index that I do not need to rebuild as I run through the dump. I will optimize for readability and remove that first line.
In that last commit I changed the format of the dump once more:

CREATE TABLE IF NOT EXISTS _content_info (id TEXT PRIMARY KEY, "ready" BOOLEAN, "structureVersion" VARCHAR, "version" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_info VALUES ('checksum_content', false, 'RYgZDcy6sk', 'v3.2.3--DZOppTJAd2', 'synekhxovA');
DROP TABLE IF EXISTS _content_content; /* structure */
CREATE TABLE IF NOT EXISTS _content_content (id TEXT PRIMARY KEY, "title" VARCHAR, "arrayField" TEXT, "body" TEXT, "booleanField" BOOLEAN, "description" VARCHAR, "extension" VARCHAR, "meta" TEXT, "navigation" TEXT DEFAULT true, "numberField" INT, "path" VARCHAR, "seo" TEXT DEFAULT '{}', "stem" VARCHAR, "__hash__" TEXT UNIQUE); /* structure */
INSERT INTO _content_content VALUES ('content/index.md', 'Home page', '["item1","item2"]', '{"type":"minimal","value":[["h1",{"id":"home-page"},"Home page"]],"toc":{"title":"","searchDepth":2,"depth":2,"links":[]}}', true, '', 'md', '{}', 'true', 1, '/', '{"title":"Home page","description":""}', 'index', 'JApZF0Dncz'); /* checksum: JApZF0Dncz */
UPDATE _content_info SET ready = true WHERE id = 'checksum_content';

This feels more readable to both human beings and SQL scripts. All comments are at the end to avoid changing the tests.
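For illustration, this is how the trailing comments in that format could drive the skip decision; shouldSkip and its parameters are made-up names, not the PR's actual code:

```ts
// Hypothetical helper: decide whether a dump line can be skipped based on its
// trailing /* structure */ or /* checksum: ... */ comment.
function shouldSkip(sql: string, unchangedStructure: boolean, existingHashes: Set<string>): boolean {
  const checksum = sql.match(/\/\* checksum: (\S+) \*\/\s*$/)?.[1]
  if (checksum) {
    // Row-level line: skip it only when the structure is unchanged and a record
    // with this hash is already in the database.
    return unchangedStructure && existingHashes.has(checksum)
  }
  if (/\/\* structure \*\/\s*$/.test(sql)) {
    // Structural line (DROP/CREATE): only needed when the structure changed.
    return unchangedStructure
  }
  // Anything else (e.g. the final UPDATE marking the import as ready) always runs.
  return false
}
```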
Linked issue
closes #3151
Type of change
Description
This PR moves the caching level to each page/record instead of each collection.
This allows for faster iteration.
- __hash__ field on each content table to store the hash of the current record (see the sketch below)

Checklist
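As a closing illustration of the per-record approach, here is one way a stable record hash could be derived at build time; this is purely a sketch, and the hashing scheme actually used by the PR is not shown in this thread:

```ts
import { createHash } from 'node:crypto'

// Illustrative only: derive a short, stable hash from the serialized record so an
// unchanged record keeps the same __hash__ across builds and its INSERT can be skipped.
function recordHash(record: Record<string, unknown>): string {
  return createHash('sha256').update(JSON.stringify(record)).digest('hex').slice(0, 10)
}
```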