IndexedDB versioned page store #37
[Update: These ideas have been incorporated into example VFS classes IDBBatchAtomicVFS and IDBVersionedVFS.]
Here's a different idea for an IndexedDB VFS - allowing multiple versions of each page to minimize data movement.
The current VFS stores database pages in IndexedDB with a [filename, pageIndex] key, plus a metadata object holding the file size, page size, etc. That's a pretty straightforward way of emulating a block device. It caches all the writes for a SQLite transaction in memory, with spillover to another IndexedDB store, and doesn't flush them to the main store until the SQLite transaction finishes. Deferring that flush is good for two reasons: first, it minimizes the time an exclusive lock must be held, and second, the pre-transaction contents of the database can serve as the rollback image instead of storing separate journal data (a savings of both space and time).
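To make that layout concrete, here's a minimal sketch of the schema. The database and store names are made up for illustration; this is not the actual wa-sqlite code.

```javascript
// Open a database with one store of pages keyed by [filename, pageIndex]
// (out-of-line keys) and one store of per-file metadata keyed by filename.
function openBlockDatabase(name = 'sqlite-vfs') {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open(name, 1);
    request.onupgradeneeded = () => {
      request.result.createObjectStore('pages');    // key: [filename, pageIndex]
      request.result.createObjectStore('metadata'); // key: filename
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```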
The cost of the deferred flush is that when a transaction overflows the in-memory write cache, modified pages are written to the spill store and later read back and written to the database, a put-get-put sequence on IndexedDB. Without the deferred flush, the VFS might need to read the current page, write it to the journal, and write the new contents to the database, a get-put-put sequence, though the get-put for the journal can sometimes be skipped.
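As a rough illustration of that spill path (hypothetical helper and store names, assuming a separate 'spill' store alongside 'pages'; the real VFS is more involved):

```javascript
// Resolve or reject with an IndexedDB request's result.
const promisify = request => new Promise((resolve, reject) => {
  request.onsuccess = () => resolve(request.result);
  request.onerror = () => reject(request.error);
});

// Overflowing page: put into the spill store during the transaction...
async function spillPage(db, key, data) {
  const store = db.transaction('spill', 'readwrite').objectStore('spill');
  await promisify(store.put(data, key));                           // put
}

// ...then at commit, read it back and write it to the main page store.
async function flushSpilledPage(db, key) {
  const tx = db.transaction(['spill', 'pages'], 'readwrite');
  const data = await promisify(tx.objectStore('spill').get(key));  // get
  await promisify(tx.objectStore('pages').put(data, key));         // put
}
```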
The new idea is to store versioned database pages, i.e. with a [filename, pageIndex, version] key and an index on [filename, version], where newer versions are lower numbers. The current version number is stored in metadata. Fetching a page is now done with an IDBKeyRange whose lower bound is the current version, which retrieves the first page at that version or higher (older). Here's how a transaction works with versioned pages:

1. Decrement the current version number to get the transaction's new version (newer versions are lower).
2. Clear any pages already stored with the new version number (left over from an interrupted transaction).
3. Put each page written by the transaction under the new version number.
4. Update the current version number in the metadata to commit (or discard the new-version pages to roll back).
5. Purge obsolete pages, i.e. delete older versions of every page the transaction rewrote.
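A page fetch might look like this sketch, reusing the promisify helper from above (names are again invented). Note that IDBObjectStore.get() with a key range returns the first record of the range in key order, which here is the newest version visible at currentVersion.

```javascript
// Versioned store: in-line keys on [filename, pageIndex, version], with an
// index on [filename, version] for purging and version scans.
// (Run inside an onupgradeneeded handler.)
function createVersionedStore(db) {
  const store = db.createObjectStore('pages',
      { keyPath: ['filename', 'pageIndex', 'version'] });
  store.createIndex('byVersion', ['filename', 'version']);
  return store;
}

// Fetch the newest version of a page visible at currentVersion. Newer
// versions have lower numbers, so the first record at or above
// [filename, pageIndex, currentVersion] is the one to read.
async function getPage(db, filename, pageIndex, currentVersion) {
  const range = IDBKeyRange.bound(
      [filename, pageIndex, currentVersion],
      [filename, pageIndex, Infinity]);
  const store = db.transaction('pages').objectStore('pages');
  return promisify(store.get(range));
}
```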
The per-page operation is now put-delete (put the new version, delete the old version), which only moves data once, and it has the same ability to avoid storing journal data. In addition, the purge doesn't need to happen immediately at the end of the transaction, as the old versions don't affect correctness. So purging could be deferred until the database is otherwise idle.
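Continuing the sketch, the per-page write could look like this; the trailing delete is the part that can be deferred:

```javascript
// Put a page at the transaction's version, then delete all older versions
// of the same page. Deferring the delete doesn't affect correctness.
async function putPage(db, page) {  // page = {filename, pageIndex, version, data}
  const store = db.transaction('pages', 'readwrite').objectStore('pages');
  await promisify(store.put(page));                 // put the new version
  const older = IDBKeyRange.bound(
      [page.filename, page.pageIndex, page.version],
      [page.filename, page.pageIndex, Infinity],
      true);  // lowerOpen: exclude the version just written
  await promisify(store.delete(older));             // delete the old version(s)
}
```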
If deferring purging is to be an option, storing a purge version (i.e. the last version we purged at) in the metadata will make it more efficient to determine which pages might need to be purged of old versions. The disadvantage of deferring purging, other than the storage used, is that it would make prefetching pages for read (e.g. using getAll() or a cursor) less efficient since there is no way to fetch or traverse only the most recent version of multiple pages at once. If delete is not cheap then the best purging and prefetching strategy will likely be application-specific.
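A deferred purge using the purge version might look something like this, with the same invented schema. It scans only versions written since the last purge, newest first, and removes anything older for each page it finds.

```javascript
// Purge obsolete versions of every page written since the last purge.
// Versions in [currentVersion, purgeVersion) are exactly the candidates,
// and the byVersion index visits newer (lower) versions first, so older
// duplicates are deleted before the cursor would reach them.
function purge(db, filename, currentVersion, purgeVersion) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('pages', 'readwrite');
    const store = tx.objectStore('pages');
    const range = IDBKeyRange.bound(
        [filename, currentVersion], [filename, purgeVersion],
        false, true);  // versions purged last time are excluded
    const request = store.index('byVersion').openCursor(range);
    request.onsuccess = () => {
      const cursor = request.result;
      if (!cursor) return;
      const { pageIndex, version } = cursor.value;
      store.delete(IDBKeyRange.bound(
          [filename, pageIndex, version],
          [filename, pageIndex, Infinity],
          true));  // everything older than the version just visited
      cursor.continue();
    };
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```

The purge version in the metadata would then be advanced to currentVersion once the purge completes.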
I implemented this in a private branch. I like it a lot. I think the code is more elegant, especially because it doesn't use a write cache like the current VFS. I like everything about it except... it's slower. On the benchmarks page, it's slower (on Chrome) on most of the operations. It is still slower if I disable purging obsolete pages (step 5, which can be deferred) and clearing pages with the new version number (step 2, which is unsafe so just for analysis). My hope was that it would be faster at least when deferring purging.

Why is it slower? I'm not sure yet. Right now I'm speculating that IDBObjectStore.get() is just slower with an IDBKeyRange than with a key (which I can measure if I get around to it), but it is also possible I've done something dumb in my code, such as await-ing an operation that doesn't actually need to complete before continuing.

I did learn some things along the way. The most important is that doing "smart" things with SQLite transactions in a VFS requires recognizing a transaction from the pattern of VFS calls, and those patterns can change with different PRAGMA settings. For example, in rollback-journal mode a write transaction announces itself by taking a RESERVED lock and ends when the lock drops back below RESERVED, but PRAGMA locking_mode=EXCLUSIVE holds locks across transactions so the unlock never arrives, and PRAGMA journal_mode changes which files are written and synced.
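To illustrate the kind of heuristic involved, here's a sketch of lock-based write-transaction detection. The lock constants match sqlite3.h, but the method shapes are invented for the example, not wa-sqlite's actual VFS interface.

```javascript
// Infer write transaction boundaries from the xLock/xUnlock VFS calls.
const SQLITE_LOCK_NONE = 0;
const SQLITE_LOCK_SHARED = 1;
const SQLITE_LOCK_RESERVED = 2;

class WriteTransactionTracker {
  inWriteTransaction = false;

  xLock(fileId, level) {
    // In rollback-journal mode, acquiring RESERVED (or higher) marks the
    // start of a write transaction.
    if (level >= SQLITE_LOCK_RESERVED) this.inWriteTransaction = true;
    return 0;  // SQLITE_OK
  }

  xUnlock(fileId, level) {
    // Dropping below RESERVED marks the end (commit or rollback)... except
    // with PRAGMA locking_mode=EXCLUSIVE this call never arrives, and in
    // WAL mode the main database's lock pattern is different again.
    if (level < SQLITE_LOCK_RESERVED) this.inWriteTransaction = false;
    return 0;  // SQLITE_OK
  }
}
```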