Failure recovery #31
Comments
See https://cseweb.ucsd.edu/~swanson/papers/DAC2011PowerCut.pdf for a paper researching the behavior of different SSDs on power failure.
To check for corrupted documents effectively, a checksum is necessary (see #72); otherwise only unfinished writes can be detected. However, filesystems typically increase the file size first and then write to the file, so the file size can be correct while the contents are corrupted. Without a checksum, this can only be detected when the corruption breaks the serialization format, and there is still a chance that a document gets deserialized that was never written. The checksum should include the previous document's checksum in order to also guarantee immutability of the whole partition.
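A minimal sketch of the chained checksum idea, assuming an in-memory array stands in for a partition. The helper names (`chainChecksum`, `append`, `findFirstCorrupt`), the SHA-256 choice, and the `GENESIS` seed are assumptions for illustration, not the storage's actual API.

```javascript
const crypto = require('crypto');

// Assumed seed used as the "previous checksum" of the very first document.
const GENESIS = '0'.repeat(64);

// Hypothetical helper: checksum over the previous checksum plus the payload.
function chainChecksum(previousChecksum, payload) {
  return crypto.createHash('sha256')
    .update(previousChecksum)
    .update(payload)
    .digest('hex');
}

// Append a document to an in-memory "partition" (an array of records).
function append(partition, payload) {
  const previous = partition.length > 0
    ? partition[partition.length - 1].checksum
    : GENESIS;
  partition.push({ payload, checksum: chainChecksum(previous, payload) });
}

// Return the index of the first record whose checksum does not match, or -1.
function findFirstCorrupt(partition) {
  let previous = GENESIS;
  for (let i = 0; i < partition.length; i++) {
    const { payload, checksum } = partition[i];
    if (chainChecksum(previous, payload) !== checksum) return i;
    previous = checksum;
  }
  return -1;
}

const partition = [];
append(partition, '{"type":"created"}');
append(partition, '{"type":"renamed"}');
append(partition, '{"type":"deleted"}');
```

Because each checksum covers its predecessor, changing or removing any earlier document invalidates the chain from that point on, which is what gives the immutability guarantee for the whole partition.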
After thinking more about checksums, this should be solely a serializer concern and hence fully pluggable. Dictating a checksum into the document has a couple of consequences:
So checksums only play a role in the following use-cases:
For all of those use cases, this is good enough and relatively easy to achieve by changing the serializer methods. Maybe the common use cases (custom serialization format, immutability guarantees) should be shown in the documentation.
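As a rough sketch of what "changing the serializer methods" could look like, the wrapper below adds a checksum to any serializer. It assumes a serializer is an object with `serialize(document)` returning a string and `deserialize(string)` returning the document; that shape, the `withChecksum` name, and the `checksum:payload` layout are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical wrapper: prefixes the serialized document with its SHA-256 checksum.
function withChecksum(serializer) {
  const digest = (data) => crypto.createHash('sha256').update(data).digest('hex');
  return {
    serialize(document) {
      const data = serializer.serialize(document);
      // Layout: 64 hex characters, a colon, then the serialized payload.
      return digest(data) + ':' + data;
    },
    deserialize(raw) {
      const checksum = raw.slice(0, 64);
      const data = raw.slice(65);
      if (raw[64] !== ':' || digest(data) !== checksum) {
        throw new Error('Document checksum mismatch.');
      }
      return serializer.deserialize(data);
    }
  };
}

// Plain JSON serializer, assumed here as the default format.
const jsonSerializer = { serialize: JSON.stringify, deserialize: JSON.parse };
const checkedSerializer = withChecksum(jsonSerializer);
```

Keeping the checksum inside the serializer leaves the document format untouched for users who do not need it, which matches the "fully pluggable" goal.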
Requires #24 in order to fix the global index in case it is broken.
With #145 included the next steps are roughly like this:
* Another option would be to truncate only the single torn writes and keep potentially successfully written later documents in other partitions. However, this would mean that documents go missing in between and sequence numbers have holes. Also, indexes could still point to non-existing or wrong documents.
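The trade-off between the two truncation strategies can be illustrated with a small in-memory model. This is only a sketch: the record fields (`sequence`, `partition`, `torn`) and the function names are hypothetical, and it assumes the alternative to the footnoted option is truncating everything after the first torn write.

```javascript
// Documents as they might be read back after a crash, ordered by global sequence number.
// `torn: true` marks a write that did not complete.
const documents = [
  { sequence: 1, partition: 'a', torn: false },
  { sequence: 2, partition: 'b', torn: false },
  { sequence: 3, partition: 'a', torn: true },
  { sequence: 4, partition: 'b', torn: false },
];

// Strategy 1 (assumed): discard the first torn write and everything after it.
function truncateAfterFirstTorn(docs) {
  const firstTorn = docs.findIndex((doc) => doc.torn);
  return firstTorn === -1 ? docs.slice() : docs.slice(0, firstTorn);
}

// Strategy 2 (the footnoted option): discard only the torn writes themselves.
function dropTornOnly(docs) {
  return docs.filter((doc) => !doc.torn);
}

// List the sequence numbers missing between consecutive surviving documents.
function findHoles(docs) {
  const holes = [];
  for (let i = 1; i < docs.length; i++) {
    for (let s = docs[i - 1].sequence + 1; s < docs[i].sequence; s++) {
      holes.push(s);
    }
  }
  return holes;
}
```

In this model the first strategy loses the intact document 4 but keeps the sequence contiguous, while the second keeps document 4 at the cost of a hole at sequence number 3.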
After a crash and potentially broken records, the storage should heal itself.
This can be achieved by the following steps:
new Error('Index file is corrupt!')
new Error('Can only truncate on valid document boundaries.')