Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disk: update epoc.txt and vere.txt atomically #669

Merged
merged 3 commits into from
Jul 24, 2024
Merged

disk: update epoc.txt and vere.txt atomically #669

merged 3 commits into from
Jul 24, 2024

Conversation

pkova
Copy link
Collaborator

@pkova pkova commented Jul 4, 2024

Fixes #640

@pkova pkova requested a review from a team as a code owner July 4, 2024 13:01
pkg/vere/disk.c Outdated
fprintf(epv_f, "%d", U3E_VERLAT);
fclose(epv_f);
snprintf(epv_c, sizeof(epv_c), "%s/epoc.txt", epo_c);
rename(epi_c, epv_c);
c3_unlink(epi_c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any reason to write this metadata in one place and then rename() it. That operation is atomic, but that doesn't guard against anything that we care about -- we're just creating a new epoc here, not updating meaningful data "in place".

I think the issue is simply that these version files are not being synced to disc. After fwrite(), we need to call fflush() and then c3_sync(fileno(...)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further discussion, there is one possible advantage to rename() -- any parsing errors in reading epoc.txt would then indicate tampering with the file, not a crash while writing it (or the current problem, failure to sync). The practical result would be that, in the case that vere crashes while creating a new epoc but before rename()-ing epoc.txt into place, that epoc would be automatically removed on startup as opposed to producing a fatal error that leaves it in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just tested this with a fresh fakezod. I simulated a crash before rename() of the epoc.tmp to epoc.txt file during a roll, thereby leaving only the snapshot files and an epoc.tmp file in the new epoch's folder. Afterwards, I restarted the ship with ./urbit zod and it successfully detected a bogus epoch, removed it, and booted from the latest valid epoch.

@pkova pkova merged commit e388128 into develop Jul 24, 2024
5 checks passed
@pkova pkova deleted the pkova/epoc branch July 24, 2024 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ships failing to start with "disk: unknown epoch version"
3 participants