Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disk: update epoc.txt and vere.txt atomically #669

Merged
merged 3 commits into from
Jul 24, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 25 additions & 9 deletions pkg/vere/disk.c
Original file line number Diff line number Diff line change
Expand Up @@ -1123,18 +1123,26 @@ u3_disk_epoc_zero(u3_disk* log_u)
}

// create epoch version file, overwriting any existing file
c3_c epi_c[8193];
c3_c epv_c[8193];
snprintf(epv_c, sizeof(epv_c), "%s/epoc.txt", epo_c);
FILE* epv_f = fopen(epv_c, "w"); // XX errors
snprintf(epi_c, sizeof(epv_c), "%s/epoc.tmp", epo_c);
FILE* epv_f = fopen(epi_c, "w"); // XX errors
fprintf(epv_f, "%d", U3E_VERLAT);
fclose(epv_f);
snprintf(epv_c, sizeof(epv_c), "%s/epoc.txt", epo_c);
rename(epi_c, epv_c);
c3_unlink(epi_c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any reason to write this metadata in one place and then rename() it. That operation is atomic, but that doesn't guard against anything that we care about -- we're just creating a new epoc here, not updating meaningful data "in place".

I think the issue is simply that these version files are not being synced to disc. After fwrite(), we need to call fflush() and then c3_sync(fileno(...)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further discussion, there is one possible advantage to rename() -- any parsing errors in reading epoc.txt would then indicate tampering with the file, not a crash while writing it (or the current problem, failure to sync). The practical result would be that, in the case that vere crashes while creating a new epoc but before rename()-ing epoc.txt into place, that epoc would be automatically removed on startup as opposed to producing a fatal error that leaves it in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just tested this with a fresh fakezod. I simulated a crash before rename() of the epoc.tmp to epoc.txt file during a roll, thereby leaving only the snapshot files and an epoc.tmp file in the new epoch's folder. Afterwards, I restarted the ship with ./urbit zod and it successfully detected a bogus epoch, removed it, and booted from the latest valid epoch.


// create binary version file, overwriting any existing file
c3_c bii_c[8193];
c3_c biv_c[8193];
snprintf(biv_c, sizeof(biv_c), "%s/vere.txt", epo_c);
FILE* biv_f = fopen(biv_c, "w"); // XX errors
snprintf(bii_c, sizeof(biv_c), "%s/vere.tmp", epo_c);
FILE* biv_f = fopen(bii_c, "w"); // XX errors
fprintf(biv_f, URBIT_VERSION); // XX append a sentinel for auto-rollover?
fclose(biv_f);
snprintf(biv_c, sizeof(epv_c), "%s/vere.txt", epo_c);
rename(bii_c, biv_c);
c3_unlink(bii_c);

if ( -1 == c3_sync(epo_i) ) { // XX fdatasync on linux?
fprintf(stderr, "disk: sync epoch dir 0i0 failed: %s\r\n", strerror(errno));
Expand Down Expand Up @@ -1205,18 +1213,26 @@ _disk_epoc_roll(u3_disk* log_u, c3_d epo_d)
}

// create epoch version file, overwriting any existing file
c3_c epi_c[8193];
c3_c epv_c[8193];
snprintf(epv_c, sizeof(epv_c), "%s/epoc.txt", epo_c);
FILE* epv_f = fopen(epv_c, "w"); // XX errors
snprintf(epi_c, sizeof(epv_c), "%s/epoc.tmp", epo_c);
FILE* epv_f = fopen(epi_c, "w"); // XX errors
fprintf(epv_f, "%d", U3E_VERLAT);
fclose(epv_f);
snprintf(epv_c, sizeof(epv_c), "%s/epoc.txt", epo_c);
rename(epi_c, epv_c);
c3_unlink(epi_c);

// create binary version file, overwriting any existing file
c3_c bii_c[8193];
c3_c biv_c[8193];
snprintf(biv_c, sizeof(biv_c), "%s/vere.txt", epo_c);
FILE* biv_f = fopen(biv_c, "w"); // XX errors
fprintf(biv_f, URBIT_VERSION);
snprintf(bii_c, sizeof(biv_c), "%s/vere.tmp", epo_c);
FILE* biv_f = fopen(bii_c, "w"); // XX errors
fprintf(biv_f, URBIT_VERSION); // XX append a sentinel for auto-rollover?
fclose(biv_f);
snprintf(biv_c, sizeof(epv_c), "%s/vere.txt", epo_c);
rename(bii_c, biv_c);
c3_unlink(bii_c);

// get metadata from old log
c3_d who_d[2];
Expand Down
Loading