Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a salt to the model output hash #275

Merged
merged 1 commit into from
Feb 12, 2025
Merged

Conversation

WardBrian
Copy link
Collaborator

@jsoules brought up an issue I had also seen in #274, which I think is caused by the changes #257 made which, among other things, mean the URL for a given model was stable across backend updates/restarts.

I think this is actually a bad thing, since it can mean a compilation result is served from the user's local cache rather than the server. Even if this isn't the source of the weird loading behavior we've both seen, it certain would cause issues if we needed to change something about the returned model's API that needed a simultaneous backend and frontend update.

This PR does the simplest fix I could think of, which is the docker build now computes a hash of all of the application files, and this hash is used to salt the model output folder generation. This means in a given build, all the workers will still compute the same hashes, but changing any file in our code or in the tinystan or tbb dependencies will lead to new URLs for the same models.

@WardBrian
Copy link
Collaborator Author

Things to verify as the reviewer:

  1. Running locally, repeated compilations are still hitting the server cache (i.e., each worker is still computing the same hash for the same model)
  2. After changing a file and re-building, the returned URL is different for a given model

Copy link
Collaborator

@jsoules jsoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hash seems consistent between different threads when recompiling, and I see no evidence of any compilation issues. Salt changed when I changed a file in the wasm-server directory, and reverted when I re-checked out the earlier version.

I think we're probably good to go here, though there is some sort of squirrelly unpredictability with when the hash-salt-lookup function gets called and when it doesn't.


# compute a hash of our app to make the model cache use unique URLS
# (to avoid browsers caching across different versions)
RUN find /app/ -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum | cut -d' ' -f1 | tee /app/stan-wasm-server/hash-salt.txt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to sort them before hashing? Isn't the goal just for it to be different from last time (vs being reproducible between builds)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it doesn't matter much either way, but if no files changed we actually wouldn't mind a user's local cache being warm

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I suppose redeploying the server isn't exactly something we do so often that it's worth worrying about a spurious sort.

Comment on lines +22 to +36
@lru_cache
def _get_salt() -> bytes:
"""
Returns a salt to use when hashing Stan programs if one was set during the Docker build.
This helps ensure that models don't get cached by the browser across different deployments.
"""
salt_file = Path(__file__).parent.parent.parent.parent / "hash-salt.txt"
if not salt_file.exists():
logger.warning("No hash salt file found at %s", salt_file)
salt = b""
else:
salt = salt_file.read_bytes().strip()
logger.info("Using hash salt from %s: %s", salt_file, repr(salt))
return salt

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I be seeing the side effect of this function every time I compile a new (cache miss) main.stan? I would've thought these would be cache hits, but maybe I need a refresher on our threading model here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's not even that predictable when the function reruns and when it doesn't.

I don't think this much matters, but I like to note everything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should see it once per thread, so up to 4 times, and then no more

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any rhyme or reason to when a new thread is spawned?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, that would be a uvicorn implementation detail. It seems like it’s roughly round-robin

@WardBrian WardBrian merged commit 9a23544 into main Feb 12, 2025
2 checks passed
@WardBrian WardBrian deleted the salt-model-output-hash branch February 12, 2025 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants