-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a salt to the model output hash #275
Conversation
Things to verify as the reviewer:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash seems consistent between different threads when recompiling, and I see no evidence of any compilation issues. Salt changed when I changed a file in the wasm-server
directory, and reverted when I re-checked out the earlier version.
I think we're probably good to go here, though there is some sort of squirrelly unpredictability with when the hash-salt-lookup function gets called and when it doesn't.
|
||
# compute a hash of our app to make the model cache use unique URLS | ||
# (to avoid browsers caching across different versions) | ||
RUN find /app/ -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum | cut -d' ' -f1 | tee /app/stan-wasm-server/hash-salt.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason to sort them before hashing? Isn't the goal just for it to be different from last time (vs being reproducible between builds)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it doesn't matter much either way, but if no files changed we actually wouldn't mind a user's local cache being warm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I suppose redeploying the server isn't exactly something we do so often that it's worth worrying about a spurious sort.
@lru_cache | ||
def _get_salt() -> bytes: | ||
""" | ||
Returns a salt to use when hashing Stan programs if one was set during the Docker build. | ||
This helps ensure that models don't get cached by the browser across different deployments. | ||
""" | ||
salt_file = Path(__file__).parent.parent.parent.parent / "hash-salt.txt" | ||
if not salt_file.exists(): | ||
logger.warning("No hash salt file found at %s", salt_file) | ||
salt = b"" | ||
else: | ||
salt = salt_file.read_bytes().strip() | ||
logger.info("Using hash salt from %s: %s", salt_file, repr(salt)) | ||
return salt | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I be seeing the side effect of this function every time I compile a new (cache miss) main.stan
? I would've thought these would be cache hits, but maybe I need a refresher on our threading model here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's not even that predictable when the function reruns and when it doesn't.
I don't think this much matters, but I like to note everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should see it once per thread, so up to 4 times, and then no more
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any rhyme or reason to when a new thread is spawned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, that would be a uvicorn implementation detail. It seems like it’s roughly round-robin
@jsoules brought up an issue I had also seen in #274, which I think is caused by the changes #257 made which, among other things, mean the URL for a given model was stable across backend updates/restarts.
I think this is actually a bad thing, since it can mean a compilation result is served from the user's local cache rather than the server. Even if this isn't the source of the weird loading behavior we've both seen, it certain would cause issues if we needed to change something about the returned model's API that needed a simultaneous backend and frontend update.
This PR does the simplest fix I could think of, which is the docker build now computes a hash of all of the application files, and this hash is used to salt the model output folder generation. This means in a given build, all the workers will still compute the same hashes, but changing any file in our code or in the tinystan or tbb dependencies will lead to new URLs for the same models.