Add a salt to the model output hash #275

WardBrian · 2025-02-12T17:08:39Z

@jsoules brought up an issue I had also seen in #274, which I think is caused by the changes #257 made which, among other things, mean the URL for a given model was stable across backend updates/restarts.

I think this is actually a bad thing, since it can mean a compilation result is served from the user's local cache rather than the server. Even if this isn't the source of the weird loading behavior we've both seen, it certain would cause issues if we needed to change something about the returned model's API that needed a simultaneous backend and frontend update.

This PR does the simplest fix I could think of, which is the docker build now computes a hash of all of the application files, and this hash is used to salt the model output folder generation. This means in a given build, all the workers will still compute the same hashes, but changing any file in our code or in the tinystan or tbb dependencies will lead to new URLs for the same models.

WardBrian · 2025-02-12T17:10:03Z

Things to verify as the reviewer:

Running locally, repeated compilations are still hitting the server cache (i.e., each worker is still computing the same hash for the same model)
After changing a file and re-building, the returned URL is different for a given model

jsoules

Hash seems consistent between different threads when recompiling, and I see no evidence of any compilation issues. Salt changed when I changed a file in the wasm-server directory, and reverted when I re-checked out the earlier version.

I think we're probably good to go here, though there is some sort of squirrelly unpredictability with when the hash-salt-lookup function gets called and when it doesn't.

jsoules · 2025-02-12T20:24:00Z

backend/Dockerfile

+
+# compute a hash of our app to make the model cache use unique URLS
+# (to avoid browsers caching across different versions)
+RUN find /app/ -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum | cut -d' ' -f1 | tee /app/stan-wasm-server/hash-salt.txt


Any particular reason to sort them before hashing? Isn't the goal just for it to be different from last time (vs being reproducible between builds)?

I suppose it doesn't matter much either way, but if no files changed we actually wouldn't mind a user's local cache being warm

Well, I suppose redeploying the server isn't exactly something we do so often that it's worth worrying about a spurious sort.

jsoules · 2025-02-12T21:02:07Z

backend/stan-wasm-server/src/app/logic/compilation.py

+@lru_cache
+def _get_salt() -> bytes:
+    """
+    Returns a salt to use when hashing Stan programs if one was set during the Docker build.
+    This helps ensure that models don't get cached by the browser across different deployments.
+    """
+    salt_file = Path(__file__).parent.parent.parent.parent / "hash-salt.txt"
+    if not salt_file.exists():
+        logger.warning("No hash salt file found at %s", salt_file)
+        salt = b""
+    else:
+        salt = salt_file.read_bytes().strip()
+        logger.info("Using hash salt from %s: %s", salt_file, repr(salt))
+    return salt
+


Should I be seeing the side effect of this function every time I compile a new (cache miss) main.stan? I would've thought these would be cache hits, but maybe I need a refresher on our threading model here.

Actually, it's not even that predictable when the function reruns and when it doesn't.

I don't think this much matters, but I like to note everything.

You should see it once per thread, so up to 4 times, and then no more

Is there any rhyme or reason to when a new thread is spawned?

Not sure, that would be a uvicorn implementation detail. It seems like it’s roughly round-robin

Add a salt to the model output hash

ec7de38

WardBrian added the backend label Feb 12, 2025

WardBrian requested a review from jsoules February 12, 2025 17:08

jsoules approved these changes Feb 12, 2025

View reviewed changes

WardBrian merged commit 9a23544 into main Feb 12, 2025
2 checks passed

WardBrian deleted the salt-model-output-hash branch February 12, 2025 21:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a salt to the model output hash #275

Add a salt to the model output hash #275

WardBrian commented Feb 12, 2025

WardBrian commented Feb 12, 2025

jsoules left a comment

jsoules Feb 12, 2025

WardBrian Feb 12, 2025

jsoules Feb 12, 2025

jsoules Feb 12, 2025

jsoules Feb 12, 2025

WardBrian Feb 12, 2025

jsoules Feb 12, 2025

WardBrian Feb 12, 2025

Add a salt to the model output hash #275

Add a salt to the model output hash #275

Conversation

WardBrian commented Feb 12, 2025

WardBrian commented Feb 12, 2025

jsoules left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment