Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a salt to the model output hash #275

Merged
merged 1 commit into from
Feb 12, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions backend/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -52,8 +52,13 @@ RUN cd tinystan && \
emmake make test_models/bernoulli/bernoulli.js -j$(nproc) && \
emstrip test_models/bernoulli/bernoulli.wasm

COPY stan-wasm-server /stan-wasm-server
WORKDIR /stan-wasm-server
COPY stan-wasm-server /app/stan-wasm-server

# compute a hash of our app to make the model cache use unique URLS
# (to avoid browsers caching across different versions)
RUN find /app/ -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum | cut -d' ' -f1 | tee /app/stan-wasm-server/hash-salt.txt
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to sort them before hashing? Isn't the goal just for it to be different from last time (vs being reproducible between builds)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it doesn't matter much either way, but if no files changed we actually wouldn't mind a user's local cache being warm

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I suppose redeploying the server isn't exactly something we do so often that it's worth worrying about a spurious sort.


WORKDIR /app/stan-wasm-server

ENV SWS_PASSCODE=1234
ENV SWS_LOG_LEVEL=debug
Expand Down
22 changes: 20 additions & 2 deletions backend/stan-wasm-server/src/app/logic/compilation.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import asyncio
import logging
import time
from functools import lru_cache
from hashlib import sha1
from pathlib import Path
from shutil import copy2
Expand All @@ -18,10 +19,27 @@
logger = logging.getLogger(__name__)


@lru_cache
def _get_salt() -> bytes:
"""
Returns a salt to use when hashing Stan programs if one was set during the Docker build.
This helps ensure that models don't get cached by the browser across different deployments.
"""
salt_file = Path(__file__).parent.parent.parent.parent / "hash-salt.txt"
if not salt_file.exists():
logger.warning("No hash salt file found at %s", salt_file)
salt = b""
else:
salt = salt_file.read_bytes().strip()
logger.info("Using hash salt from %s: %s", salt_file, repr(salt))
return salt

Comment on lines +22 to +36
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I be seeing the side effect of this function every time I compile a new (cache miss) main.stan? I would've thought these would be cache hits, but maybe I need a refresher on our threading model here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it's not even that predictable when the function reruns and when it doesn't.

I don't think this much matters, but I like to note everything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should see it once per thread, so up to 4 times, and then no more

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any rhyme or reason to when a new thread is spawned?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, that would be a uvicorn implementation detail. It seems like it’s roughly round-robin


def _compute_stan_program_hash(program_file: Path) -> str:
stan_program = program_file.read_text()
# MAYBE: replace stan_program with a canonical form?
return sha1(stan_program.encode()).hexdigest()
hasher = sha1(_get_salt())
hasher.update(stan_program.encode())
return hasher.hexdigest()


def make_canonical_model_dir(src_file: Path, built_model_dir: Path) -> Path:
Expand Down
26 changes: 18 additions & 8 deletions backend/stan-wasm-server/src/app/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -138,18 +138,28 @@ async def compile_stan(
return {"model_id": model_dir.name}


def send_interrupt() -> None:
"""
Send an interrupt signal to the parent process.
uvicorn interprets this like Ctrl-C, and gracefully shuts down.
The orchestrator then restarts the server.
"""
import os
import signal

os.kill(os.getppid(), signal.SIGINT)


@app.post("/restart")
async def restart(
settings: DependsOnSettings, authorization: str = Header(None)
) -> None:
settings: DependsOnSettings,
background_tasks: BackgroundTasks,
authorization: str = Header(None),
) -> DictResponse:
if settings.restart_token is None:
raise StanPlaygroundAuthenticationException("Restart token not set at startup")
check_authorization(authorization, settings.restart_token)

import os
import signal
background_tasks.add_task(send_interrupt)

# send an interrupt signal to the parent process
# uvicorn interprets this like Ctrl-C, and gracefully shuts down
os.kill(os.getppid(), signal.SIGINT)
# actual restart is handled by the orchestrator
return {"status": "restarting"}
Loading