-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache YouTube video titles in the DB #1093
Conversation
1d2075d
to
518ac76
Compare
Testing this a a way to test the boilerplate in other PRs, doing:
|
via/models/_mixins.py
Outdated
class CreatedUpdatedMixin(MappedAsDataclass): | ||
created: Mapped[datetime] = mapped_column( | ||
init=False, | ||
server_default=func.now(), # pylint:disable=not-callable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was holding the SQLA alchemy PR over LMS because this is annoying (not-callable)
I guess I'll have to bite the bullet and spirnkle a few of these there
hypothesis/lms#5283
0b3210c
to
35c1239
Compare
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to produce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to produce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to produce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
1ec445e
to
7f96bf1
Compare
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to produce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to reproduce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to reproduce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to reproduce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
Context ------- At startup (when you run `make dev`) Via's Gunicorn processes attempt to create the DB tables if they don't already exist. This is done in dev only, not in production: # via/db.py def includeme(config): ... if config.registry.settings.get("dev"): # Create the database tables if they don't already exist. Base.metadata.create_all(engine) Problem ------- Via actually has four Gunicorn workers in dev (see the `workers = 4` in `conf/gunicorn/dev.conf.py`). All 4 try to create the DB tables at the same time and 3 of them crash, printing out error messages. The issue is kind of difficult to reproduce: 1. You need to actually have a model, so you won't be able to produce the issue on the `db` branch without changes. See #1072 and #1093 for branches on which you can reproduce the issue. 2. The model's table needs to actually be missing from the DB table. You can do that by running this command: make services args=down && make services && make dev 3. Since the issue is a race condition you won't be able to reproduce it reliably, but if you run the above command several times you should see it. Solution -------- This commit moves the `sqlalchemy.create_all()` call out of the Gunicorn worker process and instead adds a separate `make_db` process to the list of processes that `supervisord-dev.conf` runs when you run `make dev`. This new `make_db` process starts up, calls `create_all()`, and then exits, and since it's a singular process there's no collision.
60b0edb
to
964fb9f
Compare
964fb9f
to
64983da
Compare
via/models/video.py
Outdated
class Video(CreatedUpdatedMixin, Base): | ||
__tablename__ = "video" | ||
|
||
id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True, init=False) | ||
video_id: Mapped[str] = mapped_column(unique=True) | ||
title: Mapped[str] = mapped_column(repr=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we've done with transcripts (over in #1072) I've decided not to namespace videos by type/source yet (e.g. by adding a type="youtube"
column). We can add this later if and when we ever add support for a second type of video
64983da
to
deae077
Compare
1a75004
to
d99233a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👍
Added the do-not-merge
tag here just as remainder to deploy and run the migration first
via/services/youtube.py
Outdated
# https://developers.google.com/youtube/v3/docs/videos/list | ||
video = self._db.scalar(select(Video).where(Video.video_id == video_id)) | ||
|
||
if video: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess is now idiomatic python to use if video :=
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice! Done
class Video(AutoincrementingIntegerIDMixin, CreatedUpdatedMixin, Base): | ||
__tablename__ = "video" | ||
|
||
video_id: Mapped[str] = mapped_column(unique=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we already have transcripts in the DB it's probably too complicated to be worth it but we could have a FK in transcripts pointing to this table.
We can join now by video_id if we need to, it has an index in both tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about this but I didn't really think it was worth the complexity. We don't need an fkey in order to look up a Video
given a Transcript
, as you said. So I'm not sure an fkey would really buy us anything. In fact, I think we might actually not want to enforce that every Video
in the DB must have a matching Transcript
or vice-versa, the order of operations (API calls etc) may call for the Video
alone to first be saved and then the transcript not downloaded until a later request or vice-versa. I suppose this does mean that any code that has a Video
cannot assume that a matching Transcript
exists or vice-versa, which may lead to a little more code complexity at some point.
d99233a
to
f4a6310
Compare
f4a6310
to
3231d37
Compare
Fixes #1118.
Testing
Test
models.Video
Test
VideoFactory
Test the web UI
http://localhost:9083/https://www.youtube.com/watch?v=byN4S8WDPt8 (the video title is used for the tab's title and for the document title if you create an annotation)