Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workflows for local development using containerised YNR #2485

Merged
merged 33 commits into from
Jan 16, 2025

Conversation

jpluscplusm
Copy link
Collaborator

@jpluscplusm jpluscplusm commented Dec 30, 2024

This adds workflows, scripts, docs and in-repo infra for YNR to be invoked inside a container during local development. Non-local environments are not impacted by this change.

The branch being PR'd contains ~30 commits that generally build on top of each other in narrative fashion. These are presented as-is, as they contain information that's relevant about the development of the local-dev setup that results. They can, of course, be squashed if needed, but that loss of information should be an explicit project choice.


CI failure on tip of master previously blocking this PR

Edit: whilst this failure was previously reproducible, it has now fixed itself in CircleCI when the same commit is exercised (see the two runs visible at https://app.circleci.com/pipelines/github/DemocracyClub/yournextrepresentative/3940). I'll leave this section here for posterity, and in case the problem reoccurs. The first and last commits mentioned here have been removed.

Original: The first and last commits of this PR's stack relate to a single Python test that only started to fail after the stack was rebased on top of the recent #2481. The failure is present on the commit at the tip of the master branch, suggesting the failure is unrelated to this stack's changes. This pair of commits should be removed from this PR, pre-merge, after the root cause of the issue is addressed.


This stack is considered complete, but the PR is WIP. Changes may be needed to address various items in the PR checklist.

To review

  • Check you're happy with the rendered changes to README.md
  • Check out the PR's branch and ...
    • Follow and validate the instructions in docs/INSTALL.md
    • Follow and validate as many of the workflows in docs/DEVELOPMENT.md as you feel appropriate
    • Use the workflows in docs/DEVELOPMENT.md to test the process of making a "normal" code change to YNR.
  • Review container/build/Containerfile
  • Review scripts/*
  • Review docker-compose.yml
  • Review the changes to these pre-existing files:
    • requirements/base.txt
    • requirements/sopn_parsing.txt
    • ynr/apps/sopn_parsing/README.md

PR Checklist

  • Update changelog (ie https://keepachangelog.com/en/1.0.0/)
  • References a specific issue or if not, describes the bug or feature in detail
  • Note what card in Trello this work relates to
  • Instructions for how reviewers can test the code locally
  • Tests have been added and/or updated
  • Screenshot of the feature/bug fix (if applicable)
  • If any new text is added, it's internationalized
  • Any new elements have aria labels
  • No unintentional console.logs left behind after debugging
  • Did I use the clear and concise names for variables and functions?
  • Did I explain all possible solutions and why I chose the one I did?
  • Added any comments to make new functions clearer
  • Did I added or updated any new dependencies? Explain why.
  • Did I update the package.json and/or Pipfile.lock version if relevant?
  • Have I rebased with the latest version of master?
  • Added PR labels
  • Update any history/changelog file
  • Update any documentation
  • Update dev handbook

@jpluscplusm jpluscplusm requested a review from symroe December 30, 2024 11:36
Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK then. Lets start off with the stuff that is just broken. I don't think these things necessarily are blockers before Sym can a go at bootstrapping the project, but @symroe - if you want to jump in and do it now, these are the things that I found were just broken and here are the workarounds/fixes. Hopefully that will stop you from wasting time hitting the same rough edges as I did.

docker-compose.yml Show resolved Hide resolved
docker-compose.yml Show resolved Hide resolved
container/build/Containerfile Show resolved Hide resolved
Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium priority

There's a bag of related issues around how we configure the application. I'm going to structure this as one "review" containing a series of inline comments, but I think its all kind of related. This might be one where its useful for us to collectively discuss the approach to application configuration (again) to hammer this out and work out wehat next. Maybe we can compare notes when we've both spun the app up locally.

This stuff does need sorting, but I don't think its a pre-requisite before Sym has a go at spinning up the application and the next actions are perhaps not 100% defined yet.

docs/INSTALL.md Outdated Show resolved Hide resolved
container/build/Containerfile Outdated Show resolved Hide resolved
ynr/settings/testing.py Outdated Show resolved Hide resolved
ynr/settings/local.py.container.example Outdated Show resolved Hide resolved
Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lower priority things

docs/DEVELOPMENT.md Outdated Show resolved Hide resolved
.circleci/config.yml Outdated Show resolved Hide resolved
container/build/Containerfile Outdated Show resolved Hide resolved
docker-compose.yml Show resolved Hide resolved
docs/INSTALL.md Outdated Show resolved Hide resolved
container/build/system-packages Show resolved Hide resolved
docs/DEVELOPMENT.md Outdated Show resolved Hide resolved
container/build/Containerfile Outdated Show resolved Hide resolved
@chris48s
Copy link
Member

chris48s commented Jan 9, 2025

Sorry. Another one that I didn't catch yesterday but I think this one matters.

Every time I load a page, I get

Internal Server Error: /ajax/ballots/ballots_for_select.json
Traceback (most recent call last):
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/sentry_sdk/integrations/django/views.py", line 94, in sentry_wrapped_callback
    return callback(request, *args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/views/generic/base.py", line 104, in view
    return self.dispatch(request, *args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/views/generic/base.py", line 143, in dispatch
    return handler(request, *args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/utils/decorators.py", line 46, in _wrapper
    return bound_method(*args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/utils/decorators.py", line 126, in _wrapper_view
    result = middleware.process_request(request)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/middleware/cache.py", line 158, in process_request
    cache_key = get_cache_key(request, self.key_prefix, "GET", cache=self.cache)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/utils/cache.py", line 391, in get_cache_key
    headerlist = cache.get(cache_key)
  File "/dc/ynr/venv/lib/python3.8/site-packages/debug_toolbar/panels/cache.py", line 42, in wrapper
    return panel._record_call(cache, name, original_method, args, kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/debug_toolbar/panels/cache.py", line 149, in _record_call
    value = original_method(*args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/django/core/cache/backends/memcached.py", line 75, in get
    return self._cache.get(key, default)
  File "/dc/ynr/venv/lib/python3.8/site-packages/pymemcache/client/hash.py", line 347, in get
    return self._run_cmd("get", key, default, default=default, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/pymemcache/client/hash.py", line 322, in _run_cmd
    return self._safely_run_func(client, func, default_val, *args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/pymemcache/client/hash.py", line 211, in _safely_run_func
    result = func(*args, **kwargs)
  File "/dc/ynr/venv/lib/python3.8/site-packages/pymemcache/client/base.py", line 687, in get
    return self._fetch_cmd(b"get", [key], False, key_prefix=self.key_prefix).get(
  File "/dc/ynr/venv/lib/python3.8/site-packages/pymemcache/client/base.py", line 1133, in _fetch_cmd
    self._connect()
  File "/dc/ynr/venv/lib/python3.8/site-packages/pymemcache/client/base.py", line 424, in _connect
    sock.connect(sockaddr)
ConnectionRefusedError: [Errno 111] Connection refused

in the console. This is using ynr.settings.base for my settings. I think we are basically failing on any operation that tries to use the cache. You will not have hit this using ynr.settings.testing because the test settings just mocks out the cache.

We could get round it for now by just using a mock or local filesystem cache in local dev, but then that's a difference between local and prod. Any bug relating to cache won't reproduce locally. This brings a question about how we're going to approach cache. I guess in our current setup, we just run memcached on the VM (and in local dev) but in a docker-based setup memcached probably needs to become a separate service (ElastiCache?). Ultimately I'd quite like our dev setup to mirror what we're going to do in prod, but also maybe that is something we come to when we look at deployment rather than while we're still doing docker in local dev but VMs in prod.

This might be another one to discuss first.

@chris48s
Copy link
Member

chris48s commented Jan 9, 2025

Having pulled one worm out of the can, as a follow up to that: @symroe - are you aware of anything else that we're running as a service on the VM at the moment aside from memcached?

This adds a Containerfile (a technology-agnostic Dockerfile) that can
successfully install the app's dependencies into a container image. The
image can be built with the following command (note the trailing build
context of "."):

    podman build .

The image's contents are controlled via an allowlist in
.containerignore, which is expected to change significantly in later
commits as more of this repo is used inside the container.

The image is not yet set up to be invoked.
This adds a postgresql compose manifest as a demonstration that
podman-compose can successfully instatiate the service, via:

    podman compose -f container/run/compose.service.postgres.yml up

This is not yet used in any context.
This changes the approach to building the app image by using an Ubuntu
LTS release as the base, and starting by installing the current set of
packages explicitly referenced in deploy/vars.yml (modulo some outdated
entries such as redis-server).

This change is based on discussions with DC personnel with knowledge of
the current setup, and influenced by the likely path we'll take to
adopting a container-based production setup in the future.

The image that's produced is not small, as it contains many, many system
packages that are pulled in as dependencies of the package set we
explicitly specify. The resulting large package set does, however,
permit the app's base Python requirements to be installed successfully,
and this can act as a baseline for future size/efficiency improvements.
This adds a "container" workflow to CircleCI. The workflow is
unoptimised, and initially serves only to test the buildability of the
container image.

The .containerignore file is not respected by CircleCI's Docker
builders, so it is symlinked into place as .dockerignore.
This adds "python manage.py check" to the end of the container image
build process. Whilst this isn't strictly necessary it serves two
useful purposes by:

- demonstrating that the container *can* run the check; and
- pre-generating many .pyc files and including them in the image for
  faster app initialisation.

The Containerfile is also tweaked to:

- read the list of system packages from an external file;
- install the app's python requirements before copying its code, so the
  DX happy path doesn't incur the pip-install cost when only modifying
  code;
- install the app's python requirements based on the sopn_parsing.txt
  file as this transitively includes "pandas", without which the check
  command fails.
This adds a simplistic mechanism to invalidate CircleCI's Docker Layer
Cache (DLC), and force an image rebuild from scratch.

Locally, "podman build --no-cache ." should be preferred instead.
Invoke a cut-down version of the current CI test command inside the
built container.
This adds some system and pip dependencies that are currently encoded in
the CI config. The system dependencies are copied into the container's
package file, but a manually fetched system dependency (pandoc) is
translated into its convenience pip package equivalent.

The change to pandoc's installation mechanism requires the use of an
alternative version of the pypandoc library that also provides the
underlying pandoc binary. This requires us to use a later version than
the currently-used pypandoc version, as the currently-used version
doesn't have a binary-including-package counterpart. We bump to the
latest pip package version.
This adds a docker-compose.yml file which, along with the included changes,
permits a developer to run a hot-reloading instance of the app.

To make this work, it:
- invokes `npm install` early in the image build process
- adds gunicorn to requirements/base.txt (as there aren't any environments
  where this is optional)
- makes some (hopefully harmless!) changes to ynr/settings/testing.py that
  allow the app to start successfully

This change also:
- adds a convenience script at scripts/container.image.build.bash, intended for
  use both locally and in CI
- increases the commenting/structure in the Containerfile

The compose file must be at the repo root (and not inside the defunct
container/run/ directory) because relative bind mount sources are resolved
relative to the compose file's location, and we need to mount the ynr/
directory from inside the repo root. This Docker (hence also Podman) feature
even goes so far (or appears to!) as to resolve the location reading through a
symlink, so we can't simply link a compose file sitting at the repo root to a
file in a subdirectory.

The developer currently needs to manually execute some asset-related commands
before starting the compose stack, in order to make app render correctly. These
commands place content into the ynr/ directory, which is bind-mounted from the
developer's machine. They will be automated in a later change:

    podman compose run --rm --no-deps -e DJANGO_SETTINGS_MODULE=ynr.settings.testing frontend npm run build
    podman compose run --rm --no-deps -e DJANGO_SETTINGS_MODULE=ynr.settings.testing frontend python manage.py collectstatic --no-input
This cleans the NPM cache after installation, and also uses the more
deterministic "npm ci" command instead of "npm install".

Also:
- help future beginners by defining SOPN in the top-level README
- reduce top-level repo clutter by renaming .containerignore to .dockerignore.
This adds a convenience shim for invoking "python manage.py" inside a freshly
instantiated container, connected to the same DB and bind mounts that the
composed "frontend" container would be able to access:

    $ ./scripts/container.compose.manage-py.bash check 2>/dev/null
    WARNING: no local settings file found. See local.py.example
    System check identified no issues (0 silenced).

Invocations are currently noisy (hence the 2> redirection); this will be
improved later.
This expands the set of file and directory paths which are bind mounted into
the frontend container when using compose. The set is aligned with the paths
included in the built container image via the .dockerignore file's contents.
This permits the container image required by the local compose stack to be
built directly using the compose subcommand:

    podman compose build
or
    podman compose build frontend

A test is added to CI that asserts compose is superficially happy. We don't
(yet) use compose to run any containers in CI, so this step is necessary
trivial.

Also: fix a podman/docker inconsistency where podman permitted compose "ports"
to be a scalar value, but docker requires a list of values.
This allows a developer to place a file at the gitignored path env/frontend.env
containing VAR=val key/value pairs which the frontend compose container will
read on startup and instantiate as environment variables.

This mechanism has been chosen as it affects all frontend container
invocations, including those that don't start Django as a web server. This
consistency will reduce unexpected drift between the primary frontend container
started by a developer and any secondary containers.

There are complications that currently make it fiddly to get this file *also*
bind-mounted as the .env file that ynr/wsgi.py loads. For the moment, this
means that environment variable changes require a restart of the frontend
container.
This adds a script that shims "podman compose exec", and changes the renamed
shim script that invokes Django management commands to use it.

It also removes a couple of envvars from the compose file which are more
properly situated in the gitignored, per-developer env/frontend.env file.
/ynr/media/ doesn't need to be included inside the frontend container image.
YNR's production RDS runs v16.4. Both the local containerised setup and the
in-CI DB should mirror this.
This adds a script that executes pytest inside a running frontend container,
passing any parameters through to pytest.
Docker compose requires an explicit healthcheck, whereas podman appears to
infer a healthy container when an exposed port is listening.
This reverts commit 4af80b5.

This was added as docker required it, so that we could test using
docker-compose in CI. This has proven impractical, given the warnings at
https://circleci.com/docs/docker-compose/#using-docker-compose-with-docker-executor:

    The remote docker daemon runs on a different system than the docker CLI and
    docker compose, so you must move data around to make this work. Mounting
    can usually be solved by making content available in a docker volume.

In order to make this work, we'd have to vary the compose file so that in local
dev content was bind-mounted, whereas it would be made available in a volume in
CI. This would negate the point of running the compose stack in CI: to prove
that local-dev workflows are happy.

Because podman appears to have a problem with this setup on certain machines:

    ERRO[0000] Failed to start transient timer unit: Unit
    cdb85bbc93bfd59cc38703842f96c12d0cced997a7ee582b2954a6bb87905804.timer was
    already loaded or has a fragment file.

... our easiest route forward is to revert this change until we genuinely need
to solve docker's healthcheck problem.
This adds scripts/container.run.bash, which runs a command in a fresh frontend
container.

Also: fix trailing whitespace
This allows a developer who invokes scripts/container.image.build.bash to
specify arbitrary parameters for the underlying builder.

e.g. "--no-cache", which is a better way of avoiding using the local build
cache than bumping the CI-focussed build ARG at the top of
container/build/Containerfile.
This adds some docs to help a developer get started modifying and running YNR
inside a container.

Also: structure README.md's headings more hierarchically, leaving only a single
H1 on the page; some light rewording in README.md to improve its flow, along
with a short new section that points a member of the public towards
whocanivotefor.co.uk if they happen to come across this repo.
@jpluscplusm jpluscplusm force-pushed the jcm/add-local-dev-container-workflows branch from a169175 to b2b4d5c Compare January 9, 2025 10:51
Initial development of containerised YNR used podman-compose v1.0.6, but that
version is incompatible with the current schema used by the docker-compose.yml
file. v1.2.0 is noted as being specifically required as the very recent v1.3.0
has not yet been properly tested with this branch.
@symroe
Copy link
Member

symroe commented Jan 9, 2025

On cache: I use the dummy cache locally, but I suspect the easiest thing to do here is to switch to DatabaseCache. We don't really need a dedicated caching service for this app.

Some DB migrations require files that exist in the data directory. This change
ensures that it exists in the container image by adding it to the .dockerignore
include list, and also bind mounts the path so that developers can update its
files without rebuilding the image.
@chris48s
Copy link
Member

chris48s commented Jan 9, 2025

Just a note on ways of working:

With a big PR like this where there will be multiple rounds of comments and changes, I prefer it if we don't rebase and force-push the branch mid-review.

The reason for this is: If we push additional commits to address review comments, it is easy for the reviewer to come back to a PR and quickly see what has changed since they last looked at it. The reviewer can easily tell from GitHub's UI that they've reviewed everything up to commit a169175 (for example) so everything after that is new since the last round of comments. If the entire history has been re-written since they last viewed the PR and new changes are squashed into commits further back in history, it is much harder to tell at a glance "what changed since I last looked at this?". The reviewer basically has to come back to the thing from first principles.

It doesn't matter that we've done a rebase and force-push already. Lets just play the ball where it landed from this point, but for subsequent changes, can we just push more commits to the branch and then squash things to tidy up as a final step once we're ready to merge.

Thanks

@jpluscplusm
Copy link
Collaborator Author

With a big PR like this where there will be multiple rounds of comments and changes, I prefer it if we don't rebase and force-push the branch mid-review.

Understood. The rebase was solely to remove the first-and-last commits' mitigations for a failing test that magically fixed itself on tip of master. I'm already adding commits as requested and won't force push any more rebases.

Some editors complain if a Containerfile doesn't use uppercase instructions.
This tweaks some docs following PR feedback.
@jpluscplusm
Copy link
Collaborator Author

On cache: I use the dummy cache locally, but I suspect the easiest thing to do here is to switch to DatabaseCache. We don't really need a dedicated caching service for this app.

Whilst I can configure the webapp to use DatabaseCache, I feel that changing this setting (specifically where it's configured, and how we nudge the developer to alter it if required) should come after we've had the broader settings-related discussion that @chris48s suggested above. Thoughts?

This alters the Containerfile's setting of NODE_ENV so it's less susceptible to
future line-order changes having unintended side effects.
@chris48s
Copy link
Member

Whilst I can configure the webapp to use DatabaseCache, I feel that changing this setting (specifically where it's configured, and how we nudge the developer to alter it if required) should come after we've had the broader settings-related discussion

Yeah. Agreed 👍 Migrating to the DB cache engine should be another PR for another day. For the moment, we'll use dummy cache for local dev. I've captured this in the notes doc from this morning too.

This reverts all changes to ynr.settings.testing, and defaults the
containerised YNR app to use the base ynr.settings module throughout the build
process and the compose invocation. The pytest wrapper script is allowed to use
pytest's default settings already configured via pyproject.toml.

This change is generally implemented by removing the explicit use of
ynr.settings.testing across various scenarios, and opting in to manage.py's
default of ynr.settings (which inherits ynr.settings.base).

Also: small tweaks to docs.

Signed-off-by: Jonathan Matthews <[email protected]>
@jpluscplusm
Copy link
Collaborator Author

we'll use dummy cache for local dev

Resolved by 196d5f3. The dummy cache is specified in local.py.container.example.

@jpluscplusm jpluscplusm marked this pull request as ready for review January 15, 2025 09:06
@chris48s
Copy link
Member

chris48s commented Jan 15, 2025

Thanks. I've had a look at the latest changes and given it another spin locally.

I think from my point of view everything is either addressed, or we've made the conscious decision to deal with it as a follow up PR and noted it somewhere else.

So, 👍 from me. There's a couple of comment threads still open where it would be useful to get a final check from Sym before we merge.

Copy link
Member

@symroe symroe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've run through the docs from start to finish on a computer that I didn't use previously (to isolate any problems that might happen when setting it up in on a system that had tested this half way through the PR).

Everything worked as expected 🎉

So: from the point of view of "can I install the app" I think this is good to go. This isn't the same as saying that all the code is perfect, but I don't have anything to add to that that we've not talked about and planned already.

👍

This removes mention of an alternative method for initially populating the DB,
via the API, as the fact that it doesn't currently work is a known issue.
@jpluscplusm
Copy link
Collaborator Author

@symroe I'm not sure what the DC etiquette is for merging PRs, but if it would be my responsibility then I'll defer to yourself for this one. I'll let you choose a time when you'd be able to respond to any production deployment problems that might result from these changes. As discussed, we've carefully isolated this PR's scope so it shouldn't affect production; but this feels like the appropriate handling for this change.

@symroe
Copy link
Member

symroe commented Jan 16, 2025

what the DC etiquette is for merging PRs

The general rule is "you break it, you fix it" (e.g merge your own PRs and own that through to production). Given that you don't have prod access, it's reasonable that we don't do that now. I'll merge it now, as I currently have time to fix any problems.

@symroe symroe merged commit 839744a into master Jan 16, 2025
5 of 6 checks passed
@jpluscplusm jpluscplusm deleted the jcm/add-local-dev-container-workflows branch January 16, 2025 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants