concourse-worker fails to connect to concourse-web #1
Comments
I tried to run it manually: ssh into the container with `juju ssh --container concourse-worker concourse-worker/0`, then run `set -x`, `export CONCOURSE_BAGGAGECLAIM_DRIVER=overlay`, and `/usr/local/concourse/bin/concourse worker`. I didn't see an error connecting to the TSA; however, the local gdn that should listen on port 7777 did not run. From checking the log, I think it's because:
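For reference, here is the manual invocation described above as a single runnable snippet (paths and the unit name are taken from the comment above and may differ in other deployments):

```bash
# Open a shell in the worker container of the first unit.
juju ssh --container concourse-worker concourse-worker/0

# Inside the container: trace each command, select the overlay
# baggageclaim driver, and start the worker in the foreground.
set -x
export CONCOURSE_BAGGAGECLAIM_DRIVER=overlay
/usr/local/concourse/bin/concourse worker
```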
I did more testing, and here is the theory and verification:
Given that, here is what I did:
2021-07-08T09:47:21.189Z [concourse-worker] {"timestamp":"2021-07-08T09:47:21.189242695Z","level":"error","source":"worker","message":"worker.beacon-runner.beacon.forward-conn.failed-to-dial","data":{"addr":"127.0.0.1:7788","error":"dial tcp 127.0.0.1:7788: connect: connection refused","network":"tcp","session":"4.1.5"}}
but the worker does not exit like it did previously when it couldn't connect to web/tsa via port 2222. I think the direction is:
I've made a few small updates to the charm to create a k8s service for the concourse-web application on port 2222. There's still a race condition that needs fixing: as noted above, the worker won't keep retrying to connect to web via port 2222, so if the relation is established too early it will still fail to connect to that port. However, if, after everything is up and you've confirmed the concourse-web unit is responding on port 2222, you then add a new unit to concourse-worker (or likely if you wait to do this before relating concourse-worker to concourse-web), I get the following errors:
So, it's failing to authenticate, but is at least able to connect. This needs further investigation. The cgroup problem is still there, and you're correct that it's because the worker pod isn't privileged. I'm not actually sure how to do that in k8s charms (yet); I'll need to look into it. Having said that, it seems like something that would be good to avoid if possible. Per https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged: "By default a container is not allowed to access any devices on the host, but a "privileged" container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host. This is useful for containers that want to use linux capabilities like manipulating the network stack and accessing devices." It would be good to understand why Concourse needs this, as it might restrict the places we're prepared to run this.
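As a quick way to check whether the worker pod is currently privileged, something like the following should work (a sketch only; the namespace and pod name are assumptions and will depend on the Juju model and unit names):

```bash
# Print the privileged flag of each container in the worker pod.
# An empty value or "false" means that container is not privileged.
kubectl get pod concourse-worker-0 -n concourse \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.securityContext.privileged}{"\n"}{end}'
```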
This can be done via Pebble. I've put together a PR; can you take a look? mthaddon/concourse-web-operator#1
Per my understanding, running a Concourse CI worker inside k8s or Docker means running containers in a container, and that needs privileged mode. Per the Workers section of the architecture docs [1], "Workers are machines running Garden and Baggageclaim servers and registering themselves via the TSA", and Garden is "a platform-agnostic Go API for container creation and management". Per my test using plain old YAML to run Concourse CI on k8s, the worker does need privileged mode: $ grep -r privileged .
Ref: [1] https://concourse-ci.org/internals.html
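For illustration, this is the kind of stanza that grep would match in a hand-written worker manifest (a sketch only, not the charm's actual spec; the image, names, and everything other than the securityContext are assumptions, and a real worker would also need the TSA host and keys configured):

```bash
# Print the relevant fragment of an illustrative worker pod spec.
cat <<'EOF'
    spec:
      containers:
      - name: concourse-worker
        image: concourse/concourse
        args: ["worker"]
        securityContext:
          privileged: true   # the line "grep -r privileged ." turns up
EOF
```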
I tested the PR and the result is positive: the web is restarted when the worker public key changes.
Not sure if concourse-remote-worker helps [1]. I think we might need to define the remote Concourse worker as a K8s cluster. The reason, in my naive opinion, is that a Concourse worker is a collection of Docker containers when we run it locally. [2] A K8s Pod basically maps to a Docker container (or at most a couple of containers in that single Pod). When we deploy concourse-worker we actually deploy it into one Pod, which leads us to try to create lots of containers in that Pod. I also agree with @ycheng that we are going to create containers inside that container (the K8s Pod). [1] https://tanzu.vmware.com/developer/guides/concourse-remote-workers/
To investigate the issue a bit further, I have tried deploying Concourse CI on microk8s following this guide [1]. When running a pipeline I do hit some cgroup issues. On the other hand, I have also tried the deployment on GKE [2], which works well. My best guess is that microk8s uses K8s nodes based on LXD containers, whereas GKE creates nodes based on VMs. The difference then would be Type 2 vs. Type 1 hypervisor. My guess is that the charms should work when deployed to K8s with VM-level virtualization. To sum up, maybe we can try deploying the charms (concourse-web and concourse-worker) on Kubernetes with VM-level virtualization. Do we have any such environment handy for testing? Or is it possible to configure microk8s to be based on an LXD VM instead? [1] https://tanzu.vmware.com/developer/guides/concourse-gs/
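If we want to try the LXD-VM idea, a minimal sketch would be something like the following (assuming a reasonably recent LXD with VM support; the instance name, image, and resource limits are arbitrary choices):

```bash
# Launch an LXD virtual machine (not a container) and install MicroK8s in it.
lxc launch ubuntu:20.04 microk8s-vm --vm -c limits.cpu=4 -c limits.memory=8GiB
lxc exec microk8s-vm -- snap install microk8s --classic
lxc exec microk8s-vm -- microk8s status --wait-ready
```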
Juju deploy of concourse-web/worker [1][2] on GKE, configured using this doc. [3]
http://concourse-web is not working.
[1] https://charmhub.io/concourse-web
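For reproducibility, the deployment steps sketched as commands (the charm names are taken from the references above; the model name and channels are assumptions, and the GKE setup itself follows the doc referenced as [3]):

```bash
# Assumes a Juju controller is already bootstrapped against the GKE cluster.
juju add-model concourse
juju deploy concourse-web
juju deploy concourse-worker
juju relate concourse-web concourse-worker
juju status --watch 5s
```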
Currently there's an issue with concourse-worker connecting to concourse-web as follows:
I believe this is simply a case of figuring out how to expose port 2222 on the concourse-worker instance, but I haven't yet had a chance to look into this further.
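The comment above about the charm creating a k8s Service for concourse-web on port 2222 suggests the port that needs exposing is the TSA on the concourse-web side. A manual workaround sketch in the meantime (the namespace and the `app.kubernetes.io/name` selector are assumptions about how Juju labels the pods; verify against the actual pod labels first):

```bash
# Create a ClusterIP Service pointing port 2222 at the concourse-web pods.
kubectl apply -n concourse -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: concourse-web-tsa
spec:
  selector:
    app.kubernetes.io/name: concourse-web   # assumed Juju-applied label
  ports:
  - name: tsa
    port: 2222
    targetPort: 2222
EOF

# Check the TSA answers on 2222 from inside the cluster (if nc is available there).
nc -vz concourse-web-tsa.concourse.svc.cluster.local 2222
```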