Production runtime should use a VM for isolation/security #742
I don't think this is relevant now that we have moved to Kube as the primary target.
It is still relevant, because the runner and the user code are separate security domains, and the runner is a policy enforcement point. The user code should not have access to anything but the runner "proxy" ports.
In that case, from a Kube PoV you probably want the enforcement part of the runner in one container, and the actual user runtime in a different container inside the pod. AFAIK you can't really do this sort of isolation if they are all inside the same container.
That could be a good first step, and maybe sufficient long term, but we should run that past @AlexSzlavik. From everything I've read, containers are not a reliable security boundary. But perhaps now that we're all-in on Kubernetes we can combine this with other Kubernetes security approaches, like routing policies, and, coupled with our own policy enforcement, that might be fine.
Chatted to Alex about this earlier today and we think the Runner could be a sidecar, with the user code proxying everything through it. Presumably the user container can be locked down such that it can't access anything except for the Runner. One issue is that because we currently route everything through the Controller, the Runner needs to be able to differentiate between traffic originating from the user module and all other traffic in order to avoid routing loops. There's code in place to do that, but it's likely bitrotted, and it also requires changes to each runtime, so the JVM runtime probably doesn't support this currently. Needs testing.
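A minimal sketch of what that differentiation could look like, assuming a hypothetical `X-Ftl-Origin` marker header and placeholder addresses/ports (none of which are FTL's actual mechanism):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// originHeader is a hypothetical marker header; the real mechanism in FTL may differ.
const originHeader = "X-Ftl-Origin"

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return u
}

func main() {
	// Assumed addresses: the upstream Controller and the co-located user module.
	toController := httputil.NewSingleHostReverseProxy(mustParse("http://ftl-controller:8892"))
	toUserModule := httputil.NewSingleHostReverseProxy(mustParse("http://127.0.0.1:8894"))

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get(originHeader) == "user-module" {
			// Outbound call from the local user module: rewrite the marker so the
			// next runner treats it as inbound, then forward it upstream.
			r.Header.Set(originHeader, "runner")
			toController.ServeHTTP(w, r)
			return
		}
		// Everything else is inbound traffic destined for the local user module.
		toUserModule.ServeHTTP(w, r)
	})

	// Assumed runner proxy port; the user code would only be allowed to reach this.
	log.Fatal(http.ListenAndServe(":8893", nil))
}
```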
One other thing that occurred to me is that we'll need to split the "runner" into two containers: the ftl-runner itself, and the image that user code runs on (i.e. what is currently the ftl-runner image).
I feel like running VMs in place of containers directly would be a challenge in K8s. Some googling around seemed to indicate that it's doable, but it would definitely be a specialized deployment strategy, and I'd be concerned that this approach would get in the way of adoption of FTL.

If we aren't doing this, a sidecar model (a la Envoy) makes sense to me. I guess this means we'd have to split the runner image into two, right? The "edge" runner sidecar and the main "workload" runner: the former is responsible for interfacing with the cluster, while the latter is responsible for launching the user code and would probably also act as a bridge to the "edge" runner. The main reason to separate these is that we want isolation of user code from FTL cluster-internal components; we wouldn't want user code to be able to assume the capabilities of an FTL component.

Have we considered what a future FTL deployment in a common production-grade K8s cluster might look like? If the state of the art involves Istio or other additional components, should we design for them now, or at least make sure that we can interoperate with them?
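For illustration, a rough sketch of what the split pod could look like, expressed with the Kubernetes Go API types. The image names, ports, and security settings are assumptions, not FTL's actual manifests:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// runnerPod sketches the two-container split: an "edge" ftl-runner sidecar and a
// locked-down workload container for user code.
func runnerPod() *corev1.Pod {
	noToken := false
	nonRoot := true
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "ftl-runner",
			Labels: map[string]string{"app": "ftl-runner"},
		},
		Spec: corev1.PodSpec{
			// The user workload must not receive cluster credentials.
			AutomountServiceAccountToken: &noToken,
			Containers: []corev1.Container{
				{
					// "Edge" sidecar: interfaces with the Controller and enforces policy.
					Name:  "ftl-runner",
					Image: "ftl-runner:latest", // assumed image name
					Ports: []corev1.ContainerPort{{ContainerPort: 8893}},
				},
				{
					// Workload container: runs only the user code, reaching nothing
					// but the sidecar over localhost.
					Name:  "user-module",
					Image: "ftl-module-example:latest", // assumed image name
					SecurityContext: &corev1.SecurityContext{
						RunAsNonRoot: &nonRoot,
					},
				},
			},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(runnerPod(), "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```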
All the approaches out there are fairly immature, and definitely specialized. I evaluated this a couple of years ago and ended up needing to write my own VM provisioner to support multi-platform builds on Kube rather than using existing systems. Things may have gotten better since then, but it is still not something that we could require.
This is doable. We want to avoid image building by the user, so it is slightly tricky, but it is doable. If we are going to require an OCI registry for artifacts anyway, one possibility is to have the controller generate the image (not via a docker build, but directly through the OCI registry). Another possibility is to have a shared volume between the sidecar and the runner container, and transfer the user code over the shared volume.
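A sketch of the "generate the image directly through the OCI registry" option, assuming github.com/google/go-containerregistry; the image references and the deployment tarball path are placeholders:

```go
package main

import (
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Pull the base runtime image (what is currently the ftl-runner image).
	base, err := crane.Pull("registry.example.com/ftl-runtime-base:latest")
	if err != nil {
		log.Fatal(err)
	}

	// Append the user's deployment artefacts as a new layer, with no docker
	// build: the tarball is simply laid down on top of the base filesystem.
	img, err := crane.Append(base, "deployment.tar")
	if err != nil {
		log.Fatal(err)
	}

	// Push the resulting image straight to the registry for the pod to pull.
	if err := crane.Push(img, "registry.example.com/deployments/my-module:v1"); err != nil {
		log.Fatal(err)
	}
}
```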
We will definitely need Istio, so we should be thinking about this.
Looping in @tlongwell-block for his thoughts.
Runners currently execute user code directly on the same host that they run on. In k8s this is not terrible, but ideally FTL would execute user code inside a VM to completely isolate it. This would also allow us to restrict inbound/outbound network, and so on.
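As one example of the kind of network restriction mentioned above, a NetworkPolicy could confine runner pods to talking only to the Controller. This is a sketch only: the labels and port are assumptions, and a NetworkPolicy operates at the pod level, so it does not isolate containers within the same pod from each other.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// runnerNetworkPolicy restricts runner pods to exchanging traffic with the Controller only.
func runnerNetworkPolicy() *networkingv1.NetworkPolicy {
	tcp := corev1.ProtocolTCP
	controllerPort := intstr.FromInt(8892) // assumed Controller port
	controller := metav1.LabelSelector{MatchLabels: map[string]string{"app": "ftl-controller"}}
	return &networkingv1.NetworkPolicy{
		ObjectMeta: metav1.ObjectMeta{Name: "ftl-runner-lockdown"},
		Spec: networkingv1.NetworkPolicySpec{
			// Apply to runner pods only.
			PodSelector: metav1.LabelSelector{MatchLabels: map[string]string{"app": "ftl-runner"}},
			PolicyTypes: []networkingv1.PolicyType{
				networkingv1.PolicyTypeIngress,
				networkingv1.PolicyTypeEgress,
			},
			// Only the Controller may connect in...
			Ingress: []networkingv1.NetworkPolicyIngressRule{{
				From: []networkingv1.NetworkPolicyPeer{{PodSelector: &controller}},
			}},
			// ...and only outbound connections to the Controller are allowed.
			Egress: []networkingv1.NetworkPolicyEgressRule{{
				To:    []networkingv1.NetworkPolicyPeer{{PodSelector: &controller}},
				Ports: []networkingv1.NetworkPolicyPort{{Protocol: &tcp, Port: &controllerPort}},
			}},
		},
	}
}

func main() {
	out, err := json.MarshalIndent(runnerNetworkPolicy(), "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```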
Useful references: