feat: Support sync code with local folders #102

VoVAllen · 2022-12-06T06:16:02Z

Description

Currently, envd-server clones the repo based on the info in the image label. We should also support syncing with the user's local folder.

Current logic at https://github.com/tensorchord/envd-server/blob/main/pkg/server/environment_create.go#L175-L201

Reference:

https://www.okteto.com/docs/reference/file-synchronization/
ksync
Syncthing

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

VoVAllen · 2022-12-06T06:17:50Z

Other discussions: tensorchord/envd#530

AlexXi19 · 2022-12-21T00:54:40Z

Implementation notes (Ignore)

Envd needs to:
- Install the syncthing binary locally
- Communicate with the envd-server endpoint to set up and start the sync
Envd server needs to:
- Have endpoints to interact with syncthing
- Perform configurations (connect local directory to container directory)

TODO:

Determine which configurations need to happen and where they happen (setting up connections, matching devices, etc.)

Syncthing

Rest API
Kubernetes

Ksync Implementation

ksync architecture

Okteto Implementation

Interacts with syncthing via binary cli
Interacts with syncthing via syncthing api

Questions

Will multiple users connect to the same container? (i.e. will it always be a 1-1 sync and not 1-many)
- This should be ok because syncthing supports multi-device sync, just need to figure out how to manage device IDs.

Nonessential Features

Progress bar
- Okteto implementation: repeatedly ping the syncthing server for progress and show in terminal ui.

AlexXi19 · 2022-12-24T22:09:19Z

Proof of Concept

For a proof of concept, I added a syncthing container to environment_create.go here, and manually configured the file sync connection between the source folder and the folder in the docker container using the GUI.

Demo

In the demo, I'm manually clicking the sync button, but we can adjust the syncthing sync interval.

Screen.Recording.2022-12-24.at.1.48.27.PM.mov

Next steps

Functionality

Local setup

Install syncthing binary on user's computer
Run the syncthing binary on the user's computer

Implement the manual sync steps with code: (I don't know how to do this yet)

Syncthing uses xml to set up configurations (link)

Working with DeviceIDs (use the syncthing REST API, like [this](https://docs.syncthing.net/rest/system-status-get.html))
Host sends "add device" request to container and container accepts request
Host creates sync folder and shares the sync with the container

Other features

Sync Logging
Error handling with sync (might be tricky)
Edge cases
- What happens with the sync for github repos? Which do we prioritize?
Configurations
- Syncthing sync interval
- public/private keys
- https certificates

Questions

When should I install syncthing? On bootstrap? On run?
How to link the source directory with the target directory? I think when we send the request to the server from run, we can send the local path with the request to know which directory to sync.
About device discovery and connection
- Need to learn more about the syncthing discovery server to decide on the best way to connect devices. Syncthing also offers a global discovery service; we should be careful NOT to use that.
- Ksync uses arbitrary deviceIDs and uses the API to modify and update the configurations
  - For device discovery, it opens a tunnel between the two devices and only allows discovery through the tunnel; it has a service that gets device IDs and connects them.
- Okteto uses deterministic deviceIDs in configurations
  - For service discovery, I don't know the mechanism yet (TBD). There is probably some kind of connection, maybe port forwarding the discovery service, that lets devices find each other; I need to look into how this is done.

AlexXi19 · 2022-12-24T22:09:35Z

@VoVAllen

VoVAllen · 2022-12-25T13:43:45Z

When should I install syncthing? On bootstrap? On run?

I think we can add this to envd attach for now. When a user attaches to the env, it will do ssh + port-forwarding + file sync.

Local setup

We can also set up a syncthing Docker container on the user's side as the client. It makes binary delivery easier.

How to link the source directory with the target directory? I think when we send the request to the server from run, we can send the local path with the request to know which directory to sync.

You can assume the working directory is the source directory for now. We may provide more detailed configuration later.

About device discovery and connection

We can generate a random ID directly, and use ssh port-forwarding to connect to the pod's syncthing ports.
I don't think we need service discovery at all, since the target and source are deterministic here.

gaocegege · 2022-12-26T01:29:21Z

When should I install syncthing? On bootstrap? On run?

If you mean the binary install, bootstrap will be better. Otherwise we need to add complex logic in attach.

gaocegege · 2022-12-26T01:31:24Z

And I am not sure if we should use syncthing or https://github.com/rclone/rclone

XML config looks weird to me.

AlexXi19 · 2022-12-26T01:36:45Z

And I am not sure if we should use syncthing or https://github.com/rclone/rclone

XML config looks weird to me.

You don't really have to work with the xml other than writing up the default. If you want to make changes to the configuration, you can also use the Go struct [here](https://pkg.go.dev/github.com/syncthing/syncthing/lib/config#Configuration) so you don't have to work with the xml directly. Ksync and okteto both use syncthing, but I can look into rclone a bit more.

AlexXi19 · 2022-12-29T08:07:54Z

Design Document

Envd-server file sync functionality

Description

When a pod/environment is provisioned after envd up --image <image-name>, the user's project directory and its files are synced into the container in the development pod. When the files are modified either locally or within the container, the changes are synced to keep project files consistent.

Functional Requirements

Syncs files between the project directory and the remote container

Non-functional Requirements

TBD (Sync interval, latency requirements, security etc.)

Implementation

Syncing

The core sync functionality is implemented by using syncthing, which can sync files between two devices.

Syncthing on Local

The syncthing binary is downloaded based on the user's OS and architecture. It is installed on envd bootstrap and executed on envd up. Before syncthing is executed, we write a config.xml file to the syncthing home directory so that syncthing can read the configuration on startup. When the binary is executed, we set the home (config) directory to .config/envd/syncthing so as not to interfere with the user's own syncthing configuration. Once the binary is running, a local instance of syncthing is up and we can start connecting it to the remote instance.

Syncthing on Kubernetes

For syncthing in the Kubernetes pod, we use an image of syncthing here to start it up as a container. For the starting configuration, we pass in the config.xml file via a Kubernetes ConfigMap. One caveat is that ConfigMap mounts are read-only. We can work around this by mounting the ConfigMap to a temporary directory and using a container lifecycle event on container start to copy the file into the correct directory, which for this syncthing image is /config.

Working with Syncthing

In order to make changes or get information from the syncthing application (to add devices, add folders, check on status, etc.), we use the syncthing rest api. To communicate with the syncthing instance on kubernetes, the appropriate port needs to be forwarded.
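As a rough illustration of the REST calls described above, here is a minimal sketch in Go that adds a device and a folder through syncthing's config endpoints (`POST /rest/config/devices` and `POST /rest/config/folders`, authenticated with the `X-API-Key` header). The `Client`, `Device`, and `Folder` types and the `newFakeSyncthing` stand-in server are hypothetical names made up for this example; they carry only a few fields and are not the structs from syncthing's lib/config, and this is not necessarily how the actual implementation is structured.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Device and Folder carry only the config fields this sketch needs.
// They are illustrative types, not the structs from syncthing's lib/config.
type Device struct {
	DeviceID  string   `json:"deviceID"`
	Name      string   `json:"name"`
	Addresses []string `json:"addresses"`
}

type FolderDevice struct {
	DeviceID string `json:"deviceID"`
}

type Folder struct {
	ID      string         `json:"id"`
	Path    string         `json:"path"`
	Devices []FolderDevice `json:"devices"`
}

// Client talks to one syncthing instance over its REST API.
type Client struct {
	BaseURL string // for the pod, this would be a locally forwarded port
	APIKey  string
}

// post sends v as JSON and authenticates with the X-API-Key header.
func (c Client) post(path string, v any) error {
	body, err := json.Marshal(v)
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, c.BaseURL+path, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("X-API-Key", c.APIKey)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		msg, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("POST %s: %s: %s", path, resp.Status, bytes.TrimSpace(msg))
	}
	return nil
}

func (c Client) AddDevice(d Device) error { return c.post("/rest/config/devices", d) }
func (c Client) AddFolder(f Folder) error { return c.post("/rest/config/folders", f) }

// newFakeSyncthing is a stand-in server so the sketch runs without syncthing.
// It only checks the API key and the request path.
func newFakeSyncthing(apiKey string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-API-Key") != apiKey {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		switch r.URL.Path {
		case "/rest/config/devices", "/rest/config/folders":
			w.WriteHeader(http.StatusOK)
		default:
			http.NotFound(w, r)
		}
	}))
}

func main() {
	srv := newFakeSyncthing("secret")
	defer srv.Close()

	c := Client{BaseURL: srv.URL, APIKey: "secret"}
	remote := "REMOTE-DEVICE-ID" // placeholder, not a valid syncthing device ID

	fmt.Println("add device:", c.AddDevice(Device{
		DeviceID: remote, Name: "envd-pod", Addresses: []string{"tcp://127.0.0.1:22000"},
	}))
	fmt.Println("add folder:", c.AddFolder(Folder{
		ID: "project", Path: "/code", Devices: []FolderDevice{{DeviceID: remote}},
	}))
}
```

Against a real instance, `BaseURL` would point at the GUI/API address (locally, or the forwarded port for the pod), and the same two calls would be made on both sides so that each device knows the other and shares the folder.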

The two syncthing instances also need to be discoverable by each other. This can be done through ssh tunnel port forwarding. (I've only tested discovery with the kubernetes cluster and local syncthing instance on the same network/computer so I'm not sure if it'll be different if the kubernetes cluster is on another instance)

Waiting for Events

Since most interactions with syncthing are asynchronous, we need to wait for operations to complete before proceeding. For example, after the binary is executed, you need to wait for the syncthing application to start up before calling the rest api. Likewise, when configurations are applied via the rest api, syncthing returns a response immediately and you have to wait for the changes to actually be applied. Other asynchronous operations include scanning files and syncing folders.

Therefore, for asynchronous operations that need to be awaited, there are Wait functions that query the syncthing rest api for the status of the operation.

Design Choices

Syncthing configurations

Both of the other Kubernetes file sync implementations that I referenced (okteto, ksync) use xml files to initialize the configuration. However, I chose not to work with xml files, to make the code more readable, maintainable, and consistent, and to keep the configuration closer to the application code. In other words, I moved the complexity from the build code to the application code (config logic with xml files via Dockerfile vs. with structs in Go).

Connecting Two Devices

Syncthing's deviceIDs are generated deterministically from the priv/pub keys. For okteto's implementation, the deviceIDs and priv/pub keys are hard coded. However, for my implementation, I let syncthing autogenerate the deviceID and priv/pub keys and use the API to query the deviceID, and configure the file sync. Hopefully this will prove to be more flexible and extendable in the future.

gaocegege · 2022-12-29T10:46:25Z

Will there be a syncthing process in the local host?

AlexXi19 · 2022-12-29T18:43:19Z

Will there be a syncthing process in the local host?

Yes. The process is started with cmd := exec.Command(GetSyncthingBinPath(), "-no-browser", "-no-restart", "-home", s.HomeDirectory) and the cmd object is kept in memory in the Syncthing struct so we can use it to manage the process later.

gaocegege · 2022-12-30T01:25:41Z

Cool. Then when will the process terminate? I think we can terminate it when users stop the ssh connection.

kemingy · 2022-12-30T02:54:58Z

For local development, I'm not sure if it's better to use mount instead of sync.

Some corner cases:

file size limitation (users may accidentally add a model file to the folder)
follow the soft link or not?

AlexXi19 · 2022-12-30T05:05:59Z

Cool. Then when will the process terminate? I think we can terminate it when users stop the ssh connection.

Sure, or also when the environment is destroyed. I'm almost done with functionalities but haven't actually put the code in the cli commands yet. We can discuss more details after the core sync functionalities are finished!

AlexXi19 · 2022-12-30T05:07:36Z

For local development, I'm not sure if it's better to use mount instead of sync.

Some corner cases:

file size limitation (users may accidentally add a model file to the folder)

follow the soft link or not?

There can be an option to configure ignored files/folders, but the case where you accidentally drop a large file into the folder can definitely be problematic. Potential solutions could be to halt if a large file is detected, or to ignore large files. What are you suggesting with mount?

kemingy · 2022-12-31T02:04:27Z

What are you suggesting with mount?

By default, we will mount the current working directory. I think it's not necessary to sync the files in this dir.

AlexXi19 self-assigned this Dec 6, 2022

This was referenced Dec 29, 2022

File sync for envd-server tensorchord/envd#1352

Closed

File sync for envd-server #159

Closed
