# Design and Implementation of PouchContainer CRI
## 2. CRI Design Overview

![cri-1.png | left | 827x299](https://cdn.yuque.com/lark/0/2018/png/103564/1527478355304-a20865ae-81b8-4f13-910d-39c9db4c72e2.png "")

As shown in the figure above, the Kubelet on the left is the node agent of the Kubernetes cluster. It monitors the state of the containers on its node to ensure that they all run as expected, and to achieve this it continuously calls the relevant CRI interfaces to synchronize container state.

A CRI shim can be regarded as an interface translation layer: it converts CRI interface calls into calls to the corresponding interface of the underlying container runtime, invokes that interface, and returns the response. For some container runtimes the CRI shim runs as a standalone process. For instance, when Docker is selected as the container runtime of Kubernetes, Kubelet starts a Docker shim process alongside itself, which is Docker's CRI shim. For PouchContainer, its CRI shim is embedded in Pouchd and is called CRI Manager. We will give more details on this when discussing the PouchContainer architecture in the next section.

CRI is essentially a set of gRPC interfaces. Kubelet has a built-in gRPC client, and the CRI shim has a built-in gRPC server. Each call from Kubelet to a CRI interface is converted into a gRPC request sent by the client to the server in the CRI shim; the server calls the underlying container runtime to process the request and returns the result, completing one CRI interface call.

The CRI-defined gRPC interfaces fall into two categories, ImageService and RuntimeService: ImageService is responsible for managing container images, while RuntimeService is responsible for managing the container lifecycle and for interacting with containers (exec/attach/port-forward).
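
The two service groups can be sketched as trimmed-down Go interfaces. This is illustrative only: the real CRI services are gRPC definitions generated from protobuf, and the names and signatures below are simplified assumptions, not the actual CRI API.

```go
package main

import "fmt"

// ImageService manages container images (simplified sketch).
type ImageService interface {
	PullImage(image string) (string, error)
	RemoveImage(image string) error
}

// RuntimeService manages the container lifecycle and interaction;
// interactive calls such as Exec/Attach/PortForward also live here.
type RuntimeService interface {
	CreateContainer(podID, name, image string) (string, error)
	StartContainer(id string) error
	StopContainer(id string) error
}

// fakeImages is a toy in-memory ImageService used to demonstrate a call.
type fakeImages struct{ pulled []string }

func (f *fakeImages) PullImage(image string) (string, error) {
	f.pulled = append(f.pulled, image)
	return image, nil
}

func (f *fakeImages) RemoveImage(image string) error { return nil }

func main() {
	var images ImageService = &fakeImages{}
	ref, _ := images.PullImage("busybox:latest")
	fmt.Println("pulled", ref)
}
```

The split matters because image distribution and container lifecycle have very different failure modes and can be implemented by different backends behind one shim.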

## 3. CRI Manager architecture design

![yzz's pic.jpg | left | 827x512](https://cdn.yuque.com/lark/0/2018/jpeg/95844/1527582870490-a9b9591d-d529-4b7d-bc5f-69514ef115e7.jpeg "")

In the overall architecture of PouchContainer, CRI Manager implements all the interfaces defined by CRI and plays the role of the CRI shim. When Kubelet calls a CRI interface, the request is sent via Kubelet's gRPC client to the gRPC server in the figure above; the server parses the request and calls the corresponding method of CRI Manager to process it.

Let's look at an example to get a brief sense of the functionality of each module. When a request to create a Pod arrives, CRI Manager first converts the CRI-format configuration into a format that meets the requirements of the PouchContainer interface, calls Image Manager to pull the required image, then calls Container Manager to create the required containers, and calls CNI Manager to configure the Pod network using the CNI plugin. Finally, Stream Server handles interactive CRI requests such as exec/attach/portforward.

It is worth noting that CNI Manager and Stream Server are sub-modules of CRI Manager, while CRI Manager, Container Manager and Image Manager are three peer modules that all live in the same binary, Pouchd. Calls between them are therefore plain function calls, without the remote-call overhead incurred when, for example, Docker shim interacts with Docker. Next, we will step into CRI Manager for a deeper understanding of how its important functions are implemented.
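
The direct-function-call wiring can be sketched as follows. Every type and method name here is a hypothetical stand-in for illustration, not PouchContainer's real signatures; the point is only that the modules call each other in-process.

```go
package main

import "fmt"

// All names below are invented stand-ins for the modules named in the text.

type ImageManager struct{}

// Pull fetches an image (no-op in this sketch).
func (ImageManager) Pull(image string) error { return nil }

type ContainerManager struct{}

// Create creates a container and returns its ID.
func (ContainerManager) Create(name, image string) (string, error) {
	return name + "-id", nil
}

type CNIManager struct{}

// SetUpPodNetwork configures the pod network via a CNI plugin (no-op here).
func (CNIManager) SetUpPodNetwork(podID string) error { return nil }

// CRIManager calls its peer modules with plain function calls, since all of
// them live in the same Pouchd binary -- no remote-call overhead.
type CRIManager struct {
	images     ImageManager
	containers ContainerManager
	cni        CNIManager
}

// RunPodSandbox pulls the image, creates the container, then wires the network.
func (m *CRIManager) RunPodSandbox(name, image string) (string, error) {
	if err := m.images.Pull(image); err != nil {
		return "", err
	}
	id, err := m.containers.Create(name, image)
	if err != nil {
		return "", err
	}
	return id, m.cni.SetUpPodNetwork(id)
}

func main() {
	m := &CRIManager{}
	id, _ := m.RunPodSandbox("pod", "pause:3.1")
	fmt.Println("created sandbox", id)
}
```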


# In-depth Analysis of PouchContainer's Rich Container Technology
## Background
"Rich container" mode, which grows out of the traditional virtual-machine operation-and-maintenance model, is widely applied inside Alibaba, and a certain number of such containers are stateful. Updating stateful services is a frequent part of daily development. With image-based container technology, updating a service takes two steps: removing the old container and creating a new one from the new image. Moreover, to update a stateful service, all of its resources, including network, storage, etc., must be inherited by the new container. The following two business cases demonstrate this requirement of rich containers in release scenarios.

* Case 1: For a database service, remote data is downloaded to the local machine as initial data the first time the container is created. Because database initialization takes a long time, during any later service upgrade the new container should inherit the data of the old one, reducing the time it takes to release the service.
* Case 2: For a middleware service that uses a "service registration" model, every container added during scale-out must be registered in the server list, otherwise the new containers cannot serve traffic. Therefore, each time the containers are updated, the new containers must inherit the IP addresses of the old ones, or the released service will not work.

Today, Moby is the most popular container engine, yet it does not provide a single API for updating containers in place. Although an update could be assembled from multiple API calls, that increases the number of requests to auxiliary APIs, such as the APIs for deleting and creating containers and for reserving IP addresses, and it also increases the risk of a failed upgrade.

Against this background, PouchContainer provides an `upgrade` API at the engine level to realize in-place update of containers. Implementing the update at the engine level is more convenient and efficient because far fewer API requests are needed.

## Implementation of the upgrade functionality
### Introduction to the underlying storage of containers
PouchContainer is built on Containerd v1.0.3 and differs from Moby in its storage architecture, so before introducing the mechanism of the in-place upgrade functionality, it is necessary to introduce the storage architecture of PouchContainer.

![image.png | center | 600x336.3525091799266](https://cdn.yuque.com/lark/0/2018/png/95961/1527735535637-5afc58e6-31ef-400c-984c-a9d7158fd40d.png "")

Compared with the storage architecture of Moby, the distinctive features of PouchContainer are:
* There is no concept of GraphDriver or Layer in PouchContainer; instead, PouchContainer adopts the Snapshotter and Snapshot design of containerd, a CNCF project. A Snapshotter can be regarded as a storage driver, such as overlay, devicemapper or btrfs. A Snapshot is a snapshot of image data and comes in two kinds: read-only snapshots, which hold the read-only data of each image layer, and the read-write snapshot, which is the container's read-write layer and stores all of the container's incremental data.
* Container and image metadata in Containerd is stored in boltdb, which brings the advantage that loading container and image data only requires reading boltdb rather than scanning directory information on the host filesystem.
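
As a rough illustration of the Snapshot model, here is a toy in-memory snapshotter. It is purely hypothetical: real containerd snapshotters expose a much richer interface and return filesystem mounts, but the read-only layer chain plus one read-write layer per container is the same idea.

```go
package main

import "fmt"

// snapshot models one layer: read-only snapshots form the image's layer
// chain, and each container gets one read-write snapshot on top.
type snapshot struct {
	key      string
	parent   string
	readOnly bool
}

type memSnapshotter struct{ snaps map[string]snapshot }

// Prepare creates a read-write snapshot on top of parent; for a container,
// this read-write layer holds all of its incremental data.
func (s *memSnapshotter) Prepare(key, parent string) snapshot {
	sn := snapshot{key: key, parent: parent}
	s.snaps[key] = sn
	return sn
}

// Commit seals a snapshot into a read-only layer, as image layers are.
func (s *memSnapshotter) Commit(key string) {
	sn := s.snaps[key]
	sn.readOnly = true
	s.snaps[key] = sn
}

func main() {
	s := &memSnapshotter{snaps: map[string]snapshot{}}
	rw := s.Prepare("container-rw", "image-layer-2")
	fmt.Println(rw.key, "on", rw.parent, "readOnly:", rw.readOnly)
}
```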

## Requirements of the upgrade functionality
At the early design stage of any system or feature, thorough research should be done to discover the users' pain points that the design needs to solve. After researching the business scenarios in Alibaba's daily development where in-place container upgrade is used, three requirements were distilled:
* Data coherency
* Flexibility
* Robustness

Data coherency means that certain data remains unchanged before and after the execution of `upgrade`:
* Network: the network configuration remains the same before and after the upgrade;
* Storage: the new container inherits all volumes of the old container;
* Config: the new container inherits certain configuration of the old container, e.g. `Env`, `Labels`, etc.

Flexibility means that new configuration can be introduced while `upgrade` operates on the old container:
* Resources of the new container, such as CPU and memory, can be modified;
* As for the `Entrypoint`: the new container may either inherit the old container's `Entrypoint` or be given a new one;
* New volumes can be added to the container; the new image may carry new volume information, which must be parsed when creating the new container so that the corresponding volumes are created.

Robustness means that the in-place upgrade supports a rollback strategy to deal with exceptions: if the upgrade fails, the container is rolled back to the old version.
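
The coherency and flexibility requirements can be condensed into a small merge sketch. The field names below are invented for illustration and do not match PouchContainer's real config types.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the container configuration.
type config struct {
	IP      string   // network: inherited unchanged
	Volumes []string // storage: inherited unchanged
	Env     []string // config: inherited unchanged
	CPU     float64  // resources: may be overridden by the request
	Memory  int64
}

// mergeUpgradeConfig keeps coherent data from the old container and applies
// only the resource changes supplied in the upgrade request.
func mergeUpgradeConfig(old, req config) config {
	merged := old // coherency: IP, volumes, env are carried over
	if req.CPU != 0 {
		merged.CPU = req.CPU // flexibility: resources may change
	}
	if req.Memory != 0 {
		merged.Memory = req.Memory
	}
	return merged
}

func main() {
	old := config{IP: "10.0.0.5", Volumes: []string{"/data"}, CPU: 1, Memory: 1 << 30}
	merged := mergeUpgradeConfig(old, config{Memory: 2 << 30})
	fmt.Println(merged.IP, merged.CPU, merged.Memory)
}
```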

## Detailed implementation of upgrade
### Definition of upgrade API
First, let's look at the entry-level definition of the `upgrade` API, which specifies which parameters of a container the upgrade operation may modify. As the definition of `ContainerUpgradeConfig` below shows, the upgrade operation can act on both `ContainerConfig` and `HostConfig`. If you look up the definitions of these two parameters in the `apis/types` directory of the PouchContainer GitHub repository, you will find that the `upgrade` operation can in fact modify __all__ related configuration of the old container.
```go
// ContainerUpgradeConfig ContainerUpgradeConfig is used for API "POST /containers/upgrade".
// It wraps all kinds of config used in container upgrade.
// It can be used to encode client params in client and unmarshal request body in daemon side.
//
// swagger:model ContainerUpgradeConfig
type ContainerUpgradeConfig struct {
	ContainerConfig

	// host config
	HostConfig *HostConfig `json:"HostConfig,omitempty"`
}
```

## Detailed workflow of upgrade
The `upgrade` operation essentially deletes the old container and creates a new one from the new image, while keeping the same network configuration and the original volumes. The detailed workflow of `upgrade` is divided into the following steps:
* First, all configuration of the original container is backed up, to be used for rollback if the upgrade fails;
* Next, the configuration of the new container is computed: the new configuration carried in the request parameters is merged with the old container's configuration, with the new values taking effect;
* The image's `Entrypoint` parameter receives special handling: if an `Entrypoint` is given in the request, it is used; otherwise the old container's `Entrypoint` is checked, and if it was set via configuration rather than coming from the image, it is used as the new container's `Entrypoint`; if neither condition holds, the `Entrypoint` of the new image is used. This logic keeps the `Entrypoint` continuous across upgrades;
* Then, the container's status is checked. If it is running, the container is stopped; after that, a new Snapshot is created based on the new image and used as the read-write layer of the new container;
* After the new Snapshot is created, the pre-upgrade status of the old container is checked again: if it was running, the new container is started; otherwise nothing needs to be done;
* Finally, the old Snapshot is deleted and the latest configuration is persisted.
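
The `Entrypoint` precedence in the steps above can be expressed as a small helper. The signature is hypothetical, for illustration only: the upgrade request wins; otherwise an `Entrypoint` the old container set explicitly via config is kept for continuity; otherwise the new image's default applies.

```go
package main

import "fmt"

// resolveEntrypoint picks the new container's Entrypoint:
// request > old container's config-set Entrypoint > new image default.
func resolveEntrypoint(reqEP, oldConfigEP, newImageEP []string) []string {
	if len(reqEP) != 0 {
		return reqEP // explicitly requested for this upgrade
	}
	if len(oldConfigEP) != 0 {
		return oldConfigEP // set via config on the old container, keep it
	}
	return newImageEP // fall back to the new image's default
}

func main() {
	fmt.Println(resolveEntrypoint(nil, []string{"/old/start.sh"}, []string{"sh"}))
}
```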

## The rollback of upgrade
Exceptions can occur while `upgrade` is running, and the current strategy is to roll back to the old container in case of an exception. We first need to define what counts as an upgrade failure:
* Creating resources for the new container fails: rollback is performed when creating the new Snapshot, volumes and other resources for the new container fails;
* A system error occurs while starting the new container: rollback is performed when the call to the containerd API to create the new container fails. If the API call succeeds but the container later exits because of an abnormal program inside it, no rollback is performed. The basic shape of the rollback operation is as follows:
```go
defer func() {
	if !needRollback {
		return
	}

	// rollback to old container.
	c.meta = &backupContainerMeta

	// create a new containerd container.
	if err := mgr.createContainerdContainer(ctx, c); err != nil {
		logrus.Errorf("failed to rollback upgrade action: %s", err.Error())
		if err := mgr.markStoppedAndRelease(c, nil); err != nil {
			logrus.Errorf("failed to mark container %s stop status: %s", c.ID(), err.Error())
		}
	}
}()
```

During the upgrade process, if an abnormal situation occurs, the newly created Snapshot and other related resources are cleaned up. In the rollback phase, only the old container's configuration needs to be restored; a new container is then started with the restored configuration.

## Demo of upgrade
* Use the `ubuntu` image to create a new container:
```bash
$ pouch run --name test -d -t registry.hub.docker.com/library/ubuntu:14.04 top
43b75002b9a20264907441e0fe7d66030fb9acedaa9aa0fef839ccab1f9b7a8f

$ pouch ps
Name ID Status Created Image Runtime
test 43b750 Up 3 seconds 3 seconds ago registry.hub.docker.com/library/ubuntu:14.04 runc
```
* Upgrade the image of container `test` to `busybox`:
```bash
$ pouch upgrade --name test registry.hub.docker.com/library/busybox:latest top
test
$ pouch ps
Name ID Status Created Image Runtime
test 43b750 Up 3 seconds 34 seconds ago registry.hub.docker.com/library/busybox:latest runc
```

As the demo above shows, the `upgrade` interface directly replaces the container's image with the new one, while all other configuration stays unchanged.

## Conclusion
In enterprise production environments, upgrading containers is as frequent an operation as scaling them out and in. However, neither the Moby community nor the Containerd community currently offers an API like PouchContainer's `upgrade`. PouchContainer is the first to implement this functionality, solving a real pain point of container technology: updating and releasing stateful services in enterprise environments. PouchContainer also tries to stay in close contact with the components it depends on, such as Containerd, so the `upgrade` functionality will be fed back to the Containerd community to enrich Containerd's feature set.

---
origin doc: https://github.com/pouchcontainer/blog/blob/master/blog-cn/%E6%B7%B1%E5%BA%A6%E8%A7%A3%E6%9E%90%20PouchContainer%20%E7%9A%84%E5%AF%8C%E5%AE%B9%E5%99%A8%E6%8A%80%E6%9C%AF.md