Update KubeFATE version
Signed-off-by: Chenlong Ma <[email protected]>
owlet42 committed Nov 24, 2023
1 parent 9a9b93f commit 10f8456
Showing 33 changed files with 476 additions and 319 deletions.
2 changes: 1 addition & 1 deletion docker-deploy/.env
@@ -29,7 +29,7 @@ Nginx_IMAGE_TAG="v2.0.0-beta"
RabbitMQ_IMAGE="federatedai/rabbitmq"
RabbitMQ_IMAGE_TAG="3.8.3-management"
Pulsar_IMAGE="federatedai/pulsar"
- Pulsar_IMAGE_TAG="2.10.1"
+ Pulsar_IMAGE_TAG="2.7.0"
Hadoop_NameNode_IMAGE="federatedai/hadoop-namenode"
Hadoop_NameNode_IMAGE_TAG="2.0.0-hadoop3.2.1-java8"
Hadoop_DataNode_IMAGE="federatedai/hadoop-datanode"
12 changes: 6 additions & 6 deletions docker-deploy/README.md
@@ -192,13 +192,13 @@ The output is shown as follows. If the status of each component is `Up`, and the

```bash
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
- confs-10000-client-1 federatedai/client:1.11.2-release "bash -c 'pipeline i…" client About a minute ago Up About a minute 0.0.0.0:20000->20000/tcp, :::20000->20000/tcp
- confs-10000-clustermanager-1 federatedai/eggroll:1.11.2-release "/tini -- bash -c 'j…" clustermanager About a minute ago Up About a minute 4670/tcp
- confs-10000-fateboard-1 federatedai/fateboard:1.11.2-release "/bin/sh -c 'java -D…" fateboard About a minute ago Up About a minute 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
- confs-10000-fateflow-1 federatedai/fateflow:1.11.2-release "/bin/bash -c 'set -…" fateflow About a minute ago Up About a minute (healthy) 0.0.0.0:9360->9360/tcp, :::9360->9360/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp
+ confs-10000-client-1 federatedai/client:v2.0.0-beta "bash -c 'pipeline i…" client About a minute ago Up About a minute 0.0.0.0:20000->20000/tcp, :::20000->20000/tcp
+ confs-10000-clustermanager-1 federatedai/eggroll:v2.0.0-beta "/tini -- bash -c 'j…" clustermanager About a minute ago Up About a minute 4670/tcp
+ confs-10000-fateboard-1 federatedai/fateboard:v2.0.0-beta "/bin/sh -c 'java -D…" fateboard About a minute ago Up About a minute 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
+ confs-10000-fateflow-1 federatedai/fateflow:v2.0.0-beta "/bin/bash -c 'set -…" fateflow About a minute ago Up About a minute (healthy) 0.0.0.0:9360->9360/tcp, :::9360->9360/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp
confs-10000-mysql-1 mysql:8.0.28 "docker-entrypoint.s…" mysql About a minute ago Up About a minute 3306/tcp, 33060/tcp
- confs-10000-nodemanager-1 federatedai/eggroll:1.11.2-release "/tini -- bash -c 'j…" nodemanager About a minute ago Up About a minute 4671/tcp
- confs-10000-rollsite-1 federatedai/eggroll:1.11.2-release "/tini -- bash -c 'j…" rollsite About a minute ago Up About a minute 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp
+ confs-10000-nodemanager-1 federatedai/eggroll:v2.0.0-beta "/tini -- bash -c 'j…" nodemanager About a minute ago Up About a minute 4671/tcp
+ confs-10000-rollsite-1 federatedai/eggroll:v2.0.0-beta "/tini -- bash -c 'j…" rollsite About a minute ago Up About a minute 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp
```

### Verifying the deployment
12 changes: 6 additions & 6 deletions docker-deploy/README_zh.md
@@ -231,13 +231,13 @@ docker compose ps

```bash
NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS
- confs-10000-client-1 federatedai/client:1.11.2-release "bash -c 'pipeline i…" client About a minute ago Up About a minute 0.0.0.0:20000->20000/tcp, :::20000->20000/tcp
- confs-10000-clustermanager-1 federatedai/eggroll:1.11.2-release "/tini -- bash -c 'j…" clustermanager About a minute ago Up About a minute 4670/tcp
- confs-10000-fateboard-1 federatedai/fateboard:1.11.2-release "/bin/sh -c 'java -D…" fateboard About a minute ago Up About a minute 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
- confs-10000-fateflow-1 federatedai/fateflow:1.11.2-release "/bin/bash -c 'set -…" fateflow About a minute ago Up About a minute (healthy) 0.0.0.0:9360->9360/tcp, :::9360->9360/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp
+ confs-10000-client-1 federatedai/client:v2.0.0-beta "bash -c 'pipeline i…" client About a minute ago Up About a minute 0.0.0.0:20000->20000/tcp, :::20000->20000/tcp
+ confs-10000-clustermanager-1 federatedai/eggroll:v2.0.0-beta "/tini -- bash -c 'j…" clustermanager About a minute ago Up About a minute 4670/tcp
+ confs-10000-fateboard-1 federatedai/fateboard:v2.0.0-beta "/bin/sh -c 'java -D…" fateboard About a minute ago Up About a minute 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
+ confs-10000-fateflow-1 federatedai/fateflow:v2.0.0-beta "/bin/bash -c 'set -…" fateflow About a minute ago Up About a minute (healthy) 0.0.0.0:9360->9360/tcp, :::9360->9360/tcp, 0.0.0.0:9380->9380/tcp, :::9380->9380/tcp
confs-10000-mysql-1 mysql:8.0.28 "docker-entrypoint.s…" mysql About a minute ago Up About a minute 3306/tcp, 33060/tcp
- confs-10000-nodemanager-1 federatedai/eggroll:1.11.2-release "/tini -- bash -c 'j…" nodemanager About a minute ago Up About a minute 4671/tcp
- confs-10000-rollsite-1 federatedai/eggroll:1.11.2-release "/tini -- bash -c 'j…" rollsite About a minute ago Up About a minute 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp
+ confs-10000-nodemanager-1 federatedai/eggroll:v2.0.0-beta "/tini -- bash -c 'j…" nodemanager About a minute ago Up About a minute 4671/tcp
+ confs-10000-rollsite-1 federatedai/eggroll:v2.0.0-beta "/tini -- bash -c 'j…" rollsite About a minute ago Up About a minute 0.0.0.0:9370->9370/tcp, :::9370->9370/tcp
```

### Verifying the deployment
5 changes: 5 additions & 0 deletions docker-deploy/generate_config.sh
@@ -512,6 +512,11 @@ ${party_id}:
port: 6650
sslPort: 6651
proxy: ""
+ default:
+ proxy: "proxy.fate.org:443"
+ domain: "fate.org"
+ brokerPort: 6650
+ brokerSslPort: 6651
EOF

fi
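Review note: the hunk above appends a `default` entry to the generated Pulsar route table. The sketch below shows one way a lookup could fall back to that entry when a party ID has no explicit route. The `resolve_route` helper, the sample host `192.168.0.1`, and the `<party_id>.<domain>` host convention are assumptions made for illustration; they are not taken from FATE's code.

```python
# Hypothetical route table mirroring the keys written by generate_config.sh.
# The host value for party 9999 is made up for this example.
ROUTE_TABLE = {
    "9999": {"host": "192.168.0.1", "port": 6650, "sslPort": 6651, "proxy": ""},
    "default": {
        "proxy": "proxy.fate.org:443",
        "domain": "fate.org",
        "brokerPort": 6650,
        "brokerSslPort": 6651,
    },
}


def resolve_route(route_table, party_id):
    """Return (host, port) for party_id, falling back to the default entry."""
    entry = route_table.get(str(party_id))
    if entry is not None:
        return entry["host"], entry["port"]
    default = route_table["default"]
    # Assumed convention: unknown parties are addressed as <party_id>.<domain>.
    return f"{party_id}.{default['domain']}", default["brokerPort"]
```

Under these assumptions, a party with an explicit entry keeps its own host and port, while any other party ID resolves through the `fate.org` domain and the default broker port.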
2 changes: 1 addition & 1 deletion docker-deploy/training_template/docker-compose-spark.yml
@@ -89,7 +89,7 @@ services:
set -x
cp /data/projects/fate/fate_flow/conf/pulsar_route_table.yaml /data/projects/fate/fate_flow/pulsar_route_table.yaml
cp /data/projects/fate/fate_flow/conf/rabbitmq_route_table.yaml /data/projects/fate/fate_flow/rabbitmq_route_table.yaml
- pip install cryptography && sleep 5 && python fate_flow/python/fate_flow/fate_flow_server.py
+ sleep 5 && python fate_flow/python/fate_flow/fate_flow_server.py
environment:
FATE_PROJECT_BASE: "/data/projects/fate"
FATE_FLOW_UPLOAD_MAX_NUM: "1000000"
8 changes: 4 additions & 4 deletions docs/Manage_FATE_and_FATE-Serving_Version.md
@@ -30,18 +30,18 @@ The chart can be downloaded in each KubeFATE release, with name `fate-{release_v

Download it and copy it to the folder to upload.
```
- $ kubefate chart upload -f ./fate-v1.11.2.tgz
+ $ kubefate chart upload -f ./fate-v2.0.0-beta.tgz
Upload file success
$ kubefate chart ls
UUID NAME VERSION APPVERSION
- ca3f7843-749a-4f69-9f6b-4c544a7623ac fate v1.11.2 v1.11.2
+ ca3f7843-749a-4f69-9f6b-4c544a7623ac fate v2.0.0-beta v2.0.0-beta
```

- Then, we can deploy the fate cluster of v1.11.2 version. The detail of cluster.yaml please refer to: [FATE Cluster Configuration](./configurations/FATE_cluster_configuration.md)
+ Then, we can deploy the fate cluster of v2.0.0-beta version. The detail of cluster.yaml please refer to: [FATE Cluster Configuration](./configurations/FATE_cluster_configuration.md)
```
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
```

We can delete the chart with:
@@ -21,14 +21,14 @@ After the tutorial, the deployment architecture looks like the following diagram
5. Network connectivity to dockerhub or 163 Docker Image Registry, and google gcr.
6. Setup the global KubeFATE version using in the tutorial and create a folder for the whole tutorial.
```
- export fate_version=v1.11.2 && export kubefate_version=v1.4.5 && cd ~ && mkdir demo && cd demo
+ export fate_version=v2.0.0-beta && export kubefate_version=v1.4.5 && cd ~ && mkdir demo && cd demo
```

Notes:
* When talking about KubeFATE version, usually there are 3 notions:
* The KubeFATE CLI version, in this tutorial, it is v1.4.5.
* The KubeFATE service version, in this tutorial, it is v1.4.5.
- * The FATE version, in this tutorial, it is v1.11.2, it also means the version of the helm chart of FATE, currently we use this version to tag the KubeFATE GitHub master branch.
+ * The FATE version, in this tutorial, it is v2.0.0-beta, it also means the version of the helm chart of FATE, currently we use this version to tag the KubeFATE GitHub master branch.
* **<font color="red">In this tutorial, the IP of the machine we used is 192.168.100.123. Please change it to your machine's IP in all the following commands and config files.</font></div>**

# Start Tutorial
@@ -87,7 +87,7 @@ When all the pods are in the ready state, it means your Kubernetes cluster is re
## Setup Kubefate
### Install KubeFATE CLI
Go to [KubeFATE Release](https://github.com/FederatedAI/KubeFATE/releases), and find the latest kubefate-k8s release
- pack, which is `v1.11.2` as set to ENVs before. (replace ${fate_version} with the newest version available)
+ pack, which is `v2.0.0-beta` as set to ENVs before. (replace ${fate_version} with the newest version available)
```
curl -LO https://github.com/FederatedAI/KubeFATE/releases/download/${fate_version}/kubefate-k8s-${fate_version}.tar.gz && tar -xzf ./kubefate-k8s-${fate_version}.tar.gz
```
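Review note: the `curl` command above builds the archive URL entirely from `${fate_version}`, so the version bump changes both the release tag and the file name. A minimal Python sketch of that substitution follows; `release_url` is a hypothetical helper, and the URL pattern is copied from the command in the tutorial.

```python
# Base of the release download URL, as used by the tutorial's curl command.
RELEASE_BASE = "https://github.com/FederatedAI/KubeFATE/releases/download"


def release_url(fate_version):
    """Build the kubefate-k8s archive URL for a given FATE version tag."""
    return f"{RELEASE_BASE}/{fate_version}/kubefate-k8s-{fate_version}.tar.gz"
```

Calling `release_url("v2.0.0-beta")` gives the URL that the tutorial's command would request after this commit.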
@@ -256,7 +256,7 @@ For `/kubefate/examples/party-9999/cluster-spark-pulsar.yaml`, modify it as foll
name: fate-9999
namespace: fate-9999
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
partyId: 9999
registry: ""
pullPolicy:
@@ -340,7 +340,7 @@ and for fate-10000:
name: fate-10000
namespace: fate-10000
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
partyId: 10000
registry: ""
pullPolicy:
@@ -440,8 +440,8 @@ or watch the clusters till their STATUS changing to `Running`:
```
kubefate@machine:~/kubefate$ watch kubefate cluster ls
UUID NAME NAMESPACE REVISION STATUS CHART ChartVERSION AGE
- 29878fa9-aeee-4ae5-a5b7-fd4e9eb7c1c3 fate-9999 fate-9999 1 Running fate v1.11.2 88s
- dacc0549-b9fc-463f-837a-4e7316db2537 fate-10000 fate-10000 1 Running fate v1.11.2 69s
+ 29878fa9-aeee-4ae5-a5b7-fd4e9eb7c1c3 fate-9999 fate-9999 1 Running fate v2.0.0-beta 88s
+ dacc0549-b9fc-463f-837a-4e7316db2537 fate-10000 fate-10000 1 Running fate v2.0.0-beta 69s
```
We have about 10G Docker images that need to be pulled, this step will take a while for the first time.
An alternative way is offline loading the images to the local environment.
@@ -479,13 +479,13 @@ UUID 29878fa9-aeee-4ae5-a5b7-fd4e9eb7c1c3
Name fate-9999
NameSpace fate-9999
ChartName fate
- ChartVersion v1.11.2
+ ChartVersion v2.0.0-beta
Revision 1
Age 54m
Status Running
Spec algorithm: Basic
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
computing: Spark
device: CPU
federation: Pulsar
@@ -17,14 +17,14 @@
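5. Make sure the installation machine can reach Docker Hub or the NetEase (163) Docker image registry, as well as Google gcr;
6. Create a directory in advance to use as the working directory for the whole process, with the following command: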
```
- export fate_version=v1.11.2 && export kubefate_version=v1.4.5 && cd ~ && mkdir demo && cd demo
+ export fate_version=v2.0.0-beta && export kubefate_version=v1.4.5 && cd ~ && mkdir demo && cd demo
```

Notes:
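* When we talk about the "KubeFATE version", there are usually three notions:
* The KubeFATE CLI version, which is v1.4.5 in this tutorial.
* The KubeFATE service version, which is v1.4.5 in this tutorial.
- * The FATE version, which is v1.11.2 in this tutorial; it is also the version of the FATE Helm chart. Note that we use this version to tag KubeFATE releases on GitHub.
+ * The FATE version, which is v2.0.0-beta in this tutorial; it is also the version of the FATE Helm chart. Note that we use this version to tag KubeFATE releases on GitHub.
* **<font color="red">The IP address of the MiniKube machine used below is 192.168.100.123. Please change it to the IP address of your own test machine.</font></div>**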

# Start Installation
@@ -77,7 +77,7 @@ sudo minikube addons enable ingress

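## Install Kubefate
### Download the KubeFATE CLI
- From the [KubeFATE Release](https://github.com/FederatedAI/KubeFATE/releases) page on GitHub, find the download package for Kubernetes deployment and download the matching version, `v1.11.2` as set in the environment variable earlier
+ From the [KubeFATE Release](https://github.com/FederatedAI/KubeFATE/releases) page on GitHub, find the download package for Kubernetes deployment and download the matching version, `v2.0.0-beta` as set in the environment variable earlier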
```
curl -LO https://github.com/FederatedAI/KubeFATE/releases/download/${fate_version}/kubefate-k8s-${fate_version}.tar.gz && tar -xzf ./kubefate-k8s-${fate_version}.tar.gz
```
@@ -237,7 +237,7 @@ kubectl -n fate-10000 create secret docker-registry myregistrykey \
name: fate-9999
namespace: fate-9999
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
partyId: 9999
registry: ""
pullPolicy:
@@ -322,7 +322,7 @@ pulsar:
name: fate-10000
namespace: fate-10000
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
partyId: 10000
registry: ""
pullPolicy:
@@ -418,8 +418,8 @@ create job success, job id=7752db70-e368-41fa-8827-d39411728d1b
```
kubefate@machine:~/kubefate$ watch kubefate cluster ls
UUID NAME NAMESPACE REVISION STATUS CHART ChartVERSION AGE
- 29878fa9-aeee-4ae5-a5b7-fd4e9eb7c1c3 fate-9999 fate-9999 1 Running fate v1.11.2 88s
- dacc0549-b9fc-463f-837a-4e7316db2537 fate-10000 fate-10000 1 Running fate v1.11.2 69s
+ 29878fa9-aeee-4ae5-a5b7-fd4e9eb7c1c3 fate-9999 fate-9999 1 Running fate v2.0.0-beta 88s
+ dacc0549-b9fc-463f-837a-4e7316db2537 fate-10000 fate-10000 1 Running fate v2.0.0-beta 69s
```
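Because this step pulls about 10G of images from the NetEase (163) image registry, the first run will take some time depending on your network.
To check the download progress, use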
@@ -446,13 +446,13 @@ UUID 29878fa9-aeee-4ae5-a5b7-fd4e9eb7c1c3
Name fate-9999
NameSpace fate-9999
ChartName fate
- ChartVersion v1.11.2
+ ChartVersion v2.0.0-beta
Revision 1
Age 54m
Status Running
Spec algorithm: Basic
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
computing: Spark
device: CPU
federation: Pulsar
@@ -37,7 +37,7 @@ spec:
- "rollsite"
containers:
- name: exchange
- image: {{ .Values.image.registry }}/nginx:1.17
+ image: {{ .Values.image.registry }}nginx:1.17
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- containerPort: 9390
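Review note: the hunk above drops the literal `/` between the registry value and `nginx:1.17`, which matches the HDFS templates in this commit, where `.Values.image.registry` is concatenated directly with the image name. The sketch below illustrates the difference, assuming the registry value is either empty or already ends with `/`. That assumption, the `image_ref` helper and the `myregistry.example/` value are illustrative only.

```python
def image_ref(registry, name, tag, separator=""):
    """Join registry, image name and tag the way the template concatenates them."""
    return f"{registry}{separator}{name}:{tag}"


# Old template: a hard-coded "/" after the registry value.
old_empty = image_ref("", "nginx", "1.17", separator="/")
old_custom = image_ref("myregistry.example/", "nginx", "1.17", separator="/")

# New template: the registry value is expected to carry its own trailing "/".
new_empty = image_ref("", "nginx", "1.17")
new_custom = image_ref("myregistry.example/", "nginx", "1.17")
```

With an empty registry the old form yields a reference with a leading slash, and with a slash-terminated registry it yields a double slash; the new form avoids both under the stated assumption.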
@@ -32,7 +32,7 @@ spec:
spec:
containers:
- name: datanode
- image: {{ .Values.image.registry }}{{ .Values.modules.datanode.image }}:{{ .Values.modules.datanode.imageTag }}
+ image: {{ .Values.image.registry }}{{ .Values.modules.hdfs.datanode.image }}:{{ .Values.modules.hdfs.datanode.imageTag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
env:
- name: SERVICE_PRECONDITION
4 changes: 2 additions & 2 deletions helm-charts/FATE/templates/backends/spark/hdfs/namenode.yaml
@@ -41,7 +41,7 @@ spec:
{{ end }}
containers:
- name: namenode
- image: {{ .Values.image.registry }}{{ .Values.modules.namenode.image }}:{{ .Values.modules.namenode.imageTag }}
+ image: {{ .Values.image.registry }}{{ .Values.modules.hdfs.namenode.image }}:{{ .Values.modules.hdfs.namenode.imageTag }}
imagePullPolicy: {{ .Values.image.pullPolicy }}
env:
- name: CLUSTER_NAME
@@ -130,5 +130,5 @@ spec:
resources:
requests:
storage: {{ .Values.modules.hdfs.namenode.size }}
{{- end }}
{{- end }}
{{- end }}
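Review note: both HDFS templates now read image settings from `.Values.modules.hdfs.<node>`, the same subtree that already holds `.Values.modules.hdfs.namenode.size`. The sketch below walks a dotted path through a nested dictionary to show why the old path finds nothing when the settings live under `hdfs`. The `VALUES` dictionary and the `lookup` helper are hypothetical stand-ins for the rendered chart values, and the tag shown is only an example.

```python
# Hypothetical rendered values; only the nesting matters for this example.
VALUES = {
    "modules": {
        "hdfs": {
            "namenode": {
                "image": "federatedai/hadoop-namenode",
                "imageTag": "2.0.0-hadoop3.2.1-java8",
            },
        }
    }
}


def lookup(values, dotted_path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    node = values
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


old_path = lookup(VALUES, "modules.namenode.image")
new_path = lookup(VALUES, "modules.hdfs.namenode.image")
```

Here `old_path` is `None` while `new_path` resolves to the image name, which is the mismatch the template change removes.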
12 changes: 9 additions & 3 deletions helm-charts/FATE/templates/core/fateflow/configmap.yaml
@@ -60,9 +60,9 @@ data:
proxy_name: rollsite
{{- end }}
nginx:
- host: fateflow
- http_port: 9380
- grpc_port: 9360
+ host:
+ http_port:
+ grpc_port:
database:
engine: mysql
# encrypt passwd key
@@ -207,6 +207,12 @@ data:
default:
proxy: "{{ .ip }}:{{ .port }}"
domain: "{{ .domain }}"
+ {{- else }}
+ default:
+ proxy: "proxy.fate.org:443"
+ domain: "fate.org"
+ brokerPort: 6650
+ brokerSslPort: 6651
{{- end }}
{{- if .Values.modules.pulsar.route_table }}
{{- range $key, $val := .Values.modules.pulsar.route_table }}
9 changes: 6 additions & 3 deletions helm-charts/FATE/templates/core/python-spark.yaml
@@ -121,7 +121,10 @@ spec:
cp /data/projects/spark-3.1.3-bin-hadoop3.2/conf/spark-defaults-template.conf /data/projects/spark-3.1.3-bin-hadoop3.2/conf/spark-defaults.conf
sed -i "s/fateflow/${POD_IP}/g" /data/projects/spark-3.1.3-bin-hadoop3.2/conf/spark-defaults.conf
- pip install cryptography && sleep 5 && python fate_flow/python/fate_flow/fate_flow_server.py
+ cp /data/projects/fate/fate_flow/conf/pulsar_route_table/pulsar_route_table.yaml /data/projects/fate/fate_flow/pulsar_route_table.yaml
+ cp /data/projects/fate/fate_flow/conf/rabbitmq_route_table/rabbitmq_route_table.yaml /data/projects/fate/fate_flow/rabbitmq_route_table.yaml
+ sleep 5 && python fate_flow/python/fate_flow/fate_flow_server.py
livenessProbe:
tcpSocket:
port: 9380
@@ -158,11 +161,11 @@ spec:
name: python-confs
subPath: spark-defaults.conf
{{- if eq .Values.federation "RabbitMQ" }}
- - mountPath: /data/projects/fate/conf/rabbitmq_route_table
+ - mountPath: /data/projects/fate/fate_flow/conf/rabbitmq_route_table
name: rabbitmq-route-table
{{- end }}
{{- if eq .Values.federation "Pulsar" }}
- - mountPath: /data/projects/fate/conf/pulsar_route_table
+ - mountPath: /data/projects/fate/fate_flow/conf/pulsar_route_table
name: pulsar-route-table
{{- end }}
- mountPath: /data/projects/fate/fate_flow/jobs
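Review note: in `python-spark.yaml` the new startup `cp` commands read the route tables from `/data/projects/fate/fate_flow/conf/...`, and the volume mounts move to the same prefix. The sketch below checks that the copy source sits under the new mount path and not under the old one. The paths are copied from the hunks above; the `is_under` helper is hypothetical.

```python
import posixpath

# Paths taken from the python-spark.yaml hunks.
MOUNT_OLD = "/data/projects/fate/conf/pulsar_route_table"
MOUNT_NEW = "/data/projects/fate/fate_flow/conf/pulsar_route_table"
COPY_SOURCE = (
    "/data/projects/fate/fate_flow/conf/pulsar_route_table/pulsar_route_table.yaml"
)


def is_under(path, directory):
    """Return True when path is located inside directory."""
    return posixpath.commonpath([path, directory]) == directory
```

`is_under(COPY_SOURCE, MOUNT_NEW)` holds while `is_under(COPY_SOURCE, MOUNT_OLD)` does not, so the `cp` step and the mount change only work as a pair.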
4 changes: 2 additions & 2 deletions helm-charts/FATE/values-template-example.yaml
@@ -1,7 +1,7 @@
name: fate-9999
namespace: fate-9999
chartName: fate
- chartVersion: v1.11.2
+ chartVersion: v2.0.0-beta
partyId: 9999
registry: ""
pullPolicy:
@@ -399,7 +399,7 @@ skippedKeys:

# pulsar:
# image: "federatedai/pulsar"
- # imageTag: "2.10.1"
+ # imageTag: "2.7.0"
# nodeSelector:
# tolerations:
# affinity:
2 changes: 1 addition & 1 deletion helm-charts/FATE/values-template.yaml
@@ -650,7 +650,7 @@ modules:
{{- end }}
type: {{ .type | default "ClusterIP" }}
image: {{ .image | default "federatedai/pulsar" | quote }}
- imageTag: {{ .imageTag | default "2.10.1" | quote }}
+ imageTag: {{ .imageTag | default "2.7.0" | quote }}
httpNodePort: {{ .httpNodePort }}
httpsNodePort: {{ .httpsNodePort }}
loadBalancerIP: {{ .loadBalancerIP }}