Originally, FATE used EggRoll as the underlying computing engine; the following picture illustrates the overall architecture.
As the above figure shows, EggRoll provides both computing and storage resources.
Since FATE v1.5.0, a user can select Spark as the underlying computing engine. However, Spark itself is an in-memory computing engine without data persistence, so HDFS also needs to be deployed to provide data persistence. For example, a user needs to upload their data to HDFS through FATE before running any training job, and the output data of each component will also be stored in HDFS.
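For illustration, a minimal upload configuration for HDFS-backed storage might look like the sketch below. The field names (in particular `storage_engine`) and the example file path are assumptions for illustration and may differ between FATE Flow versions:

    {
        "file": "examples/data/breast_hetero_guest.csv",
        "head": 1,
        "partition": 4,
        "namespace": "experiment",
        "table_name": "breast_hetero_guest",
        "storage_engine": "HDFS"
    }

Here `file` points to the local input data, `head` indicates whether the first row is a header, `partition` sets the number of partitions of the stored table, and `storage_engine` asks FATE to persist the uploaded table on HDFS.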
Currently, the verified Spark version is 3.1.3 and the Hadoop version is 3.2.1.
The following picture shows the architecture of FATE on Spark:
In the current implementation, the `fate_flow` service uses the `spark-submit` binary to submit jobs to the Spark cluster. In the FATE job configuration, a user can also specify the configuration for the Spark application; here is an example:
    {
        "initiator": {
            "role": "guest",
            "party_id": 10000
        },
        "job_parameters": {
            "spark_run": {
                "executor-memory": "4G",
                "total-executor-cores": 4
            },
            ...
The above configuration limits the maximum memory and the number of cores that can be used by the executors. For more details about the supported "spark_run" parameters, please refer to this page.
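For reference, the entries under `spark_run` correspond to `spark-submit` command-line options of the same name. The short Python sketch below illustrates that mapping; it is not the actual `fate_flow` implementation, and `task_entrypoint.py` is only a placeholder for the script that `fate_flow` really submits:

    # Illustrative sketch: turn the "spark_run" options from the job
    # configuration into spark-submit command-line arguments.
    spark_run = {
        "executor-memory": "4G",
        "total-executor-cores": 4,
    }

    # Every key/value pair becomes a "--<key> <value>" argument.
    cmd = ["spark-submit"]
    for key, value in spark_run.items():
        cmd.extend([f"--{key}", str(value)])

    # Placeholder for the task script submitted by fate_flow.
    cmd.append("task_entrypoint.py")

    print(" ".join(cmd))
    # spark-submit --executor-memory 4G --total-executor-cores 4 task_entrypoint.py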