This project is used to run PyFlink-related benchmark tests.
- Python 3
- Java 1.8
- Maven 3.3.0 or later
PyFlink requires Python 3.5, 3.6 or 3.7. Run the following command to check that your version meets this requirement:
$ python --version
# the version printed here must be 3.5, 3.6 or 3.7
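The same check can be scripted so that an unsupported interpreter is reported explicitly. The following is a minimal sketch, not part of this project: the helper name `is_supported_python` is hypothetical, and it assumes `python --version` prints a line of the form `Python 3.7.4`.

```shell
# Hypothetical helper: succeeds only for Python 3.5, 3.6 or 3.7.
is_supported_python() {
    case "$1" in
        3.5|3.5.*|3.6|3.6.*|3.7|3.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Python 2 prints its version to stderr, hence the 2>&1.
version="$(python --version 2>&1 | awk '{print $2}')"
if is_supported_python "$version"; then
    echo "Python $version is supported by PyFlink"
else
    echo "Python '$version' is not supported; PyFlink needs 3.5, 3.6 or 3.7" >&2
fi
```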
Maven 3.3.0 or later is required. For example, to install Maven 3.6.1 from a downloaded archive:
$ tar -xvf apache-maven-3.6.1-bin.tar.gz
$ mv apache-maven-3.6.1 /usr/local/
Configure the environment variables:
MAVEN_HOME=/usr/local/apache-maven-3.6.1
export MAVEN_HOME
export PATH=${PATH}:${MAVEN_HOME}/bin
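To confirm that the `mvn` found on the `PATH` meets the 3.3.0 minimum, a check along these lines may help. It is a sketch under stated assumptions: the helper name `maven_version_ok` is hypothetical, and the first line of `mvn -version` is assumed to look like `Apache Maven 3.6.1 (...)`.

```shell
# Hypothetical helper: succeeds when the version in $1 (e.g. "3.6.1") is >= 3.3.0.
maven_version_ok() {
    major="${1%%.*}"      # text before the first dot
    rest="${1#*.}"        # text after the first dot
    minor="${rest%%.*}"   # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 3 ]; }
}

if command -v mvn >/dev/null 2>&1; then
    # Assumption: the version is the third field of the "Apache Maven ..." line.
    mvn_version="$(mvn -version 2>/dev/null | awk '/Apache Maven/ {print $3; exit}')"
    if maven_version_ok "$mvn_version"; then
        echo "Maven $mvn_version is new enough"
    else
        echo "Maven '$mvn_version' is older than 3.3.0 or could not be detected" >&2
    fi
else
    echo "mvn not found; check MAVEN_HOME and PATH" >&2
fi
```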
To install PyFlink 1.10, run the following command:
$ python -m pip install apache-flink==1.10
PyFlink 1.11 has not been released yet, so to install it you need to build it from the source code. See Build PyFlink for instructions.
You need to download PySpark 3.0.0-preview2 from the download page. Then run the following commands to install PySpark 3.0:
$ tar zxvf spark-3.0.0-preview2-bin-hadoop2.7.tgz
$ cd spark-3.0.0-preview2-bin-hadoop2.7/python
$ python setup.py sdist
$ pip install dist/pyspark-3.0.0.dev2.tar.gz
# Run PyFlink Python UDF Test
$ ./run_flink_test.sh
# Run PyFlink Pandas UDF Test
$ ./run_flink_pandas_test.sh
# Run PySpark Python UDF Test
$ ./run_spark_test.sh
# Run PySpark Pandas UDF Test
$ ./run_spark_pandas_test.sh
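To run all four benchmarks in one go, the scripts above can be wrapped in a small loop. This is a sketch only: the function name `run_benchmarks` is hypothetical, and it assumes the four scripts sit in the current directory and are executable.

```shell
# Hypothetical wrapper: runs each benchmark script in turn and reports failures.
run_benchmarks() {
    for script in run_flink_test.sh run_flink_pandas_test.sh \
                  run_spark_test.sh run_spark_pandas_test.sh; do
        if [ -x "./$script" ]; then
            echo "== $script =="
            # Keep going if one benchmark fails, but report its exit code.
            "./$script" || echo "$script failed with exit code $?" >&2
        else
            echo "skipping $script: not found or not executable" >&2
        fi
    done
}

run_benchmarks
```

Running the benchmarks one after another rather than in parallel keeps them from competing for CPU and memory, which would otherwise distort the timings.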