Build PaddlePaddle from source with latest local changes
*** This is a draft! ***
Normally, a user who wants to use PaddlePaddle can simply pull the latest Docker image from PaddlePaddle's Docker Hub repository. The latest tag holds the latest prod version and is normally smaller, whereas latest-dev is the latest develop version with all the necessary tools installed (e.g., Git, Vim, and various C++/Python libraries) and is therefore usually larger. As of May 8, 2018, latest-dev is 2 GB and latest is 529 MB.
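For readers who only need a prebuilt image, pulling a tag could look like the sketch below. The repository name paddlepaddle/paddle is an assumption (this page does not name it), and paddle_image and pull_paddle are hypothetical helper names introduced only for illustration.

```shell
# Assumption: the images are published under "paddlepaddle/paddle" on Docker Hub.
PADDLE_REPO="paddlepaddle/paddle"

# Print the full image reference for a tag; defaults to "latest".
paddle_image() {
  echo "${PADDLE_REPO}:${1:-latest}"
}

# Pull one tag, then print its reference and local size.
pull_paddle() {
  docker pull "$(paddle_image "$1")" &&
    docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' "$(paddle_image "$1")"
}

# Usage:
#   pull_paddle latest
#   pull_paddle latest-dev
```

Running both calls makes it easy to compare the two image sizes on your own machine; the numbers will likely differ from the May 2018 figures quoted above.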
Note that even the latest-dev Docker image might lack very recent commits, and it cannot contain your local changes. So a user (probably a developer) who wants to build PaddlePaddle with the latest code, or to include local changes, can build PaddlePaddle from source. Building from source also lets the user customize the build by tuning its parameters. For example, distributed functionality can be disabled (WITH_DISTRIBUTE=OFF) if PaddlePaddle only runs locally.
Building PaddlePaddle from source does not require much. All you need is:
- A computer running Linux, Windows, or macOS
- Docker
No other build tools need to be installed on the host, not even Python or a C++ compiler, because the build runs inside Docker. Now let's walk through the steps of building PaddlePaddle from source.
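Before starting, it may help to confirm that the required tools are on the PATH. The sketch below checks for docker, and also for git because the clone step uses it. The function names have_cmd and check_prereqs are hypothetical and are not part of PaddlePaddle.

```shell
# Succeeds if the named command is found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Report each required tool; returns non-zero if any is missing.
check_prereqs() {
  missing=0
  for cmd in docker git; do
    if have_cmd "$cmd"; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_prereqs && docker --version
```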
Since we are building PaddlePaddle from source, we need to first clone the git repo to get the source.
git clone https://github.com/PaddlePaddle/Paddle.git
The safest way to guarantee that PaddlePaddle works is to first build the latest dev image on your own computer, from the root of the cloned repository. The dev image has all the dev tools installed, such as Vim, Git, and utility libraries.
docker build -t mypaddle .
This dev image is very large (5.2 GB as of May 8, 2018).
Now that we have a fresh dev environment from the dev image built in the previous step, we can use it to build a fresh prod PaddlePaddle.
Start a dev container with the parameters of your choice and bash into it:
docker run -it \
  -v `pwd`:/paddle \
  -v /root/.cache:/root/.cache \
  -e WITH_GPU=OFF \
  -e WITH_AVX=ON \
  -e WITH_GOLANG=OFF \
  -e WITH_TESTING=OFF \
  -e WITH_COVERAGE=OFF \
  -e COVERALLS_UPLOAD=OFF \
  -e WITH_C_API=OFF \
  -e CMAKE_BUILD_TYPE=RelWithDebInfo \
  -e WITH_MKL=OFF \
  -e WITH_DEB=OFF \
  -e PADDLE_VERSION=0.10.0 \
  -e PADDLE_FRACTION_GPU_MEMORY_TO_USE=0.15 \
  -e RUN_TEST=OFF \
  -e CUDA_ARCH_NAME=Auto \
  -e WITH_FLUID_ONLY=ON \
  -e WITH_DISTRIBUTE=OFF \
  mypaddle:latest /bin/bash
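Since the build is tuned entirely through -e KEY=VALUE arguments, one way to keep the command manageable is to generate those arguments from a short list of settings. This is only a sketch: env_flags is a hypothetical helper, and it assumes that no value contains whitespace.

```shell
# Turn KEY=VALUE settings into a single line of "-e KEY=VALUE" arguments.
env_flags() {
  flags=""
  for kv in "$@"; do
    # Prefix a separating space only when flags is already non-empty.
    flags="${flags:+$flags }-e $kv"
  done
  printf '%s\n' "$flags"
}

# Usage (the substitution is left unquoted on purpose so that the shell
# splits it into separate arguments):
#   docker run -it -v "$(pwd)":/paddle \
#     $(env_flags WITH_GPU=OFF WITH_TESTING=OFF WITH_DISTRIBUTE=OFF) \
#     mypaddle:latest /bin/bash
```

The helper does not supply defaults for settings left out of the list; what the build does in that case is not covered on this page.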
Inside the dev environment, we can generate a Dockerfile for the lightweight prod image:
λ 38f6e151afea /paddle {develop} ./paddle/scripts/paddle_build.sh dockerfile
========================================
Generate /paddle/build/Dockerfile ...
========================================
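Then we exit from the dev environment and go to the repository's build directory on the host. Because /paddle in the container is a mount of the repository, the generated Dockerfile is available there. From it we build the lightweight prod image, which is smaller and runs faster: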
λ 38f6e151afea /paddle {develop} exit
/Paddle$ cd build/
/Paddle/build$ docker build -t mypaddleprod .
This prod image is much smaller (1.6 GB as of May 8, 2018).
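To check the size difference on your own machine, the two locally built images can be listed side by side. The sketch below assumes the image names used in this walkthrough (mypaddle and mypaddleprod); image_sizes is a hypothetical helper name.

```shell
# Print "<repository>:<tag>  <size>" for each local image name given.
# Images that do not exist locally produce no output line.
image_sizes() {
  for image in "$@"; do
    docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' "$image"
  done
}

# Usage, with the image names from this walkthrough:
#   image_sizes mypaddle mypaddleprod
```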
Then we can start the prod container and log in to it. All the changes, both Python and C++, take effect in that container:
docker run -it -v `pwd`:/paddle mypaddleprod /bin/bash