Paddle Refactorization Overall Design
- PaddlePaddle represents the training and inference of DL models by computation graphs.
- Graphs are constructed by a Python program.
- A graph is composed of variables and operators.
- A graph should be able to be serialized for distributed training.
- There are two stages to process the graph:
  - compile: runs a Python program to generate a protobuf message representation of the graph and sends it to the C++ library/binaries, and
  - run: constructs class `Variable` and `OperatorBase` instances and runs them.

| | compile time | runtime |
|---|---|---|
| Data | VarDesc(proto) | Variable(cpp) |
| Operation | OpDesc(proto) | Operator(cpp) |
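
The split in the table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: JSON stands in for the protobuf messages, and the classes below are simplified stand-ins for `VarDesc`, `OpDesc`, `Variable`, and `Operator`, not the real Paddle types.

```python
import json
from dataclasses import dataclass, asdict

# Compile-time descriptions (stand-ins for the VarDesc/OpDesc protobuf messages).
@dataclass
class VarDesc:
    name: str
    shape: list

@dataclass
class OpDesc:
    type: str
    inputs: list
    outputs: list

def serialize(var_descs, op_descs):
    """Compile stage: turn the descriptions into a string that can be shipped."""
    return json.dumps({"vars": [asdict(v) for v in var_descs],
                       "ops": [asdict(o) for o in op_descs]})

# Runtime objects (stand-ins for the C++ Variable and Operator classes).
class Variable:
    def __init__(self, desc):
        self.name, self.value = desc["name"], None

class Operator:
    def __init__(self, desc):
        self.type = desc["type"]
        self.inputs, self.outputs = desc["inputs"], desc["outputs"]

def deserialize(payload):
    """Run stage: construct runtime instances from the serialized graph."""
    graph = json.loads(payload)
    variables = {v["name"]: Variable(v) for v in graph["vars"]}
    operators = [Operator(o) for o in graph["ops"]]
    return variables, operators

payload = serialize([VarDesc("x", [2, 3]), VarDesc("y", [2, 3])],
                    [OpDesc("scale", ["x"], ["y"])])
variables, operators = deserialize(payload)
```

The point of the sketch is that nothing on the runtime side depends on the Python objects that built the graph; only the serialized description crosses the boundary.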

- `User`: uses Python code to describe the computation.
- `Compile Time`: generates the graph.
- `Compile Time`: checks, optimizes, and transforms the graph.
  - Check data size and attributes.
  - Infer the shape of data.
  - Plan memory and reuse it.
  - Generate the backward and optimization parts of the graph.
  - Split the graph for distributed training.
- `Runtime`: runs the graph.
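
As one example of a compile-time check, shape inference can be run over the graph description before anything executes. The sketch below assumes ops are plain dictionaries and supports only two hypothetical op types (`mul` as a matrix product and `add` as an element-wise sum); it is not the Paddle implementation.

```python
def infer_shapes(var_shapes, ops):
    """Walk the ops in order, fill in output shapes, and reject size mismatches."""
    shapes = dict(var_shapes)
    for op in ops:
        ins = [shapes[name] for name in op["inputs"]]
        if op["type"] == "mul":
            # Matrix product: [m, k] x [k, n] -> [m, n]
            (m, k), (k2, n) = ins
            if k != k2:
                raise ValueError(f"mul: inner dimensions differ ({k} vs {k2})")
            out = [m, n]
        elif op["type"] == "add":
            # Element-wise sum: both inputs must have the same shape.
            if ins[0] != ins[1]:
                raise ValueError(f"add: shapes differ ({ins[0]} vs {ins[1]})")
            out = list(ins[0])
        else:
            raise NotImplementedError(op["type"])
        shapes[op["outputs"][0]] = out
    return shapes

ops = [
    {"type": "mul", "inputs": ["x", "w"], "outputs": ["xw"]},
    {"type": "add", "inputs": ["xw", "b"], "outputs": ["y"]},
]
shapes = infer_shapes({"x": [8, 4], "w": [4, 3], "b": [8, 3]}, ops)
```

Because the check runs on descriptions only, a mismatch is reported before any memory is allocated or any kernel is launched.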

Compile Time -> IR -> Runtime

- Optimization

      Compile Time -> IR -> Optimized IR -> Runtime

- Send automatically partitioned IR to different nodes.
  - Automatic data parallel

        Compile Time
        |-> Single GPU IR
            |-> [trainer-IR-0, trainer-IR-1, pserver-IR]
                |-> Node-0 (runs trainer-IR-0)
                |-> Node-1 (runs trainer-IR-1)
                |-> Node-2 (runs pserver-IR)

  - Automatic model parallel (planned for future)
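
A rough sketch of the data-parallel partitioning step follows. It rests on assumptions that are not in this document: each op carries a hypothetical `role` field, optimization ops are the ones moved to the parameter server, and the `send`/`recv` ops are placeholder names for whatever communication ops the real partitioner would insert.

```python
def partition(single_ir, num_trainers):
    """Split a single-device op list into per-trainer IRs and one pserver IR."""
    trainer_ops = [op for op in single_ir if op["role"] != "optimize"]
    pserver_ops = [op for op in single_ir if op["role"] == "optimize"]
    irs = {}
    for i in range(num_trainers):
        # Every trainer runs the same forward/backward ops, then ships gradients.
        irs[f"trainer-IR-{i}"] = trainer_ops + [{"type": "send", "role": "comm"}]
    # The parameter server receives gradients and applies the optimizer ops.
    irs["pserver-IR"] = [{"type": "recv", "role": "comm"}] + pserver_ops
    return irs

single_ir = [
    {"type": "mul", "role": "forward"},
    {"type": "mul_grad", "role": "backward"},
    {"type": "sgd", "role": "optimize"},
]
irs = partition(single_ir, num_trainers=2)
```

Each resulting IR can then be serialized and sent to its node, matching the Node-0/Node-1/Node-2 layout above.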

- `Operator` is the fundamental building block of the user interface.
  - `Operator` stores input/output variable names and attributes.
  - The `InferShape` interface is used to infer the shapes of output variables from the input shapes.
  - Use `Run` to compute the `output variables` from the `input variables`.
- `OpWithKernel` inherits `Operator`.
- `OpWithKernel` contains a kernel map.
  - `OpWithKernel::Run` gets the device's kernel and invokes `OpKernel::Compute`.
  - `OpKernelKey` is the map key. It holds only the device place for now, but may include the data type later.
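
The relationship between `Operator`, `OpWithKernel`, and the kernel map can be sketched as below. This is a Python illustration of the C++ design, with assumptions: the place is a plain string used directly as the map key, the scope is a dictionary, and `ScaleOp` with its CPU kernel is a made-up example.

```python
class Operator:
    """Stores input/output variable names and attributes."""
    def __init__(self, inputs, outputs, attrs=None):
        self.inputs, self.outputs, self.attrs = inputs, outputs, attrs or {}

    def infer_shape(self, shapes):
        raise NotImplementedError

    def run(self, scope, place):
        raise NotImplementedError

class OpWithKernel(Operator):
    # Kernel map: key (here only the device place) -> compute function.
    kernels = {}

    def run(self, scope, place):
        kernel = self.kernels[place]   # pick the kernel for this device
        kernel(self, scope)            # corresponds to OpKernel::Compute

def scale_cpu_kernel(op, scope):
    """CPU implementation: multiply every element by the 'scale' attribute."""
    x = scope[op.inputs[0]]
    scope[op.outputs[0]] = [v * op.attrs["scale"] for v in x]

class ScaleOp(OpWithKernel):
    kernels = {"CPU": scale_cpu_kernel}

    def infer_shape(self, shapes):
        # The output has the same shape as the input.
        shapes[self.outputs[0]] = shapes[self.inputs[0]]

scope = {"x": [1.0, 2.0, 3.0]}
op = ScaleOp(["x"], ["y"], {"scale": 2.0})
op.run(scope, "CPU")
```

Asking this `ScaleOp` to run on a place with no registered kernel fails with a `KeyError`, which is the sketch's equivalent of a missing kernel for that device.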

- Separate GPU and CPU code.
- Make Paddle able to run without a GPU.
- Make it possible for one operator (which is the user interface) to contain many implementations.
- The same mul op can have different FP16 and FP32 kernels, and different MKL and Eigen kernels.

- `Eigen::Tensor` contains basic math and element-wise functions.
  - Note that `Eigen::Tensor` has a broadcast implementation.
  - Limit the number of `tensor.device(dev) =` in your code.
- `thrust::transform` and `std::transform`.
  - `thrust` has the same API as the C++ standard library. Using `transform` can quickly implement a customized element-wise kernel.
  - `thrust` also has more complex APIs, like `scan`, `reduce`, and `reduce_by_key`.
- Hand-writing `GPUKernel` and `CPU` code
  - Do not write kernels in `.h` files. CPU kernels should be in `.cc` files and GPU kernels should be in `.cu` files. (`GCC` cannot compile GPU code.)

We need a method to build mappings between Op type names and Op classes.
Maintain a map whose key is the type name and whose value is the corresponding Op constructor.

`op_type(string)` -> `OpInfo`

`OpInfo`:
- `creator`: The Op constructor.
- `grad_op_type`: The type of the gradient Op.
- `proto`: The Op's Protobuf, including inputs, outputs, and required attributes.
- `checker`: Used to check attributes.

The Op_Maker's constructor takes `proto` and `checker`. They are completed during the Op_Maker's construction. (ScaleOpMaker)

Registration macros:

    REGISTER_OP(op_type, op_class, op_maker_class, grad_op_type, grad_op_class)
    REGISTER_OP_WITHOUT_GRADIENT(op_type, op_class, op_maker_class)

The `USE` macros make sure the registration process is executed and linked.

- Write the Op class, as well as its gradient Op class if there is one.
- Write the Op maker class. In the constructor, describe its inputs, outputs, and attributes.
- Invoke the macro `REGISTER_OP`. The macro will
  - call the maker class to complete `proto` and `checker`
  - with the completed `proto` and `checker`, build a new key-value pair in the `OpInfoMap`
- Invoke the `USE` macro where the Op is used to make sure it is linked.
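
The registration steps above can be sketched as a small Python registry. It mirrors the `OpInfo` fields listed earlier, but everything else is an assumption for illustration: `register_op` stands in for the `REGISTER_OP` macro (without the gradient Op class), the checker is a list of plain functions, and `scale_grad` is a hypothetical gradient type name.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class OpInfo:
    creator: Callable             # the Op constructor
    grad_op_type: Optional[str]   # type name of the gradient Op, if any
    proto: dict                   # inputs, outputs, and attributes
    checker: Callable             # validates attributes

OP_INFO_MAP = {}

def register_op(op_type, op_class, op_maker_class, grad_op_type=None):
    """Let the maker complete proto and checker, then add the map entry."""
    if op_type in OP_INFO_MAP:
        raise KeyError(f"{op_type} is already registered")
    proto, checks = {"type": op_type}, []
    op_maker_class(proto, checks)          # maker fills in proto and checks

    def checker(attrs):
        for check in checks:
            check(attrs)

    OP_INFO_MAP[op_type] = OpInfo(op_class, grad_op_type, proto, checker)

def create_op(op_type, attrs):
    """Look up the type name, check the attributes, and construct the Op."""
    info = OP_INFO_MAP[op_type]
    info.checker(attrs)
    return info.creator(attrs)

class ScaleOp:
    def __init__(self, attrs):
        self.attrs = attrs

class ScaleOpMaker:
    def __init__(self, proto, checks):
        proto.update(inputs=["X"], outputs=["Out"], attrs=["scale"])
        checks.append(self.require_scale)

    @staticmethod
    def require_scale(attrs):
        if "scale" not in attrs:
            raise ValueError("scale op requires a 'scale' attribute")

register_op("scale", ScaleOp, ScaleOpMaker, grad_op_type="scale_grad")
op = create_op("scale", {"scale": 2.0})
```

In C++ the registration runs from static initializers, which is why a separate `USE` macro is needed to keep the linker from dropping it; the Python sketch has no such concern because `register_op` is called explicitly.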

- Mapping from forward Op to backward Op
- Input: graph of forward operators
- Output: graph of backward operators
- Corner cases in construction
  - shared variable => insert `Add` operator
  - no gradient => insert `fill_zero_grad` operator
  - recursive netOp => call `Backward` recursively
  - RNN Op => recursively call `Backward` on stepnet
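
The first two corner cases can be illustrated with a simplified sketch. The assumptions are significant: ops are dictionaries, gradient variables are named with an `@GRAD` suffix, each backward op is named `<type>_grad`, and backward ops consume only the output gradients (real gradient ops usually also need forward inputs and outputs). Recursive nets and RNN step nets are not handled.

```python
from collections import defaultdict

def grad(name):
    return name + "@GRAD"

def build_backward(forward_ops, loss):
    """Return backward ops for forward_ops, in execution order."""
    uses = defaultdict(int)                 # how many times each variable is read
    for op in forward_ops:
        for name in op["inputs"]:
            uses[name] += 1

    backward, ready, partial = [], {grad(loss)}, defaultdict(list)
    for op in reversed(forward_ops):
        # No gradient flows into an output: feed zeros instead.
        for out in op["outputs"]:
            if grad(out) not in ready:
                backward.append({"type": "fill_zero_grad",
                                 "inputs": [out], "outputs": [grad(out)]})
                ready.add(grad(out))
        grad_outputs = []
        for name in op["inputs"]:
            if uses[name] > 1:
                # Shared variable: write to a renamed partial gradient.
                partial[name].append(f"{grad(name)}@{len(partial[name])}")
                grad_outputs.append(partial[name][-1])
            else:
                grad_outputs.append(grad(name))
                ready.add(grad(name))
        backward.append({"type": op["type"] + "_grad",
                         "inputs": [grad(o) for o in op["outputs"]],
                         "outputs": grad_outputs})
        # Once every reader has contributed, sum the partial gradients.
        for name in dict.fromkeys(op["inputs"]):
            if (uses[name] > 1 and len(partial[name]) == uses[name]
                    and grad(name) not in ready):
                backward.append({"type": "add", "inputs": partial[name],
                                 "outputs": [grad(name)]})
                ready.add(grad(name))
    return backward

forward = [
    {"type": "mul", "inputs": ["x", "w"], "outputs": ["h"]},
    {"type": "split", "inputs": ["h"], "outputs": ["a", "unused"]},
    {"type": "scale", "inputs": ["h"], "outputs": ["c"]},
    {"type": "add", "inputs": ["a", "c"], "outputs": ["loss"]},
]
backward = build_backward(forward, "loss")
```

In this example `h` is read by two ops, so its two partial gradients are summed by an inserted `add` op, and `unused` never reaches the loss, so a `fill_zero_grad` op supplies its gradient.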

- `Tensor` is an n-dimensional array with a type.
  - Only dims and data pointers are stored in `Tensor`.
  - All operations on `Tensor` are written in `Operator` or global functions.
  - Variable-length Tensor design: LoDTensor
- `Variable` is the inputs and outputs of an operator, not just `Tensor`.
  - step_scopes in RNN is a variable and not a tensor.
- `Scope` is where variables are stored.
  - map<string/*var name */, Variable>
  - `Scope` has a hierarchical structure. The local scope can get variables from its parent scope.
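
A minimal sketch of the hierarchical lookup, assuming a `Variable` is just a holder for a value of any type and that the method names `new_var` and `find_var` are illustrative rather than the actual C++ API:

```python
class Variable:
    """Holds a value of any type: a tensor, a list of scopes, and so on."""
    def __init__(self, value=None):
        self.value = value

class Scope:
    def __init__(self, parent=None):
        self.parent = parent
        self.vars = {}                      # var name -> Variable

    def new_var(self, name, value=None):
        """Create a variable in this scope only."""
        self.vars[name] = Variable(value)
        return self.vars[name]

    def find_var(self, name):
        """Search this scope first, then walk up through the parents."""
        scope = self
        while scope is not None:
            if name in scope.vars:
                return scope.vars[name]
            scope = scope.parent
        return None

global_scope = Scope()
global_scope.new_var("w", [0.5, 0.5])
local_scope = Scope(parent=global_scope)
local_scope.new_var("x", [1.0, 2.0])
```

The local scope can see `w` through its parent, while the parent cannot see `x`; a variable created locally with an existing name would shadow the parent's.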

- as an operator is more intuitive than `RNNOp`,
- offers a new interface `Eval(targets)` to deduce the minimal block to `Run`,
- fits the compile-time/runtime separation design.
  - during compilation, `SymbolTable` stores `VarDesc`s and `OpDesc`s and serializes them to a `BlockDesc`
  - when the graph executes, a Block with the `BlockDesc` passed in creates `Op` and `Var`, then `Run`
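
One plausible reading of "deduce the minimal block to `Run`" is a backward walk from the target variables that keeps only the ops they depend on. The sketch below assumes ops are dictionaries listed in execution order; the function name `minimal_ops` is hypothetical and this is not the actual `Eval` implementation.

```python
def minimal_ops(ops, targets):
    """Keep only the ops needed to compute the target variables."""
    needed, kept = set(targets), []
    for op in reversed(ops):
        # An op is required if it produces something that is still needed.
        if needed.intersection(op["outputs"]):
            kept.append(op)
            needed.update(op["inputs"])
    kept.reverse()                           # restore execution order
    return kept

ops = [
    {"type": "mul", "inputs": ["x", "w"], "outputs": ["h"]},
    {"type": "scale", "inputs": ["h"], "outputs": ["y"]},
    {"type": "accuracy", "inputs": ["h", "label"], "outputs": ["acc"]},
]
for_y = minimal_ops(ops, ["y"])
for_acc = minimal_ops(ops, ["acc"])
```

Evaluating `y` skips the `accuracy` op entirely, and evaluating `acc` skips `scale`, so each call runs only what its targets require.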

- take Paddle/books as the main line, the requirement of the models motivates framework refactoring,
- model migration
  - framework development gives priority support to model migration, for example,
    - the MNIST demo needs a Python interface,
    - the RNN models require the framework to support `LoDTensor`.
  - determine some timelines,
  - heavily-relied Ops need to be migrated first,
  - different models can be migrated in parallel.
- improve the framework at the same time
- accept imperfection, concentrate on solving the specific problem at the right price.
- compare the performance of migrated models with old ones.
- follow Google C++ style
- build an automatic workflow for generating Python/C++ documentation
- the documentation of layers and ops should be written inside the code
- take the documentation quality into account when doing PR
- preview the documentation, then read and improve it from the users' perspective