The Glow test suite contains five major categories: unit tests, regression tests, the model loader, integration tests, and example programs.
Unit tests are the small tests that stress specific parts of the compiler.
These tests are added to the compiler when developing a feature. For example,
we train a number of small networks and perform a gradient check on the operators.
We also compile networks to intermediate representations (IRs) and look for
specific patterns.
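The gradient check mentioned above can be illustrated with a minimal Python sketch. This is not Glow's test code: the function names (numeric_gradient, check_gradient) and the toy operator are hypothetical, and the sketch only shows the general technique of comparing an analytic gradient against central finite differences.

```python
def numeric_gradient(f, x, eps=1e-5):
    """Approximate df/dx_i with central differences for each input element."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def check_gradient(f, analytic_grad, x, tol=1e-4):
    """Return True if the analytic gradient matches the numeric estimate."""
    numeric = numeric_gradient(f, x)
    analytic = analytic_grad(x)
    return all(abs(a - n) <= tol * max(1.0, abs(a), abs(n))
               for a, n in zip(analytic, numeric))


# Toy "operator" (an assumption for illustration): sum of squares,
# whose exact gradient is 2 * x.
def sum_of_squares(x):
    return sum(v * v for v in x)


def sum_of_squares_grad(x):
    return [2.0 * v for v in x]


def wrong_grad(x):
    # Deliberately incorrect gradient, used to show that the check fails.
    return [3.0 * v for v in x]
```

A correct gradient passes the check, while a deliberately wrong one such as wrong_grad is rejected.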
Regression tests are tests that are added when we fix bugs. Both regression tests and feature tests are found under the "tests/" directory. To run the feature and regression tests, run "ninja test".
We test the correctness of the Glow implementation by loading Caffe2 and ONNX models and executing them end-to-end.
The program image-classifier loads a model and a PNG file, and runs a single pass
of inference. If everything goes right, the output of the program is identical to
the output of the original (Caffe2 or ONNX) model. Unfortunately, the models do
not usually describe what the input format should be. Should the pixels be
between zero and one, or between negative 128 and positive 128? The user needs to
be aware of these things when running the models. The script in the directory
'utils/' downloads a number of pre-trained networks that we can use for testing.
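To make the input-format question concrete, here is a small Python sketch of two possible pixel ranges. The function name normalize_pixels is hypothetical. The label "0to1" is borrowed from the -image-mode=0to1 flag used elsewhere in this document, while "neg128to127" is an illustrative label for the second range and is not claimed to be a supported flag value.

```python
def normalize_pixels(pixels, mode):
    """Map 8-bit pixel values (0..255) into the range a model expects.

    The mode names are illustrative labels, not a list of supported flags.
    """
    if mode == "0to1":
        # Scale into [0.0, 1.0].
        return [p / 255.0 for p in pixels]
    if mode == "neg128to127":
        # Shift into [-128.0, 127.0].
        return [float(p) - 128.0 for p in pixels]
    raise ValueError("unknown mode: " + mode)
```

The same image produces very different input values under the two modes, so a model fed the wrong range is unlikely to reproduce the output of the original framework.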
The Glow build scripts copy a few sample images and a run script that tests the
image-classifier program. The script can be executed with the command:
build$ ./tests/images/run.sh
The script imagenet_topk_accuracy_driver.py, located in the utils/ directory,
can be used to calculate Top-1 and Top-5 accuracy. It can be run via a command
like the following:
python utils/imagenet_topk_accuracy_driver.py --batch-size=10 --validation-images-dir=${PATH_TO_IMAGES} --image-classifier-cmd="${PATH_TO_IMAGE_CLASSIFIER_BINARY} -image-mode=0to1 -m=${PATH_TO_RESNET50_PROTOS_DIR} -model-input-name=gpu_0/data -backend=CPU -topk=5 -"
Note that the --image-classifier-cmd must include -topk=5 for printing the
Top-5 labels, and a trailing "-" to run in streaming mode.
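For readers unfamiliar with the metric, Top-k accuracy can be sketched in a few lines of Python. This is a generic illustration, not the code of imagenet_topk_accuracy_driver.py; the function name topk_accuracy and the input representation (one ranked list of label indices per image) are assumptions.

```python
def topk_accuracy(predictions, labels, k):
    """Fraction of samples whose true label is among the first k predictions.

    predictions: one list of label indices per sample, best guess first.
    labels: the expected label index for each sample.
    """
    if not labels:
        raise ValueError("no samples")
    hits = sum(1 for preds, label in zip(predictions, labels)
               if label in preds[:k])
    return hits / len(labels)
```

With k=1 only the best guess counts, while with k=5 an image counts as correct if its label appears anywhere in the five printed labels, which is why the classifier command has to print five of them.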
The script expects the directory passed in via --validation-images-dir to
contain subdirectories whose alphabetical order matches increasing label order.
For example, for Imagenet with 1000 labels, subdirectories could be listed as
label000/, label001/, ... , label999/, where label000/ contains all images
that should be classified with label 0.
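The directory convention described above can be sketched as follows. This is an illustration of the idea, not the driver script itself; the function name labels_from_subdirs is hypothetical.

```python
import os


def labels_from_subdirs(validation_dir):
    """Map each subdirectory name to a label index by alphabetical order."""
    subdirs = sorted(
        name for name in os.listdir(validation_dir)
        if os.path.isdir(os.path.join(validation_dir, name))
    )
    return {name: index for index, name in enumerate(subdirs)}
```

Plain string sorting places label10 before label2, which is why zero-padded names such as label000 are a convenient way to keep alphabetical order and label order in agreement.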
The script can be used to resize and center crop images to 224x224 via
--resize-input-images. This resizing and center cropping can also be done on its
own via --only-resize-and-save, which improves execution time when calculating
Top-k accuracy more than once (the processed images are saved to
validation_images_dir/processed/).
The program text-translator loads a text translation model, reads a line from
stdin in a source language, and then prints the translation to the command line
in the destination language. The text translation model should be specified by a
directory via -m, containing the source and destination dictionaries
(src_dictionary.txt and dst_dictionary.txt), as well as the protobuf files
for the model. A backend can be optionally specified, just like for the
image-classifier.
$ ./bin/text-translator -m en2gr -backend=CPU
Enter a sentence in English to translate to German: My favorite sport is basketball .
mein Lieblingssport ist Basketball .
This program expects a sequence-to-sequence model with beam search. Because Glow
currently does not support models that contain control flow (e.g. the
RecurrentNetwork operator from Caffe2), the input model must be unrolled to some
maximum input and output length. These can be specified on the command line via
-max-input-len and -max-output-len. Additionally, the beam search size can be
specified via -beam-size. The default options for the text-translator match those
for the en2gr model currently downloaded via utils/download_datasets_and_models.py
(-max-input-len=10, -max-output-len=14, -beam-size=6).
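Because an unrolled model has a fixed input length, every sentence has to fit into exactly -max-input-len slots. The Python sketch below shows one way this could be handled. The function name, the padding and unknown-word ids, the in-memory dictionary, and the policy of rejecting long sentences are all assumptions for illustration and do not describe what text-translator actually does.

```python
def encode_fixed_length(sentence, dictionary, max_input_len, pad_id, unk_id):
    """Turn a sentence into exactly max_input_len token ids.

    dictionary maps words to ids; pad_id and unk_id are hypothetical
    ids for padding and for words missing from the dictionary.
    """
    tokens = sentence.split()
    if len(tokens) > max_input_len:
        raise ValueError("sentence is longer than the unrolled input length")
    ids = [dictionary.get(token, unk_id) for token in tokens]
    # Pad the remaining slots so the input always has the same length.
    return ids + [pad_id] * (max_input_len - len(ids))
```

For example, with max_input_len=10 the ten-token limit includes punctuation tokens such as the final "." in the sample sentence above.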
Model loader programs (e.g. image-classifier and text-translator) load
pre-trained models from protobuf files (either Caffe2 or ONNX). These pre-trained
models are downloaded via the download_datasets_and_models.py script located in
utils/.
There is a more general way to run a pre-trained model, not related to images.
The model-runner program loads and runs a self-contained model, i.e. a model
that has all of its inputs initialized internally and does not ask for user
input.
The caffe2_train_and_dump_pb.py script in utils/ allows the user to define
their own models and input training set in Caffe2, and then dumps the network
and weights to protobuf files (the network structure in predict_net.pb/pbtxt
and the pre-trained weights in init_net.pb). Right now it trains LeNet
on MNIST; an MLP is also available and can be used by setting
USE_LENET_MODEL = False. This script is heavily based on the MNIST.py tutorial
from Caffe2.
The caffe2_pb_runner.py script in utils/ loads and runs a pre-trained model
using the protobuf files saved using caffe2_train_and_dump_pb.py. This can be
used to compare the output from Glow to Caffe2. Its usage is similar to that of
the image-classifier, examples of which can be found in the run.sh script in
tests/images/. For example, the following command will run the pre-trained
resnet50 model using Caffe2:
python caffe2_pb_runner.py -i [location_of_image] -d resnet50
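When comparing the output of the two runs, an element-wise comparison with a tolerance is often more practical than exact equality. The sketch below is a generic helper, not part of either tool; the function name outputs_match and the tolerance values are assumptions chosen for illustration.

```python
import math


def outputs_match(glow_output, reference_output, rel_tol=1e-4, abs_tol=1e-6):
    """Return True if two output vectors agree element by element.

    The tolerances are arbitrary illustrative defaults, not values
    prescribed by Glow or Caffe2.
    """
    if len(glow_output) != len(reference_output):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(glow_output, reference_output))
```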
Glow also comes with tests integrated with the build environment for our command line tools. We run those tests as part of our continuous integration (CI).
Run them as part of your local build using the following:
cmake -G Ninja <glow_src> -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_PREFIX_PATH=/usr/local/opt/llvm \
-DGLOW_MODELS_DIR=<downloaded_c2_models>
Followed by:
ninja check_expensive
Note: ninja check_expensive runs all of the tests that ninja check runs plus
any tests that have been marked as EXPENSIVE using add_glow_test(EXPENSIVE ...),
such as the integration tests.
Note: The difference between ninja test and ninja check is that
ninja check makes sure the build dependencies are current before
running the tests.
We rely on external test suites to test the compiler. We use the data sets CIFAR10 and MNIST (located in the "examples/" directory) to test the correctness of the whole system. The script under 'utils/' downloads and extracts the data set.
Numpy files are loaded automatically by recognizing the file header. Format limitations:
- Little-endian only.
- No FORTRAN data ordering support.
- Formats v2.0 and v3.0 are not supported.
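The limitations listed above can be expressed as a header check. The sketch below follows the publicly documented .npy layout (a magic string, two version bytes, a little-endian header length, and a Python dictionary literal); it is not Glow's loader, and the function name check_npy_header is hypothetical.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def check_npy_header(data):
    """Return the array shape if the header passes the listed limitations."""
    if data[:6] != MAGIC:
        raise ValueError("not a NumPy file")
    major, minor = data[6], data[7]
    if (major, minor) != (1, 0):
        raise ValueError("unsupported format version %d.%d" % (major, minor))
    # In format v1.0 the header length is a 2-byte little-endian integer.
    (header_len,) = struct.unpack("<H", data[8:10])
    text = data[10:10 + header_len].decode("latin1").strip()
    header = ast.literal_eval(text)
    descr = header["descr"]
    if not isinstance(descr, str):
        raise ValueError("structured dtypes are outside this sketch")
    if descr.startswith(">"):
        raise ValueError("big-endian data is not supported")
    if header["fortran_order"]:
        raise ValueError("FORTRAN data ordering is not supported")
    return header["shape"]
```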
NUMPY files accept the input-layout=LAYOUT command line argument. LAYOUT can be:
- NHWC or NCHW: tensors must be 3D or 4D only. 3D ones are expanded with
the batch dimension. If the layout does not match the one specified using the
existing -image-layout command line argument, the input tensor is transposed
accordingly. Tensors from multiple files are concatenated along the batch
dimension. Thus, the batch dimension can differ from file to file but the other
dimensions must match.
- NonImage: Can be used for 1D-4D tensors. In this case, -input-layout is ignored, input
files are not transposed, and batching is not possible. Also, these tensors accept a
single mean/stddev only.
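The NHWC/NCHW rules above can be traced on shapes alone. The following Python sketch computes the shape of the combined input from the shapes of several files. It is an illustration under stated assumptions, not Glow's implementation: the function name batched_shape is hypothetical, and "expanded with the batch dimension" is interpreted here as prepending a batch dimension of size 1.

```python
def batched_shape(shapes, file_layout, image_layout):
    """Compute the combined input shape for tensors loaded from several files.

    shapes: one shape tuple per file, given in file_layout order.
    file_layout / image_layout: "NHWC" or "NCHW".
    """
    layouts = ("NHWC", "NCHW")
    if file_layout not in layouts or image_layout not in layouts:
        raise ValueError("layout must be NHWC or NCHW")
    if not shapes:
        raise ValueError("no input files")
    total_batch = 0
    rest = None
    for shape in shapes:
        shape = tuple(shape)
        if len(shape) == 3:
            # Assumption: a 3D tensor gets a leading batch dimension of 1.
            shape = (1,) + shape
        elif len(shape) != 4:
            raise ValueError("image layouts need 3D or 4D tensors")
        # Reorder the dimensions from the file layout to the image layout.
        shape = tuple(shape[file_layout.index(axis)] for axis in image_layout)
        if rest is None:
            rest = shape[1:]
        elif shape[1:] != rest:
            raise ValueError("non-batch dimensions must match")
        # Concatenation along the batch dimension adds up the batch sizes.
        total_batch += shape[0]
    return (total_batch,) + rest
```

For instance, one 3D file of shape (224, 224, 3) and one 4D file of shape (2, 224, 224, 3), both stored as NHWC and consumed as NCHW, combine into a single (3, 3, 224, 224) input.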