RANUM is a tool for detecting, testing, and fixing numerical defects in DNN architectures.
If you find the tool useful, please consider citing our accompanying paper at ICSE 2023:
@inproceedings{li2023reliability,
  author    = {Linyi Li and Yuhao Zhang and Luyao Ren and Yingfei Xiong and Tao Xie},
  title     = {Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects},
  booktitle = {45th International Conference on Software Engineering, {ICSE} 2023, Melbourne, Australia, 14-20 May 2023},
  publisher = {{IEEE/ACM}},
  year      = {2023},
}
First, download the missing benchmark architectures and running logs from https://doi.org/10.6084/m9.figshare.21973529.v1:

- Following the link, you will find two zip files.
- Download `model_zoo.zip` and unzip it at the project root folder. You will get a new folder `model_zoo/`.
- Download `results.zip` and unzip it. You will find two folders: `results/` and `results_digest/`. Move both folders to be under the project root.
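If you prefer to script this step, here is a minimal sketch (our illustration, not part of the RANUM code base); it assumes the two zip files have already been downloaded to the project root and that each archive contains its folders at the top level:

```python
import zipfile
from pathlib import Path

root = Path(".")  # run from the project root

# Assumption: model_zoo.zip contains model_zoo/ at its top level, and
# results.zip contains results/ and results_digest/ at its top level,
# so extracting at the project root places all three folders correctly.
for archive in ("model_zoo.zip", "results.zip"):
    with zipfile.ZipFile(root / archive) as zf:
        zf.extractall(root)
```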
The tool is expected to run on a Linux + Python + PyTorch platform with around 500 GB of storage space. Reference installation commands:
apt-get install -y python3.6 python3-pip cmake
pip3 install --upgrade pip
pip install -r requirements.txt
GPU support requires only minor changes in the code (placing `tensor.cuda()` at the necessary places). With a GPU and these minor changes, severalfold speed-ups are expected, but we have not tested this yet.
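For illustration, the change amounts to moving tensors (and model parameters) onto the GPU; a minimal hypothetical sketch, not code from this repository:

```python
import torch

# Use the GPU when available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor = torch.randn(4, 4).to(device)      # same effect as tensor.cuda() on a GPU machine
model = torch.nn.Linear(4, 2).to(device)   # parameters must move too
output = model(tensor)
```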
For the following commands, the working directory is the root directory of this project. All new results are output to the `results/` folder.
This code base contains (1) the official implementation of the RANUM framework for assuring the numerical reliability of deep neural networks; (2) the reference running logs; and (3) detailed case-level results for the empirical study.
How to replicate and reproduce the results?
(1) From the official RANUM framework, users can reproduce all technical experimental results in the RANUM paper.
Hardware requirements: to replicate the results from scratch, we need a CPU server with around 500 GB of storage space to save all raw experimental data.
Software requirements: a Linux environment, Python, and PyTorch. We also provide a Dockerfile, so you can run the following commands to get the results from scratch:
docker build -t ranum:v1 .
docker run -it ranum:v1
sh runall.sh
Then, all tables in the experimental section can be found in `/srv/results/results.txt`.
It may take 4-5 days to finish running.
(2) From the reference running logs, users can reproduce all technical experimental results in the RANUM paper without running from scratch.
Hardware requirements: a CPU server is sufficient; no extra storage space is needed, since all raw experimental data are already provided in the reference logs.
Software requirements: a Linux environment, Python, and PyTorch.
docker build -t ranum:v1 .
docker run -it ranum:v1
ipython evaluate/texify/final_texify.py --result_folder results_digest
All results will be printed to the console within seconds.
The results in `/srv/results/results.txt` and the console results after running `final_texify.py` have the same format.
- `==================== detection ====================` leads the detailed results for static defect detection. This table is no longer shown in the paper, but it answers RQ1, and its statistics (total detection time and average detection time per case) are described in the paper.
- `==================== Failure-Exhibiting Unit Test ====================` leads the detailed results for unit test generation (Table 1) in the paper.
- `==================== Failure-Exhibiting System Test ====================` leads the detailed results for system test generation (Table 2) in the paper. The statistics at the end are the number of successful cases and the total time used, which are described in the paper for RQ2.
- `==================== Precondition-Fix ====================` leads the overview results for precondition-fix suggestion in Table 3. These statistics are used to answer RQ3. In addition, we report the number of iterations in solving the optimization problem; these statistics are not reported in the paper and are mainly for diagnosis.
(3) Detailed case-level logs for the empirical study
The detailed empirical study log (along with the repository URLs) is in `empirical_study/precond/study_detail.txt`, and the statistics are in `empirical_study/precond/study_statistics.csv`.
The human-friendly precondition fixes generated by RANUM are in `empirical_study/precond`, where `all` means preconditions can be imposed on both initial input nodes and weight nodes; `input` means preconditions can be imposed only on initial input nodes; `weight` means preconditions can be imposed only on weight nodes; and `immediate` means preconditions can be imposed on the immediate vulnerable operators.
The list of DEBAR-supported DNN operator types is in `empirical_study/debar_abstraction_list.txt`.
The list of RANUM-supported DNN operator types can be counted from `interp/interp_operator.py` (the `interp_*` methods in the `Interpreter` class).
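For convenience, the counting can be scripted; a minimal sketch (ours, assuming the `interp_*` naming convention above), run from the project root:

```python
import re
from pathlib import Path

# Collect the names of all interp_* handler methods defined in the file.
source = Path("interp/interp_operator.py").read_text()
handlers = sorted(set(re.findall(r"def\s+(interp_\w+)\s*\(", source)))

print(f"{len(handlers)} RANUM-supported operator handlers:")
print("\n".join(handlers))
```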
Specifically, we show the individual commands for each stage below.
RANUM: ipython evaluate/bug_verifier.py
The running results of DEBAR are from the GRIST (Yan et al.) paper and the DEBAR (Zhang et al.) repository.
- `ipython evaluate/robust_inducing_inst_generator.py` - generate failure-inducing intervals
- `ipython experiments/unittest/err_trigger.py random` - generate 1000 distinct unit tests from these intervals
System test generation relies on the generated unit tests, so please run unit test generation first (one way to script this ordering is sketched after the commands below).
ipython evaluate/train_inst_generator.py ranum
ipython evaluate/train_inst_generator.py random
ipython evaluate/train_inst_generator.py ranum_p_random
ipython evaluate/train_inst_generator.py random_p_ranum
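A small driver like the following (an illustrative sketch, not part of the repository) runs the two unit-test commands before the four system-test commands and stops if any stage fails:

```python
import subprocess

# Unit test generation must finish before system test generation starts.
unit_test_cmds = [
    ["ipython", "evaluate/robust_inducing_inst_generator.py"],
    ["ipython", "experiments/unittest/err_trigger.py", "random"],
]
system_test_cmds = [
    ["ipython", "evaluate/train_inst_generator.py", mode]
    for mode in ("ranum", "random", "ranum_p_random", "random_p_ranum")
]

for cmd in unit_test_cmds + system_test_cmds:
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # raise on a non-zero exit code
```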
ipython evaluate/precond_generator.py ranum all
ipython evaluate/precond_generator.py ranumexpand all
ipython evaluate/precond_generator.py gd all
ipython evaluate/precond_generator.py ranum weight
ipython evaluate/precond_generator.py ranumexpand weight
ipython evaluate/precond_generator.py gd weight
ipython evaluate/precond_generator.py ranum input
ipython evaluate/precond_generator.py ranumexpand input
ipython evaluate/precond_generator.py gd input
ipython evaluate/precond_generator_immediate.py
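The nine `precond_generator.py` invocations simply enumerate three methods (`ranum`, `ranumexpand`, `gd`) against three precondition locations (`all`, `weight`, `input`), so they can also be driven by a loop; an illustrative sketch, not part of the repository:

```python
import itertools
import subprocess

# Run every location x method combination, mirroring the command order above,
# then the special case for immediate preconditions.
for location, method in itertools.product(
    ("all", "weight", "input"), ("ranum", "ranumexpand", "gd")
):
    subprocess.run(
        ["ipython", "evaluate/precond_generator.py", method, location],
        check=True,
    )

subprocess.run(["ipython", "evaluate/precond_generator_immediate.py"], check=True)
```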
- Folder `model_zoo/grist_protobufs_onnx` includes ONNX-format DNN architecture files for the 79 real-world defect programs in GRIST. These files extract the core DNN architectures from the original programs. We hope they help the evaluation and future development of defect-fixing methods by disentangling the methods from concrete implementation code and boilerplate.
- Folder `results/GRIST_log` contains our running log for the GRIST tool on the GRIST benchmark. We hope it provides detailed information and a cross-validation source for this widely-known baseline.