Accuracy Discrepancies between the built Accelerator (68%) on ZCU102 and Brevitas Model (86%) #996
Hi there,
Initially, during ONNX execution, I achieved an accuracy of 86% after applying the tidy-up, pre-processing, and post-processing transformations. I also tried finn.builder.build_dataflow, and it showed the same issue: when the streamline transformations are applied, there is a drop in accuracy. Only when I take the transformation model = model.transform(LowerConvsToMatMul()) out do I get the same 86% accuracy. I know that to convert the model to HLS-compatible nodes we have to lower the convolutions to MatMuls, so this transformation is needed.
The only other difference I see with and without the transformation is the finn_datatype of MultiThreshold_1 and MultiThreshold_2: BINARY with LowerConvsToMatMul (giving an accuracy of 68%) and BIPOLAR without it (giving an accuracy of 86%). I'm at a loss as to why this transformation causes such a significant accuracy drop. Is it due to the MultiThreshold finn_dtypes, or to the 6x6 kernel size I am using in QuantConv2d? Any insights or suggestions would be greatly appreciated. Thank you for your time and assistance.
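For reference, the MultiThreshold finn_datatypes can be compared before and after the transformation roughly like this (a minimal sketch; "streamlined.onnx" is a placeholder for the model after streamlining):
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.infer_datatypes import InferDataTypes
from qonnx.transformation.lower_convs_to_matmul import LowerConvsToMatMul

def report_mt_dtypes(model, tag):
    # print the FINN datatype annotated on every MultiThreshold output
    for node in model.graph.node:
        if node.op_type == "MultiThreshold":
            print(tag, node.name, model.get_tensor_datatype(node.output[0]))

model = ModelWrapper("streamlined.onnx")  # placeholder: model after streamlining
report_mt_dtypes(model, "before:")
model = model.transform(LowerConvsToMatMul())
model = model.transform(InferDataTypes())  # refresh datatype annotations
report_mt_dtypes(model, "after:")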
Hi @shakeelakram00, could you try the latest release with your flow? Note that you will need to change your flow for the new structure; this blog post might be helpful: #1020
Hi @auphelia, thank you. Moving forward, when I apply the following partitioning, conversion-to-HW-layers, and folding transformations, I get the error AssertionError: MultiThreshold_3: Signed output requires actval less than 0, which I suppose is due to the MultiThreshold_3 generated for the QuantIdentity layer associated with the last convolution layer before the linear layers. I tried to update the attributes of that node by setting out_bias = -1.0 manually in the ONNX model generated after the streamline transformations. This got rid of the error but dropped the accuracy even further, down to 51%, which I suppose is due to the forced change of out_bias.
So what do you suggest? If I have the same convolution layers with the same QuantIdentity layer associated with them, except that the second conv is followed by an additional MaxPool layer, why doesn't MultiThreshold_3 automatically take out_bias = -1.0 the way MultiThreshold_1 and MultiThreshold_2 do, when all three of them come from the same QuantIdentity layer?
import finn.transformation.fpgadataflow.convert_to_hw_layers as to_hw
model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
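For completeness, the out_scale/out_bias attributes of the MultiThreshold nodes can be inspected like this (a sketch using qonnx's getCustomOp; build_dir as in the snippet above):
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp

model = ModelWrapper(build_dir + "/end2end_cnv_w1a1_streamlined.onnx")
for node in model.graph.node:
    if node.op_type == "MultiThreshold":
        inst = getCustomOp(node)
        print(node.name,
              "out_scale =", inst.get_nodeattr("out_scale"),
              "out_bias =", inst.get_nodeattr("out_bias"),
              "out_dtype =", inst.get_nodeattr("out_dtype"))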
ZCU102: PYNQ Linux, based on Ubuntu 18.04 (GNU/Linux 4.19.0-xilinx-v2019.1 aarch64)
FINN: v0.9
Xilinx tools: 2022.2
Ubuntu: 22.04.1 LTS
Start the docker container with the command: ./run-docker.sh notebook
commit b3bdff1 (HEAD -> main, origin/main, origin/HEAD)
Merge: cdc5ec4 9847528
Author: auphelia <[email protected]>
Date: Mon Feb 13 11:55:42 2023 +0000
commit 9847528
Author: auphelia <[email protected]>
Date: Mon Feb 13 11:52:15 2023 +0000
commit cdc5ec4 (tag: v0.9)
Merge: 41740ed 17af0c3
Author: auphelia <[email protected]>
Date: Fri Feb 10 12:00:49 2023 +0000
Summary
I have been working with the cnv_end2end_example and successfully modified it to build the accelerator on a different dataset. The Brevitas model was trained on a dataset with shape 1x1x14x14, dtype torch.float32, and values ranging between 0 and 1.
Following the cnv_end2end_example, the first layer does the quantization, and the ONNX conversion includes pre-processing (ToTensor(), i.e., division by 255 to normalize UINT8 inputs to FLOAT values in [0, 1]) and post-processing (TopK with k=1). After create_dataflow_partition, the ONNX model has all blocks converted into HLS layers, except the initial Transpose.
Given that the first Transpose was not converted to an HLS layer, and the accelerator works with data of shape 1x14x14x1 and dtype UINT8, I scaled the original float32 dataset to np.uint8 via (dataset*255).astype(np.uint8) for inference on the ZCU102, as sketched below. Though the generated validate file includes reshaping the data to the desired shape, I tried to input the data both with and without reshaping; the results were the same in both cases, i.e. 68%.
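The conversion looks roughly like this (a sketch; dataset stands for the float32 array described above):
import numpy as np

# dataset: float32 images in [0, 1], shape (N, 1, 14, 14)
data_uint8 = (dataset * 255).astype(np.uint8)  # note: .astype() truncates; np.round() first would round
data_nhwc = data_uint8.transpose(0, 2, 3, 1)   # NCHW -> NHWC, i.e. 14x14x1 per image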
During the build process I included the verification steps, which show successful results against the sample input and expected output, and even the built accelerator produces the correct output for the sample input; but over the whole dataset the accuracy drops to 68% rather than 86%. In contrast, after performing the initial tidy-up transformations below, the Brevitas model exported to ONNX gives 86% accuracy on the whole dataset.
Initial Tidy-up Transformations:
import brevitas.onnx as bo
import finn.core.onnx_exec as oxe
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.transformation.infer_shapes import InferShapes
from qonnx.transformation.fold_constants import FoldConstants

# export the trained Brevitas model to FINN-ONNX and tidy it up
bo.export_finn_onnx(brevitas_model, (1, 1, 14, 14), "export.onnx")
model = ModelWrapper("export.onnx")
model = model.transform(InferShapes())
model = model.transform(FoldConstants())
...
output_dict = oxe.execute_onnx(model_t, input_dict)
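To measure the 86% figure over the whole dataset, I run a loop like the following (a sketch; test_images/test_labels are placeholders for the 14x14 dataset, and model_t is the fully tidied model from the snippet above):
import numpy as np
import finn.core.onnx_exec as oxe

iname = model_t.graph.input[0].name
oname = model_t.graph.output[0].name
correct = 0
for img, label in zip(test_images, test_labels):
    input_dict = {iname: img.reshape(1, 1, 14, 14).astype(np.float32)}
    output_dict = oxe.execute_onnx(model_t, input_dict)
    correct += int(np.argmax(output_dict[oname]) == label)
print("ONNX-level accuracy:", correct / len(test_labels))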
Accelerator Build Steps After Exporting the Brevitas Model to ONNX
import brevitas.onnx as bo
bo.export_finn_onnx(model, (1, 1, 14, 14), "export.onnx");
from finn.util.pytorch import ToTensor
from qonnx.transformation.merge_onnx_models import MergeONNXModels
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.core.datatype import DataType
from qonnx.transformation.insert_topk import InsertTopK
import finn.builder.build_dataflow as build
def custom_step_add_post_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    # post-processing: append a TopK (k=1) node so the model outputs the predicted class index
    model = model.transform(InsertTopK(k=1))
    return model
def custom_step_add_pre_proc(model: ModelWrapper, cfg: build.DataflowBuildConfig):
    ishape = model.get_tensor_shape(model.graph.input[0].name)
    # preprocessing: torchvision's ToTensor divides uint8 inputs by 255
    preproc = ToTensor()
    bo.export_finn_onnx(preproc, ishape, "preproc.onnx", opset_version=11)
    preproc_model = ModelWrapper("preproc.onnx")
    # set input finn datatype to UINT8
    preproc_model.set_tensor_datatype(preproc_model.graph.input[0].name, DataType["UINT8"])
    # merge pre-processing onnx model with cnv model (passed as input argument)
    model = model.transform(MergeONNXModels(preproc_model))
    return model
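# Optional sanity check (a sketch one could append inside custom_step_add_pre_proc,
# just before "return model"): confirm the merged graph now expects a UINT8 input.
#     assert model.get_tensor_datatype(model.graph.input[0].name) == DataType["UINT8"]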
import finn.builder.build_dataflow as build
import finn.builder.build_dataflow_config as build_cfg
import os
import shutil
model_file = "export.onnx"
rtlsim_output_dir = "output"
# delete previous run results if they exist
if os.path.exists(rtlsim_output_dir):
    shutil.rmtree(rtlsim_output_dir)
    print("Previous run results deleted!")
cfg_stitched_ip = build.DataflowBuildConfig(
    output_dir=rtlsim_output_dir,
    mvau_wwidth_max=160,
    synth_clk_period_ns=20.0,
    target_fps=2000000,
    board="ZCU102",
    fpga_part="xczu9eg-ffvb1156-2-e",
    shell_flow_type=build_cfg.ShellFlowType.VIVADO_ZYNQ,
    folding_two_pass_relaxation=True,
    folding_config_file="auto_folding_config.json",
    steps=[
        custom_step_add_pre_proc,
        custom_step_add_post_proc,
        "step_qonnx_to_finn",
        "step_tidy_up",
        "step_streamline",
        "step_convert_to_hls",
        "step_create_dataflow_partition",
        "step_target_fps_parallelization",
        "step_apply_folding_config",
        "step_generate_estimate_reports",
        "step_hls_codegen",
        "step_hls_ipgen",
        "step_set_fifo_depths",
        "step_create_stitched_ip",
        "step_measure_rtlsim_performance",
        "step_out_of_context_synthesis",
        "step_synthesize_bitfile",
        "step_make_pynq_driver",
        "step_deployment_package",
    ],
    generate_outputs=[
        build_cfg.DataflowOutputType.ESTIMATE_REPORTS,
        build_cfg.DataflowOutputType.STITCHED_IP,
        build_cfg.DataflowOutputType.RTLSIM_PERFORMANCE,
        build_cfg.DataflowOutputType.OOC_SYNTH,
        build_cfg.DataflowOutputType.BITFILE,
        build_cfg.DataflowOutputType.PYNQ_DRIVER,
        build_cfg.DataflowOutputType.DEPLOYMENT_PACKAGE,
    ],
    verify_steps=[
        build_cfg.VerificationStepType.QONNX_TO_FINN_PYTHON,
        build_cfg.VerificationStepType.TIDY_UP_PYTHON,
        build_cfg.VerificationStepType.STREAMLINED_PYTHON,
        build_cfg.VerificationStepType.FOLDED_HLS_CPPSIM,
        build_cfg.VerificationStepType.STITCHED_IP_RTLSIM,
    ],
)
build.build_dataflow_cfg(model_file, cfg_stitched_ip)
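On the ZCU102 side, the accelerator is driven roughly like this (a sketch based on the PYNQ driver generated by step_make_pynq_driver; the class and file names come from the deployment package and may differ between FINN versions, and the bitfile name is an assumption):
import numpy as np
from driver import io_shape_dict           # generated by step_make_pynq_driver
from driver_base import FINNExampleOverlay

accel = FINNExampleOverlay(
    bitfile_name="resizer.bit",            # assumption: default name in the deployment package
    platform="zynq-iodma",
    io_shape_dict=io_shape_dict,
    batch_size=1,
)
# one 14x14 image, scaled to uint8 and laid out as NHWC (1x14x14x1)
ibuf = (sample * 255).astype(np.uint8).reshape(1, 14, 14, 1)  # sample: float32 image in [0, 1]
obuf = accel.execute(ibuf)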
Moreover, runtime_writeable_weights is enabled (set to 1) in the .json folding configuration file for the MVAUs of the CNV and linear layers, following the guidelines in 4_advanced_builder_settings and cnv-w1a1_folding_config.
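For reference, the relevant folding-config entries look roughly like this when written from Python (a sketch; the node names and the PE/SIMD values are placeholders that depend on the FINN version and the actual graph):
import json

folding = {
    "Defaults": {},
    "MatrixVectorActivation_0": {   # placeholder node name; match it to your own graph
        "PE": 16,
        "SIMD": 3,
        "ram_style": "auto",
        "runtime_writeable_weights": 1,
    },
}
with open("auto_folding_config.json", "w") as f:
    json.dump(folding, f, indent=2)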
I would appreciate any assistance in debugging this issue.
@fpjentzsch, you mentioned in your reply #995 that reshaping alone might not be sufficient. Could you please provide further guidance, considering my specific setup, to achieve the desired accuracy on the accelerator?
Thank you in advance for your help.