Beginnings of the oneAPI backend (#955)

* snapshot adding oneapi
* fix reduce constexpr
* further updates
* update the bridge and testbench
* fix issues discovered when compiling
* update bridge writing files
* build library (but not tested)
* fix a bug in testbench
* snapshot after some debugging
* remove forgotten debug printing
* add build
* pre-commit fixes
* fix more pre-commit
* fix more pre-commit errors
* snapshot of work before reworking types
* Use using to decide array type, some preliminary updates
* snapshot unifying types
* fix the testbench and bridge
* snapshot updating nnet_utils (not finished)
* define array in nnet_types for oneAPI
* fix parallel conv2d
* add back the streaming versions of algs, most unconverted
* tentatively complete streaming for dense but not functional
* first version that compiles streaming
* change how the pipe value type is extracted
* fix pre-commit error
* always treat elu as ELU class
* fix batchnorm
* snapshot towards fixing conv
* snapshot fixing test for streaming
* fix conv1d
* fix conv2d
* fix reshape and flatten for oneAPI
* initial oneAPI tests
* remove nnet_dense_compressed from oneAPI
* add merge functionality (untested)
* fix merge for oneAPI
* fix merge for oneAPI (missing commit)
* add zeropadding
* standardize parallelization spelling
* fix pointwise for oneAPI
* remove references to quartus
* more replace quartus with oneapi
* snapshot on the way towards implementing pooling
* fix io_stream pooling for oneAPI
* add fix for Conv2DBatchnorm
* accidentally committed CMakeLists.txt in my debug setup
* reshaping, not fully tested
* fix cloning of streams
* fix pytest library loading
* remove unused template
* fix some activation bugs
* fix the overwriting of directories in the pytest
* update version of test repository
* try to fix docker issue
* bump hls4ml-testing tag to 0.5.2
* try not restricting tensorflow-model-optimization
* Update to 0.5.3 for testing
* bump to docker image 0.5.4, suggested by Ben
* fix pre-commit warning
* dial down N_TESTS_PER_YAML to 4
* revert tensorflow-model-optimization change
* fix issue of saving in "obsolete" h5 format
* fix embedding for oneAPI
* First attempt at adding RNNs to oneAPI
* fix bug in array size
* fix order or indices
* make queues static in bridge
* fix logic error in repack stream
* changing the style, but functionally identical
* update pointwise optimizer for oneAPI
* add oneAPI to test_multi_dense.py
* fix updating weight types
* initial changes of templates, for testing
* fix weight naming, product selection
* make im2col the default; fix winograd size
* fix up streaming dense and convolution
* fix prelu, some batchnorm
* fix weight array of exponential types
* move ACExponentialPrecisionDefinition to oneapi_types
* attempt to fix batchnorm and recurrent
* fixed BatchNormalizationQuantizedTanhConfigTemplate template selection
* fix embedding_stream
* fix lstm and simple rnn
* fix GRU
* fix winograd, and also disable it by default
* fix threshold name
* split bn_quant to be backend-specific
* add type inference to oneAPI
* add oneAPI to pytorch tests
* fix pooling with padding for oneAPI and Quartus
* Compilation for larger models enabled by increasing -fconstexpr-steps
* add oneapi clone tests; remove redundant multi_clone test
* remove some attributes to avoid overwrite warnings
* make extra handling for oneAPI like others (as in PR #1067)
* remove warnings for extra optimizers that are not scheduled on purpose
* update parametrized activations
* fix reference to alpha that had not been switched to param
* add oneapi documentation
* add parallelization factor to the attributes for oneAPI

---------

Co-authored-by: Lauri Laatu <[email protected]>
Co-authored-by: Jan-Frederik Schulte <[email protected]>
fastmachinelearning · Oct 25, 2024 · 4518537
1 parent 03096cf
commit 4518537
Showing 101 changed files with 10,764 additions and 169 deletions.
diff --git a/docs/advanced/oneapi.rst b/docs/advanced/oneapi.rst
@@ -0,0 +1,35 @@
+==============
+oneAPI Backend
+==============
+
+The ``oneAPI`` backend of hls4ml is designed for deploying NNs on Intel/Altera FPGAs. It will eventually
+replace the ``Quartus`` backend, which should really have been called the Intel HLS backend. (The actual Quartus
+program continues to be used with IP produced by the ``oneAPI`` backend.)
+This section discusses details of the ``oneAPI`` backend.
+
+The ``oneAPI`` code uses SYCL kernels to implement the logic that is deployed on FPGAs. It naturally leads to the
+accelerator style of programming. In the IP Component flow, which is currently the only flow supported, the
+kernel becomes the IP, and the "host code" becomes the testbench. An accelerator flow, with easier deployment on
+PCIe accelerator boards, is planned for the future.
+
+The produced work areas use cmake to build the projects in a style based on
+`oneAPI-samples <https://github.com/oneapi-src/oneAPI-samples/tree/main/DirectProgramming/C%2B%2BSYCL_FPGA>`_.
+The standard ``fpga_emu``, ``report``, ``fpga_sim``, and ``fpga`` make targets are supported. Additionally, ``make lib``
+produces the library used for calling the ``predict`` function from hls4ml. The ``compile`` and ``build`` commands
+in hls4ml interact with the cmake system, so one does not need to use the build system manually, but it is there
+if desired.
+
+The ``oneAPI`` backend, like the ``Quartus`` backend, only implements the ``Resource`` strategy for the layers. There
+is no ``Latency`` implementation of any of the layers.
+
+Note: tracing and external weights (i.e., setting ``BramFactor``) are currently not supported.
+
+io_parallel and io_stream
+=========================
+
+As mentioned in the :ref:`I/O Types` section, ``io_parallel`` is for small models, while ``io_stream`` is for
+larger models. In ``oneAPI``, there is an additional difference: ``io_stream`` implements each layer on its
+own ``task_sequence``. Thus, the layers run in parallel, with pipes connecting the inputs and outputs. This
+is similar in style to the ``dataflow`` implementation on Vitis, but more explicit. On the other hand, ``io_parallel``
+always uses a single task, relying on pipelining within the task for good performance. In contrast, the Vitis
+backend sometimes uses dataflow with ``io_parallel``.
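Reviewer note on the ``io_parallel`` / ``io_stream`` paragraph above: the difference between the two execution styles can be illustrated with a small Python analogy. This is only a sketch under stated assumptions, not the SYCL code generated by the backend. Threads stand in for per-layer ``task_sequence`` instances, bounded queues stand in for pipes, and the names `stage`, `run_streaming`, `run_single_task`, and the toy `layers` list are all hypothetical.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def stage(fn, inbox, outbox):
    """Run one 'layer' as its own task: read its input pipe, write its output pipe."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate end-of-stream downstream
            return
        outbox.put(fn(item))


def run_streaming(layers, inputs):
    """io_stream-like analogy: one thread per layer, bounded queues as pipes."""
    pipes = [queue.Queue(maxsize=2) for _ in range(len(layers) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, pipes[i], pipes[i + 1]))
        for i, fn in enumerate(layers)
    ]

    def feed():
        # Feed from a separate thread so bounded pipes cannot stall the caller.
        for x in inputs:
            pipes[0].put(x)
        pipes[0].put(SENTINEL)

    workers.append(threading.Thread(target=feed))
    for w in workers:
        w.start()

    outputs = []
    while True:
        item = pipes[-1].get()
        if item is SENTINEL:
            break
        outputs.append(item)
    for w in workers:
        w.join()
    return outputs


def run_single_task(layers, inputs):
    """io_parallel-like analogy: a single task applies every layer in turn."""
    outputs = []
    for x in inputs:
        for fn in layers:
            x = fn(x)
        outputs.append(x)
    return outputs


# Toy 'layers' chosen for illustration only.
layers = [lambda x: 2 * x, lambda x: x + 1, lambda x: max(x, 0)]
```

Both functions compute the same results; they differ only in how the work is scheduled. In the streaming variant every layer can be busy on a different input at the same time, which is the property the paragraph attributes to ``io_stream``.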
diff --git a/docs/index.rst b/docs/index.rst
@@ -24,6 +24,7 @@
 
     advanced/fifo_depth
     advanced/extension
+    advanced/oneapi
     advanced/accelerator
     advanced/model_optimization
 

diff --git a/hls4ml/backends/__init__.py b/hls4ml/backends/__init__.py
@@ -1,5 +1,6 @@
 from hls4ml.backends.backend import Backend, get_available_backends, get_backend, register_backend  # noqa: F401
 from hls4ml.backends.fpga.fpga_backend import FPGABackend  # noqa: F401
+from hls4ml.backends.oneapi.oneapi_backend import OneAPIBackend
 from hls4ml.backends.quartus.quartus_backend import QuartusBackend
 from hls4ml.backends.symbolic.symbolic_backend import SymbolicExpressionBackend
 from hls4ml.backends.vivado.vivado_backend import VivadoBackend
@@ -16,3 +17,4 @@
 register_backend('Quartus', QuartusBackend)
 register_backend('Catapult', CatapultBackend)
 register_backend('SymbolicExpression', SymbolicExpressionBackend)
+register_backend('oneAPI', OneAPIBackend)
diff --git a/hls4ml/backends/fpga/passes/bn_quant.py b/hls4ml/backends/catapult/passes/bn_quant.py
diff --git a/hls4ml/backends/oneapi/__init__.py b/hls4ml/backends/oneapi/__init__.py