Beginnings of the oneAPI backend (#955)
* snapshot adding oneapi

* fix reduce constexpr

* further updates

* update the bridge and testbench

* fix issues discovered when compiling

* update bridge writing files

* build library (but not tested)

* fix a bug in testbench

* snapshot after some debugging

* remove forgotten debug printing

* add build

* pre-commit fixes

* fix more pre-commit

* fix more pre-commit errors

* snapshot of work before reworking types

* Use using to decide array type, some preliminary updates

* snapshot unifying types

* fix the testbench and bridge

* snapshot updating nnet_utils (not finished)

* define array in nnet_types for oneAPI

* fix parallel conv2d

* add back the streaming versions of algs, most unconverted

* tentatively complete streaming for dense but not functional

* first version that compiles streaming

* change how the pipe value type is extracted

* fix pre-commit error

* always treat elu as ELU class

* fix batchnorm

* snapshot towards fixing conv

* snapshot fixing test for streaming

* fix conv1d

* fix conv2d

* fix reshape and flatten for oneAPI

* initial oneAPI tests

* remove nnet_dense_compressed from oneAPI

* add merge functionality (untested)

* fix merge for oneAPI

* fix merge for oneAPI (missing commit)

* add zeropadding

* standardize parallelization spelling

* fix pointwise for oneAPI

* remove references to quartus

* more replace quartus with oneapi

* snapshot on the way towards implementing pooling

* fix io_stream pooling for oneAPI

* add fix for Conv2DBatchnorm

* accidentally committed CMakeLists.txt in my debug setup

* reshaping, not fully tested

* fix cloning of streams

* fix pytest library loading

* remove unused template

* fix some activation bugs

* fix the overwriting of directories in the pytest

* update version of test repository

* try to fix docker issue

* bump hls4ml-testing tag to 0.5.2

* try not restricting tensorflow-model-optimization

* Update to 0.5.3 for testing

* bump to docker image 0.5.4, suggested by Ben

* fix pre-commit warning

* dial down N_TESTS_PER_YAML to 4

* revert tensorflow-model-optimization change

* fix issue of saving in "obsolete" h5 format

* fix embedding for oneAPI

* First attempt at adding RNNs to oneAPI

* fix bug in array size

* fix order of indices

* make queues static in bridge

* fix logic error in repack stream

* changing the style, but functionally identical

* update pointwise optimizer for oneAPI

* add oneAPI to test_multi_dense.py

* fix updating weight types

* initial changes of templates, for testing

* fix weight naming, product selection

* make im2col the default; fix winograd size

* fix up streaming dense and convolution

* fix prelu, some batchnorm

* fix weight array of exponential types

* move ACExponentialPrecisionDefinition to oneapi_types

* attempt to fix batchnorm and recurrent

* fixed BatchNormalizationQuantizedTanhConfigTemplate template selection

* fix embedding_stream

* fix lstm and simple rnn

* fix GRU

* fix winograd, and also disable it by default

* fix threshold name

* split bn_quant to be backend-specific

* add type inference to oneAPI

* add oneAPI to pytorch tests

* fix pooling with padding for oneAPI and Quartus

* Compilation for larger models enabled by increasing -fconstexpr-steps

* add oneapi clone tests; remove redundant multi_clone test

* remove some attributes to avoid overwrite warnings

* make extra handling for oneAPI like others (as in PR #1067)

* remove warnings for extra optimizers that are not scheduled on purpose

* update parametrized activations

* fix reference to alpha that had not been switched to param

* add oneapi documentation

* add parallelization factor to the attributes for oneAPI

---------

Co-authored-by: Lauri Laatu
Co-authored-by: Jan-Frederik Schulte
3 people authored Oct 25, 2024
1 parent 03096cf commit 4518537
Showing 101 changed files with 10,764 additions and 169 deletions.
35 changes: 35 additions & 0 deletions docs/advanced/oneapi.rst
@@ -0,0 +1,35 @@
==============
oneAPI Backend
==============

The ``oneAPI`` backend of hls4ml is designed for deploying NNs on Intel/Altera FPGAs. It will eventually
replace the ``Quartus`` backend, which should really have been called the Intel HLS backend. (The actual Quartus
program continues to be used with IP produced by the ``oneAPI`` backend.)
This section discusses details of the ``oneAPI`` backend.

The ``oneAPI`` code uses SYCL kernels to implement the logic that is deployed on FPGAs. It naturally leads to the
accelerator style of programming. In the IP Component flow, which is currently the only flow supported, the
kernel becomes the IP, and the "host code" becomes the testbench. An accelerator flow, with easier deployment on
PCIe accelerator boards, is planned.

The produced work areas use cmake to build the projects in a style based on
`oneAPI-samples <https://github.com/oneapi-src/oneAPI-samples/tree/main/DirectProgramming/C%2B%2BSYCL_FPGA>`_.
The standard ``fpga_emu``, ``report``, ``fpga_sim``, and ``fpga`` targets are supported. Additionally, ``make lib``
produces the library used for calling the ``predict`` function from hls4ml. The ``compile`` and ``build`` commands
in hls4ml interact with the cmake system, so one does not need to use the build system manually, but it is there
if desired.
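
For readers who do want to drive the build system by hand, the following is a minimal sketch of how the commands for one target could be assembled. It assumes the generated work area has its ``CMakeLists.txt`` at the top level and that an out-of-tree ``build`` subdirectory is used; both are assumptions, and ``build_commands`` is a hypothetical helper, not part of hls4ml. It uses the generator-independent ``cmake --build ... --target`` form rather than calling ``make`` directly.

```python
from pathlib import PurePosixPath

# Targets named in the documentation above.
KNOWN_TARGETS = ("fpga_emu", "report", "fpga_sim", "fpga", "lib")


def build_commands(project_dir, target, build_dir="build"):
    """Return the configure and build command lines for one target.

    Hypothetical helper: the directory layout is an assumption, not
    something guaranteed by hls4ml.
    """
    if target not in KNOWN_TARGETS:
        raise ValueError(f"unknown target {target!r}; expected one of {KNOWN_TARGETS}")
    build_path = str(PurePosixPath(project_dir) / build_dir)
    # Configure step: -S is the source directory, -B the build directory.
    configure = ["cmake", "-S", str(project_dir), "-B", build_path]
    # Build step: ask cmake to build a single named target.
    build = ["cmake", "--build", build_path, "--target", target]
    return [configure, build]
```

Each returned list can be passed to ``subprocess.run(cmd, check=True)`` in order; the helper only builds the argument lists and does not run anything itself.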

The ``oneAPI`` backend, like the ``Quartus`` backend, only implements the ``Resource`` strategy for the layers. There
is no ``Latency`` implementation of any of the layers.

Note: currently tracing and external weights (i.e. setting BramFactor) are not supported.

io_parallel and io_stream
=========================

As mentioned in the :ref:`I/O Types` section, ``io_parallel`` is for small models, while ``io_stream`` is for
larger models. In ``oneAPI``, there is an additional difference: ``io_stream`` implements each layer on its
own ``task_sequence``. Thus, the layers run in parallel, with pipes connecting the inputs and outputs. This
is similar in style to the ``dataflow`` implementation on Vitis, but more explicit. On the other hand, ``io_parallel``
always uses a single task, relying on pipelining within the task for good performance. In contrast, the Vitis
backend sometimes uses dataflow with ``io_parallel``.
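
The difference between the two I/O types can be illustrated with a purely conceptual Python model. This is only an analogy under stated assumptions: threads stand in for ``task_sequence`` instances and ``queue.Queue`` objects stand in for pipes. None of the names below (``run_stage``, ``run_streaming``, ``run_single_task``) exist in hls4ml or oneAPI, and the sketch says nothing about hardware timing.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a stream


def run_stage(func, inbox, outbox):
    """One 'layer' running on its own task: read a pipe, compute, write a pipe."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(func(item))


def run_streaming(layers, inputs):
    """io_stream analogy: every layer is a separate task, connected by pipes."""
    pipes = [queue.Queue() for _ in range(len(layers) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(layer, pipes[i], pipes[i + 1]))
        for i, layer in enumerate(layers)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        pipes[0].put(x)
    pipes[0].put(_DONE)
    outputs = []
    while True:
        item = pipes[-1].get()
        if item is _DONE:
            break
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs


def run_single_task(layers, inputs):
    """io_parallel analogy: one task applies all layers in sequence."""
    outputs = []
    for x in inputs:
        for layer in layers:
            x = layer(x)
        outputs.append(x)
    return outputs
```

Both functions compute the same results; the streaming version simply lets different layers work on different inputs at the same time, which is the property the ``io_stream`` description above relies on.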
1 change: 1 addition & 0 deletions docs/index.rst
@@ -24,6 +24,7 @@

advanced/fifo_depth
advanced/extension
advanced/oneapi
advanced/accelerator
advanced/model_optimization

2 changes: 2 additions & 0 deletions hls4ml/backends/__init__.py
@@ -1,5 +1,6 @@
from hls4ml.backends.backend import Backend, get_available_backends, get_backend, register_backend # noqa: F401
from hls4ml.backends.fpga.fpga_backend import FPGABackend # noqa: F401
from hls4ml.backends.oneapi.oneapi_backend import OneAPIBackend
from hls4ml.backends.quartus.quartus_backend import QuartusBackend
from hls4ml.backends.symbolic.symbolic_backend import SymbolicExpressionBackend
from hls4ml.backends.vivado.vivado_backend import VivadoBackend
@@ -16,3 +17,4 @@
register_backend('Quartus', QuartusBackend)
register_backend('Catapult', CatapultBackend)
register_backend('SymbolicExpression', SymbolicExpressionBackend)
register_backend('oneAPI', OneAPIBackend)
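
The ``register_backend`` calls above follow a name-to-class registry pattern. The following is a minimal, self-contained sketch of that pattern, not hls4ml's actual implementation: the function names mirror the imports in this file for readability, but the bodies, the case-insensitive lookup, and ``DummyBackend`` are assumptions made for illustration.

```python
# Hypothetical registry; hls4ml's real one lives in hls4ml.backends.backend.
_backends = {}


class DummyBackend:
    """Stand-in for a real backend class such as OneAPIBackend."""


def register_backend(name, backend_cls):
    """Register a backend class under a case-insensitive name (assumed behavior)."""
    key = name.lower()
    if key in _backends:
        raise ValueError(f"Backend {name!r} is already registered")
    _backends[key] = backend_cls


def get_backend(name):
    """Look up a backend by name and return a new instance of it."""
    return _backends[name.lower()]()


def get_available_backends():
    """Return the registered backend names, sorted."""
    return sorted(_backends)


register_backend('oneAPI', DummyBackend)
```

With a registry like this, selecting the backend by a string such as ``'oneAPI'`` only requires the one registration line added in this diff.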