External weights vivado accelerator #646
base: main
Conversation
The VivadoAccelerator backend creates a Vivado IP that can be easily integrated into a Zynq/ZynqMP setup. The input features, the output predictions, and the weights sit in main memory and are moved via AXI-Master interfaces.
Thanks for this, that's a nice feature! I have some general feedback here and some specific points that I'll address inline.
- Can we decouple 'reprogrammable weights' from 'AXI Master interface'? I'd like to be able to do both:
  - AXI Master interface without reprogrammable weights
  - AXI Stream interface with reprogrammable weights (maybe only the NN in/out on the AXI Stream interface while the weights use AXI Master?)
- Each weight of the model will have its own top-level interface/port. How about a mode with a single 'weights' port with one configurable type (so could be some `ap_fixed`, or `float`, `double`, for example) with casting to the proper weights types in the HLS? This would be similar to what we allow now for inputs/outputs with VivadoAccelerator.
- Can we have a C driver that has some function taking the data as argument and returning the NN prediction? And some other function to reprogram the weights. We can call those from the standalone example that puts the data in header files, but they would be helpful for anyone integrating an hls4ml NN in a real application.
- Can we have a Python driver for this? I had some pynq Python for an AXI Master interface before, so I can help with this.
- Can we have the design for pynq-z2 as well? I can help with this also.
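To make the decoupling concrete, here is a purely illustrative sketch of what an `AcceleratorConfig` could look like if the interface choice and reprogrammable weights were independent knobs. The `ReprogrammableWeights` key and this exact layout are assumptions for discussion, not the current hls4ml API:

```python
# Illustrative only: a possible AcceleratorConfig shape in which the
# interface choice and reprogrammable weights are independent options.
# 'ReprogrammableWeights' does not exist in hls4ml today.
accelerator_config = {
    'Board': 'ultra96v2',
    'Interface': 'axi_master',        # or 'axi_stream' for the NN in/out
    'ReprogrammableWeights': True,    # decoupled from the interface choice
    'Precision': 'float',             # type used on the wrapper ports
}

def describe(cfg):
    """Summarize which combination of options is requested."""
    mode = 'reprogrammable' if cfg.get('ReprogrammableWeights') else 'fixed'
    return f"{cfg['Interface']} interface, {mode} weights"

print(describe(accelerator_config))  # axi_master interface, reprogrammable weights
```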
It might be easier to split these contributions into different PRs, roughly:
- AXI Master interface
- C Driver
- Ultra96 support
- Reprogrammable weights
Also, Xilinx SDK is a legacy tool, superseded by Vitis. Can we do it with Vitis instead?
```python
io_type = self.config.get_config_value('IOType')
interface = self.config.get_config_value('AcceleratorConfig')['Interface'] if self.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
```
Two things:
- I don't think IOType of io_stream and Interface of axi_master should be the trigger for reprogrammable weights; I think it would be better as a new configuration parameter (of the AcceleratorConfig). Then there would probably need to be some `assert` somewhere to allow only combinations of those other parameters that have been implemented (i.e., right now it also depends on the board).
- As written, this can't go here in `ModelGraph` as it's backend specific.
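A minimal sketch of the kind of validation this suggests, assuming a hypothetical `reprogrammable_weights` flag and a small table of implemented combinations. Both the parameter name and the table contents are assumptions, not the current hls4ml API:

```python
# Hypothetical: combinations of (board, io_type, interface) for which
# reprogrammable weights have actually been implemented so far.
SUPPORTED_COMBINATIONS = {
    ('ultra96v2', 'io_stream', 'axi_master'),
}

def check_reprogrammable_weights(board, io_type, interface, reprogrammable_weights):
    """Reject configurations that request reprogrammable weights
    for a combination that has not been implemented yet."""
    if reprogrammable_weights:
        assert (board, io_type, interface) in SUPPORTED_COMBINATIONS, (
            f'Reprogrammable weights not implemented for '
            f'{board}/{io_type}/{interface}'
        )

# The implemented combination passes; others would raise an AssertionError.
check_reprogrammable_weights('ultra96v2', 'io_stream', 'axi_master', True)
```

Such a check would live in the backend (not in `ModelGraph`), per the comment above.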
I agree. This was mostly a placeholder, and I was going to ask what should be better.
As in the main thread, do we already have a matrix or list of the existing possible combinations (what goes with what)? I am not sure how to properly set up the configuration/trigger. Do you have suggestions?
I started putting together a document, which is far from final. @thesps and @jmitrevs should be able to edit. Not sure if that may help or if it is better to keep it to the comments here. A bit of both, I guess.
If I understand it right, the additional configuration parameters (to enable programmable weights) should be passed via the function `create_initial_config` for the `VivadoAccelerator` backend.
If you agree, then I would add:
- `configurable_weights` (`bool`, optional): weights are configurable at run time and thus exposed on the wrapper interface; defaults to `False`.
- `weight_type` (`dict`, optional): a dictionary that specifies the data type for each set of layer weights (can be `float` or an `ap_` type); if the type is not specified for a layer, it defaults to `float`; defaults to an empty dictionary, i.e., all of the weights are `float`.
- Note: `VivadoAcceleratorBackend` will round the number of bits used to the next power-of-2 value.
Please let me know if that makes sense.
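To make the proposal concrete, here is a sketch of how the call might look, together with the power-of-2 rounding mentioned in the note. The two keyword arguments are the proposed additions, not an existing signature, and the rounding helper is only an illustration of the stated behavior:

```python
# Proposed (not yet existing) keyword arguments for the VivadoAccelerator
# backend's create_initial_config; shown commented out as a sketch only.
#
# config = backend.create_initial_config(
#     board='ultra96v2',
#     configurable_weights=True,
#     weight_type={'dense_1': 'ap_fixed<16,6>'},  # unlisted layers default to float
# )

def round_to_next_pow2(bits):
    """Round a bit width up to the next power of two, as the note says
    VivadoAcceleratorBackend would do for the weight types."""
    width = 1
    while width < bits:
        width *= 2
    return width

print(round_to_next_pow2(16))  # 16 stays 16
print(round_to_next_pow2(18))  # 18 rounds up to 32
```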
```python
io_type = self.config.get_config_value('IOType')
interface = self.config.get_config_value('AcceleratorConfig')['Interface'] if self.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
```
Same comment as above
```diff
@@ -6,7 +6,7 @@ def match(self, node):
     cast = False
     if isinstance(node, Activation):
         cast = node.get_input_variable().type.precision != node.get_output_variable().type.precision
-    return isinstance(node, Activation) and node.get_attr('activation') == 'linear' and not cast
+    return isinstance(node, Activation) and node.get_attr('activation') == 'linear' # and not cast
```
? I don't think this should be included in this PR
That was a quick and dirty hack to get some models to optimize better, but it really was meant only for that branch, where we had checked the correctness. I agree that it should not be in the PR.
```diff
-#include <stdlib.h>
-#include <math.h>
+#include <cstdlib>
+#include <cmath>
+#include <cfloat>
```
?
I had to include `cfloat` at a certain point, but I do not need it in the current version of the code. While I was at it, I changed to `cmath` and `cstdlib`, which I usually include that way when working in C++. We can discard this change.
```make
rm -f $(DESIGN)_standalone/src/helloworld.c
cd $(DESIGN)_standalone/src && ln -s ../../common/main.c main.c
cd $(DESIGN)_standalone/src && ln -s ../../common/data.h data.h
```
I didn't try running anything yet, but it seems like these files might not have the same names in the driver & standalone example
```python
io_type = model.config.get_config_value('IOType')
interface = model.config.get_config_value('AcceleratorConfig')['Interface'] if model.config.get_config_value('AcceleratorConfig') else None
config_weights = (io_type == 'io_stream') and (interface == 'axi_master')
```
Same comment as before that I think a different trigger for reprogrammable weights is needed
```python
newline = ''
for v in model.get_weight_variables():
    newline += indent + ', model_axi_t {name} [{shape}]\n'.format(name=v.name, shape=v.data_length)
newline += indent + ', char load_weights'
```
Why not `bool` for `load_weights`?
It came from a previous iteration. For cleaner/safer C++ code, `bool` is the right choice. In hardware, it would become a single-bit register if it were just in the module, while on an AXI interface you can only use 8-, 16-, 32-bit (etc.) sized words.
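As a sketch, the writer snippet quoted earlier could emit `bool` instead of `char` for the flag; whether the HLS tool then pads the control signal to a byte-aligned word on the AXI interface is exactly the hardware-level point discussed here. The `FakeVar` class and the sample weight names below are illustrative stand-ins for the writer's real context:

```python
# Illustrative stand-ins for the writer's context: a couple of fake weight
# variables and the indent string used when emitting the top-level signature.
class FakeVar:
    def __init__(self, name, data_length):
        self.name = name
        self.data_length = data_length

weight_variables = [FakeVar('w2', 64), FakeVar('b2', 8)]
indent = '    '

newline = ''
for v in weight_variables:
    newline += indent + ', model_axi_t {name} [{shape}]\n'.format(name=v.name, shape=v.data_length)
# Emit 'bool' rather than 'char' for the load flag, as suggested above.
newline += indent + ', bool load_weights'

print(newline)
```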
```python
def write_new_tar(self, model):
    os.remove(model.config.get_output_dir() + '.tar.gz')
    super(VivadoAcceleratorWriter, self).write_tar(model)

def write_standalone_app(self, model):
```
Maybe the standalone app should be put in the examples instead?
This function and the `write_header_file` can be either part of the example or come with the hls4ml code base.
```python
f.close()
fout.close()

def write_header_file(model, X, y, y_keras, y_hls, n_samples, filename='data.h'):
```
Related to the comment above - maybe this should be an example?
I am good with moving those functions to the examples, and later on, we can "promote" them to the hls4ml codebase if necessary.
@thesps thank you for all of the comments; they help a lot! I will reply both here and to the other inlined questions.
We need the greater flexibility that you suggest. I agree. Do we have a matrix or list of the existing possible combinations (what goes with what)? We may also define how to pass this configuration information to the function
We can fine-tune the weights to properly-sized ports. Not sure if that would significantly increase the logic. As you noticed, for simplicity, we bring in all the weights as
Definitely. The current software application was mostly for proof-of-concept and debugging. We can break it down into a more organic library.
Yes. Python would require an OS (or a lighter-weight RTOS). We are not booting an OS right now, so a C/C++ bare-metal application was sufficient as we had for the MLPerf Tiny submission. I will comment a little more about this question.
Definitely. I have a few more boards in my working directory that I did not push yet. Essentially there are three classes of chips/boards:
I like the breakdown.
This is a good question, and @jmitrevs brought it up as well. I did not use Vitis seriously enough, and I assume it may come with a similar flow and equivalent tools (Vivado HLS -> Vivado -> Vivado SDK). Given Vitis, I am unsure how much we should push on the software side. Do you have any experience with the software stack in Vitis? In the longer term, I would also like to close the loop and create software applications/libraries for Linux. I started something based on Petalinux etc., but that once again may overlap with what Vitis is doing. Similarly, there is an ongoing effort to use the PYNQ software stack and overlays. I am wondering if we should look at that as well (I do not have experience with it, but I guess there should be a good degree of compatibility).
Breaking down this draft into 4 PRs. Let's start with PR #653.
Extend the `VivadoAccelerator` backend and add programmable weights. An ideal setup is a Xilinx Zynq/ZynqMP board (ARM core + programmable logic). The backend generates code for Vivado HLS, Vivado, and Vivado SDK:
Type of change
Some changes to the signature of the function `convert_from_keras_model()`, and a new function `write_header_file()` to write a header file with a hardcoded dataset.
Tests
You can test it with the example at this repo: https://github.com/GiuseppeDiGuglielmo/test-hls4ml-backend
Right now, we support Ultra96v2, but more Zynq/ZynqMP boards can be added.
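As an illustration of what the new `write_header_file()` produces for the standalone application, here is a minimal, self-contained sketch that writes a C header with hardcoded inputs and predictions. It is a simplified stand-in: the real function's signature takes more arrays (`y`, `y_keras`), and its exact output format in this PR may differ:

```python
import os
import tempfile

def write_header_file_sketch(X, y_hls, n_samples, filename='data.h'):
    """Write a C header with hardcoded input samples and HLS predictions.
    Simplified stand-in for the write_header_file added in this PR."""
    with open(filename, 'w') as f:
        f.write('#ifndef DATA_H_\n#define DATA_H_\n\n')
        f.write('#define N_SAMPLES {}\n\n'.format(n_samples))
        # Flatten the samples into C array initializers.
        f.write('const float src_data[] = {' +
                ', '.join(str(v) for row in X[:n_samples] for v in row) + '};\n')
        f.write('const float gld_data[] = {' +
                ', '.join(str(v) for row in y_hls[:n_samples] for v in row) + '};\n')
        f.write('\n#endif\n')

# Tiny hypothetical dataset: two 2-feature samples and their predictions.
X = [[0.1, 0.2], [0.3, 0.4]]
y_hls = [[1.0], [0.0]]
path = os.path.join(tempfile.mkdtemp(), 'data.h')
write_header_file_sketch(X, y_hls, n_samples=2, filename=path)
print(open(path).read())
```

The bare-metal `main.c` in the standalone example would then include this header and compare the accelerator output against `gld_data`.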