address first half of comments
matth2k committed Dec 20, 2023
1 parent 31b0494 commit 6d89fea
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions content/blog/2023-12-09-hcl-amc/index.md
@@ -24,13 +24,13 @@ When CPU and GPU optimizations have been exhausted, custom hardware accelerators

### High-Level Synthesis

- High-Level Synthesis (HLS) is one solution to the problem of hardware design productivity. The general idea is to raise the level of abstraction from the commonly used register transfer level (RTL) (e.g. Verilog) to a higher-level language like C/C++. This allows the designer to focus more on the algorithm and less on hardware details, such as interfaces and timing. The "magic" of HLS compilers is that they can infer a gate-level mapping from the high-level language to the underlying hardware. The main steps in generating this mapping are scheduling, resource allocation, and binding. In the end, the RTL code output by HLS can be synthesized by downstream tools. However, current HLS tools fall far short of their promise to free designers from thinking at the architecture level. They rely on special directives (e.g., C++ pragmas) from the designer to guide the optimization process. Oftentimes, the designer must even rewrite their code to fit the HLS compiler's model of computation.
+ High-Level Synthesis (HLS) is one solution to the problem of hardware design productivity. The general idea is to raise the level of abstraction from the commonly used register transfer level (RTL) (e.g., Verilog) to a higher-level language like C/C++. This allows the designer to focus more on the algorithm and less on hardware details, such as interfaces and timing. The "magic" of HLS compilers is that they can infer a gate-level mapping from the high-level language to the underlying hardware. The main steps in generating this mapping are scheduling, resource allocation, and binding. In the end, the RTL code output by HLS can be synthesized by downstream tools. However, current HLS tools fall far short of their promise to free designers from thinking at the architecture level. They rely on special directives (e.g., C++ pragmas) from the designer to guide the optimization process. Oftentimes, the designer must even rewrite their code to fit the HLS compiler's model of computation.
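To give a rough feel for the scheduling step mentioned above, here is a toy as-soon-as-possible (ASAP) scheduler in Python. It is only an illustrative sketch, not what any production HLS compiler actually runs; the `asap_schedule` function and its operation names are made up for this example.

```python
def asap_schedule(deps):
    """Toy ASAP scheduling: map each operation to the earliest clock
    cycle permitted by its data dependencies.

    deps maps an op name to the list of ops it depends on (a DAG).
    """
    cycle = {}

    def visit(op):
        if op not in cycle:
            # An op is scheduled one cycle after its latest dependency;
            # ops with no dependencies start at cycle 0.
            cycle[op] = 1 + max((visit(d) for d in deps[op]), default=-1)
        return cycle[op]

    for op in deps:
        visit(op)
    return cycle

# c = a * b; e = c + d  ->  the add must wait for the multiply.
print(asap_schedule({"mul": [], "load_d": [], "add": ["mul", "load_d"]}))
# {'mul': 0, 'load_d': 0, 'add': 1}
```

Real schedulers must additionally respect resource limits and target clock period, which is where resource allocation and binding come in.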

### MLIR and Incubator Projects

Reinventing HLS with advanced compiler techniques is an active area of research. There are many outstanding HLS tools/frameworks such as [TAPA](https://tapa.readthedocs.io/en/release/overview/overview.html), [Dynamatic](https://dynamatic.epfl.ch/), and [HeteroCL](https://heterocl.csl.cornell.edu/). However, these tools are developed independently with different compilation flows, which makes it difficult to integrate them. [MLIR](https://mlir.llvm.org/) is a compiler design paradigm in which the source language is compiled through multiple levels of modularized intermediate representations (IRs), also known as dialects. Dialects act like domain-specific languages (DSLs) and can capture the appropriate details at each level of abstraction.

- The [CIRCT](https://circt.llvm.org/) project expands the MLIR-based development methodology for hardware design. It represents key components of hardware as MLIR dialects, such as finite state machines (FSMs), pipelines, and interface handshaking. HeteroCL has been migrated to the MLIR ecosystem as a dialect, with a new Python frontend called HCL. HCL decouples the interactions between the algorithm, hardware optimizations, and backend targets to enable productive design and testing. Lastly, the Accelerator Memory Compiler (AMC) is an MLIR dialect for representing memory architecture. It is expressive enough to capture common memory organization strategies such as partitioning, banking, and arbitration. AMC can be further lowered to Calyx, which is also part of the CIRCT ecosystem. Calyx IR gives us a pathway to finally to synthesizable Verilog. The contribution of this project is that we integrated HCL with AMC to enable a Python frontend for AMC. This allows us to use HCL to describe the algorithm and AMC to describe the memory architecture. The resulting design can be compiled to Verilog and simulated with a single function call. In the end, we hope that this integration will enable a more productive design flow for hardware accelerators as well as help us find more bugs in AMC.
+ The [CIRCT](https://circt.llvm.org/) project expands the MLIR-based development methodology for hardware design. It represents key components of hardware as MLIR dialects, such as finite state machines (FSMs), pipelines, and interface handshaking. HeteroCL has been migrated to the MLIR ecosystem as a dialect, with a new Python frontend called HCL. HCL decouples the interactions between the algorithm, hardware optimizations, and backend targets to enable productive design and testing. Lastly, the Accelerator Memory Compiler (AMC) is an MLIR dialect for representing memory architecture. It is expressive enough to capture common memory organization strategies such as partitioning, banking, and arbitration. AMC can be further lowered to Calyx, which is also integrated with the CIRCT ecosystem. Finally, the Calyx compiler gives us a pathway to synthesizable Verilog. The contribution of this project is that we integrated HCL with AMC to enable a Python frontend for AMC. This allows us to use HCL to describe the algorithm and AMC to describe the memory architecture. The resulting design can be compiled to Verilog and simulated with a single function call. In the end, we hope that this integration will enable a more productive design flow for hardware accelerators as well as help us find more bugs in AMC.

## Design Example

@@ -55,7 +55,7 @@ def test_amc():
```
    f = s.build(target="amc")
    # Run the software simulation by invoking directly
    np_out = kernel(A, B)
-   # Now run the hardware simulation with AMC
+   # Now run the hardware simulation with AMC+Calyx
    hcl_out = f(A, B)
    np.testing.assert_array_equal(hcl_out, np_out)
```
@@ -204,7 +204,7 @@ Back to AMC, the custom dialect elaborates the *real* limiting resources of memo
```
}
```

- The role of the AMC compiler is to take in a high-level description of memory organization (as seen above) and figure out how best to compile it to the target architecture. It accounts for some properties of the underlying architecture, like BRAM size and port count, as well as the context in which the memory is being used. It is a very gradual lowering process, and an explanation of the whole pass pipeline won't even start to fit in this post. However, the following diagram may offer a rough idea of how the core MLIR and AMC dialects lower to Verilog:
+ The role of the AMC compiler is to take in a high-level description of memory organization (as seen above) and figure out how best to compile it to the target architecture. It accounts for some properties of the underlying architecture, like BRAM size and port count, as well as the general context in which the memory is being used. For example, suppose a 2D matrix is being accessed along its columns. The compiler may bank the memory by the matrix's rows for higher throughput. The memory compilation is a very gradual lowering process, and an explanation of the whole pass pipeline won't even start to fit in this post. However, the following diagram may offer a rough idea of how the core MLIR and AMC dialects lower to Verilog:

<center>
<img src="amc_passes.png" alt="Diagram for AMC pass pipeline" title="AMC pass pipeline" style="zoom:30%;">
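The row-banking idea mentioned above (a matrix accessed along its columns, banked by row) can be sketched in a few lines of Python. This is only an illustrative model; `NUM_BANKS` and `bank_of` are hypothetical names for this sketch, not part of the AMC dialect.

```python
NUM_BANKS = 4  # assumed number of physical memory banks

def bank_of(row, col):
    """Cyclic banking on the row index: consecutive rows land in
    consecutive banks, so a column access touches all banks."""
    return row % NUM_BANKS

# Reading one column hits every bank exactly once per NUM_BANKS rows,
# so up to NUM_BANKS elements can be served in the same cycle.
column_accesses = [bank_of(row, 0) for row in range(NUM_BANKS)]
print(sorted(column_accesses))  # [0, 1, 2, 3] -> conflict-free
```

Had the banking been done on the column index instead, every access in a column read would collide in the same bank and serialize.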
