Note: For now, only pre-approved collaborators can pull this repo. Please ask bgrady-tt for access!
```shell
git clone [email protected]:bgrady-tt/tt-npe.git
cd tt-npe/
./build-noc-model.sh # setup and run cmake
```
Everything is installed to `tt-npe/install/`, including:

- Shared library (`install/lib/libtt_npe.so`)
- C++ API headers (`install/include/`)
- Python CLI using pybind11 (`install/bin/tt_npe.py`)
- C++ CLI (`install/bin/tt_npe_run`)
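A quick sanity check of the build output, using only the paths listed above, is to list the install directories:

```shell
ls tt-npe/install/lib tt-npe/install/include tt-npe/install/bin
```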
tt-npe has two unit test suites, one for C++ code and one for Python:

```shell
tt_npe/scripts/run_ut.sh # can be run from any pwd
```
tt-npe simulates the behavior of an abstract NoC "workload" running on a virtual Tenstorrent device. A workload corresponds closely to a trace of all calls to the dataflow API (i.e. the `noc_async_*` calls). In fact, workloads can be generated directly from NoC traces extracted from real devices (support and documentation for doing this in tt-metal is in progress).
tt-npe can work with both:

- Predefined workloads defined in YAML files, potentially derived from real NoC traces
- Workloads constructed programmatically using the `tt_npe::npeWorkload` data structure (`npe.Workload` in Python)
Some examples of workload files included in the repo can be found in `tt-npe/tt_npe/workload/noc_trace_yaml_workloads/`.
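To browse the bundled examples (any of these YAML files can be passed to the CLI described below):

```shell
ls tt-npe/tt_npe/workload/noc_trace_yaml_workloads/
```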
Run the following to add the tt-npe install dir to your `$PATH`:

```shell
cd tt-npe/
source ENV_SETUP # add <tt_npe_root>/install/bin/ to $PATH
```

Now run the following:

```shell
tt_npe.py -w tt_npe/workload/example.yaml
```
Note: the `-w` argument is required and specifies the YAML workload file to load.

Bandwidth derating caused by congestion between concurrent transfers is modelled by default. Congestion modelling can be disabled using `--cong-model none`.

The `-e` option dumps detailed information about the simulation timeline (e.g. congestion and transfer state for each timestep) into a JSON file located at `npe_stats.json` (by default). Future work is to load this data into a visualization tool, but it could also be used for ad-hoc analysis.

See `tt_npe.py --help` for more information about available options.
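As a quick illustration combining only the options described above (treating `-e` as a simple flag, per the description), a run with congestion modelling disabled and the detailed timeline output enabled might look like:

```shell
# disable congestion modelling and dump the per-timestep timeline (npe_stats.json by default)
tt_npe.py -w tt_npe/workload/example.yaml --cong-model none -e
```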
tt-npe workloads are composed of collections of `Transfer`s. Each `Transfer` represents a series of back-to-back packets sent from one source to one or more destinations. This is roughly equivalent to a single call to the dataflow APIs `noc_async_read` or `noc_async_write`.
`Transfer`s are grouped hierarchically (see diagram). Each workload is a collection of `Phase`s, and each `Phase` is a group of `Transfer`s.
For most modelling scenarios, putting all `Transfer`s in a single monolithic `Phase` is the correct approach. The purpose of multiple `Phase`s is to express the data dependencies and synchronization common in real workloads; however, full support for this is not yet complete.
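To see how this structure looks in a concrete workload file, inspect one of the bundled YAML examples, e.g.:

```shell
less tt_npe/workload/example.yaml
```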
See the example script `install/bin/programmatic_workload_generation.py` for an annotated example of generating and simulating a tt-npe NoC workload via the Python bindings.
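Assuming the environment from `source ENV_SETUP` is active (so the installed binaries and bindings are discoverable), one way to run the annotated example from the repo root is:

```shell
# run the annotated programmatic-workload example
python3 install/bin/programmatic_workload_generation.py
```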
Open `tt_npe/doc/tt_npe_pybind.html` to see the full documentation of the tt-npe Python API.
The C++ API requires:

- Including the header `install/include/npeAPI.hpp`
- Linking against the shared library `libtt_npe.so`

See the example C++-based CLI source code within `tt-npe/tt_npe/cli/`. It links `libtt_npe.so` as a shared library and serves as a reference for interacting with the API.
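As a rough sketch of an out-of-tree build against the installed headers and shared library (the source filename `my_npe_tool.cpp`, the compiler choice, and the C++ standard flag are illustrative assumptions, not taken from the repo):

```shell
# hypothetical out-of-tree compile+link against the tt-npe install directory
g++ -std=c++17 my_npe_tool.cpp \
    -I tt-npe/install/include \
    -L tt-npe/install/lib -ltt_npe \
    -Wl,-rpath,"$PWD/tt-npe/install/lib" \
    -o my_npe_tool
```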
tt-npe does not currently model the following; features marked with a * are being prioritized.

- *Blackhole device support
- User-defined data dependencies
- Ethernet
- Multichip traffic