
Dragonfly Plus

Neil McGlohon edited this page Mar 5, 2018 · 1 revision

NOTE: The model and related files currently exist in a fork from the mainline CODES and will be merged shortly.

Introduction

This page is a basic introduction to running the Dragonfly+ (DF+/DFP) CODES model. Future edits will expand this to be a ground-up tutorial and introduction to the topology and its features.

Quick Start

The Dragonfly+ CODES model is derived largely from the Dragonfly-Custom model, so the general workflow of running a simulation is similar. It can be broken down into four steps:

  1. Generate Topology
  2. Write Configuration File
  3. Write CODES Workload
  4. Run Simulation

1. Generate Topology

Similar to Dragonfly-Custom, the DFP topology is stored in binary files that are read at simulation runtime. There are two files that make up the router topology: inter and intra files.

inter

The inter-file is a binary file storing pairs of integers. Each pair represents a global interconnection between two routers, where each integer is the relative router ID of one endpoint of the edge in the network graph. These integers fall in the range [0,total_routers_in_network).

intra

The intra-file is also a binary file storing pairs of integers, but these integers only range over [0,num_routers_per_group), i.e. the relative local router IDs of the connected routers. This file consists of all local connections within a group. Dragonfly Plus assumes that every group has the same intra-group topology.
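As a rough illustration of the file layout described above, the sketch below writes and reads a file of integer pairs and checks that every ID falls in the expected half-open range. The page does not state the integer width or byte order, so the 32-bit native-endian format used here is an assumption, and the helper names (write_pairs, read_pairs, ids_in_range) are hypothetical and not part of CODES.

```python
import struct

# ASSUMPTION: each router ID is stored as a native-endian 32-bit signed int.
# The wiki page only says "pairs of integers"; verify against the generator script.
PAIR = struct.Struct("=2i")


def write_pairs(path, pairs):
    """Write (src, dst) router ID pairs back to back as raw binary."""
    with open(path, "wb") as f:
        for src, dst in pairs:
            f.write(PAIR.pack(src, dst))


def read_pairs(path):
    """Read the whole file back as a list of (src, dst) tuples."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) % PAIR.size != 0:
        raise ValueError("file size is not a whole number of integer pairs")
    return list(PAIR.iter_unpack(data))


def ids_in_range(pairs, upper):
    """True if every ID lies in [0, upper)."""
    return all(0 <= src < upper and 0 <= dst < upper for src, dst in pairs)
```

For an intra file, upper would be num_routers_per_group; for an inter file, total_routers_in_network.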

Generator Scripts

I have written Python scripts to make generating these topology files simple. Stored in codes/scripts/dragonfly-plus is a Python script called dragonfly-plus-topo-gen-v2.py. Its usage is as follows (note: "pg" is shorthand for "per-group"):

python3 dragonfly-plus-topo-gen-v2.py <num_groups> <num_spine_pg> <num_leaf_pg> <router_radix> <num_terminal_per_leaf> <intra_filename> <inter_filename> --<Loudness>

This script creates two files, with the supplied filenames, for the intra and inter links respectively. Loudness is an optional flag, one of --debug, --extra-loud, --loud, --standard, or --quiet, ranging from full output to stdout of every action the script takes down to zero output.

There is an additional option: --dry-run which will do minimal work to verify input and print out stats on the to-be-generated network. It will not generate the inter or intra connections or create the files. This is to allow for basic sanity checking of your topology before committing to the generation of very large networks.
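For a sense of the kind of sanity check this enables, the sketch below derives basic network sizes from the generator's arguments. It is not the script's actual output or logic. It assumes that a group contains only spine and leaf routers and that terminals attach only to leaf routers, and network_stats is a hypothetical helper name.

```python
def network_stats(num_groups, num_spine_pg, num_leaf_pg, num_terminal_per_leaf):
    """Derive basic sizes from the generator arguments.

    ASSUMPTIONS: a group holds only spine and leaf routers, and
    terminals attach only to leaf routers.
    """
    routers_per_group = num_spine_pg + num_leaf_pg
    return {
        "routers_per_group": routers_per_group,
        "total_routers": num_groups * routers_per_group,
        "total_terminals": num_groups * num_leaf_pg * num_terminal_per_leaf,
    }


if __name__ == "__main__":
    # Example values chosen only for illustration.
    for name, value in network_stats(9, 4, 4, 4).items():
        print(f"{name}: {value}")
```

The routers_per_group and total_routers values are also the upper bounds of the ID ranges used in the intra and inter files.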

There is also an older script (Version 1), which is less full-featured and lacks the software structure needed for the more complex DFP topologies that Version 2 was explicitly designed to support.

2. Write Configuration File

Like other CODES model-net models, Dragonfly+ follows the standard procedure for network configuration. Most settings are identical to those of Dragonfly-Custom; the differences lie in how the topologies themselves are described.

Dragonfly+ Specific .conf Parameters

Dragonfly+ requires that these parameters be defined in the configuration file:

  • num_router_spine="" == The number of spine routers per group
  • num_router_leaf="" == The number of leaf routers per group

Other Dragonfly+ specific parameters exist by design but have no corresponding implementation yet. Everything else follows the Dragonfly-Custom configuration file schema.
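Because the two required parameters should match the spine and leaf counts used when generating the topology, it can help to produce the configuration lines from the same values. The sketch below does that. The key="value"; line form (including the trailing semicolon) is an assumption based on how CODES configuration files are usually written, and dfp_param_lines is a hypothetical helper name.

```python
def dfp_param_lines(num_spine_pg, num_leaf_pg):
    """Render the two required Dragonfly+ parameters as config lines.

    ASSUMPTION: parameters are written as key="value"; lines; check an
    existing .conf file in the repository for the exact form.
    """
    if num_spine_pg < 1 or num_leaf_pg < 1:
        raise ValueError("a group needs at least one spine and one leaf router")
    return [
        f'num_router_spine="{num_spine_pg}";',
        f'num_router_leaf="{num_leaf_pg}";',
    ]


if __name__ == "__main__":
    # Reuse the same per-group counts that were passed to the generator.
    print("\n".join(dfp_param_lines(4, 4)))
```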

Routing Algorithm Selection

  • routing="" == The routing algorithm to be used by the routers

This parameter currently has four implemented options:

  • "minimal" == Always routes packets on a minimal route
  • "non-minimal-spine" == Always routes packets to a spine in an intermediate group*
  • "non-minimal-leaf" == Always routes packets to a leaf in an intermediate group
  • "on-the-fly-adaptive" == Progressive adaptive routing of packets between minimal and non-minimal routes

*Note: The non-minimal-spine routing algorithm is not guaranteed to work for all topologies. Unless every spine router has a connection to every other group, there may be no non-minimal-spine route between two terminals, in which case the simulation will fail. Adaptive routing does not have this failure mode, as it progressively determines where to route the packet based on the available
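The condition in the note on non-minimal-spine routing can be checked before running a simulation. The sketch below takes the global link pairs from an inter file and reports, for each spine router, the groups it has no direct connection to. It assumes that router IDs are contiguous within a group (so group = router ID // routers per group) and that the caller knows which IDs are spines. Neither detail is confirmed by this page, and spines_missing_groups is a hypothetical helper.

```python
from collections import defaultdict


def spines_missing_groups(global_links, spine_ids, routers_per_group, num_groups):
    """Map each spine router to the sorted list of other groups it cannot reach directly.

    ASSUMPTIONS: router IDs are contiguous per group, and each pair in
    global_links is treated as a bidirectional link.
    """
    reached = defaultdict(set)
    for a, b in global_links:
        group_a, group_b = a // routers_per_group, b // routers_per_group
        if group_a != group_b:
            reached[a].add(group_b)
            reached[b].add(group_a)

    missing = {}
    for spine in spine_ids:
        own_group = spine // routers_per_group
        absent = set(range(num_groups)) - {own_group} - reached[spine]
        if absent:
            missing[spine] = sorted(absent)
    return missing
```

An empty result means every spine reaches every other group, which is the case in which the note says non-minimal-spine routing is safe to use.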

3. Write CODES Workload

CODES model-net models generally need a workload of servers to send and receive packets throughout the network. The currently implemented synthetic workload lives in the model-net-synthetic-dfly-plus.c source file in the src/network-workloads directory. It is a direct port from Dragonfly-Custom with only minor differences in configuration parsing.

4. Run Simulation

If you've followed the first three steps, all that is left is to run the simulation.

While in the build directory, executing the command below will start the simulation with the supplied configuration:

mpirun -n 4 ./bin/model-net-synthetic-dfly-plus --synch=3 -- ../src/network-workloads/conf/dragonfly-plus/modelnet-test-dragonfly-plus.conf
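When running several configurations in a row, it can be convenient to assemble this command programmatically. The sketch below only builds the argument list shown above and does not add any options beyond those in the command. build_run_command is a hypothetical helper, and the binary path assumes the current directory is the build directory.

```python
import shlex


def build_run_command(num_ranks, conf_path, synch=3):
    """Assemble the mpirun invocation from the command above as an argv list."""
    return [
        "mpirun", "-n", str(num_ranks),
        "./bin/model-net-synthetic-dfly-plus",
        f"--synch={synch}",
        "--",
        conf_path,
    ]


if __name__ == "__main__":
    cmd = build_run_command(
        4,
        "../src/network-workloads/conf/dragonfly-plus/modelnet-test-dragonfly-plus.conf",
    )
    # Print rather than execute; pass cmd to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```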
