Component Developer Documentation

This repository is for developer documentation related to various VELOC / SCR components. It also contains documentation on policies that apply to all of the repositories.

Status

All open issues for the components can be viewed on the Components Project Board.

| Repo | GitHub Actions |
|----------|------------|
| KVTree | Build&Test |
| AXL | Build Only |
| SPath | Build&Test |
| Shuffile | Build Only |
| Redset | Build Only |
| Rankstr | Build&Test |
| ER | Build Only |

Components Diagram

[Figure: components diagram]

Component Descriptions

Basic Data Structures and Algorithms

KVTree: Recursive key-value structure

Documentation:

Each KVTree object contains a list of key/value pairs. Each key is a string, and each value is another kvtree object. This makes it a nested data structure, similar to a Python dict or Perl hash. The library provides functions to serialize a kvtree object to a file and read it back. It also optionally provides MPI send / recv functions to transfer an object from one process to another.
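To make the nested structure concrete, here is a minimal Python sketch of the same idea. It is an analogue only, not the KVTree API: the helper names (`set_path`, `write_tree`, `read_tree`), the example keys, and the use of JSON as the file format are all assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a kvtree: every key is a string and every
# value is another tree (a dict). A leaf is a key whose subtree is empty.
def set_path(tree, *keys):
    """Walk (or create) the nested subtrees named by keys; return the last one."""
    node = tree
    for key in keys:
        node = node.setdefault(str(key), {})
    return node

def write_tree(tree, path):
    """Serialize the tree to a file (JSON is an assumption of this sketch)."""
    with open(path, "w") as f:
        json.dump(tree, f)

def read_tree(path):
    """Read a tree back from a file written by write_tree."""
    with open(path) as f:
        return json.load(f)

tree = {}
set_path(tree, "RANK", 0, "FILE", "ckpt.0.dat")
set_path(tree, "RANK", 1, "FILE", "ckpt.1.dat")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tree.json")
    write_tree(tree, path)
    restored = read_tree(path)
```

Because values are always subtrees, a list of files per rank is just a set of sibling keys under `FILE`, and the whole structure round-trips through a file unchanged.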

spath: represent and manipulate file system paths

Documentation:

An spath object is created from a string. The library includes functions to extract components (such as dirname and basename). It can create an absolute path or compute a relative path from a source path to a destination path. It can also simplify a path (e.g., collapse the repeated slash in ../foo//bar to give ../foo/bar).
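The operations described above have close counterparts in Python's standard `posixpath` module, which can serve as a quick illustration of what each one means. This is not the spath API, and the paths below are made up for the example.

```python
import posixpath

# Hypothetical path with a ".." component and a repeated slash.
path = "/scratch/job42/../job43//ckpt/rank0.dat"

# Simplify: resolve ".." and collapse repeated separators.
simplified = posixpath.normpath(path)

# Extract components.
directory = posixpath.dirname(simplified)
filename = posixpath.basename(simplified)

# Relative path from a source directory to a destination directory.
relative = posixpath.relpath("/scratch/job43/ckpt",
                             start="/scratch/job43/cache/node1")
```

Here `simplified` is `/scratch/job43/ckpt/rank0.dat`, and `relative` is `../../ckpt`.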

rankstr: split processes into groups whose members share the same input string

Rankstr uses a bitonic sort as a scalable way to identify process groups. One use is to create a communicator of the ranks that share a storage device: rank 0 of that communicator can create a directory, and a barrier then tells the other ranks that the directory exists. Rankstr is also used to split processes by failure group; for example, a failure group of NODE splits MPI_COMM_WORLD into subgroups based on hostname.
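The grouping itself can be sketched without MPI. The Python below simulates all ranks in one process and uses an ordinary local sort in place of the distributed bitonic sort, so it shows the result of the split rather than how rankstr computes it. The function name and the hostnames are hypothetical.

```python
from itertools import groupby

def split_by_string(strings):
    """strings[i] is the input string supplied by rank i.

    Returns one (group_id, group_rank, group_size) tuple per rank.
    Ranks with equal strings land in the same group.
    """
    # Sort rank numbers by (string, rank) so equal strings are adjacent.
    order = sorted(range(len(strings)), key=lambda rank: (strings[rank], rank))
    result = [None] * len(strings)
    grouped = groupby(order, key=lambda rank: strings[rank])
    for group_id, (_, members) in enumerate(grouped):
        members = list(members)
        for group_rank, rank in enumerate(members):
            result[rank] = (group_id, group_rank, len(members))
    return result

# Five simulated ranks, each reporting the node it runs on.
hostnames = ["node2", "node1", "node2", "node1", "node3"]
groups = split_by_string(hostnames)
```

In this sketch, the rank with `group_rank == 0` in each group plays the role of the leader that would create the shared directory before the others proceed past a barrier.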

File transfers between cache and parallel file system

AXL: Asynchronous transfer library

Documentation:

AXL transfers a file from one path to another using synchronous or asynchronous methods. Transfers are only supported between storage tiers; AXL does not (yet) support movement within a storage tier (such as between two compute nodes). Asynchronous methods include pthreads, the IBM BB API, and Cray DataWarp. AXL creates any directories needed for destination files.
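As a rough illustration of the asynchronous, thread-based style of transfer, the following Python sketch copies a file in a background thread and creates the destination directory first. The `Transfer` class and its `start` / `wait` methods are hypothetical and are not the AXL API. The "cache" and "pfs" directories are temporary stand-ins for two storage tiers.

```python
import os
import shutil
import tempfile
import threading

class Transfer:
    """Hypothetical handle for one background file copy (not the AXL API)."""

    def __init__(self, src, dst):
        self.src, self.dst = src, dst
        self.error = None
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        try:
            # Create the destination directory before copying.
            os.makedirs(os.path.dirname(self.dst), exist_ok=True)
            shutil.copyfile(self.src, self.dst)
        except OSError as exc:
            self.error = exc

    def start(self):
        self._thread.start()
        return self

    def wait(self):
        """Block until the copy finishes; re-raise any copy error."""
        self._thread.join()
        if self.error is not None:
            raise self.error

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "cache", "ckpt.dat")      # stand-in for node-local cache
dst = os.path.join(workdir, "pfs", "job", "ckpt.dat")  # stand-in for the parallel file system
os.makedirs(os.path.dirname(src))
with open(src, "wb") as f:
    f.write(b"checkpoint bytes")

transfer = Transfer(src, dst).start()
# ... the application could keep computing here ...
transfer.wait()

with open(dst, "rb") as f:
    copied = f.read()
shutil.rmtree(workdir)
```

Separating `start` from `wait` is what lets the caller overlap the copy with other work; a synchronous transfer is the same thing with `wait` called immediately.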

Redundancy Encoding/Decoding and File Migration

Redset: Encode/decode a set of files with a redundancy method

Documentation:

Redset creates the redundancy data needed to protect a set of files. It can later rebuild a file from the redundancy data provided.
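The text above does not say which redundancy methods are used, so the sketch below picks one simple scheme, a single XOR parity block, purely as an example of encoding a set of files and rebuilding one that is lost. The function names and file contents are hypothetical, and files are modeled as in-memory byte strings.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(files):
    """Return one parity block covering all files (zero-padded to equal length)."""
    size = max(len(data) for data in files)
    parity = bytes(size)
    for data in files:
        parity = xor_bytes(parity, data.ljust(size, b"\0"))
    return parity

def rebuild(survivors, parity, length):
    """Recover the single missing file from the surviving files and the parity."""
    missing = parity
    for data in survivors:
        missing = xor_bytes(missing, data.ljust(len(parity), b"\0"))
    return missing[:length]  # drop the zero padding

files = [b"rank0 data", b"rank1 checkpoint", b"rank2"]
parity = encode(files)

# Pretend the second file was lost and rebuild it from the rest.
recovered = rebuild([files[0], files[2]], parity, len(files[1]))
```

A single parity block tolerates the loss of any one file in the set. Note that the original length has to be stored alongside the parity so the padding can be removed.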

Shuffile: Shuffle files between MPI ranks

Documentation:

Files are registered with shuffile, which records the MPI rank that owns each one. During restart, shuffile moves each file back to its 'owning' MPI rank.
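The bookkeeping behind moving files to their owning ranks can be sketched as a comparison of two maps: which rank owns each file, and where each file sits after a restart. The Python below only plans the moves and does not transfer data. `plan_moves` and the file names are hypothetical, not part of shuffile.

```python
def plan_moves(owners, locations):
    """owners: file name -> rank that owns it.
    locations: file name -> rank that currently holds it.

    Returns a sorted list of (file, from_rank, to_rank) for every file
    that is present but not on its owning rank. Files absent from
    locations are skipped; recovering them is a separate step.
    """
    return sorted(
        (name, locations[name], owner)
        for name, owner in owners.items()
        if name in locations and locations[name] != owner
    )

owners = {"ckpt.0.dat": 0, "ckpt.1.dat": 1, "ckpt.2.dat": 2}

# After a restart, suppose ranks 0 and 1 were placed on each other's nodes.
locations = {"ckpt.0.dat": 1, "ckpt.1.dat": 0, "ckpt.2.dat": 2}
moves = plan_moves(owners, locations)
```

In this example two files need to be exchanged between ranks 0 and 1, while `ckpt.2.dat` is already in place.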

ER: Encode + Rebuild

ER provides an abstraction around shuffile and redset. During encode, ER calls redset to apply a redundancy scheme to a set of files, and then it calls shuffile to record which rank owns which files. During rebuild, ER first calls shuffile to move files back to their owning ranks, and then redset is called to rebuild any missing files.