ForeC Timing Analysis Process

Jump to bottom

Eugene edited this page Apr 16, 2024 · 3 revisions

Front-end

Read in compilation information of ELF binary file: core allocations, static thread schedule, core-specific thread hierarchy, abort scopes, shared variables, and instruction and memory latencies
Decompile ELF binary file using objdump, with source lines interleaved with their corresponding ASM
Lexical analysis and tokenisation using Flex
Construction of abstract syntax tree (AST) of ASM program using Bison
- C and ASM comments demarcate functions, threads, and scheduling routines from each other
Construction of control-flow graph (CFG) from AST
- Resolve branch and jump targets to their corresponding basic blocks
- Duplicate and inline functions to simplify the CFG traversal in later stages
- Compute the execution cost of each instruction, ignoring memory accesses

Middle-end

Merge CFG nodes to simplify CFG, which reduces overall analysis time
- Determine whether memory accesses are to local or global memories. Infer from second memory operand being r0 or a load or store follows from the instruction sequence [imm, add*, (sw|lw)*]. Pointers to pointers to memory cannot be resolved properly
- Nodes with local accesses can be merged without loss in precision
- Nodes with global accesses can only be merged with loss in precision. Each global access could be reached from different paths at different execution times that align differently to the shared TDMA bus. Need to assume worst-case bus alignment for the first global access
- Merged nodes accumulate the execution times of its constituents
Merge CFG branches to simplify CFG, which reduces overall analysis time
- Compute max execution time of both branches and assign to merged node
- Exponential decrease in paths to explore

Back-end

Initialise each core's execution time to zero and traverse the CFG according to static thread schedule and ForeC execution semantics (thread fork-join, end of global tick, and abort).
Attribute thread execution times to each core based on thread-to-core allocation
- Execution time of scheduling routines are attributed to participating cores
- When scheduling routines cause cores to synchronise, advance the execution times of the participating cores to the max time among them
Traversal strategies
- No data-flow analysis is performed in any of the strategies, so all branches are explored. All loops are assumed to always pause or are bounded
- Reachability (precise but slow): Symbolic execution of ForeC program. Explicit state space exploration
- Max-plus (more imprecise but faster): Simplify threads into a single node, with and without consideration for multicore execution