Reproducibility
Reproducibility intentions
We intend for WCT to produce "reproducible results". We have a strong and a weak form of our intention.
Strong reproducibility
Multiple executions of the identical code, configuration and input on identical hardware, OS and libraries, should produce bit-wise identical results (with caveats).
Some of the caveats:
- Use of the TbbFlow execution engine may require that no IRandom instance is shared.
- Use of GPU or multi-threading outside of TbbFlow is out of scope for our intention.
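One way to check the strong form is to hash the raw bytes of the output produced by each run and require that all digests agree. The following is a minimal sketch, not part of WCT: the helper names `file_digest` and `bitwise_identical` are hypothetical, and it assumes each run writes its result to a single file. Container formats that embed timestamps or other run-specific metadata may differ even when the payload is identical, so such files may need to be unpacked before comparison.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's raw bytes (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large output files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def bitwise_identical(paths):
    """True if every file in paths has exactly the same byte content."""
    digests = {file_digest(p) for p in paths}
    return len(digests) <= 1
```

Calling `bitwise_identical(["run1.out", "run2.out"])` on the outputs of two runs (file names are placeholders) returns `True` only when the files match byte for byte.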
Weak reproducibility
Multiple executions with identical configuration, input, and versions of WCT and its dependencies must not produce data with statistically significant deviations in comparison metrics, even when WCT is built with different compilers and executed on different types and instances of hardware, with or without GPU and multi-threading.
We recognize that testing for significant deviations is costly, and that a deviation is easily mischaracterized as significant when sampling is insufficient.
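As an illustration of what such a test might look like, the sketch below applies a two-sided permutation test to two samples of a scalar comparison metric, for example one sample per platform. It is an assumption for illustration only: the function name `permutation_p_value`, the choice of the difference of means as the test statistic, and the resampling count are all hypothetical, and WCT does not prescribe this method. The p-value is only as trustworthy as the number of runs behind each sample, which is the sampling cost noted above.

```python
import random
import statistics


def permutation_p_value(a, b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    a and b are sequences of one comparison metric, e.g. one value per run.
    Returns an estimated p-value; small values suggest the samples differ.
    """
    rng = random.Random(seed)  # fixed seed so the test itself is repeatable
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A caller would compare the returned value against a significance threshold chosen in advance; both the threshold and the metric are left to whoever defines the comparison.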
The life cycle of this milestone
This milestone collects issues that must be closed in order to consider WCT as "reproducible" in the strong and weak meanings. We expect this milestone may open and close over time as novel problems arise and are addressed.