From 1d3a1b2a276a89ab00810177ecca3258a67389dd Mon Sep 17 00:00:00 2001
From: Michael Foster
Date: Mon, 2 Dec 2024 16:00:34 +0000
Subject: [PATCH 01/11] paper draft

---
 paper.md | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)
 create mode 100644 paper.md

diff --git a/paper.md b/paper.md
new file mode 100644
index 00000000..e8f23062
--- /dev/null
+++ b/paper.md
@@ -0,0 +1,167 @@
+---
+title: 'The Causal Testing Framework'
+tags:
+  - Python
+  - causal testing
+  - causal inference
+  - causality
+  - software testing
+  - metamorphic testing
+authors:
+  - name: Michael Foster
+    orcid: 0000-0001-8233-9873
+    affiliation: 1
+    corresponding: true
+  - name: Christopher Wild
+    orcid: 0009-0009-1195-1497
+    affiliation: 1
+  - name: Farhad Allian
+    affiliation: 1
+  - name: Richard Somers
+    orcid: 0009-0009-1195-1497
+    affiliation: 1
+  - name: Neil Walkinshaw
+    orcid: 0000-0003-2134-6548
+    affiliation: 1
+  - name: Nicolas Lattimer
+    affiliation: 1
+affiliations:
+  - name: University of Sheffield, UK
+    index: 1
+date: 2 December 2024
+bibliography: paper.bib
+---
+
+# Summary
+Scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical.
+In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse observational data instead of costly randomised trials.
+Causal Inference works by using domain knowledge to identify and mitigate for biases in the data, enabling researchers to answer causal questions that concern the effect of changing some feature on the observed outcome.
+The Causal Testing Framework is a software testing framework that uses Causal Inference techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets especially for testing.
+
+# Statement of need
+Metamorphic Testing is a popular technique for testing computational models (and other traditionally "hard to test" software).
+Test goals are expressed as _metamorphic relations_ that specify how changing an input in a particular way should affect the software output.
+Nondeterministic software can be tested using statistical metamorphic testing, which uses statistical tests over multiple executions of the software to determine whether the specified metamorphic relations hold.
+However, this requires the software to be executed repeatedly for each set of parameters of interest, so is computationally expensive, and is constrained to testing properties over software inputs that can be directly and precisely controlled.
+Statistical metamorphic testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori.
+
+By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the Causal Testing Framework circumvents both of these problems by enabling models to be tested using pre-existing runtime data.
+The causal testing framework is written in python but is language agnostic in terms of the system under test.
+All that is required is a set of properties to be validated, a causal model, and a set of software runtime data.
+
+# Causal Testing
+Causal Testing has four main steps, outlined in \ref{fig:schematic}.
+Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) in which an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$.
+Secondly, the user supplies a set of causal properties to be tested.
+Such properties can be generated from the causal DAG: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated.
+The user may also refine tests to validate the nature of a particular relationship.
+Next, the user supplies a set of runtime data in the form of a table with each column representing a variable and rows containing the value of each variable for a particular run of the software.
+Finally, the Causal Testing Framework automatically validates the supplied causal properties by using the supplied causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship.
+
+![Causal Testing workflow.\label{fig:schematic}](images/schematic.png)
+
+## Test Adequacy
+Because the properties being tested are completely separate from the data used to validate them, traditional coverage-based metrics are not appropriate here.
+The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data.
+
+## Missing Variables
+Causal Testing works by using the supplied causal DAG to identify those variables which need to be statistically controlled for to remove their biassing effect on the causal estimate.
+This typically means we need to know their values.
+However, the Causal Testing Framework can still sometimes estimate unbiased causal effects using Instrumental Variables, an advanced Causal Inference technique.
+
+## Feedback
+Many scientific models involve iterating several interacting processes over time.
+These processes often feed into each other, and can create feedback cycles.
+Traditional Causal Inference cannot handle this, however the Causal Testing Framework uses another advanced Causal Inference technique, g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables.
+
+# Citations
+
+Citations to entries in paper.bib should be in
+[rMarkdown](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html)
+format.
+
+If you want to cite a software repository URL (e.g. something on GitHub without a preferred
+citation) then you can do it with the example BibTeX entry below for @fidgit.
+
+For a quick reference, the following citation commands can be used:
+- `@author:2001` -> "Author et al. (2001)"
+- `[@author:2001]` -> "(Author et al., 2001)"
+- `[@author1:2001; @author2:2001]` -> "(Author1 et al., 2001; Author2 et al., 2002)"
+
+# Figures
+
+Figures can be included like this:
+![Caption for example figure.\label{fig:example}](figure.png)
+and referenced from text using \autoref{fig:example}.
+
+Figure sizes can be customized by adding an optional second parameter:
+![Caption for example figure.](figure.png){ width=20% }
+
+# Acknowledgements
+
+We acknowledge contributions from Brigitta Sipocz, Syrtis Major, and Semyeong
+Oh, and support from Kathryn Johnston during the genesis of this project.
+
+# References
+
+Example paper.bib file:
+
+@article{Pearson:2017,
+  url = {http://adsabs.harvard.edu/abs/2017arXiv170304627P},
+  Archiveprefix = {arXiv},
+  Author = {{Pearson}, S. and {Price-Whelan}, A.~M. and {Johnston}, K.~V.},
+  Eprint = {1703.04627},
+  Journal = {ArXiv e-prints},
+  Keywords = {Astrophysics - Astrophysics of Galaxies},
+  Month = mar,
+  Title = {{Gaps in Globular Cluster Streams: Pal 5 and the Galactic Bar}},
+  Year = 2017
+}
+
+@book{Binney:2008,
+  url = {http://adsabs.harvard.edu/abs/2008gady.book.....B},
+  Author = {{Binney}, J. and {Tremaine}, S.},
+  Booktitle = {Galactic Dynamics: Second Edition, by James Binney and Scott Tremaine.~ISBN 978-0-691-13026-2 (HB).~Published by Princeton University Press, Princeton, NJ USA, 2008.},
+  Publisher = {Princeton University Press},
+  Title = {{Galactic Dynamics: Second Edition}},
+  Year = 2008
+}
+
+@article{gaia,
+  author = {{Gaia Collaboration}},
+  title = "{The Gaia mission}",
+  journal = {Astronomy and Astrophysics},
+  archivePrefix = "arXiv",
+  eprint = {1609.04153},
+  primaryClass = "astro-ph.IM",
+  keywords = {space vehicles: instruments, Galaxy: structure, astrometry, parallaxes, proper motions, telescopes},
+  year = 2016,
+  month = nov,
+  volume = 595,
+  doi = {10.1051/0004-6361/201629272},
+  url = {http://adsabs.harvard.edu/abs/2016A%26A...595A...1G},
+}
+
+@article{astropy,
+  author = {{Astropy Collaboration}},
+  title = "{Astropy: A community Python package for astronomy}",
+  journal = {Astronomy and Astrophysics},
+  archivePrefix = "arXiv",
+  eprint = {1307.6212},
+  primaryClass = "astro-ph.IM",
+  keywords = {methods: data analysis, methods: miscellaneous, virtual observatory tools},
+  year = 2013,
+  month = oct,
+  volume = 558,
+  doi = {10.1051/0004-6361/201322068},
+  url = {http://adsabs.harvard.edu/abs/2013A%26A...558A..33A}
+}
+
+@misc{fidgit,
+  author = {A. M. Smith and K. Thaney and M. Hahnel},
+  title = {Fidgit: An ungodly union of GitHub and Figshare},
+  year = {2020},
+  publisher = {GitHub},
+  journal = {GitHub repository},
+  url = {https://github.com/arfon/fidgit}
+}

From f726d6db27137bec538a69a22d547c9353ba8083 Mon Sep 17 00:00:00 2001
From: Michael Foster
Date: Tue, 3 Dec 2024 11:28:11 +0000
Subject: [PATCH 02/11] added citations

---
 paper.bib |  61 ++++++++++++++++++++++++++++
 paper.md  | 118 +++++++++--------------------------------------------
 2 files changed, 80 insertions(+), 99 deletions(-)
 create mode 100644 paper.bib

diff --git a/paper.bib b/paper.bib
new file mode 100644
index 00000000..c2bbf409
--- /dev/null
+++ b/paper.bib
@@ -0,0 +1,61 @@
+@techreport{chen1998metamorphic,
+  author = {Chen, Tsong Y. and Cheung, Shing C. and Yiu, Shiu Ming},
+  institution = { The Hong Kong University of Science and Technology},
+  number = {HKUST-CS98-01},
+  title = {Metamorphic testing: A new approach for generating next test cases},
+  year = {1998}
+}
+
+@inproceedings{clark2023metamorphic,
+  author = {Clark, Andrew G. and Foster, Michael and Walkinshaw, Neil and Hierons, Robert M.},
+  booktitle = {2023 IEEE Conference on Software Testing, Verification and Validation (ICST)},
+  doi = {10.1109/ICST57152.2023.00023},
+  keywords = {Software testing;Java;Graphical models;Computer bugs;Software;Test pattern generators;Usability;Metamorphic testing;Causality;DAGs},
+  number = {},
+  pages = {153-164},
+  title = {Metamorphic Testing with Causal Graphs},
+  volume = {},
+  year = {2023}
+}
+
+@article{clark2023testing,
+  address = {New York, NY, USA},
+  articleno = {10},
+  author = {Clark, Andrew G. and Foster, Michael and Prifling, Benedikt and Walkinshaw, Neil and Hierons, Robert M. and Schmidt, Volker and Turner, Robert D.},
+  doi = {10.1145/3607184},
+  issn = {1049-331X},
+  issue_date = {January 2024},
+  journal = {ACM Trans. Softw. Eng. Methodol.},
+  keywords = {causal testing, causal inference, Software testing},
+  month = {nov},
+  number = {1},
+  numpages = {42},
+  publisher = {Association for Computing Machinery},
+  title = {Testing Causality in Scientific Modelling Software},
+  volume = {33},
+  year = {2023}
+}
+
+@inproceedings{foster2024adequacy,
+  author = {Foster, Michael and Wild, Christopher and Hierons, Robert M. and Walkinshaw, Neil},
+  booktitle = {2024 IEEE Conference on Software Testing, Verification and Validation (ICST)},
+  doi = {10.1109/ICST60714.2024.00023},
+  keywords = {Measurement;Software testing;Correlation;Systematics;Computational modeling;Software systems;Kurtosis;software testing;causal inference;test adequacy},
+  number = {},
+  pages = {161-172},
+  title = {Causal Test Adequacy},
+  volume = {},
+  year = {2024}
+}
+
+@inproceedings{guderlei2007smt,
+  author = {Guderlei, Ralph and Mayer, Johannes},
+  booktitle = {Seventh International Conference on Quality Software (QSIC 2007)},
+  doi = {10.1109/QSIC.2007.4385527},
+  keywords = {Software testing;Statistical analysis;Random variables;Investments;Context modeling;Collaborative software;Software quality;Statistical distributions;Error correction;Probability},
+  number = {},
+  pages = {404-409},
+  title = {Statistical Metamorphic Testing Testing Programs with Random Output by Means of Statistical Hypothesis Tests and Metamorphic Testing},
+  volume = {},
+  year = {2007}
+}

diff --git a/paper.md b/paper.md
index e8f23062..26324475 100644
--- a/paper.md
+++ b/paper.md
@@ -12,6 +12,9 @@ authors:
     orcid: 0000-0001-8233-9873
     affiliation: 1
     corresponding: true
+  - name: Andrew Clark
+    orcid: 0000-0002-6830-0566
+    affiliation: 2
   - name: Christopher Wild
     orcid: 0009-0009-1195-1497
     affiliation: 1
@@ -20,14 +23,20 @@ authors:
   - name: Richard Somers
     orcid: 0009-0009-1195-1497
     affiliation: 1
+  - name: Nicholas Lattimer
+    orcid: 0000-0001-5304-5585
+    affiliation: 1
   - name: Neil Walkinshaw
     orcid: 0000-0003-2134-6548
     affiliation: 1
-  - name: Nicolas Lattimer
+  - name: Rob Hierons
+    orcid: 0000-0003-2134-6548
     affiliation: 1
 affiliations:
   - name: University of Sheffield, UK
     index: 1
+  - name: Wherever Andy works now, UK
+    index: 2
 date: 2 December 2024
 bibliography: paper.bib
 ---
@@ -39,21 +48,21 @@ Causal Inference works by using domain knowledge to identify and mitigate for bi
 The Causal Testing Framework is a software testing framework that uses Causal Inference techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets especially for testing.
 
 # Statement of need
-Metamorphic Testing is a popular technique for testing computational models (and other traditionally "hard to test" software).
+Metamorphic Testing @[chen1998metamorphic] is a popular technique for testing computational models (and other traditionally "hard to test" software).
 Test goals are expressed as _metamorphic relations_ that specify how changing an input in a particular way should affect the software output.
-Nondeterministic software can be tested using statistical metamorphic testing, which uses statistical tests over multiple executions of the software to determine whether the specified metamorphic relations hold.
+Nondeterministic software can be tested using Statistical Metamorphic Testing @[guderlei2007smt], which uses statistical tests over multiple executions of the software to determine whether the specified metamorphic relations hold.
 However, this requires the software to be executed repeatedly for each set of parameters of interest, so is computationally expensive, and is constrained to testing properties over software inputs that can be directly and precisely controlled.
-Statistical metamorphic testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori.
+Statistical Metamorphic Testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori.
 
 By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the Causal Testing Framework circumvents both of these problems by enabling models to be tested using pre-existing runtime data.
-The causal testing framework is written in python but is language agnostic in terms of the system under test.
+The Causal Testing Framework is written in python but is language agnostic in terms of the system under test.
 All that is required is a set of properties to be validated, a causal model, and a set of software runtime data.
 
 # Causal Testing
-Causal Testing has four main steps, outlined in \ref{fig:schematic}.
+Causal Testing @[clark2023testing] has four main steps, outlined in \ref{fig:schematic}.
 Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) in which an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$.
 Secondly, the user supplies a set of causal properties to be tested.
-Such properties can be generated from the causal DAG: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated.
+Such properties can be generated from the causal DAG @[clark2023metamorphic]: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated.
 The user may also refine tests to validate the nature of a particular relationship.
 Next, the user supplies a set of runtime data in the form of a table with each column representing a variable and rows containing the value of each variable for a particular run of the software.
 Finally, the Causal Testing Framework automatically validates the supplied causal properties by using the supplied causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship.
@@ -62,106 +71,17 @@ Finally, the Causal Testing Framework automatically validates the supplied causa
 
 ## Test Adequacy
 Because the properties being tested are completely separate from the data used to validate them, traditional coverage-based metrics are not appropriate here.
-The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data.
+The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric @[foster2024adequacy] based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data.
 
 ## Missing Variables
 Causal Testing works by using the supplied causal DAG to identify those variables which need to be statistically controlled for to remove their biassing effect on the causal estimate.
 This typically means we need to know their values.
 However, the Causal Testing Framework can still sometimes estimate unbiased causal effects using Instrumental Variables, an advanced Causal Inference technique.
 
-## Feedback
+## Feedback Over Time
 Many scientific models involve iterating several interacting processes over time.
 These processes often feed into each other, and can create feedback cycles.
 Traditional Causal Inference cannot handle this, however the Causal Testing Framework uses another advanced Causal Inference technique, g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables.
 
-# Citations
-
-Citations to entries in paper.bib should be in
-[rMarkdown](http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html)
-format.
-
-If you want to cite a software repository URL (e.g. something on GitHub without a preferred
-citation) then you can do it with the example BibTeX entry below for @fidgit.
-
-For a quick reference, the following citation commands can be used:
-- `@author:2001` -> "Author et al. (2001)"
-- `[@author:2001]` -> "(Author et al., 2001)"
-- `[@author1:2001; @author2:2001]` -> "(Author1 et al., 2001; Author2 et al., 2002)"
-
-# Figures
-
-Figures can be included like this:
-![Caption for example figure.\label{fig:example}](figure.png)
-and referenced from text using \autoref{fig:example}.
-
-Figure sizes can be customized by adding an optional second parameter:
-![Caption for example figure.](figure.png){ width=20% }
-
 # Acknowledgements
-
-We acknowledge contributions from Brigitta Sipocz, Syrtis Major, and Semyeong
-Oh, and support from Kathryn Johnston during the genesis of this project.
-
-# References
-
-Example paper.bib file:
-
-@article{Pearson:2017,
-  url = {http://adsabs.harvard.edu/abs/2017arXiv170304627P},
-  Archiveprefix = {arXiv},
-  Author = {{Pearson}, S. and {Price-Whelan}, A.~M. and {Johnston}, K.~V.},
-  Eprint = {1703.04627},
-  Journal = {ArXiv e-prints},
-  Keywords = {Astrophysics - Astrophysics of Galaxies},
-  Month = mar,
-  Title = {{Gaps in Globular Cluster Streams: Pal 5 and the Galactic Bar}},
-  Year = 2017
-}
-
-@book{Binney:2008,
-  url = {http://adsabs.harvard.edu/abs/2008gady.book.....B},
-  Author = {{Binney}, J. and {Tremaine}, S.},
-  Booktitle = {Galactic Dynamics: Second Edition, by James Binney and Scott Tremaine.~ISBN 978-0-691-13026-2 (HB).~Published by Princeton University Press, Princeton, NJ USA, 2008.},
-  Publisher = {Princeton University Press},
-  Title = {{Galactic Dynamics: Second Edition}},
-  Year = 2008
-}
-
-@article{gaia,
-  author = {{Gaia Collaboration}},
-  title = "{The Gaia mission}",
-  journal = {Astronomy and Astrophysics},
-  archivePrefix = "arXiv",
-  eprint = {1609.04153},
-  primaryClass = "astro-ph.IM",
-  keywords = {space vehicles: instruments, Galaxy: structure, astrometry, parallaxes, proper motions, telescopes},
-  year = 2016,
-  month = nov,
-  volume = 595,
-  doi = {10.1051/0004-6361/201629272},
-  url = {http://adsabs.harvard.edu/abs/2016A%26A...595A...1G},
-}
-
-@article{astropy,
-  author = {{Astropy Collaboration}},
-  title = "{Astropy: A community Python package for astronomy}",
-  journal = {Astronomy and Astrophysics},
-  archivePrefix = "arXiv",
-  eprint = {1307.6212},
-  primaryClass = "astro-ph.IM",
-  keywords = {methods: data analysis, methods: miscellaneous, virtual observatory tools},
-  year = 2013,
-  month = oct,
-  volume = 558,
-  doi = {10.1051/0004-6361/201322068},
-  url = {http://adsabs.harvard.edu/abs/2013A%26A...558A..33A}
-}
-
-@misc{fidgit,
-  author = {A. M. Smith and K. Thaney and M. Hahnel},
-  title = {Fidgit: An ungodly union of GitHub and Figshare},
-  year = {2020},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  url = {https://github.com/arfon/fidgit}
-}
+This work was supported by the EPSRC CITCoM grant EP/T030526/1.
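The DAG-to-test mapping that the draft above describes (one causal-effect test per edge, one independence test per missing edge) can be sketched in a few lines of Python. This is an illustrative sketch only, not the Causal Testing Framework's actual API: the `generate_test_obligations` helper and the toy `width`/`height`/`volume` variables are invented for the example.

```python
# Hypothetical sketch of the test-generation step described in the paper text:
# each edge X -> Y in the causal DAG yields a test for the presence of a causal
# effect, and each missing edge yields a test for independence.
import itertools


def generate_test_obligations(edges):
    """Turn a causal DAG, given as a set of (cause, effect) pairs, into test obligations."""
    nodes = sorted({node for edge in edges for node in edge})
    # One "causal effect present" test per edge of the DAG.
    obligations = [(x, y, "some_effect") for (x, y) in sorted(edges)]
    # One independence test per missing ordered pair (an assumed non-effect).
    obligations += [
        (x, y, "no_effect")
        for (x, y) in itertools.permutations(nodes, 2)
        if (x, y) not in edges
    ]
    return obligations


# Invented toy DAG: width and height each directly cause volume.
dag = {("width", "volume"), ("height", "volume")}
for obligation in generate_test_obligations(dag):
    print(obligation)
```

In the framework itself, each such obligation would then be checked against the runtime data by computing a causal effect estimate and comparing it with the expected relationship.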
From 71503ea33f536cec20777fbdfdd205563a1a3ba2 Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 11:37:35 +0000 Subject: [PATCH 03/11] added bob as an author --- paper.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/paper.md b/paper.md index 26324475..ba89ea82 100644 --- a/paper.md +++ b/paper.md @@ -14,11 +14,14 @@ authors: corresponding: true - name: Andrew Clark orcid: 0000-0002-6830-0566 - affiliation: 2 + affiliation: 1 - name: Christopher Wild orcid: 0009-0009-1195-1497 affiliation: 1 - name: Farhad Allian + orcid: 0000-0002-4569-0370 + affiliation: 1 + - name: Robert Turner affiliation: 1 - name: Richard Somers orcid: 0009-0009-1195-1497 @@ -35,8 +38,6 @@ authors: affiliations: - name: University of Sheffield, UK index: 1 - - name: Wherever Andy works now, UK - index: 2 date: 2 December 2024 bibliography: paper.bib --- From c07d17558f8f60a4415d68f4af5b3aa153e219b4 Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 12:57:32 +0000 Subject: [PATCH 04/11] Added Bob's ORCID --- paper.md | 1 + 1 file changed, 1 insertion(+) diff --git a/paper.md b/paper.md index ba89ea82..2ac1f1fb 100644 --- a/paper.md +++ b/paper.md @@ -22,6 +22,7 @@ authors: orcid: 0000-0002-4569-0370 affiliation: 1 - name: Robert Turner + orcid: 0000-0002-1353-1404 affiliation: 1 - name: Richard Somers orcid: 0009-0009-1195-1497 From 942aa9008be36ef1deaf29193f720962af20e649 Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 13:14:34 +0000 Subject: [PATCH 05/11] added compilation workflow --- .github/workflows/joss.yaml | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) create mode 100644 .github/workflows/joss.yaml diff --git a/.github/workflows/joss.yaml b/.github/workflows/joss.yaml new file mode 100644 index 00000000..02e0d02a --- /dev/null +++ b/.github/workflows/joss.yaml @@ -0,0 +1,24 @@ +name: JOSS article compilation +on: [push] + +jobs: + paper: + runs-on: ubuntu-latest + name: Paper Draft + 
steps: + - name: Checkout + uses: actions/checkout@v4 + - name: Build draft PDF + uses: openjournals/openjournals-draft-action@master + with: + journal: joss + # This should be the path to the paper within your repo. + paper-path: paper.md + - name: Upload + uses: actions/upload-artifact@v4 + with: + name: paper + # This is the output path where Pandoc will write the compiled + # PDF. Note, this should be the same directory as the input + # paper.md + path: paper.pdf From 772d5c916fe013250a930b0810ac1b5828fbc2a4 Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 13:20:08 +0000 Subject: [PATCH 06/11] Moved paper into a directory --- .github/workflows/joss.yaml | 11 ++++++++--- paper.bib => paper/paper.bib | 0 paper.md => paper/paper.md | 0 3 files changed, 8 insertions(+), 3 deletions(-) rename paper.bib => paper/paper.bib (100%) rename paper.md => paper/paper.md (100%) diff --git a/.github/workflows/joss.yaml b/.github/workflows/joss.yaml index 02e0d02a..81065606 100644 --- a/.github/workflows/joss.yaml +++ b/.github/workflows/joss.yaml @@ -1,5 +1,10 @@ name: JOSS article compilation -on: [push] +on: + push: + paths: + - paper/** + - images/schematic.png + - .github/workflows/draft-pdf.yml jobs: paper: @@ -13,7 +18,7 @@ jobs: with: journal: joss # This should be the path to the paper within your repo. - paper-path: paper.md + paper-path: paper/paper.md - name: Upload uses: actions/upload-artifact@v4 with: @@ -21,4 +26,4 @@ jobs: # This is the output path where Pandoc will write the compiled # PDF. 
Note, this should be the same directory as the input # paper.md - path: paper.pdf + path: paper/paper.pdf diff --git a/paper.bib b/paper/paper.bib similarity index 100% rename from paper.bib rename to paper/paper.bib diff --git a/paper.md b/paper/paper.md similarity index 100% rename from paper.md rename to paper/paper.md From 85a1da313efa2ab3bfe0ff5163f2df4ae630a8c8 Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 13:31:32 +0000 Subject: [PATCH 07/11] Fixed citations --- paper/paper.md | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/paper/paper.md b/paper/paper.md index 2ac1f1fb..d1d89736 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -50,9 +50,9 @@ Causal Inference works by using domain knowledge to identify and mitigate for bi The Causal Testing Framework is a software testing framework that uses Causal Inference techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets especially for testing. # Statement of need -Metamorphic Testing @[chen1998metamorphic] is a popular technique for testing computational models (and other traditionally "hard to test" software). +Metamorphic Testing [@chen1998metamorphic] is a popular technique for testing computational models (and other traditionally "hard to test" software). Test goals are expressed as _metamorphic relations_ that specify how changing an input in a particular way should affect the software output. -Nondeterministic software can be tested using Statistical Metamorphic Testing @[guderlei2007smt], which uses statistical tests over multiple executions of the software to determine whether the specified metamorphic relations hold. +Nondeterministic software can be tested using Statistical Metamorphic Testing [@guderlei2007smt], which uses statistical tests over multiple executions of the software to determine whether the specified metamorphic relations hold. 
However, this requires the software to be executed repeatedly for each set of parameters of interest, so is computationally expensive, and is constrained to testing properties over software inputs that can be directly and precisely controlled. Statistical Metamorphic Testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori. @@ -61,19 +61,19 @@ The Causal Testing Framework is written in python but is language agnostic in te All that is required is a set of properties to be validated, a causal model, and a set of software runtime data. # Causal Testing -Causal Testing @[clark2023testing] has four main steps, outlined in \ref{fig:schematic}. +Causal Testing [@clark2023testing] has four main steps, outlined in \ref{fig:schematic}. Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) in which an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$. Secondly, the user supplies a set of causal properties to be tested. -Such properties can be generated from the causal DAG @[clark2023metamorphic]: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated. +Such properties can be generated from the causal DAG [@clark2023metamorphic]: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated. The user may also refine tests to validate the nature of a particular relationship. Next, the user supplies a set of runtime data in the form of a table with each column representing a variable and rows containing the value of each variable for a particular run of the software. 
Finally, the Causal Testing Framework automatically validates the supplied causal properties by using the supplied causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship. -![Causal Testing workflow.\label{fig:schematic}](images/schematic.png) +![Causal Testing workflow.\label{fig:schematic}](../images/schematic.png) ## Test Adequacy Because the properties being tested are completely separate from the data used to validate them, traditional coverage-based metrics are not appropriate here. -The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric @[foster2024adequacy] based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data. +The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric [@foster2024adequacy] based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data. ## Missing Variables Causal Testing works by using the supplied causal DAG to identify those variables which need to be statistically controlled for to remove their biassing effect on the causal estimate. @@ -87,3 +87,5 @@ Traditional Causal Inference cannot handle this, however the Causal Testing Fram # Acknowledgements This work was supported by the EPSRC CITCoM grant EP/T030526/1. 
+ +# References From be44f5d68c6e359ba828251bcdd10e1abe41783d Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 13:57:01 +0000 Subject: [PATCH 08/11] Added related work and ongoing/future work sections --- paper/paper.bib | 43 +++++++++++++++++++++++++++++++++++++++++++ paper/paper.md | 12 ++++++++++++ 2 files changed, 55 insertions(+) diff --git a/paper/paper.bib b/paper/paper.bib index c2bbf409..94747aa4 100644 --- a/paper/paper.bib +++ b/paper/paper.bib @@ -1,3 +1,14 @@ +@article{blobaum2024dowhy, + author = {Patrick Bl{{\"o}}baum and Peter G{{\"o}}tz and Kailash Budhathoki and Atalanti A. Mastakouri and Dominik Janzing}, + journal = {Journal of Machine Learning Research}, + number = {147}, + pages = {1--7}, + title = {DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models}, + url = {http://jmlr.org/papers/v25/22-1258.html}, + volume = {25}, + year = {2024} +} + @techreport{chen1998metamorphic, author = {Chen, Tsong Y. and Cheung, Shing C. and Yiu, Shiu Ming}, institution = { The Hong Kong University of Science and Technology}, @@ -59,3 +70,35 @@ @inproceedings{guderlei2007smt volume = {}, year = {2007} } + +@misc{sharma2020dowhy, + archiveprefix = {arXiv}, + author = {Amit Sharma and Emre Kiciman}, + eprint = {2011.04216}, + primaryclass = {stat.ME}, + title = {DoWhy: An End-to-End Library for Causal Inference}, + url = {https://arxiv.org/abs/2011.04216}, + year = {2020} +} + +@article{somers2024configuration, + doi = {10.2139/ssrn.4732706}, + author = {Somers, Richard and Walkinshaw, Neil and Hierons, Robert and Elliott, Jackie and Iqbal, Ahmed and Walkinshaw, Emma}, + publisher = {Elsevier BV}, + title = {Configuration Testing of an Artificial Pancreas System Using a Digital Twin}, + url = {http://dx.doi.org/10.2139/ssrn.4732706}, + year = {2024} +} + +@article{textor2017dagitty, + doi = {10.1093/ije/dyw341}, + issn = {1464-3685}, + author = {Textor, Johannes and van der Zander, Benito and Gilthorpe, Mark S. 
and Liśkiewicz, Maciej and Ellison, George T.H.}, + journal = {International Journal of Epidemiology}, + month = {jan}, + pages = {dyw341}, + publisher = {Oxford University Press (OUP)}, + title = {Robust causal inference using directed acyclic graphs: the R package ‘dagitty’}, + url = {http://dx.doi.org/10.1093/ije/dyw341}, + year = {2017} +} diff --git a/paper/paper.md b/paper/paper.md index d1d89736..3c90b30b 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -85,6 +85,18 @@ Many scientific models involve iterating several interacting processes over time These processes often feed into each other, and can create feedback cycles. Traditional Causal Inference cannot handle this, however the Causal Testing Framework uses another advanced Causal Inference technique, g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables. +# Related Work +The Dagitty tool [@textor2017dagitty] is a browser-based environment for creating, editing, and analysing causal graphs. +There is an R package for local use, but the tool does not aim to facilitate causal inference. +For this, the doWhy [@sharma2020dowhy; @blobaum2024dowhy] is a python package which can be used to estimate causal effects from data. +However, the package is intended for general causal inference. +It does not explicitly support causal testing, nor does it support temporal feedback loops. + +# Ongoing and Future Research +The Causal Testing Framework is the subject of several publications [@clark2023metamorphic; @clark2023testing; @foster2024adequacy; @somers2024configuration]. +We are also in the process of preparing scientific publications concerning how the Causal Testing Framework handles missing variables and feedback over time. +Furthermore, we are working to develop a plug-in for the [DAFNI framework](https://www.dafni.ac.uk/) to enable national-scale infrastructure models to be easily tested. 
+ # Acknowledgements This work was supported by the EPSRC CITCoM grant EP/T030526/1. From 7c9e05f4dd950f1a7307cd3aa60001d98b8df753 Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Tue, 3 Dec 2024 14:04:47 +0000 Subject: [PATCH 09/11] expanded tosem --- paper/paper.bib | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/paper/paper.bib b/paper/paper.bib index 94747aa4..7bed499f 100644 --- a/paper/paper.bib +++ b/paper/paper.bib @@ -4,7 +4,6 @@ @article{blobaum2024dowhy number = {147}, pages = {1--7}, title = {DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models}, - url = {http://jmlr.org/papers/v25/22-1258.html}, volume = {25}, year = {2024} } @@ -36,8 +35,7 @@ @article{clark2023testing doi = {10.1145/3607184}, issn = {1049-331X}, issue_date = {January 2024}, - journal = {ACM Trans. Softw. Eng. Methodol.}, - keywords = {causal testing, causal inference, Software testing}, + journal = {ACM Transactions on Software Engineering and Methodology}, month = {nov}, number = {1}, numpages = {42}, @@ -51,7 +49,6 @@ @inproceedings{foster2024adequacy author = {Foster, Michael and Wild, Christopher and Hierons, Robert M.
and Walkinshaw, Neil}, booktitle = {2024 IEEE Conference on Software Testing, Verification and Validation (ICST)}, doi = {10.1109/ICST60714.2024.00023}, - keywords = {Measurement;Software testing;Correlation;Systematics;Computational modeling;Software systems;Kurtosis;software testing;causal inference;test adequacy}, number = {}, pages = {161-172}, title = {Causal Test Adequacy}, @@ -63,7 +60,6 @@ @inproceedings{guderlei2007smt author = {Guderlei, Ralph and Mayer, Johannes}, booktitle = {Seventh International Conference on Quality Software (QSIC 2007)}, doi = {10.1109/QSIC.2007.4385527}, - keywords = {Software testing;Statistical analysis;Random variables;Investments;Context modeling;Collaborative software;Software quality;Statistical distributions;Error correction;Probability}, number = {}, pages = {404-409}, title = {Statistical Metamorphic Testing Testing Programs with Random Output by Means of Statistical Hypothesis Tests and Metamorphic Testing}, @@ -86,7 +82,6 @@ @article{somers2024configuration author = {Somers, Richard and Walkinshaw, Neil and Hierons, Robert and Elliott, Jackie and Iqbal, Ahmed and Walkinshaw, Emma}, publisher = {Elsevier BV}, title = {Configuration Testing of an Artificial Pancreas System Using a Digital Twin}, - url = {http://dx.doi.org/10.2139/ssrn.4732706}, year = {2024} } @@ -99,6 +94,5 @@ @article{textor2017dagitty pages = {dyw341}, publisher = {Oxford University Press (OUP)}, title = {Robust causal inference using directed acyclic graphs: the R package ‘dagitty’}, - url = {http://dx.doi.org/10.1093/ije/dyw341}, year = {2017} } From 25d6914c3d957890799a3a432349effd69cbc30f Mon Sep 17 00:00:00 2001 From: Michael Foster Date: Thu, 5 Dec 2024 14:50:54 +0000 Subject: [PATCH 10/11] Integrated feedback from @f-alian and @AndrewC19 --- paper/paper.md | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/paper/paper.md b/paper/paper.md index 3c90b30b..27494c3b 100644 --- a/paper/paper.md +++ 
b/paper/paper.md @@ -45,9 +45,9 @@ bibliography: paper.bib # Summary Scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical. -In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse observational data instead of costly randomised trials. -Causal Inference works by using domain knowledge to identify and mitigate for biases in the data, enabling them to answer causal questions that concern the effect of changing some feature on the observed outcome. -The Causal Testing Framework is a software testing framework that uses Causal Inference techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets especially for testing. +In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference (CI) has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse observational data instead of costly randomised trials. +CI works by using domain knowledge to identify and mitigate biases in the data, enabling researchers to answer causal questions that concern the effect of changing some feature on the observed outcome. +The Causal Testing Framework (CTF) is a software testing framework that uses CI techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets especially for testing.
# Statement of need Metamorphic Testing [@chen1998metamorphic] is a popular technique for testing computational models (and other traditionally "hard to test" software). @@ -56,46 +56,46 @@ Nondeterministic software can be tested using Statistical Metamorphic Testing [@ However, this requires the software to be executed repeatedly for each set of parameters of interest, so is computationally expensive, and is constrained to testing properties over software inputs that can be directly and precisely controlled. Statistical Metamorphic Testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori. -By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the Causal Testing Framework circumvents both of these problems by enabling models to be tested using pre-existing runtime data. -The Causal Testing Framework is written in python but is language agnostic in terms of the system under test. +By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the CTF circumvents both of these problems by enabling models to be tested using pre-existing runtime data. +The CTF is written in Python but is language agnostic in terms of the system under test. All that is required is a set of properties to be validated, a causal model, and a set of software runtime data. # Causal Testing Causal Testing [@clark2023testing] has four main steps, outlined in \ref{fig:schematic}. -Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) in which an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$. 
+Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) where an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$. Secondly, the user supplies a set of causal properties to be tested. Such properties can be generated from the causal DAG [@clark2023metamorphic]: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated. The user may also refine tests to validate the nature of a particular relationship. Next, the user supplies a set of runtime data in the form of a table with each column representing a variable and rows containing the value of each variable for a particular run of the software. -Finally, the Causal Testing Framework automatically validates the supplied causal properties by using the supplied causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship. +Finally, the CTF automatically validates the causal properties by using the causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship. ![Causal Testing workflow.\label{fig:schematic}](../images/schematic.png) ## Test Adequacy Because the properties being tested are completely separate from the data used to validate them, traditional coverage-based metrics are not appropriate here. -The Causal Testing Framework instead evaluates the adequacy of a particular dataset by calculating a statistical metric [@foster2024adequacy] based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data. +The CTF instead evaluates the adequacy of a particular dataset by calculating a statistical metric [@foster2024adequacy] based on the stability of the causal effect estimate, with numbers closer to zero representing more adequate data. 
## Missing Variables -Causal Testing works by using the supplied causal DAG to identify those variables which need to be statistically controlled for to remove their biassing effect on the causal estimate. +Causal Testing works by using the causal DAG to identify the variables that need to be statistically controlled for to remove their biasing effect on the causal estimate. This typically means we need to know their values. -However, the Causal Testing Framework can still sometimes estimate unbiased causal effects using Instrumental Variables, an advanced Causal Inference technique. +However, where such biasing variables are not recorded in the data, the Causal Testing Framework can still sometimes estimate unbiased causal effects by using Instrumental Variables, an advanced Causal Inference technique. ## Feedback Over Time Many scientific models involve iterating several interacting processes over time. These processes often feed into each other, and can create feedback cycles. -Traditional Causal Inference cannot handle this, however the Causal Testing Framework uses another advanced Causal Inference technique, g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables. +Traditional CI cannot handle this; however, the CTF uses a family of advanced CI techniques, called g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables. # Related Work The Dagitty tool [@textor2017dagitty] is a browser-based environment for creating, editing, and analysing causal graphs. -There is an R package for local use, but the tool does not aim to facilitate causal inference. -For this, the doWhy [@sharma2020dowhy; @blobaum2024dowhy] is a python package which can be used to estimate causal effects from data. -However, the package is intended for general causal inference. -It does not explicitly support causal testing, nor does it support temporal feedback loops.
+There is also an R package for local use, but Dagitty cannot be used to estimate causal effects. +For this, DoWhy [@sharma2020dowhy; @blobaum2024dowhy] is a free, open-source Python package, and [cStructure](https://cstructure.dev) is a paid, low-code CI platform. +However, these packages are intended for general CI. +Neither explicitly supports causal software testing, nor do they support temporal feedback loops. # Ongoing and Future Research -The Causal Testing Framework is the subject of several publications [@clark2023metamorphic; @clark2023testing; @foster2024adequacy; @somers2024configuration]. -We are also in the process of preparing scientific publications concerning how the Causal Testing Framework handles missing variables and feedback over time. -Furthermore, we are working to develop a plug-in for the [DAFNI framework](https://www.dafni.ac.uk/) to enable national-scale infrastructure models to be easily tested. +The CTF is the subject of several publications [@clark2023metamorphic; @clark2023testing; @foster2024adequacy; @somers2024configuration]. +We are also in the process of preparing scientific publications concerning how the CTF handles missing variables and feedback over time. +Furthermore, we are working to develop a plug-in for the [DAFNI platform](https://www.dafni.ac.uk/) to enable national-scale infrastructure models to be easily tested. # Acknowledgements This work was supported by the EPSRC CITCoM grant EP/T030526/1.
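The four-step workflow the paper describes (a causal DAG, tests generated from its edges, a runtime data table, and a validated effect estimate) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the variable names and the simple slope-based estimator are hypothetical stand-ins, not the CTF's actual API.

```python
import itertools
import random

# Step 1: a causal DAG -- an edge (X, Y) asserts that X directly causes Y.
# (All names here are illustrative, not the CTF's real API.)
nodes = ["input_a", "internal_b", "output_c"]
edges = {("input_a", "internal_b"), ("internal_b", "output_c")}

# Step 2: generate causal tests from the DAG -- one "effect present" test per
# edge, and one "independence" test per missing ordered pair of variables.
effect_tests = sorted(edges)
independence_tests = [p for p in itertools.permutations(nodes, 2) if p not in edges]

# Step 3: pre-existing runtime data -- one row per software run, one column
# per variable (synthesised here so the ground-truth effect is known).
random.seed(0)
runs = []
for _ in range(500):
    a = random.gauss(0, 1)
    b = 2 * a + random.gauss(0, 0.1)  # true effect of input_a on internal_b is 2
    c = -b + random.gauss(0, 0.1)
    runs.append({"input_a": a, "internal_b": b, "output_c": c})

# Step 4: estimate a causal effect from the data. With no confounders to
# adjust for, the regression slope cov(X, Y) / var(X) is a valid estimate.
def slope(x_name, y_name, data):
    xs = [r[x_name] for r in data]
    ys = [r[y_name] for r in data]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

estimate = slope("input_a", "internal_b", runs)
assert abs(estimate - 2) < 0.1  # matches the expected causal relationship
```

In real use, the synthetic rows would be replaced by logged runs of the system under test, and the hand-rolled slope by an estimator that adjusts for whichever variables the DAG identifies as confounders.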
From 4c1410beb5c208907ce4b7bd4122748cac02337d Mon Sep 17 00:00:00 2001 From: Michael Date: Fri, 10 Jan 2025 11:45:36 +0000 Subject: [PATCH 11/11] Nick and Neil's comments --- paper/paper.bib | 25 +++++++++++++++++++++++-- paper/paper.md | 16 ++++++++-------- 2 files changed, 31 insertions(+), 10 deletions(-) diff --git a/paper/paper.bib b/paper/paper.bib index 7bed499f..ba8b42ae 100644 --- a/paper/paper.bib +++ b/paper/paper.bib @@ -67,6 +67,27 @@ @inproceedings{guderlei2007smt year = {2007} } +@book{hernan2020causal, + address = {Boca Raton, FL}, + author = {Hern{\'a}n, Miguel A and Robins, James M}, + publisher = {Chapman \& Hall/CRC}, + title = {Causal {I}nference: {What If}}, + year = {2020} +} + +@book{pearl2009causality, + address = {Cambridge}, + author = {Judea Pearl}, + day = {14}, + isbn = {9780521895606}, + month = {09}, + pagecount = {464}, + publisher = {Cambridge University Press}, + subtitle = {Models, Reasoning, and Inference}, + title = {Causality}, + year = {2009} +} + @misc{sharma2020dowhy, archiveprefix = {arXiv}, author = {Amit Sharma and Emre Kiciman}, eprint = {2011.04216}, primaryclass = {stat.ME}, title = {DoWhy: An End-to-End Library for Causal Inference}, url = {https://arxiv.org/abs/2011.04216}, year = {2020} } @article{somers2024configuration, - doi = {10.2139/ssrn.4732706}, author = {Somers, Richard and Walkinshaw, Neil and Hierons, Robert and Elliott, Jackie and Iqbal, Ahmed and Walkinshaw, Emma}, + doi = {10.2139/ssrn.4732706}, publisher = {Elsevier BV}, title = {Configuration Testing of an Artificial Pancreas System Using a Digital Twin}, year = {2024} } @article{textor2017dagitty, + author = {Textor, Johannes and van der Zander, Benito and Gilthorpe, Mark S. and Liśkiewicz, Maciej and Ellison, George T.H.}, doi = {10.1093/ije/dyw341}, issn = {1464-3685}, - author = {Textor, Johannes and van der Zander, Benito and Gilthorpe, Mark S.
and Liśkiewicz, Maciej and Ellison, George T.H.}, journal = {International Journal of Epidemiology}, month = {jan}, pages = {dyw341}, publisher = {Oxford University Press (OUP)}, title = {Robust causal inference using directed acyclic graphs: the R package ‘dagitty’}, year = {2017} } diff --git a/paper/paper.md index 27494c3b..a411aeb7 100644 --- a/paper/paper.md +++ b/paper/paper.md @@ -27,7 +27,7 @@ authors: - name: Richard Somers orcid: 0009-0009-1195-1497 affiliation: 1 - - name: Nicholas Lattimer + - name: Nicholas Latimer orcid: 0000-0001-5304-5585 affiliation: 1 - name: Neil Walkinshaw @@ -45,7 +45,7 @@ bibliography: paper.bib # Summary Scientific models possess several properties that make them notoriously difficult to test, including a complex input space, long execution times, and non-determinism, rendering existing testing techniques impractical. -In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference (CI) has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse observational data instead of costly randomised trials. +In fields such as epidemiology, where researchers seek answers to challenging causal questions, a statistical methodology known as Causal Inference (CI) [@pearl2009causality; @hernan2020causal] has addressed similar problems, enabling the inference of causal conclusions from noisy, biased, and sparse observational data instead of costly randomised trials. CI works by using domain knowledge to identify and mitigate biases in the data, enabling researchers to answer causal questions that concern the effect of changing some feature on the observed outcome. The Causal Testing Framework (CTF) is a software testing framework that uses CI techniques to establish causal effects between software variables from pre-existing runtime data rather than having to collect bespoke, highly curated datasets especially for testing.
@@ -56,18 +56,18 @@ Nondeterministic software can be tested using Statistical Metamorphic Testing [@ However, this requires the software to be executed repeatedly for each set of parameters of interest, so is computationally expensive, and is constrained to testing properties over software inputs that can be directly and precisely controlled. Statistical Metamorphic Testing cannot be used to test properties that relate internal variables or outputs to each other, since these cannot be controlled a priori. -By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the CTF circumvents both of these problems by enabling models to be tested using pre-existing runtime data. +By employing domain knowledge in the form of a causal graph --- a lightweight model specifying the expected relationships between key software variables --- the CTF overcomes the limitations of Statistical Metamorphic Testing by enabling models to be tested using pre-existing runtime data. The CTF is written in Python but is language agnostic in terms of the system under test. All that is required is a set of properties to be validated, a causal model, and a set of software runtime data. # Causal Testing -Causal Testing [@clark2023testing] has four main steps, outlined in \ref{fig:schematic}. -Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) where an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$. +Causal Testing [@clark2023testing] has four main steps, outlined in Figure \ref{fig:schematic}. +Firstly, the user supplies a causal model, which takes the form of a directed acyclic graph (DAG) [@pearl2009causality] where an edge $X \to Y$ represents variable $X$ having a direct causal effect on variable $Y$. Secondly, the user supplies a set of causal properties to be tested. 
Such properties can be generated from the causal DAG [@clark2023metamorphic]: for each $X \to Y$ edge, a test to validate the presence of a causal effect is generated, and for each missing edge, a test to validate independence is generated. The user may also refine tests to validate the nature of a particular relationship. Next, the user supplies a set of runtime data in the form of a table with each column representing a variable and rows containing the value of each variable for a particular run of the software. -Finally, the CTF automatically validates the causal properties by using the causal DAG and data to calculate a causal effect estimate, and validating this against the expected causal relationship. +Finally, the CTF automatically validates the causal properties by using the causal DAG to identify a statistical estimand [@pearl2009causality] (essentially a set of features in the data which must be controlled for), calculate a causal effect estimate from the supplied data, and validate this against the expected causal relationship. ![Causal Testing workflow.\label{fig:schematic}](../images/schematic.png) @@ -78,12 +78,12 @@ The CTF instead evaluates the adequacy of a particular dataset by calculating a ## Missing Variables Causal Testing works by using the causal DAG to identify the variables that need to be statistically controlled for to remove their biasing effect on the causal estimate. This typically means we need to know their values. -However, where such biasing variables are not recorded in the data, the Causal Testing Framework can still sometimes estimate unbiased causal effects by using Instrumental Variables, an advanced Causal Inference technique. +However, where such biasing variables are not recorded in the data, the Causal Testing Framework can still sometimes estimate unbiased causal effects by using Instrumental Variables [@hernan2020causal], an advanced Causal Inference technique.
## Feedback Over Time Many scientific models involve iterating several interacting processes over time. These processes often feed into each other, and can create feedback cycles. -Traditional CI cannot handle this; however, the CTF uses a family of advanced CI techniques, called g-methods, to enable the estimation of causal effects even when there are feedback cycles between variables. +Traditional CI cannot handle this; however, the CTF uses a family of advanced CI techniques, called g-methods [@hernan2020causal], to enable the estimation of causal effects even when there are feedback cycles between variables. # Related Work The Dagitty tool [@textor2017dagitty] is a browser-based environment for creating, editing, and analysing causal graphs.
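The Instrumental Variables idea from the Missing Variables section can be illustrated with a small simulation. This is a hedged sketch with made-up variables, not the CTF's implementation: when an unrecorded confounder `u` biases the naive regression of `y` on `x`, an instrument `z` that influences `y` only through `x` still recovers the true effect via the Wald estimator `cov(z, y) / cov(z, x)`.

```python
import random

# Synthetic runs: z is an instrument, u an unrecorded confounder of x and y.
random.seed(1)
zs, xs, ys = [], [], []
for _ in range(20000):
    z = random.gauss(0, 1)                    # instrument: affects x only
    u = random.gauss(0, 1)                    # unrecorded confounder of x and y
    x = z + u + random.gauss(0, 0.1)
    y = 3 * x + 5 * u + random.gauss(0, 0.1)  # true causal effect of x on y: 3
    zs.append(z)
    xs.append(x)
    ys.append(y)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((i - ma) * (j - mb) for i, j in zip(a, b)) / (len(a) - 1)

naive = cov(xs, ys) / cov(xs, xs)  # biased upwards by u's direct effect on y
wald = cov(zs, ys) / cov(zs, xs)   # instrumental-variables (Wald) estimate

assert abs(naive - 3) > 1.0  # the naive estimate is badly biased
assert abs(wald - 3) < 0.2   # the instrument recovers the true effect
```

The same adjustment-free trick only works because `z` is unconfounded and affects `y` solely through `x`; finding such an instrument among a program's inputs is exactly the kind of reasoning the causal DAG supports.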