Add a section on performance evaluation
mroavi committed Sep 11, 2023
1 parent fc447dc commit 82e4877
Showing 19 changed files with 2,716 additions and 1 deletion.
37 changes: 36 additions & 1 deletion paper/paper.bib
@@ -305,4 +305,39 @@
@software{Jutho2023
version = {v4.0.0},
doi = {10.5281/zenodo.8166121},
url = {https://doi.org/10.5281/zenodo.8166121}
}

@online{marinescu2022merlin,
  author  = {Radu Marinescu},
  title   = {Merlin},
  year    = {2022},
  url     = {https://www.ibm.com/opensource/open/projects/merlin/},
  urldate = {2023-09-11}
}

@article{mooij2010libdai,
  author  = {Joris M. Mooij},
  title   = {lib{DAI}: A Free and Open Source {C++} Library for Discrete Approximate Inference in Graphical Models},
  journal = {Journal of Machine Learning Research},
  year    = {2010},
  month   = aug,
  volume  = {11},
  pages   = {2169--2173},
  url     = {http://www.jmlr.org/papers/volume11/mooij10a/mooij10a.pdf}
}

@online{gal2010summary,
  author  = {Elidan, Gal and Globerson, Amir},
  title   = {Summary of the 2010 {UAI} approximate inference challenge},
  year    = {2010},
  url     = {https://www.cs.huji.ac.il/project/UAI10/summary.php},
  urldate = {2023-09-11}
}

@online{gogate2014uai,
  author  = {Gogate, Vibhav},
  title   = {{UAI} 2014 {Probabilistic} {Inference} {Competition}},
  year    = {2014},
  url     = {https://www.ics.uci.edu/~dechter/softwares/benchmarks/Uai14/UAI_2014_Inference_Competition.pdf},
  urldate = {2023-09-11}
}
34 changes: 34 additions & 0 deletions paper/paper.md
@@ -129,6 +129,40 @@
networks. By harnessing the best of both worlds, `TensorInference.jl` aims to
enhance the performance of probabilistic inference, thereby expanding the
tractability spectrum of exact inference for more complex, real-world models.

# Performance evaluation

\autoref{fig:performance-evaluation} compares the runtime performance of
`TensorInference.jl` against the `Merlin` [@marinescu2022merlin], `libDAI`
[@mooij2010libdai], and `JunctionTrees.jl` [@roa2022partial;@roa2023scaling]
libraries. We selected `Merlin` and `libDAI` based on the following criteria:
open-source availability, extensive documentation, and representation of
standard practices in the field. Both libraries have previously participated
in UAI inference competitions [@gal2010summary;@gogate2014uai], achieving
favorable results. Additionally, we included two versions of
`JunctionTrees.jl`, the predecessor of `TensorInference.jl`: the first does
not employ tensor technology, while the second optimizes individual
sum-product computations using tensor-based technology.
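
The tensor-based sum-product idea can be illustrated with a generic sketch (not code from either library): a sum-product over a shared variable is exactly a tensor contraction, which tensor libraries dispatch to optimized BLAS routines instead of interpreted loops.

```python
import numpy as np

# Two factors of a toy graphical model: phi1(a, b) and phi2(b, c)
phi1 = np.random.rand(2, 3)
phi2 = np.random.rand(3, 4)

# Sum-product: multiply the factors and sum (marginalize) over b.
# As explicit nested loops:
naive = np.zeros((2, 4))
for a in range(2):
    for c in range(4):
        for b in range(3):
            naive[a, c] += phi1[a, b] * phi2[b, c]

# As a single tensor contraction (the operation tensor-based code
# hands off to optimized linear-algebra kernels):
contracted = np.einsum("ab,bc->ac", phi1, phi2)

print(np.allclose(naive, contracted))  # True
```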

The benchmark problems are arranged along the x-axis in ascending order of
complexity, measured by the induced tree width. On average (geometric mean),
`TensorInference.jl` achieves a speedup of 11 times across all problems.
Notably, for the 10 most complex problems, the average speedup increases to 63
times, highlighting its superior scalability. Note that `TensorInference.jl`
incurs a computational overhead that may slow down probabilistic inference
relative to the other libraries when the problem's complexity is low. As the
problem complexity increases, however, this overhead becomes negligible, and
our method often delivers performance improvements of several orders of
magnitude.
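
The reported averages are geometric means of per-problem speedups (baseline runtime divided by `TensorInference.jl` runtime). A minimal sketch of that computation, using hypothetical runtimes rather than the actual benchmark data:

```python
from math import prod

def geomean(xs):
    """Geometric mean: the n-th root of the product of n values."""
    return prod(xs) ** (1.0 / len(xs))

# Hypothetical per-problem runtimes in seconds (baseline vs. TensorInference.jl)
baseline = [12.0, 150.0, 900.0]
tensor_inference = [6.0, 10.0, 30.0]

speedups = [b / t for b, t in zip(baseline, tensor_inference)]
print(geomean(speedups))  # geometric mean of [2.0, 15.0, 30.0]
```

The geometric mean is the natural average for ratios: a 2x speedup and a 0.5x slowdown correctly average to 1x, whereas the arithmetic mean would report 1.25x.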

![Speedup achieved by `TensorInference.jl`, relative to `Merlin`
[@marinescu2022merlin], `libDAI` [@mooij2010libdai], and `JunctionTrees.jl`
[@roa2022partial;@roa2023scaling] for the UAI 2014 inference competition
benchmark problems. The experiments were conducted on an Intel Core i9-9900K
CPU \@3.60GHz with 64 GB of RAM. \label{fig:performance-evaluation}
](scripts/performance-evaluation/out/co23/2023-09-10--20-20-45/performance-evaluation.svg){width=80%}

# Usage example

The graph below corresponds to the *ASIA network* [@lauritzen1988local], a
Binary file modified paper/paper.pdf
Binary file not shown.
13 changes: 13 additions & 0 deletions paper/scripts/performance-evaluation/Artifacts.toml
@@ -0,0 +1,13 @@
[uai2014]
git-tree-sha1 = "199ed43697fe22447c6c64a939b222fd4073f2d0"

[[uai2014.download]]
sha256 = "5d93ced227cff3eb2ae7feb77dcb6c780212b47a0c0355dda8439de6f5b9d369"
url = "https://github.com/mroavi/uai-2014-inference-competition/raw/main/uai2014.tar.gz"

[uai2014-mar]
git-tree-sha1 = "480aabc22378f9edaa9cd24798de9f416c7d1a49"

[[uai2014-mar.download]]
sha256 = "dd2265fe93eac73a3430f1d98bcd13162ca079f3a9cf7fa529b9c39c4534e671"
url = "https://gist.github.com/mroavi/8d38625bd8731cefc6788b941256cab3/raw/480aabc22378f9edaa9cd24798de9f416c7d1a49.tar.gz"
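
Each `download` stanza pairs a URL with a SHA-256 checksum that the artifact system verifies after download. The same integrity check can be sketched in a few lines (the `verify_sha256` helper is illustrative, not part of any library):

```python
import hashlib

def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of `data` matches the expected hex string."""
    return hashlib.sha256(data).hexdigest() == expected_hex

# Illustrative check on in-memory bytes; in practice `data` would be the
# downloaded tarball's contents and `expected_hex` the sha256 from the TOML.
digest = hashlib.sha256(b"uai2014").hexdigest()
print(verify_sha256(b"uai2014", digest))  # True
```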
10 changes: 10 additions & 0 deletions paper/scripts/performance-evaluation/Project.toml
@@ -0,0 +1,10 @@
[deps]
ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"
Artifacts = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
JunctionTrees = "b732b382-80b5-46a8-aa9c-7d077ae04823"
PGFPlotsX = "8314cec4-20b6-5062-9cdb-752b83310925"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
TensorInference = "c2297e78-99bd-40ad-871d-f50e56b81012"
213 changes: 213 additions & 0 deletions paper/scripts/performance-evaluation/generate-graph.jl
@@ -0,0 +1,213 @@
# Color palettes:
# - https://github.com/matplotlib/matplotlib/issues/9460#issuecomment-875185352
# - https://personal.sron.nl/~pault/#tab:blindvision
# - https://yoshke.org/blog/colorblind-friendly-diagrams
# - https://davidmathlogic.com/colorblind
# - https://www.color-hex.com/color-palette/1018347

using PGFPlotsX, CSV, DataFrames, Artifacts, StatsBase
using JunctionTrees: get_td_soln

push!(
PGFPlotsX.CUSTOM_PREAMBLE,
"""
\\usepackage[T1]{fontenc}
\\usepackage{xcolor}
\\usepackage{tikz}
\\usepackage{pgfplots}
\\usepackage{amsmath,amssymb}
% Bright qualitative colour scheme that is colour-blind safe
% https://personal.sron.nl/~pault/#tab:blindvision
\\definecolor{c01}{HTML}{4477AA}
\\definecolor{c02}{HTML}{EE6677}
\\definecolor{c03}{HTML}{228833}
\\definecolor{c04}{HTML}{CCBB44}
\\definecolor{c05}{HTML}{66CCEE}
\\definecolor{c06}{HTML}{AA3377}
\\definecolor{c07}{HTML}{BBBBBB}
\\definecolor{c08}{HTML}{BBBBBB}
\\usepackage{fontsetup}
\\setmonofont{Hack}
"""
)

# Use the CSV file passed on the command line, or fall back to a known run for debugging
data_file = isempty(ARGS) ? "./out/co23/2023-09-10--20-20-45/out.csv" : ARGS[1]

df1 =
CSV.File(data_file) |> # read benchmark data from file
DataFrame |> # convert it to a data frame
x -> unstack(x, :problem, :library, :execution_time) |> # convert from long to wide format (create a column for each possible `library` value)
dropmissing # drop rows with missing values

df2 =
map(x -> joinpath(artifact"uai2014-mar", x * ".tamaki.td"), df1.problem) |> # create absolute filepaths for each problem in `df`
x -> get_td_soln.(x) |> # get the tree decomposition solution ([:nbags, :largest_bag_size, :nvars]) for each problem
x -> DataFrame(x=[i for i in x]) |> # create data frame using the constructor for vector of vectors
x -> select(x, :x => AsTable) |> # https://www.juliabloggers.com/handling-vectors-of-vectors-in-dataframes-jl/
x -> rename(x, [:nbags, :largest_bag_size, :nvars]) # rename columns

df =
hcat(df1, df2) |> # horizontally concatenate the two data frames
x -> sort(x, [:largest_bag_size, :nvars, :nbags]) |> # sort the rows by largest bag size, then number of variables and bags
x -> transform(x, [:libdai, :TensorInference] => (./) => :libdai_ti_speedup) |>
x -> transform(x, [:merlin, :TensorInference] => (./) => :merlin_ti_speedup) |>
x -> transform(x, [:JunctionTrees_v1, :TensorInference] => (./) => :jtv1_ti_speedup) |>
x -> transform(x, [:JunctionTrees_v2, :TensorInference] => (./) => :jtv2_ti_speedup)

labels =
df.problem |>
x -> match.(r"[a-zA-Z]+", x) |>
x -> getfield.(x, :match)

labels_unique = unique(labels) |> sort
xmax = maximum(df.largest_bag_size) + 1

@pgf tp = Axis(
{
# title="TensorInference.jl Speedup",
xmin = 0,
xmax = xmax,
xlabel = "Largest cluster size",
xmajorgrids = true,
ymin = 0,
ymax = 1000000,
ymode = "log",
ytick = [1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5],
ymajorgrids = true,
ylabel = "Run time speedup",
label_style = {font = raw"\footnotesize"},
tick_label_style = {font = raw"\footnotesize"},
"scatter/classes" = {
# Warning: these classes must be defined in sorted order to keep the correspondence with `labels_unique`
Alchemy = {
mark = "x",
},
CSP = {
mark = "+",
},
# DBN = {
# mark = "square"
# },
Grids = {
mark = "asterisk",
},
ObjectDetection = {
mark = "-",
},
Pedigree = {
mark = "triangle",
},
Promedus = {
mark = "o",
# mark = "square",
# mark = "pentagon",
},
Segmentation = {
mark = "Mercedes star",
},
linkage = {
mark = "diamond",
# mark = "|",
},
},
legend_style = {
legend_columns = 3,
at = Coordinate(0.51, -0.4),
anchor = "south",
draw = "none",
font = raw"\footnotesize",
column_sep = 1.5,
},
},
Plot(
{
c01,
scatter,
"only marks",
"scatter src" = "explicit symbolic",
"legend image post style" = "black", "legend style" = {text = "black", font = raw"\footnotesize"},
},
Table(
{
meta = "label"
},
x=df.largest_bag_size,
y=df.libdai_ti_speedup,
label=labels,
),
),
Plot(
{
c02,
scatter,
"only marks",
"scatter src" = "explicit symbolic",
},
Table(
{
meta = "label"
},
x=df.largest_bag_size,
y=df.merlin_ti_speedup,
label=labels,
),
),
Plot(
{
c03,
scatter,
"only marks",
"scatter src" = "explicit symbolic",
},
Table(
{
meta = "label"
},
x=df.largest_bag_size,
y=df.jtv1_ti_speedup,
label=labels,
),
),
Plot(
{
c04,
scatter,
"only marks",
"scatter src" = "explicit symbolic",
},
Table(
{
meta = "label"
},
x=df.largest_bag_size,
y=df.jtv2_ti_speedup,
label=labels,
),
),
HLine({ dashed, black }, 1), # See: https://kristofferc.github.io/PGFPlotsX.jl/v1/examples/convenience/
Legend(labels_unique),
# Library legend (manually made with LaTeX code. See: https://kristofferc.github.io/PGFPlotsX.jl/v1/examples/latex/)
[raw"\node ",
{
draw = "black",
fill = "white",
font = raw"\scriptsize",
# pin = "outlier"
},
" at ",
Coordinate(5.5, 30000), # warning: hardcoded!
raw"{\shortstack[l] { $\textcolor{c01}{\blacksquare}$ libDAI \\ $\textcolor{c02}{\blacksquare}$ Merlin \\ $\textcolor{c03}{\blacksquare}$ JunctionTrees.jl-v1 \\ $\textcolor{c04}{\blacksquare}$ JunctionTrees.jl-v2}};"
]
)

println("Geometric mean of the speedup: $(geomean(vcat(df.libdai_ti_speedup, df.merlin_ti_speedup, df.jtv1_ti_speedup, df.jtv2_ti_speedup)))")
println("Geometric mean of the speedup of the last 10 problems: $(geomean(vcat(last(df.libdai_ti_speedup, 10), last(df.merlin_ti_speedup, 10), last(df.jtv1_ti_speedup, 10), last(df.jtv2_ti_speedup, 10))))")

output_file = joinpath(dirname(data_file), "performance-evaluation.svg")
pgfsave(output_file, tp; include_preamble=true, dpi=150)

# DEBUG
display(tp)
@@ -0,0 +1,2 @@
┌ Info: SubString{String}["Solving inference problem...", "Total process time: 8856.916000 ms.", "Used time: 9.85718 seconds.", ""]
└ @ Main /home/20180043/repos/Probabilistic-Inference-in-the-Era-of-Tensor-Networks-and-Differential-Programming/scripts/benchmarks/mar/ti-vs-jtv1-jtv2-vs-merlin-vs-libdai/run-libdai.jl:40