From 22154a53abd16eeafdec0137c6b63e573a8f3d3b Mon Sep 17 00:00:00 2001
From: Brett Viren <brett.viren@gmail.com>
Date: Tue, 28 Mar 2023 11:27:03 -0400
Subject: [PATCH] Fix #202 (again again)

The shift was due to EmpiricalNoiseModel failing to follow the rules
and return a spectrum of the configured size.  Instead, it returns a
size based on "best fft size" near the requested.  The
IncoherentAddNoise was too trusting and so effectively stretched the
spectrum by the difference.  IncoherentAddNoise now accepts the wrong
size spectrum and truncates the generated noise waveform to the
appropriate size.
---
 gen/docs/noise.org                      | 52 ++++++++++++-----
 gen/src/AddNoise.cxx                    | 21 +++++--
 gen/src/GroupNoiseModel.cxx             | 17 +++++-
 gen/test/test-addnoise.bats             | 77 +++++++++++++++++++++++--
 gen/test/test-addnoise.jsonnet          | 59 ++++++++++++-------
 gen/test/test-noise-groups-pdsp.jsonnet | 14 +++++
 6 files changed, 191 insertions(+), 49 deletions(-)
 create mode 100644 gen/test/test-noise-groups-pdsp.jsonnet

diff --git a/gen/docs/noise.org b/gen/docs/noise.org
index 0a5a81366..e2ff99372 100644
--- a/gen/docs/noise.org
+++ b/gen/docs/noise.org
@@ -2,33 +2,53 @@
 
 * Overview
 
-WCT provides a single "noise model" which perhaps is best labeled a "sampled spectrum" model.
-It is rather simple but covers all known forms of noise. 
+WCT provides a "sampled spectrum noise model" which is parametererized by two types of information:
 
-The model is parameterized by three pieces of information, all of which are provided by the user:
+- A set of "median noise spectra".
 
-- one or more "mean noise spectra"
+- An associative mapping between channel and spectrum.
 
-- a mapping from channel to spectrum
+The WCT implementation is structured into two layers.
 
-- a mapping strategy
+1. A "model" component provides a median noise spectrum given a key (channel or group).
 
-The kernel of the model performs a sampling of a spectrum to generate a voltage-level waveform.
-The sampling has two steps.  First, each frequency bin of the mean spectrum is fluctuated according to the Rayleigh distribution.  To this fluctuated spectrum an inverse discrete Fourier transform is applied to produce a novel voltage-level waveform.
+2. An "adder" component generates a noise waveform and associates it to one or more channels.
 
-The resulting waveform is then added to one or more channels via a mapping and tha mapping strategy.
+* Noise generation
 
-There are two mapping strategies:
+The kernel of the noise generation returns a fluctuated noise waveform given a real-valued median amplitude spectrum.  Median and not mean is the proper statistic to supply.
+The sampling of the distribution characterized by the median spectrum has two steps. 
 
-- the "incoherent" strategy adds a single waveform to a single channel via the mapping. A channel is mapped directly to a spectrum from which the waveform was generated.
+1. Each frequency bin of the median spectrum is fluctuated to produce a complex amplitude sample.
 
-- the "coherent" strategy adds a single waveform to each in a group of channels. A channel is mapped to a "group" and the "group" is mapped to a spectrum.
+2. An inverse discrete Fourier transform is applied to the spectrum of samples to produce the noise waveform.
 
-There is a WCT graph component that implements each strategy: ~IncoherentAddNoise~ and ~CoherentAddNoise~, respectively.[fn:old] 
+The fluctuation at each frequency is produced by twice sampling a Gaussian distribution of mean zero and sigma equal to the median.  These samples form the real and imaginary part of a complex valued spectral sample.  Equivalently, such samples have Rayleigh-distributed complex amplitude and uniform-distributed complex phase.  This sampling motivates supplying a median and not a mean spectrum.
 
-A WCT graph may, of course, apply multiple instances of either or both components.  This allows both types of noises to be added and, for example, interleaving of different types of coherent noise.
+* Noise adders
 
-Note, it is left to the user to provide properly normalized spectra so that when combined there is no "double counting" of noise power.
+There are two types of noise adders:
 
+- ~IncoherentAddNoise~ :: adds an independent noise waveform to each channel.  The ~AddNoise~ type is an alias for this type.
+- ~CoherentAddNoise~ :: adds an independent noise waveform to each in a group of channels.
+
+* Noise spectrum models
+
+There are two types of noise spectrum models:
+
+** ~EmpiricalNoiseModel~
+
+This spectral model associates an initial median spectrum and a set of post-hoc transformations of that spectrum to a channel.  It expects as input a set of initial median spectra each of which are identified by a plane number and a wire length and summary information.  The post-hoc transformations are the following:
+
+- The initial spectrum for a given channel is formed as the interpolation between provided spectra based on their total wire lengths and that of the channel.
+- An additional white-noise component may be specified for each input spectrum, this too is subject to wire-length interpolation.
+- A pair of gain and shaping time values may be associated with an input spectrum and a different pair with each channel and the spectrum will be transformed accordingly. 
+
+The ~EmpiricalNoiseModel~ spectrum model may only be used with the ~IncoherentAddNoise~ noise adder.
+
+** ~GroupNoiseModel~
+
+This spectral model associates a channel to a group and a group to a spectrum.  No post-hoc wire-length interpolations nor any transformations are performed on the input spectra.
+
+This spectrum model may be used with ~IncoherentAddNoise~ and ~CoherentAddNoise~
 
-[fn:old] An older ~AddNoise~ was used.  This component type is aliased to ~IncoherentAddNoise~.
diff --git a/gen/src/AddNoise.cxx b/gen/src/AddNoise.cxx
index ab6fd0880..ea8b097a0 100644
--- a/gen/src/AddNoise.cxx
+++ b/gen/src/AddNoise.cxx
@@ -59,6 +59,7 @@ bool Gen::IncoherentAddNoise::operator()(const input_pointer& inframe, output_po
 
     // Limit number of warnings below
     static bool warned = false;
+    static bool warned2 = false;
 
     // Make waveforms of size nsample from each model, adding only
     // ncharge of their element to the trace charge.  This
@@ -76,27 +77,37 @@ bool Gen::IncoherentAddNoise::operator()(const input_pointer& inframe, output_po
 
         for (auto& [mtn, model] : m_models) {
             const auto& spec = model->channel_spectrum(chid);
+            const size_t nspec = spec.size();
 
-            if (spec.empty()) {
+            if (! nspec) {
                 continue;       // channel not in model
             }
+
             // The model spec size may differ than expected nsamples.
             // We could interpolate to correct for that which would
             // slow things down.  Better to correct the model(s) code
             // and configuration.
-            if (not warned and spec.size() != m_nsamples) {
+            if (not warned and nspec != m_nsamples) {
                 log->warn("model {} produced {} samples instead of expected {}, future warnings muted",
-                          mtn, spec.size(), m_nsamples);
+                          mtn, nspec, m_nsamples);
                 warned = true;
             }
 
-            real_vector_t sigmas(m_nsamples);
-            for (size_t ind=0; ind<m_nsamples; ++ind) {
+            const size_t nsigmas = nspec;
+            // const nsigmas = m_nsamples;
+            real_vector_t sigmas(nsigmas);
+            for (size_t ind=0; ind < nsigmas; ++ind) {
                 sigmas[ind] = spec[ind]*sqrt2opi;
             }
+
             auto wave = rwgen.wave(sigmas);
+            if (not warned2 and wave.size() < ncharge) {
+                log->warn("undersized noise {} for input waveform {}, future warnings muted", wave.size(), ncharge);
+                warned2 = true;
+            }
             wave.resize(ncharge);
             Waveform::increase(charge, wave);
+
         }
 
         auto trace = make_shared<SimpleTrace>(chid, intrace->tbin(), charge);
diff --git a/gen/src/GroupNoiseModel.cxx b/gen/src/GroupNoiseModel.cxx
index 87e69c930..c21e157dd 100644
--- a/gen/src/GroupNoiseModel.cxx
+++ b/gen/src/GroupNoiseModel.cxx
@@ -96,9 +96,18 @@ void Gen::GroupNoiseModel::configure(const WireCell::Configuration& cfg)
     
     // Read in, resample and intern the spectra.
     m_grp2amp.clear();
+    int count=0; 
     for (const auto& jso : jspectra) {
-        auto jgroup = jso["group"].isNull() ? jso["groupID"] : jso["group"];
-        const int group = jgroup.asInt();
+        // Some noise files lack explicit group ID so simply count groups.
+        int group = count;
+        auto jgroup = jso["group"];
+        if (jgroup.isNull()) {
+            jgroup = jso["groupID"];
+        }
+        if (jgroup.isInt()) {
+            group = jgroup.asInt();
+        }
+        ++count;
 
         // The original waveform size and sampling period
         const size_t norig = jso["nsamples"].asInt();
@@ -107,11 +116,13 @@ void Gen::GroupNoiseModel::configure(const WireCell::Configuration& cfg)
             ++errors;
             continue;
         }
-        const double porig = jso["period"].asInt();
+        const double porig = jso["period"].asDouble();
 
         const auto jfreqs = jso["freqs"];
         const auto jamps = jso["amps"];
         const size_t npts = jfreqs.size();
+        log->debug("loaded group:{} nsamples:{} period:{} nfreqs:{} namps:{}",
+                   group, norig, porig, jfreqs.size(), jamps.size());
 
         const double Fnyquist = 1.0/(2*porig);
         const double Frayleigh = 1.0/(norig*porig);
diff --git a/gen/test/test-addnoise.bats b/gen/test/test-addnoise.bats
index 7b3583089..3a099d8dd 100644
--- a/gen/test/test-addnoise.bats
+++ b/gen/test/test-addnoise.bats
@@ -1,18 +1,85 @@
 
-# The intention is to run this test in multiple releases and compare across releases.
-@test "generate simple noise for comparison with older releases" {
+file_base () {
+    local noise="$1" ; shift
+    local nsamples="${1:-6000}"
 
     local ver="$(wire-cell --version)"
-    local outfile="test-addnoise-${ver}.tar.bz2"
+    local base="$(basename ${BATS_TEST_FILENAME} .bats)"
+    echo "test-addnoise-${ver}-${noise}-${nsamples}"
+}    
+
+make_noise () {
+    local noise="$1" ; shift
+    local nsamples="${1:-6000}"
+
+    local base="$(file_base $noise $nsamples)"
 
-    run wire-cell -l test-addnoise.log -L debug \
-        -A outfile="$outfile" -c gen/test/test-addnoise.jsonnet
+    local cfgfile="${BATS_TEST_FILENAME%.bats}.jsonnet"
 
+    local outfile="${base}.tar.gz"
+    local logfile="${base}.log"
+
+    rm -f "$logfile"            # appends otherwise
+    if [ -s "$outfile" ] ; then
+        echo "already have $outfile" 1>&3
+        return
+    fi
+    run wire-cell -l "$logfile" -L debug \
+        -A nsamples=$nsamples -A noise=$noise -A outfile="$outfile" \
+        -c "$cfgfile"
     echo "$output"
     [[ "$status" -eq 0 ]]
     [[ -s "$outfile" ]]
 }
 
 
+# The intention is to run this test in multiple releases and compare across releases.
+@test "generate simple noise for comparison with older releases" {
+
+
+    
+    for nsamples in 6000
+    do
+        for noise in empno 
+        do
+            make_noise $noise $nsamples
+            local base="$(file_base $noise $nsamples)"
+
+            wirecell-plot comp1d -o comp1d-addnoise-${ver}-${noise}-${nsamples}.pdf \
+                          --markers 'o + x .' -t '*' -n spec \
+                          --chmin 0 --chmax 800 -s --transform ac \
+                          "${base}.tar.gz"
+        done
+    done
+    # wirecell-plot comp1d -o comp1d-addnoise-${ver}-all-all.pdf \
+    #               --markers 'o + x .' -t '*' -n spec \
+    #               --chmin 0 --chmax 800 -s --transform ac \
+    #               "test-addnoise-${ver}-empno-4096.tar.gz" \
+    #               "test-addnoise-${ver}-inco-4096.tar.gz" \
+    #               "test-addnoise-${ver}-empno-6000.tar.gz" \
+    #               "test-addnoise-${ver}-inco-6000.tar.gz"
+
+}
+
+
 
+@test "inco and empno with external spectra file is identical" {
+
+    for nsamples in 4095 4096 6000
+    do
+        for noise in empno inco
+        do
+            make_noise $noise $nsamples
+        done
+        
+        wirecell-plot comp1d \
+                      -o "$(file_base inco+empno $nsamples)-comp1d.pdf" \
+                      --markers 'o .' -t '*' -n spec \
+                      --chmin 0 --chmax 800 -s --transform ac \
+                      "$(file_base inco $nsamples).tar.gz" \
+                      "$(file_base empno $nsamples).tar.gz"
+    
+    done
+    
+}
 
diff --git a/gen/test/test-addnoise.jsonnet b/gen/test/test-addnoise.jsonnet
index 7829e02c4..83aef0da3 100644
--- a/gen/test/test-addnoise.jsonnet
+++ b/gen/test/test-addnoise.jsonnet
@@ -3,7 +3,6 @@
 local wc = import "wirecell.jsonnet";
 local pg = import "pgraph.jsonnet";
 
-local nsamples_generate = 4096;
 local tick = 0.5*wc.us;
 
 // services
@@ -42,36 +41,53 @@ local absurd = pg.pnode({
         nchannels: 2560,
     }}, nin=0, nout=1);
 
-local reframer = pg.pnode({
+local reframer(nsamples) = pg.pnode({
     type: 'Reframer',
     data: {
-        nticks: nsamples_generate,
+        nticks: nsamples,
         anode: wc.tn(anode),
     }}, nin=1, nout=1, uses=[anode]);
 
-local empno = {
-    type: "EmpiricalNoiseModel",
-    name: "empno",
-    data: {
-        anode: wc.tn(anode),
-        chanstat: "",
-        spectra_file: "protodune-noise-spectra-v1.json.bz2",
-        nsamples: nsamples_generate,
-        period: tick,
-        wire_length_scale: 1*wc.cm,
-    }, uses: [anode]
+local pdsp_noise_file = "protodune-noise-spectra-v1.json.bz2";
+
+local noise_models(nsamples) = {
+
+    empno: {
+        type: "EmpiricalNoiseModel",
+        name: "empno",
+        data: {
+            anode: wc.tn(anode),
+            chanstat: "",
+            spectra_file: pdsp_noise_file,
+            nsamples: nsamples,
+            period: tick,
+            wire_length_scale: 1*wc.cm,
+        }, uses: [anode]
+    },
+    inco: {
+        type: "GroupNoiseModel",
+        name: "inco",
+        data: {
+            // This can also be given as a JSON/Jsonnet file
+            spectra: pdsp_noise_file,
+            groups: import "test-noise-groups-pdsp.jsonnet",
+            nsamples: nsamples,
+            tick: tick,
+        }
+    }
 };
 
-local addnoise =
+
+local addnoise(model, nsamples) =
     pg.pnode({
         type: 'AddNoise',
         name: "",
         data: {
             dft: wc.tn(svcs.dft),
             rng: wc.tn(svcs.rng),
-            model: wc.tn(empno),
-            nsamples: nsamples_generate,
-        }}, nin=1, nout=1, uses=[empno, svcs.dft, svcs.rng]);
+            model: wc.tn(model),
+            nsamples: nsamples,
+        }}, nin=1, nout=1, uses=[model, svcs.dft, svcs.rng]);
 
 local digi = pg.pnode({
     type: "Digitizer",
@@ -86,7 +102,10 @@ local digi = pg.pnode({
     }
 }, nin=1, nout=1, uses=[anode]);
 
-function(outfile="test-addnoise.tar.bz2")
+function(outfile="test-addnoise.tar.bz2", noise="empno", nsamples=4096)
+    local int_nsamples = if std.type(nsamples) == "number" then nsamples else std.parseInt(nsamples);
+
+    local nmodel = noise_models(int_nsamples)[noise];
 
     local sink = pg.pnode({
         type: "FrameFileSink",
@@ -96,7 +115,7 @@ function(outfile="test-addnoise.tar.bz2")
             digitize: true,
         },
     }, nin=1, nout=0);
-    local graph = pg.pipeline([absurd, reframer, addnoise, digi, sink]);
+    local graph = pg.pipeline([absurd, reframer(int_nsamples), addnoise(nmodel, int_nsamples), digi, sink]);
     local app = {
         type: 'Pgrapher',
         data: {
diff --git a/gen/test/test-noise-groups-pdsp.jsonnet b/gen/test/test-noise-groups-pdsp.jsonnet
new file mode 100644
index 000000000..9f387cf77
--- /dev/null
+++ b/gen/test/test-noise-groups-pdsp.jsonnet
@@ -0,0 +1,14 @@
+// coherent group noise for PDSP
+local one(gid, start, num) = {
+    plane: "p" + std.toString(gid),
+    groupID: gid,
+    channels: std.range(start, start+num-1),
+};
+[
+    one(1, 0, 400),
+    one(2, 400, 400),
+    one(3, 800, 400),
+    one(4, 1200, 400),
+    one(5, 1600, 480),
+    one(6, 2080, 480),    
+]