Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock-in #173

Closed
wants to merge 3 commits into from
Closed

Lock-in #173

wants to merge 3 commits into from

Conversation

matthuszagh
Copy link
Contributor

@matthuszagh matthuszagh commented Nov 19, 2020

This PR adds lock-in amplifier functionality to stabilizer. There are 3 primary public functions that a processing sequence in stabilizer might use. prefilt performs demodulation and decimation. postfilt_iq performs the same steps as prefilt, but additionally performs biquad IIR filtering before decimation. This generates the in-phase (I) and quadrature (Q) signal components (also frequently called X and Y). postfilt_at performs the same steps as postfilt_iq, but additionally converts the I and Q components to magnitude and phase values. There is an additional public function, arr, which is a const function for computing the ARR register overflow value from the ADC sampling period and number of ADC samples in a batch. There is also a public struct TimeStamp. A mutable array of 2 of these is used to keep track of the last 2 external reference clock counter values.

Notable Changes

  1. I separated the IIR code into its own separate subproject to use it in the lock-in code.

TODO

  • Currently, the postfilt functions filter before decimating. However, @jordens discussed performing a boxcar average and then filtering. The code is in place (but commented out) to do this. It also yields significant (~30% for 16 ADC batch samples) performance gains in terms of total processing latency since filtering only needs to be performed on 1 sample rather than the full batch.
  • low_pass.rs provides integration tests that test the lock-in functionality in a traditional lock-in low-pass filtering case. Part of this test involves computing acceptable tolerances for the amplitude and phase values for various configurations (noise frequencies and amplitudes, filter cutoff frequency, etc.). Although these computations seem to provide reasonable tolerance values, I'm not absolutely convinced that they're technically correct. Any feedback on this is appreciated.
  • I've included annotations giving timing latencies for major contributors in the DSP processing chain (based on a GPIO toggle). I've left these in because I expect they may be helpful in the review process. However, they'll probably be removed before the actual merge.
  • I've added several global constants (e.g., for the ADC batch size). These will be defined as part of PR #165 and can be removed from the lockin code at that point.

Finally, it's worth noting that this code would benefit from const generics. This would allow testing for different batch and output sizes and would no longer require using global constants. Unfortunately, we have to wait several months before const generics will be stabilized.

Related:

@matthuszagh matthuszagh changed the title Add lock-in DSP code Lock-in Nov 19, 2020
Copy link
Member

@ryan-summers ryan-summers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only did a structural pass at this point, but it seems like it may make more sense to make a single dsp/digital_signal_processing crate at the root level that contains iir and the lockin functionality.

@@ -40,6 +40,7 @@ embedded-hal = "0.2.4"
nb = "1.0.0"
asm-delay = "0.9.0"
enum-iterator = "0.6.0"
iir = { path = "iir" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @ryan-summers mentions, I would also lean towards putting all the generic DSP code into a crate. iir should ultimately receive tests like the other things.

@@ -0,0 +1,86 @@
/* origin: FreeBSD /usr/src/lib/msun/src/s_cosf.c */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying these over from libm should be avoided and will probably bit-rot quickly because there are no tests and there is no wider usage and little maintenance/debugging going on. Let's get a solution that works in the long run.

Could you look into alternative solutions? I.e. adding them to libm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've raised an issue in libm about this.


/// Compute the reference signal phase increment for each ADC sample
/// and use them to demodulate the ADC inputs.
macro_rules! prefilt_no_decimate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see better function name choices. Fewer abbreviations (prefilt, demod) and more well-known DSP terms.
I would not use the function name to document what a function doesn't do.

/// `i` - In-phase signal.
/// `q` - Quadrature signal.
fn iq_to_a(i: f32, q: f32) -> f32 {
2. * sqrt(pow2(i) + pow2(q))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. * sqrt(pow2(i) + pow2(q))
sqrt(pow2(i) + pow2(q))

The factor of two is wrong here.
The lack of gain you are seeing needs to be compensated in the IIR filter (gain=2). The apparent amplitude loss arises from the IIR filter removing the sum frequency signal.
Imagine the demodulation frequency is zero. Specifically in that case you need to have the overall gain configurable as you would let the IIR filter have gain=1.

And this is merely absolute() or rms(). No need to invent a new name.

/// Arithmetic modulo operation. This requires that the dividend is
/// non-negative and the divisor is positive. Although this may work
/// in some other cases, these cases are not supported.
fn modulo<T: SubAssign + Ord + Copy + From<u8>>(dividend: T, divisor: T) -> T {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most naive modulo implementation and probably the slowest. Isn't the usual x - y * (x/y) already much better?

/// and use them to demodulate the ADC inputs.
macro_rules! prefilt_no_decimate {
( $adc_samples:expr, $ref_tstamps:expr, $valid_tstamps:expr, $phi:expr, $tadc:expr, $fscale:expr, $tstamps_mem:expr) => {{
record_new_tstamps($ref_tstamps, $valid_tstamps, $tstamps_mem);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shifts the timestamps into a two-deep fifo. Maybe with more concise terminology this can be formulated better.

///
/// * `tstamps_mem` - Recorded TimeStamps.
/// * `overflow_count` - Max timestamp value.
fn tstamps_diff(tstamps_mem: &[TimeStamp; 2], overflow_count: u32) -> u32 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these were just i32, it would be a simple difference, right? Inline then.

let empty_sequences =
tstamps_mem[1].sequences_old - tstamps_mem[0].sequences_old - 1;

rem0 + rem1 + overflow_count * empty_sequences as u32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if you write that out on paper and simplify it, this becomes clear and easier to read. It also becomes obvious that here you are implementing a number representation and math on it.

/// * `ref_tstamps` - New timestamp values.
/// * `valid_tstamps` - Number of valid timestamps.
/// * `tstamps_mem` - Last 2 recorded timestamps.
fn record_new_tstamps(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two-timestamp storage should probably be just [Option<i32>; 2].

Comment on lines +355 to +373
let tref_count: u32 = tstamps_diff(tstamps_mem, overflow_count);
let mut thetas: [f32; SAMPLE_BUFFER_SIZE] = [0.; SAMPLE_BUFFER_SIZE];
let mut theta_count: u32;

// 68ns
if tstamps_mem[0].sequences_old == 0 {
theta_count = tref_count - first_t;
} else {
theta_count = tstamps_diff(
&[
TimeStamp {
count: 0,
sequences_old: 0,
},
tstamps_mem[0],
],
overflow_count,
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate/document this? I have a hard time understanding.

use core::ops::SubAssign;
use core::f32::consts::PI;

extern crate libm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extern crate limb is considered unidiomatic with rust 2018, since libm is inherently included because of Cargo.toml

/// Compute the reference signal phase increment for each ADC sample
/// and use them to demodulate the ADC inputs.
macro_rules! prefilt_no_decimate {
( $adc_samples:expr, $ref_tstamps:expr, $valid_tstamps:expr, $phi:expr, $tadc:expr, $fscale:expr, $tstamps_mem:expr) => {{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a macro? It seems like this could just as validly be a function? We can likely rely on the optimizer to inline for us if we're worried about performance.

Copy link
Contributor Author

@matthuszagh matthuszagh Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a macro because it can return prematurely when ts_valid_count < 2. That return needs to be from the function in which the macro is invoked. Is there a better way to handle this? I could, of course, check the return in the containing function, but that seems like an unnecessary check. In any event, the organization is going to change quite a bit, so this will probably go away anyway.

Copy link
Member

@ryan-summers ryan-summers Nov 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highly recommend instead using a Result() return value. The function that calls this can then check the return value for an Ok(data) or an Err(Error::NotReady). It cleans up the code a lot here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done a major structural rewrite of this code (in line with your suggestions) and so this (and a bunch of other parts of this code) have been changed. I'm still cleaning it up, but hopefully some commits will be ready to push soon.

);
let mut sines: [f32; SAMPLE_BUFFER_SIZE] = [0.; SAMPLE_BUFFER_SIZE];
let mut cosines: [f32; SAMPLE_BUFFER_SIZE] = [0.; SAMPLE_BUFFER_SIZE];
for i in 0..SAMPLE_BUFFER_SIZE {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for theta in thetas.iter() {
    // ...
}

should be sufficient, shouldn't it?

///
/// * 2.0us to compute ADC phase values
/// * 3.4us to compute sin and cos
pub fn prefilt(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would actually benefit a lot by creating a DspFilter object and then processing() ADC samples to generate I/Q.

Sample API:

pub struct DspFilter {
    phase_offset: u32,
    adc_cycles: u32,
    frequency_scale: u32,
    timestamps: [TimeStamp; 2].
}

impl DspFilter {
    pub fn prefilter(adc_samples: &[i16], timestamps: &[u16]) -> Result<([f32; OUTPUT_BUFFER_SIZE], [f32; OUTPUT_BUFFER_SIZE]), Error> {
        // ...
    }

    // Etcetera
}

I believe Rust may be able to analyze things better and recognize that certain parameters are actually const as well.


let mut n: usize = 0;
let mut k: usize = 0;
while n < SAMPLE_BUFFER_SIZE {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do something like

for (src, dest) in i.iter().step_by(n_k).zip(res_i.iter_mut()) {
    *dest = src;
}

) -> [f32; SAMPLE_BUFFER_SIZE] {
let overflow_count: u32 = tadc * SAMPLE_BUFFER_SIZE as u32;
let tref_count: u32 = tstamps_diff(tstamps_mem, overflow_count);
let mut thetas: [f32; SAMPLE_BUFFER_SIZE] = [0.; SAMPLE_BUFFER_SIZE];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place variable declarations immediately where they're used so referencing what they are is simpler.

thetas[0] = real_phase(theta_count, fscale, tref_count, phi);
for i in 1..SAMPLE_BUFFER_SIZE {
theta_count += tadc;
thetas[i] = real_phase(theta_count, fscale, tref_count, phi);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thetas.iter().enumerate().map(|(i, theta)| {
    real_phase(theta_count + i * tadc, fscale, tref_count, phi)
});

or something similar

);
}

thetas[0] = real_phase(theta_count, fscale, tref_count, phi);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend a different name than thetas. Instead, why not make it sample_phase or something similar, so it's clear what it's referring to?

#[path = "k_cosf.rs"]
mod k_cosf;
#[path = "rem_pio2f.rs"]
mod rem_pio2f;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making a trig/ folder with all of the routines in it would be more idiomatic and help to keep a clean directory structure as well.

/// * `theta` - Angle for which sine is computed. Must be in range
/// [0,2*PI).
pub fn sin(theta: f32) -> f32 {
return sinf(theta);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all the wrappers? If you don't like the sinf name, you can just do

use sinf::sinf as sin;

@jordens
Copy link
Member

jordens commented Nov 22, 2020

As mentioned in the review, I'd suggest the following:

  • Prepare a minimal PR moving the IIR code into a dsp crate.
  • Rewrite your DSP code using the review comments above. That should shrink it dramatically. Split it into manageable parts if possible. Make a (or multiple) PR adding it, using libm for trig.
  • Factor out the libm/trig impl discussion into a separate issue.

@matthuszagh
Copy link
Contributor Author

Thanks for the additional information @jordens. I've created a PR moving the iir module to a dsp crate here.

I've done a full rewrite of this PR based on the feedback. As you expected, this shrinks the code quite a bit. I'll push this soon, once I've had some more time to clean it up and review it. This will use real phase and libm.

Once I've pushed the changes to this PR, I'll open a separate issue to discuss the demodulation phase + trig issue (integer vs real phase, trig implementation considerations, and whether a PLL might be cheaper), with associated analysis.

bors bot added a commit that referenced this pull request Nov 22, 2020
174: move iir to new dsp crate r=jordens a=matthuszagh

As mentioned [here](#173 (comment)).

Co-authored-by: Matt Huszagh <[email protected]>
@matthuszagh matthuszagh mentioned this pull request Nov 24, 2020
@matthuszagh
Copy link
Contributor Author

Closed, in favor of #177.

@matthuszagh matthuszagh deleted the lockin branch December 5, 2020 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants