improve arm_correlate_q7 for CM0 #178

Merged on Jun 24, 2024 (2 commits)
14 changes: 6 additions & 8 deletions Source/FilteringFunctions/arm_correlate_q7.c
@@ -921,15 +921,15 @@ void arm_correlate_q7(
  const q7_t *pIn2 = pSrcB + (srcBLen - 1U); /* InputB pointer */
  q31_t sum; /* Accumulator */
  uint32_t i = 0U, j; /* Loop counters */
- uint32_t inv = 0U; /* Reverse order flag */
+ int32_t inc = 1; /* Destination address modifier */
  uint32_t tot = 0U; /* Length */

  /* The algorithm implementation is based on the lengths of the inputs. */
  /* srcB is always made to slide across srcA. */
  /* So srcBLen is always considered as shorter or equal to srcALen */
  /* But CORR(x, y) is reverse of CORR(y, x) */
  /* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
- /* and a varaible, inv is set to 1 */
+ /* and a varaible, inc is set to -1 */
  /* If lengths are not equal then zero pad has to be done to make the two
  * inputs of same length. But to improve the performance, we include zeroes
  * in the output instead of zero padding either of the the inputs*/
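
Note on the comment block above: the claim that CORR(x, y) is the reverse of CORR(y, x) is what justifies writing the output back to front. The following is a minimal sketch to check that property, not the CMSIS-DSP implementation. `xcorr_ref` is a hypothetical helper, it works on plain `int32_t` values rather than q7, and it produces `nx + ny - 1` outputs without the zero padding that the library adds for unequal lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference (not part of CMSIS-DSP): full cross-correlation.
 * out[lag + (ny - 1)] = sum over k of x[k + lag] * y[k],
 * for lag = -(ny - 1) .. (nx - 1). out must hold nx + ny - 1 entries. */
static void xcorr_ref(const int32_t *x, size_t nx,
                      const int32_t *y, size_t ny,
                      int32_t *out)
{
    for (size_t i = 0; i < nx + ny - 1; i++) {
        ptrdiff_t lag = (ptrdiff_t)i - (ptrdiff_t)(ny - 1);
        int32_t sum = 0;

        for (size_t k = 0; k < ny; k++) {
            ptrdiff_t xi = (ptrdiff_t)k + lag;

            /* Samples outside x are treated as zero. */
            if (xi >= 0 && (size_t)xi < nx) {
                sum += x[xi] * y[k];
            }
        }
        out[i] = sum;
    }
}
```

With x = {1, 2, 3} and y = {4, 5}, `xcorr_ref(x, 3, y, 2, out)` gives {5, 14, 23, 12}, and swapping the inputs gives {12, 23, 14, 5}, the same values in reverse order. Under the assumptions above, that is why swapping the inputs and filling the destination from the end yields the same result.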
@@ -968,8 +968,8 @@ void arm_correlate_q7(
  srcALen = srcBLen;
  srcBLen = j;

- /* Setting the reverse flag */
- inv = 1;
+ /* Filling destination in reverse order */
+ inc = -1;
  }

  /* Loop to calculate convolution for output length number of times */
@@ -990,10 +990,8 @@ void arm_correlate_q7(
  }

  /* Store the output in the destination buffer */
- if (inv == 1)
-   *pDst-- = (q7_t) __SSAT((sum >> 7U), 8U);
- else
-   *pDst++ = (q7_t) __SSAT((sum >> 7U), 8U);
+ *pDst = (q7_t) __SSAT((sum >> 7U), 8U);
+ pDst += inc;
  }

 #endif /* #if !defined(ARM_MATH_CM0_FAMILY) */
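
For readers skimming the diff, the last hunk replaces a direction test on every store with a signed stride chosen once. The sketch below isolates that pattern under stated assumptions. `sat_q7`, `store_branch` and `store_stride` are hypothetical names, `sat_q7` is a portable stand-in for `__SSAT(x, 8U)`, and the sketch steps an index rather than `pDst` itself so that the final step before the start of the buffer stays well defined in standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __SSAT(x, 8U): clamp to the q7 range. */
static int8_t sat_q7(int32_t x)
{
    if (x > 127)  return 127;
    if (x < -128) return -128;
    return (int8_t)x;
}

/* Old shape: the direction is tested on every store. */
static void store_branch(int8_t *dst, const int32_t *sums, size_t n, int reverse)
{
    assert(n > 0);
    ptrdiff_t idx = reverse ? (ptrdiff_t)n - 1 : 0;

    for (size_t k = 0; k < n; k++) {
        /* >> on a negative value is assumed to be an arithmetic shift. */
        if (reverse)
            dst[idx--] = sat_q7(sums[k] >> 7);
        else
            dst[idx++] = sat_q7(sums[k] >> 7);
    }
}

/* New shape: the direction is folded into a signed stride set once. */
static void store_stride(int8_t *dst, const int32_t *sums, size_t n, int reverse)
{
    assert(n > 0);
    ptrdiff_t idx = reverse ? (ptrdiff_t)n - 1 : 0;
    ptrdiff_t inc = reverse ? -1 : 1;

    for (size_t k = 0; k < n; k++) {
        dst[idx] = sat_q7(sums[k] >> 7);
        idx += inc;
    }
}
```

Both functions fill `dst` identically for either direction. The stride form only removes the compare and branch from the loop body, which is presumably where the Cortex-M0 gain comes from; the page itself gives no measurements.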