Hi,
In a similar vein to #2086, I also had a question about dpnp's performance with for loops that repeatedly go in and out of the array context. I tested dpnp on a simple covariance computation that loops over the columns of the data matrix and fills in the covariance matrix column by column. However, dpnp's single-threaded performance is significantly slower than NumPy's; one guess is that this could be due to repeatedly going in and out of the SYCL queue, but I was wondering whether you have more insight into this, or whether there is a way to improve the performance.
Here is the covariance scalability plot (number of threads vs. running time) comparing dpnp and NumPy:
As before, the test environment is an Intel Xeon Platinum 8380 processor with 80 threads. Each measurement is the median over 10 runs, with the first run discarded so that the cache is warm.
For the inputs, I used `M = 3200` and `N = 4000`, with `float_N = np.float64(N)` and `data = np.fromfunction(lambda i, j: (i * j) / M, (N, M), dtype=np.float64)`. For dpnp, I converted the inputs up front with `data = dpnp.asarray(data, device="cpu")` (and `float_N = dpnp.asarray(float_N, device="cpu")`, although this didn't seem to make a difference either way) before starting the tests; the conversion is not included in the timing results.
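Concretely, the setup boils down to roughly the following (a sketch reconstructed from the description above, not my exact script):

```python
import numpy as np
import dpnp

M, N = 3200, 4000
float_N = np.float64(N)
data = np.fromfunction(lambda i, j: (i * j) / M, (N, M), dtype=np.float64)

# For the dpnp runs, move the inputs onto the CPU SYCL device up front,
# outside the timed region
data = dpnp.asarray(data, device="cpu")
float_N = dpnp.asarray(float_N, device="cpu")
```

The covariance computation itself is as follows (for dpnp; the NumPy code is exactly the same, just using numpy):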
```python
mean = dpnp.mean(data, axis=0)
data -= mean
cov = dpnp.zeros((M, M), dtype=data.dtype, device="cpu")
for i in range(M):
    # fill row i and column i of the symmetric covariance matrix in one go
    cov[i:M, i] = cov[i, i:M] = data[:, i] @ data[:, i:M] / (float_N - 1.0)
cov.sycl_queue.wait()
```
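For completeness, the timing loop looks roughly like this (a sketch rather than my exact harness; it assumes the block above is wrapped in a hypothetical function `covariance(data, float_N)` that returns `cov` after the final queue wait):

```python
import time
import statistics

def bench(covariance, data, float_N, repeats=10):
    timings = []
    for _ in range(repeats + 1):           # one extra warm-up run up front
        work = data.copy()                 # the computation modifies data in place
        t0 = time.perf_counter()
        covariance(work, float_N)
        timings.append(time.perf_counter() - t0)
    return statistics.median(timings[1:])  # median of 10 runs, first run discarded
```

The same harness is used for the NumPy version, just with the numpy-based `covariance`.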
Any help is much appreciated -- thanks!
Best,
Jessica