Parallel API #24

Open
Tracked by #31
Nugine opened this issue Nov 27, 2022 · 6 comments

Comments

@Nugine
Owner

Nugine commented Nov 27, 2022

Improve throughput by using multi-threading (rayon).

In most cases, the SIMD functions are fast enough that we may not benefit from multi-threading. Some functions are also not divisible, so their input cannot be split across threads.
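As a rough illustration of what a parallel API for a divisible function could look like, here is a minimal sketch. It is not this crate's API: `encode_chunk`, `hex_encode_parallel`, and the `threshold` parameter are hypothetical names, a scalar hex encoder stands in for a SIMD kernel, and `std::thread::scope` is used instead of rayon so the example has no dependencies. Hex encoding is divisible because each input byte maps to exactly two output bytes, so chunks can be processed independently.

```rust
use std::thread;

const HEX: &[u8; 16] = b"0123456789abcdef";

/// Scalar stand-in for a SIMD kernel (hypothetical helper):
/// writes two lowercase hex digits per input byte.
fn encode_chunk(src: &[u8], dst: &mut [u8]) {
    assert_eq!(dst.len(), src.len() * 2);
    for (pair, &byte) in dst.chunks_exact_mut(2).zip(src) {
        pair[0] = HEX[(byte >> 4) as usize];
        pair[1] = HEX[(byte & 0x0f) as usize];
    }
}

/// Hypothetical parallel wrapper: falls back to the single-threaded path
/// when the input is shorter than `threshold` bytes.
fn hex_encode_parallel(src: &[u8], threads: usize, threshold: usize) -> Vec<u8> {
    let mut dst = vec![0u8; src.len() * 2];
    if threads <= 1 || src.len() < threshold {
        encode_chunk(src, &mut dst);
        return dst;
    }
    // Ceiling division, so at most `threads` chunks are created.
    let chunk = ((src.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        // Input chunk i always pairs with output chunk i (twice as long).
        for (src_part, dst_part) in src.chunks(chunk).zip(dst.chunks_mut(chunk * 2)) {
            s.spawn(move || encode_chunk(src_part, dst_part));
        }
    });
    dst
}
```

Spawning OS threads per call is expensive compared with a thread pool such as rayon's, which is one reason the fallback threshold matters.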

Crates with parallel api:

@Nugine
Owner Author

Nugine commented Nov 27, 2022

Parallel threshold

$$ \frac{n}{pv}+c<\frac{n}{v} $$

$$ n > \frac{p}{p-1}cv $$

$n$: input size (B)
$p$: num threads
$v$: throughput (B/ns)
$c$: threading overhead (ns)
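The threshold inequality above can be turned into a small helper. This is only a sketch: `parallel_threshold` is a hypothetical name, and any values passed in for $v$ and $c$ would have to come from measurements on the target machine.

```rust
/// Input size in bytes above which splitting the work across `p` threads
/// is expected to pay off, i.e. the right-hand side of n > p/(p-1) * c * v.
///
/// `v` is the single-threaded throughput in B/ns and `c` is the threading
/// overhead in ns. Returns `None` for fewer than two threads, where the
/// inequality has no solution.
fn parallel_threshold(p: u32, v: f64, c: f64) -> Option<f64> {
    if p < 2 {
        return None;
    }
    let p = f64::from(p);
    Some(p / (p - 1.0) * c * v)
}
```

For example, with illustrative (not measured) values $p = 2$, $v = 10$ B/ns, and $c = 1000$ ns, the threshold is $2 \cdot 1000 \cdot 10 = 20000$ bytes. As $p$ grows, the factor $p/(p-1)$ approaches 1, so the threshold approaches $cv$.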

Parallel time rate (smaller is better)

$$ \frac{n/(pv)+c}{n/v} < 1 $$

$$ \frac{1}{p}+\frac{cv}{n} < 1 $$

When handling large inputs, the throughput may be limited by cache misses and page faults. In other words, the throughput becomes memory-bound. So the theoretical parallel time rate is smaller than the practical one.
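The theoretical time rate can be evaluated directly from the formula. A minimal sketch, with `parallel_time_rate` as a hypothetical name; it models only the formula and ignores the memory-bound effects mentioned above:

```rust
/// Theoretical parallel time rate 1/p + c*v/n (smaller is better).
/// A value below 1.0 means the parallel path is predicted to be faster.
///
/// `n`: input size (B), `p`: number of threads,
/// `v`: single-threaded throughput (B/ns), `c`: threading overhead (ns).
fn parallel_time_rate(n: f64, p: f64, v: f64, c: f64) -> f64 {
    1.0 / p + c * v / n
}
```

With illustrative (not measured) values $p = 2$, $v = 10$ B/ns, and $c = 1000$ ns, the rate is exactly 1 at $n = 20000$ and 0.75 at $n = 40000$. For very large $n$ it approaches $1/p$, which is the best case the model allows.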

@Nugine
Owner Author

Nugine commented Nov 29, 2022

just bench static-experimental --bench base64 --plotting-backend disabled -- 'base64-encode/base64-simd'

base64-encode (GiB/s)

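| input size (B) | 16 | 32 | 64 | 256 | 1024 | 4096 | 65536 | 262144 | 524288 | 1048576 |
|---|---|---|---|---|---|---|---|---|---|---|
| base64-simd/auto | 1.827 | 1.977 | 3.503 | 8.008 | 11.826 | 12.980 | 13.494 | 12.852 | 12.751 | 12.723 |
| base64-simd/parallel | 1.174 | 0.004 | 0.008 | 0.029 | 0.114 | 0.433 | 5.623 | 16.196 | 24.001 | 31.267 |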

@Nugine Nugine mentioned this issue Feb 5, 2023
15 tasks
@pickfire

Would it be useful to have an option to parse multiple items at once? Cache locality and instruction-level parallelism could probably help here.

The use case would be something like using it in polars to read a column of UUIDs.

@Nugine
Owner Author

Nugine commented Jun 29, 2024

> Would it be useful to have an option to parse multiple items at once? Cache locality and instruction-level parallelism could probably help here.
>
> The use case would be something like using it in polars to read a column of UUIDs.

Could you explain this use case in more detail? How can we accelerate it by multithreading?

@pickfire

pickfire commented Jul 25, 2024

Not really multi-threading, but aligning on cache lines and processing multiple items at once might make it faster. When we need to process batches of items, maybe there is a better way to process them in bulk than handling each item one by one?

@Nugine
Owner Author

Nugine commented Jul 25, 2024

> Not really multi-threading, but aligning on cache lines and processing multiple items at once might make it faster. When we need to process batches of items, maybe there is a better way to process them in bulk than handling each item one by one?

Sounds interesting. We can discuss it in another issue: #45.
