Edge cases: disadvantages of parallelization? #3113

hpvd · 2024-03-22T10:55:37Z


Your work looks very interesting.

It would be great to understand whether your parallel approach always brings a clear benefit, or whether there are cases like those in Neo4j's new parallel runtime (v5.13+), where full parallelization is much slower than the standard runtimes.

Example:

...the query, which matches the graph for the tags most commonly used on the most viewed questions in the Stack Overflow database:

MATCH (q:Question)
WITH q
ORDER BY q.views DESC
LIMIT 1000
MATCH (q)-[:TAGGED]->(t:Tag)
RETURN t.name AS tag, count(*) AS count
ORDER BY count DESC
LIMIT 10

https://neo4j.com/developer-blog/speed-up-queries-neo4j-parallel-runtime/

Answered by semihsalihoglu-uw


semihsalihoglu-uw · 2024-03-22T11:09:54Z

Maintainer

Hi @hpvd,

Thanks for your interest. In general, the answer is yes: parallelism should always help. We use the morsel-driven parallelism approach to parallelize queries: https://db.in.tum.de/~leis/papers/morsels.pdf. This is a state-of-the-art approach adopted in many DBMSs. Overall, though, this is a hard question to answer. There are likely some queries where the overhead of parallelizing makes them slower, but frankly I don't expect this ever to be a major slowdown. If it is, that is likely a performance bug we can fix.
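As a rough illustration of the morsel idea from the paper linked above (a toy sketch, not this system's actual implementation): the input is cut into small fixed-size chunks ("morsels"), a pool of workers repeatedly claims the next unprocessed morsel from a shared cursor, each worker aggregates into its own thread-local state, and the partial results are merged at the end. The claiming and the final merge are where the parallelization overhead mentioned above comes from. The morsel size, worker count, and function name are hypothetical, and because of CPython's GIL this sketch shows only the work-distribution pattern, not a real speedup.

```python
import threading
from collections import Counter

MORSEL_SIZE = 4  # hypothetical; real systems use much larger morsels


def morsel_driven_count(rows, num_workers=4):
    """Count occurrences of each value in `rows` using morsel-driven dispatch.

    Workers repeatedly claim the next unprocessed morsel (a fixed-size slice of
    the input), aggregate it into a per-worker Counter, and the partial
    aggregates are merged once all workers have finished.
    """
    next_start = 0
    lock = threading.Lock()
    partials = [Counter() for _ in range(num_workers)]  # one Counter per worker

    def claim_morsel():
        # The shared cursor is the only synchronized state during the scan.
        nonlocal next_start
        with lock:
            start = next_start
            next_start += MORSEL_SIZE
        return start

    def worker(worker_id):
        local = partials[worker_id]
        while True:
            start = claim_morsel()
            if start >= len(rows):
                return  # no morsels left
            for value in rows[start:start + MORSEL_SIZE]:
                local[value] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge step: combine the per-worker partial aggregates into one result.
    total = Counter()
    for p in partials:
        total.update(p)
    return total


if __name__ == "__main__":
    tags = ["python", "sql", "python", "graph", "sql", "python", "cypher", "graph", "python"]
    print(morsel_driven_count(tags))
```

Because workers pull morsels on demand instead of receiving a fixed share up front, a worker that finishes early simply claims more work, which is the load-balancing property the morsel approach is designed around.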

The other thing is that some parts of queries are not parallelized, so even if you have 64 threads, we will intentionally run those parts with 1 thread. For example, if a query orders nodes, picks the top 10, and performs further computation on them, we run the latter part of the computation single-threaded to maintain the order. Something like:

MATCH (a:Person)
WITH a ORDER BY a.age LIMIT 10
MATCH (a)-[:Knows]->(b:Students)
RETURN *;

The second MATCH, after the WITH line, will be single-threaded because we assume (at least for now) that the user wants to keep the order.
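To see why preserving order pushes toward a single thread here, consider a toy Python model of the query above. The graph data and function names are hypothetical and do not reflect the system's real operators. After `ORDER BY a.age LIMIT k`, the continuation either runs on one thread, which emits rows in sorted order for free, or runs in parallel, in which case each result must be tagged with its input position and re-sequenced before it is emitted.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical toy graph: person -> age, and person -> people they know.
AGES = {"ann": 31, "bob": 25, "cat": 40, "dan": 22, "eve": 35}
KNOWS = {"ann": ["bob"], "bob": ["cat", "dan"], "cat": [], "dan": ["eve"], "eve": ["ann"]}


def top_k_by_age(k):
    """Models WITH a ORDER BY a.age LIMIT k: the k youngest, in ascending age."""
    return heapq.nsmallest(k, AGES, key=AGES.get)


def expand(person):
    """Models MATCH (a)-[:Knows]->(b): one output row per outgoing edge."""
    return [(person, friend) for friend in KNOWS[person]]


def continue_single_threaded(ordered):
    # One thread walks the ordered list, so rows come out in ORDER BY order.
    rows = []
    for person in ordered:
        rows.extend(expand(person))
    return rows


def continue_parallel_order_preserving(ordered, workers=4):
    # Workers may finish in any order, so each batch is tagged with its input
    # position and the batches are sorted back into place before emitting.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(expand, p): i for i, p in enumerate(ordered)}
        done = [(futures[f], f.result()) for f in as_completed(futures)]
    rows = []
    for _, batch in sorted(done):
        rows.extend(batch)
    return rows


if __name__ == "__main__":
    ordered = top_k_by_age(3)
    print(ordered)  # ['dan', 'bob', 'ann']
    print(continue_single_threaded(ordered))
    print(continue_parallel_order_preserving(ordered))
```

In this model both continuations return the same rows, but the parallel one needs extra bookkeeping (position tags plus a final sort) just to reproduce what the single-threaded loop gives for free, and after a `LIMIT 10` there are only a handful of input tuples to spread across threads.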

Hope this helps.

1 reply

hpvd Mar 22, 2024
Author

Many thanks for your detailed answer. Sorting also seems to slow down Neo4j's parallelization...
