[Bug] Query Returns Resultset Non-Deterministically To Order By #928
Comments
Hi @antoniopetrole, since we have cherry-picked a lot of fixes from GPDB, have you tried the main branch?
And could you provide an explain (verbose) plan of the unexpected case? I want to ensure that my settings match those of your environment.
Hi, I got an error with the DDL file:
Did you use any other settings, pg_dump, or something else? Also, what does select version(); return?
I have debugged the issue for a while, and after resolving the copied-data problem, I found that I could not reproduce the instability you mentioned.

PG planner:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (actual time=132.835..199.761 rows=32228 loops=1)
Output: a.col1, a.col2, a.col3, a.col4, a.col8, a.unid, CASE WHEN (a.unid = (lag(a.unid) OVER (?))) THEN 0 ELSE 1 END, row_number() OVER (?)
Order By: a.col1, a.col2, a.col3, a.col8
-> Gather Motion 3:1 (slice1; segments: 3) (actual time=132.766..156.002 rows=32228 loops=1)
Output: a.col1, a.col2, a.col3, a.col8, a.col4, a.unid, (lag(a.unid) OVER (?))
Merge Key: a.col1, a.col2, a.col3, a.col8
-> WindowAgg (actual time=128.535..155.693 rows=10940 loops=1)
Output: a.col1, a.col2, a.col3, a.col8, a.col4, a.unid, lag(a.unid) OVER (?)
Partition By: a.col1, a.col2, a.col3
Order By: a.col8
-> Sort (actual time=128.483..130.768 rows=10940 loops=1)
Output: a.col1, a.col2, a.col3, a.col8, a.col4, a.unid
Sort Key: a.col1, a.col2, a.col3, a.col8
Sort Method: quicksort Memory: 4703kB
Max Memory: 1584kB Avg Memory: 1567kB (3 segments)
work_mem: 2795kB Segments: 3 Max: 944kB (segment 1) Workfile: (0 spilling)
-> Redistribute Motion 3:3 (slice2; segments: 3) (actual time=87.689..109.282 rows=10940 loops=1)
Output: a.col1, a.col2, a.col3, a.col8, a.col4, a.unid
Hash Key: a.col1, a.col2, a.col3
-> Subquery Scan on a (actual time=86.416..96.809 rows=11012 loops=1)
Output: a.col1, a.col2, a.col3, a.col8, a.col4, a.unid
-> HashAggregate (actual time=86.412..93.920 rows=11012 loops=1)
Output: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, ((((((((((((COALESCE(fc.col4, 0))::text || '|'::text) || (COALESCE(fc.col5, 0))::text) || '|'::text) || (COALESCE(fc.col10, 0))::text) || '|'::text) || (COALESCE(fc.col11, 0))::text) || '|'::text) || (COALESCE(fc.col12, 0))::text) || '|'::text) || (COALESCE(fc.col13, 0))::text))
Group Key: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, ((((((((((((COALESCE(fc.col4, 0))::text || '|'::text) || (COALESCE(fc.col5, 0))::text) || '|'::text) || (COALESCE(fc.col10, 0))::text) || '|'::text) || (COALESCE(fc.col11, 0))::text) || '|'::text) || (COALESCE(fc.col12, 0))::text) || '|'::text) || (COALESCE(fc.col13, 0))::text))
work_mem: 4345kB Segments: 3 Max: 1449kB (segment 0) Workfile: (0 spilling)
-> Redistribute Motion 3:3 (slice3; segments: 3) (actual time=14.417..70.630 rows=11012 loops=1)
Output: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, ((((((((((((COALESCE(fc.col4, 0))::text || '|'::text) || (COALESCE(fc.col5, 0))::text) || '|'::text) || (COALESCE(fc.col10, 0))::text) || '|'::text) || (COALESCE(fc.col11, 0))::text) || '|'::text) || (COALESCE(fc.col12, 0))::text) || '|'::text) || (COALESCE(fc.col13, 0))::text))
Hash Key: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, ((((((((((((COALESCE(fc.col4, 0))::text || '|'::text) || (COALESCE(fc.col5, 0))::text) || '|'::text) || (COALESCE(fc.col10, 0))::text) || '|'::text) || (COALESCE(fc.col11, 0))::text) || '|'::text) || (COALESCE(fc.col12, 0))::text) || '|'::text) || (COALESCE(fc.col13, 0))::text))
-> Append (actual time=14.664..64.440 rows=10826 loops=1)
-> Hash Join (actual time=3.843..3.862 rows=0 loops=1)
Output: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, (((((((((((COALESCE(fc.col4, 0))::text || '|'::text) || (COALESCE(fc.col5, 0))::text) || '|'::text) || (COALESCE(fc.col10, 0))::text) || '|'::text) || (COALESCE(fc.col11, 0))::text) || '|'::text) || (COALESCE(fc.col12, 0))::text) || '|'::text) || (COALESCE(fc.col13, 0))::text)
Inner Unique: true
Hash Cond: ((vc.col1 = fc.col1) AND (vc.col2 = fc.col2) AND (vc.col3 = fc.col3))
work_mem: 1537kB Segments: 3 Max: 513kB (segment 0) Workfile: (0 spilling)
-> Seq Scan on antonio.t3 vc (actual time=0.045..0.593 rows=2811 loops=1)
Output: vc.col1, vc.col2, vc.col3
-> Hash (actual time=0.058..0.064 rows=6 loops=1)
Output: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, fc.col5, fc.col10, fc.col11, fc.col12, fc.col13
Buckets: 65536 Batches: 1 Memory Usage: 513kB
-> Redistribute Motion 3:3 (slice4; segments: 3) (actual time=0.020..0.029 rows=6 loops=1)
Output: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, fc.col5, fc.col10, fc.col11, fc.col12, fc.col13
Hash Key: fc.col1
-> Seq Scan on antonio.t1 fc (actual time=0.125..0.129 rows=3 loops=1)
Output: fc.col1, fc.col2, fc.col3, fc.col4, fc.col8, fc.col5, fc.col10, fc.col11, fc.col12, fc.col13
Filter: ((fc.col8 >= '2024-11-24 10:00:00'::timestamp without time zone) AND (fc.col8 <= '2024-11-24 18:00:00'::timestamp without time zone))
-> Hash Join (actual time=11.059..58.976 rows=10826 loops=1)
Output: fc_1.col1, fc_1.col2, fc_1.col3, fc_1.col4, fc_1.col5, (((((((((((COALESCE(fc_1.col4, 0))::text || '|'::text) || (COALESCE(fc_1.col6, 0))::text) || '|'::text) || (COALESCE(fc_1.col7, 0))::text) || '|'::text) || (COALESCE(fc_1.col8, 0))::text) || '|'::text) || (COALESCE(fc_1.col9, 0))::text) || '|'::text) || (COALESCE(fc_1.col10, 0))::text)
Hash Cond: ((fc_1.col1 = vc_1.col1) AND (fc_1.col2 = vc_1.col2) AND (fc_1.col3 = vc_1.col3))
work_mem: 4213kB Segments: 3 Max: 1405kB (segment 0) Workfile: (0 spilling)
Extra Text: (seg1) Hash chain length 1.0 avg, 3 max, using 8549 of 131072 buckets.
-> Seq Scan on antonio.t2 fc_1 (actual time=0.148..8.369 rows=10826 loops=1)
Output: fc_1.col1, fc_1.col2, fc_1.col3, fc_1.col4, fc_1.col5, fc_1.col6, fc_1.col7, fc_1.col8, fc_1.col9, fc_1.col10
Filter: ((fc_1.col5 >= '2024-11-24 10:00:00'::timestamp without time zone) AND (fc_1.col5 <= '2024-11-24 18:00:00'::timestamp without time zone))
-> Hash (actual time=8.376..8.377 rows=8850 loops=1)
Output: vc_1.col1, vc_1.col2, vc_1.col3
Buckets: 131072 Batches: 1 Memory Usage: 1405kB
-> Broadcast Motion 3:3 (slice5; segments: 3) (actual time=0.080..3.566 rows=8850 loops=1)
Output: vc_1.col1, vc_1.col2, vc_1.col3
-> Seq Scan on antonio.t3 vc_1 (actual time=0.154..1.838 rows=3354 loops=1)
Output: vc_1.col1, vc_1.col2, vc_1.col3
Settings: optimizer = 'off'
Planning Time: 12.011 ms
(slice0) Executor memory: 273K bytes.
(slice1) Executor memory: 1446K bytes avg x 3x(0) workers, 1458K bytes max (seg1). Work_mem: 944K bytes max.
(slice2) Executor memory: 1109K bytes avg x 3x(0) workers, 1126K bytes max (seg1). Work_mem: 1449K bytes max.
(slice3) Executor memory: 1563K bytes avg x 3x(0) workers, 1566K bytes max (seg0). Work_mem: 1405K bytes max.
(slice4) Executor memory: 114K bytes avg x 3x(0) workers, 114K bytes max (seg0).
(slice5) Executor memory: 113K bytes avg x 3x(0) workers, 113K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 240.694 ms
(71 rows)

ORCA:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (actual time=85.412..155.284 rows=32228 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text)), CASE WHEN (((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text)) = (lag(((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))) OVER (?))) THEN 0 ELSE 1 END, row_number() OVER (?)
Order By: t1.col1, t1.col2, t1.col3, t1.col8
-> Gather Motion 3:1 (slice1; segments: 3) (actual time=85.337..111.593 rows=32228 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text)), (lag(((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))) OVER (?))
Merge Key: t1.col1, t1.col2, t1.col3, t1.col8
-> WindowAgg (actual time=79.945..126.618 rows=12097 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text)), lag(((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))) OVER (?)
Partition By: t1.col1, t1.col2, t1.col3
Order By: t1.col8
-> GroupAggregate (actual time=79.908..98.587 rows=12097 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))
Group Key: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))
-> Sort (actual time=79.876..82.555 rows=12097 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))
Sort Key: t1.col1, t1.col2, t1.col3, t1.col8, t1.col4, ((((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text))
Sort Method: quicksort Memory: 3923kB
Max Memory: 1424kB Avg Memory: 1307kB (3 segments)
work_mem: 2015kB Segments: 3 Max: 757kB (segment 1) Workfile: (0 spilling)
-> Append (actual time=4.668..56.421 rows=12097 loops=1)
-> Hash Join (actual time=3.029..3.038 rows=0 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col8, (((((((((((COALESCE(t1.col4, 0))::text || '|'::text) || (COALESCE(t1.col5, 0))::text) || '|'::text) || (COALESCE(t1.col10, 0))::text) || '|'::text) || (COALESCE(t1.col11, 0))::text) || '|'::text) || (COALESCE(t1.col12, 0))::text) || '|'::text) || (COALESCE(t1.col13, 0))::text)
Hash Cond: ((t3.col1 = t1.col1) AND (t3.col2 = t1.col2) AND (t3.col3 = t1.col3))
work_mem: 1537kB Segments: 3 Max: 513kB (segment 0) Workfile: (0 spilling)
-> Seq Scan on antonio.t3 (actual time=0.056..0.579 rows=2811 loops=1)
Output: t3.col1, t3.col2, t3.col3
-> Hash (actual time=0.053..0.057 rows=6 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.col8, t1.col10, t1.col11, t1.col12, t1.col13
Buckets: 65536 Batches: 1 Memory Usage: 513kB
-> Redistribute Motion 3:3 (slice2; segments: 3) (actual time=0.020..0.027 rows=6 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.col8, t1.col10, t1.col11, t1.col12, t1.col13
Hash Key: t1.col1
-> Seq Scan on antonio.t1 (actual time=0.039..0.043 rows=3 loops=1)
Output: t1.col1, t1.col2, t1.col3, t1.col4, t1.col5, t1.col8, t1.col10, t1.col11, t1.col12, t1.col13
Filter: ((t1.col8 >= '2024-11-24 10:00:00'::timestamp without time zone) AND (t1.col8 <= '2024-11-24 18:00:00'::timestamp without time zone))
-> Hash Join (actual time=3.662..53.586 rows=12097 loops=1)
Output: t2.col1, t2.col2, t2.col3, t2.col4, t2.col5, (((((((((((COALESCE(t2.col4, 0))::text || '|'::text) || (COALESCE(t2.col6, 0))::text) || '|'::text) || (COALESCE(t2.col7, 0))::text) || '|'::text) || (COALESCE(t2.col8, 0))::text) || '|'::text) || (COALESCE(t2.col9, 0))::text) || '|'::text) || (COALESCE(t2.col10, 0))::text)
Hash Cond: ((t2.col1 = t3_1.col1) AND (t2.col2 = t3_1.col2) AND (t2.col3 = t3_1.col3))
work_mem: 3453kB Segments: 3 Max: 1169kB (segment 1) Workfile: (0 spilling)
Extra Text: (seg1) Hash chain length 1.0 avg, 2 max, using 3318 of 131072 buckets.
-> Redistribute Motion 3:3 (slice3; segments: 3) (actual time=0.077..5.809 rows=12097 loops=1)
Output: t2.col1, t2.col2, t2.col3, t2.col4, t2.col5, t2.col6, t2.col7, t2.col8, t2.col9, t2.col10
Hash Key: t2.col1
-> Seq Scan on antonio.t2 (actual time=0.066..5.137 rows=10826 loops=1)
Output: t2.col1, t2.col2, t2.col3, t2.col4, t2.col5, t2.col6, t2.col7, t2.col8, t2.col9, t2.col10
Filter: ((t2.col5 >= '2024-11-24 10:00:00'::timestamp without time zone) AND (t2.col5 <= '2024-11-24 18:00:00'::timestamp without time zone))
-> Hash (actual time=2.834..2.835 rows=3354 loops=1)
Output: t3_1.col1, t3_1.col2, t3_1.col3
Buckets: 131072 Batches: 1 Memory Usage: 1169kB
-> Seq Scan on antonio.t3 t3_1 (actual time=0.066..0.683 rows=3354 loops=1)
Output: t3_1.col1, t3_1.col2, t3_1.col3
Settings: optimizer = 'on'
Planning Time: 338.510 ms
(slice0) Executor memory: 256K bytes.
(slice1) Executor memory: 1310K bytes avg x 3x(0) workers, 1324K bytes max (seg0). Work_mem: 1169K bytes max.
(slice2) Executor memory: 114K bytes avg x 3x(0) workers, 114K bytes max (seg0).
(slice3) Executor memory: 114K bytes avg x 3x(0) workers, 114K bytes max (seg0).
Memory used: 128000kB
Optimizer: GPORCA
Execution Time: 180.439 ms
(60 rows)
I executed a script that runs the query 100 times for each planner and compared the results. All runs returned identical results. Here’s the script I used:

#!/bin/bash
# Variables
SQL_FILE="problem_query.sql"
RESULT_DIR="results"
NUM_RUNS=100   # number of repeated runs, as described above
# Create the results directory if it doesn't exist
mkdir -p "$RESULT_DIR"
# Clean the results directory
rm -f "$RESULT_DIR"/*   # glob must stay outside the quotes so it expands
# Execute the SQL file multiple times
for ((i=1; i<=NUM_RUNS; i++))
do
RESULT_FILE="$RESULT_DIR/result_${i}.txt"
psql -f "$SQL_FILE" -o "$RESULT_FILE"
done
# Check if all results are the same
first_result=$(cat "$RESULT_DIR/result_1.txt")
for ((i=2; i<=NUM_RUNS; i++)); do
current_result=$(cat "$RESULT_DIR/result_${i}.txt")
if [ "$first_result" != "$current_result" ]; then
echo "Results differ in file: result_${i}.txt"
exit 1
fi
done
# If all results are the same
echo "All results are the same."

I have attached the results for your review. Please feel free to use my script with your SQL files to test on the main branch. We have implemented numerous fixes cherry-picked from GPDB, so it’s possible that the issue has been resolved (if it really was one).
orca_analyze.sql.txt
Just a reminder to use the main branch built on an officially supported OS, as other builds (e.g., bootcamp or Docker) are not maintained by the kernel team.
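The comparison loop above can also be written around cmp instead of loading whole result files into shell variables, which avoids problems with very large outputs. This is just a sketch of that variant; compare_results is a hypothetical helper name, not part of the original script:

```shell
#!/bin/bash
# Hypothetical variant of the check above: compare every result file in a
# directory against the first one with cmp -s, returning non-zero on the
# first mismatch.
compare_results() {
  local dir="$1"
  local first
  first=$(ls "$dir"/result_*.txt 2>/dev/null | head -n 1)
  [ -n "$first" ] || { echo "no result files in $dir"; return 2; }
  for f in "$dir"/result_*.txt; do
    if ! cmp -s "$first" "$f"; then
      echo "Results differ in file: $f"
      return 1
    fi
  done
  echo "All results are the same."
}
```

Running it against the same results directory the original script populates would be `compare_results results`.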
@avamingli hey! So I know for sure this is an issue on 1.6.0; I'm going to take the time today to test this on the latest main branch. Based on what you're saying, I suspect it may have been fixed. However, I have to stress that I couldn't get this bug to appear on my Docker bootcamp instances, and I think it has something to do with the number of segments and/or segment hosts. On both our prod and dev clusters, though, I'm able to reproduce it reliably. So I'll spin up a cluster with the latest branch and see whether I can reproduce it or not. As for the data issue, I suspect it's because the schema "antonio" wasn't created ahead of time? I vaguely recall seeing a similar error when that was the case for me. Apologies for that.
Edit: Not going to test this today as more important priorities have come up, but I will hopefully be testing this next week (as well as some other bugs I haven't reported yet).
Apache Cloudberry version
Cloudberry 1.6.0 (pre-Apache release)
What happened
One of our users ran into a query against our data that has some really odd behavior. Whenever the query is run, the ORDER BY clause returns data non-deterministically. For example, col1 should have the value 9217 for the first 36 rows. However, on Cloudberry, every time you run it, the rows come back out of order in different ways. Sometimes only the first 3 rows have col1 = 9217; on the next run it might be 1 or 10. I made a test-case data set with the query that you can test this against and have attached it to this issue. I also tested this on Postgres, where it behaves exactly as expected every time. I tried testing this on my Docker multinode bootcamp cluster to no avail, but I was able to recreate it in both our production and dev environments using this test data set. I imagine the number of segments has some influence on this in some way.
orderby_query.zip
What you think should happen instead
Results should be deterministic and follow the ORDER BY declaration like in Postgres
How to reproduce
To recreate, just run
psql -f problem_ddl_cloudberry.sql
and then psql -f problem_query.sql
and on each run you'll see different results. I also included the Postgres DDL version of this as well. A few notes I've taken that might be helpful to know.
Example Runs On Cloudberry (focus on how many rows have col1 = 9217)
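As a general aside when triaging reports like this (and not a diagnosis of this one — the behavior described above, where the leading ORDER BY column itself comes back interleaved, would be a genuine bug rather than tie ambiguity): SQL leaves the relative order of rows with equal ORDER BY keys unspecified, and in an MPP system the merge across segments is free to interleave such rows differently on every run. A hypothetical sketch, with placeholder table and column names not taken from the attached files:

```sql
-- Rows that tie on (col1, col8) may legitimately come back in a
-- different relative order on every run:
SELECT col1, col8, unid
FROM some_table          -- placeholder name
ORDER BY col1, col8;

-- Appending a unique column as a final tiebreaker pins the output
-- order completely, which helps separate tie ambiguity from a real
-- ORDER BY violation:
SELECT col1, col8, unid
FROM some_table
ORDER BY col1, col8, unid;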
Operating System
Rocky Linux 8.10 (Green Obsidian)
Anything else
No response
Are you willing to submit PR?
Code of Conduct