
PS-9391 Fixes replication break error because of HASH_SCAN #5528

Merged
1 commit merged into percona:8.0 on Jan 20, 2025

Conversation

VarunNagaraju (Contributor)

https://perconadev.atlassian.net/browse/PS-9391

Problem

When a replica's slave_rows_search_algorithms is set to HASH_SCAN,
replication may break with HA_ERR_KEY_NOT_FOUND.

Analysis

When a replica's slave_rows_search_algorithms is set to HASH_SCAN,
the replica prepares a unique key list for all the rows in a particular Row_event.
The same unique key list is later used to retrieve from the storage engine all
tuples associated with each key in the list. In the case of multiple updates
targeting the same row, as shown in the testcase, this unique key list may end
up filled with entries that do not yet exist in the table. This becomes a
problem when an intermediate update changes the value of the indexed column to
a smaller value than the original entry and that changed value is then used in
another update, as shown in the second part of the testcase. The issue arises
because the unique key list is a std::set, which keeps its entries sorted. Once
the entries are sorted, the first entry of the list can be a value that does
not exist in the table yet; when that value is searched for in the
next_record_scan() method, the lookup fails with an HA_ERR_KEY_NOT_FOUND error.
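
To make the ordering effect concrete, here is a minimal standalone sketch (not
taken from the server code; the key values are hypothetical) showing that a
std::set iterates keys in sorted order rather than in the order the updates
produced them:

```cpp
#include <iostream>
#include <set>
#include <vector>

int main() {
  // Hypothetical key values produced by successive updates in one event:
  // the intermediate update changes the indexed column to a smaller value
  // (5) than the original entry (10), and a later update then uses 5.
  std::vector<int> insertion_order = {10, 5};

  // A distinct-key list built with std::set is iterated in sorted order,
  // so 5 comes first even though no row with key 5 exists in the table yet;
  // probing for it is what fails with HA_ERR_KEY_NOT_FOUND.
  std::set<int> sorted_keys(insertion_order.begin(), insertion_order.end());
  for (int k : sorted_keys) std::cout << k << ' ';  // prints: 5 10
  std::cout << '\n';
  return 0;
}
```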

Solution

Instead of using a std::set to store the distinct keys, a combination of an
unordered_set and a list is used, which preserves the original order of the
updates while still avoiding duplicates, thereby preventing the side effects
of sorting.
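
As a rough illustration only (the class and member names below are
hypothetical, not the ones used in the actual patch to sql/log_event.cc), the
combination can be sketched as an insertion-ordered set:

```cpp
#include <list>
#include <string>
#include <unordered_set>

// Insertion-ordered set of distinct keys: the list preserves the order in
// which keys were added, while the unordered_set rejects duplicates without
// imposing any sorting. Names are illustrative, not the patch's actual ones.
class DistinctKeyList {
 public:
  // Returns true if the key was newly added, false if it was a duplicate.
  bool add(const std::string &key) {
    if (!m_seen.insert(key).second) return false;  // already present
    m_keys.push_back(key);
    return true;
  }

  // Keys are iterated in insertion order, never in sorted order.
  const std::list<std::string> &keys() const { return m_keys; }

 private:
  std::unordered_set<std::string> m_seen;
  std::list<std::string> m_keys;
};
```

With such a structure the keys are probed in the order the updates generated
them, so in the scenario above the original key is looked up before the newly
introduced smaller key.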

@github-actions bot left a comment:
⚠️ Clang-Tidy found issue(s) with the introduced code (1/1)

Review thread on sql/log_event.cc (outdated, resolved)
Review threads on sql/log_event.h and sql/log_event.cc (resolved; some marked outdated)
@percona-ysorokin (Collaborator) left a comment:
LGTM with one m_distinct_key_id = 0 comment addressed

Review thread on sql/log_event.cc (resolved)
@VarunNagaraju marked this pull request as ready for review January 20, 2025 08:42
@VarunNagaraju force-pushed the PS-9391 branch 3 times, most recently from 4ab9bcc to 6ab96de on January 20, 2025 09:36
@VarunNagaraju merged commit 45a1bae into percona:8.0 on Jan 20, 2025
27 of 28 checks passed