Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: optimize guc mechanism #918

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hw118118
Copy link
Contributor

@hw118118 hw118118 commented Feb 11, 2025

Add new guc type Undispatch that doesn't dipatch from master to primary. for a guc, it's one of sync, unsync,undispatch.

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


@my-ship-it my-ship-it requested a review from weinan003 February 11, 2025 10:51
@yjhjstz
Copy link
Member

yjhjstz commented Feb 11, 2025

hi, what's the diff of unsync and undispatch ?

@avamingli
Copy link
Contributor

avamingli commented Feb 11, 2025

hi, what's the diff of unsync and undispatch ?

Same question, when developers add a new GUC, which array should it be placed in?
Please ensure this information is clear for developers in the README, comments, or commit messages at a minimum.
It would be helpful to include the necessary context to explain the motivation behind this decision and the approach you've chosen.

@hw118118
Copy link
Contributor Author

hw118118 commented Feb 12, 2025

hi, what's the diff of unsync and undispatch ?

Same question, when developers add a new GUC, which array should it be placed in? Please ensure this information is clear for developers in the README, comments, or commit messages at a minimum. It would be helpful to include the necessary context to explain the motivation behind this decision and the approach you've chosen.

Caues it is ambiguous for GUC manage currently and it doesn't have unified standard, especially sync GUCs, unsync GUCs and SET GUC command. So baseed on unified principle, it is necessary to optimize the mangement of GUCs.

Indeed, all GUC variables are divided into three categories now, sync GUC/unsync GUC/undispatch GUC. Any GUC is one of three types

  1. sync GUC: that need to sync master with primary when excute statements.
  2. unsync GUC: that needn't sync master with primary when excute statements, but it will be dispatched when excute set guc command.
  3. undispatch GUC: neither sync master with primary nor be dispatched when excute set guc command.

At the same time, all GUC variables are placeed three files, all sync GUC variables put into sync_guc_name.h, all unsync GUC variables put into unsync_guc_name.h, all undispatch GUC variables put into undispatch_guc_name.h in init stage.

When add a new GUC variable, it can assign one of GUC_GPDB_NEED_SYNC, GUC_GPDB_NO_SYNC, GUC_GPDB_NO_DISPATCH that corresponds to sync GUC or unsync GUC or undispatch GUC. If none of them is set, it will be treated as default unsync GUC.

@hw118118 hw118118 force-pushed the optimize_guc_mechanism branch from 4700d7b to eb42dbb Compare February 12, 2025 03:06
@avamingli
Copy link
Contributor

  • unsync GUC: that needn't sync master with primary when excute statements, but it will be dispatched when excute set guc command.
  • undispatch GUC: neither sync master with primary nor be dispatched when excute set guc command.

What's meaning of the 'primary' here? primary nodes on segments?
Isn't it a dispatch when sync master with primary?

@hw118118
Copy link
Contributor Author

  • unsync GUC: that needn't sync master with primary when excute statements, but it will be dispatched when excute set guc command.
  • undispatch GUC: neither sync master with primary nor be dispatched when excute set guc command.

What's meaning of the 'primary' here? primary nodes on segments? Isn't it a dispatch when sync master with primary?

yes, 'primary' means segment node. It is also a dispatch when sync master with primary, only difference is that it will carry some GUC infomation(sync GUCs) when excute query.

@hw118118 hw118118 force-pushed the optimize_guc_mechanism branch from eb42dbb to 0ebcc4b Compare February 12, 2025 09:25
@avamingli
Copy link
Contributor

  • unsync GUC: that needn't sync master with primary when excute statements, but it will be dispatched when excute set guc command.
  • undispatch GUC: neither sync master with primary nor be dispatched when excute set guc command.

What's meaning of the 'primary' here? primary nodes on segments? Isn't it a dispatch when sync master with primary?

yes, 'primary' means segment node. It is also a dispatch when sync master with primary, only difference is that it will carry some GUC infomation(sync GUCs) when excute query.

Still confused, could you provide examples to show the differences, ex in a 3-segments cluster with mirror and standby each.

And what's output of your new undispatch guc for goconfig?

gpconfig -s gp_enable_global_deadlock_detector
Values on all segments are consistent
GUC              : gp_enable_global_deadlock_detector
Coordinator value: off
Segment     value: off

Please make clear to each case, Coordinator value, Segment value, and ... perhaps your undisaptched? value?

Again, I still don't know which array to put in when a new GUC added.

Add new guc type Undispatch that doesn't dipatch from master to primary.
for a guc, it's one of sync, unsync,undispatch.
@hw118118 hw118118 force-pushed the optimize_guc_mechanism branch from 0ebcc4b to f6197b3 Compare February 12, 2025 10:37
@hw118118
Copy link
Contributor Author

  • unsync GUC: that needn't sync master with primary when excute statements, but it will be dispatched when excute set guc command.
  • undispatch GUC: neither sync master with primary nor be dispatched when excute set guc command.

What's meaning of the 'primary' here? primary nodes on segments? Isn't it a dispatch when sync master with primary?

yes, 'primary' means segment node. It is also a dispatch when sync master with primary, only difference is that it will carry some GUC infomation(sync GUCs) when excute query.

Still confused, could you provide examples to show the differences, ex in a 3-segments cluster with mirror and standby each.

And what's output of your new undispatch guc for goconfig?

gpconfig -s gp_enable_global_deadlock_detector
Values on all segments are consistent
GUC              : gp_enable_global_deadlock_detector
Coordinator value: off
Segment     value: off

Please make clear to each case, Coordinator value, Segment value, and ... perhaps your undisaptched? value?

Again, I still don't know which array to put in when a new GUC added.

Use undispatch GUC enable_hashagg as a example.
default :
GUC : enable_hashagg
Coordinator value: on
Segment value: on

When set the GUC enable_hashagg to off by psql Coordinator, the result is:
GUC : enable_hashagg
Coordinator value: off
Segment value: on

If the GUC enable_hashagg is unsync GUC, it will be same as default.

Finally, for a new GUC added, you should decide to which type that new GUC belongs to firstly. Then
If use DefineCustomXXVariable to add new GUC, you can assign a flag (GUC_GPDB_NEED_SYNC, GUC_GPDB_NO_SYNC, GUC_GPDB_NO_DISPATCH) ,default is GUC_GPDB_NO_SYNC when none is set.
If directly add a new GUC in the kernel, you should put the new GUC into one file of {sync_guc_name.h, unsync_guc_name.h, undispatch_guc_name.h) or set flag as above.

@avamingli
Copy link
Contributor

Use undispatch GUC enable_hashagg as a example. default : GUC : enable_hashagg Coordinator value: on Segment value: on

When set the GUC enable_hashagg to off by psql Coordinator, the result is: GUC : enable_hashagg Coordinator value: off Segment value: on

If the GUC enable_hashagg is unsync GUC, it will be same as default.

That's a confusion, the undispatch GUC has the same output value on master and segments with unsync.
We usually use gpconfig -s to know about a GUC sync or unsync. Now a new type comes in, how to distinguish them?

@hw118118
Copy link
Contributor Author

hw118118 commented Feb 13, 2025

Use undispatch GUC enable_hashagg as a example. default : GUC : enable_hashagg Coordinator value: on Segment value: on
When set the GUC enable_hashagg to off by psql Coordinator, the result is: GUC : enable_hashagg Coordinator value: off Segment value: on
If the GUC enable_hashagg is unsync GUC, it will be same as default.

That's a confusion, the undispatch GUC has the same output value on master and segments with unsync. We usually use gpconfig -s to know about a GUC sync or unsync. Now a new type comes in, how to distinguish them?

No, the undispatch GUC has the diffrent output value on master and segments with unsync as above example. Also, gpconfig -s is a kind of static check method of current node, actually it doen't distinguish sync or unsync, you can choose a sync GUC and a unsync GUC. it has same output result.

Only when we set GUC on Coordinator, Coordinator and Segment has diffrent state for diffrent GUC types, sync/unsync/undispatch

@hw118118 hw118118 marked this pull request as draft February 13, 2025 02:18
@avamingli
Copy link
Contributor

Also, gpconfig -s is a kind of static check method of current node, actually it doen't distinguish sync or unsync, you can choose a sync GUC and a unsync GUC. it has same output result.

We do it every day during develop, you break our common sense, try turn off optimizer to see the output.
If unsync and sync have same output, what's the difference between them on master and segments.

Still not clear what does undispatch mean.

@avamingli
Copy link
Contributor

Also, gpconfig -s is a kind of static check method of current node, actually it doen't distinguish sync or unsync, you can choose a sync GUC and a unsync GUC. it has same output result.

We do it every day during develop, you break our common sense, try turn off optimizer to see the output. If unsync and sync have same output, what's the difference between them on master and segments.

Still not clear what does undispatch mean.

IMO, I think most developers do, unsync only take effects on master when exec a SET, while sync GUCs will dispatch to segments as well.And we use gpcondif -s to check the effect.
I don't talk to utility mode which is a emergency measures, in most of time we SET GUCs on master.
Correct me if i'm wrong.

And if you don't think it's right, please give a strong reason considering your new type undispatch.

@hw118118
Copy link
Contributor Author

effect

I make a summary. In a normal cluster, there are many segments , at the same time, there is a master node. When we set a GUC on the master, what changes in all QE processes. And SYNC GUC/UNSYNC GUC/UNDISPATCH GUC defines the action.

For UNSYNC GUC, except QD process, it will be dispatched to all idle QE processes of all segments in current session.
For SYNC GUC, its dispatch mechism is same with UNSYNC GUC, but it will add two sync strategies. One is it will sync GUC with all OE processes in the next query when current transaction has falied. One is new QE process can be consistency with QD.
For UNDISPATCH GUC, it doesn't dispatch to all segments, it only take effect on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants