-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sai slow logging WIP DO NOT REVIEW OR MERGE #1285
base: main
Are you sure you want to change the base?
Conversation
Configuration/tooling needs to be finished; tests should be updated as per the comments. Any comment starting with KATE: is a placeholder for update to be done
// a structure that keeps the N slowest queries - don't use getter/setter so that it's not exposed over JMX | ||
private final SaiSlowestQueriesQueue slowestQueries; | ||
// KATE: make this false by default and decide on yaml property, -D flag, or JMX to enable; hook setEnabled() to change | ||
private final AtomicBoolean enabled = new AtomicBoolean(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to allow to disable this at runtime.
Simpler is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it is a -D flag or yaml property. However, I would like to say that it is a matter of adding two more methods to make this particular flag runtime configurable. It does not also require extensive testing. My preference is always to have some extremely quick way to start/stop using new things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick way to start/stop using new things
I understand your point, but if you want to go this way then we have to add an HTTP endpoint in CNDB in order to toggle this flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we talked in Slack, yes, I am aware we will need HTTP point to be added when we are done here. I haven't proposed it yet as first I wanted to be sure what we really want. The PR is for early feedback
|
||
public synchronized void setEnabled(boolean newValue) | ||
{ | ||
boolean oldValue = this.enabled.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to use synchronized here, AtomicBoolean should already have the method to execute conditionally a function and set a new value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hint: use getAndSet
} | ||
} | ||
|
||
private void resizeMinMaxHeap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to make this resizable at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm not so sure here. I can imagine you may wish to generally keep this feature off (for performance / resource usage reasons), or record just a few slow queries, but then if running into trouble you may want to enable it / keep more slow queries when debugging a particular performance problem? Considering this is not adding a lot of code complexity, it is a nice usability feature to be able to tweak settings without restarting the nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not keep code that we won't use.
as said in other comments:
- we can restart the JVM to change the value (and I am not sure that we will often change it)
- we should add new http endpoints to change the configuration and add docs about how to use this, for now we don't have a framework to handle this easily
if we want to keep the code to do the dynamic reconfig I am fine with it, but we cannot left the code there without the full story end-to-end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to keep the code to do the dynamic reconfig I am fine with it, but we cannot left the code there without the full story end-to-end.
I agree and this was never suggested. I did not have things exposed yet because this was not finished and I wanted to be sure that is what we want first before exposing config in nodetool, JMX, HTTP endpoint
* Resize the priority queue. It will now hold up to newMaxSize slow queries | ||
* @param newMaxSize the new max size of the priority queue | ||
*/ | ||
void resize(int newMaxSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep it simple. No need for this
@@ -127,6 +127,7 @@ public UnfilteredPartitionIterator search(ReadExecutionController executionContr | |||
// Can't check for `command.isTopK()` because the planner could optimize sorting out | |||
if (plan.ordering() != null) | |||
{ | |||
queryContext.setOptimizedPlan(plan.toStringRecursive()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this String may contain user data then we have to redact it otherwise we must ask to the customer the permission to use this tool that means that we will use it only in tedt and dev end and it would be a pity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my understanding was at first we want the slowest queries in test/dev to easy identify and debug them while stress testing.
I will look into the redacting, thanks.
@@ -345,6 +345,7 @@ public boolean isSatisfiedBy(ByteBuffer columnValue) | |||
/** | |||
* Returns the lower bound of the expression as a ByteComparable with an encoding based on the version and the | |||
* validator. | |||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those were not intentionally added... something from the IDE formatting, I will revert
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert those lines later
/** | ||
* Returns the redacted string representation of this node only (used in slow query logging) | ||
*/ | ||
public final String toStringRedacted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just add a flag for redacted or not in toStringRecursive()
- toStringRecursive(boolean redacted)
I am tempted to do the same with toString
, just add a flag, but I think we should keep the toString override as we may not have coverage to find all the places where it may pop up and print ugly things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toStringRecursive
is for human readable stuff. I don't think we need a "redacted" flag here - I mean, I can't see a good reason for non-redacted strings. We have toString
and debuggers for that. The same rules apply for logging - why would we log full vectors of size 1500. And we definitely should not log passwords or other stuff from sensitive columns (which is an interesting problem on its own: how do we reliably tell if a value contains sensitive data that must not be logged?)
Maybe it should be named toStringPretty
?
And if you insist on really pushing the formatting options down the call chain, maybe we should have a nicer argument than a boolean - e.g. what about toStringPretty(FormattingOptions options)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you are saying here.
Do you say even the current toStringRecursive
should be redacted? It seems we print the trees only in trace logs and we never enable trace in prod in CNDB. Though, I am not sure it is not a problem for HCD. I think it is a good idea to ask Pushkala. Here, we log values even from prepared statements.
toStringPretty(FormattingOptions options)
What would be FormattingOptions
? I think I am not sure what you have in mind.
@@ -48,7 +48,7 @@ | |||
<pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern> | |||
</encoder> | |||
<filter class="ch.qos.logback.classic.filter.ThresholdFilter"> | |||
<level>DEBUG</level> | |||
<level>TRACE</level> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for testing, revert later
public class QueryPlanRedactionTest extends SAITester | ||
{ | ||
@Before | ||
public void setup() | ||
{ | ||
requireNetwork(); | ||
} | ||
|
||
private static final String KEYSPACE = "prepared_stmt_cleanup"; | ||
private static final String createKsStatement = "CREATE KEYSPACE " + KEYSPACE + | ||
" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };"; | ||
|
||
@Test | ||
public void testQuery() | ||
{ | ||
String selectCQL2 = "SELECT * FROM t WHERE val : 'missing' OR (val : ? AND val : ?)"; | ||
Session session = getSession(ProtocolVersion.V5); | ||
session.execute(createKsStatement); | ||
session.execute("Use prepared_stmt_cleanup"); | ||
session.execute("CREATE TABLE t (id text PRIMARY KEY, val text)"); | ||
session.execute("CREATE CUSTOM INDEX ON t(val) USING 'StorageAttachedIndex' WITH OPTIONS = {" + | ||
"'index_analyzer': '{\n" + | ||
"\t\"tokenizer\":{\"name\":\"ngram\", \"args\":{\"minGramSize\":\"2\", \"maxGramSize\":\"3\"}}," + | ||
"\t\"filters\":[{\"name\":\"lowercase\"}]\n" + | ||
"}'," + | ||
"'query_analyzer': '{\n" + | ||
"\t\"tokenizer\":{\"name\":\"whitespace\"},\n" + | ||
"\t\"filters\":[{\"name\":\"porterstem\"}]\n" + | ||
"}'};"); | ||
|
||
PreparedStatement preparedSelect2 = session.prepare(selectCQL2); | ||
session.execute(preparedSelect2.bind("'quick'", "'dog'")).all().size(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a proper test, it was just to help me print while working on the patch. Proper testing will be added when we agree on the actual queue (latest vs slowest, singleton, etc)
This prints the following example for redacting in the logs:
TRACE [ReadStage-1] 2024-09-18 18:50:32,200 Plan.java:361 - Optimizing plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
│ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
└─ Intersection (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
│ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
└─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
TRACE [ReadStage-1] 2024-09-18 18:50:32,201 Plan.java:372 - Candidate query plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
└─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: 0.000000000) (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
└─ Fetch (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
└─ Everything (keys: 0.0, cost/key: 0.0, cost: Infinity..Infinity)
TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,201 LifecycleTransaction.java:239 - Committing transaction over [] staged: [obsolete: [], update: []], logged: [obsolete: [], update: []]
TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:372 - Candidate query plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
│ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
└─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:372 - Candidate query plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
└─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: 0.000000000) (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
└─ Fetch (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
└─ Everything (keys: 0.0, cost/key: 0.0, cost: Infinity..Infinity)
TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:379 - Optimized plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
│ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
└─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,202 LogTransaction.java:455 - Completing txn [nb_txn_flush_68b82e70-7610-11ef-a854-a9f24348a0d4.log in /Users/ekaterina.dimitrova/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3] with last record COMMIT:[,0,0][2613697770]
TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,202 LogTransaction.java:320 - Removing files for transaction [nb_txn_flush_68b82e70-7610-11ef-a854-a9f24348a0d4.log in /Users/ekaterina.dimitrova/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3]
TRACE [ReadStage-1] 2024-09-18 18:50:32,203 QueryController.java:369 - Query execution plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
└─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
│ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
└─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
@@ -576,6 +578,34 @@ public void prettyPrint() | |||
" predicate: RANGE(pred4)\n", prettyStr); | |||
} | |||
|
|||
@Test | |||
public void prettyPrintRedacted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mocking got on the way so I will probably remove this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good start!
At a high level, I got one major concern: we're treating every query execution individually and not aggregating them in any way. So I'm afraid, it is quite likely the whole slow log will fill up with multiple executions of a single slow query (maybe with different parameters; but a query based on the same CQL string template) and all other slightly faster queries will be pushed out. That would severely limit usefulness of this feature.
Maybe consider grouping them by prepared statement ID and keep at most one (or a few?) most costly execution plans per prepared statement id in the log? you could then also keep the max / average runtime per prepared statement.
Other than that, I left some minor comments / suggestions in details of my review.
if (!redacted) | ||
sb.append(expressions.get(i)); | ||
else | ||
sb.append(expressions.get(i).toStringRedacted()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use sb.append(expressions.get(i).toStringInternal(redacted)
.
if (!redacted) | ||
sb.append(children.get(i)); | ||
else | ||
sb.append(children.get(i).toStringRedacted()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
|
||
public synchronized void setEnabled(boolean newValue) | ||
{ | ||
boolean oldValue = this.enabled.get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hint: use getAndSet
{ | ||
throw new IllegalArgumentException("sai_slow_log_num_slowest_queries must be > 0"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: you can make 0 mean "disabled". Then you don't need a separate enabled/disabled configuration variable and this check wouldn't be needed too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to, but it turned out MinMaxPriorityQueue
has this restriction:
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/MinMaxPriorityQueue.java#L200
} | ||
} | ||
|
||
private void resizeMinMaxHeap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'm not so sure here. I can imagine you may wish to generally keep this feature off (for performance / resource usage reasons), or record just a few slow queries, but then if running into trouble you may want to enable it / keep more slow queries when debugging a particular performance problem? Considering this is not adding a lot of code complexity, it is a nice usability feature to be able to tweak settings without restarting the nodes.
*/ | ||
public void addAsync(SaiSlowLog.SlowSaiQuery slowSaiQuery) | ||
{ | ||
// if the buffer is full - tough luck; we loose some slowSaiQueries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimization idea: If you published the fastest (not the slowest) time of a query in the queue seen so far in an atomic variable, then you could consult that variable and discard any queries faster than that immediately. No point in pushing queries which have no chance of getting recorded. That would likely remove a lot of load from that queue.
We should also wakeup the queue processor task here, so we don't have to accumulate queries for 1s.
That would likely decrease the probability of overflowing the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally missed here adding the lower bound check that we should not log anything below, let's say, 200ms (I think this is the default in DSE, for example).
ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size()); | ||
queriesList.addAll(slowestQueries); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size()); | |
queriesList.addAll(slowestQueries); | |
ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries); |
@@ -81,13 +83,19 @@ public QueryContext(long executionQuotaMs) | |||
{ | |||
this.executionQuotaNano = TimeUnit.MILLISECONDS.toNanos(executionQuotaMs); | |||
this.queryStartTimeNanos = System.nanoTime(); | |||
this.optimizedPlan = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Technically would better like a null
here to represent "no plan yet". Empty string would suggest something went wrong with plan toStringRecursive
.
/** | ||
* Returns the redacted string representation of this node only (used in slow query logging) | ||
*/ | ||
public final String toStringRedacted() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toStringRecursive
is for human readable stuff. I don't think we need a "redacted" flag here - I mean, I can't see a good reason for non-redacted strings. We have toString
and debuggers for that. The same rules apply for logging - why would we log full vectors of size 1500. And we definitely should not log passwords or other stuff from sensitive columns (which is an interesting problem on its own: how do we reliably tell if a value contains sensitive data that must not be logged?)
Maybe it should be named toStringPretty
?
And if you insist on really pushing the formatting options down the call chain, maybe we should have a nicer argument than a boolean - e.g. what about toStringPretty(FormattingOptions options)
?
public String toStringRedacted() | ||
{ | ||
return "<redacted>"; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Why are we redacting all parameters? Parameter values contain important information about the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can keep the numbers, I think that the only "privacy sensitive" data are strings or collections/structures that may contain strings.
we can keep the numbers and vectors for instance, but if we don't know the datatype then it is fine to redact everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happend with query with binding variables ? are they represented here with the actual value ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happend with query with binding variables ? are they represented here with the actual value ?
Yes, they are; the "test" that I added was intentional with a prepared statement.
maybe we can keep the numbers, I think that the only "privacy sensitive" data are strings or collections/structures that may contain strings.
we can keep the numbers and vectors for instance, but if we don't know the datatype then it is fine to redact everything
That sounds like a good idea. I will run it through Pushkala just in case
public String toStringRedacted() | ||
{ | ||
return "<redacted>"; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can keep the numbers, I think that the only "privacy sensitive" data are strings or collections/structures that may contain strings.
we can keep the numbers and vectors for instance, but if we don't know the datatype then it is fine to redact everything
public String toStringRedacted() | ||
{ | ||
return "<redacted>"; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happend with query with binding variables ? are they represented here with the actual value ?
// a structure that keeps the N slowest queries - don't use getter/setter so that it's not exposed over JMX | ||
private final SaiSlowestQueriesQueue slowestQueries; | ||
// KATE: make this false by default and decide on yaml property, -D flag, or JMX to enable; hook setEnabled() to change | ||
private final AtomicBoolean enabled = new AtomicBoolean(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick way to start/stop using new things
I understand your point, but if you want to go this way then we have to add an HTTP endpoint in CNDB in order to toggle this flag.
} | ||
} | ||
|
||
private void resizeMinMaxHeap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not keep code that we won't use.
as said in other comments:
- we can restart the JVM to change the value (and I am not sure that we will often change it)
- we should add new http endpoints to change the configuration and add docs about how to use this, for now we don't have a framework to handle this easily
if we want to keep the code to do the dynamic reconfig I am fine with it, but we cannot left the code there without the full story end-to-end.
return; | ||
|
||
// KATE: Remove below line to avoid logging every query; it is used just during development/testing | ||
logger.debug("Recording slow SAI query"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also in CNDB the logger is by default configured to run at DEBUG level, if you want to not print something by default you have to use TRACE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is an option but in this case I don't think I have to keep this one. It was really just to see what I am doing in the logs as I had some bug
private final long duration; | ||
private final String optimizedPlan; | ||
private final String tracingSessionId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
||
public List<SlowSaiQuery> retrieveRecentSlowestCqlQueries() | ||
{ | ||
List<SlowSaiQuery> queries = slowestQueries.getAndReset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we reset the queue ? what's the use case ?
this API will be used from Autobot, with a few users (maybe 10 people on a incident call) who are looking at the problem
if the first gets the list and then the list is lost then it is enough to refresh the page and you would have lost this precious information about the system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reset is needed in order to get most recent slow queries.
But I see your point, so I would suggest we have both - getAndReset
and only get.
WDYT?
Otherwise, we get stuck with old slow queries until next restart. During debugging, we need recent data.
Configuration/tooling needs to be finished; tests should be updated as per the comments. Any comment starting with KATE: is a placeholder for update to be done.
Looking only for general approach feedback. More details on the patch can be found in the related issue.