
Sai slow logging WIP DO NOT REVIEW OR MERGE #1285

Draft
wants to merge 2 commits into base: main

Conversation

@ekaterinadimitrova2 (Collaborator) commented Sep 16, 2024

Configuration/tooling needs to be finished; tests should be updated as per the comments. Any comment starting with KATE: is a placeholder for an update to be done.

Looking only for general approach feedback. More details on the patch can be found in the related issue.

@ekaterinadimitrova2 self-assigned this Sep 16, 2024
// a structure that keeps the N slowest queries - don't use getter/setter so that it's not exposed over JMX
private final SaiSlowestQueriesQueue slowestQueries;
// KATE: make this false by default and decide on yaml property, -D flag, or JMX to enable; hook setEnabled() to change
private final AtomicBoolean enabled = new AtomicBoolean(true);


No need to allow disabling this at runtime. Simpler is better.

Collaborator Author

So, it is a -D flag or yaml property. That said, making this particular flag runtime-configurable is just a matter of adding two more methods, and it does not require extensive testing either. My preference is always to have an extremely quick way to start/stop using new things.


quick way to start/stop using new things

I understand your point, but if you want to go this way then we have to add an HTTP endpoint in CNDB in order to toggle this flag.

Collaborator Author

As we discussed in Slack, yes, I am aware an HTTP endpoint will need to be added when we are done here. I haven't proposed it yet because I first wanted to be sure what we really want. The PR is for early feedback.


public synchronized void setEnabled(boolean newValue)
{
boolean oldValue = this.enabled.get();


No need to use synchronized here; AtomicBoolean already has methods to conditionally execute a function and set a new value.


Hint: use getAndSet
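
For example, a minimal sketch of the suggested change (the enabled field comes from the diff above; what to do when the flag flips is left open):

public void setEnabled(boolean newValue)
{
    // getAndSet atomically swaps the value and returns the previous one, so no synchronized is needed
    boolean oldValue = enabled.getAndSet(newValue);
    if (oldValue != newValue)
    {
        // react only when the flag actually changes (e.g. clear the queue); depends on the rest of the patch
    }
}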

}
}

private void resizeMinMaxHeap()


No need to make this resizable at runtime


Well, I'm not so sure here. I can imagine you may wish to generally keep this feature off (for performance / resource usage reasons), or record just a few slow queries, but then if running into trouble you may want to enable it / keep more slow queries when debugging a particular performance problem? Considering this is not adding a lot of code complexity, it is a nice usability feature to be able to tweak settings without restarting the nodes.


I would prefer to not keep code that we won't use.

As said in other comments:

  • we can restart the JVM to change the value (and I am not sure that we will often change it)
  • we should add new HTTP endpoints to change the configuration and add docs about how to use this; for now we don't have a framework to handle this easily

If we want to keep the code for dynamic reconfiguration I am fine with it, but we cannot leave the code there without the full story end-to-end.

Collaborator Author

If we want to keep the code for dynamic reconfiguration I am fine with it, but we cannot leave the code there without the full story end-to-end.
I agree, and this was never suggested. I did not have things exposed yet because this was not finished, and I wanted to be sure this is what we want before exposing config via nodetool, JMX, or an HTTP endpoint.

* Resize the priority queue. It will now hold up to newMaxSize slow queries
* @param newMaxSize the new max size of the priority queue
*/
void resize(int newMaxSize)


Let's keep it simple. No need for this

@@ -127,6 +127,7 @@ public UnfilteredPartitionIterator search(ReadExecutionController executionContr
// Can't check for `command.isTopK()` because the planner could optimize sorting out
if (plan.ordering() != null)
{
queryContext.setOptimizedPlan(plan.toStringRecursive());


If this String may contain user data then we have to redact it; otherwise we must ask the customer for permission to use this tool, which means we would only use it in test and dev environments, and that would be a pity.

Collaborator Author

Well, my understanding was that, at first, we want the slowest queries in test/dev so we can easily identify and debug them while stress testing.

I will look into the redacting, thanks.

@@ -345,6 +345,7 @@ public boolean isSatisfiedBy(ByteBuffer columnValue)
/**
* Returns the lower bound of the expression as a ByteComparable with an encoding based on the version and the
* validator.
*
Collaborator Author

Those were not intentionally added; it's something from the IDE formatting. I will revert.

Comment on lines +568 to +569


Collaborator Author

Revert those lines later

/**
* Returns the redacted string representation of this node only (used in slow query logging)
*/
public final String toStringRedacted()
Collaborator Author
@ekaterinadimitrova2 Sep 18, 2024

Just add a flag for redacted or not in toStringRecursive(), i.e. toStringRecursive(boolean redacted).
I am tempted to do the same with toString, just add a flag, but I think we should keep the toString override, as we may not have coverage to find all the places where it may pop up and print ugly things.


toStringRecursive is for human readable stuff. I don't think we need a "redacted" flag here - I mean, I can't see a good reason for non-redacted strings. We have toString and debuggers for that. The same rules apply for logging - why would we log full vectors of size 1500. And we definitely should not log passwords or other stuff from sensitive columns (which is an interesting problem on its own: how do we reliably tell if a value contains sensitive data that must not be logged?)

Maybe it should be named toStringPretty?
And if you insist on really pushing the formatting options down the call chain, maybe we should have a nicer argument than a boolean - e.g. what about toStringPretty(FormattingOptions options)?

Collaborator Author

I don't quite understand what you are saying here.
Are you saying even the current toStringRecursive should be redacted? It seems we print the trees only in trace logs, and we never enable trace in prod in CNDB. Though, I am not sure it is not a problem for HCD; I think it is a good idea to ask Pushkala. Here, we log values even from prepared statements.

toStringPretty(FormattingOptions options)
What would FormattingOptions be? I am not sure what you have in mind.
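
For illustration only, one possible shape of such an options object (purely hypothetical; nothing beyond the toStringPretty(FormattingOptions) signature appears in this thread):

public final class FormattingOptions
{
    public final boolean redactValues;   // replace literals / bound values with <redacted>
    public final int maxValueLength;     // truncate long values, e.g. large vectors

    public FormattingOptions(boolean redactValues, int maxValueLength)
    {
        this.redactValues = redactValues;
        this.maxValueLength = maxValueLength;
    }
}

// e.g. plan.toStringPretty(new FormattingOptions(true, 64))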

@@ -48,7 +48,7 @@
<pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern>
</encoder>
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>DEBUG</level>
<level>TRACE</level>
Collaborator Author

Just for testing, revert later

Comment on lines +29 to +62
public class QueryPlanRedactionTest extends SAITester
{
@Before
public void setup()
{
requireNetwork();
}

private static final String KEYSPACE = "prepared_stmt_cleanup";
private static final String createKsStatement = "CREATE KEYSPACE " + KEYSPACE +
" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };";

@Test
public void testQuery()
{
String selectCQL2 = "SELECT * FROM t WHERE val : 'missing' OR (val : ? AND val : ?)";
Session session = getSession(ProtocolVersion.V5);
session.execute(createKsStatement);
session.execute("Use prepared_stmt_cleanup");
session.execute("CREATE TABLE t (id text PRIMARY KEY, val text)");
session.execute("CREATE CUSTOM INDEX ON t(val) USING 'StorageAttachedIndex' WITH OPTIONS = {" +
"'index_analyzer': '{\n" +
"\t\"tokenizer\":{\"name\":\"ngram\", \"args\":{\"minGramSize\":\"2\", \"maxGramSize\":\"3\"}}," +
"\t\"filters\":[{\"name\":\"lowercase\"}]\n" +
"}'," +
"'query_analyzer': '{\n" +
"\t\"tokenizer\":{\"name\":\"whitespace\"},\n" +
"\t\"filters\":[{\"name\":\"porterstem\"}]\n" +
"}'};");

PreparedStatement preparedSelect2 = session.prepare(selectCQL2);
session.execute(preparedSelect2.bind("'quick'", "'dog'")).all().size();
}
}
Collaborator Author

This is not a proper test, it was just to help me print while working on the patch. Proper testing will be added when we agree on the actual queue (latest vs slowest, singleton, etc)

This prints the following example of the redaction in the logs:

TRACE [ReadStage-1] 2024-09-18 18:50:32,200 Plan.java:361 - Optimizing plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
 └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
     └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
         └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             │  predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
             └─ Intersection (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
                 ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
                 │  predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
                 └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
                    predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}

TRACE [ReadStage-1] 2024-09-18 18:50:32,201 Plan.java:372 - Candidate query plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
 └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: 0.000000000) (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
     └─ Fetch (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
         └─ Everything (keys: 0.0, cost/key: 0.0, cost: Infinity..Infinity)

TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,201 LifecycleTransaction.java:239 - Committing transaction over [] staged: [obsolete: [], update: []], logged: [obsolete: [], update: []]
TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:372 - Candidate query plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
 └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
     └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
         └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             │  predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
             └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
                predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}

TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:372 - Candidate query plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
 └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: 0.000000000) (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
     └─ Fetch (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity)
         └─ Everything (keys: 0.0, cost/key: 0.0, cost: Infinity..Infinity)

TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:379 - Optimized plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
 └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
     └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
         └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             │  predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
             └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
                predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}

TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,202 LogTransaction.java:455 - Completing txn [nb_txn_flush_68b82e70-7610-11ef-a854-a9f24348a0d4.log in /Users/ekaterina.dimitrova/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3] with last record COMMIT:[,0,0][2613697770]
TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,202 LogTransaction.java:320 - Removing files for transaction [nb_txn_flush_68b82e70-7610-11ef-a854-a9f24348a0d4.log in /Users/ekaterina.dimitrova/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3]
TRACE [ReadStage-1] 2024-09-18 18:50:32,203 QueryController.java:369 - Query execution plan:
Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
 └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
     └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0)
         └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
             │  predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}
             └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0)
                predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}

@@ -576,6 +578,34 @@ public void prettyPrint()
" predicate: RANGE(pred4)\n", prettyStr);
}

@Test
public void prettyPrintRedacted()
Collaborator Author

Mocking got in the way, so I will probably remove this one.

@pkolaczk left a comment

A good start!

At a high level, I have one major concern: we're treating every query execution individually and not aggregating them in any way. So I'm afraid it is quite likely that the whole slow log will fill up with multiple executions of a single slow query (maybe with different parameters, but based on the same CQL string template) and all other slightly faster queries will be pushed out. That would severely limit the usefulness of this feature.

Maybe consider grouping them by prepared statement ID and keeping at most one (or a few?) of the most costly execution plans per prepared statement id in the log? You could then also keep the max / average runtime per prepared statement.
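
A rough sketch of that grouping idea, as fields/methods inside the slow-log class (all names below are illustrative, not from the patch; MD5Digest is the type Cassandra uses for prepared statement ids):

import java.util.concurrent.ConcurrentHashMap;
import org.apache.cassandra.utils.MD5Digest;

// Keep only the most costly execution per prepared statement id, plus simple aggregates.
private final ConcurrentHashMap<MD5Digest, SlowQueryStats> byStatementId = new ConcurrentHashMap<>();

void record(MD5Digest statementId, long durationNanos, String plan)
{
    byStatementId.compute(statementId, (id, stats) -> {
        if (stats == null)
            return new SlowQueryStats(durationNanos, plan);
        stats.count++;
        stats.totalNanos += durationNanos;     // average runtime = totalNanos / count
        if (durationNanos > stats.maxNanos)
        {
            stats.maxNanos = durationNanos;
            stats.worstPlan = plan;            // keep only the plan of the slowest execution
        }
        return stats;
    });
}

static final class SlowQueryStats
{
    long count;
    long totalNanos;
    long maxNanos;
    String worstPlan;

    SlowQueryStats(long durationNanos, String plan)
    {
        this.count = 1;
        this.totalNanos = durationNanos;
        this.maxNanos = durationNanos;
        this.worstPlan = plan;
    }
}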

Other than that, I left some minor comments and suggestions in the details of my review.

Comment on lines +502 to +506
if (!redacted)
sb.append(expressions.get(i));
else
sb.append(expressions.get(i).toStringRedacted());


Could use sb.append(expressions.get(i).toStringInternal(redacted)).

Comment on lines +513 to +516
if (!redacted)
sb.append(children.get(i));
else
sb.append(children.get(i).toStringRedacted());


Same as above


Comment on lines +87 to +89
{
throw new IllegalArgumentException("sai_slow_log_num_slowest_queries must be > 0");
}


Suggestion: you can make 0 mean "disabled". Then you don't need a separate enabled/disabled configuration variable, and this check wouldn't be needed either.

Collaborator Author

I wanted to, but it turned out MinMaxPriorityQueue has this restriction:
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/MinMaxPriorityQueue.java#L200
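
One way to still let 0 mean "disabled" despite that check would be to not build the Guava queue at all in that case (a sketch only; the constructor shape and field names are assumptions):

import java.util.Comparator;
import com.google.common.collect.MinMaxPriorityQueue;

// sai_slow_log_num_slowest_queries == 0  ->  feature off, the bounded queue is never created
private final MinMaxPriorityQueue<SaiSlowLog.SlowSaiQuery> slowestQueries;

SaiSlowestQueriesQueue(int maxSize, Comparator<SaiSlowLog.SlowSaiQuery> byDuration)
{
    this.slowestQueries = maxSize > 0
                          ? MinMaxPriorityQueue.orderedBy(byDuration).maximumSize(maxSize).create()
                          : null;
}

boolean isEnabled()
{
    return slowestQueries != null;
}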


*/
public void addAsync(SaiSlowLog.SlowSaiQuery slowSaiQuery)
{
// if the buffer is full - tough luck; we lose some slowSaiQueries


Optimization idea: If you published the fastest (not the slowest) time of a query in the queue seen so far in an atomic variable, then you could consult that variable and discard any queries faster than that immediately. No point in pushing queries which have no chance of getting recorded. That would likely remove a lot of load from that queue.

We should also wake up the queue processor task here, so we don't have to accumulate queries for up to 1s. That would likely decrease the probability of overflowing the buffer.

Collaborator Author

I totally missed adding the lower bound check here: we should not log anything below, let's say, 200ms (I think this is the default in DSE, for example).
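
A sketch of how both checks could look on the producer side (the 200ms floor, the field names, and the wake-up hook are assumptions, not part of the patch):

import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// static floor: never record anything faster than this (200ms used here only as an example)
private static final long MIN_DURATION_NANOS = TimeUnit.MILLISECONDS.toNanos(200);
// updated by the consumer once the queue is full; 0 means "no threshold yet"
private final AtomicLong fastestRecordedNanos = new AtomicLong(0);

public void addAsync(SaiSlowLog.SlowSaiQuery slowSaiQuery)
{
    long durationNanos = slowSaiQuery.duration();      // assumed accessor
    if (durationNanos < MIN_DURATION_NANOS)
        return;                                        // below the floor, never interesting
    if (durationNanos <= fastestRecordedNanos.get())
        return;                                        // cannot displace anything already in the full queue
    buffer.offer(slowSaiQuery);                        // best effort; drop on overflow as before
    wakeUpProcessor();                                 // hypothetical hook so entries don't wait up to 1s
}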

Comment on lines +140 to +141
ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size());
queriesList.addAll(slowestQueries);


Suggested change:
- ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size());
- queriesList.addAll(slowestQueries);
+ ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries);

@@ -81,13 +83,19 @@ public QueryContext(long executionQuotaMs)
{
this.executionQuotaNano = TimeUnit.MILLISECONDS.toNanos(executionQuotaMs);
this.queryStartTimeNanos = System.nanoTime();
this.optimizedPlan = "";


nit: Technically I would prefer a null here to represent "no plan yet". An empty string would suggest something went wrong with plan toStringRecursive.


Comment on lines +816 to +820
public String toStringRedacted()
{
return "<redacted>";
}


Question: Why are we redacting all parameters? Parameter values contain important information about the query.


Maybe we can keep the numbers; I think the only "privacy sensitive" data are strings or collections/structures that may contain strings. We can keep the numbers and vectors, for instance, but if we don't know the datatype then it is fine to redact everything.


What happens with queries with binding variables? Are they represented here with the actual value?

Collaborator Author

What happens with queries with binding variables? Are they represented here with the actual value?

Yes, they are; the "test" that I added intentionally uses a prepared statement.

Maybe we can keep the numbers; I think the only "privacy sensitive" data are strings or collections/structures that may contain strings. We can keep the numbers and vectors, for instance, but if we don't know the datatype then it is fine to redact everything.

That sounds like a good idea. I will run it through Pushkala just in case.
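
A minimal sketch of that type-aware approach, assuming the expression knows its column's AbstractType and the raw ByteBuffer value (which types count as sensitive is exactly what to confirm with Pushkala):

import java.nio.ByteBuffer;
import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.AsciiType;
import org.apache.cassandra.db.marshal.CollectionType;
import org.apache.cassandra.db.marshal.UTF8Type;

// Redact only types that can carry free-form user text; keep numbers, vectors, timestamps, etc.
static String toStringMaybeRedacted(AbstractType<?> type, ByteBuffer value)
{
    boolean sensitive = type instanceof UTF8Type
                        || type instanceof AsciiType
                        || type instanceof CollectionType;   // collections may contain strings
    return sensitive ? "<redacted>" : type.getString(value);
}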


return;

// KATE: Remove below line to avoid logging every query; it is used just during development/testing
logger.debug("Recording slow SAI query");


Also, in CNDB the logger is configured by default to run at DEBUG level; if you want something not to be printed by default, you have to use TRACE.

Collaborator Author

Yeah, it is an option, but in this case I don't think I have to keep this one. It was really just to see what I was doing in the logs while I had a bug.

Comment on lines +130 to +132
private final long duration;
private final String optimizedPlan;
private final String tracingSessionId;


+1


public List<SlowSaiQuery> retrieveRecentSlowestCqlQueries()
{
List<SlowSaiQuery> queries = slowestQueries.getAndReset();


Why do we reset the queue? What's the use case?

This API will be used from Autobot, with a few users (maybe 10 people on an incident call) looking at the problem.
If the first user gets the list and the list is then reset, simply refreshing the page means the others have lost this precious information about the system.

Collaborator Author

Reset is needed in order to get the most recent slow queries; otherwise we get stuck with old slow queries until the next restart, and during debugging we need recent data.
But I see your point, so I would suggest we have both - getAndReset and a plain get. WDYT?
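
A sketch of exposing both (method names and the synchronization are illustrative; the real class may already guard the queue differently):

import java.util.ArrayList;
import java.util.List;

// Non-destructive read: safe for several people looking at the same incident.
public List<SlowSaiQuery> peekSlowestQueries()
{
    synchronized (slowestQueries)
    {
        return new ArrayList<>(slowestQueries);
    }
}

// Destructive read: "what has been slow since I last looked".
public List<SlowSaiQuery> getAndResetSlowestQueries()
{
    synchronized (slowestQueries)
    {
        List<SlowSaiQuery> snapshot = new ArrayList<>(slowestQueries);
        slowestQueries.clear();
        return snapshot;
    }
}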
