Sai slow logging WIP DO NOT REVIEW OR MERGE #1285

ekaterinadimitrova2 · 2024-09-16T21:36:58Z

Configuration/tooling needs to be finished; tests should be updated as per the comments. Any comment starting with KATE: is a placeholder for update to be done.

Looking only for general approach feedback. More details on the patch can be found in the related issue.

Configuration/tooling needs to be finished; tests should be updated as per the comments. Any comment starting with KATE: is a placeholder for update to be done

eolivelli · 2024-09-18T07:13:34Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+    // a structure that keeps the N slowest queries - don't use getter/setter so that it's not exposed over JMX
+    private final SaiSlowestQueriesQueue slowestQueries;
+    // KATE: make this false by default and decide on yaml property, -D flag, or JMX to enable; hook setEnabled() to change
+    private final AtomicBoolean enabled = new AtomicBoolean(true);


No need to allow to disable this at runtime.
Simpler is better

So, it is a -D flag or yaml property. However, I would like to say that it is a matter of adding two more methods to make this particular flag runtime configurable. It does not also require extensive testing. My preference is always to have some extremely quick way to start/stop using new things.

quick way to start/stop using new things

I understand your point, but if you want to go this way then we have to add an HTTP endpoint in CNDB in order to toggle this flag.

As we talked in Slack, yes, I am aware we will need HTTP point to be added when we are done here. I haven't proposed it yet as first I wanted to be sure what we really want. The PR is for early feedback

eolivelli · 2024-09-18T07:14:36Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+
+    public synchronized void setEnabled(boolean newValue)
+    {
+        boolean oldValue = this.enabled.get();


No need to use synchronized here, AtomicBoolean should already have the method to execute conditionally a function and set a new value

Hint: use getAndSet

eolivelli · 2024-09-18T07:15:13Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+        }
+    }
+
+    private void resizeMinMaxHeap()


No need to make this resizable at runtime

Well, I'm not so sure here. I can imagine you may wish to generally keep this feature off (for performance / resource usage reasons), or record just a few slow queries, but then if running into trouble you may want to enable it / keep more slow queries when debugging a particular performance problem? Considering this is not adding a lot of code complexity, it is a nice usability feature to be able to tweak settings without restarting the nodes.

I would prefer to not keep code that we won't use.

as said in other comments:

we can restart the JVM to change the value (and I am not sure that we will often change it)

we should add new http endpoints to change the configuration and add docs about how to use this, for now we don't have a framework to handle this easily

if we want to keep the code to do the dynamic reconfig I am fine with it, but we cannot left the code there without the full story end-to-end.

if we want to keep the code to do the dynamic reconfig I am fine with it, but we cannot left the code there without the full story end-to-end.
I agree and this was never suggested. I did not have things exposed yet because this was not finished and I wanted to be sure that is what we want first before exposing config in nodetool, JMX, HTTP endpoint

eolivelli · 2024-09-18T07:16:35Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowestQueriesQueue.java

+     * Resize the priority queue. It will now hold up to newMaxSize slow queries
+     * @param newMaxSize the new max size of the priority queue
+     */
+    void resize(int newMaxSize)


Let's keep it simple. No need for this

eolivelli · 2024-09-18T07:18:12Z

src/java/org/apache/cassandra/index/sai/plan/StorageAttachedIndexSearcher.java

@@ -127,6 +127,7 @@ public UnfilteredPartitionIterator search(ReadExecutionController executionContr
        // Can't check for `command.isTopK()` because the planner could optimize sorting out
        if (plan.ordering() != null)
        {
+            queryContext.setOptimizedPlan(plan.toStringRecursive());


If this String may contain user data then we have to redact it otherwise we must ask to the customer the permission to use this tool that means that we will use it only in tedt and dev end and it would be a pity

Well, my understanding was at first we want the slowest queries in test/dev to easy identify and debug them while stress testing.

I will look into the redacting, thanks.

ekaterinadimitrova2 · 2024-09-18T23:25:56Z

src/java/org/apache/cassandra/index/sai/plan/Expression.java

@@ -345,6 +345,7 @@ public boolean isSatisfiedBy(ByteBuffer columnValue)
    /**
     * Returns the lower bound of the expression as a ByteComparable with an encoding based on the version and the
     * validator.
+     *


Those were not intentionally added... something from the IDE formatting, I will revert

ekaterinadimitrova2 · 2024-09-18T23:26:19Z

src/java/org/apache/cassandra/index/sai/plan/Expression.java

+
+

Revert those lines later

ekaterinadimitrova2 · 2024-09-18T23:27:33Z

src/java/org/apache/cassandra/index/sai/plan/Plan.java

+    /**
+     * Returns the redacted string representation of this node only (used in slow query logging)
+     */
+    public final String toStringRedacted()


just add a flag for redacted or not in toStringRecursive() - toStringRecursive(boolean redacted)
I am tempted to do the same with toString, just add a flag, but I think we should keep the toString override as we may not have coverage to find all the places where it may pop up and print ugly things.

toStringRecursive is for human readable stuff. I don't think we need a "redacted" flag here - I mean, I can't see a good reason for non-redacted strings. We have toString and debuggers for that. The same rules apply for logging - why would we log full vectors of size 1500. And we definitely should not log passwords or other stuff from sensitive columns (which is an interesting problem on its own: how do we reliably tell if a value contains sensitive data that must not be logged?)

Maybe it should be named toStringPretty?
And if you insist on really pushing the formatting options down the call chain, maybe we should have a nicer argument than a boolean - e.g. what about toStringPretty(FormattingOptions options)?

I don't understand what you are saying here.
Do you say even the current toStringRecursive should be redacted? It seems we print the trees only in trace logs and we never enable trace in prod in CNDB. Though, I am not sure it is not a problem for HCD. I think it is a good idea to ask Pushkala. Here, we log values even from prepared statements.

toStringPretty(FormattingOptions options)
What would be FormattingOptions? I think I am not sure what you have in mind.

ekaterinadimitrova2 · 2024-09-18T23:28:49Z

test/conf/logback-test.xml

@@ -48,7 +48,7 @@
      <pattern>%-5level [%thread] %date{ISO8601} %F:%L - %msg%n</pattern>
    </encoder>
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
-      <level>DEBUG</level>
+      <level>TRACE</level>


Just for testing, revert later

ekaterinadimitrova2 · 2024-09-18T23:31:18Z

test/unit/org/apache/cassandra/index/sai/plan/QueryPlanRedactionTest.java

+public class QueryPlanRedactionTest extends SAITester
+{
+    @Before
+    public void setup()
+    {
+        requireNetwork();
+    }
+
+    private static final String KEYSPACE = "prepared_stmt_cleanup";
+    private static final String createKsStatement = "CREATE KEYSPACE " + KEYSPACE +
+                                                    " WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };";
+
+    @Test
+    public void testQuery()
+    {
+        String selectCQL2 = "SELECT * FROM t WHERE val : 'missing' OR (val : ? AND val : ?)";
+        Session session = getSession(ProtocolVersion.V5);
+        session.execute(createKsStatement);
+        session.execute("Use prepared_stmt_cleanup");
+        session.execute("CREATE TABLE t (id text PRIMARY KEY, val text)");
+        session.execute("CREATE CUSTOM INDEX ON t(val) USING 'StorageAttachedIndex' WITH OPTIONS = {" +
+                        "'index_analyzer': '{\n" +
+                        "\t\"tokenizer\":{\"name\":\"ngram\", \"args\":{\"minGramSize\":\"2\", \"maxGramSize\":\"3\"}}," +
+                        "\t\"filters\":[{\"name\":\"lowercase\"}]\n" +
+                        "}'," +
+                        "'query_analyzer': '{\n" +
+                        "\t\"tokenizer\":{\"name\":\"whitespace\"},\n" +
+                        "\t\"filters\":[{\"name\":\"porterstem\"}]\n" +
+                        "}'};");
+
+        PreparedStatement preparedSelect2 = session.prepare(selectCQL2);
+        session.execute(preparedSelect2.bind("'quick'", "'dog'")).all().size();
+    }
+}


This is not a proper test, it was just to help me print while working on the patch. Proper testing will be added when we agree on the actual queue (latest vs slowest, singleton, etc)

This prints the following example for redacting in the logs:

TRACE [ReadStage-1] 2024-09-18 18:50:32,200 Plan.java:361 - Optimizing plan: Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) │ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} └─ Intersection (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) │ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} TRACE [ReadStage-1] 2024-09-18 18:50:32,201 Plan.java:372 - Candidate query plan: Limit 5000 (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity) └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: 0.000000000) (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity) └─ Fetch (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity) └─ Everything (keys: 0.0, cost/key: 0.0, cost: Infinity..Infinity) TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,201 LifecycleTransaction.java:239 - Committing transaction over [] staged: [obsolete: [], update: []], logged: [obsolete: [], update: []] TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:372 - Candidate query plan: Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) │ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:372 - Candidate query plan: Limit 5000 (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity) └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: 0.000000000) (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity) └─ Fetch (rows: 0.0, cost/row: 0.0, cost: Infinity..Infinity) └─ Everything (keys: 0.0, cost/key: 0.0, cost: Infinity..Infinity) TRACE [ReadStage-1] 2024-09-18 18:50:32,202 Plan.java:379 - Optimized plan: Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) │ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,202 LogTransaction.java:455 - Completing txn [nb_txn_flush_68b82e70-7610-11ef-a854-a9f24348a0d4.log in /Users/ekaterina.dimitrova/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3] with last record COMMIT:[,0,0][2613697770] TRACE [MemtableFlushWriter:2] 2024-09-18 18:50:32,202 LogTransaction.java:320 - Removing files for transaction [nb_txn_flush_68b82e70-7610-11ef-a854-a9f24348a0d4.log in /Users/ekaterina.dimitrova/cassandra/build/test/cassandra/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3] TRACE [ReadStage-1] 2024-09-18 18:50:32,203 QueryController.java:369 - Query execution plan: Limit 5000 (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Filter (val : <redacted> OR (val : <redacted> AND val : <redacted>)) (sel: NaN) (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Fetch (rows: 0.0, cost/row: 0.0, cost: 0.0..0.0) └─ Union (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) ├─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) │ predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []} └─ LiteralIndexScan of t_val_idx (sel: 0.000000000, step: 0.0) (keys: 0.0, cost/key: 0.0, cost: 0.0..0.0) predicate: Expression{name: val, op: MATCH, lower: <redacted>, upper: <redacted>, exclusions: []}

ekaterinadimitrova2 · 2024-09-19T01:24:22Z

test/unit/org/apache/cassandra/index/sai/plan/PlanTest.java

@@ -576,6 +578,34 @@ public void prettyPrint()
                     "                    predicate: RANGE(pred4)\n", prettyStr);
    }

+    @Test
+    public void prettyPrintRedacted()


Mocking got on the way so I will probably remove this one

pkolaczk

A good start!

At a high level, I got one major concern: we're treating every query execution individually and not aggregating them in any way. So I'm afraid, it is quite likely the whole slow log will fill up with multiple executions of a single slow query (maybe with different parameters; but a query based on the same CQL string template) and all other slightly faster queries will be pushed out. That would severely limit usefulness of this feature.

Maybe consider grouping them by prepared statement ID and keep at most one (or a few?) most costly execution plans per prepared statement id in the log? you could then also keep the max / average runtime per prepared statement.

Other than that, I left some minor comments / suggestions in details of my review.

pkolaczk · 2024-09-19T06:43:55Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+                if (!redacted)
+                    sb.append(expressions.get(i));
+                else
+                    sb.append(expressions.get(i).toStringRedacted());
+


Could use sb.append(expressions.get(i).toStringInternal(redacted).

pkolaczk · 2024-09-19T06:44:26Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+                if (!redacted)
+                    sb.append(children.get(i));
+                else
+                    sb.append(children.get(i).toStringRedacted());


Same as above

pkolaczk · 2024-09-19T06:54:55Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+
+    public synchronized void setEnabled(boolean newValue)
+    {
+        boolean oldValue = this.enabled.get();


Hint: use getAndSet

pkolaczk · 2024-09-19T06:57:11Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+            {
+                throw new IllegalArgumentException("sai_slow_log_num_slowest_queries must be > 0");
+            }


Suggestion: you can make 0 mean "disabled". Then you don't need a separate enabled/disabled configuration variable and this check wouldn't be needed too.

I wanted to, but it turned out MinMaxPriorityQueue has this restriction:
https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/MinMaxPriorityQueue.java#L200

pkolaczk · 2024-09-19T07:00:56Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+        }
+    }
+
+    private void resizeMinMaxHeap()


Well, I'm not so sure here. I can imagine you may wish to generally keep this feature off (for performance / resource usage reasons), or record just a few slow queries, but then if running into trouble you may want to enable it / keep more slow queries when debugging a particular performance problem? Considering this is not adding a lot of code complexity, it is a nice usability feature to be able to tweak settings without restarting the nodes.

pkolaczk · 2024-09-19T08:33:06Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowestQueriesQueue.java

+     */
+    public void addAsync(SaiSlowLog.SlowSaiQuery slowSaiQuery)
+    {
+        // if the buffer is full - tough luck; we loose some slowSaiQueries


Optimization idea: If you published the fastest (not the slowest) time of a query in the queue seen so far in an atomic variable, then you could consult that variable and discard any queries faster than that immediately. No point in pushing queries which have no chance of getting recorded. That would likely remove a lot of load from that queue.

We should also wakeup the queue processor task here, so we don't have to accumulate queries for 1s.
That would likely decrease the probability of overflowing the buffer.

I totally missed here adding the lower bound check that we should not log anything below, let's say, 200ms (I think this is the default in DSE, for example).

pkolaczk · 2024-09-19T08:36:36Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowestQueriesQueue.java

+        ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size());
+        queriesList.addAll(slowestQueries);


Suggested change

ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size());

queriesList.addAll(slowestQueries);

ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries);

pkolaczk · 2024-09-19T08:42:24Z

src/java/org/apache/cassandra/index/sai/QueryContext.java

@@ -81,13 +83,19 @@ public QueryContext(long executionQuotaMs)
    {
        this.executionQuotaNano = TimeUnit.MILLISECONDS.toNanos(executionQuotaMs);
        this.queryStartTimeNanos = System.nanoTime();
+        this.optimizedPlan = "";


nit: Technically would better like a null here to represent "no plan yet". Empty string would suggest something went wrong with plan toStringRecursive.

pkolaczk · 2024-09-19T08:45:49Z

src/java/org/apache/cassandra/index/sai/plan/Plan.java

+    /**
+     * Returns the redacted string representation of this node only (used in slow query logging)
+     */
+    public final String toStringRedacted()


toStringRecursive is for human readable stuff. I don't think we need a "redacted" flag here - I mean, I can't see a good reason for non-redacted strings. We have toString and debuggers for that. The same rules apply for logging - why would we log full vectors of size 1500. And we definitely should not log passwords or other stuff from sensitive columns (which is an interesting problem on its own: how do we reliably tell if a value contains sensitive data that must not be logged?)

Maybe it should be named toStringPretty?
And if you insist on really pushing the formatting options down the call chain, maybe we should have a nicer argument than a boolean - e.g. what about toStringPretty(FormattingOptions options)?

pkolaczk · 2024-09-19T10:24:08Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+        public String toStringRedacted()
+        {
+            return "<redacted>";
+        }
+


Question: Why are we redacting all parameters? Parameter values contain important information about the query.

maybe we can keep the numbers, I think that the only "privacy sensitive" data are strings or collections/structures that may contain strings.
we can keep the numbers and vectors for instance, but if we don't know the datatype then it is fine to redact everything

what happend with query with binding variables ? are they represented here with the actual value ?

what happend with query with binding variables ? are they represented here with the actual value ?
Yes, they are; the "test" that I added was intentional with a prepared statement.

maybe we can keep the numbers, I think that the only "privacy sensitive" data are strings or collections/structures that may contain strings. we can keep the numbers and vectors for instance, but if we don't know the datatype then it is fine to redact everything

That sounds like a good idea. I will run it through Pushkala just in case

eolivelli · 2024-09-19T12:12:48Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+        public String toStringRedacted()
+        {
+            return "<redacted>";
+        }
+


maybe we can keep the numbers, I think that the only "privacy sensitive" data are strings or collections/structures that may contain strings.
we can keep the numbers and vectors for instance, but if we don't know the datatype then it is fine to redact everything

eolivelli · 2024-09-19T12:13:42Z

src/java/org/apache/cassandra/db/filter/RowFilter.java

+        public String toStringRedacted()
+        {
+            return "<redacted>";
+        }
+


what happend with query with binding variables ? are they represented here with the actual value ?

eolivelli · 2024-09-19T12:15:53Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+    // a structure that keeps the N slowest queries - don't use getter/setter so that it's not exposed over JMX
+    private final SaiSlowestQueriesQueue slowestQueries;
+    // KATE: make this false by default and decide on yaml property, -D flag, or JMX to enable; hook setEnabled() to change
+    private final AtomicBoolean enabled = new AtomicBoolean(true);


quick way to start/stop using new things

I understand your point, but if you want to go this way then we have to add an HTTP endpoint in CNDB in order to toggle this flag.

eolivelli · 2024-09-19T12:20:45Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+        }
+    }
+
+    private void resizeMinMaxHeap()


I would prefer to not keep code that we won't use.

as said in other comments:

we can restart the JVM to change the value (and I am not sure that we will often change it)

we should add new http endpoints to change the configuration and add docs about how to use this, for now we don't have a framework to handle this easily

if we want to keep the code to do the dynamic reconfig I am fine with it, but we cannot left the code there without the full story end-to-end.

eolivelli · 2024-09-19T12:21:23Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+            return;
+
+        // KATE: Remove below line to avoid logging every query; it is used just during development/testing
+        logger.debug("Recording slow SAI query");


also in CNDB the logger is by default configured to run at DEBUG level, if you want to not print something by default you have to use TRACE

Yeah, it is an option but in this case I don't think I have to keep this one. It was really just to see what I am doing in the logs as I had some bug

eolivelli · 2024-09-19T12:22:04Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+        private final long duration;
+        private final String optimizedPlan;
+        private final String tracingSessionId;


eolivelli · 2024-09-19T12:26:21Z

src/java/org/apache/cassandra/db/monitoring/SaiSlowLog.java

+
+    public List<SlowSaiQuery> retrieveRecentSlowestCqlQueries()
+    {
+        List<SlowSaiQuery> queries = slowestQueries.getAndReset();


why do we reset the queue ? what's the use case ?

this API will be used from Autobot, with a few users (maybe 10 people on a incident call) who are looking at the problem
if the first gets the list and then the list is lost then it is enough to refresh the page and you would have lost this precious information about the system

Reset is needed in order to get most recent slow queries.
But I see your point, so I would suggest we have both - getAndReset and only get. WDYT?
Otherwise, we get stuck with old slow queries until next restart. During debugging, we need recent data.

Slow SAI queries logging

8791cc3

Configuration/tooling needs to be finished; tests should be updated as per the comments. Any comment starting with KATE: is a placeholder for update to be done

ekaterinadimitrova2 self-assigned this Sep 16, 2024

eolivelli reviewed Sep 18, 2024

View reviewed changes

WIP redaction

e9f120a

ekaterinadimitrova2 commented Sep 18, 2024

View reviewed changes

ekaterinadimitrova2 commented Sep 19, 2024

View reviewed changes

pkolaczk reviewed Sep 19, 2024

View reviewed changes

eolivelli reviewed Sep 19, 2024

View reviewed changes

		ArrayList<SaiSlowLog.SlowSaiQuery> queriesList = new ArrayList<>(slowestQueries.size());
		queriesList.addAll(slowestQueries);

Sai slow logging WIP DO NOT REVIEW OR MERGE #1285

Are you sure you want to change the base?

Sai slow logging WIP DO NOT REVIEW OR MERGE #1285

Conversation

ekaterinadimitrova2 commented Sep 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekaterinadimitrova2 Sep 18, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pkolaczk left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekaterinadimitrova2 commented Sep 16, 2024 •

edited

Loading

ekaterinadimitrova2 Sep 18, 2024 •

edited

Loading