Performance Metrics #38
Conversation
             token_budget=None,
             new_info_threshold=0.7,
             convo=False,
             timeit=False):
Please add this to the parameters (input args) docstrings. Please also make sure there are line breaks before Parameters and Returns (looks like those line breaks got removed here, not sure why).
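For example, something along these lines (just a sketch; the summary line and the other parameter descriptions are placeholders, only the timeit entry is copied from this diff):

def engineer_query(self, query, token_budget=None, new_info_threshold=0.7,
                   convo=False, timeit=False):
    """(one-line summary)

    Parameters
    ----------
    query : str
        ...
    timeit : bool
        Flag to return the performance metrics on API calls.

    Returns
    -------
    ...
    """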
@@ -87,6 +90,9 @@ def engineer_query(self, query, token_budget=None, new_info_threshold=0.7,
        references : list
            The list of references (strs) used in the engineered prompt is
            returned here
        vector_query_time : float
I know you didn't do this, but can you add a docstring for used_index here? Not sure why we don't have one.
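It could be as small as an entry like this under Returns (the type and description below are guesses and would need to be checked against what the method actually returns):

        used_index : np.ndarray
            Indices of the text corpus that were used to build the
            engineered prompt (placeholder description, please verify).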
@@ -184,7 +192,8 @@ def chat(self, query,
            valid ref_col.
        return_chat_obj : bool
            Flag to only return the ChatCompletion from OpenAI API.
        timeit : bool
            Flag to return the performance metrics on API calls.

        Returns
Same general comments about docstrings here.
        total_chat_time = end_time - start_chat_time
        performance = {
            "total_chat_time": total_chat_time,
            "chat_completion_time": chat_completion_time,
Am I understanding correctly that the chat completion time is the time to get the first response from the LLM? That's great and important, but if the LLM is in "stream" mode, does this actually work, or does the API return something immediately and only start responding in the for chunk in response generator loop? If that's the case, maybe we should calculate the end time at the first chunk generated from the response object.
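Roughly what I have in mind, as a sketch (this assumes the newer openai client interface; client, messages, and the model name are placeholders, not ELM's actual code):

import time

start_chat_time = time.time()
response = client.chat.completions.create(model="gpt-4",
                                           messages=messages,
                                           stream=True)

chat_completion_time = None
content = ""
for chunk in response:
    if chat_completion_time is None:
        # In stream mode, create() returns almost immediately; the first
        # chunk is when the model actually starts responding, so stop the
        # completion-time clock here.
        chat_completion_time = time.time() - start_chat_time
    delta = chunk.choices[0].delta.content
    if delta:
        content += delta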
@@ -152,10 +160,10 @@ def chat(self, query,
             token_budget=None,
             new_info_threshold=0.7,
             print_references=False,
-            return_chat_obj=False):
+            return_chat_obj=False,
+            timeit=False):
Why don't we just always have timeit=True? I think we should just remove this as a kwarg and always return the performance metrics. This will break anything that expects a certain number of outputs, but the compute is basically free and it's useful information.
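i.e., roughly this shape (a toy sketch, not the real chat() implementation; ELM's actual signature and return values would need to be updated accordingly):

import time


def chat(query, return_chat_obj=False):
    """Toy stand-in for chat() that always returns performance metrics."""
    start_chat_time = time.time()
    response_message = f"echo: {query}"  # placeholder for the real LLM call
    chat_completion_time = time.time() - start_chat_time
    end_time = time.time()

    # No timeit kwarg: the performance dict is always part of the output.
    performance = {
        "total_chat_time": end_time - start_chat_time,
        "chat_completion_time": chat_completion_time,
    }
    return response_message, performance


# Existing callers that expect a single output would need to unpack:
message, performance = chat("How long did that query take?")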
This PR adds time-tracking capability to ELM so that we can measure how long queries to the database take, how long chat completions take, and how long a call to the chat function takes overall.