
Add cost command #240

Merged
merged 25 commits into cromshell_2.0 from bs_cost_command on Mar 30, 2023

Conversation


@bshifaw bshifaw commented Jan 24, 2023

The cost command will provide the cost of running a workflow, and will have a "detailed" option that breaks down the cost per task of the workflow.
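
A minimal usage sketch, assuming the flag syntax shown in the README excerpt later in this PR (-d for the detailed per-task breakdown; the cromshell entry point is assumed):

cromshell cost <workflow-id>        # total cost for the workflow
cromshell cost -d <workflow-id>     # cost broken down per task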

@bshifaw bshifaw self-assigned this Jan 24, 2023
Member

@lbergelson lbergelson left a comment

@bshifaw Cool! I've reviewed it for you. Quick turnaround but lots of comments.

@@ -5,7 +5,9 @@
LOGGER = logging.getLogger(__name__)


CONFIG_FILE_TEMPLATE = {"cromwell_server": "str", "requests_timeout": "int"}
CONFIG_FILE_TEMPLATE = {
Member

This is a silly nitpick, but I might add line breaks so each attribute pair is on its own line. I start to get lost in lists like this when there are more than a few elements.

Collaborator Author

done
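
For reference, a minimal sketch of the reformatted template, assuming the new bq_cost_table key discussed later in this review is the added entry:

CONFIG_FILE_TEMPLATE = {
    "cromwell_server": "str",
    "requests_timeout": "int",
    "bq_cost_table": "str",  # assumed addition for the cost command
}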

"""
Get the cost for a workflow."
Only works for workflows that completed more than 8 hours ago on GCS."
Requires the 'gcp_bq_cost_table.config' configuration file to exist and contain"
Member

Is this still its own file? Or is it in the general config file now?

Collaborator Author

It's in the general config now. Fixed.

)

# check if workflow id exists
if not workflow_id_utils.workflow_id_exists(
Member

Future refactoring note: I assume we need this check pretty much every time we get the workflow id, so maybe we could bake the error checking into the get function (or another function that wraps it and does the error checking).

Collaborator Author

done
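
A minimal sketch of what baking the check into a getter could look like (the function name, keyword arguments, and error type are assumptions; only workflow_id_utils.workflow_id_exists appears in the diff above):

def get_validated_workflow_id(workflow_id: str, submission_file_path: str) -> str:
    # Hypothetical wrapper: resolve and validate the workflow id in one place so
    # each subcommand doesn't have to repeat the existence check.
    if not workflow_id_utils.workflow_id_exists(
        workflow_id=workflow_id, submission_file_path=submission_file_path
    ):
        raise ValueError(f"Workflow id '{workflow_id}' was not found in the submission file.")
    return workflow_id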

)

# check that bq setting in json config
check_cost_table_is_configured(config_options=config.cromshell_config_options)
Member

Should this really be configurable per cromwell server? I assume some servers log to different BQ tables than others?

Collaborator Author

The BQ tables are configured per GCP billing project rather than per Cromwell server. There could be a situation where someone sets up, or has access to, multiple Cromwell servers using different billing projects, but I don't know how likely that would be.

from cromshell.utilities import http_utils

formatted_metadata_parameter = metadata_command.format_metadata_params(
list_of_keys=config.METADATA_KEYS_TO_OMIT,
Member

We should probably set this metadata call to get only the keys we need for the timestamps instead of using the general metadata settings.

Collaborator Author

Done, using the following keys: ["start", "status", "id", "end", "backend", "workflowProcessingEvents"].
The "backend" key lists the backend used to run the workflow. This is useful to confirm that the workflow was executed on the cloud (PAPI, TES) and later to check for cost in Google or Azure. The not-so-great part is that this key is nested in the "calls" key, so we end up getting all the task keys as well.

Member

I wonder why they do it that way; is it possible to execute Cromwell with multiple backends simultaneously?

if row["cost"] is not None:
if row["cost"] != "cost":
cost = round(float(row["cost"]), 2)
if cost >= .01:
Member

Why is it necessary to put a .01 in? Is it to remove 0's so it's clear that there was SOME cost?

It seems like this if/else should be replaced with max(cost, .01)

Collaborator Author

Yes, that's what I assume. Here it is in CS1.
Replaced with max.
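
A minimal sketch of the suggested change (variable names follow the snippet above; output handling is omitted):

raw_cost = row["cost"]
if raw_cost is not None and raw_cost != "cost":  # skip empty values and the header row
    # Round to cents, but never report less than $0.01 so it's clear there was SOME cost.
    cost = max(round(float(raw_cost), 2), 0.01)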


def check_cost_table_is_configured(config_options: dict) -> None:
    if "bq_cost_table" not in config_options:
        raise KeyError('Cromshell config file is missing "bq_cost_table" Key')
Member

This should probably be a custom error of some sort? It's something correctable by the user. (maybe print the path to the config file that's in use as well?)

Collaborator Author

Moved the function to cromshell_config_options_file_utils, added the custom error, and generalized the function so that it checks whatever key is provided.
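
A minimal sketch of the generalized check (the module name comes from the comment above; the error class name and function signature are assumptions):

class MissingCromshellConfigParameter(Exception):
    """Hypothetical custom error; the class added in the PR may be named differently."""


def check_key_in_config(key_to_check: str, config_options: dict, config_file_path: str) -> None:
    # Generalized: works for "bq_cost_table" or any other required key, and
    # points the user at the config file they need to edit.
    if key_to_check not in config_options:
        raise MissingCromshellConfigParameter(
            f"Cromshell config file ({config_file_path}) is missing the '{key_to_check}' key."
        )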


# Todo: remove LIMIT in query
if detailed:
    query_job = client.query(
Member

Is there a structured query builder API so you're not doing direct string interpolation? It's probably not an issue, but queries built like this are insanely vulnerable to SQL-injection-type attacks. Unless, of course, Python is clever enough to use a custom interpolation engine here which performs escaping; I've heard about that for some languages but I'm not sure which.

Collaborator Author

Good call! There's some stuff about it here that can be implemented. I was able to parameterize all user input except the table name, which the article explicitly says can't be parameterized.
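
A minimal sketch of the parameterized-query approach with google-cloud-bigquery. The query body and billing-export column names are illustrative assumptions; the table name still has to be interpolated because BigQuery does not support parameterizing table identifiers, but it comes from the cromshell config rather than arbitrary user input:

from google.cloud import bigquery

client = bigquery.Client()

query = f"""
    SELECT wfid.value AS workflow_id, SUM(cost) AS cost
    FROM `{bq_cost_table}`, UNNEST(labels) AS wfid
    WHERE wfid.value = @workflow_id
      AND export_time BETWEEN @start_date AND @end_date
    GROUP BY workflow_id
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("workflow_id", "STRING", workflow_id),
        bigquery.ScalarQueryParameter("start_date", "TIMESTAMP", start_date),
        bigquery.ScalarQueryParameter("end_date", "TIMESTAMP", end_date),
    ]
)
query_job = client.query(query, job_config=job_config)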

:param end_date:
:return:
"""
from google.cloud import bigquery
Member

Is it considered good practice to do imports as locally as possible? It always seems weird to me because it means it's hard to tell what a file depends on, and it won't fail on missing imports until later, but maybe that's part of the Python style?

Collaborator Author

import statements can be executed just about anywhere. It's often useful to place them inside functions to restrict their visibility and/or reduce initial startup time. Although Python's interpreter is optimized to not import the same module multiple times, repeatedly executing an import statement can seriously affect performance in some circumstances.

I was thinking local would be savvy because we could do some checks (has it been 24 hours, does the workflow id exist) before importing the library. But I can move it back up to the top for visibility; I don't think it's that big of an import.
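
A minimal sketch of the deferred-import trade-off being discussed (the function and checks are placeholders):

def get_cost(workflow_id, start_date, end_date):
    # Cheap validation first (workflow id exists, workflow ended > 24 hours ago, ...)
    # so the command can fail fast without paying the import cost below.
    ...

    # Deferred import: google-cloud-bigquery is only loaded once the cost query
    # actually runs. Moving it to the top of the file trades a little startup
    # time for clearer dependency visibility.
    from google.cloud import bigquery

    client = bigquery.Client()
    ...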

detailed=detailed
)

temp_query_result_csv = NamedTemporaryFile()
Member

Why are we writing all these temp files? Can't we just do this processing in memory?

Collaborator Author

updated
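
A minimal sketch of the in-memory alternative to the NamedTemporaryFile round-trip (downstream formatting omitted):

# QueryJob.result() yields Row objects that behave like mappings, so the rows
# can be collected directly instead of being written to a temporary CSV.
query_rows = [dict(row) for row in query_job.result()]
for row in query_rows:
    ...  # e.g. sum the costs, or print the per-task breakdown when --detailed is set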

@bshifaw bshifaw added the Cromshell 2 Issues related to Cromshell 2.0 label Feb 15, 2023
@bshifaw bshifaw marked this pull request as ready for review March 2, 2023 19:11
@@ -64,32 +64,6 @@ def override_requests_cert_parameters(skip_certs: bool) -> None:
)


class WorkflowStatuses(Enum):
Collaborator Author

Moved to its own file called workflow_status_utils.py.
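
A minimal sketch of the relocated enum (the members are assumptions based on Cromwell's standard workflow statuses; the actual list in the PR may differ):

# workflow_status_utils.py
from enum import Enum


class WorkflowStatuses(Enum):
    SUBMITTED = "Submitted"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    ABORTED = "Aborted"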

bshifaw added 3 commits March 17, 2023 09:59
Member

@lbergelson lbergelson left a comment

@bshifaw very minor comments
looks good after those

README.md Outdated
#### Get cost for a workflow
* `cost [-c] [-d] [workflow-id] [[workflow-id]...]`
* Get the cost for a workflow.
* Only works for workflows that completed more than 8 hours ago on GCS.
Member

8 hours here, 24 below?

Requires the 'bq_cost_table' key in the cromshell configuration file to be
set to the big query cost table for your organization.

Costs here DO NOT include any call cached tasks.
Member

It would be useful to at least annotate which tasks were call cached.

Collaborator Author

Created issue ticket #250.

def main(config, workflow_ids: str or int, detailed: bool, color: bool):
    """
    Get the cost for a workflow.
    Only works for workflows that completed more than 24 hours ago on GCS.
Member

8 or 24?

"""
Get the cost for a workflow.
Only works for workflows that completed more than 24 hours ago on GCS.
Requires the 'bq_cost_table' key in the cromshell configuration file to be
Member

This is listed as bq_cost_table here, but as gcp_bq_cost_table in the README.

README.md Outdated
#### Get cost for a workflow
* `cost [-c] [-d] [workflow-id] [[workflow-id]...]`
* Get the cost for a workflow.
* Only works for workflows that completed more than 8 hours ago on GCS.
Collaborator Author

done

@bshifaw bshifaw merged commit 3a323e4 into cromshell_2.0 Mar 30, 2023
@bshifaw bshifaw deleted the bs_cost_command branch March 30, 2023 19:10