Merge pull request #45 from newrelic-experimental/feat/workspace-scoped-usage

feat: redesign consumption collection to remove account API dependencies
sdewitt-newrelic authored Dec 2, 2024
2 parents 3bb7a62 + e4d8b97 commit 49dde0d
Showing 9 changed files with 222 additions and 461 deletions.
75 changes: 2 additions & 73 deletions README.md
@@ -430,56 +430,6 @@ parameter is unused in that scenario.
The workspace host can also be specified using the `DATABRICKS_HOST`
environment variable.

**NOTE:** The `DATABRICKS_HOST` environment variable can not be used to specify
_both_ the [instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids)
and the accounts API endpoint. To account for this, the `DATABRICKS_WORKSPACEHOST`
and `DATABRICKS_ACCOUNTHOST` environment variables can instead be used, either
separately or in combination with the `DATABRICKS_HOST` environment variable, to
specify the [instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids)
and the accounts API endpoint, respectively.
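
As a minimal sketch, assuming these environment variables follow the same value
conventions as the corresponding `workspaceHost` and `accountHost` parameters,
the two hosts might be set separately like this (both values are placeholder
examples, not real endpoints):

```shell
# Placeholder values for illustration only.
# Instance name only, with no https:// prefix, per the workspaceHost convention.
export DATABRICKS_WORKSPACEHOST="dbc-a1b2c3d4-e5f6.cloud.databricks.com"
# Accounts API endpoint, including the https:// prefix, per the accountHost convention.
export DATABRICKS_ACCOUNTHOST="https://accounts.cloud.databricks.com"
```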

###### `accountHost`

| Description | Valid Values | Required | Default |
| --- | --- | --- | --- |
| Databricks accounts API endpoint | string | conditional | N/a |

This parameter specifies the accounts API endpoint. This is
used by the integration when constructing the URLs for account-level
[ReST API](https://docs.databricks.com/api/workspace/introduction) calls. Note
that unlike the value of [`workspaceHost`](#workspacehost), the value of this
parameter _must_ include the `https://` prefix, e.g.
`https://accounts.cloud.databricks.com`.
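
As a sketch of why the prefix matters, an account-level request URL is composed
by joining this value with the [account ID](#accountid) and an API path. The
path and ID below are illustrative assumptions, not values taken from this
document:

```shell
# Illustrative only: the API path and account ID are assumed placeholder values.
ACCOUNT_HOST="https://accounts.cloud.databricks.com"  # must include the https:// prefix
ACCOUNT_ID="00000000-0000-0000-0000-000000000000"
URL="${ACCOUNT_HOST}/api/2.0/accounts/${ACCOUNT_ID}/usage/download"
echo "$URL"
```

Because the host value is used verbatim as the base of the URL, omitting the
`https://` prefix would produce a malformed request URL.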

This parameter is required when the collection of Databricks [consumption and cost data](#consumption--cost-data)
is [enabled](#databricks-usage-enabled).

The account host can also be specified using the `DATABRICKS_HOST`
environment variable.

**NOTE:** The `DATABRICKS_HOST` environment variable can not be used to specify
_both_ the [instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids)
and the accounts API endpoint. To account for this, the `DATABRICKS_WORKSPACEHOST`
and `DATABRICKS_ACCOUNTHOST` environment variables can instead be used, either
separately or in combination with the `DATABRICKS_HOST` environment variable, to
specify the [instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids)
and the accounts API endpoint, respectively.

###### `accountId`

| Description | Valid Values | Required | Default |
| --- | --- | --- | --- |
| Databricks account ID for the accounts API | string | conditional | N/a |

This parameter specifies the Databricks account ID. This is used by the
integration when constructing the URLs for account-level
[ReST API](https://docs.databricks.com/api/workspace/introduction) calls.

This parameter is required when the collection of Databricks [consumption and cost data](#consumption--cost-data)
is [enabled](#databricks-usage-enabled).

###### `accessToken`

| Description | Valid Values | Required | Default |
@@ -496,12 +446,6 @@ field in a Databricks [configuration profile](https://docs.databricks.com/en/dev

See the [authentication section](#authentication) for more details.

**NOTE:** Databricks personal access tokens can only be used to collect data
at the workspace level. They can not be used to collect account-level data using
the account-level [ReST APIs](https://docs.databricks.com/api/workspace/introduction).
To collect account-level data such as [consumption and cost data](#consumption--cost-data),
OAuth authentication must be used instead.

###### `oauthClientId`

| Description | Valid Values | Required | Default |
@@ -966,12 +910,6 @@ For convenience purposes, the following parameters can be used in the
SDK to explicitly use [Databricks personal access token authentication](https://docs.databricks.com/en/dev-tools/auth/pat.html).
The SDK will _not_ attempt to try other authentication mechanisms and instead
will fail immediately if personal access token authentication fails.

**NOTE:** Databricks personal access tokens can only be used to collect data
at the workspace level. They can not be used to collect account-level data
using the account-level [ReST APIs](https://docs.databricks.com/api/workspace/introduction).
To collect account-level data such as [consumption and cost data](#consumption--cost-data),
OAuth authentication must be used instead.
- [`oauthClientId`](#oauthclientid) - When set, the integration will instruct
the SDK to explicitly [use a service principal to authenticate with Databricks (OAuth M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html).
The SDK will _not_ attempt to try other authentication mechanisms and instead
@@ -1021,16 +959,6 @@ collect Job cost data.
In order for the New Relic Databricks integration to collect consumption and
cost related data from Databricks, there are several requirements.

1. OAuth [authentication](#authentication) must be used. This is required even
when the integration is
[deployed on the driver node of a Databricks cluster](#deploy-the-integration-on-the-driver-node-of-a-databricks-cluster)
using the provided [init script](./init/cluster_init_integration.sh) because
the integration leverages account-level [ReST APIs](https://docs.databricks.com/api/workspace/introduction)
when collecting consumption and cost related data. These APIs can only be
accessed when OAuth [authentication](#authentication) is used.
1. An [account ID](#accountid) and [account host](#accounthost) must be provided
since the account-level [ReST APIs](https://docs.databricks.com/api/workspace/introduction)
require them.
1. The SQL [warehouse ID](#warehouseid) of a [SQL warehouse](https://docs.databricks.com/en/compute/sql-warehouse/index.html)
within the workspace associated with the configured [workspace host](#workspacehost)
must be specified. The Databricks SQL queries used to collect consumption and
@@ -1065,7 +993,8 @@ compute.
|---|---|
| `account_id` | ID of the account this usage record was generated for |
| `workspace_id` | ID of the Workspace this usage record was associated with |
| `workspace_name` | Name of the Workspace this usage record was associated with |
| `workspace_url` | URL of the Workspace this usage record was associated with |
| `workspace_instance_name` | [Instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids) of the Workspace this usage record was associated with |
| `record_id` | Unique ID for this usage record |
| `sku_name` | Name of the SKU associated with this usage record |
| `cloud` | Name of the Cloud this usage record is relevant for |
2 changes: 0 additions & 2 deletions configs/config.template.yml
@@ -13,8 +13,6 @@ log:
fileName: trace.log
databricks:
workspaceHost: [YOUR_DATABRICKS_WORKSPACE_INSTANCE_NAME]
accountHost: [YOUR_DATABRICKS_ACCOUNTS_CONSOLE_HOST_NAME]
accountId: [YOUR_DATABRICKS_ACCOUNT_ID]
accessToken: [YOUR_DATABRICKS_PERSONAL_ACCESS_TOKEN]
oauthClientId: [YOUR_DATABRICKS_SERVICE_PRINCIPAL_OAUTH_CLIENT_ID]
oauthClientSecret: [YOUR_DATABRICKS_SERVICE_PRINCIPAL_OAUTH_CLIENT_SECRET]
Binary file modified examples/consumption-cost-dashboard-job-cost.png
18 changes: 9 additions & 9 deletions examples/consumption-cost-dashboard.json
@@ -534,7 +534,7 @@
"accountIds": [
0
],
"query": "SELECT sum(list_cost)\nWITH usage_quantity * list_price AS list_cost\nFROM DatabricksUsage\nJOIN (\n SELECT list_price, sku_name AS price_sku_name\n FROM lookup(DatabricksListPrices)\n) ON sku_name = price_sku_name\nWHERE sku_name LIKE '%JOBS%'\nSINCE 3 months ago\nCOMPARE WITH 6 months ago\nFACET workspace_name\nLIMIT 100\nTIMESERIES 1 week"
"query": "SELECT sum(list_cost)\nWITH usage_quantity * list_price AS list_cost\nFROM DatabricksUsage\nJOIN (\n SELECT list_price, sku_name AS price_sku_name\n FROM lookup(DatabricksListPrices)\n) ON sku_name = price_sku_name\nWHERE sku_name LIKE '%JOBS%'\nSINCE 3 months ago\nCOMPARE WITH 6 months ago\nFACET workspace_instance_name\nLIMIT 100\nTIMESERIES 1 week"
}
],
"platformOptions": {
@@ -575,7 +575,7 @@
"accountIds": [
0
],
"query": "SELECT sum(list_cost)\nWITH usage_quantity * list_price AS list_cost\nFROM DatabricksUsage\nJOIN (\n SELECT list_price, sku_name AS price_sku_name\n FROM lookup(DatabricksListPrices)\n) ON sku_name = price_sku_name\nWHERE sku_name LIKE '%JOBS%' AND job_id IS NOT NULL\nSINCE 31 days ago UNTIL today\nFACET job_name, workspace_name, run_as\nLIMIT 20"
"query": "SELECT sum(list_cost)\nWITH usage_quantity * list_price AS list_cost\nFROM DatabricksUsage\nJOIN (\n SELECT list_price, sku_name AS price_sku_name\n FROM lookup(DatabricksListPrices)\n) ON sku_name = price_sku_name\nWHERE sku_name LIKE '%JOBS%' AND job_id IS NOT NULL\nSINCE 31 days ago UNTIL today\nFACET job_name, workspace_instance_name, run_as\nLIMIT 20"
}
],
"platformOptions": {
@@ -607,7 +607,7 @@
"accountIds": [
0
],
"query": "SELECT sum(list_cost)\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job'\nSINCE 8 days ago UNTIL today\nCOMPARE with 15 days ago\nFACET job_name, workspace_name\nLIMIT 20\nTIMESERIES 1 day"
"query": "SELECT sum(list_cost)\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job'\nSINCE 8 days ago UNTIL today\nCOMPARE with 15 days ago\nFACET job_name, workspace_instance_name\nLIMIT 20\nTIMESERIES 1 day"
}
],
"platformOptions": {
@@ -662,7 +662,7 @@
"accountIds": [
0
],
"query": "SELECT sum(list_cost) as 'Last 7 Day Cost',\n latest(list_cost_14_day) as 'Last 14 day cost',\n sum(list_cost) - latest(list_cost_14_day) as 'Last 7 Day Growth'\nFROM DatabricksJobCost\nJOIN (\n SELECT sum(list_cost) as list_cost_14_day\n FROM DatabricksJobCost\n WHERE query_id = 'jobs_cost_list_cost_per_job'\n SINCE 15 days ago until 8 days ago\n FACET workspace_id, job_id, job_name, sku_name, run_as\n LIMIT 100\n) ON job_id\nWHERE query_id = 'jobs_cost_list_cost_per_job'\nSINCE 8 days ago UNTIL today\nFACET workspace_id, job_id, job_name, sku_name, run_as\nLIMIT 100"
"query": "SELECT sum(list_cost) as 'Last 7 Day Cost',\n latest(list_cost_14_day) as 'Last 14 day cost',\n sum(list_cost) - latest(list_cost_14_day) as 'Last 7 Day Growth'\nFROM DatabricksJobCost\nJOIN (\n SELECT sum(list_cost) as list_cost_14_day\n FROM DatabricksJobCost\n WHERE query_id = 'jobs_cost_list_cost_per_job'\n SINCE 15 days ago until 8 days ago\n FACET workspace_id, job_id, job_name, run_as\n LIMIT 100\n) ON job_id\nWHERE query_id = 'jobs_cost_list_cost_per_job'\nSINCE 8 days ago UNTIL today\nFACET workspace_id, job_id, job_name, run_as\nLIMIT 100"
}
],
"platformOptions": {
@@ -702,7 +702,7 @@
"accountIds": [
0
],
"query": "SELECT sum(list_cost), latest(runs), latest(last_seen_date)\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_name, job_id, job_name, run_as\nLIMIT 100"
"query": "SELECT sum(list_cost), latest(runs), latest(last_seen_date)\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_instance_name, job_id, job_name, run_as\nLIMIT 100"
}
],
"platformOptions": {
@@ -738,7 +738,7 @@
"accountIds": [
0
],
"query": "SELECT sum(list_cost), latest(last_seen_date)\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job_run'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_name, job_id, job_name, run_id, run_as, last_seen_date\nLIMIT 100\n"
"query": "SELECT sum(list_cost), latest(last_seen_date)\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job_run'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_instance_name, job_id, job_name, run_id, run_as\nLIMIT 100\n"
}
],
"platformOptions": {
@@ -782,7 +782,7 @@
"accountIds": [
0
],
"query": "SELECT sum(runs), sum(failures), sum(list_cost) AS 'Failure Cost', max(last_seen_date) AS 'Last Seen Date'\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_frequent_failures'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_name, job_id, job_name, run_as\nLIMIT 100"
"query": "SELECT sum(runs), sum(failures), sum(list_cost) AS 'Failure Cost', max(last_seen_date) AS 'Last Seen Date'\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_frequent_failures'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_instance_name, job_id, job_name, run_as\nLIMIT 100"
}
],
"platformOptions": {
@@ -811,7 +811,7 @@
"accountIds": [
0
],
"query": "SELECT latest(run_as), sum(repairs), sum(repair_time_seconds), sum(list_cost) AS 'Repair Cost'\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_most_retries'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_name, job_id, job_name, run_id\nLIMIT 100"
"query": "SELECT latest(run_as), sum(repairs), sum(repair_time_seconds), sum(list_cost) AS 'Repair Cost'\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_most_retries'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_instance_name, job_id, job_name, run_id\nLIMIT 100"
}
],
"platformOptions": {
@@ -872,7 +872,7 @@
"accountIds": [
0
],
"query": "SELECT\n count(run_id) as 'Runs',\n sum(list_cost),\n average(list_cost),\n max(list_cost),\n percentile(list_cost, 90),\n max(list_cost) - percentile(list_cost, 90) AS 'List Cost Deviation'\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job_run'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_name, job_id, job_name\nLIMIT 100 "
"query": "SELECT\n count(run_id) as 'Runs',\n sum(list_cost),\n average(list_cost),\n max(list_cost),\n percentile(list_cost, 90),\n max(list_cost) - percentile(list_cost, 90) AS 'List Cost Deviation'\nFROM DatabricksJobCost\nWHERE query_id = 'jobs_cost_list_cost_per_job_run'\nSINCE 31 days ago UNTIL today\nFACET workspace_id, workspace_instance_name, job_id, job_name\nLIMIT 100 "
}
],
"platformOptions": {

0 comments on commit 49dde0d
