Reduced UN prod cluster resources (#4508)
Metrics links:
- [Requests per
second](https://console.cloud.google.com/monitoring/metrics-explorer;duration=P14D?pageState=%7B%22xyChart%22:%7B%22constantLines%22:%5B%5D,%22dataSets%22:%5B%7B%22plotType%22:%22LINE%22,%22targetAxis%22:%22Y1%22,%22timeSeriesFilter%22:%7B%22aggregations%22:%5B%7B%22crossSeriesReducer%22:%22REDUCE_SUM%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22perSeriesAligner%22:%22ALIGN_RATE%22%7D%5D,%22apiSource%22:%22DEFAULT_CLOUD%22,%22crossSeriesReducer%22:%22REDUCE_SUM%22,%22filter%22:%22metric.type%3D%5C%22loadbalancing.googleapis.com%2Fhttps%2Frequest_count%5C%22%20resource.type%3D%5C%22https_lb_rule%5C%22%20resource.label.%5C%22project_id%5C%22%3D%5C%22datcom-recon-autopush%5C%22%20resource.label.%5C%22url_map_name%5C%22%3D%5C%22k8s2-um-35faua5t-website-website-ingress-2uz3r82p%5C%22%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22minAlignmentPeriod%22:%2260s%22,%22perSeriesAligner%22:%22ALIGN_RATE%22%7D%7D%5D,%22options%22:%7B%22mode%22:%22COLOR%22%7D,%22y1Axis%22:%7B%22label%22:%22%22,%22scale%22:%22LINEAR%22%7D%7D%7D&project=datcom-recon-autopush&e=13803378&hl=en&inv=1&invt=AbXICg&mods=-monitoring_api_staging)
- [Latency](https://console.cloud.google.com/monitoring/metrics-explorer;startTime=2024-07-09T04:02:03Z;endTime=2024-07-16T04:02:03Z?pageState=%7B%22xyChart%22:%7B%22constantLines%22:%5B%5D,%22dataSets%22:%5B%7B%22legendTemplate%22:%22$%7Bresource.labels.url_map_name%7D:%2099th%20Percentile%22,%22plotType%22:%22LINE%22,%22targetAxis%22:%22Y1%22,%22timeSeriesFilter%22:%7B%22aggregations%22:%5B%7B%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_99%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%5D,%22apiSource%22:%22DEFAULT_CLOUD%22,%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_99%22,%22filter%22:%22metric.type%3D%5C%22loadbalancing.googleapis.com%2Fhttps%2Ftotal_latencies%5C%22%20resource.type%3D%5C%22https_lb_rule%5C%22%20resource.label.%5C%22project_id%5C%22%3D%5C%22datcom-recon-autopush%5C%22%20resource.label.%5C%22url_map_name%5C%22%3D%5C%22k8s2-um-35faua5t-website-website-ingress-2uz3r82p%5C%22%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22minAlignmentPeriod%22:%2260s%22,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%7D,%7B%22legendTemplate%22:%22$%7Bresource.labels.url_map_name%7D:%2095th%20Percentile%22,%22plotType%22:%22LINE%22,%22targetAxis%22:%22Y1%22,%22timeSeriesFilter%22:%7B%22aggregations%22:%5B%7B%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_95%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%5D,%22apiSource%22:%22DEFAULT_CLOUD%22,%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_95%22,%22filter%22:%22metric.type%3D%5C%22loadbalancing.googleapis.com%2Fhttps%2Ftotal_latencies%5C%22%20resource.type%3D%5C%22https_lb_rule%5C%22%20resource.label.%5C%22project_id%5C%22%3D%5C%22datcom-recon-autopush%5C%22%20resource.label.%5C%22url_map_name%5C%22%3D%5C%22k8s2-um-35faua5t-website-website-ingress-2uz3r82p%5C%22%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22minAlignmentPeriod%22:%2260s%22,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%7D,%7B%22legendTemplate%22:%22$%7Bresource.labels.url_map_name%7D:%2050th%20Percentile%22,%22plotType%22:%22LINE%22,%22targetAxis%22:%22Y1%22,%22timeSeriesFilter%22:%7B%22aggregations%22:%5B%7B%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_50%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%5D,%22apiSource%22:%22DEFAULT_CLOUD%22,%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_50%22,%22filter%22:%22metric.type%3D%5C%22loadbalancing.googleapis.com%2Fhttps%2Ftotal_latencies%5C%22%20resource.type%3D%5C%22https_lb_rule%5C%22%20resource.label.%5C%22project_id%5C%22%3D%5C%22datcom-recon-autopush%5C%22%20resource.label.%5C%22url_map_name%5C%22%3D%5C%22k8s2-um-35faua5t-website-website-ingress-2uz3r82p%5C%22%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22minAlignmentPeriod%22:%2260s%22,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%7D,%7B%22legendTemplate%22:%22$%7Bresource.labels.url_map_name%7D:%205th%20Percentile%22,%22plotType%22:%22LINE%22,%22targetAxis%22:%22Y1%22,%22timeSeriesFilter%22:%7B%22aggregations%22:%5B%7B%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_05%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%5D,%22apiSource%22:%22DEFAULT_CLOUD%22,%22crossSeriesReducer%22:%22REDUCE_PERCENTILE_05%22,%22filter%22:%22metric.type%3D%5C%22loadbalancing.googleapis.com%2Fhttps%2Ftotal_latencies%5C%22%20resource.type%3D%5C%22https_lb_rule%5C%22%20resource.label.%5C%22project_id%5C%22%3D%5C%22datcom-recon-autopush%5C%22%20resource.label.%5C%22url_map_name%5C%22%3D%5C%22k8s2-um-35faua5t-website-website-ingress-2uz3r82p%5C%22%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22minAlignmentPeriod%22:%2260s%22,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%7D,%7B%22legendTemplate%22:%22$%7Bresource.labels.url_map_name%7D:%20Mean%22,%22plotType%22:%22LINE%22,%22targetAxis%22:%22Y1%22,%22timeSeriesFilter%22:%7B%22aggregations%22:%5B%7B%22crossSeriesReducer%22:%22REDUCE_MEAN%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%5D,%22apiSource%22:%22DEFAULT_CLOUD%22,%22crossSeriesReducer%22:%22REDUCE_MEAN%22,%22filter%22:%22metric.type%3D%5C%22loadbalancing.googleapis.com%2Fhttps%2Ftotal_latencies%5C%22%20resource.type%3D%5C%22https_lb_rule%5C%22%20resource.label.%5C%22project_id%5C%22%3D%5C%22datcom-recon-autopush%5C%22%20resource.label.%5C%22url_map_name%5C%22%3D%5C%22k8s2-um-35faua5t-website-website-ingress-2uz3r82p%5C%22%22,%22groupByFields%22:%5B%22resource.label.%5C%22url_map_name%5C%22%22%5D,%22minAlignmentPeriod%22:%2260s%22,%22perSeriesAligner%22:%22ALIGN_SUM%22%7D%7D%5D,%22options%22:%7B%22mode%22:%22COLOR%22%7D,%22y1Axis%22:%7B%22label%22:%22%22,%22scale%22:%22LINEAR%22%7D%7D%7D&project=datcom-recon-autopush&e=13803378&hl=en&inv=1&invt=AbXICg&mods=-monitoring_api_staging)
- Resource usage for
[dc-mixer-observation](https://console.cloud.google.com/kubernetes/service/us-central1-a/datacommons-us-central1-a/website/dc-mixer-observation/overview?e=13803378&inv=1&invt=AbXICw&mods=-monitoring_api_staging&project=datcom-recon-autopush)
- Resource usage for
[dc-mixer-svg](https://console.cloud.google.com/kubernetes/service/us-central1-a/datacommons-us-central1-a/website/dc-mixer-svg/overview?e=13803378&inv=1&invt=AbXICw&mods=-monitoring_api_staging&project=datcom-recon-autopush)
- Resource usage for
[website-mixer-service](https://console.cloud.google.com/kubernetes/service/us-central1-a/datacommons-us-central1-a/website/website-mixer-service/overview?e=13803378&inv=1&invt=AbXICw&mods=-monitoring_api_staging&project=datcom-recon-autopush)
- Resource usage for
[dc-mixer-node](https://console.cloud.google.com/kubernetes/service/us-central1-a/datacommons-us-central1-a/website/dc-mixer-node/overview?e=13803378&inv=1&invt=AbXICw&mods=-monitoring_api_staging&project=datcom-recon-autopush)
- Resource usage for
[dc-mixer-default](https://console.cloud.google.com/kubernetes/service/us-central1-a/datacommons-us-central1-a/website/dc-mixer-default/overview?e=13803378&inv=1&invt=AbXICw&mods=-monitoring_api_staging&project=datcom-recon-autopush)
- Resource usage for
[website-service](https://console.cloud.google.com/kubernetes/service/us-central1-a/datacommons-us-central1-a/website/website-service/overview?e=13803378&inv=1&invt=AbXICw&mods=-monitoring_api_staging&project=datcom-recon-autopush)
- Monitoring dashboard: [GKE Compute Resources Monitoring view:
dc-mixer-observation](https://pantheon.corp.google.com/monitoring/dashboards/integration/gke.gke-compute-resources-workload-view;filters=var:project_id%2Bvar:location%2Bvar:cluster_name%2Bvar:namespace_name,val:website%2Bvar:top_level_controller_name,val:dc-mixer-observation;duration=P1D?e=13803378&hl=en&inv=1&invt=AbXICg&mods=-monitoring_api_staging&project=datcom-recon-autopush&pageState=(%22events%22:(%22active%22:%5B%22CLOUD_ALERTING_ALERT%22,%22GKE_WORKLOAD_DEPLOYMENT%22%5D,%22inactive%22:%5B%5D)))

Changes:
- Website: Drop from 50 nodes to 10 nodes (5gb/node) -> 50gb
- Mixer svg service: Drop from 5 nodes to 3 nodes (8gb/node) -> 24gb
- Mixer node service: Drop from 30 nodes to 3 nodes (3gb/node) -> 9gb
- Mixer observation service: Drop from 30 nodes to 3 nodes (2gb/node) -> 6gb
- Default node: Drop from 20 nodes to 3 nodes (8gb/node memory requirement) -> 24gb
- Replace 18x e2-highmem-8 nodes with another 3x e2-highmem-4 nodes
- Old cluster total memory: 672gb
- Old cluster estimated memory allocation: 600gb
- New cluster total memory: 192gb
- New cluster estimated memory allocation: 113gb
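
The estimated allocation figures above can be reproduced from the per-service numbers in the list. The following is a minimal sketch of that arithmetic, assuming the "nodes" counts in the list correspond to the `replicas` values changed in the diff; the `SERVICES` table and the `total_allocation_gb` helper are illustrative names, not part of the repository.

```python
# Replica counts and per-replica memory (GB) as stated in the commit message.
# The dictionary layout and key names are hypothetical, chosen for this sketch.
SERVICES = {
    "website":           {"gb_per_replica": 5, "old": 50, "new": 10},
    "mixer-svg":         {"gb_per_replica": 8, "old": 5,  "new": 3},
    "mixer-node":        {"gb_per_replica": 3, "old": 30, "new": 3},
    "mixer-observation": {"gb_per_replica": 2, "old": 30, "new": 3},
    "mixer-default":     {"gb_per_replica": 8, "old": 20, "new": 3},
}


def total_allocation_gb(services, key):
    """Sum replicas * memory-per-replica over all services for 'old' or 'new'."""
    return sum(s["gb_per_replica"] * s[key] for s in services.values())


if __name__ == "__main__":
    for key in ("old", "new"):
        print(f"{key} estimated allocation: {total_allocation_gb(SERVICES, key)}gb")
```

This covers only the estimated allocation lines; the cluster total memory figures depend on the node pool composition, which this sketch does not model.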
dwnoble authored Sep 4, 2024
1 parent 0207c1d commit 6895e89
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions deploy/helm_charts/envs/unsdg.yaml
@@ -25,7 +25,7 @@ namespace:

website:
flaskEnv: unsdg
-replicas: 50
+replicas: 10
redis:
enabled: true
configFile: |
@@ -56,13 +56,13 @@ nl:
serviceGroups:
recon: null
svg:
-replicas: 5
+replicas: 3
node:
-replicas: 30
+replicas: 3
observation:
-replicas: 30
+replicas: 3
default:
-replicas: 20
+replicas: 3
cacheSVG: true # For REST api support
resources:
memoryRequest: "8G"