Skip to content

Commit

Permalink
Init action for installing ops agent (#1124)
Browse files Browse the repository at this point in the history
  • Loading branch information
vamshikrishna-g authored Jan 12, 2024
1 parent 38a0843 commit 9907743
Show file tree
Hide file tree
Showing 3 changed files with 113 additions and 0 deletions.
40 changes: 40 additions & 0 deletions opsagent/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
# Ops Agent

With [Dataproc 2.2 image version](https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2), we recommend installing [Google Cloud Ops Agent](https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent) to obtain system metrics.

This initialization action will install the Ops Agent on a [Google Cloud Dataproc](https://cloud.google.com/dataproc) cluster and provide similar metrics as the [`--metric-sources=monitoring-agent-defaults`](https://cloud.google.com/dataproc/docs/guides/dataproc-metrics#monitoring_agent_metrics) setting which was supported until Dataproc 2.1.
[This page](https://cloud.google.com/monitoring/api/metrics_agent#oagent-vs-magent) highlights differences in metric collection between the Ops Agent and the legacy monitoring agent.

We provide two variants of this initialization action:
- `opsagent.sh` installs the Ops Agent. [By default](https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/configuration#default), it collects syslogs and system (node) metrics.
- `opsagent_nosyslog.sh` installs the Ops Agent and also specifies a user configuration in order to skip syslogs collection from your cluster nodes. If the user configuration is not specified, Ops Agent will collect syslogs besides the system (node) metrics. You can further customize this configuration to collect logs and metrics from other third-party applications.

If you are looking to match the behavior of Dataproc image versions up to 2.1 with `--metric-sources=monitoring-agent-defaults`, which did not ingest syslogs from Dataproc cluster nodes, please use `opsagent_nosyslog.sh`.

## Using this initialization action

**:warning: NOTICE:** See
[best practices](/README.md#how-initialization-actions-are-used) of using
initialization actions in production.

## Install the Ops Agent collecting system metrics only (no syslogs)

```bash
REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create ${CLUSTER_NAME} \
--image-version=2.2 \
--region=${REGION} \
--initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/opsagent/opsagent_nosyslog.sh
```

## Install the Ops Agent with default configuration

```bash
REGION=<region>
CLUSTER_NAME=<cluster_name>
gcloud dataproc clusters create ${CLUSTER_NAME} \
--image-version=2.2 \
--region=${REGION} \
--initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/opsagent/opsagent.sh
```
30 changes: 30 additions & 0 deletions opsagent/opsagent.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
#!/bin/bash
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This script installs the Google Cloud Ops Agent on each node in the cluster.
# See https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/configuration#default
# for built-in configuration of Ops Agent.

# Detect dataproc image version from its various names
if (! test -v DATAPROC_IMAGE_VERSION) && test -v DATAPROC_VERSION; then
DATAPROC_IMAGE_VERSION="${DATAPROC_VERSION}"
fi

if [[ $(echo "${DATAPROC_IMAGE_VERSION} < 2.2" | bc -l) == 1 ]]; then
echo "This Dataproc cluster node runs image version ${DATAPROC_IMAGE_VERSION} with pre-installed legacy monitoring agent. Skipping Ops Agent installation."
exit 0
fi

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
bash add-google-cloud-ops-agent-repo.sh --also-install
43 changes: 43 additions & 0 deletions opsagent/opsagent_nosyslog.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
#!/bin/bash
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS-IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This script installs the Google Cloud Ops Agent on each node in the cluster.
# It also provides an override to the built-in logging config to set empty
# receivers i.e. not collect any logs.
# If you need to collect syslogs, you can use the other script in this directory,
# opsagent.sh which uses the built-in configuration of Ops Agent.
# See https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent/configuration#default.

# Detect dataproc image version from its various names
if (! test -v DATAPROC_IMAGE_VERSION) && test -v DATAPROC_VERSION; then
DATAPROC_IMAGE_VERSION="${DATAPROC_VERSION}"
fi

if [[ $(echo "${DATAPROC_IMAGE_VERSION} < 2.2" | bc -l) == 1 ]]; then
echo "This Dataproc cluster node runs image version ${DATAPROC_IMAGE_VERSION} with pre-installed legacy monitoring agent. Skipping Ops Agent installation."
exit 0
fi

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
bash add-google-cloud-ops-agent-repo.sh --also-install

cat <<EOF >> /etc/google-cloud-ops-agent/config.yaml
logging:
service:
pipelines:
default_pipeline:
receivers: []
EOF

systemctl restart google-cloud-ops-agent

0 comments on commit 9907743

Please sign in to comment.