This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

v5.0.0

@jchengsfx jchengsfx released this 28 Feb 16:57

Major and Breaking Changes

Most of the changes that were released with 5.0.0 are breaking, since we have
been releasing non-breaking changes continuously into the 4.x line of releases.

Note that the 5.0.0 release is the first release for which the DEB and RPM
packages are only available in the new Splunk repository (see
https://github.com/signalfx/signalfx-agent/blob/master/docs/deb-rpm-repo-migration.md).
If you are using the Docker image, Windows release, or tar.gz bundle, nothing
has changed.

Kubernetes Node Dimension

On Kubernetes, the global dimension kubernetes_node (which was the name of
the node on which the agent is running) has been replaced with
kubernetes_node_uid to deal with situations where node names are reused.

This will result in all time series emitted by the agent being recreated in our
backend, which might result in time series creation throttling. The best way
to mitigate this is to roll out the upgrade node-by-node slowly by using the
DaemonSet's rollingUpdate strategy with a maxUnavailable of 1 and a
minReadySeconds of 60 or more (see Perform a Rolling Update on a
DaemonSet).
Our Helm chart fully supports these options, but they must be set in your
custom values when deploying. Make sure you are using the latest version of
the Helm chart.

Note that this also means that the agent requires connectivity to the K8s API
server on startup in order to get the Node UID (it cannot be obtained from the
node itself).

Monitor removals

The following monitors were removed and have had newer equivalents in the agent
for quite some time now:

  • collectd/docker (replaced by docker-container-stats, which is much more performant)
  • collectd/haproxy (replaced by haproxy)

If you need to migrate from these, you can enable both the old and new monitor
types in your existing 4.x agent, do the detector/chart migration, remove the
old monitor types, and then do the upgrade to 5.x.

Monitor-level metricsToExclude

The monitor-level metricsToExclude config option has been removed. Please
migrate any of these filters to the datapointsToExclude config option. The
filter behavior is a bit different (see filtering and legacy filtering),
but will be the same for simple filters that just blacklist certain values.

Legacy Whitelist Filtering Removal

The old whitelist.json file has been removed from the agent, so any references
to it in your agent config will cause the agent to fail at startup. You should
remove the reference to it and set enableBuiltInFiltering: true (or just omit
the config and it will default to true). If you had not made any
modifications to the whitelist.json file, then everything should send as it was
before. If you have made changes to that file, or if you have config for the
top-level metricsToInclude (also deprecated), then you will need to migrate
that to the new, more powerful filtering mechanism.

Misc Monitor Changes

Timeout Config

The following monitors previously had an option like timeoutSeconds: 5, which
is now httpTimeout: 5 or httpTimeout: 5s (note the s). The unit can be
specified explicitly as part of the value using Go's ParseDuration syntax. If
the unit is left off, the value is interpreted as seconds:

  • conviva
  • ecs-metadata
  • logstash
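
The fallback rule described above (a bare number means seconds, anything with a unit follows Go's ParseDuration syntax) can be sketched roughly as follows. This is an illustration only, not the agent's actual parsing code; the function name parseTimeout is hypothetical, and the sketch assumes bare values are whole numbers.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTimeout is a hypothetical helper, not the agent's real implementation.
// It treats a bare integer as seconds and otherwise defers to
// time.ParseDuration from the Go standard library.
func parseTimeout(value string) (time.Duration, error) {
	// A value without a unit, such as "5", is interpreted as seconds.
	if secs, err := strconv.Atoi(value); err == nil {
		return time.Duration(secs) * time.Second, nil
	}
	// Values with a unit, such as "5s" or "1m30s", use Go's duration syntax.
	return time.ParseDuration(value)
}

func main() {
	for _, v := range []string{"5", "5s", "1m30s"} {
		d, err := parseTimeout(v)
		fmt.Println(v, d, err)
	}
}
```

Under this reading, httpTimeout: 5 and httpTimeout: 5s describe the same timeout.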

conviva monitor

  • The pulseUsername option is now username
  • The pulsePassword option is now password

haproxy monitor

The sslVerify option (default true) is now skipVerify and defaults to
false. The defaults are equivalent, but if you had set the option explicitly,
the value must now be inverted.
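
To make the inversion concrete, here is a minimal sketch of the mapping from an explicitly set sslVerify value to its skipVerify equivalent. The helper name migrateSSLVerify is hypothetical and exists only for illustration; the agent does not provide such a function.

```go
package main

import "fmt"

// migrateSSLVerify is a hypothetical helper: given an explicit old sslVerify
// value, it returns the skipVerify value that has the same meaning.
func migrateSSLVerify(sslVerify bool) (skipVerify bool) {
	return !sslVerify
}

func main() {
	// Old default (verify certificates) and the explicit opt-out case.
	fmt.Println("sslVerify=true  -> skipVerify =", migrateSSLVerify(true))
	fmt.Println("sslVerify=false -> skipVerify =", migrateSSLVerify(false))
}
```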

docker-container-stats monitor

The metric memory.usage.total no longer accounts for the buffer cache memory
used by the container. The buffer cache has always been available in the
memory.stats.total_cache metric also emitted by this monitor (though not
enabled by default).
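
If a chart or detector relied on the old meaning of memory.usage.total, one way to approximate it is to add the buffer cache back in. The sketch below assumes the pre-5.0 value was simply usage plus cache in the same unit, which these notes imply but do not state outright; the helper name legacyMemoryUsage and the sample values are made up for illustration.

```go
package main

import "fmt"

// legacyMemoryUsage is a hypothetical helper that approximates the pre-5.0
// memory.usage.total value by adding the buffer cache back in.
// Assumption: the old metric was usage plus cache, both in the same unit.
func legacyMemoryUsage(usageTotal, totalCache uint64) uint64 {
	return usageTotal + totalCache
}

func main() {
	// Example values in bytes, invented for illustration.
	usage := uint64(300 << 20) // 5.x memory.usage.total
	cache := uint64(100 << 20) // memory.stats.total_cache
	fmt.Println(legacyMemoryUsage(usage, cache))
}
```

Note that memory.stats.total_cache is not enabled by default, so it would have to be enabled on the monitor before it could be used this way.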

kubernetes-cluster monitor

Removed the useNodeName config option. We previously used the node name
dimension to sync properties; we now use the unique kubernetes_node_uid
dimension, which makes this config option obsolete.

kubernetes-events monitor

If an event was emitted by a Node, we will now attach kubernetes_node and
kubernetes_node_uid to the event. Previously, we used kubernetes_name and
kubernetes_uid.

Host Infrastructure Monitor Changes

With the 5.0.0 release, all of the host infrastructure metrics in our default
configuration are now written in Go and are not dependent on collectd. We are
deprecating collectd within the agent in general, but it is still there in
5.0.0, along with all of the existing collectd/* host infrastructure
monitors. However, we recommend that you upgrade your configurations to use
the new host infrastructure monitors soon after upgrading to 5.0.0+ since
collectd will be going away at some point.

These are the monitors that have been reimplemented within the agent core:

  • collectd/df ---> filesystems (filesystem usage)
  • collectd/interface ---> net-io (network interface traffic)
  • collectd/vmem ---> vmem (virtual memory subsystem)
  • collectd/cpu ---> cpu (cpu usage overview)
  • collectd/memory ---> memory (high-level system memory)
  • collectd/load ---> load (Linux system load)
  • collectd/disk ---> disk-io (disk IO usage)

All of these new monitors have removed the plugin and dsname dimensions from
all datapoints emitted, so any filtering in charts/detectors that references
these dimensions needs to drop them as well.

collectd/df ---> filesystems Monitor (Filesystem Usage Stats)

The collectd/df monitor is being deprecated in favor of the filesystems
monitor. While the collectd/df monitor will still be available in 5.0, it is
recommended that you switch to the filesystems monitor soon after upgrading.
See the docs for the filesystems monitor for a guide on migrating.

collectd/interface ---> net-io Monitor (Network Interface Stats)

The old collectd/interface monitor is deprecated in favor of the new net-io
monitor. The old monitor will still be available in 5.0, but it is recommended
that you switch to the net-io monitor soon. This new monitor emits the same
metrics except that the plugin_instance dimension has been renamed to the
interface dimension. The values of the dimension should be the same.
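
The dimension changes described for net-io (plugin_instance renamed to interface, plus the plugin and dsname dimensions dropped by all of the new monitors) can be pictured with a small sketch. The helper name toNetIODimensions and the sample dimension values, including the host dimension, are assumptions made for illustration and do not come from the agent.

```go
package main

import "fmt"

// toNetIODimensions is a hypothetical helper that maps the dimensions of a
// collectd/interface datapoint to the shape described for net-io:
// plugin_instance becomes interface, and plugin and dsname are dropped.
func toNetIODimensions(old map[string]string) map[string]string {
	out := make(map[string]string, len(old))
	for k, v := range old {
		switch k {
		case "plugin", "dsname":
			// Removed from all datapoints of the new monitors.
		case "plugin_instance":
			out["interface"] = v
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	// Made-up example dimensions.
	old := map[string]string{
		"host":            "web-1",
		"plugin":          "interface",
		"plugin_instance": "eth0",
		"dsname":          "value",
	}
	fmt.Println(toNetIODimensions(old))
}
```

A chart or detector filter on plugin_instance would need the same rename to keep matching the new datapoints.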

Virtual Memory Stats

The collectd/vmem monitor has been deprecated in favor of the vmem monitor.
It is recommended that you switch to the new monitor soon. The only difference
is that the new monitor does not have the dsname and plugin dimensions.

Smooth Migration

To smoothly migrate detectors for the collectd/df, collectd/interface, and
collectd/disk monitors, you can temporarily enable both the old and new
monitors and send in both sets of metrics with the two different sets of
dimensions. This is only relevant for the monitors that identify specific
subcomponents by dimensions.

Python 3

The agent now ships with a Python 3.8 runtime for both collectd monitors and
custom scripts that use the python-monitor monitor. If you have custom
scripts, make sure they are Python 3 compatible. The included collectd Python
monitors have been upgraded to work properly with Python 3.

If you are using the python-monitor and do not want to make custom scripts
work with Python 3, you can use a custom Python binary by setting the
pythonBinary config option. If using the collectd/custom monitor, you have
no option but to make the scripts Python 3 compatible or convert them to the
python-monitor format.

Helm

If you have used our Helm chart to deploy a previous agent version to
Kubernetes and you are not providing the agentConfig value, then you will
automatically be switched to the new host infrastructure monitors.

In order to seamlessly transition to the new host infrastructure monitors, you
can follow the same process described above under "Smooth Migration". You will
need to use the new configureStandardMonitors config option and set it to
false and then add the old set of monitors you were using under the
monitors key of your Helm values, along with the new monitors so that you
temporarily emit duplicated metrics with the distinct dimension sets.

Discovery Rule Evaluation

We have switched from the govaluate
expression runner to the expr library.
This allows much more expressive discovery rules and a simpler syntax. This
should not cause any issues for the vast majority of discovery rules as the
syntax is virtually identical (and we have added a compatibility layer for
syntax that differs), but if you use complex rules you will definitely want to
test them with the new version before deploying to production systems.

Docker Image: quay.io/signalfx/signalfx-agent:5.0.0 (digest: sha256:4c5881200973bc0225a848c9550388bb626c2e0dd5080c04e2fa56326eace4bc)