v5.0.0
Major and Breaking Changes
Most of the changes that were released with 5.0.0 are breaking, since we have
been releasing non-breaking changes continuously into the 4.x line of releases.
Note that the 5.0.0 release is the first release for which the DEB and RPM
packages are only available in the new Splunk repository (see
https://github.com/signalfx/signalfx-agent/blob/master/docs/deb-rpm-repo-migration.md).
If you are using the Docker image, Windows release, or tar.gz bundle, nothing
has changed.
Kubernetes Node Dimension
On Kubernetes, the global dimension kubernetes_node (which was the name of
the node on which the agent is running) has been replaced with
kubernetes_node_uid to deal with situations where node names are reused.
This will result in all time series emitted by the agent being recreated in our
backend, which might result in time series creation throttling. The best way
to mitigate this is to roll out the upgrade slowly, node by node, by using the
DaemonSet's rollingUpdate strategy with a maxUnavailable of 1 and a
minReadySeconds of 60 or more (see Perform a Rolling Update on a DaemonSet).
Our Helm chart fully supports these options, but they must be set in your
custom values when deploying. Make sure you are using the latest version of
the Helm chart.
Note that this also means that the agent requires connectivity to the K8s API
server on startup in order to get the Node UID (it cannot be obtained from the
node itself).
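As a rough sketch of the rollout settings described above, the relevant fields of a DaemonSet spec look like the following. This is a fragment of the standard Kubernetes apps/v1 DaemonSet schema rather than a complete manifest, and the metadata name is a placeholder. If you deploy with the Helm chart, set the equivalent chart values instead of editing the DaemonSet directly.

```yaml
# Fragment of an apps/v1 DaemonSet; selector and pod template are omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: signalfx-agent     # placeholder name, adjust to your deployment
spec:
  minReadySeconds: 60      # a new pod must stay ready for 60s before it counts as available
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # replace the agent pod on one node at a time
```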
Monitor removals
The following monitors were removed and have had newer equivalents in the agent
for quite some time now:
- collectd/docker (replaced by docker-container-stats, which is much more
  performant)
- collectd/haproxy (replaced by haproxy)
If you need to migrate from these, you can enable both the old and new monitor
types in your existing 4.x agent, do the detector/chart migration, remove the
old monitor types, and then do the upgrade to 5.x.
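For example, a 4.x agent config that temporarily runs both Docker monitors side by side might look like the sketch below. Only the two type values come from these notes. The dockerURL option and its value are assumptions about a typical setup, so keep whatever options your existing monitor config already uses.

```yaml
monitors:
  # Old monitor: keep until charts and detectors are migrated, then remove.
  - type: collectd/docker
    dockerURL: unix:///var/run/docker.sock   # assumed; use your existing value
  # New replacement monitor, running alongside the old one during migration.
  - type: docker-container-stats
```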
Monitor-level metricsToExclude
The monitor-level metricsToExclude config option has been removed. Please
migrate any of these filters to the datapointsToExclude config option. The
filter behavior is a bit different (see filtering and legacy filtering), but
will be the same for simple filters that just blacklist certain values.
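A minimal sketch of a migrated simple exclusion filter follows. The monitor type, the metric name, and the metricNames key are assumptions made for illustration. Check the filtering docs for the exact filter schema before relying on it.

```yaml
monitors:
  - type: cpu                  # illustrative monitor
    # In 4.x this filter lived under the removed metricsToExclude option.
    datapointsToExclude:
      - metricNames:           # assumed filter key; see the filtering docs
          - cpu.idle           # illustrative metric name
```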
Legacy Whitelist Filtering Removal
The old whitelist.json file has been removed from the agent, so any references
to it in your agent config will cause the agent to fail at startup. You should
remove the reference to it and set enableBuiltInFiltering: true (or just omit
the config and it will default to true). If you had not made any
modifications to the whitelist.json file, then everything should be sent as it
was before. If you have made changes to that file, or if you have config for
the top-level metricsToInclude (also deprecated), then you will need to migrate
that to the new, more powerful filtering mechanism.
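In agent config terms, the change amounts to something like the following sketch. The exact form of your existing whitelist.json reference may differ, so it is only described in a comment here rather than reproduced.

```yaml
# Top level of the agent config file.
# Delete any entry that loads whitelist.json (the file no longer ships in 5.0.0),
# then either set the option below explicitly or leave it out (it defaults to true).
enableBuiltInFiltering: true
```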
Misc Monitor Changes
Timeout Config
The following monitors previously had an option like timeoutSeconds: 5, which
is now httpTimeout: 5 or httpTimeout: 5s (note the s). The duration unit can be
explicitly specified as part of the value using Go's ParseDuration syntax. If
the unit is left off, the value will be interpreted as seconds:
- conviva
- ecs-metadata
- logstash
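A short sketch of the renamed option is shown below. Other options these monitors may need (host, port, endpoints, credentials) are left out, and the 500ms value is only an illustration of a ParseDuration-style string.

```yaml
monitors:
  - type: logstash
    # 4.x: timeoutSeconds: 5
    httpTimeout: 5s        # same as httpTimeout: 5 (bare numbers are seconds)
  - type: ecs-metadata
    httpTimeout: 500ms     # illustrative value using an explicit unit
```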
conviva monitor
- The pulseUsername option is now username
- The pulsePassword option is now password
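As a sketch, a conviva monitor entry after the rename could look like this. The credential values are placeholders, and any other options your monitor uses are omitted.

```yaml
monitors:
  - type: conviva
    # 4.x names were pulseUsername and pulsePassword.
    username: my-user        # placeholder
    password: my-password    # placeholder
```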
haproxy monitor
The sslVerify option (default true) is now skipVerify and defaults to false.
The defaults are equivalent, but if you had the option set explicitly, the
value is now inverted.
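For instance, a config that previously disabled certificate verification would be translated roughly as follows. This is a sketch that omits the monitor's other options.

```yaml
monitors:
  - type: haproxy
    # 4.x: sslVerify: false   (certificate verification disabled)
    skipVerify: true          # 5.x: same behavior, inverted value
```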
docker-container-stats monitor
The metric memory.usage.total no longer accounts for the buffer cache memory
used by the container. The buffer cache has always been available in the
memory.stats.total_cache metric also emitted by this monitor (though not
enabled by default).
kubernetes-cluster monitor
Removed the useNodeName config option. We previously used the node name
dimension to sync properties; however, we now use the unique
kubernetes_node_uid dimension, making this config option obsolete.
kubernetes-events monitor
If an event was emitted by a Node, we will now attach kubernetes_node and
kubernetes_node_uid on the event. Previously, we used kubernetes_name and
kubernetes_uid.
Host Infrastructure Monitor Changes
With the 5.0.0 release, all of the host infrastructure metrics in our default
configuration are now written in Go and are not dependent on collectd. We are
deprecating collectd within the agent in general, but it is still there in
5.0.0, along with all of the existing collectd/* host infrastructure
monitors. However, we recommend that you upgrade your configurations to use
the new host infrastructure monitors soon after upgrading to 5.0.0+ since
collectd will be going away at some point.
These are the monitors that have been reimplemented within the agent core:
- collectd/df ---> filesystems (filesystem usage)
- collectd/interface ---> net-io (network interface traffic)
- collectd/vmem ---> vmem (virtual memory subsystem)
- collectd/cpu ---> cpu (cpu usage overview)
- collectd/memory ---> memory (high-level system memory)
- collectd/load ---> load (Linux system load)
- collectd/disk ---> disk-io (disk IO usage)
All of these new monitors have removed the plugin and dsname dimensions from
all datapoints emitted, so any filters on those dimensions in charts/detectors
need to be removed as well.
collectd/df ---> filesystems Monitor (Filesystem Usage Stats)
The collectd/df monitor is being deprecated in favor of the filesystems
monitor. While the collectd/df monitor will still be available in 5.0, it is
recommended that you switch to the filesystems monitor soon after upgrading.
See the docs for the filesystems monitor for a guide on migrating.
collectd/interface ---> net-io Monitor (Network Interface Stats)
The old collectd/interface monitor is deprecated in favor of the new net-io
monitor. The old monitor will still be available in 5.0, but it is recommended
that you switch to the net-io monitor soon. This new monitor emits the same
metrics except that the plugin_instance dimension has been renamed to the
interface dimension. The values of the dimension should be the same.
Virtual Memory Stats
The collectd/vmem monitor has been deprecated in favor of the vmem monitor.
It is recommended that you switch to the new monitor soon. The only difference
is that the new monitor does not have the dsname and plugin dimensions.
Smooth Migration
To smoothly migrate detectors for the collectd/df, collectd/interface, and
collectd/disk monitors, you can temporarily enable both the old and new
monitors and send in both sets of metrics with the two different sets of
dimensions. This is only relevant for the monitors that identify specific
subcomponents by dimensions.
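A sketch of such a temporary dual configuration is shown below, using only the monitor types named in these notes. Any per-monitor options you already set on the old monitors are omitted here.

```yaml
monitors:
  # Old collectd-based monitors, kept only until detectors are migrated.
  - type: collectd/df
  - type: collectd/interface
  - type: collectd/disk
  # New replacements, emitting the same data with the new dimension sets.
  - type: filesystems
  - type: net-io
  - type: disk-io
```

Once charts and detectors reference the new dimensions, the three collectd/* entries can be dropped.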
Python 3
The agent now ships with a Python 3.8 runtime for both collectd monitors and
custom scripts that use the python-monitor monitor. If you have custom
scripts, make sure they are Python 3 compatible. The included collectd Python
monitors have been upgraded to work properly with Python 3.
If you are using the python-monitor and do not want to make custom scripts
work with Python 3, you can use a custom Python binary by setting the
pythonBinary config option. If using the collectd/custom monitor, you have
no option but to make the scripts Python 3 compatible or convert them to the
python-monitor format.
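A sketch of pointing python-monitor at a different interpreter follows. The pythonBinary option comes from the text above. The scriptFilePath option name and both paths are assumptions for illustration, so confirm them against the python-monitor docs.

```yaml
monitors:
  - type: python-monitor
    scriptFilePath: /opt/scripts/my_monitor.py   # assumed option name, hypothetical path
    pythonBinary: /usr/bin/python2.7             # hypothetical interpreter path
```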
Helm
If you have used our Helm chart to deploy a previous agent version to
Kubernetes and you are not providing the agentConfig value, then you will
automatically be switched to the new host infrastructure monitors.
In order to seamlessly transition to the new host infrastructure monitors, you
can follow the same process described above under "Smooth Migration". You will
need to use the new configureStandardMonitors config option and set it to
false and then add the old set of monitors you were using under the monitors
key of your Helm values, along with the new monitors so that you temporarily
emit duplicated metrics with the distinct dimension sets.
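As a sketch, custom Helm values for this transition period might look like the fragment below. The monitor list is not exhaustive: with configureStandardMonitors set to false you would need to list every monitor you rely on, including any that are unrelated to host infrastructure.

```yaml
# Helm values fragment (not a complete values file).
configureStandardMonitors: false
monitors:
  # Old monitors you were using before the upgrade.
  - type: collectd/df
  - type: collectd/interface
  - type: collectd/disk
  # New monitors, so both dimension sets are emitted temporarily.
  - type: filesystems
  - type: net-io
  - type: disk-io
```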
Discovery Rule Evaluation
We have switched from the govaluate expression runner to the expr library.
This allows much more expressive discovery rules and a simpler syntax. This
should not cause any issues for the vast majority of discovery rules as the
syntax is virtually identical (and we have added a compatibility layer for
syntax that differs), but if you use complex rules you will definitely want to
test them with the new version before deploying to production systems.
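For reference, a discovery rule is attached to a monitor roughly as in the sketch below. The monitor type and the rule itself are illustrative, following the form commonly used in the agent's auto-discovery docs. The sketch makes no claim about how this particular rule is evaluated by expr, so test your own rules on a non-production agent first.

```yaml
monitors:
  - type: collectd/redis   # illustrative monitor
    # Illustrative rule; verify that it still matches the intended endpoints on 5.x.
    discoveryRule: container_image =~ "redis" && port == 6379
```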
Docker Image: quay.io/signalfx/signalfx-agent:5.0.0
(digest: sha256:4c5881200973bc0225a848c9550388bb626c2e0dd5080c04e2fa56326eace4bc)