Releases · m-lab/script-exporter-support

30 Apr 17:40

nkinkade

b3c3467

Fixes a bug in the apply-tc-rules.service

The systemd unit apply-tc-rules.service is run once a day by the systemd timer apply-tc-rules.timer, but the unit's ExecStart directive was passing the -ti flags to docker. Since the timer runs the unit without a terminal, docker could not allocate a pty and the service was failing. This release includes a commit which simply removes the -ti flags, which should resolve the issue.
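As an illustration of why the flags matter, here is a minimal Python sketch of a wrapper that only asks docker for a TTY when one is actually attached. This is not the fix shipped in this release (which just drops the flags from the unit file); the function name, the container name and the script path are hypothetical.

```python
import sys
from typing import List, Optional


def docker_exec_argv(container: str, command: List[str],
                     stdin_is_tty: Optional[bool] = None) -> List[str]:
    """Build a `docker exec` command line, adding -t/-i only when a TTY exists.

    When run from a systemd timer there is no terminal on stdin, so the
    TTY flags are left out and docker is not asked to allocate a pty.
    """
    if stdin_is_tty is None:
        # sys.stdin can be None in some embedded or daemonized environments.
        stdin_is_tty = sys.stdin is not None and sys.stdin.isatty()

    argv = ["docker", "exec"]
    if stdin_is_tty:
        argv += ["-t", "-i"]
    argv.append(container)
    argv.extend(command)
    return argv


# Hypothetical container name and script path, for illustration only.
timer_argv = docker_exec_argv("script_exporter", ["/path/to/apply-tc-rules"],
                              stdin_is_tty=False)
shell_argv = docker_exec_argv("script_exporter", ["/path/to/apply-tc-rules"],
                              stdin_is_tty=True)
```

Called with stdin_is_tty=False, as would be the case under the timer, the resulting command line contains no TTY flags at all.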

04 Apr 17:30

nkinkade

production/1.6

ce61d31

Explicitly starts docker.service on boot

Fixes this bug: m-lab/ops-tracker#348

... which is summed up in this comment:

coreos/bugs#369 (comment)

03 Apr 17:22

nkinkade

production/1.5

17968e5

Updates tc traffic shaping rules on a daily basis

The script_exporter Docker container must have tc traffic shaping rules in place to throttle the ndt-e2e test. If they are not in place for a given IP address, then the test will refuse to run. Previously, the rules were getting out of sync with reality (i.e., sites.py), and updating them would either be manual or require a redeployment of the entire VM. This release adds a new systemd timer on the host VM that should run a docker-exec command on a daily basis (at midnight) to update the tc rules.

26 Feb 17:34

nkinkade

production/1.4

52138ca

Spreads out NDT e2e tests over cache expiry interval

This release fixes an issue whereby the NDT e2e tests were all being run at the same time rather than spread out. This was causing large spikes in resource usage on the VM, and in some extreme cases was causing the OOM killer to kill the Docker container running script_exporter.

The ndt_e2e.sh script caches the results of the end-to-end test in a local file for each node. It was previously setting the expiry of the cache file to exactly 10 minutes. However, since Prometheus probes it every minute, all cache files were expiring within 0 to 60 seconds of each other, causing floods of tests. This release brings in a fix that randomizes the expiry of each cache file between 0 and 10 minutes, which should roughly spread the testing out over a period of 10 minutes, and with it the load caused by nodejs.
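The effect of the change can be sketched in a few lines of Python. This is an illustrative simulation, not the code in ndt_e2e.sh (which is a shell script); the function names, the node count of 100 and the fixed random seed are assumptions made for the example.

```python
import random

CACHE_TTL_SECONDS = 600  # the 10-minute window described above


def fixed_expiry(written_at: float) -> float:
    """Old behaviour: a cache file expires exactly 10 minutes after it is written."""
    return written_at + CACHE_TTL_SECONDS


def randomized_expiry(written_at: float, rng: random.Random) -> float:
    """New behaviour: a cache file expires at a random point in the next 10 minutes."""
    return written_at + rng.uniform(0, CACHE_TTL_SECONDS)


def expiry_spread(expiries) -> float:
    """Width, in seconds, of the window in which all cache files expire."""
    return max(expiries) - min(expiries)


rng = random.Random(0)  # seeded so the simulation is repeatable

# Hypothetical: 100 nodes whose cache files are all written during the
# same 60-second probe cycle.
write_times = [rng.uniform(0, 60) for _ in range(100)]

fixed = [fixed_expiry(t) for t in write_times]
randomized = [randomized_expiry(t, rng) for t in write_times]

print(f"fixed expiry spread:      {expiry_spread(fixed):.0f}s")
print(f"randomized expiry spread: {expiry_spread(randomized):.0f}s")
```

With a fixed expiry, every cache file lapses within the same window of at most 60 seconds, so the tests rerun together. With a randomized expiry, the reruns are scattered across roughly the whole 10 minutes.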

20 Feb 21:37

nkinkade

production/1.3

e19a30d

Always [re]start Docker containers

This release contains two updates:

Bumps the GCE machine type for the mlab-oti project to n1-standard-8, since n1-highmem-4 apparently did not provide enough CPU.
Adds --restart=always to the docker-run commands so that the Docker daemon will always restart the containers if they stop for any reason (e.g., the GCE instance was restarted).

12 Feb 22:02

nkinkade

production/1.2

bd77add

Adds node_exporter to GCE instance

The main purpose of this release is to run a Prometheus node_exporter instance in the GCE VM so that we can easily monitor resource usage of the VM.

The release also sets the GCE machine type for the mlab-oti project to n1-highmem-4 to account for how resource-hungry nodejs is.

06 Feb 00:12

nkinkade

production/1.1

26b7678

Adds ndt_e2e result caching + ndt_queue now returns actual RC

This release includes several improvements:

The ndt_e2e script now caches the result of the previous test. If the test result is a pass (return code 0) then the script will return the cached value for 10 minutes. If the return code is not 0, then the script will run the end to end test on every probe (once a minute). This allows us to probe the ndt_e2e script every minute without overloading any servers by testing every minute, and has the added benefit that tests will run more frequently when NDT is down so that a recovery will be detected much sooner.
Some changes were made to the script_exporter code such that it now returns a new metric named script_exit_code. This is useful especially for the ndt_queue script to help us differentiate actual queueing from just a failed test (e.g., a DNS error, a transient network condition, etc.).
The script_exporter fork was moved from nkinkade's personal GitHub account into the m-lab account.

29 Jan 22:37

nkinkade

production/1.0

2c55250

First production release

This is the first production release of this service. As a start, it makes two probes available:

ndt_e2e: Runs a rate-limited NDT test against the specified target.
ndt_queue: Checks whether the specified target (and NDT server) is queueing.
