-- import "github.com/KarelKubat/runtime-metrics"
Package runtimemetrics implements the collection of in-program metrics, a server to publish these, and a client to collect them.
The design principle of runtime-metrics is that it's light-weight enough to be embedded into a server, and that it won't hinder performance. It can collect and report simple, one-dimensional metrics: counters, sums and averages - either over the lifetime of the metric, or over a given period (e.g., counts per minute).
It cannot handle multi-dimensional data, such as lists of numbers or strings. It doesn't retain series; it's designed so that the impact of a metric is known a priori. If you need to retain lists of counters, sums or averages (e.g., to analyze trends), then scrape them with an external client and retain them there.
To collect metrics inside a program and act on changes (i.e., without publishing the metrics using a server and without scraping them using a client), the following can be used. As an example, we want to track the ratio of failures and log a warning when this ratio exceeds 1% over a period of 30 seconds.
import "github.com/KarelKubat/runtime-metrics/base"
...
// Create the metrics. We have:
// - NewCount() for incremental counting
// - NewSum() for totalling float64 values
// - NewAverage() for averaging float64 values
// and metrics per given a period:
// - NewCountPerDuration(d time.Duration)
// - NewSumPerDuration(d time.Duration)
// - NewAveragePerDuration(d time.Duration)
errorRatio = base.NewAveragePerDuration(30 * time.Second)
// Check failures vs. totals and do something when there is >= 1% failures.
// Poll every 30 seconds, a shorter period won't help because the average
// cannot change any quicker.
go func() {
// average is the recorded value, n is the number of cases,
// until is the up-to timestamp of the calculation
average, n, until := errorRatio.Report()
if average >= 0.01 {
log.Printf("WARNING %v percent of lookups is failing " +
"over a period of 30 seconds until %v, %v cases ",
average * 100.0, until, n)
}
}
time.Sleep(time.Second * 30)
}()
// Loop and track totals and failures
for {
err := lookupSomething() // hypothetical function
if err != nil {
errorRatio.Mark(1.0) // mark error (and increase #-cases)
} else {
errorRatio.Mark(0.0) // mark success (only increase #-cases)
}
It should be noted here that there are different ways to solve this. One could also use two counters, one for the total number of lookups and one for the failures, and divide them to get the ratio, as sketched below.
In that case it's also good to limit the collection of metrics and their reporting to a given duration; otherwise, a long run of successes might mask suddenly occurring errors until it's too late.
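For illustration, here is a minimal sketch of the two-counter variant. The exact count API is an assumption here (Mark() without an argument, and a per-duration count's Report() returning the count and the up-to timestamp); the base/ package docs are authoritative.

    // Two per-duration counters: one for all attempts, one for failures.
    totals := base.NewCountPerDuration(30 * time.Second)
    failures := base.NewCountPerDuration(30 * time.Second)

    // In the work loop: count every attempt, and separately count failures.
    err := lookupSomething() // hypothetical function, as above
    totals.Mark()
    if err != nil {
        failures.Mark()
    }

    // In the reporting loop: divide the two counts to get the ratio.
    // Assumption: Report() on a count returns the count and an up-to timestamp.
    nTotal, _ := totals.Report()
    nFailed, _ := failures.Report()
    if nTotal > 0 && float64(nFailed)/float64(nTotal) >= 0.01 {
        log.Printf("WARNING %v percent of lookups is failing",
            float64(nFailed) / float64(nTotal) * 100.0)
    }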
The metric types all have a similar API: New*() instantiates a metric, Mark() registers an event, and Report() returns the result. In the case of an average, Mark() expects one float64 argument, and Report() returns three values: the average, the number of cases, and a timestamp.
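The same pattern, shown with a lifetime average. This is a minimal sketch; it assumes that NewAverage() mirrors the Mark()/Report() shape of the per-duration variant used above.

    // Lifetime average: Mark(float64) registers a value, Report() returns
    // the average, the number of cases, and an up-to timestamp.
    latency := base.NewAverage()
    latency.Mark(0.25) // e.g., a lookup took 0.25 seconds
    latency.Mark(0.75)
    avg, n, until := latency.Report()
    log.Printf("average %v over %v cases, up to %v", avg, n, until) // average 0.5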
In order to publish metrics, they are added to a registry, and a server is started:
import "github.com/KarelKubat/runtime-metrics/base"
import "github.com/KarelKubat/runtime-metrics/registry"
import "github.com/KarelKubat/runtime-metrics/reporter"
...
errorRatio := base.NewAveragePerDuration(30 * time.Second)
err := registry.AddAveragePerDuration("lookup-error-ratio-per-30-sec", errorRatio)
if err != nil { ... } // collision of name
go func() {
err := reporter.StartReporter(":1234")
if err != nil { ... } // probably port 1234 is already in use
}()
Published metrics can be scraped by a client:
import "github.com/KarelKubat/runtime-metrics/reporter"
c, err := reporter.NewClient(":1234")
if err != nil { ... } // connection error
av, n, until, err := c.AveragePerDuration("lookup-error-ratio-per-30-sec")
if err != nil { ... } // metric doesn't exist
if av > 0.01 {
log.Printf("WARNING %v percent of lookups is failing", av * 100)
}
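In a standalone monitoring tool, the scrape typically runs in a loop. A minimal sketch, assuming that a client returned by reporter.NewClient() can be reused for repeated queries:

    c, err := reporter.NewClient(":1234")
    if err != nil {
        log.Fatalf("cannot connect to metrics server: %v", err)
    }
    for {
        av, n, until, err := c.AveragePerDuration("lookup-error-ratio-per-30-sec")
        if err != nil {
            log.Printf("scrape failed: %v", err)
        } else if av > 0.01 {
            log.Printf("WARNING %v percent of lookups is failing (%v cases until %v)",
                av * 100, n, until)
        }
        time.Sleep(30 * time.Second)
    }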
There is also discovery: a client can get a list of all the names of counts, sums, and averages, and query these. See demo/demosrc/client_allnames.go for an example.
Be sure to read the docs of the base/ package to understand what the metrics do. In particular, make sure you understand how the *PerDuration metrics work; they always report results that "lag" by one duration.
For information on how to publish metrics, read the server section in package reporter/. For scraping, read the client section. A runnable example is provided in demo/, with sources in demo/demosrc/.
Packages tools/, namedset/ and api/ are for internal usage.
This software is distributed under the GPLv3; see LICENSE.md. Basically that means that you can use it as you like, at no cost and with no guarantee; you are free to modify the code as you see fit, but if you do, you must make your changes available to others under the same conditions.
As a courtesy, I'd love to be informed of any changes that you make - I could incorporate them into a next release, with full credits if you like.