/*
Package runtimemetrics implements the collection of in-program metrics,
a server to publish these, and a client to collect them.

What Is It

The design principle of runtime-metrics is that it is lightweight
enough to be embedded into a server and that it won't hinder
performance. It can collect and report simple, one-dimensional
metrics: counters, sums and averages, either over the lifetime of the
metric or over a given period (e.g., counts per minute).

It cannot handle multi-dimensional data, such as lists of numbers or
strings. It doesn't retain series; it is designed so that the impact
of a metric is known a priori. If you need to retain lists of
counters, sums or averages (e.g., to analyze trends), then scrape
them with an external client and retain them there.

Basic In-Program Usage

To collect metrics inside a program and act on changes (i.e., without
publishing the metrics using a server and without scraping them using
a client), the following can be used. As an example, we wish to track
the ratio of failures and to log a warning when this ratio exceeds 1%
over a period of 30 seconds.
import "github.com/KarelKubat/runtime-metrics/base"
...
// Create the metrics. We have:
// - NewCount() for incremental counting
// - NewSum() for totalling float64 values
// - NewAverage() for averaging float64 values
// and metrics per given a period:
// - NewCountPerDuration(d time.Duration)
// - NewSumPerDuration(d time.Duration)
// - NewAveragePerDuration(d time.Duration)
errorRatio = base.NewAveragePerDuration(30 * time.Second)
// Check failures vs. totals and do something when there is >= 1% failures.
// Poll every 30 seconds, a shorter period won't help because the average
// cannot change any quicker.
go func() {
// average is the recorded value, n is the number of cases,
// until is the up-to timestamp of the calculation
average, n, until := errorRatio.Report()
if average >= 0.01 {
log.Printf("WARNING %v percent of lookups is failing " +
"over a period of 30 seconds until %v, %v cases ",
average * 100.0, until, n)
}
}
time.Sleep(time.Second * 30)
}()
// Loop and track totals and failures
for {
err := lookupSomething() // hypothetical function
if err != nil {
errorRatio.Mark(1.0) // mark error (and increase #-cases)
} else {
errorRatio.Mark(0.0) // mark success (only increase #-cases)
}

It should be noted that there are different ways to solve this. One
could also use two counters, one for the total number of lookups and
one for the failures, and divide them to get a ratio, as sketched
below. In that case it's also good to limit the collection of metrics
and their reporting to a given duration; otherwise, a long run of
successes might mask suddenly occurring errors until it's too late.
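
A minimal sketch of the two-counter variant; the Mark() and Report()
signatures of the per-duration counters are assumptions here, so
check the base/ package docs:

    // Two-counter variant: divide failures by totals. The counter
    // Mark()/Report() signatures below are assumed, not verified.
    totals := base.NewCountPerDuration(30 * time.Second)
    failures := base.NewCountPerDuration(30 * time.Second)
    ...
    totals.Mark() // on every lookup
    if err != nil {
        failures.Mark() // on failed lookups only
    }
    ...
    nFail, _ := failures.Report() // assumed: count and until-timestamp
    nTotal, _ := totals.Report()
    if nTotal > 0 && float64(nFail)/float64(nTotal) >= 0.01 {
        log.Printf("WARNING: failure ratio at or above 1%%")
    }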

The metric types all have a somewhat similar API: New*() instantiates
a metric, Mark() registers an event, and Report() returns the result.
In the case of an average, Mark() expects one float64 argument, and
Report() returns three values: the average, the number of cases, and
a timestamp.
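
For instance, for a lifetime average (a sketch; the three-value
Report() shape follows from the description above):

    avg := base.NewAverage()
    avg.Mark(12.5)
    avg.Mark(7.5)
    // average is 10.0, n is 2, until is the time of the report.
    average, n, until := avg.Report()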

Publishing Metrics

In order to publish metrics, they are added to a registry and a
server is started:

    import (
        "github.com/KarelKubat/runtime-metrics/base"
        "github.com/KarelKubat/runtime-metrics/registry"
        "github.com/KarelKubat/runtime-metrics/reporter"
    )

    ...

    errorRatio := base.NewAveragePerDuration(30 * time.Second)
    err := registry.AddAveragePerDuration(
        "lookup-error-ratio-per-30-sec", errorRatio)
    if err != nil { ... } // name collision

    go func() {
        err := reporter.StartReporter(":1234")
        if err != nil { ... } // probably port 1234 is already in use
    }()
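
The registry stores a reference to the metric, so the program simply
keeps calling errorRatio.Mark() as before; the server publishes the
live values.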

Scraping Metrics

Published metrics can be scraped by a client:

    import "github.com/KarelKubat/runtime-metrics/reporter"

    ...

    c, err := reporter.NewClient(":1234")
    if err != nil { ... } // connection error

    av, n, until, err := c.AveragePerDuration("lookup-error-ratio-per-30-sec")
    if err != nil { ... } // metric doesn't exist
    if av > 0.01 {
        log.Printf("WARNING %v percent of lookups failed in the "+
            "period until %v (%v cases)", av*100.0, until, n)
    }
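
Since this metric is computed over 30-second windows, there is little
point in scraping it more often than that. A polling scraper might
look as follows (a sketch reusing the calls shown above):

    for {
        av, n, until, err := c.AveragePerDuration("lookup-error-ratio-per-30-sec")
        if err != nil { ... } // connection error or unknown metric
        log.Printf("ratio %v until %v (%v cases)", av, until, n)
        time.Sleep(30 * time.Second)
    }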

There is also discovery: a client can obtain the names of all
registered counts, sums and averages, and query each of them. See
demo/demosrc/client_allnames.go for a runnable example.
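
A sketch of what discovery could look like; the AllNames() call and
its return shape are hypothetical here, the real API is shown in
demo/demosrc/client_allnames.go:

    // Hypothetical discovery: list the names, then query each average.
    names, err := c.AllNames() // assumed method, see the demo source
    if err != nil { ... }
    for _, name := range names.Averages { // assumed field
        av, n, until, err := c.Average(name) // assumed method
        ...
    }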

Further Reading

Be sure to read the docs of the base/ package to understand what the
metrics do. In particular, make sure that you understand how the
*PerDuration metrics work: they always report results that "lag" by
up to one duration, i.e., a report describes the last completed
period, not the one that is still being filled.

For information on how to publish metrics, read the server section in
package reporter/. For scraping, read the client section. A runnable
example is provided in demo/, with sources in demo/demosrc/.

Packages tools/, namedset/ and api/ are for internal use.

License

This software is distributed under the GPLv3; see LICENSE.md.
Basically that means that you can use it as you like, there's no cost
and no guarantee, and you are free to modify the code as you see fit;
but if you do, you must make your changes available to others under
the same conditions.

As a courtesy, I'd love to be informed of any changes that you make;
I could incorporate them into a next release, with full credits if
you like.
*/
package runtimemetrics