06-AttributeData.Rmd

# Control charts for count, proportion, or rate data {#attribute}

| If your data involve... | use a ... | based on the ... distribution. | 
| -------------------------------------- | --------- | ------------------------ | 
| Rates  | *u* chart | Poisson | 
| Counts (with equal sampling units) | *c* chart | Poisson |
| Proportions  | *p* chart | binomial |
| Proportions (with equal denominators) | *np* chart | binomial | 
| Rare events | *g* chart | geometric | 

- For count, rate, or proportion data, carefully define your numerator and denominator. Evaluate each separately over time to see whether there are any unusual features or patterns. Sometimes patterns can occur in one or the other, then disappear or are obscured when coalesced into a rate or proportion.  

- For count data, prefer *u*-charts to *c*-charts. In most cases, we do not have a constant denominator, so c-charts would not be appropriate. Even when we do, using a *u*-chart helps reduce audience confusion because you are explicitly stating the "per *x*".    

- For proportion data, prefer *p*-charts to *np*-charts. Again, we almost never have a constant denominator, so *np*-charts would not be appropriate. Even when we do, using a *p*-chart helps reduce audience confusion by explicitly stating the "per *x*".   

- Rare events can be evaluated either by *g*-charts (this chapter) for discrete events/time steps, or *t*-charts ([next chapter](#tchart)) for continuous time.   

## *u*-chart example

The majority of healthcare metrics of concern are rates, so the most common control chart is the *u*-chart.  

Sometimes, a KPI is based on counts. This is obviously problematic for process monitoring in most healthcare situations because it ignores the risk exposure---for example, counting the number of infections over time is meaningless if you don't account for the change in the number of patients in that same time period. When KPIs are measuring counts with a denominator that is *truly fixed*, technically a *c*-chart can be used. This makes sense in manufacturing, but not so much in healthcare, where the definition of the denominator can be very important. You should always use a context-relevant denominator, so in basically all cases a *u*-chart should be preferred to a *c*-chart.    

**Mean for rates (*u*):** &nbsp;&nbsp; $u = {\frac{\Sigma{c_i}}{{\Sigma{n_i}}}}$

**3$\sigma$ control limits for rates (*u*):** &nbsp;&nbsp; $3\sqrt{\frac{u}{n_i}}$   

<br/>  

*Infections per 1000 central line days*   

``` {r uex}
# Generate fake infections data
set.seed(72)
dates = seq(as.Date("2013/10/1"), by = "day", length.out = 730)
linedays = sample(30:60,length(dates), replace = TRUE)
infections = rpois(length(dates), 2/1000*linedays)

months = as.Date(strftime(dates, "%Y-%m-01"))

dfi = data.frame(months, linedays, infections)

infections.agg = aggregate(dfi$infections, by = list(months), FUN = sum, na.rm = TRUE)$x
linedays.agg = aggregate(dfi$linedays, by = list(months), FUN = sum, na.rm = TRUE)$x

# Calculate u chart inputs
subgroup.u = unique(months)
point.u = infections.agg / linedays.agg * 1000
central.u = sum(infections.agg) / sum(linedays.agg) * 1000
sigma.u = sqrt(central.u / linedays.agg * 1000)
```
```{r uchart_example, fig.height = 3.5}
# Plot u chart
spc.plot(subgroup.u, point.u, central.u, sigma.u, k = 3, lcl.min  = 0,
         label.x = "Month", label.y = "Infections per 1000 line days")
```

See Appendix for details of the spc.plots() function.

<br/>  

## *p*-chart example

When your metric is a true proportion (and not a rate, e.g., a count per 100), a *p*-chart is the appropriate control chart to use.  

**Mean for proportions (*p*):** &nbsp;&nbsp; $p = {\frac{\Sigma{y_i}}{\Sigma{n_i}}}$

**3$\sigma$ control limits for proportions (*p*):** &nbsp;&nbsp; $3\sqrt{\frac {p (1 - p)}{n_i}}$  

<br/>  

*Proportion of patients readmitted*  

``` {r pex, fig.height = 3.5}
# Generate sample data
discharges = sample(300:500, 24)
readmits = rbinom(24, discharges, .2)
dates = seq(as.Date("2013/10/1"), by = "month", length.out = 24)

# Calculate p chart inputs
subgroup.p = dates
point.p = readmits / discharges
central.p = sum(readmits) / sum(discharges)
sigma.p = sqrt(central.p*(1 - central.p) / discharges)

# Plot p chart
spc.plot(subgroup.p, point.p, central.p, sigma.p,
         label.x = "Month", label.y = "Proportion readmitted")
```

<br/>  

## Rare events (*g*-charts) {#gchart}

There are important KPIs in healthcare related to rare events, such as is common in patient safety and infection control. These commonly have 0 values for several subgroups within the process time-period. In these cases, you need to use *g*-charts for a discrete time scale (e.g., days between events) or *t*-charts for a continuous time scale (e.g., time between events). See [Chapter 10](#tchart)) for details on using *t*-charts to evaluate the time between events.  

### *g*-chart example

**Mean for infrequent counts (*g*):** &nbsp;&nbsp; $g = {\frac{\Sigma{g_i}}{\Sigma{n_i}}}$
&nbsp;&nbsp;&nbsp;&nbsp; *where*  
&nbsp;&nbsp;&nbsp;&nbsp; $g$ = units/opportunities between events    

**3$\sigma$ limits for infrequent counts (*g*):** &nbsp;&nbsp; $3\sqrt{g (g + 1)}$    

<br/>  

*Days between infections*    
 
``` {r gex}
# Generate fake data using u-chart example data
infections.index = which(infections > 0)[1:30]
dfind = data.frame(start = head(infections.index, length(infections.index) - 1) + 1, 
                   end = tail(infections.index, length(infections.index) - 1))

linedays.btwn = matrix( , length(dfind$start))

for (i in 1:length(linedays.btwn)) {
    sumover = seq(dfind$start[i], dfind$end[i])
    linedays.btwn[i] = sum(linedays[sumover])
}

# Calculate g chart inputs
subgroup.g = seq(2, length(infections.index))
point.g = linedays.btwn
central.g = mean(point.g)
sigma.g = rep(sqrt(central.g*(central.g+1)), length(point.g))
```

```{r gchart_example, fig.height = 3.5}
# Plot g chart
spc.plot(subgroup.g, point.g, central.g, sigma.g, lcl.show = FALSE, 
         band.show = FALSE, rule.show = FALSE,
         lcl.min = 0, k = 3, label.x = "Infection number",
         label.y = "Line days between infections")
```

<br/>   


## *c*- and *np*-chart details  

Simply for completeness, means and control limits for *c*- and *np*-charts are presented here. To emphasize that *u*- and *p*-charts should be preferred (respectively), no examples are given.    

**Mean for counts (*c*):** &nbsp;&nbsp; $\frac{\Sigma{c_i}}{n}$

**3$\sigma$ control limits for counts (*c*)(not shown):** &nbsp;&nbsp; $3\sqrt{c}$   

**Mean for equal-opporuntity proportions (*np*):** &nbsp;&nbsp; $np = {\frac{\Sigma{y_i}}{n}}$  
&nbsp;&nbsp;&nbsp;&nbsp; *where*  
&nbsp;&nbsp;&nbsp;&nbsp; $n$ is a constant  

**3$\sigma$ control limits for equal-opporuntity proportions (*np*):** &nbsp;&nbsp; $3\sqrt{np (1 - p)}$  
&nbsp;&nbsp;&nbsp;&nbsp; *where*  
&nbsp;&nbsp;&nbsp;&nbsp; $n$ is a constant