-
Notifications
You must be signed in to change notification settings - Fork 13
/
Copy path90-MiscHelper.Rmd
277 lines (201 loc) · 10.8 KB
/
90-MiscHelper.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
# (APPENDIX) Appendix {.unnumbered}
# DEA Helper Functions
## Introduction
```{r, include=FALSE, eval=FALSE}
library(bookdown); library(rmarkdown); rmarkdown::render("90-MiscHelper.Rmd", "pdf_book")
```
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This chapter explains some functions or commands that may be helpful to the DEA researcher and also describes some that don't exist yet but could be helpful.
## Filtering Nonpositive Columns
This function is a quick and dirty shortcut that will eliminate columns that do not contain a positive value. The primary use for this in a DEA setting is to eliminate columns of DMUs from a lambda matrix that are never used for comparison. In traditional DEA models, this would mean eliminating the columns of inefficient DMUs so that only the efficient ones are shown.
This can be incorporated as a single line of code but is tricky enough, we will also make it into a function.
Let's revisit the Baker Hughes Corporation Directional Drilling example from Chapter 9.
Let's start by looking at the full envelopment results of the Baker Hughes example from earlier in the book. This table shows both the efficiency score and every value of the matrix of $\lambda$ values.
```{r BHCorp, echo=FALSE, warning=FALSE}
library (MultiplierDEA)
library (kableExtra)
library (TRA)
NX <- 1; NY <- 4; ND <- 14
names <- TRA::DEAnames (NX, NY, ND)
XBH <- matrix(c(1.07, 1.06, 0.325, 1.60, 0.55, 0.2, 0.35,
0.53, 0.21, 0.16, 0.07, 1.95, 5.59, 3.10),
ncol=NX,dimnames=c(list(names$DMUnamesbyletter),
list(names$Xnames)))
YBH <- matrix(c( 32, 50, 40, 30, 25, 8, 2,
12, 10, 0.8, 3, 300, 60, 240,
8.2, 7.6, 7.6, 7.1, 7.0, 6.0, 5.9,
5.8, 5.8, 5.4, 5.3, 6.8, 6.2, 6.2,
7.5, 7.2, 7.1, 7.2, 7.0, 6.1, 6.2,
5.8, 5.8, 5.6, 5.4, 6.1, 6.9, 6.6,
8.0, 6.4, 5.3, 5.5, 5.1, 6.9, 6.6,
5.4, 4.7, 6.1, 6.5, 6.4, 6.8, 7.1),
ncol=NY,dimnames=c(list(names$DMUnamesbyletter),
list(names$Ynames)))
resBH<-DeaMultiplierModel(XBH,YBH,rts = "vrs", orientation="input")
# Rename some of the results row and column labels
dimnames(resBH$Lambda)<-c(list(names$DMUnamesbyletter),
list(names$LambdanamesbyletterLX))
dimnames(resBH$vx)<-c(list(names$DMUnamesbyletter),
list(names$VnamesLX))
dimnames(resBH$uy)<-c(list(names$DMUnamesbyletter),
list(names$UnamesLX))
kbl (cbind(resBH$Efficiency,resBH$Lambda),
booktabs=T, escape=F,
caption="Envelopment results for Baker Hughes Corporation analysis.") |>
kable_styling (latex_options = c("HOLD_position", "scale_down"))
```
Notice that the data table is messy. In particular, an inefficient unit such as J is never used for creating a target for any other unit. Therefore the entire column (L_J) is zero. We want to be careful about simply removing columns since the full matrix makes further analysis simpler but for viewing, we don't need to include that data. Chester Ismay contributed an elegant line of code based on testing to see if the column sum is zero and if so, remove the column. The following line of code modifies this idea to add every column and then only pass columns that are larger than a cutoff value.
In theory, a cutoff could be set to exactly zero but sometimes small deviations from zero will arise from floating point arithmetic calculation. A cutoff value can then be used to guard against these very small positive or negative values where the result should actually be zero.
```{r displaysmallertable1}
df <- cbind(resBH$Efficiency,resBH$Lambda)
cutoff <- 0.000001
kbl (tempdf <- df[, colSums(df) > cutoff],
booktabs=T, escape=F,
caption = "Envelopment results with columns containing
positive values.") |>
kable_styling (latex_options = c("HOLD_position", "scale_down"))
```
Notice that the second table is much easier to interpret. Simply removing unused columns that cause the table to be broken up allows the table to fit much better on a screen or page.
While the code is elegant, it does decrease readability of the code. Given that lambda matrices are frequently examined, let's make it easier for the reader by compartmentalizing this as a function. This is a good chance to review the use of function.
The `TRA` package contains a function named \index{TRA!poscol} `poscol` that takes a dataframe and drops the non-positive columns. It can optionally take a `cutoff` parameter that can be more robust than the default value of 0.
## Define Names
For many datasets, we simply want to have DMUs named with sequential letters, numbered inputs, and numbered outputs. Defining lists of names every time for these items and the resulting items results can become time consuming. Instead, the `TRA` package has a function \index{TRA!DEAnames} `DEAnames` that will create common DEA object names.
This function simply needs the number of DMUs, inputs, and outputs.
Let's see what the function does.
```{r nameexample}
NX <- 2; NY <- 3; ND <- 4
modelnames <-TRA::DEAnames(2,2,4)
modelnames
```
These are names that can be used as row names or column names. They categorized as regular text names. Names that end with an `LX` suffix are structured in the format needed for \index{LaTeX} LaTeX rendering of tables in RMarkdown and \index{kable} `kable`.
```{r}
x <- matrix(c(10,20,30,50,60,70,80,90),ncol=2)
y <- matrix(c(75,100,300,400,90,110,320,430), ncol=2)
```
```{r}
t1 <- cbind (x,y)
t2 <- cbind (x,y)
rownames(t2) <- modelnames$DMUnames
colnames(t2) <- c(modelnames$Xnames, modelnames$Ynames)
t3 <- cbind (x,y)
rownames(t3) <- modelnames$DMUnames
colnames(t3) <- c(modelnames$XnamesLX, modelnames$YnamesLX)
```
```{r}
library(kableExtra)
kbl(t1, booktabs=TRUE, escape=FALSE,
caption="A Table without Row or Column Names.") |>
kable_styling (latex_options = c("HOLD_position"))
kbl(t2, booktabs=TRUE, escape=FALSE,
caption="A Table with Plain Text Names.") |>
kable_styling (latex_options = c("HOLD_position"))
kbl(t3, booktabs=TRUE, escape=FALSE,
caption="A Table with LaTeX Names.") |>
kable_styling (latex_options = c("HOLD_position"))
```
## Function for Drawing Input-Output Diagram
I find visual models sometimes clarify the model. Let's draw an input-output diagram \index{Input-output diagram} DEA model. This diagram has inputs feeding into a box and outputs coming out. I used the very rich \index{DiagrammeR} `DiagrammeR` package for this function. For a comprehensive description of options in the package, this site is very useful. <http://rich-iannone.github.io/DiagrammeR/graphviz_and_mermaid.html>
Let's start by hard coding a one-input, four-output CCR input-oriented model. We will retain the `Xnames` and `Ynames` list of input and output names respectively from the previous section.
```{r TryingIODiagram1, warning = FALSE, message = FALSE}
library(DiagrammeR)
library(DiagrammeRsvg)
library(rsvg)
library(htmltools)
a <- "'"
Xlabels <- c(mapply(paste0, a, names$Xnames, a))
# Wraps names with ' (backtick)
Ylabels <- c(mapply(paste0, a, names$Ynames, a))
Modellabel <- c(mapply(paste0, a, "\n\nCCRIO\n\n ", a))
# Pads model name with two carriage returns
# to make it a taller rectangle and look better
# Define labels for the inputs and outputs
nodelabels <- paste(c(Xlabels,Ylabels), collapse="; ")
# Define edge
edges <- paste0(
Xlabels[1], "->", Modellabel [1], " ",
Modellabel [1], "->", Ylabels[1], " ",
Modellabel [1], "->", Ylabels[2], " ",
Modellabel [1], "->", Ylabels[3], " ",
Modellabel [1], "->", Ylabels[4], " "
)
# Build Diagram:
demoIOdiagram <- grViz(
paste0(" digraph hardcoded_IO_Diagram {
# a 'graph' statement
graph [overlap = true, fontsize = 10, rankdir = LR]
# several 'node' statements
node [shape = plaintext,
fontname = Helvetica]
", nodelabels, "
node [shape = box, fillcolor = green,
fontname = Helvetica]
", Modellabel, "
# several 'edge' statements
", edges, "
}"
)
)
#stnds.qa.d2 <- grViz(stnds.qa.d)
tmp<-capture.output(rsvg_png(charToRaw(export_svg(demoIOdiagram)),
'demoIOdiagram.png', height = 1440))
```
![Demo IO Diagram](demoIOdiagram.png){#fig:demoIOdiagram width="25%"}\
Okay. Now let's generalize it for an arbitrary number of inputs and outputs.
```{r TryingIODiagram2, dpi=144}
library(DiagrammeR)
a <- "'"
NX <- lengths(names$Xnames) # Number of inputs
NY <- lengths(names$Ynames) # Number of outputs
Xlabels <- c(mapply(paste0, a, names$Xnames, a))
Ylabels <- c(mapply(paste0, a, names$Ynames, a))
nboxpad <- max(floor(1+NX/2), floor(1+NY/2))
boxpadding <- paste(rep("\n",nboxpad),collapse='')
# Build line returns
modellabel <- paste(boxpadding, "CCRIO", boxpadding,
collapse="")
# Want to vertically pad returns, \n's top and bottom
# to balance the height of the boxspace figure
# Define the nodes for the inputs and outputs
nodelabels <- paste(c(Xlabels,Ylabels), collapse="; ")
# Define the edges (arrows) for inputs and outputs
Xedges <- paste(c(mapply(paste0, Xlabels,"->",
Modellabel [1], " " )), collapse="")
Yedges <- paste(c(mapply(paste, Modellabel [1], "->",
Ylabels," " )),collapse="")
# Combine edges for inputs and outputs
edges <- paste0( Xedges, Yedges)
# All edges or arrows are for inputs or outputs
## Plot graph:
generalIO <- grViz(
paste0(" digraph Input_Output_Diagram {
# a 'graph' statement
graph [overlap = false, # True or false both work fine
fontsize = 10,
rankdir = LR, # Left to right structure
layout = dot] # dot layout is best for this,
# neato, twopi, circo are not suitable
# several 'node' statements
node [shape = plaintext, # Avoids any outline or shape
fontname = Helvetica]
", nodelabels, "
node [shape = box, fillcolor = green,
fontname = Helvetica]
", Modellabel, "
# several 'edge' statements
", edges, "
}"
)
)
tmp<-capture.output(rsvg_png(charToRaw(export_svg(generalIO)),
'generalIO.png', height = 1440))
```
Now let's try loading it from Rmarkdown.
![General IO Diagram](generalIO.png){#fig:generalIO}
#### Optional Future Features for Drawing Input-Output Diagrams
In the future, these might be nice features to add:
- Optional: to generate a table listing inputs and outputs. This actually should be easy.
- Optional: parenthetical information (abbreviation, units, or comments)
- Optional text to include in the box
- Long-term optional network diagramming where outputs from one stage serve as inputs to other stage(s)