-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.qmd
266 lines (195 loc) · 8.27 KB
/
README.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
---
format: gfm
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
message = FALSE,
warning = FALSE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "80%"
)
```
# ergmito: Exponential Random Graph Models for Small Networks <img src="man/figures/logo.png" align="right" width="180px"/>
<!-- badges: start -->
<!-- [![status](https://tinyverse.netlify.com/badge/ergmito)](https://CRAN.R-project.org/package=ergmito) -->
[![CRAN status](https://www.r-pkg.org/badges/version/ergmito)](https://cran.r-project.org/package=ergmito)
[![Lifecycle: maturing](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
[![Travis build status](https://travis-ci.org/muriteams/ergmito.svg?branch=master)](https://travis-ci.org/muriteams/ergmito)
[![AppVeyor Build status](https://ci.appveyor.com/api/projects/status/nl1irakr2g6y6w03?svg=true)](https://ci.appveyor.com/project/gvegayon/ergmito)
[![codecov](https://codecov.io/gh/muriteams/ergmito/branch/master/graph/badge.svg)](https://codecov.io/gh/muriteams/ergmito)
![](http://cranlogs.r-pkg.org/badges/grand-total/ergmito)
[![R-CMD-check](https://github.com/muriteams/ergmito/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/muriteams/ergmito/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
This R package (which has been developed on top of the fantastic work that the
[Statnet](https://github.com/statnet) team has done) implements estimation and
simulation methods for Exponential Random Graph Models of small networks, in particular, up to 5 vertices for directed graphs and 7 for undirected networks.
In the case of small networks, the calculation of the likelihood of ERGMs
becomes computationally feasible, which allows us to avoid approximations and
do exact calculations, ultimately obtaining MLEs directly.
Checkout the <a href="#examples">examples section</a>, and specially the <a href="#using-interaction-effects">Using interaction effects</a> example.
## Support
This material is based upon work support by, or in part by, the U.S.
Army Research Laboratory and the U.S. Army Research Office under
grant number W911NF-15-1-0577
Computation for the work described in this paper was supported by
the University of Southern California's Center for High-Performance
Computing (hpcc.usc.edu).
## Citation
```{r echo=FALSE, results='asis'}
citation(package="ergmito")
```
## Installation
The development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("muriteams/ergmito")
```
This requires compilation. Windows users can download the latest compiled version from appveyor [here](https://ci.appveyor.com/project/gvegayon/ergmito/build/artifacts). The file to download is the one named `ergmito_[version number].zip`. Once downloaded, you can install typing the following:
```r
install.packages("[path to the zipfile]/ergmito_[version number].zip", repos = FALSE)
```
In the case of Mac users, and in particular, those with the Mojave version, they may need to install
the following https://github.com/fxcoudert/gfortran-for-macOS/releases
# Examples
## Quick run
In the following example, we simulate a small network with four vertices and estimate the model parameters using `ergm` and `ergmito`. We start by generating the graph
```{r net2}
# Generating a small graph
library(ergmito)
library(ergm)
library(sna)
set.seed(12123)
n <- 4
net <- rbernoulli(n, p = .3)
gplot(net)
```
To estimate the model
```{r model2-ergmito}
model <- net ~ edges + istar(2)
# ERGMito (estimation via MLE)
ans_ergmito <- ergmito(model)
```
```{r model2-ergm}
# ERGM (estimation via MC-MLE)
ans_ergm <- ergm(
model, control = control.ergm(
main.method = "MCMLE",
seed = 444
)
)
# The ergmito should have a larger value
ergm.exact(ans_ergmito$coef, model) > ergm.exact(ans_ergm$coef, model)
summary(ans_ergmito)
summary(ans_ergm)
```
## Estimating data with known parameters
The following example shows the estimation of a dataset included in the package, `fivenets`. This set of five networks was generated using the `new_rergmito` function, which creates a function to draw random ERGMs with a fixed set of parameters, in this case, `edges = -2.0` and `nodematch("female") = 2.0`
```{r fivenets}
data(fivenets)
model1 <- ergmito(fivenets ~ edges + nodematch("female"))
summary(model1) # This data has know parameters equal to -2.0 and 2.0
```
We can also compute GOF
```{r fivenets-gof}
fivenets_gof <- gof_ergmito(model1)
fivenets_gof
plot(fivenets_gof)
```
## Fitting block-diagonal models
The pooled model can be compared to a block-diagonal ERGM. The package includes
three functions to help with this task: `blockdiagonalize`, `splitnetwork`, and
`ergm_blockdiag`.
```{r blockdiag1-dgp}
data("fivenets")
# Stacking matrices together
fivenets_blockdiag <- blockdiagonalize(fivenets, "block_id")
fivenets_blockdiag # It creates the 'block_id' variable
```
```{r blockdiag-ergm1}
# Fitting the model with ERGM
ans0 <- ergm(
fivenets_blockdiag ~ edges + nodematch("female"),
constraints = ~ blockdiag("block_id")
)
```
```{r blockdiag-ergm2}
ans1 <- ergm_blockdiag(fivenets ~ edges + nodematch("female"))
```
```{r blockdiag-ergmito}
# Now with ergmito
ans2 <- ergmito(fivenets ~ edges + nodematch("female"))
# All three are equivalent
cbind(
ergm = coef(ans0),
ergm_blockdiag = coef(ans1),
ergmito = coef(ans2)
)
```
The benefit of ergmito:
```{r blockdiag-benchmark-ergm, eval=FALSE}
t_ergm <- system.time(ergm(
fivenets_blockdiag ~ edges + nodematch("female") + ttriad,
constraints = ~ blockdiag("block_id")
))
t_ergmito <- system.time(
ergmito(fivenets ~ edges + nodematch("female") + ttriad)
)
```
```{r, echo=FALSE}
t_ergm <- structure(c(user.self = 5.236, sys.self = 0.072, elapsed = 5.312,
user.child = 0.00800000000000001, sys.child = 0.02), class = "proc_time")
t_ergmito <-structure(c(user.self = 0.0359999999999996, sys.self = 0.012,
elapsed = 0.0479999999999983, user.child = 0, sys.child = 0), class = "proc_time")
```
```{r blockdiag-benchmark-vs}
# Relative elapsed time
(t_ergm/t_ergmito)[3]
```
## Fitting a large model
Suppose that we have a large sample of small networks (ego from Facebook, Twitter, etc.), 20,000 which account for 80,000 vertices:
```{r large-dgp}
set.seed(123)
bignet <- rbernoulli(sample(3:5, 20000, replace = TRUE))
# Number of vertices
sum(nvertex(bignet))
```
We can fit this model in a memory-efficient way.
```{r large-fit, cache=TRUE}
system.time(ans0 <- ergmito(bignet ~ edges + mutual))
summary(ans0)
```
## Using interaction effects
One advantage of using exact statistics is the fact that we have significantly more flexibility when it comes to specifying sufficient statistics. Just like one would do when working with Generalized Linear Models in R (the `glm` function), users can alter the specified formula by adding arbitrary offsets (using the offset function) or creating new terms by using the "I" function. In this brief example, where we estimate a model that includes networks of size four and five, we will add an interaction effect between the edge-count statistic and the indicator function that equals one if the network is of size 5. This way, while poling the data, we can still obtain different edge-count estimates depending on the number of vertices in the graph.
```{r}
# Simulating networks of different sizes
set.seed(12344)
nets <- rbernoulli(c(rep(4, 10), rep(5, 10)), c(rep(.2, 10), rep(.1, 10)))
```
Fitting an ergmito under the Bernoulli model
```{r}
ans0 <- ergmito(nets ~ edges)
summary(ans0)
```
Fitting the model with a reference term for networks of size five.
Notice that the variable -n- and other graph attributes can be used
with -model_update-.
```{r}
ans1 <- ergmito(nets ~ edges, model_update = ~ I(edges * (n == 5)))
summary(ans1)
```
The resulting parameter for the edge count is smaller for networks
of size five
```{r}
plogis(coef(ans1)[1])
plogis(sum(coef(ans1)))
```
We can see that the difference in edge count matters.
```{r}
library(lmtest)
lrtest(ans0, ans1)
```
# Contributing
The 'ergmito' project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.