# Multinomial Regression

For more detail on these types of models, see [my document](https://m-clark.github.io/docs/logregmodels.html).  In general we can use
multinomial models for multi-category target variables, or more generally,
multi-count data.

## Standard (Categorical) Model

### Data Setup

First, let's get some data. 200 entering high school students make program
choices: general program, vocational program, and academic program. We will be
interested in their choice, using their writing score as a proxy for scholastic
ability and their socioeconomic status, a categorical variable of low, middle,
and high values.

```{r multinomial-setup}
library(haven)
library(tidyverse)
library(mlogit)

program = read_dta("https://stats.idre.ucla.edu/stat/data/hsbdemo.dta") %>%
  as_factor() %>%
  mutate(prog = relevel(prog, ref = "academic"))

head(program[, 1:5])


# convert to long form for mlogit
program_long = program %>%
  select(id, prog, ses, write) %>%
  mlogit.data(
    data = .,
    shape = 'wide',
    choice = 'prog',
    id.var = 'id'
  )

head(program_long)
```

We go ahead and run a model via <span class="func" style = "">mlogit</span> for later comparison.

```{r mlogit}
fit_mlogit   = mlogit(prog ~ 1| write + ses, data = program_long)
mlogit_coefs = coef(fit_mlogit)[c(1,5,7,3,2,6,8,4)]  # reorder to match our column-wise parameter layout
```


### Function

Multinomial model via maximum likelihood
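
For reference, the log likelihood being maximized can be written out explicitly. This is the usual baseline-category logit form with the first category as reference; the symbols $x_i$, $\beta_k$, $N$, and $K$ are notation assumed here rather than names from the code.

$$
P(y_i = 1) = \frac{1}{1 + \sum_{j=2}^{K} \exp(x_i^\top \beta_j)}, \qquad
P(y_i = k) = \frac{\exp(x_i^\top \beta_k)}{1 + \sum_{j=2}^{K} \exp(x_i^\top \beta_j)}, \quad k = 2, \dots, K
$$

$$
\ell(\beta) = \sum_{i=1}^{N} \left[ \sum_{k=2}^{K} \mathbb{1}(y_i = k)\, x_i^\top \beta_k - \log\left(1 + \sum_{j=2}^{K} \exp(x_i^\top \beta_j)\right) \right]
$$

In the function that follows, `V` holds the values $x_i^\top \beta_k$ and `baseProbVec` holds $P(y_i = 1)$.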

```{r multinom_ml-func}
multinom_ml <- function(par, X, y) {
  levs = levels(y)
  ref  = levs[1]                # reference level (category label 1)
  
  y0 = y == ref
  y1 = y == levs[2]             # category 2
  y2 = y == levs[3]             # category 3
  
  beta = matrix(par, ncol = 2)
  
  # more like mnlogit package depiction in its function
  # V1 = X %*% beta[ ,1]
  # V2 = X %*% beta[ ,2]
  # ll = sum(-log(1 + exp(V1) + exp(V2))) + sum(V1[y1], V2[y2])
  
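  V = X %*% beta                           # N x 2 linear predictors for the non-reference categories
  baseProbVec = 1 / (1 + rowSums(exp(V)))  # reference group probabilities
  
  # c(V) and c(y1, y2) are both stacked column by column, so the cross product
  # sums the linear predictor of each observation's observed category
  loglik = sum(log(baseProbVec)) + drop(crossprod(c(V), c(y1, y2)))
  
  loglik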
}
```
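
One practical caveat with the function above: `exp(V)` overflows to `Inf` when a linear predictor is large, which can happen with poor starting values or unscaled covariates. The following is a minimal sketch of a log-sum-exp style workaround. The helper name `log1p_sum_exp` is hypothetical and not part of the original code or any package.

```r
# Stable evaluation of log(1 + rowSums(exp(V))).
# log1p_sum_exp is a hypothetical helper, introduced only for illustration.
log1p_sum_exp = function(V) {
  # row-wise maximum over the linear predictors and the reference's implicit 0
  m = pmax(0, apply(V, 1, max))
  # subtracting m keeps every exponent <= 0, so exp() cannot overflow
  m + log(exp(-m) + rowSums(exp(V - m)))
}

V_small = matrix(c(-1, 0.5, 2, 1), ncol = 2)   # moderate values
V_large = matrix(c(1000, 1, 2, 3), ncol = 2)   # first row contains 1000

naive_small  = log(1 + rowSums(exp(V_small)))
stable_small = log1p_sum_exp(V_small)
all.equal(naive_small, stable_small)   # the two versions agree for moderate values

log(1 + rowSums(exp(V_large)))         # naive version overflows in the first row
log1p_sum_exp(V_large)                 # stable version stays finite
```

Under this sketch, `-log1p_sum_exp(V)` could stand in for `log(baseProbVec)` in the log likelihood.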


```{r multinom-est}
fit = optim(
  runif(8,-.1, .1),
  multinom_ml,
  X = model.matrix(prog ~ ses + write, data = program),
  y = program$prog,
  control = list(
    maxit   = 1000,
    reltol  = 1e-12,
    ndeps   = rep(1e-8, 8),
    trace   = TRUE,
    fnscale = -1,
    type    = 3
  ),
  method = 'BFGS'
)

# fit$par
```
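
The call above lets `optim` approximate the gradient by finite differences, which is what `ndeps` controls. For this model the gradient has a simple closed form, $X^\top (Y - P)$ over the non-reference categories, so it can be supplied directly instead. The following is a minimal sketch on simulated data; `multinom_ll`, `multinom_gr`, and all simulated values are hypothetical and used only for illustration.

```r
set.seed(123)

# simulated three-category data (hypothetical values, for illustration only)
N         = 500
X_sim     = cbind(1, rnorm(N), rbinom(N, 1, .5))
beta_true = matrix(c(.5, 1, -1, -.5, -1, .5), ncol = 2)    # p x (K-1)

V_sim = X_sim %*% beta_true
P_sim = cbind(1, exp(V_sim)) / (1 + rowSums(exp(V_sim)))   # first column is the reference
y_sim = factor(apply(P_sim, 1, function(p) sample(c('a', 'b', 'c'), 1, prob = p)))

# log likelihood, same form as multinom_ml but for any number of categories
multinom_ll = function(par, X, y) {
  beta = matrix(par, ncol = nlevels(y) - 1)
  V    = X %*% beta
  yind = sapply(levels(y)[-1], function(l) y == l)   # N x (K-1) indicators
  sum(V[yind]) - sum(log1p(rowSums(exp(V))))
}

# analytic gradient: t(X) %*% (indicators - fitted probabilities)
multinom_gr = function(par, X, y) {
  beta = matrix(par, ncol = nlevels(y) - 1)
  eV   = exp(X %*% beta)
  P    = eV / (1 + rowSums(eV))                      # non-reference probabilities
  yind = sapply(levels(y)[-1], function(l) y == l)
  c(crossprod(X, yind - P))                          # stacked column-wise, like par
}

fit_gr = optim(
  par = rep(0, 6),
  fn  = multinom_ll,
  gr  = multinom_gr,
  X   = X_sim,
  y   = y_sim,
  method  = 'BFGS',
  control = list(fnscale = -1, reltol = 1e-12)
)

round(cbind(true = c(beta_true), estimate = fit_gr$par), 2)
```

Because `fnscale` rescales both the function and the gradient, the gradient is supplied on the log likelihood scale and not negated.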


### Comparison

An initial comparison.

```{r multinom-compare-1, echo=FALSE}
cbind(fit_coefs = fit$par, mlogit_coefs) %>% 
  kable_df(4)
```


The following uses <span class="func" style = "">dmultinom</span> for the likelihood, similar to other modeling demonstrations in this document. 

```{r dmultinom-setup}
X = model.matrix(prog ~ ses + write, data = program)
y = program$prog
pars = matrix(fit$par, ncol = 2)
V = X %*% pars
acadprob   = 1 / (1+rowSums(exp(V)))
fitnonacad = exp(V) * matrix(rep(acadprob, 2), ncol = 2)
fits = cbind(acadprob, fitnonacad)
yind = model.matrix( ~ -1 + prog, data = program)
```


```{r multinom-ll}
# dmultinom is not vectorized over rows of prob, so loop over observations
ll = 0

for (i in seq_len(nrow(yind))) {
  ll = ll + dmultinom(yind[i, ], size = 1, prob = fits[i, ], log = TRUE)
}

ll

fit$value

logLik(fit_mlogit)
```
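
Since each observation has `size = 1`, the multinomial log density reduces to the log of the fitted probability for the observed category, so the loop can be replaced by matrix indexing. A small self-contained sketch with made-up probabilities follows; the names `probs`, `obs`, and `ind` are illustrative and not from the original code.

```r
set.seed(1)

n = 6
K = 3

# made-up fitted probabilities, each row summing to 1
probs = matrix(runif(n * K), ncol = K)
probs = probs / rowSums(probs)

obs = sample(1:K, n, replace = TRUE)   # observed category for each row
ind = diag(K)[obs, ]                   # n x K indicator matrix

# loop version, as with dmultinom above
ll_loop = 0
for (i in seq_len(n)) {
  ll_loop = ll_loop + dmultinom(ind[i, ], size = 1, prob = probs[i, ], log = TRUE)
}

# vectorized version: pick out the probability of each observed category
ll_vec = sum(log(probs[cbind(seq_len(n), obs)]))

all.equal(ll_loop, ll_vec)
```

With the objects created earlier, the same idea would presumably be `sum(log(fits[yind == 1]))`.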


## Alternative Specific and Constant Variables

Now we add *alternative specific* and *alternative constant* variables to the
previous *individual specific* covariates. In this example, `price` varies by
alternative but has a single, alternative invariant coefficient (`Z`); `income`
is individual specific, with a coefficient for each non-reference alternative
(`X`); and `catch` varies by alternative and has alternative specific
coefficients (`Y`).
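
To make the roles of the three variable types concrete, the linear predictor for individual $i$ and alternative $k$ can be sketched as follows. The symbols are notation assumed here, chosen to mirror `X`, `Y`, and `Z`; they are not taken from the package documentation.

$$
U_{ik} = x_i^\top \beta_k + y_{ik}\,\gamma_k + z_{ik}\,\alpha
$$

Only differences from the reference alternative (alternative 1, with $\beta_1 = 0$) are identified, so the quantity entering the likelihood is

$$
V_{ik} = U_{ik} - U_{i1} = x_i^\top \beta_k + \left(y_{ik}\,\gamma_k - y_{i1}\,\gamma_1\right) + \left(z_{ik} - z_{i1}\right)\alpha, \qquad k = 2, \dots, K
$$

These three terms correspond to `Vx`, `Vy`, and `Vz` in the likelihood function used in this section.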

We can use the `Fish` data from the <span class="pack" style = "">mnlogit</span>
package.

```{r mlogit-fish-data}
library(mnlogit)  # note we are now using mnlogit

data(Fish)
head(Fish)

fm  = formula(mode ~ price | income | catch)

fit_mnlogit = mnlogit(fm, Fish)
# fit_mnlogit = mlogit(fm, Fish)
# summary(fit_mnlogit)
```

The likelihood function.

```{r multinom_ml}
multinom_ml <- function(par, X, Y, Z, respVec, choice) {

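  # Args-
  # X: individual specific variables plus intercept; N x (p + 1), with N = nrow(Fish)/K
  # Y: alternative specific variables with alternative specific coefs; all N*K rows
  # Z: alternative specific variables with a single coef, as differences from
  #    the reference alternative, so N*(K-1) rows
  # respVec: alternative label for each row; choice: indicator of the chosen alternative
  # The log likelihood has the same form as before: the linear predictors from
  # X, Y, and Z are summed, then combined with the reference probabilities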
  
  N = sum(choice)
  K = length(unique(respVec))
  levs = levels(respVec)
  
  xpar = matrix(par[1:6],  ncol = K-1)
  ypar = matrix(par[7:10], ncol = K)
  zpar = matrix(par[length(par)], ncol = 1)
  
  # Calc X
  Vx  = X %*% xpar
  
  # Calc Y (mnlogit finds N x 1 results by going through 1:N, N+1:N*2 etc; then
  # makes 1 vector, then subtracts the first 1:N from whole vector, then makes
  # Nxk-1 matrix with N+1:end values (as 1:N are just zero)); creating the
  # vector and rebuilding the matrix is unnecessary though
  Vy = sapply(1:K, function(alt) 
    Y[respVec == levs[alt], , drop = FALSE] %*% ypar[alt])
  
  Vy = Vy[,-1] - Vy[,1]
  
  # Calc Z
  Vz = Z %*% zpar
  Vz = matrix(Vz, ncol = K - 1)
  
  # all Vs must fit into an N x (K-1) matrix, where N is the number of individuals
  V = Vx + Vy + Vz
  
  ll0 = crossprod(c(V), choice[-(1:N)])
  baseProbVec <- 1 / (1 + rowSums(exp(V)))
  loglik = sum(log(baseProbVec)) + ll0
  
  loglik
  
  # note fitted values via
  # fitnonref = exp(V) * matrix(rep(baseProbVec, K-1), ncol = K-1)
  # fitref = 1-rowSums(fitnonref)
  # fits = cbind(fitref, fitnonref)
}
```


```{r multinom2-initialize}
inits = runif(11, -.1, .1)
mdat  = mnlogit(fm, Fish)$model  # this data already ordered!
```

Because `X` is constant across alternatives for a given individual, its coefficients describe the selection of each alternative relative to the reference.

```{r multinom2-X-model-matrix}
X = cbind(1, mdat[mdat$`_Alt_Indx_` == 'beach', 'income'])
dim(X)
head(X)
```


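`Y` will use the complete data to start. Each alternative gets its own coefficient, and inside the likelihood function its contribution to the linear predictor is taken as a difference from the reference alternative's contribution.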

```{r multinom2-Y-model-matrix}
Y = as.matrix(mdat[, 'catch', drop = FALSE])
dim(Y)
```


`Z` contains difference scores relative to the reference alternative.

```{r multinom2-Z-model-matrix}
Z = as.matrix(mdat[mdat$`_Alt_Indx_` != 'beach', 'price', drop = FALSE])
Z = Z - mdat[mdat$`_Alt_Indx_` == 'beach', 'price']
dim(Z)

respVec = mdat$`_Alt_Indx_` # first 10 should be 0 0 1 0 1 0 0 0 1 1 after beach dropped
```
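
The subtraction used to build `Z` relies on R's recycling: the reference alternative's prices, one per individual, are reused for each remaining alternative. This only lines up because the data are ordered by alternative and then by individual. A toy sketch with made-up prices follows; the data frame `toy` and its values are hypothetical.

```r
# three individuals, three alternatives, stacked by alternative
toy = data.frame(
  id    = rep(1:3, times = 3),
  alt   = rep(c('beach', 'boat', 'pier'), each = 3),
  price = c(10, 20, 30,   15, 18, 40,   12, 25, 33)
)

is_ref = toy$alt == 'beach'

# 6 non-reference prices minus 3 reference prices: the latter are recycled twice
Z_toy = toy$price[!is_ref] - toy$price[is_ref]

# reshape to individuals x non-reference alternatives, as is done for Vz
Z_mat = matrix(Z_toy, ncol = 2, dimnames = list(NULL, c('boat', 'pier')))
Z_mat
```

If the rows were not sorted this way, the recycled reference prices would be matched to the wrong individuals, so the ordering of `mdat` matters.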

```{r multinom2-est}
multinom_ml(inits, X, Y, Z, respVec, choice = mdat$mode)

fit = optim(
  par = rep(0, 11),
  multinom_ml,
  X = X,
  Y = Y,
  Z = Z,
  respVec = respVec,
  choice  = mdat$mode,
  control = list(
    maxit   = 1000,
    reltol  = 1e-12,
    ndeps   = rep(1e-8, 11),
    trace   = TRUE,
    fnscale = -1,
    type    = 3
  ),
  method = 'BFGS'
)
```


### Comparison

Compare fits.

```{r multinom2-compare, echo=FALSE}
bind_cols(fit_coefs = fit$par, mnlogit_coefs = coef(fit_mnlogit)) %>% 
  kable_df()

bind_cols(fit_ll = fit$value, mnlogit_ll = logLik(fit_mnlogit)) %>% 
  kable_df()
```


## Source

Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/multinomial.R