small updates from teaching Monday and Tuesday #8

Merged
merged 3 commits on Sep 11, 2024
13 changes: 4 additions & 9 deletions bootstrap.qmd
@@ -22,8 +22,8 @@
The procedure is to simulate `n` response vectors from `m` using the estimated parameter values
and refit `m` to these responses in turn, accumulating the statistics of interest at each iteration.

The parameters of a `LinearMixedModel` object are the fixed-effects
-parameters, `β`, the standard deviation, `σ`, of the per-observation noise, and the covariance
-parameter, `θ`, that defines the variance-covariance matrices of the random effects.
+parameters (`β`), the standard deviation (`σ`) of the per-observation noise, and the covariance
+parameter (`θ`) that defines the variance-covariance matrices of the random effects.
A technical description of the covariance parameter can be found in the [MixedModels.jl docs](https://juliastats.org/MixedModels.jl/v4.7.1/optimization/).
Lisa Schwetlick and Daniel Backhaus have provided a more beginner-friendly description of the covariance parameter in the [documentation for MixedModelsSim.jl](https://repsychling.github.io/MixedModelsSim.jl/v0.2.6/simulation_tutorial/).
For today's purposes -- looking at the uncertainty in the estimates from a fitted model -- we can simply use values from the fitted model, but we will revisit the parametric bootstrap as a convenient way to simulate new data, potentially with different parameter values, for power analysis.
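A minimal sketch of that procedure with MixedModels.jl (the formula here is a deliberately simple stand-in, not the exact model from the materials):

```julia
using MixedModels, Random

kb07 = MixedModels.dataset(:kb07)
# a simple model, just to have something to resample
m = fit(MixedModel,
        @formula(rt_trunc ~ 1 + spkr + prec + load + (1 | subj) + (1 | item)),
        kb07; progress=false)

# simulate 1000 response vectors from `m`, refit, and accumulate the estimates
samp = parametricbootstrap(MersenneTwister(42), 1000, m; progress=false)
```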
@@ -73,11 +73,6 @@
contrasts = Dict(:spkr => EffectsCoding(),
```

The `EffectsCoding` contrast is used with these to create a ±1 encoding.
-Furthermore, `Grouping` contrasts are assigned to the `subj` and `item` factors.
-This is not a contrast per-se but an indication that these factors will be used as grouping factors for random effects and, therefore, there is no need to create a contrast matrix.
-For large numbers of levels in a grouping factor, an attempt to create a contrast matrix may cause memory overflow.
-
-It is not important in these cases but a good practice in any case.
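A small sketch of the two ideas, using StatsModels' classic `ModelFrame`/`ModelMatrix` API (the toy data frame is made up for illustration):

```julia
using DataFrames, StatsModels

toy = DataFrame(y=[1.0, 2.0, 3.0, 4.0], spkr=["new", "old", "new", "old"])

# EffectsCoding maps the two levels of `spkr` to -1 and +1 in the model matrix
mf = ModelFrame(@formula(y ~ 1 + spkr), toy;
                contrasts=Dict(:spkr => EffectsCoding()))
ModelMatrix(mf).m  # second column is the ±1 encoding

# Grouping() skips building a contrast matrix entirely -- appropriate for
# random-effect grouping factors like `subj` and `item` with many levels
```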

We can look at an initial fit of moderate complexity:

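For instance, a model along these lines (the exact formula in the materials may differ; this one is a plausible stand-in that reuses the `contrasts` dictionary above):

```julia
m1 = fit(MixedModel,
         @formula(rt_trunc ~ 1 + spkr + prec + load +
                             (1 + prec | subj) + (1 + prec | item)),
         kb07; contrasts, progress=false)
```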
@@ -145,11 +140,11 @@
draw(plt; figure=(; supertitle="Parametric bootstrap estimates of variance components"))

The bootstrap sample can be used to generate intervals that cover a certain percentage of the bootstrapped values.
We refer to these as "coverage intervals", similar to a confidence interval.
-The shortest such intervals, obtained with the `shortestcovint` extractor, correspond to a highest posterior density interval in Bayesian inference.
+The shortest such intervals, obtained with the `confint` extractor, correspond to a highest posterior density interval in Bayesian inference.

We generate these for all random and fixed effects:
```{julia}
-confint(samp)
+confint(samp; method=:shortest)
```
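To see what "shortest coverage interval" means mechanically, here is a hand-rolled sketch for a plain vector of bootstrap draws (`shortest_interval` is a hypothetical helper, not part of MixedModels.jl):

```julia
# narrowest window of sorted draws that still contains `level` of the sample
function shortest_interval(x::AbstractVector; level=0.95)
    xs = sort(x)
    n = length(xs)
    k = ceil(Int, level * n)            # number of draws the interval must contain
    widths = xs[k:end] .- xs[1:(n - k + 1)]
    i = argmin(widths)                  # left edge of the narrowest window
    return (xs[i], xs[i + k - 1])
end

shortest_interval(randn(10_000))        # roughly (-1.96, 1.96)
```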

4 changes: 4 additions & 0 deletions live_coding_sessions/bootstrap.jl
@@ -0,0 +1,4 @@
# maximal model: full factorial fixed effects with the full factorial of
# random slopes for both subj and item -- almost certainly overparameterized
form = @formula(rt_trunc ~ 1 + spkr * prec * load +
                (1 + spkr * prec * load | subj) +
                (1 + spkr * prec * load | item))
yolo = fit(MixedModel, form, kb07; contrasts, progress)
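One thing worth doing with a maximal fit like this (a suggestion, not part of the committed file) is to check whether it converged to a singular covariance estimate:

```julia
# a singular fit is strong evidence that the random-effects structure
# should be simplified
issingular(yolo)
VarCorr(yolo)  # inspect estimated variance components and correlations
```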
48 changes: 48 additions & 0 deletions live_coding_sessions/intro_to_julia.jl
@@ -0,0 +1,48 @@
using Statistics

v = [1, 2, 3]
for i in v
    println(i)
end

using DataFrames
df = DataFrame(; a=[1, 2], b=["a", "b"])
describe(df)  # summary statistics for each column

# columns can be selected by Symbol or by string
select(df, :a)
select(df, "a")

# select(df, a)  # errors at this point: the variable `a` is not yet defined
a = "b"
select(df, a)    # now works: `a` is the string "b", so this selects column :b

# ByRow applies a function elementwise; the pairs read source => fn => dest
transform(df, :a => ByRow(abs2) => :c)   # new column :c with squared values
transform(df, :a => ByRow(abs2) => :a)   # overwrite :a in the returned copy
transform(df, :a => ByRow(abs2); renamecols=false)  # keep the source name

# group by :b, then take the mean of :a within each group
combine(groupby(df, :b), :a => mean; renamecols=false)

# column access
df.a            # direct property access (no copy)

df[2, :]        # second row, all columns

df[:, :a]       # column :a, as a copy

# mutating variants
transform(df, :a => ByRow(abs2) => :c)   # returns a new frame; df is unchanged
# the ! variant adds the column to the original frame in place
transform!(df, :a => ByRow(abs2) => :c)

function square(x)
    return x^2
end

square(x) = x^2  # compact one-line form (similar to a lambda in Python)

# multiple dispatch: the method is chosen by the argument's type
function factor_name(x::AbstractString)
    return x  # already a string; return it unchanged
end

function factor_name(x)
    return string(x)  # generic fallback: convert to a string
end
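Calling the two methods shows dispatch in action (expected return values in the comments):

```julia
factor_name("subj")  # "subj" -- the AbstractString method returns it unchanged
factor_name(:subj)   # "subj" -- the generic method converts the Symbol
factor_name(1)       # "1"    -- any other type also hits the generic method
```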
6 changes: 2 additions & 4 deletions shrinkageplot.qmd
@@ -21,7 +21,7 @@
using MixedModelsMakie
using Random
using ProgressMeter

-const progress=false
+const progress = false
```

Load the kb07 data set (don't tell Reinhold that I used these data).
@@ -32,8 +32,6 @@
kb07 = MixedModels.dataset(:kb07)

```{julia}
contrasts = Dict(
-:subj => Grouping(),
-:item => Grouping(),
:spkr => HelmertCoding(),
:prec => HelmertCoding(),
:load => HelmertCoding(),
@@ -116,7 +114,7 @@
X1 * X1'

## How to interpret a shrinkage plot

-- Extreme shrinkage (shrunk to a line or to a point) is easy to interpret - the term is not providing benefit and can be removed.
+- Extreme shrinkage (shrunk to a line or to a point) is easy to interpret -- the term is not providing benefit and can be removed.
- When the range of the blue dots (shrunk values) is comparable to that of the red dots (unshrunk values), the term is about as strong after shrinkage as before.
- By itself, this doesn't mean the term is important. You need a feeling for the absolute magnitude of the random effects in addition to the relative magnitude.
- Small magnitude and small relative magnitude indicate you can drop that term (see the plotting sketch after this list).
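A minimal sketch of producing such a plot with MixedModelsMakie (assuming a fitted model `m1` and a Makie backend; the `:subj` argument selects the grouping factor to display):

```julia
using CairoMakie, MixedModelsMakie

shrinkageplot(m1, :subj)  # red: unshrunk within-group estimates; blue: shrunk values
```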