The KRMM package provides tools for solving kernel ridge regression within the following mixed model framework:
Y = X*Beta + Z*U + E
where Y is the vector of responses (e.g. phenotypes), X and Z are the design matrices of the predictors with fixed and random effects respectively, Beta is the vector of fixed effects, U is the vector of random effects, assumed to follow U ~ N(0, sigma2_K * K) for a kernel (Gram) matrix K built from the covariates, and E is the vector of residuals with E ~ N(0, sigma2_E * I). In the kernel ridge regression solution, U is expressed through the dual variables alpha as U = K*alpha.
The package offers flexibility in kernel choice, including linear, polynomial, Gaussian, Laplacian, and ANOVA kernels. The RR-BLUP (Random Regression BLUP) or GBLUP (Genomic BLUP) method
is associated with the linear kernel, while the RKHS (Reproducing Kernel Hilbert Space) method
is associated with the other kernels, with the Gaussian kernel set as the default.
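For instance, here is a minimal sketch of how the method maps to the kernel choice, on small simulated data; passing the kernel argument directly to krmm is an assumption based on its use with tune_krmm further below:
# minimal sketch: selecting the method and kernel (kernel argument for krmm is assumed)
library(KRMM)
set.seed(1)
n <- 50
p <- 20
M <- matrix(rnorm(n * p), nrow = n) # covariates
y <- as.vector(M %*% rnorm(p)) + rnorm(n, sd = 0.1) # simulated responses
# linear kernel -> RR-BLUP (equivalently GBLUP)
fit_linear <- krmm(Y = y, Matrix_covariates = M, method = "RR-BLUP")
# non-linear kernels -> RKHS; Gaussian is the default, and other kernels (e.g. Laplacian)
# are assumed to be selected with the same kernel argument shown for tune_krmm below
fit_laplacian <- krmm(Y = y, Matrix_covariates = M, method = "RKHS", kernel = "Laplacian")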
The package uses the expectation-maximization (EM) algorithm to estimate the model components (fixed and random effects) and the variance parameters. It also returns the BLUP of the dual variables of kernel ridge regression and, for the linear kernel, the BLUP of the random effects of the covariates (i.e. the RR-BLUP of random effects).
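The relation between these two sets of estimates can be checked with a small sketch: for the linear kernel, the RR-BLUP of the covariate effects (gamma_hat in the examples below) is proportional to t(M) %*% alpha, where alpha are the dual variables (vect_alpha); since the exact kernel normalization used internally is not assumed here, the check is made scale-free via a correlation:
# sketch: equivalence between RR-BLUP of covariate effects and dual variables
library(KRMM)
set.seed(2)
n <- 50
p <- 20
M <- matrix(rnorm(n * p), nrow = n)
y <- as.vector(M %*% rnorm(p)) + rnorm(n, sd = 0.1)
fit <- krmm(Y = y, Matrix_covariates = M, method = "RR-BLUP")
# the correlation should be (numerically) equal to 1
cor(as.vector(fit$gamma_hat), as.vector(t(M) %*% fit$vect_alpha))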
You can install the latest version of the KRMM package with:
install.packages("devtools")
library(devtools)
install_github("ljacquin/KRMM")
Kernel Regression: Solving ridge regression with a variety of kernels.
Mixed Models: Integrating fixed and random effects into the model.
EM Algorithm: Utilizing the EM algorithm for parameter estimation.
BLUP Estimates: Ability to obtain BLUP of dual variables and random predictor effects for the linear kernel (RR-BLUP).
krmm: Main function for fitting the KRMM model.
predict_krmm: Predicting values for new data.
tune_krmm: Tuning the kernel rate of decay parameter through K-fold cross-validation.
em_reml_mm: EM algorithm for restricted maximum likelihood estimation (REML) within the mixed model framework.
Here's a simple example illustrating the use of the krmm function:
# load libraries
library(KRMM)
# simulate data
set.seed(123)
p <- 500 # e.g. number of SNP markers
n <- 300 # e.g. number of phenotypes for a trait
gamma <- rnorm(p, mean = 0, sd = 0.5) # e.g. random effects of SNP markers
M <- matrix(runif(p * n, min = 0, max = 1), ncol = p, byrow = T) # matrix of covariates, e.g. genotype matrix
f <- tcrossprod(gamma, M) # data generating process (DGP), e.g. true genetic model
eps <- rnorm(n, mean = 0, sd = 0.1) # add residuals
Y <- f + eps # e.g. measured phenotypes (i.e. genetic values
# + residuals)
# split data into training and test set
n_train <- floor(n * 0.67)
idx_train <- sample(1:n, size = n_train, replace = F)
# get train data
M_train <- M[idx_train, ]
y_train <- Y[idx_train]
# train krmm with linear kernel (i.e. dot product)
linear_krmm_model <- krmm(Y = y_train, Matrix_covariates = M_train, method = "RR-BLUP")
summary(linear_krmm_model)
print(linear_krmm_model$beta_hat)
hist(linear_krmm_model$gamma_hat)
hist(linear_krmm_model$vect_alpha)
# train krmm with a non-linear Gaussian kernel (Gaussian is the default kernel for the RKHS method)
non_linear_krmm_model <- krmm(Y = y_train, Matrix_covariates = M_train, method = "RKHS")
summary(non_linear_krmm_model)
print(non_linear_krmm_model$beta_hat)
hist(non_linear_krmm_model$vect_alpha)
Here's an example of how to use the predict_krmm function to make predictions on new data:
# get test data from matrix of covariates for prediction
M_test <- M[-idx_train, ]
# get unknown true value generated by DGP we want to predict using predict_krmm
f_test <- f[-idx_train]
# -- prediction with linear kernel
# without fixed effects
f_hat_test <- predict_krmm(linear_krmm_model, Matrix_covariates = M_test)
dev.new()
plot(f_hat_test, f_test, main = "Linear RKHS regression without fixed effects")
cor(f_hat_test, f_test)
# with added fixed effects
f_hat_test <- predict_krmm(linear_krmm_model, Matrix_covariates = M_test, add_fixed_effects = T)
dev.new()
plot(f_hat_test, f_test, main = "Linear RKHS regression with fixed effects added")
cor(f_hat_test, f_test)
# -- prediction with non-linear Gaussian kernel
# without fixed effects
f_hat_test <- predict_krmm(non_linear_krmm_model, Matrix_covariates = M_test)
dev.new()
plot(f_hat_test, f_test, main = "Gaussian RKHS regression without fixed effects,
and default rate of decay (not optimized)")
cor(f_hat_test, f_test)
# with added fixed effects
f_hat_test <- predict_krmm(non_linear_krmm_model, Matrix_covariates = M_test, add_fixed_effects = T)
dev.new()
plot(f_hat_test, f_test, main = "Gaussian RKHS regression with fixed effects added,
and default rate of decay (not optimized)")
cor(f_hat_test, f_test)
Here's an example of how to use the tune_krmm function for tuning the rate of decay of Gaussian, Laplacian or ANOVA kernels (see documentation for these kernels):
# -- tune krmm model with a gaussian kernel and make predictions for test data
non_linear_opt_krmm_obj <- tune_krmm(
Y = y_train, Matrix_covariates = M_train,
rate_decay_grid = seq(0.01, 0.1, length.out = 5), nb_folds = 3,
method = "RKHS", kernel = "Gaussian",
)
print(non_linear_opt_krmm_obj$optimal_h)
plot(non_linear_opt_krmm_obj$rate_decay_grid,
non_linear_opt_krmm_obj$mean_loss_grid, type = "l")
# get the optimized krmm model
non_linear_opt_krmm_model <- non_linear_opt_krmm_obj$optimized_model
# without fixed effects
f_hat_test <- predict_krmm(non_linear_opt_krmm_model, Matrix_covariates = M_test, add_fixed_effects = F)
dev.new()
plot(f_hat_test, f_test, main = "Gaussian RKHS regression without fixed effects, and optimized rate of decay")
cor(f_hat_test, f_test)
# with added fixed effects
f_hat_test <- predict_krmm(non_linear_opt_krmm_model, Matrix_covariates = M_test, add_fixed_effects = T)
dev.new()
plot(f_hat_test, f_test, main = "Gaussian RKHS regression with fixed effects added, and optimized rate of decay")
cor(f_hat_test, f_test)
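As a quick follow-up, reusing only the objects created above, one can compare the predictive ability obtained with the default and the tuned rates of decay:
# compare predictive ability: default vs. tuned rate of decay
acc_default_decay <- cor(predict_krmm(non_linear_krmm_model, Matrix_covariates = M_test), f_test)
acc_tuned_decay <- cor(predict_krmm(non_linear_opt_krmm_model, Matrix_covariates = M_test), f_test)
print(c(default = acc_default_decay, tuned = acc_tuned_decay))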
- Author: Laval Jacquin
- Maintainer: Laval Jacquin [email protected]
- Jacquin et al. (2016). A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice. Front. Genet. 7:145
- Robinson, G. K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science 6(1), 15-51
- Foulley, J.-L. (2002). Algorithme EM : théorie et application au modèle mixte. Journal de la Société Française de Statistique 143, 57-109