---
title: "Nearest-neighbor Projected-Distance Regression (NPDR)"
output: github_document
---
## tldr
NPDR is a nearest-neighbor feature selection algorithm that fits a generalized linear model for _projected distances_ of a given attribute over all pairs of instances in a neighborhood.
In the NPDR model, the predictor is the attribute distance between neighbors projected onto the attribute dimension, and the outcome is the projected phenotype distance (for quantitative traits) or hit/miss (for case/control) between all pairs of nearest neighbor instances.
NPDR can fit any combination of predictor data types (categorical or numeric) and outcome data types (case-control or quantitative) as well as adjust for covariates that may be confounding.
As with [STIR](https://insilico.github.io/stir/) (STatistical Inference Relief), NPDR allows for the calculation of statistical significance of importance scores and adjustment for multiple testing.
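As a minimal sketch of the idea above (not the package's implementation), the following base R code simulates a quantitative outcome, builds fixed-k neighborhoods, and regresses the projected outcome distance on each attribute's projected distance. The simulated data, the Manhattan metric, `k = 10`, and names such as `nbr_pairs` and `fit_attr` are assumptions made for illustration only.

```{r eval=FALSE}
# Illustrative sketch in base R only; it does not call the npdr package.
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("a1", "a2", "a3")))
y <- 2 * X[, "a1"] + rnorm(n, sd = 0.5) # only a1 drives the simulated outcome

# Fixed-k neighborhoods from Manhattan distances in attribute space (assumed choice)
k <- 10
D <- as.matrix(dist(X, method = "manhattan"))
diag(D) <- Inf # an instance is not its own neighbor
nbr_pairs <- do.call(rbind, lapply(seq_len(n), function(i) {
  cbind(i = i, j = order(D[i, ])[seq_len(k)])
}))

# Projected outcome distance for every neighbor pair
dy <- abs(y[nbr_pairs[, "i"]] - y[nbr_pairs[, "j"]])

# One regression per attribute: outcome distance ~ attribute distance
fit_attr <- function(a) {
  da <- abs(X[nbr_pairs[, "i"], a] - X[nbr_pairs[, "j"], a])
  coefs <- summary(lm(dy ~ da))$coefficients
  c(beta = coefs["da", "Estimate"], t = coefs["da", "t value"])
}
scores <- t(sapply(colnames(X), fit_attr))
scores[order(-scores[, "beta"]), ] # attributes ranked by slope
```

For a case-control outcome, the same loop would use a hit/miss indicator as the outcome and `glm(..., family = binomial)` in place of `lm`.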
## Install
You can install the development version from GitHub with remotes:
```{r eval=FALSE}
# install.packages("remotes") # uncomment to install remotes
remotes::install_github("insilico/npdr")
library(npdr)
# data(package = "npdr")
```
### Dependencies
To set `fast.reg = TRUE` or `fast.dist = TRUE` or `use.glmnet = TRUE`, please install the `speedglm`, `wordspace`, and `glmnet` packages:
```{r eval=FALSE}
install.packages(c("speedglm", "wordspace", "glmnet"))
```
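To see which of these optional packages are already available before installing, a small check with base R's `requireNamespace()` can help. This is a convenience sketch, not part of npdr; the variable names are arbitrary.

```{r eval=FALSE}
# Report which optional packages are not yet installed (nothing is installed here)
optional_pkgs <- c("speedglm", "wordspace", "glmnet")
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- optional_pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing optional packages: ", paste(missing_pkgs, collapse = ", "))
}
```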
If an issue arises with updating `openssl`, try updating it on your own system, e.g. on macOS with `brew install openssl`.
<!-- Old issues with Rcpp and Rcpp Armadillo -->
<!-- If you still have trouble installing Rcpp and RcppArmadillo, please make sure you have `fortran` installed. -->
<!-- Also, you may [need](https://gallery.rcpp.org/articles/first-steps-with-C++11/) to explicitly enable C++11 support which we can do here from R: -->
<!-- ```{r eval=FALSE} -->
<!-- Sys.setenv("PKG_CXXFLAGS"="-std=c++11") -->
<!-- ``` -->
## Details
Relief-based methods are nearest-neighbor machine learning feature selection algorithms that compute the importance of attributes that may involve interactions in high-dimensional data.
Previously we introduced STIR, which extended Relief-based methods to compute statistical significance of attributes in case-control data by reformulating the Relief score as a pseudo t-test.
Here we extend the statistical formalism of STIR to a generalized linear model (glm) formalism that handles quantitative and case-control outcome variables and any predictor data type (continuous or categorical), and that adjusts for covariates while computing statistical significance of attributes.
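One way to write the per-attribute models described above, in notation assumed here for illustration: for a neighbor pair $(i, j)$, let $d_{ij}(a)$ be the projected distance for attribute $a$ (shown below as an absolute difference, which fits a numeric attribute), $d_{ij}(y)$ the projected phenotype distance, and $m_{ij}$ the miss indicator (1 when $i$ and $j$ belong to different classes, 0 for a hit). Covariate terms, when present, would be added to the right-hand side.

$$
\begin{aligned}
d_{ij}(a) &= \lvert x_{ia} - x_{ja} \rvert \\
d_{ij}(y) &= \beta_0 + \beta_a\, d_{ij}(a) + \varepsilon_{ij} && \text{(quantitative outcome)} \\
\mathrm{logit}\, \Pr(m_{ij} = 1) &= \beta_0 + \beta_a\, d_{ij}(a) && \text{(case-control outcome)}
\end{aligned}
$$

The importance of attribute $a$ and its statistical significance then come from the estimate of $\beta_a$ and the associated test.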
## Contact
## Websites
- [insilico GitHub Organization](https://github.com/insilico)
- [insilico McKinney Lab](http://insilico.utulsa.edu/)
## Related references
- [2018 STIR paper in Bioinformatics](https://doi.org/10.1093/bioinformatics/bty788)
- [2013 Gene-Wise Adaptive-Neighbors paper in PLoS One](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0081527)