add description of #67
wush978 committed Mar 29, 2015
1 parent 1fbd4aa commit fa3d503
Showing 2 changed files with 26 additions and 2 deletions.
14 changes: 13 additions & 1 deletion R/hashed.model.matrix.R
@@ -13,7 +13,9 @@
#'@param hash.size positive integer. The hash size of feature hashing.
#'@param transpose logical value. Indicating if the transpose should be returned. It affects the space
#'of the returned object when the dimension is imbalanced. Please see the details.
#'@param create.mapping logical value. The indicator of whether storing the hash mapping or not.
#'@param create.mapping logical value. The indicator of whether storing the hash mapping or not.
#'The mapping might miss some interaction terms that involve \code{split}ed features.
#'Please see the details.
#'@param is.dgCMatrix logical value. Indicating if the result is \code{dgCMatrix} or \code{CSCMatrix}
#'@param signed.hash logical value. Indicating if the hashed value is multiplied by a random sign.
#'This will reduce the impact of collisions. Disabling it will improve the speed.
@@ -47,6 +49,16 @@
#'the \code{\link{dgCMatrix-class}}. Note that the \code{rownames} of the returned \code{dgCMatrix}
#'is \code{character(0)} so the space complexity does not contain the term \eqn{O(m)}.
#'
#'The \code{mapping} created by enabling \code{create.mapping} might miss interaction terms that
#'involve \code{split}ed features. For example, suppose there are two columns \code{a} and \code{b}
#'whose values are 1 and 1,2,3 respectively. The user marks the column \code{b} with
#'\code{split}. If the hashed values of \code{b1} and \code{b2} collide, then the interaction
#'\code{a1:b1} will not appear in the returned mapping table.
#'
#'This package was originally designed for predictive analysis, and the mapping should not play an
#'important role in predictive analysis. If you have a test case and want us to fix this behavior,
#'please post it at \url{https://github.com/wush978/FeatureHashing/issues/67}.
#'
#'@references
#'H. B. McMahan, G. Holt, D. Sculley, et al. "Ad click
#'prediction: a view from the trenches". In: _The 19th ACM SIGKDD
14 changes: 13 additions & 1 deletion man/hashed.model.matrix.Rd
@@ -19,7 +19,9 @@ hashed.model.matrix(formula, data, hash.size = 2^18, transpose = FALSE,
\item{transpose}{logical value. Indicating if the transpose should be returned. It affects the space
of the returned object when the dimension is imbalanced. Please see the details.}

\item{create.mapping}{logical value. The indicator of whether storing the hash mapping or not.}
\item{create.mapping}{logical value. The indicator of whether storing the hash mapping or not.
The mapping might miss some interaction terms that involve \code{split}ed features.
Please see the details.}

\item{is.dgCMatrix}{logical value. Indicating if the result is \code{dgCMatrix} or \code{CSCMatrix}}

@@ -57,6 +59,16 @@ For a \eqn{m \times n} matrix with \eqn{k} non-zero elements, the returned \code
\eqn{O(n) + O(k)} space. For details, please check the documentation of
the \code{\link{dgCMatrix-class}}. Note that the \code{rownames} of the returned \code{dgCMatrix}
is \code{character(0)} so the space complexity does not contain the term \eqn{O(m)}.

The \code{mapping} created by enabling \code{create.mapping} might miss interaction terms that
involve \code{split}ed features. For example, suppose there are two columns \code{a} and \code{b}
whose values are 1 and 1,2,3 respectively. The user marks the column \code{b} with
\code{split}. If the hashed values of \code{b1} and \code{b2} collide, then the interaction
\code{a1:b1} will not appear in the returned mapping table.

This package was originally designed for predictive analysis, and the mapping should not play an
important role in predictive analysis. If you have a test case and want us to fix this behavior,
please post it at \url{https://github.com/wush978/FeatureHashing/issues/67}.
}
\examples{
# The following scripts show how to fit a logistic regression
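
The added documentation hinges on two split levels (such as `b1` and `b2`) landing in the same hash bucket. As a rough illustration of that condition only, the base R sketch below groups term names by bucket and reports the groups with more than one member. Everything in it is an assumption made for illustration: `toy_hash` and `colliding_terms` are hypothetical helpers, and the toy hash is a stand-in that is not the hash function FeatureHashing uses, so the buckets it reports will not match the package's.

```r
# Toy string hash: position-weighted sum of code points, mapped to 1..hash.size.
# Hypothetical stand-in; NOT the hash function used by FeatureHashing.
toy_hash <- function(terms, hash.size) {
  vapply(terms, function(term) {
    codes <- utf8ToInt(term)
    sum(codes * seq_along(codes)) %% hash.size + 1
  }, numeric(1))
}

# Group terms by bucket and keep only the buckets shared by several terms.
colliding_terms <- function(terms, hash.size) {
  buckets <- split(terms, toy_hash(terms, hash.size))
  Filter(function(bucket) length(bucket) > 1, buckets)
}

# Term names from the documentation example: column a has value 1,
# column b is split into the levels 1, 2 and 3.
terms <- c("a1", "b1", "b2", "b3")

collisions_small <- colliding_terms(terms, hash.size = 2)     # tiny table
collisions_large <- colliding_terms(terms, hash.size = 2^18)  # default size
print(collisions_small)
print(collisions_large)
```

With a deliberately tiny `hash.size` the split levels of `b` share a bucket under this toy hash, which is the situation in which the documentation says an interaction such as `a1:b1` may be absent from the mapping. The sketch does not reproduce how the package builds its mapping table.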
