
Error: Invalid parameter format for num_class expect int but value = 'NA' #55

Open
yongfanbeta opened this issue Jan 19, 2018 · 7 comments

Comments

@yongfanbeta

Hello,

When I use MlBayesOpt to optimize an xgboost model for a linear regression problem, such as predicting house prices, I choose objectfun = "reg:linear". This is not a classification problem, so there is no classes parameter, but it seems I still have to give a num_class?

Hoping for your reply!
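
For reference, a call along these lines triggers the error (house_prices and price are just placeholders for my data; the other arguments follow the xgb_cv_opt defaults):

library(MlBayesOpt)

# house_prices: a data frame with a numeric target column `price` (placeholder names)
res <- xgb_cv_opt(data = house_prices,
                  label = price,
                  objectfun = "reg:linear",
                  evalmetric = "rmse",
                  n_folds = 5)
# Error: Invalid Parameter format for num_class expect int but value='NA'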

@shakfu

shakfu commented May 26, 2018

I had exactly the same problem. Is linear regression not supported in this case?

@ymattu
Owner

ymattu commented May 28, 2018

@VictorFY @shakfu

Thank you for using MlBayesOpt. I reproduced the same error in this package.
For now, this is a bug in the package, and I will fix it in the next version.

I'm very sorry...
Please wait a while until I fix it, or I welcome your PULL REQUEST.

@Edward-Aidi

I also encountered the same problem! Looking forward to your next update. Thanks!

Error in xgb.iter.update(fd$bst, fd$dtrain, iteration - 1, obj) :
Invalid Parameter format for num_class expect int but value='NA'
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Timing stopped at: 11.87 12.9 26.54

@SimonTopp

Thanks for the great package! Any update on this issue or workarounds for running xgb_opt with reg:linear?

@TwZhou0

TwZhou0 commented Apr 2, 2019


I also encountered the same problem when I tried fitting a regression model. Have you figured out how to fix it?

@carolinart

The same problem still occurs with reg:linear.

@msmith01

You need to comment out num_class = num_classes in the "#about classes" else branch of the xgb_cv_opt function. The statement goes:

  if (grepl("logi", objectfun) == TRUE){
    xgb_cv <- function(object_fun,
                       eval_met,
                       num_classes,

So if the objective function is binary:logistic (anything matching "logi"), the if branch is used and num_class is never passed to xgb.cv. Any other objective, including reg:linear, falls into the else branch, which always passes num_class = num_classes; since reg:linear does not take a num_class parameter, this fails when classes is left as NULL and num_class ends up as 'NA'.

The num_classes argument appears in both the if and the else part of the code. I pushed a pull request to highlight where the error is occurring. However, I still get a warning message on an unrelated issue.

Running the following should solve the issue (however I have only checked it on the iris data set):

xgb_cv_opt <- function(data,
                       label,
                       objectfun,
                       evalmetric,
                       n_folds,
                       eta_range = c(0.1, 1L),
                       max_depth_range = c(4L, 6L),
                       nrounds_range = c(70, 160L),
                       subsample_range = c(0.1, 1L),
                       bytree_range = c(0.4, 1L),
                       init_points = 4,
                       n_iter = 10,
                       acq = "ei",
                       kappa = 2.576,
                       eps = 0.0,
                       optkernel = list(type = "exponential", power = 2),
                       classes = NULL,
                       seed = 0
)
{
  if(class(data)[1] == "dgCMatrix")
  {dtrain <- xgb.DMatrix(data,
                         label = label)
  xg_watchlist <- list(msr = dtrain)

  cv_folds <- KFold(label, nfolds = n_folds,
                    stratified = TRUE, seed = seed)
  }
  else
  {
    quolabel <- enquo(label)
    datalabel <- (data %>% select(!! quolabel))[[1]]

    mx <- sparse.model.matrix(datalabel ~ ., data)

    if (class(datalabel) == "factor"){
      dtrain <- xgb.DMatrix(mx, label = as.integer(datalabel) - 1)
    } else{
      dtrain <- xgb.DMatrix(mx, label = datalabel)
      }

    xg_watchlist <- list(msr = dtrain)

    cv_folds <- KFold(datalabel, nfolds = n_folds,
                      stratified = TRUE, seed = seed)
  }

  #about classes
  if (grepl("logi", objectfun) == TRUE){
    xgb_cv <- function(object_fun,
                       eval_met,
                       num_classes,
                       eta_opt,
                       max_depth_opt,
                       nrounds_opt,
                       subsample_opt,
                       bytree_opt) {

      object_fun <- objectfun
      eval_met <- evalmetric

      cv <- xgb.cv(params = list(booster = "gbtree",
                                 nthread = 1,
                                 objective = object_fun,
                                 eval_metric = eval_met,
                                 eta = eta_opt,
                                 max_depth = max_depth_opt,
                                 subsample = subsample_opt,
                                 colsample_bytree = bytree_opt,
                                 lambda = 1, alpha = 0),
                   data = dtrain, folds = cv_folds,
                   watchlist = xg_watchlist,
                   prediction = TRUE, showsd = TRUE,
                   early_stopping_rounds = 5, maximize = TRUE, verbose = 0,
                   nrounds = nrounds_opt)

      if (eval_met %in% c("auc", "ndcg", "map")) {
        s <- max(cv$evaluation_log[, 4])
      } else {
        s <- max(-(cv$evaluation_log[, 4]))
      }
      list(Score = s,
           Pred = cv$pred)
    }
  } else{
    xgb_cv <- function(object_fun,
                       eval_met,
                       num_classes,
                       eta_opt,
                       max_depth_opt,
                       nrounds_opt,
                       subsample_opt,
                       bytree_opt) {

      object_fun <- objectfun
      eval_met <- evalmetric

      num_classes <- classes

      cv <- xgb.cv(params = list(booster = "gbtree",
                                 nthread = 1,
                                 objective = object_fun,
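                                 # num_class applies only to multi-class
                                 # objectives, so it is left commented out
                                 # below to avoid the 'NA' error with reg:linear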
                                 #num_class = num_classes,
                                 eval_metric = eval_met,
                                 eta = eta_opt,
                                 max_depth = max_depth_opt,
                                 subsample = subsample_opt,
                                 colsample_bytree = bytree_opt,
                                 lambda = 1, alpha = 0),
                   data = dtrain, folds = cv_folds,
                   watchlist = xg_watchlist,
                   prediction = TRUE, showsd = TRUE,
                   early_stopping_rounds = 5, maximize = TRUE, verbose = 0,
                   nrounds = nrounds_opt)

      if (eval_met %in% c("auc", "ndcg", "map")) {
        s <- max(cv$evaluation_log[, 4])
      } else {
        s <- max(-(cv$evaluation_log[, 4]))
      }
      list(Score = s,
           Pred = cv$pred)
    }
  }

  opt_res <- BayesianOptimization(xgb_cv,
                                  bounds = list(eta_opt = eta_range,
                                                max_depth_opt = max_depth_range,
                                                nrounds_opt = nrounds_range,
                                                subsample_opt = subsample_range,
                                                bytree_opt = bytree_range),
                                  init_points,
                                  init_grid_dt = NULL,
                                  n_iter,
                                  acq,
                                  kappa,
                                  eps,
                                  optkernel,
                                  verbose = TRUE)

  return(opt_res)

}


library(MlBayesOpt)
library(dplyr)
library(Matrix)
library(xgboost)
library(rBayesianOptimization)
df <- iris
label_Species <- iris$Species
xgb_cv_opt(data = df,
           label = label_Species,
           objectfun = "reg:linear", evalmetric = "rmse", n_folds = 2, eta_range = c(0.1, 1L),
           max_depth_range = c(4L, 6L), nrounds_range = c(70, 160L),
           subsample_range = c(0.1, 1L), bytree_range = c(0.4, 1L),
           init_points = 4, n_iter = 10, acq = "ucb", kappa = 2.576, eps = 0,
           optkernel = list(type = "exponential", power = 2), classes = NULL,
           seed = 0)

I get the following warning messages:

Warning messages:
1: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [15] is not a sub-multiple or multiple of the number of rows [8]
2: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [43] is not a sub-multiple or multiple of the number of rows [22]
3: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [109] is not a sub-multiple or multiple of the number of rows [55]
4: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [107] is not a sub-multiple or multiple of the number of rows [54]
5: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [133] is not a sub-multiple or multiple of the number of rows [67]

I have traced these warnings to this part of the code:

    cv_folds <- KFold(datalabel, nfolds = n_folds,
                      stratified = TRUE, seed = seed)

I had this solved but lost the unsaved changes when I switched projects in R. If I recall correctly, I set datalabel (or label) to a new numeric vector or Matrix.
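
A rough sketch of what that change might have looked like (untested; using a plain numeric label and turning off stratification is my best guess, since stratified folds only make sense for class labels):

# Possible replacement for the fold construction above (untested guess):
# convert the label to a plain numeric vector and skip stratification,
# which is not meaningful for a continuous regression target.
datalabel <- as.numeric(datalabel)
cv_folds <- KFold(datalabel, nfolds = n_folds,
                  stratified = FALSE, seed = seed)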
