Memory issue with large dataset #71

Open
bblodfon opened this issue Jan 16, 2025 · 3 comments

bblodfon commented Jan 16, 2025

Hi Byron,

Could you please take a look at this example? I came across similar issues a year ago:

library(mlr3proba)
#> Loading required package: mlr3
library(mlr3extralearners)
library(mlr3pipelines)

task = readRDS(file = gzcon(url('https://github.com/bblodfon/pdac-efs-bench2024/blob/main/data/wissel2023/gex_task.rds?raw=True')))
task$n_features
#> [1] 19870
task$nrow
#> [1] 81

learner = lrn("surv.aorsf", n_tree = 500, control_type = "fast", importance = "permute")
learner$train(task)
#> Error: protect(): protection stack overflow

Created on 2025-01-16 with reprex v2.1.1

With half of the features (the ~10K with the highest variance) it seems to work. That is still a very high-dimensional setting, so I could run with that, but maybe this is worth checking out?
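
For readers who want to try the same workaround, here is a minimal base R sketch of keeping the k highest-variance columns. The helper name top_variance_features and the toy data are hypothetical, and this is not necessarily how the features were selected in the report above.

```r
# Hypothetical helper: return the names of the k columns with the largest variance.
# Assumes every column of `data` is numeric.
top_variance_features <- function(data, k) {
  vars <- vapply(data, stats::var, numeric(1))
  names(sort(vars, decreasing = TRUE))[seq_len(min(k, length(vars)))]
}

# Small deterministic toy data set (illustration only)
toy <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(10, 20, 30, 40),
  c = c(1, 1, 1, 1),
  d = c(100, 200, 300, 400)
)

keep <- top_variance_features(toy, k = 2)
toy_small <- toy[, keep, drop = FALSE]
```

On a real task the outcome columns would have to be excluded before ranking and added back afterwards.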

@bcjaeger
Collaborator

Thanks, John! This is definitely worth checking out and I'll get to it as soon as I can.

@bcjaeger
Collaborator

It looks like this memory issue is caused by my use of stats::terms(x = self$formula, data = self$data) when the formula is too long. I will work on getting a fix in place. The main reason I use terms() is to get the names of all the relevant predictor variables when a user formula uses the . shortcut to capture all variables in a data frame, e.g., outcome ~ ., but there is probably a more direct way to do this.
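
One possible "more direct way" is sketched below. It is an illustration only, not the fix that went into the package: the helper predictor_names is hypothetical, it assumes a two-sided formula, and it only special-cases a bare . on the right-hand side (something like . - x is not handled). The idea is to read the outcome variables off the left-hand side and take every other column of the data, so that the . is never expanded into thousands of terms.

```r
# Hypothetical helper (not part of aorsf): resolve predictor names without
# calling stats::terms() on a formula that uses the `.` shortcut.
predictor_names <- function(formula, data) {
  # Variables named on the left-hand side, e.g. time and status
  lhs_vars <- all.vars(formula[[2]])
  rhs <- formula[[3]]
  if (identical(rhs, quote(.))) {
    # `outcome ~ .` means every column that is not part of the outcome
    return(setdiff(names(data), lhs_vars))
  }
  # Otherwise use the variables written explicitly on the right-hand side
  all.vars(rhs)
}

# Toy wide data: 2 outcome columns plus 20,000 predictors (illustration only)
n <- 10
p <- 20000
x <- as.data.frame(matrix(rnorm(n * p), nrow = n))
wide <- cbind(data.frame(time = rexp(n), status = rbinom(n, 1, 0.5)), x)

vars <- predictor_names(time + status ~ ., wide)
length(vars)
```

Because the column names come straight from names(data), the cost is linear in the number of columns and no long formula object is ever built.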

@bblodfon
Author

Nice! You can't imagine how many issues the use of formula has created in my ML R development life :)
